Panama Papers Neo4j Budapest Meetup


Transcript of Panama Papers Neo4j Budapest Meetup

Page 1: Panama Papers Neo4j Budapest Meetup

This talk is about the technology…

PANAMA PAPERS AND GRAPHS

Page 2: Panama Papers Neo4j Budapest Meetup

SOURCE

2.6 TB of data: relational databases, emails, various banking documents and corporate records relating to the 215,000 offshore companies that were clients of the Panamanian legal services firm Mossack Fonseca between 1977 and 2015.

Page 3: Panama Papers Neo4j Budapest Meetup

THE PROCESS

1. Acquire documents
2. Classify documents
   a. Scan / OCR (Tesseract)
   b. Extract document metadata (Apache Tika https://tika.apache.org)
3. Whiteboard domain
   a. Determine entities and their relationships
   b. Determine potential entity and relationship properties
   c. Determine sources for those entities and their properties
4. Work out analyzers, rules, parsers and named entity recognition for documents (Apache Solr, Blacklight http://projectblacklight.org, Nuix https://www.nuix.com)
5. Parse and store document metadata and document and entity relationships (Talend http://www.talend.com)
   a. Parse by author, named entities, dates, sources and classification
6. Infer entity relationships
7. Compute similarities, transitive cover and triangles
8. Analyze data using graph queries and visualizations (Neo4j, Linkurious http://linkurio.us)
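Step 7 of the process can be illustrated on a toy graph. The following is only a minimal sketch in plain Python, not the pipeline the journalists used: the node names and edges are invented, the graph is treated as undirected, and "transitive cover" is read here as "everything reachable from a node along any path", which is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy edges (treated as undirected); names are illustrative only.
edges = [
    ("officer_a", "company_x"),
    ("officer_b", "company_x"),
    ("officer_a", "officer_b"),
    ("officer_b", "company_y"),
    ("officer_c", "company_z"),
]


def build_adjacency(edge_list):
    """Map every node to the set of nodes it is directly linked to."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def triangles(adjacency):
    """Return each triangle once, as a sorted tuple of three node names."""
    found = set()
    for node, neighbours in adjacency.items():
        # Two neighbours of the same node that are also linked close a triangle.
        for a, b in combinations(sorted(neighbours), 2):
            if b in adjacency[a]:
                found.add(tuple(sorted((node, a, b))))
    return found


def reachable(adjacency, start):
    """All nodes connected to `start` by a path of any length (start excluded)."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    seen.discard(start)
    return seen


adjacency = build_adjacency(edges)
print(triangles(adjacency))
print(reachable(adjacency, "officer_a"))
```

In a graph database these computations would normally be expressed as graph queries rather than hand-written traversals; the sketch only shows what is being computed.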

Page 4: Panama Papers Neo4j Budapest Meetup

ENTITIES

• Clients

• Companies

• Addresses

• Officers (both natural people and companies)

Page 5: Panama Papers Neo4j Budapest Meetup

RELATIONSHIPS

• (:Officer)-[:is officer of]->(:Company)

• (:Officer)-[:registered address]->(:Address)

• (:Client)-[:registered]->(:Company)

• (:Officer)-[:has similar name and address]->(:Officer)
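The last relationship, "has similar name and address", has to be inferred rather than read from the source data. One way this could be done is sketched below with Python's standard difflib module. The officer records, the 0.8 threshold and the choice of string-similarity measure are all assumptions made for illustration; the slides do not say how the similarity was actually computed.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical officer records; not taken from the leaked data.
officers = [
    {"id": 1, "name": "John A. Smith", "address": "12 Harbour Road, Nassau"},
    {"id": 2, "name": "John Smith", "address": "12 Harbour Rd, Nassau"},
    {"id": 3, "name": "Maria Gonzalez", "address": "Calle 50, Panama City"},
]


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_pairs(records, threshold=0.8):
    """Propose a relationship when both name and address are close enough."""
    pairs = []
    for left, right in combinations(records, 2):
        name_score = similarity(left["name"], right["name"])
        address_score = similarity(left["address"], right["address"])
        if name_score >= threshold and address_score >= threshold:
            pairs.append((left["id"], "has similar name and address", right["id"]))
    return pairs


for source, relationship, target in similar_pairs(officers):
    print(f"(:Officer {source})-[:{relationship}]->(:Officer {target})")
```

Comparing every pair of records is quadratic, so on a data set of this size some form of pre-grouping (for example by country or by normalized surname) would likely be needed first.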

Page 6: Panama Papers Neo4j Budapest Meetup

GRAPH MODEL

Page 7: Panama Papers Neo4j Budapest Meetup

GRAPH MODEL

Page 8: Panama Papers Neo4j Budapest Meetup

FLEXIBLE DATA MODEL

New entities:

• Documents: E-Mail, PDF, Contract, DB-Record, …

• Money Flow: Accounts / Banks / Intermediaries

New relationships:

• Family / business ties

• Conversations

• Peer Groups / Rings

• Similar Roles

• Mentions / Topic-Of

• Money Flow
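Why the model counts as flexible can be shown with a small in-memory stand-in for a labelled property graph. This is a sketch under stated assumptions, not Neo4j's API: the PropertyGraph class, its method names and all sample values are hypothetical. The point it illustrates is that new entity and relationship types are added as data, without altering the nodes and relationships that already exist.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory stand-in for a labelled property graph (hypothetical)."""

    nodes: dict = field(default_factory=dict)  # node id -> (label, properties)
    edges: list = field(default_factory=list)  # (source id, type, target id)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = (label, properties)

    def add_edge(self, source, rel_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before they are linked")
        self.edges.append((source, rel_type, target))

    def labels(self):
        """Set of all node labels currently present in the graph."""
        return {label for label, _ in self.nodes.values()}

    def outgoing(self, node_id, rel_type=None):
        """Outgoing (type, target) pairs, optionally filtered by type."""
        return [
            (kind, target)
            for source, kind, target in self.edges
            if source == node_id and (rel_type is None or kind == rel_type)
        ]


# Original model: officers and companies.
graph = PropertyGraph()
graph.add_node("o1", "Officer", name="Example Officer")
graph.add_node("c1", "Company", name="Example Holdings Ltd")
graph.add_edge("o1", "is officer of", "c1")
labels_before = graph.labels()

# Extension: new labels and relationship types, no change to existing data.
graph.add_node("d1", "Document", kind="E-Mail")
graph.add_node("b1", "Bank", name="Example Bank")
graph.add_edge("d1", "mentions", "o1")
graph.add_edge("c1", "money flow", "b1")

print(sorted(labels_before))
print(sorted(graph.labels()))
```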

Page 9: Panama Papers Neo4j Budapest Meetup

DISCOVERY

Once the database was set up, it was a simple matter to install and configure Linkurious to essentially provide a GUI (graphical user interface) atop the database. Having the visual depiction of the graph of names and addresses was critical in making sense of the data, especially for non-technical reporters.

Page 10: Panama Papers Neo4j Budapest Meetup

Demo

https://offshoreleaks.icij.org/nodes/10121110