Stefano romanazzi terrorist network mining.pptx


Transcript of Stefano romanazzi terrorist network mining.pptx

Page 1: Stefano romanazzi terrorist network mining.pptx

TERRORIST NETWORK MINING

CURRENT TRENDS AND OPPORTUNITIES

STEFANO ROMANAZZI

[email protected]

MASTER'S DEGREE COURSE IN COMPUTER SCIENCE (CDL INFORMATICA MAGISTRALE)

ACADEMIC YEAR 2016/2017

Page 2: Stefano romanazzi terrorist network mining.pptx

INTRODUCTION

➢ Concern about global security came into the limelight after the 9/11 attacks

➢ Terrorists can be:
    ➢ Fully clandestine
    ➢ Rebels
    ➢ Living a double life


Page 3: Stefano romanazzi terrorist network mining.pptx

DEEP WEB

➢ The majority of radical and extremist groups hide below the level of publicly mapped web sites

➢ The Deep Web is an area of the web identical to the Clearnet, except that its web addresses are not indexed by the web spiders of Google and other search engines


Page 4: Stefano romanazzi terrorist network mining.pptx

DARK WEB

➢ Many of those individuals interact in social networks that are totally hidden from the public web, the darknets

➢ This area of the web is known as the Dark Web


Page 5: Stefano romanazzi terrorist network mining.pptx

SPIDERS

➢ SPIDER
    Software program that traverses the World Wide Web information space by following hypertext links and retrieving web documents via the standard HTTP protocol

➢ Required features:
    ➢ Handle registration requirements
    ➢ Extract the desired information
    ➢ Filter the collected data


Page 6: Stefano romanazzi terrorist network mining.pptx

SPIDERS

➢ GOAL
    Provide Dark Web forum data in a timely manner

➢ PROBLEMS
    ➢ Accessibility → Human-assisted approach
    ➢ Collection update procedure → Incremental spidering
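Incremental spidering can be illustrated with a minimal sketch. It assumes a hypothetical in-memory forum (a dictionary of threads holding numbered posts) instead of a real site, and the names `Forum`, `incremental_spider` and `last_seen` are invented for this illustration. The point is only that each run collects the posts that were not seen in the previous run.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a forum: thread id -> list of (post id, text).
# Post ids are assumed to grow over time within a thread.
Forum = Dict[str, List[Tuple[int, str]]]


def incremental_spider(forum: Forum, last_seen: Dict[str, int]) -> List[Tuple[str, int, str]]:
    """Return only the posts not collected in an earlier run.

    `last_seen` maps each thread id to the highest post id already
    collected; it is updated in place so the next run skips those posts.
    """
    new_posts = []
    for thread_id, posts in forum.items():
        seen_up_to = last_seen.get(thread_id, 0)
        for post_id, text in posts:
            if post_id > seen_up_to:
                new_posts.append((thread_id, post_id, text))
        if posts:
            last_seen[thread_id] = max(seen_up_to, max(pid for pid, _ in posts))
    return new_posts


forum: Forum = {"t1": [(1, "first post"), (2, "reply")]}
state: Dict[str, int] = {}
first_run = incremental_spider(forum, state)   # collects both posts

forum["t1"].append((3, "late reply"))
forum["t2"] = [(1, "new thread")]
second_run = incremental_spider(forum, state)  # collects only the two new posts
```

A real spider would also have to handle edited or deleted posts, which this sketch ignores.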


Page 7: Stefano romanazzi terrorist network mining.pptx

SYSTEM DESIGN OF A SPIDER PROGRAM

➢ Data acquisition
    ➢ Spidering
    ➢ Incremental spidering

➢ Data preparation
    ➢ HTML parsing
    ➢ Storage in a local database

➢ System functionalities
    ➢ Browsing and searching
    ➢ Statistics analysis
    ➢ Multilingual translation
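The data preparation stage (HTML parsing followed by storage in a local database) could look roughly like the sketch below. It uses only the Python standard library. The page markup (`<div class="post">`), the table layout and the names `PostParser` and `store_posts` are assumptions made for illustration, not details taken from the system described in the slides.

```python
import sqlite3
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects the text inside <div class="post"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0      # nesting depth of <div> inside the current post
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._depth > 0:
                self._depth += 1
            elif ("class", "post") in attrs:
                self._depth = 1
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace before storing the post body.
                self.posts.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def store_posts(conn: sqlite3.Connection, thread_id: str, html: str) -> int:
    """Parse one page and insert its posts into a local SQLite table."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    conn.execute("CREATE TABLE IF NOT EXISTS posts (thread_id TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(thread_id, body) for body in parser.posts],
    )
    conn.commit()
    return len(parser.posts)


conn = sqlite3.connect(":memory:")
page = (
    '<html><body><div class="post">Hello <b>world</b></div>'
    '<div class="nav">menu</div><div class="post">Second</div></body></html>'
)
stored = store_posts(conn, "t1", page)
rows = conn.execute("SELECT body FROM posts ORDER BY rowid").fetchall()
```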


Page 8: Stefano romanazzi terrorist network mining.pptx

THE GLOBAL TERRORISM DATABASE

➢ New methods to predict terrorist attacks and detect the main actors in a terrorist network can be tested using past data

➢ GTD: the biggest database of terrorist events around the world


Page 9: Stefano romanazzi terrorist network mining.pptx

DETECTING RADICALISATION

➢ Perform a sentiment analysis on the data gathered during the crawl

➢ Comments and user profiles must be indexed
    ➢ Stopword removal
    ➢ Dehyphenation
    ➢ Porter’s stemming algorithm
    ➢ Calculation of metrics about word usage
        ➢ Term Frequency (TF)
        ➢ Document Frequency (DF)
        ➢ User Frequency (UF)

➢ List of the most frequently occurring terms for all users
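The indexing steps above can be sketched as follows. The slide does not define the three metrics, so this sketch assumes TF is the total number of occurrences of a term, DF the number of comments containing it, and UF the number of distinct users who used it. The stopword list is a toy one, stemming is left out (Porter's algorithm is not in the standard library), and the function names are invented for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Toy stopword list, an assumption for this sketch.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text: str) -> List[str]:
    """Dehyphenate line-broken words, lowercase, and drop stopwords."""
    text = text.replace("-\n", "")
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def word_usage_metrics(comments: Iterable[Tuple[str, str]]):
    """Compute (TF, DF, UF) counters from (user, comment text) pairs."""
    tf: Counter = Counter()
    df: Counter = Counter()
    users_per_term: Dict[str, Set[str]] = {}
    for user, text in comments:
        tokens = tokenize(text)
        tf.update(tokens)                      # every occurrence counts
        for term in set(tokens):
            df[term] += 1                      # once per comment
            users_per_term.setdefault(term, set()).add(user)
    uf = Counter({term: len(users) for term, users in users_per_term.items()})
    return tf, df, uf


comments = [
    ("u1", "The cross and the Bible"),
    ("u1", "Bible study"),
    ("u2", "the bible"),
]
tf, df, uf = word_usage_metrics(comments)
top_terms = [term for term, _ in tf.most_common(3)]
```

The list of most frequent terms for all users then comes from `tf.most_common(n)`.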


Page 10: Stefano romanazzi terrorist network mining.pptx

SENTIMENT ANALYSIS

➢ Select the most important concepts of potential interest to jihadists from the list of the most frequently occurring terms

➢ Concept expansion with spelling variants and synonyms
    ➢ Christianity → Jesus, cross, Christian, Bible

➢ Sentiment analysis on retrieved content

Concepts: America, Christianity, Islam, Israel, Judaism, Mubarak, Palestine, Al-Qaeda


Page 11: Stefano romanazzi terrorist network mining.pptx

SENTIMENT ANALYSIS

➢ APPROACH
    Use of a dictionary-based polarity scoring method to assign positivity and negativity scores to comments and user profiles
    ➢ Use of a lexicon (e.g. SentiWordNet)

➢ Calculation of positive and negative scores for each concept
    ➢ Positive and negative scores are calculated separately
    ➢ Given by the mean term orientation in a document
    ➢ In a set of documents the positivity/negativity score is defined as the mean positivity/negativity score of the documents
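The scoring rule above can be sketched in a few lines. The lexicon below is a toy dictionary with made-up values, not real SentiWordNet entries, and the sketch assumes that the mean is taken over the terms found in the lexicon, a detail the slide does not specify. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Toy lexicon: term -> (positivity, negativity). Values are invented.
LEXICON: Dict[str, Tuple[float, float]] = {
    "good": (0.75, 0.0),
    "peace": (0.5, 0.0),
    "bad": (0.0, 0.625),
    "war": (0.0, 0.5),
}


def document_scores(tokens: List[str]) -> Tuple[float, float]:
    """Mean positivity and mean negativity of the lexicon terms in one document."""
    scored = [LEXICON[t] for t in tokens if t in LEXICON]
    if not scored:
        return (0.0, 0.0)
    return (mean(p for p, _ in scored), mean(n for _, n in scored))


def collection_scores(documents: List[List[str]]) -> Tuple[float, float]:
    """Mean of the per-document scores over a set of documents."""
    per_doc = [document_scores(doc) for doc in documents]
    if not per_doc:
        return (0.0, 0.0)
    return (mean(p for p, _ in per_doc), mean(n for _, n in per_doc))


doc_a = ["good", "war", "unknown"]
doc_b = ["bad"]
scores_a = document_scores(doc_a)
scores_all = collection_scores([doc_a, doc_b])
```

Positivity and negativity stay separate throughout, so a document can score high on both.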


Page 12: Stefano romanazzi terrorist network mining.pptx

SENTIMENT ANALYSIS - EXAMPLE

[Figure panels: Males, Females]


Page 13: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS

➢ SOCIAL NETWORK ANALYSIS
    The process of investigating social structures through the use of network and graph theories

➢ Very useful if applied to dark networks

➢ The network is represented by a simple undirected graph using the adjacency matrix A

$$A_{i,j} = \begin{cases} 1, & \text{if } i \text{ and } j \text{ are connected} \\ 0, & \text{otherwise} \end{cases}$$
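As a small illustration of this representation, the sketch below builds the matrix from a list of links. Nodes are assumed to be numbered from 0 to n - 1, and the function name is invented here.

```python
from typing import List, Tuple


def adjacency_matrix(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Build the n x n adjacency matrix of a simple undirected graph."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            continue  # a simple graph has no self-loops
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: the matrix is symmetric
    return matrix


# Three individuals, with links 0-1 and 1-2.
A = adjacency_matrix(3, [(0, 1), (1, 2)])
```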


Page 14: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS

Node
• Individual terrorist

Edge
• Link between terrorists

Size of the node
• Node importance in the network


Page 15: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS - SDD

➢ Semi-Discrete Decomposition

➢ For the original data matrix A we can compute a reduced rank-k approximation $A_k = X_k D_k Y_k^{T}$, where k is much smaller than the original rank of A. This makes it well suited to dealing with huge networks

➢ It greatly reduces memory consumption and increases computation speed, which in turn eases the interpretation of the results

➢ Tests on terrorist networks show good results
    ➢ Reduced loss of information
    ➢ Low ratio of reduction → low rate of corrupted data
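For reference, the factorisation can be written out term by term. This follows the usual definition of the semi-discrete decomposition, which the slide does not spell out; the symbols $d_i$, $x_i$, $y_i$ and the matrix size $m \times n$ are notation introduced here.

```latex
A_k = X_k D_k Y_k^{T} = \sum_{i=1}^{k} d_i \, x_i \, y_i^{T},
\qquad x_i \in \{-1, 0, 1\}^{m}, \quad y_i \in \{-1, 0, 1\}^{n}, \quad d_i > 0
```

Because the entries of $X_k$ and $Y_k$ take only three values, each one fits in two bits, which is where the memory saving comes from.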


Page 16: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS - SDD

[Figure panels: k = 5, k = 20]


Page 17: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS

➢ Cohesion Analysis (CA)
    ➢ Equivalent to analysing the node connectivity in a network

➢ Role Analysis (RA)
    ➢ Finds the main roles in the network

➢ Power Analysis (PA)
    ➢ Identifies the most important person in the network


Page 18: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS - CA

➢ Considers the number of connections in a graph in terms of forming a clique

[Figure panels: Clique, N-clique, K-plex, K-core]
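Two of the subgroup models named above are easy to check in code. The sketch below assumes the standard graph-theory definitions, which the slide only shows as pictures: a clique is a set of nodes in which every pair is linked, and a k-core is the largest subgraph in which every member has at least k neighbours inside the subgraph. The graph is stored as a dictionary of neighbour sets, and the function names are invented for illustration.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of neighbours (symmetric)


def is_clique(adj: Graph, nodes: Set[Hashable]) -> bool:
    """True if every pair of distinct nodes in `nodes` is linked."""
    return all(u in adj[v] for v in nodes for u in nodes if u != v)


def k_core(adj: Graph, k: int) -> Set[Hashable]:
    """Nodes of the k-core, found by repeatedly pruning nodes of degree < k."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in list(remaining):
            if len(remaining[v]) < k:
                for u in remaining[v]:
                    remaining[u].discard(v)
                del remaining[v]
                changed = True
    return set(remaining)


# A triangle 0-1-2 with one extra member 3 linked only to node 2.
network: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
core_2 = k_core(network, 2)
```

Raising k gives a stricter, smaller subgroup. This is the trade-off between too many and too few links that the choice of model controls.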


Page 19: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS - CA

➢ Properties expected of a cohesive subgroup

➢ FAMILIARITY
    ➢ Each vertex is expected to have few strangers and many neighbors in the subgroup

➢ REACHABILITY
    ➢ Each subgroup is expected to have a low diameter to facilitate communication between members

➢ ROBUSTNESS
    ➢ A high connectivity makes it difficult to destroy the network by removing a few members


Page 20: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS - RA

➢ Based on the network efficiency E(G)
    ➢ Measure that quantifies how efficiently the nodes of the network exchange information

➢ Evaluate the importance of each node by calculating the efficiency of the network without considering that node

➢ Use of other measures based on E(G)
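The idea of scoring a node by the efficiency lost when it is removed can be sketched as follows. The slide gives no formula, so this sketch assumes the common global-efficiency definition, E(G) = 1 / (N (N - 1)) times the sum over all ordered pairs i ≠ j of 1 / d(i, j), where d is the shortest-path length and unreachable pairs contribute 0. All function names are invented for illustration.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of neighbours (symmetric)


def shortest_path_lengths(adj: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def efficiency(adj: Graph) -> float:
    """Assumed global efficiency: mean of 1/d(i, j) over ordered pairs i != j."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for v in adj:
        for u, d in shortest_path_lengths(adj, v).items():
            if u != v:
                total += 1.0 / d
    return total / (n * (n - 1))


def remove_node(adj: Graph, node: Hashable) -> Graph:
    """Copy of the graph without `node` and its links."""
    return {v: {u for u in nbrs if u != node} for v, nbrs in adj.items() if v != node}


def efficiency_drop(adj: Graph, node: Hashable) -> float:
    """Importance of a node as the efficiency lost when it is removed."""
    return efficiency(adj) - efficiency(remove_node(adj, node))


# A chain 0 - 1 - 2: node 1 is the only bridge between the other two.
chain: Graph = {0: {1}, 1: {0, 2}, 2: {1}}
drops = {v: efficiency_drop(chain, v) for v in chain}
```

In the chain example the middle node has the largest drop, because removing it disconnects the other two.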


Page 21: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS - RA


Page 22: Stefano romanazzi terrorist network mining.pptx

SOCIAL NETWORK ANALYSIS - PA

➢ Centrality measures help in identifying the key player or the central person in the network

➢ Graph geodesic
    The distance between two vertices in a graph is the number of edges in a shortest path connecting them

[Figure panels: Degree, Betweenness, Closeness, Eigenvector]
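Two of these measures, degree and closeness, follow directly from the geodesic distance and can be sketched without any library. The normalisations used here (degree divided by N - 1, closeness as the number of reachable nodes divided by the sum of their distances) are common conventions assumed for this sketch, and the function names are invented. Betweenness and eigenvector centrality are left out for brevity.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of neighbours (symmetric)


def geodesic_distances(adj: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Number of edges on a shortest path from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def degree_centrality(adj: Graph) -> Dict[Hashable, float]:
    """Fraction of the other nodes each node is directly linked to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj: Graph) -> Dict[Hashable, float]:
    """Reachable nodes divided by the sum of geodesic distances to them."""
    result = {}
    for v in adj:
        dist = geodesic_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# A star: node 0 is linked to everyone, the others only to node 0.
star: Graph = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
degree = degree_centrality(star)
closeness = closeness_centrality(star)
key_player = max(closeness, key=closeness.get)
```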


Page 23: Stefano romanazzi terrorist network mining.pptx

ANALYTICAL HIERARCHY PROCESS

➢ AHP is a 4-step process which combines multiple criteria after a weight assignment
    ➢ Effective technique in combination with SNA
    ➢ Identifying key players
    ➢ Ranking of nodes

➢ AHP is a tool that relies on solid mathematical principles as well as psychological ones


Page 24: Stefano romanazzi terrorist network mining.pptx

ANALYTICAL HIERARCHY PROCESS - 1

➢ Building of the decision hierarchy by considering various criteria and sub-criteria to assess the alternatives that are considered to achieve the final goal


Page 25: Stefano romanazzi terrorist network mining.pptx

ANALYTICAL HIERARCHY PROCESS - 2

➢ Establish priorities among criteria, sub-criteria and alternatives using a scale


Page 26: Stefano romanazzi terrorist network mining.pptx

ANALYTICAL HIERARCHY PROCESS - 3

➢ Use of centrality metrics to estimate the weight of each criterion and sub-criterion, which results in a comparative ranking


Page 27: Stefano romanazzi terrorist network mining.pptx

ANALYTICAL HIERARCHY PROCESS - 4

➢ For m criteria and n alternatives, combine the criteria weights (m × 1 matrix) with the values of the alternatives under each criterion (m × n matrix) in order to determine the final ranking of each alternative (n × 1 matrix)
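The aggregation step is a weighted sum: the score of each alternative is the sum, over the criteria, of the criterion weight times the value of that alternative under that criterion. The sketch below assumes this reading of the slide. The weights and values are invented numbers, and the function names are hypothetical.

```python
from typing import List


def aggregate(weights: List[float], values: List[List[float]]) -> List[float]:
    """Combine m criteria weights with an m x n value matrix into n scores.

    values[c][a] is the value of alternative `a` under criterion `c`.
    """
    m = len(weights)
    n = len(values[0])
    return [sum(weights[c] * values[c][a] for c in range(m)) for a in range(n)]


def ranking(scores: List[float]) -> List[int]:
    """Indices of the alternatives, best score first."""
    return sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)


# Three criteria (for example three centrality measures) and two alternatives
# (two nodes). All numbers are made up for the example.
weights = [0.5, 0.25, 0.25]
values = [
    [0.75, 0.25],  # criterion 0
    [0.5, 0.5],    # criterion 1
    [0.25, 0.75],  # criterion 2
]
scores = aggregate(weights, values)
order = ranking(scores)
```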


Page 28: Stefano romanazzi terrorist network mining.pptx

ANALYTICAL HIERARCHY PROCESS - RESULT


Page 29: Stefano romanazzi terrorist network mining.pptx

VISUALIZATION TECHNIQUES

➢ Multiple and interactive visualization types are required to help analyze gathered data

[Figure panels: Interactive SN with timeline, Parallel Coordinate Graph]


Page 30: Stefano romanazzi terrorist network mining.pptx

CONCLUSIONS

➢ The sentiment analysis approach is imprecise
    ➢ Users behave differently on different platforms
    ➢ Polarity orientations of terms do not correspond to subjectivity
    ➢ It needs a domain-specific lexicon → supervised learning
    ➢ Multilingual support is needed
    ➢ The sentiment analysis must be refined in order to detect different stages of terrorists’ lives (early stage, active clandestine phase, organisation abandonment)

➢ Coded languages cannot be analyzed and can lead to misinterpretation

➢ The difference between positive and negative sentiment must be marked in order to detect radicalisation

➢ Social Network Analysis
    ➢ The choice of clique model in the CA is of great importance (too many or too few links)


Page 31: Stefano romanazzi terrorist network mining.pptx

CONCLUSIONS

➢ Sentiment analysis results have to be polarized in order to avoid misleading data

➢ SDD must be avoided unless there are computational issues

➢ SNA shows good results, but relying on a single measure for each analysis phase may lead to misinterpretation
    ➢ SNA and AHP combined show better results

➢ Comparative cross-ideological research
    Possibility to extend the same methods to detect and analyze other types of radicalisation (e.g. neo-Nazis, animal rights groups with a history of violence)

➢ https://www.coursera.org/learn/understandingterror

