Stefano romanazzi terrorist network mining.pptx

Post on 16-Apr-2017


TERRORIST NETWORK MINING

CURRENT TRENDS AND OPPORTUNITIES

STEFANO ROMANAZZI

STEFAN.ROMANAZZI@GMAIL.COM

CDL INFORMATICA MAGISTRALE, A.A. 2016/2017

INTRODUCTION

➢ Concern about global security came into the limelight after the 9/11 attacks

➢ Terrorists can be:
  ➢ Fully clandestine
  ➢ Rebels
  ➢ Living a double life

DEEP WEB

➢ The majority of radical and extremist groups hide below the level of publicly indexed web sites

➢ The Deep Web is the part of the web that works like the Clearnet, except that its addresses are not indexed by the web spiders of Google and other search engines

DARK WEB

➢ Many of these individuals interact in social networks that are completely hidden from the public web: the darknets

➢ This area of the web is known as the Dark Web

SPIDERS

➢ SPIDER: a software program that traverses the World Wide Web information space by following hypertext links and retrieving web documents via the standard HTTP protocol

➢ Required features:
  ➢ Handle registration requirements
  ➢ Extract the desired information
  ➢ Filter the collected data
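The traversal described above can be sketched as a breadth-first crawl that extracts links from each retrieved page. This is only an illustration: the `fetch` callback, the `FAKE_WEB` dictionary and the `forum.example` addresses are hypothetical stand-ins for real HTTP requests, and registration handling and filtering are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first traversal; `fetch(url)` returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical in-memory "forum" standing in for real HTTP requests.
FAKE_WEB = {
    "http://forum.example/": '<a href="/t/1">Thread 1</a> <a href="/t/2">Thread 2</a>',
    "http://forum.example/t/1": '<a href="/">Home</a> <p>First post</p>',
    "http://forum.example/t/2": '<a href="/t/1">See thread 1</a>',
}

pages = crawl("http://forum.example/", FAKE_WEB.get)
```

Passing the fetch function as a parameter keeps the traversal logic separate from the transport, so a real HTTP client could be swapped in without touching `crawl`.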

SPIDERS

➢ GOAL: provide Dark Web forum data in a timely manner

➢ PROBLEMS:
  ➢ Accessibility → Human-assisted approach
  ➢ Collection update procedure → Incremental spidering
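One possible way to picture incremental spidering is to keep a content hash per page and reprocess only the pages whose hash changed since the last crawl. The slides do not say how the update procedure detects changes, so the hashing strategy and the names below are assumptions.

```python
import hashlib


def incremental_update(store, fetched):
    """Update `store` (url -> content hash) and return the URLs that changed.

    `store` and `fetched` are hypothetical structures used for illustration.
    """
    changed = []
    for url, html in fetched.items():
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if store.get(url) != digest:  # new page or modified content
            store[url] = digest
            changed.append(url)
    return changed


store = {}
first = incremental_update(store, {"/t/1": "post A", "/t/2": "post B"})
second = incremental_update(store, {"/t/1": "post A", "/t/2": "post B + reply"})
```

On the second pass only the thread that received a reply is reported, which is what keeps the collection update cheap.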

SYSTEM DESIGN OF A SPIDER PROGRAM

➢ Data acquisition
  ➢ Spidering
  ➢ Incremental spidering

➢ Data preparation
  ➢ HTML parsing
  ➢ Storage in a local database

➢ System functionalities
  ➢ Browsing and searching
  ➢ Statistics analysis
  ➢ Multilingual translation
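The data preparation and searching steps can be sketched with the standard library alone: strip the HTML down to text, store it in a local SQLite database, then query it. The table layout, the sample posts and the `search` helper are assumptions made for the example, not the design of the system described in the slides.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps only the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Local database (in memory here); the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (url TEXT PRIMARY KEY, author TEXT, body TEXT)")

raw_pages = [  # hypothetical crawl output
    ("/t/1#1", "user_a", "<div><p>First <b>post</b> of the thread</p></div>"),
    ("/t/1#2", "user_b", "<div><p>A short reply</p></div>"),
]
for url, author, html in raw_pages:
    conn.execute(
        "INSERT OR REPLACE INTO posts VALUES (?, ?, ?)",
        (url, author, html_to_text(html)),
    )
conn.commit()


def search(term):
    """Return the URLs of posts whose body contains `term`."""
    rows = conn.execute(
        "SELECT url FROM posts WHERE body LIKE ? ORDER BY url", (f"%{term}%",)
    )
    return [row[0] for row in rows]


# A simple statistic: number of posts per author.
posts_per_author = dict(
    conn.execute("SELECT author, COUNT(*) FROM posts GROUP BY author")
)
```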

THE GLOBAL TERRORISM DATABASE

➢ New methods to predict terrorist attacks and detect the main actors in a terrorist network can be tested using past data

➢ GTD: the biggest database of terrorist events around the world

DETECTING RADICALISATION

➢ Perform a sentiment analysis on the data gathered during the crawl

➢ Comments and user profiles must be indexed:
  ➢ Stopword removal
  ➢ Dehyphenation
  ➢ Porter's stemming algorithm
  ➢ Calculation of metrics about word usage:
    ➢ Term Frequency (TF)
    ➢ Document Frequency (DF)
    ➢ User Frequency (UF)

➢ List of the most frequently occurring terms for all users
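The indexing steps can be illustrated with a small sketch. Several things here are assumptions: the stopword list is a tiny sample, `naive_stem` is a crude suffix stripper used as a stand-in for Porter's algorithm (not an implementation of it), and TF, DF and UF are taken to mean total occurrences, number of comments containing the term, and number of distinct users using the term.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "are"}  # tiny sample


def naive_stem(word):
    """Crude suffix stripping; a stand-in for Porter's algorithm, not the real thing."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Dehyphenation: rejoin words split by a hyphen at a line break.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOPWORDS]


def term_statistics(comments):
    """comments: list of (user, text) pairs. Returns TF, DF and UF counters."""
    tf, df = Counter(), Counter()
    users_per_term = defaultdict(set)
    for user, text in comments:
        tokens = tokenize(text)
        tf.update(tokens)            # every occurrence counts
        df.update(set(tokens))       # one count per comment
        for term in set(tokens):
            users_per_term[term].add(user)
    uf = Counter({term: len(users) for term, users in users_per_term.items()})
    return tf, df, uf


comments = [  # hypothetical data
    ("u1", "The leaders are meet-\ning the leaders"),
    ("u1", "Leaders meeting"),
    ("u2", "A meeting of leaders"),
]
tf, df, uf = term_statistics(comments)
most_frequent = tf.most_common(1)[0][0]
```

With these three comments, `tf.most_common()` gives the list of the most frequently occurring terms across all users.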

SENTIMENT ANALYSIS

➢ Select the most important concepts of potential interest to jihadists from the list of the most frequently occurring terms

➢ Concept expansion with spelling variants and synonyms
  ➢ Christianity → Jesus, cross, Christian, Bible

➢ Sentiment analysis on retrieved content

Concepts: America, Christianity, Islam, Israel, Judaism, Mubarak, Palestine, Al-Qaeda
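Concept expansion can be pictured as a mapping from each concept to its variants, used to tag the comments that mention it. In this sketch only the Christianity variants come from the slide; the America entry and the function name are illustrative assumptions.

```python
import re

# Concept -> spelling variants and synonyms (lowercase).
# Only the Christianity variants are taken from the slide; the rest is illustrative.
CONCEPT_VARIANTS = {
    "Christianity": ["christianity", "jesus", "cross", "christian", "bible"],
    "America": ["america", "american", "usa"],
}


def concepts_in(text):
    """Return the set of concepts with at least one variant appearing as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        concept
        for concept, variants in CONCEPT_VARIANTS.items()
        if words & set(variants)
    }
```

For example, `concepts_in("He read the Bible in the USA")` tags the comment with both Christianity and America, so its sentiment scores can later be attributed to each concept.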

SENTIMENT ANALYSIS

➢ APPROACH: use a dictionary-based polarity scoring method to assign positivity and negativity scores to comments and user profiles
  ➢ Use of a lexicon (e.g. SentiWordNet)

➢ Calculation of positive and negative scores for each concept:
  ➢ Positive and negative scores are calculated separately
  ➢ Each score is given by the mean term orientation in a document
  ➢ For a set of documents, the positivity/negativity score is defined as the mean positivity/negativity score of the documents
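The scoring rule above can be sketched as follows. The mini-lexicon and its values are invented for the example (a real run would load a lexicon such as SentiWordNet), and averaging over all terms of a document, including those absent from the lexicon, is an assumption.

```python
import re
from statistics import mean

# Hypothetical mini-lexicon: term -> (positivity, negativity), each in [0, 1].
LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "hate": (0.0, 0.75),
}


def document_scores(text):
    """Mean positivity and mean negativity over the terms of one document."""
    terms = re.findall(r"[a-z]+", text.lower())
    if not terms:
        return 0.0, 0.0
    # Terms missing from the lexicon contribute 0 to both scores.
    pos = mean(LEXICON.get(t, (0.0, 0.0))[0] for t in terms)
    neg = mean(LEXICON.get(t, (0.0, 0.0))[1] for t in terms)
    return pos, neg


def collection_scores(documents):
    """Mean of the per-document positivity and negativity scores."""
    scores = [document_scores(d) for d in documents]
    return mean(s[0] for s in scores), mean(s[1] for s in scores)
```

Keeping positivity and negativity as two separate numbers, rather than a single signed score, matches the requirement that the two be calculated separately.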

SENTIMENT ANALYSIS - EXAMPLE

[Figure panels: Males, Females]

SOCIAL NETWORK ANALYSIS

➢ SOCIAL NETWORK ANALYSIS: the process of investigating social structures through the use of network and graph theory

➢ Very useful if applied to dark networks

➢ The network is represented by a simple undirected graph using the adjacency matrix A

$$A_{i,j} = \begin{cases} 1, & \text{if } i \text{ and } j \text{ are connected} \\ 0, & \text{otherwise} \end{cases}$$
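As a small illustration of this representation, the matrix of a simple undirected graph can be built from an edge list. The four-node network below is made up for the example.

```python
def adjacency_matrix(n, edges):
    """Symmetric 0/1 matrix of a simple undirected graph with nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i != j:  # a simple graph has no self-loops
            A[i][j] = 1
            A[j][i] = 1
    return A


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # hypothetical links
A = adjacency_matrix(4, edges)
degrees = [sum(row) for row in A]  # row sums give each node's number of links
```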

SOCIAL NETWORK ANALYSIS

➢ Node: individual terrorist

➢ Edge: link between terrorists

➢ Size of the node: node importance in the network

SOCIAL NETWORK ANALYSIS - SDD

➢ Semi-Discrete Decomposition

➢ For the original data matrix A we can compute a reduced rank-k approximation $A_k = X_k D_k Y_k^T$, where k is much smaller than the original rank of A. This makes it well suited to very large networks

➢ It greatly reduces memory consumption, which speeds up computation and therefore eases the interpretation of the results

➢ Tests on terrorist networks show good results:
  ➢ Reduced loss of information
  ➢ Low reduction ratio → low rate of corrupted data

SOCIAL NETWORK ANALYSIS - SDD

[Figure panels: k = 5, k = 20]

SOCIAL NETWORK ANALYSIS

➢ Cohesion Analysis (CA)
  ➢ Equivalent to the analysis of node connectivity in a network

➢ Role Analysis (RA)
  ➢ Finds the main roles in the network

➢ Power Analysis (PA)
  ➢ Identifies the most important person in the network

SOCIAL NETWORK ANALYSIS - CA

➢ Studies the number of connections in a graph in terms of forming a clique

➢ Subgroup models: clique, n-clique, k-plex, k-core
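Three of these subgroup models can be checked with a few lines of code, using their usual graph-theoretic definitions: in a clique every pair of members is linked, in a k-plex each member is linked to at least |S| - k other members, and the k-core is the largest node set in which everyone has at least k neighbours inside the set. The n-clique, which bounds geodesic distance instead, is omitted here. The small network is made up for the example.

```python
def is_clique(adj, group):
    """True if every pair of distinct members of `group` is linked."""
    return all(v in adj[u] for u in group for v in group if u != v)


def is_k_plex(adj, group, k):
    """True if each member is adjacent to at least |group| - k other members."""
    return all(len(adj[u] & group) >= len(group) - k for u in group)


def k_core(adj, k):
    """Largest node set in which every node has at least k neighbours inside the set."""
    core = set(adj)
    changed = True
    while changed:  # repeatedly peel off nodes with too few remaining neighbours
        changed = False
        for u in list(core):
            if len(adj[u] & core) < k:
                core.remove(u)
                changed = True
    return core


# Hypothetical network: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

Here the triangle is a clique (and therefore a 1-plex), while the whole network is only a 3-plex; the 2-core drops the pendant node.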

SOCIAL NETWORK ANALYSIS - CA

➢ Properties expected of a cohesive subgroup

➢ FAMILIARITY
  ➢ Each vertex is expected to have few strangers and many neighbors in the subgroup

➢ REACHABILITY
  ➢ Each subgroup is expected to have a low diameter to facilitate communication between members

➢ ROBUSTNESS
  ➢ A high connectivity makes it difficult to destroy the network by removing a few members

SOCIAL NETWORK ANALYSIS - RA

➢ Based on the network efficiency E(G)

➢ Measure that quantifies how efficiently the nodes of the network exchange information

➢ Evaluate the importance of each node by calculating the efficiency of the network without considering that node

➢ Use of other measures based on E(G)
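A common definition of network efficiency is the average of 1/d(i, j) over all ordered pairs of distinct nodes, where d is the geodesic distance and unreachable pairs contribute 0. The sketch below assumes that definition, and it models "without considering that node" by removing the node's links while keeping the same normalisation; both choices are assumptions, since the slides do not give the formula.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def efficiency(adj):
    """E(G): mean of 1/d(i, j) over ordered pairs i != j (0 if unreachable)."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += 1.0 / d
    return total / (n * (n - 1))


def deactivate(adj, node):
    """Copy of the graph in which `node` has lost all of its links."""
    return {u: (set() if u == node else nbrs - {node}) for u, nbrs in adj.items()}


def efficiency_drop(adj):
    """Importance of each node as the loss of efficiency when it is deactivated."""
    base = efficiency(adj)
    return {u: base - efficiency(deactivate(adj, u)) for u in adj}


# Hypothetical star network: node 0 is linked to 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
drop = efficiency_drop(star)
most_important = max(drop, key=drop.get)
```

In the star network the hub carries all communication, so deactivating it removes the whole efficiency of the network, while deactivating a leaf removes much less.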

SOCIAL NETWORK ANALYSIS - RA

SOCIAL NETWORK ANALYSIS - PA

➢ Centrality measures help in identifying the key player or the central person in the network

➢ Graph geodesic: the distance between two vertices in a graph is the number of edges in a shortest path connecting them

➢ Centrality measures: degree, betweenness, closeness, eigenvector
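Two of these measures are short enough to sketch directly. Degree centrality is taken here as the fraction of other nodes a node is linked to, and closeness as the number of reachable nodes divided by the sum of geodesic distances to them; these normalisations are common but are assumptions, and betweenness and eigenvector centrality are left out for brevity.

```python
from collections import deque


def geodesics(adj, source):
    """Number of edges on a shortest path from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}


def closeness_centrality(adj):
    result = {}
    for u in adj:
        dist = geodesics(adj, u)
        total = sum(dist.values())  # distance to itself is 0
        result[u] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical chain of five people: 0 - 1 - 2 - 3 - 4.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
degree = degree_centrality(chain)
closeness = closeness_centrality(chain)
key_player = max(closeness, key=closeness.get)
```

In the chain, nodes 1, 2 and 3 tie on degree, and closeness is what singles out the middle node as the central person.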

ANALYTICAL HIERARCHY PROCESS

➢ AHP is a 4-step process that combines multiple criteria after assigning a weight to each
  ➢ Effective technique in combination with SNA:
    ➢ Identifying key players
    ➢ Ranking of nodes

➢ AHP is a tool that relies on solid mathematical principles but also on psychological ones

ANALYTICAL HIERARCHY PROCESS - 1

➢ Build the decision hierarchy by considering the criteria and sub-criteria used to assess the alternatives that are considered to achieve the final goal

ANALYTICAL HIERARCHY PROCESS - 2

➢ Establish priorities among criteria, sub-criteria and alternatives using a scale
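As an illustration of this step, priorities can be derived from a pairwise comparison matrix. The sketch assumes the usual 1 to 9 AHP comparison scale, which the slide does not spell out, and uses the row geometric mean as a simple approximation of the priority vector. The three criteria and the judgments are invented for the example.

```python
import math


def priority_weights(pairwise):
    """Normalised row geometric means of a reciprocal pairwise comparison matrix."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments for three criteria (e.g. three centrality measures):
# entry [i][j] says how much more important criterion i is than criterion j,
# and entry [j][i] is its reciprocal.
pairwise = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
weights = priority_weights(pairwise)
```

The resulting weights sum to 1 and preserve the order of importance expressed in the judgments.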

ANALYTICAL HIERARCHY PROCESS - 3

➢ Use centrality metrics to estimate the weight of each criterion and sub-criterion, which results in a comparative ranking

ANALYTICAL HIERARCHY PROCESS - 4

➢ For m criteria and n alternatives, aggregate the criteria weights (m × 1 matrix) with the values of the alternatives according to each criterion (m × n matrix) in order to determine the final ranking of each alternative (n × 1 matrix)
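The aggregation amounts to a weighted sum: the score of alternative j is the sum over criteria i of weight i times the value of alternative j under criterion i. The numbers below are invented for the example, with m = 3 criteria and n = 3 alternatives.

```python
def aggregate(weights, values):
    """Combine m criteria weights with an m x n value matrix into n final scores."""
    n = len(values[0])
    return [sum(w * row[j] for w, row in zip(weights, values)) for j in range(n)]


weights = [0.5, 0.3, 0.2]  # hypothetical criteria weights (m = 3)
values = [                 # hypothetical values: one row per criterion,
    [0.6, 0.3, 0.1],       # one column per alternative (n = 3)
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
]
scores = aggregate(weights, values)
# Alternatives ordered from highest to lowest final score.
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
```

In the SNA setting the alternatives would be the nodes of the network and the final scores would give their ranking.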

ANALYTICAL HIERARCHY PROCESS - RESULT

VISUALIZATION TECHNIQUES

➢ Multiple and interactive visualization types are required to help analyze gathered data

[Figure panels: Interactive SN with timeline, Parallel Coordinate Graph]

CONCLUSIONS

➢ The sentiment analysis approach is imprecise:
  ➢ Users behave differently on different platforms
  ➢ Polarity orientations of terms do not correspond to subjectivity
  ➢ It needs a domain-specific lexicon → supervised learning
  ➢ Multilingual support is needed
  ➢ The sentiment analysis must be refined in order to detect the different stages of terrorists' lives (early stage, active clandestine phase, abandonment of the organisation)

➢ Coded languages cannot be analyzed and can lead to misinterpretation

➢ The difference between positive and negative sentiment must be marked in order to detect radicalisation

➢ Social Network Analysis:
  ➢ The choice of clique model in the CA is of great importance (too many or too few links)

CONCLUSIONS

➢ Sentiment analysis results have to be polarized in order to avoid misleading data

➢ SDD should be avoided unless there are computational constraints

➢ SNA shows good results, but relying on a single measure for each analysis phase may lead to misinterpretation
  ➢ SNA and AHP combined show better results

➢ Comparative cross-ideological research: possibility of extending the same methods to detect and analyze other types of radicalisation (e.g. neo-Nazis, animal rights groups with a history of violence)

➢ https://www.coursera.org/learn/understandingterror

REFERENCES

[1] N. Chaurasia and M. Dhakar, A Survey on Terrorist Network Mining: Current Trends and Opportunities, August 2012

[2] N. Memon, Erratum: Structural Analysis and Mathematical Methods for Destabilizing Terrorist Networks Using Investigative Data Mining, August 2006

[3] P. Choudhary, Ranking Terrorist Nodes of 26/11 Mumbai Attack using Analytical Hierarchy Process with Social Network Analysis, June 2016

[4] M. Cristiani et al., The spider-man behavior protocol: exploring both public and dark social networks for fake identity detection in terrorism..., September 2015

[5] A. Bermingham et al., Combining Social Network Analysis and Sentiment Analysis to Explore the Potential for Online Radicalisation, 2009

REFERENCES

[6] Y. Zhang et al., Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences, May 2010

[7] V. Snasel and A. Abraham, Link suggestions in terrorists networks using Semi Discrete Decomposition, September 2010

[8] R. Gao et al., Multi-view Display Coordinated Visualization Design for Crime Solving Analysis, November 2014

[9] Wasserman and Faust, Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences), November 1994
