Post on 16-Apr-2017
TERRORIST NETWORK MINING
CURRENT TRENDS AND OPPORTUNITIES
STEFANO ROMANAZZI
STEFAN.ROMANAZZI@GMAIL.COM
CDL INFORMATICA MAGISTRALE, A.A. 2016/2017
INTRODUCTION
➢ Concern about global security came into the limelight after the 9/11 attacks
➢ Terrorists can be:
  ➢ Fully clandestine
  ➢ Rebels
  ➢ Living a double life
1
DEEP WEB
➢ The majority of radical and extremist groups hide below the level of publicly mapped web sites
➢ The Deep Web is an area of the web identical to the Clearnet, except that its web addresses are not indexed by the web spiders of Google and other search engines
2
DARK WEB
➢ Many of those individuals interact in social networks that are totally hidden from the public web, the darknets
➢ This area of the web is known as the Dark Web
3
SPIDERS
➢ SPIDER
  A software program that traverses the World Wide Web information space by following hypertext links and retrieving web documents via the standard HTTP protocol
➢ Required features:
  ➢ Handle registration requirements
  ➢ Extract the desired information
  ➢ Filter the collected data
4
SPIDERS
➢ GOAL
  Provide Dark Web forum data in a timely manner
➢ PROBLEMS
  ➢ Accessibility → Human-assisted approach
  ➢ Collection update procedure → Incremental spidering
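One possible reading of incremental spidering is sketched below: on each crawl, only threads that are new or have a newer post than the one recorded last time are fetched again. The `Thread` record, its fields and the `seen` dictionary are hypothetical names chosen for illustration and are not taken from the cited systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    """Hypothetical summary of a forum thread as listed on an index page."""
    thread_id: str
    last_post_time: int  # e.g. Unix timestamp of the newest post


def threads_to_fetch(index_listing, seen):
    """Return the threads that are new or changed since the previous crawl.

    `seen` maps thread_id -> last_post_time recorded by the previous crawl.
    """
    return [t for t in index_listing
            if seen.get(t.thread_id, -1) < t.last_post_time]


def record_crawl(fetched, seen):
    """Remember the newest post time of every thread that was just fetched."""
    for t in fetched:
        seen[t.thread_id] = t.last_post_time


# State left by a previous crawl, and the listing found by the current one.
seen = {"t1": 100, "t2": 200}
listing = [Thread("t1", 100), Thread("t2", 250), Thread("t3", 50)]

todo = threads_to_fetch(listing, seen)   # t2 changed, t3 is new
record_crawl(todo, seen)
```

Running the selection a second time on the same listing returns nothing, which is the point of the incremental update.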
5
SYSTEM DESIGN OF A SPIDER PROGRAM
➢ Data acquisition
  ➢ Spidering
  ➢ Incremental spidering
➢ Data preparation
  ➢ HTML parsing
  ➢ Storage in a local database
➢ System functionalities
  ➢ Browsing and searching
  ➢ Statistics analysis
  ➢ Multilingual translation
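The data preparation step (HTML parsing followed by storage in a local database) could look roughly like the sketch below. It assumes a hypothetical forum layout in which every message sits in a `<div class="post">`, and uses an in-memory SQLite database with an illustrative `posts` table; a real collection would need a parser adapted to each forum.

```python
import sqlite3
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects the text of every <div class="post"> (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0    # > 0 while inside a post <div>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._depth:
                self._depth += 1          # nested <div> inside a post
            elif ("class", "post") in attrs:
                self._depth = 1           # entering a post

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:          # post closed: store normalised text
                self.posts.append(" ".join("".join(self._buf).split()))
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


page = ('<html><body><div class="post">First <b>message</b></div>'
        '<div class="ad">x</div><div class="post">Second message</div>'
        '</body></html>')
parser = PostParser()
parser.feed(page)
parser.close()

# Store the extracted posts in a local (here: in-memory) database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
db.executemany("INSERT INTO posts (body) VALUES (?)", [(p,) for p in parser.posts])
db.commit()
stored = [row[0] for row in db.execute("SELECT body FROM posts ORDER BY id")]
```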
6
THE GLOBAL TERRORISM DATABASE
➢ New methods to predict terrorist attacks and detect the main actors in a terrorist network can be tested using past data
➢ GTD: the biggest database of terrorist events around the world
7
DETECTING RADICALISATION
➢ Perform a sentiment analysis on the data gathered during the crawl
➢ Comments and user profiles must be indexed
  ➢ Stopword removal
  ➢ Dehyphenation
  ➢ Porter’s stemming algorithm
➢ Calculation of metrics about word usage
  ➢ Term Frequency (TF)
  ➢ Document Frequency (DF)
  ➢ User Frequency (UF)
➢ List of the most frequently occurring terms for all users
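A minimal sketch of the indexing step follows. It assumes that TF is the total number of occurrences of a term, DF the number of comments containing it, and UF the number of distinct users who used it; the slides do not give the exact definitions. The stopword list is a tiny illustrative one, dehyphenation is reduced to deleting hyphens, and Porter stemming is left out to keep the example short.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}  # illustrative only


def tokenize(text):
    """Lowercase, drop hyphens, split into words and remove stopwords."""
    text = text.lower().replace("-", "")
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOPWORDS]


def word_usage(comments):
    """comments: iterable of (user, text) pairs. Returns (TF, DF, UF) counters."""
    tf, df = Counter(), Counter()
    users_per_term = {}
    for user, text in comments:
        tokens = tokenize(text)
        tf.update(tokens)            # every occurrence counts
        df.update(set(tokens))       # each comment counts once per term
        for term in set(tokens):
            users_per_term.setdefault(term, set()).add(user)
    uf = Counter({term: len(users) for term, users in users_per_term.items()})
    return tf, df, uf


comments = [
    ("alice", "The net-work is the network"),
    ("alice", "A network of friends"),
    ("bob", "Friends and friends"),
]
tf, df, uf = word_usage(comments)
top_terms = [term for term, _ in tf.most_common(2)]
```

In this toy corpus "network" is frequent but used by a single user, while "friends" is used by both, which is the kind of difference UF is meant to expose.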
8
SENTIMENT ANALYSIS
➢ Select the most important concepts of potential interest to jihadists from the list of the most frequently occurring terms
➢ Concept expansion with spelling variants and synonyms
  ➢ Christianity → Jesus, cross, Christian, Bible
➢ Sentiment analysis on retrieved content
Concepts: America, Christianity, Islam, Israel, Judaism, Mubarak, Palestine, Al-Qaeda
9
SENTIMENT ANALYSIS
➢ APPROACH
  Use of a dictionary-based polarity scoring method to assign positivity and negativity scores to comments and user profiles
  ➢ Use of a lexicon (e.g. SentiWordNet)
➢ Calculation of positive and negative scores for each concept
  ➢ Positive and negative scores are calculated separately
  ➢ Each score is given by the mean term orientation in a document
  ➢ For a set of documents, the positivity/negativity score is defined as the mean positivity/negativity score of the documents
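The scoring rule can be sketched as follows. The lexicon here is a hypothetical four-word toy dictionary, not SentiWordNet, and the mean is taken over all tokens of a document (terms missing from the lexicon count as neutral), which is one possible interpretation of "mean term orientation".

```python
# Hypothetical toy lexicon: term -> (positivity, negativity), each in [0, 1].
LEXICON = {
    "good": (0.75, 0.0),
    "peace": (0.5, 0.0),
    "bad": (0.0, 0.75),
    "war": (0.0, 0.5),
}


def document_scores(tokens):
    """Mean positive and mean negative orientation over the tokens of one document."""
    if not tokens:
        return 0.0, 0.0
    pos = sum(LEXICON.get(t, (0.0, 0.0))[0] for t in tokens) / len(tokens)
    neg = sum(LEXICON.get(t, (0.0, 0.0))[1] for t in tokens) / len(tokens)
    return pos, neg


def collection_scores(documents):
    """Mean of the per-document scores; positive and negative are kept separate."""
    scores = [document_scores(d) for d in documents]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(q for _, q in scores) / n


docs = [["good", "peace"], ["bad", "war", "good", "people"]]
per_doc = [document_scores(d) for d in docs]
overall = collection_scores(docs)
```

Keeping the two scores separate, rather than subtracting them, preserves the information that a document can be strongly positive and strongly negative at once.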
10
SENTIMENT ANALYSIS - EXAMPLE
[Figure panels: Males, Females]
11
SOCIAL NETWORK ANALYSIS
➢ SOCIAL NETWORK ANALYSIS
  The process of investigating social structures through the use of network and graph theories
➢ Very useful if applied to dark networks
➢ The network is represented by a simple undirected graph using the adjacency matrix A
$$A_{i,j} = \begin{cases} 1, & \text{if } i \text{ and } j \text{ are connected} \\ 0, & \text{otherwise} \end{cases}$$
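As a small illustration of this representation, the sketch below builds the matrix A for a simple undirected graph from an edge list. The four-node example network and the function name are made up for the illustration.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of a simple undirected graph with nodes 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            continue              # simple graph: ignore self-loops
        a[i][j] = a[j][i] = 1     # undirected: the matrix is symmetric
    return a


# Hypothetical network: nodes 0, 1, 2 form a triangle, node 3 hangs off node 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = adjacency_matrix(4, edges)
degrees = [sum(row) for row in A]   # row sums give the number of links per node
```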
12
SOCIAL NETWORK ANALYSIS
Node
• Individual terrorist
Edge
• Link between terrorists
Size of the node
• Node importance in the network
13
SOCIAL NETWORK ANALYSIS - SDD
➢ Semi-Discrete Decomposition
➢ For the original data matrix A we can compute a reduced rank-k approximation $A_k = X_k D_k Y_k^T$, where k is much smaller than the original rank of A. This makes it well suited to dealing with huge networks
➢ It greatly reduces memory consumption and increases computation speed, which makes the results easier to interpret
➢ Tests on terrorist networks show good results
  ➢ Reduced loss of information
  ➢ Low ratio of reduction → low rate of corrupted data
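To make the decomposition more concrete, here is a rough pure-Python sketch of a greedy SDD in the usual alternating style, where the columns of X and Y only take values in {-1, 0, 1} and D holds one weight per term. It is an illustration under simplifying assumptions (fixed number of inner iterations, a simple starting vector), not the implementation used in the cited work, and the function names are invented here.

```python
def best_discrete(s):
    """Vector x in {-1, 0, 1}^n that maximises (x . s)^2 / (x . x)."""
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_j, best_val, running = 1, -1.0, 0.0
    for j, i in enumerate(order, start=1):
        running += abs(s[i])
        if running * running / j > best_val:
            best_val, best_j = running * running / j, j
    x = [0] * len(s)
    for i in order[:best_j]:
        x[i] = 1 if s[i] >= 0 else -1
    return x


def sdd(a, k, inner_iters=10):
    """Greedy SDD with at most k terms; returns (X, d, Y, residual)."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]                 # residual, starts as a copy of A
    xs, ds, ys = [], [], []
    for _ in range(k):
        # Start from the unit vector of the residual column with the largest norm.
        col = max(range(n), key=lambda j: sum(r[i][j] ** 2 for i in range(m)))
        if all(r[i][col] == 0 for i in range(m)):
            break                             # residual is zero, nothing left to fit
        y = [1 if j == col else 0 for j in range(n)]
        for _ in range(inner_iters):          # alternate between x and y
            x = best_discrete([sum(r[i][j] * y[j] for j in range(n)) for i in range(m)])
            y = best_discrete([sum(r[i][j] * x[i] for i in range(m)) for j in range(n)])
        xry = sum(x[i] * r[i][j] * y[j] for i in range(m) for j in range(n))
        d = xry / (sum(abs(v) for v in x) * sum(abs(v) for v in y))
        for i in range(m):                    # subtract the new term d * x * y^T
            for j in range(n):
                r[i][j] -= d * x[i] * y[j]
        xs.append(x)
        ds.append(d)
        ys.append(y)
    return xs, ds, ys, r


def frob2(mat):
    """Squared Frobenius norm, used here as the approximation error."""
    return sum(v * v for row in mat for v in row)


A = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
X2, d2, Y2, R2 = sdd(A, 2)
X4, d4, Y4, R4 = sdd(A, 4)
```

Each added term can only lower the residual error, and because X and Y hold only three possible values they can be stored far more compactly than floating-point factors, which is where the memory saving comes from.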
14
SOCIAL NETWORK ANALYSIS - SDD
[Figure panels: k = 5, k = 20]
15
SOCIAL NETWORK ANALYSIS
➢ Cohesion Analysis (CA)
  ➢ Equivalent to analyzing the node connectivity in a network
➢ Role Analysis (RA)
  ➢ Finds the main roles in the network
➢ Power Analysis (PA)
  ➢ Identifies the most important person in the network
16
SOCIAL NETWORK ANALYSIS - CA
➢ Measures how closely the connections in a graph come to forming a clique
[Figure panels: Clique, N-clique, K-plex, K-core]
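Three of the subgroup models named above can be checked with a few lines of code. The sketch assumes the usual textbook definitions: in a clique every pair is linked, in a k-plex every member is linked to at least |S| - k other members, and the k-core is the largest subgraph in which every node keeps at least k neighbors. The example graph is hypothetical.

```python
from itertools import combinations


def is_clique(adj, s):
    """True if every pair of nodes in s is directly connected."""
    return all(v in adj[u] for u, v in combinations(s, 2))


def is_k_plex(adj, s, k):
    """True if each node of s is adjacent to at least |s| - k nodes of s."""
    s = set(s)
    return all(len(adj[u] & s) >= len(s) - k for u in s)


def k_core(adj, k):
    """Nodes of the k-core, found by repeatedly peeling low-degree nodes."""
    alive = set(adj)
    changed = True
    while changed:
        weak = {u for u in alive if len(adj[u] & alive) < k}
        alive -= weak
        changed = bool(weak)
    return alive


# Hypothetical network: triangle 0-1-2 with node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

Relaxed models such as the k-plex accept groups that a strict clique test rejects; here the whole network is a 3-plex although it is not a clique.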
17
SOCIAL NETWORK ANALYSIS - CA
➢ Properties expected of a cohesive subgroup
➢ FAMILIARITY
  ➢ Each vertex is expected to have few strangers and many neighbors in the subgroup
➢ REACHABILITY
  ➢ Each subgroup is expected to have a low diameter to facilitate communication between members
➢ ROBUSTNESS
  ➢ High connectivity makes it difficult to destroy the network by removing a few members
18
SOCIAL NETWORK ANALYSIS - RA
➢ Based on the network efficiency E(G)
  ➢ A measure that quantifies how efficiently the nodes of the network exchange information
➢ Evaluate the importance of each node by calculating the efficiency of the network without that node
➢ Use of other measures based on E(G)
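The node-removal idea can be sketched as below, assuming the common definition of efficiency as the average of 1/d(i, j) over all ordered pairs of distinct nodes, with unreachable pairs contributing 0. The slides do not spell out the formula, so this definition, the function names and the star-shaped example are assumptions.

```python
from collections import deque


def distances_from(adj, src):
    """Shortest-path lengths (in edges) from src, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def efficiency(adj):
    """Mean of 1/d(i, j) over ordered pairs i != j; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        total += sum(1.0 / d for v, d in distances_from(adj, u).items() if v != u)
    return total / (n * (n - 1))


def without(adj, node):
    """Copy of the graph with `node` and its links removed."""
    return {u: nbrs - {node} for u, nbrs in adj.items() if u != node}


def efficiency_drop(adj):
    """Importance of each node as the loss of efficiency caused by removing it."""
    base = efficiency(adj)
    return {u: base - efficiency(without(adj, u)) for u in adj}


# Hypothetical star network: node 0 is connected to 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
drops = efficiency_drop(star)
most_important = max(drops, key=drops.get)
```

Removing the hub disconnects every remaining pair, so its drop equals the whole efficiency of the network, while removing a leaf costs nothing.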
19
SOCIAL NETWORK ANALYSIS - RA
20
SOCIAL NETWORK ANALYSIS - PA
➢ Centrality measures help in identifying the key player or the central person in the network
➢ Graph geodesic
  The distance between two vertices in a graph is the number of edges in a shortest path connecting them
[Figure panels: Degree, Betweenness, Closeness, Eigenvector]
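Two of these measures are simple enough to sketch directly. The code assumes a connected, undirected graph and the standard normalised definitions: degree centrality is the share of other nodes a node is linked to, and closeness is the number of other nodes divided by the sum of geodesic distances to them. Betweenness and eigenvector centrality are left out for brevity; the four-node chain is a made-up example.

```python
from collections import deque


def bfs_distances(adj, src):
    """Geodesic distance (number of edges) from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def closeness_centrality(adj):
    """Reachable nodes divided by the total geodesic distance to them."""
    result = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        result[u] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical chain of four people: 0 - 1 - 2 - 3.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
deg = degree_centrality(chain)
clo = closeness_centrality(chain)
```

Both measures rank the two middle nodes above the end nodes, as expected for a chain.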
21
ANALYTICAL HIERARCHY PROCESS
➢ AHP is a 4-step process that combines multiple criteria after a weight assignment
➢ Effective technique in combination with SNA
  ➢ Identifying key players
  ➢ Ranking of nodes
➢ AHP is a tool that relies on solid mathematical principles as well as psychological ones
22
ANALYTICAL HIERARCHY PROCESS - 1
➢ Build the decision hierarchy by considering the various criteria and sub-criteria used to assess the alternatives considered for achieving the final goal
23
ANALYTICAL HIERARCHY PROCESS - 2
➢ Establish priorities among criteria, sub-criteria and alternatives using a scale
24
ANALYTICAL HIERARCHY PROCESS - 3
➢ Use of centrality metrics to estimate the weight of each criterion and sub-criterion, which results in a comparative ranking
25
ANALYTICAL HIERARCHY PROCESS - 4
➢ For m criteria and n alternatives, aggregate the criteria weights (m × 1 matrix) with the values of the alternatives according to each criterion (m × n matrix) in order to determine the final ranking of each alternative (n × 1 matrix)
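The aggregation in this last step amounts to a weighted sum per alternative. The sketch below assumes the weights and the per-criterion values are already normalised; the criteria, node names and numbers are invented for illustration and do not come from the cited study.

```python
def aggregate(weights, values):
    """Combine m criteria weights with an m x n value matrix into n scores."""
    m, n = len(weights), len(values[0])
    if len(values) != m:
        raise ValueError("one row of values is needed per criterion")
    return [sum(weights[i] * values[i][j] for i in range(m)) for j in range(n)]


def ranking(scores, names):
    """Alternative names ordered from the highest to the lowest score."""
    return [name for _, name in sorted(zip(scores, names), key=lambda p: -p[0])]


# Hypothetical example: 3 criteria (centrality measures) and 3 nodes.
weights = [0.5, 0.25, 0.25]          # degree, betweenness, closeness
values = [
    [0.25, 0.5, 0.25],               # degree scores of nodes A, B, C
    [0.0, 0.75, 0.25],               # betweenness scores
    [0.25, 0.5, 0.25],               # closeness scores
]
scores = aggregate(weights, values)
order = ranking(scores, ["A", "B", "C"])
```

Because several measures contribute to each score, a node that stands out on one measure only does not automatically come first.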
26
ANALYTICAL HIERARCHY PROCESS - RESULT
27
VISUALIZATION TECHNIQUES
➢ Multiple and interactive visualization types are required to help analyze gathered data
[Figure panels: Interactive SN with timeline, Parallel Coordinate Graph]
28
CONCLUSIONS
➢ The sentiment analysis approach is imprecise
  ➢ Users behave differently on different platforms
  ➢ Polarity orientations of terms do not correspond to subjectivity
  ➢ It needs a domain-specific lexicon → supervised learning
  ➢ Multilingual support is needed
  ➢ The sentiment analysis must be refined in order to detect the different stages of terrorists’ lives (early stage, active clandestine phase, organisation abandonment)
  ➢ Coded languages cannot be analyzed and can lead to misinterpretation
  ➢ The difference between positive and negative sentiment must be marked in order to detect radicalisation
➢ Social Network Analysis
  ➢ The choice of clique model in the CA is of great importance (too many or too few links)
29
CONCLUSIONS
➢ Sentiment analysis results have to be polarized in order to avoid misleading data
➢ SDD must be avoided unless there are computational issues
➢ SNA shows good results, but choosing a single measure for each analysis phase may lead to misinterpretation
  ➢ SNA and AHP combined show better results
➢ Comparative cross-ideological research
  Possibility of extending the same methods to detect and analyze other types of radicalisation (e.g. neo-Nazis, animal rights groups with a history of violence)
➢ https://www.coursera.org/learn/understandingterror
30
REFERENCES
[1] N. Chaurasia and M. Dhakar, A Survey on Terrorist Network Mining: Current Trends and Opportunities, August 2012
[2] N. Memon, Erratum: Structural Analysis and Mathematical Methods for Destabilizing Terrorist Networks Using Investigative Data Mining, August 2006
[3] P. Choudhary, Ranking Terrorist Nodes of 26/11 Mumbai Attack using Analytical Hierarchy Process with Social Network Analysis, June 2016
[4] M. Cristiani et al., The spider-man behavior protocol: exploring both public and dark social networks for fake identity detection in terrorism..., September 2015
[5] A. Bermingham et al., Combining Social Network Analysis and Sentiment Analysis to Explore the Potential for Online Radicalisation, 2009
31
REFERENCES
[6] Y. Zhang et al., Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences, May 2010
[7] V. Snasel and A. Abraham, Link suggestions in terrorists networks using Semi Discrete Decomposition, September 2010
[8] R. Gao et al., Multi-view Display Coordinated Visualization Design for Crime Solving Analysis, November 2014
[9] Wasserman and Faust, Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences), November 1994
32