Subscriber Churn Prediction Model using Social Network Analysis In Telecommunication Industry...

Post on 16-Apr-2017


Subscriber Churn Prediction Model using Social Network Analysis

In Telecommunication Industry

Chettapong Punyachonkool

Dr. Arnond Sakworawich

The First NIDA Business Analytics and Data Sciences Contest/Conference, September 2, 2016

Chettapong Punyachonkool, Data Engineer, Business Intelligence Strategy

The Siam Commercial Bank

chettapongp@gmail.com
www.linkedin.com/in/chettapong-punyachonkool

Business Analytics and Research

Applied Statistics, NIDA

Topics

› Social Network Analysis basic concepts
› Social Network Analysis with R
› Visualizing Social Networks
› Using SNA to predict subscriber churn in telco

Social Network Analysis Basic Concepts

Social Network

Social Network: A social structure composed of individuals (or organizations) interconnected by one or more specific types of interdependencies such as friendship, kinship, financial exchanges, communication exchanges, etc.

Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu

Social Network Analysis

Social Network Analysis: The application of graph theory to understand, categorize and quantify relationships in a social network.

Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu

Why should you care about SNA? Traditional marketing practices are becoming obsolete.

• Test and control group methodologies no longer work as intended.
• Information exchange between individuals within an online social network is extremely high.
• It is difficult to keep the control group “pure”.
• We need to understand behaviour across and within communities rather than focusing just on individuals.
• Leverage (and protect against) the high velocity of information exchange within online social networks.

Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu

Why should you care about SNA?

Customers are sceptical: if you want to sell your products to your customers, convince their friends.

Use social network analysis to understand more about your customers and their communities.

Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu

Customers in the Role of Influencer

• An influential user adopts a product or behaviour.
• The influential user tells (and influences) his or her immediate contacts within the community.
• These immediate contacts tell their contacts.

It is important...

• To identify these people.
• To influence these people.
• To monitor the behaviour of these people.

Source: Wael Elrifai (2013), Social Network Analysis: Practical Uses and Implementation. Peak Consulting, http://www.peakconsulting.eu

Social Network Analysis Application

Source: http://www.martingrandjean.ch/connected-world-air-traffic-network/


Social Network Analysis Application

Source: https://linkurious.icij.org/widget/4c040978

Social Network Analysis Application

Source: Apichart Wisitkitchakarn (2013), Risk Analysis of East Asian Stock Markets, The Capital Market Research Institute, The Stock Exchange of Thailand.

Social Network Analysis Application

Source: Valdis Krebs (2001). Connecting the Dots: Tracking Two Identified Terrorists. http://orgnet.com/tnet.html

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Social Network Analysis (SNA) including a tutorial on concepts and methods

Social Media, Dr. Giorgos Cheliotis (gcheliotis@nus.edu.sg), Communications and New Media, National University of Singapore

Practical applications

A very early example of network analysis comes from the city of Königsberg (now Kaliningrad). The famous mathematician Leonhard Euler used a graph to prove that there is no path that crosses each of the city’s bridges exactly once (Newman et al., 2006).

SNA has its origins both in social science and in the broader fields of network analysis and graph theory.

Network analysis concerns itself with the formulation and solution of problems that have a network structure; such structure is usually captured in a graph (see the circled structure to the right).

Graph theory provides a set of abstract concepts and methods for the analysis of graphs. These, in combination with other analytical tools and with methods developed specifically for the visualization and analysis of social (and other) networks, form the basis of what we call SNA methods.

But SNA is not just a methodology; it is a unique perspective on how society functions. Instead of focusing on individuals and their attributes, or on macroscopic social structures, it centers on relations between individuals, groups, or social institutions.

Basic Concepts

• Networks: how to represent various social networks
• Tie Strength: how to identify strong/weak ties in the network
• Key Players: how to identify key/central nodes in the network
• Cohesion: measures of overall network structure

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Representing relations as networks

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

[Figure: Anne, Jim, Mary and John drawn as vertices 1 to 4 of a graph]

Can we study their interactions as a network?

Communication

Anne: Jim, tell the Murrays they’re invited

Jim: Mary, you and your dad should come for dinner!

Jim: Mr. Murray, you should both come for dinner

Anne: Mary, did Jim tell you about the dinner? You must come.

John: Mary, are you hungry?

Network terminology

[Figure: a graph with one vertex and one edge labelled]

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Vertex term    Edge term          Field
points         lines
vertices       edges, arcs        math
nodes          links              computer science
sites          bonds              physics
actors         ties, relations    sociology

Entering data on a directed graph

[Figure: the example as a directed graph on vertices 1 to 4]

Edge list
Vertex  Vertex
1       2
1       3
2       3
2       4
4       3

Adjacency matrix
Vertex  1  2  3  4
1       -  1  1  0
2       0  -  1  1
3       0  0  -  0
4       0  0  1  -

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
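The two representations carry the same information, so one can be built from the other. A minimal base-R sketch (no packages needed) that turns the directed edge list into its adjacency matrix and then derives the symmetric matrix used for the undirected case. The numbering 1 = Anne, 2 = Jim, 3 = Mary, 4 = John is an assumption read off the figure, and John's message to Mary is entered as the edge 4 -> 3.

```r
# Directed edge list (assumed numbering: 1 = Anne, 2 = Jim, 3 = Mary, 4 = John)
edges <- matrix(c(1, 2,
                  1, 3,
                  2, 3,
                  2, 4,
                  4, 3), ncol = 2, byrow = TRUE)
n <- 4

# Directed adjacency matrix: A[i, j] = 1 when there is an edge i -> j
A <- matrix(0, n, n, dimnames = list(1:n, 1:n))
A[edges] <- 1          # a two-column matrix indexes (row, column) pairs

# Undirected version: a tie exists if there is an edge in either direction
A_undirected <- ((A + t(A)) > 0) * 1

print(A)
print(A_undirected)
```

The undirected matrix is symmetric by construction, which is exactly the change described on the next slide.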


Representing an undirected graph

[Figure: the same graph drawn without arrows (undirected)]

Edge list remains the same
Vertex  Vertex
1       2
1       3
2       3
2       4
3       4

Adjacency matrix becomes symmetric
Vertex  1  2  3  4
1       -  1  1  0
2       1  -  1  1
3       1  1  -  1
4       0  1  1  -

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Basic Concepts: Tie Strength

How to identify strong/weak ties in the network

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Adding weights to edges

[Figure: the undirected example graph with edge weights 30, 5, 22, 2 and 37]

Edge list: add column of weights
Vertex  Vertex  Weight
1       2       30
1       3       5
2       3       22
2       4       2
3       4       37

Adjacency matrix: add weights instead of 1
Vertex  1   2   3   4
1       -   30  5   0
2       30  -   22  2
3       5   22  -   37
4       0   2   37  -

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
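The same construction works with weights: store the weight where the unweighted matrix had a 1. A minimal base-R sketch using the five weighted edges from the slide; the final rowSums() step, the weighted degree (often called node strength), is an extra illustration that the slide does not show.

```r
# Weighted, undirected edge list from the slide
wedges <- data.frame(from   = c(1, 1, 2, 2, 3),
                     to     = c(2, 3, 3, 4, 4),
                     weight = c(30, 5, 22, 2, 37))
n <- 4

# Weighted adjacency matrix: store the weight instead of 1
W <- matrix(0, n, n)
W[cbind(wedges$from, wedges$to)] <- wedges$weight
W[cbind(wedges$to, wedges$from)] <- wedges$weight   # mirror: the graph is undirected

# Weighted degree ("strength"): total weight of the ties at each vertex
strength <- rowSums(W)
print(W)
print(strength)
```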

Adding weights to edges

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

[Figure: the Anne, Jim, Mary and John communication network as an undirected graph, with edge weights added]

Edge weights as relationship strength

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

• Edges can represent interactions, flows of information or goods, similarities/affiliations, or social relations
• Specifically for social relations, a ‘proxy’ for the strength of a tie can be:
  (a) the frequency of interaction (communication) or the amount of flow (exchange)
  (b) reciprocity in interaction or flow
  (c) the type of interaction or flow between the two parties (e.g., intimate or not)
  (d) other attributes of the nodes or ties (e.g., kin relationships)
  (e) the structure of the nodes’ neighborhood (e.g., many mutual ‘friends’)
• Surveys and interviews allow us to establish the existence of mutual or one-sided strength/affection with greater certainty, but the proxies above are also useful
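Proxies (a) and (b) can be read straight off a call log. The base-R sketch below is hypothetical: the call records are made up, and counting each call as one unit of interaction is an assumption. It counts calls per tie in either direction and flags whether calls flowed both ways.

```r
# Hypothetical call log: one row per call, from caller to callee
calls <- data.frame(from = c("A", "A", "B", "A", "A", "A", "D"),
                    to   = c("B", "B", "A", "C", "C", "C", "A"),
                    stringsAsFactors = FALSE)

# Key each call by its unordered pair, so A->B and B->A belong to the same tie
calls$pair <- ifelse(calls$from < calls$to,
                     paste(calls$from, calls$to, sep = "-"),
                     paste(calls$to, calls$from, sep = "-"))

# (a) frequency of interaction: calls per tie, in either direction
freq <- table(calls$pair)
ties <- data.frame(pair = names(freq), frequency = as.integer(freq),
                   stringsAsFactors = FALSE)

# (b) reciprocity: were there calls in both directions?
directed <- unique(paste(calls$from, calls$to, sep = "-"))
ties$reciprocated <- vapply(strsplit(ties$pair, "-"), function(p) {
  paste(p[1], p[2], sep = "-") %in% directed &&
    paste(p[2], p[1], sep = "-") %in% directed
}, logical(1))
print(ties)
```

How the two proxies are combined into a single weight is a modelling choice the slide leaves open.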

Basic Concepts: Key Players

How to identify key/central nodes in the network

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Interpretation of measures

Centrality measure   Interpretation in social networks
Degree               How many people can this person reach directly?
Betweenness          How likely is this person to be the most direct route between two people in the network?
Closeness            How fast can this person reach everyone in the network?
Eigenvector          How well is this person connected to other well-connected people?

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Degree centrality

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

• A node’s (in-) or (out-)degree is the number of links that lead into or out of the node
• In an undirected graph they are of course identical
• Often used as a measure of a node’s degree of connectedness and hence also influence and/or popularity
• Useful in assessing which nodes are central with respect to spreading information and influencing others in their immediate ‘neighborhood’

[Figure: hypothetical graph with 7 nodes, each labelled with its degree: node 1 = 2, node 2 = 3, node 3 = 4, node 4 = 1, node 5 = 4, nodes 6 and 7 = 1]

Nodes 3 and 5 have the highest degree (4)
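Degree can be checked with a few lines of base R. The edge list below is an assumption: it is reconstructed from the degree, betweenness and closeness values printed on these slides (1-2, 1-3, 2-3, 2-5, 3-4, 3-5, 5-6, 5-7), since the figure itself did not survive extraction.

```r
# Hypothetical 7-node graph; edge list reconstructed from the slide values (assumption)
edges <- matrix(c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7), ncol = 2, byrow = TRUE)
n <- 7

A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1    # undirected: mirror every edge

deg <- rowSums(A)       # degree = number of links at each node
print(deg)
print(which(deg == max(deg)))   # nodes with the highest degree
```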

Betweenness centrality

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

• For a given node v, calculate the number of shortest paths between nodes i and j (both different from v) that pass through v, and divide by the number of all shortest paths between nodes i and j
• Sum the above values for all node pairs i, j
• Sometimes normalized such that the highest value is 1 or that the sum of all betweenness centralities in the network is 1
• Shows which nodes are more likely to be in communication paths between other nodes
• Also useful in determining points where the network would break apart (think of who would be cut off if node 3 or 5 disappeared)

[Figure: the hypothetical graph with each node labelled with its betweenness: node 1 = 0, node 2 = 1.5, node 3 = 6.5, node 4 = 0, node 5 = 9, nodes 6 and 7 = 0]

Node 5 has higher betweenness centrality than 3
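The pair-by-pair recipe above is usually computed with Brandes' algorithm, which runs one breadth-first search per node instead of enumerating all paths. A base-R sketch for unweighted, undirected graphs, using the same reconstructed edge list (an assumption, as before); the last two lines show the two normalizations mentioned on the slide.

```r
# Hypothetical 7-node graph; edge list reconstructed from the slide values (assumption)
edges <- matrix(c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7), ncol = 2, byrow = TRUE)
n <- 7

# Adjacency lists for an undirected graph
adj <- vector("list", n)
for (k in seq_len(nrow(edges))) {
  a <- edges[k, 1]; b <- edges[k, 2]
  adj[[a]] <- c(adj[[a]], b)
  adj[[b]] <- c(adj[[b]], a)
}

# Brandes' algorithm for unweighted graphs
betweenness_brandes <- function(adj) {
  n <- length(adj)
  cb <- numeric(n)
  for (s in seq_len(n)) {
    sigma <- numeric(n); sigma[s] <- 1   # number of shortest paths from s
    dist <- rep(-1, n); dist[s] <- 0     # -1 marks "not reached yet"
    preds <- vector("list", n)           # predecessors on shortest paths
    visited <- integer(0)                # vertices in order of discovery
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      visited <- c(visited, v)
      for (w in adj[[v]]) {
        if (dist[w] < 0) {               # w reached for the first time
          dist[w] <- dist[v] + 1
          queue <- c(queue, w)
        }
        if (dist[w] == dist[v] + 1) {    # v lies on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    # Accumulate dependencies, farthest vertices first
    delta <- numeric(n)
    for (w in rev(visited)) {
      for (v in preds[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb / 2  # undirected: every pair was counted once from each endpoint
}

bc <- betweenness_brandes(adj)
print(bc)
print(bc / max(bc))   # normalized so the highest value is 1
print(bc / sum(bc))   # normalized so the values sum to 1
```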

Closeness centrality

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

• Calculate the mean length of all shortest paths from a node to all other nodes in the network (i.e. how many hops on average it takes to reach every other node)
• Take the reciprocal of the above value so that higher values are ‘better’ (indicate higher closeness), as in other measures of centrality
• It is a measure of reach, i.e. the speed with which information can reach other nodes from a given starting node

[Figure: the hypothetical graph with each node labelled with its closeness: node 1 = 0.5, node 2 = 0.67, node 3 = 0.75, node 4 = 0.46, node 5 = 0.75, nodes 6 and 7 = 0.46]

Nodes 3 and 5 have the highest (i.e. best) closeness, while node 2 fares almost as well

Note: Sometimes closeness is calculated without taking the reciprocal of the mean shortest path length. Then lower values are ‘better’.
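A base-R sketch of both variants on the reconstructed 7-node graph (edge list assumed as before): breadth-first search gives hop distances, the mean distance is the "lower is better" form, and its reciprocal is the closeness shown on the slide. For comparison, igraph's closeness() returns the reciprocal of the sum of distances unless normalized = TRUE is set, so its raw values differ by a factor of n - 1.

```r
# Hypothetical 7-node graph; edge list reconstructed from the slide values (assumption)
edges <- matrix(c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7), ncol = 2, byrow = TRUE)
n <- 7
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Breadth-first search: number of hops from s to every node
bfs_dist <- function(A, s) {
  d <- rep(Inf, nrow(A))
  d[s] <- 0
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(A[v, ] == 1)) {
      if (is.infinite(d[w])) {       # first time w is reached
        d[w] <- d[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  d
}

D <- t(sapply(seq_len(n), function(s) bfs_dist(A, s)))   # D[s, t] = hops from s to t
mean_dist <- rowSums(D) / (n - 1)   # mean shortest-path length (lower = closer)
closeness <- 1 / mean_dist          # reciprocal (higher = closer)
print(round(mean_dist, 2))
print(round(closeness, 2))
```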

Eigenvector centrality

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

• A node’s eigenvector centrality is proportional to the sum of the eigenvector centralities of all nodes directly connected to it
• In other words, a node with a high eigenvector centrality is connected to other nodes with high eigenvector centrality
• This is similar to how Google ranks web pages: links from highly linked-to pages count more
• Useful in determining who is connected to the most connected nodes

[Figure: the hypothetical graph with each node labelled with its eigenvector centrality: node 1 = 0.36, node 2 = 0.49, node 3 = 0.54, node 4 = 0.19, node 5 = 0.49, nodes 6 and 7 = 0.17]

Node 3 has the highest eigenvector centrality, closely followed by 2 and 5.

Note: The term ‘eigenvector’ comes from mathematics (matrix algebra), but it is not necessary for understanding how to interpret this measure.
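The "proportional to the sum of the neighbours' scores" definition is exactly the leading eigenvector of the adjacency matrix, so base R's eigen() is enough for a sketch (reconstructed edge list assumed as before). The slide's values appear to be scaled to unit length, which is what eigen() returns; igraph's eigen_centrality() instead rescales by default so that the top node scores 1.

```r
# Hypothetical 7-node graph; edge list reconstructed from the slide values (assumption)
edges <- matrix(c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7), ncol = 2, byrow = TRUE)
n <- 7
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Eigenvector centrality: A v = lambda v for the largest eigenvalue lambda
ev <- eigen(A, symmetric = TRUE)    # eigenvalues are returned in decreasing order
v <- abs(ev$vectors[, 1])           # the sign of an eigenvector is arbitrary
print(round(v, 2))
print(which.max(v))
```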

Network centralities

[Figure: the hypothetical graph shown four times, labelled with degree, betweenness, closeness and eigenvector centrality]

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Closeness centrality

Source: www.activenetwork.net

Betweenness centrality

Source: www.activenetwork.net

Eigenvector centrality

Source: www.activenetwork.net

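Interpretation of measures

Centrality: description; meaning of the value

Degree: indicates how much information flows through the node.
– Low value: little information flows through it; high value: a lot of information flows through it.

Closeness: indicates how central the node is within the network; used to measure how quickly information spreads from the node through the network.
– Low value: very important, information spreads quickly; high value: less important, information spreads slowly. (Here closeness is taken as the mean shortest-path length without the reciprocal, so lower values are better.)

Betweenness: indicates the node's level of importance, or the size of its role, within the network.
– Low value: less important; high value: more important.

Eigenvector: measures the node's level of influence within the network.
– Low value: little influence; high value: great influence.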

Basic Concepts: Cohesion

How to characterize a network’s structure

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

Reciprocity (degree of)

[Figure: directed graph on vertices 1 to 4]

• The ratio of the number of relations which are reciprocated (i.e. there is an edge in both directions) over the total number of relations in the network
• …where two vertices are said to be related if there is at least one edge between them
• In the example graph this would be 2/5 = 0.4 (whether this is considered high or low depends on the context)
• A useful indicator of the degree of mutuality and reciprocal exchange in a network, which relate to social cohesion
• Only makes sense in directed graphs

Reciprocity for network = 0.4

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
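The figure with the actual arcs did not survive, so the base-R sketch below uses a hypothetical directed graph built to match the slide's count: five related pairs, of which two (1-2 and 3-4) are mutual. igraph's reciprocity() computes this pair-based ratio with mode = "ratio"; its default mode counts reciprocated edges instead, which gives 4/7 for the same graph.

```r
# Hypothetical directed graph on 4 vertices (assumption: the slide's arcs are not recoverable)
edges <- matrix(c(1,2, 2,1,   # mutual pair 1-2
                  1,3,
                  2,3,
                  2,4,
                  3,4, 4,3),  # mutual pair 3-4
                ncol = 2, byrow = TRUE)
A <- matrix(0, 4, 4)
A[edges] <- 1

related <- (A + t(A)) > 0     # at least one edge between the two vertices
mutual  <- (A * t(A)) > 0     # edges in both directions
upper   <- upper.tri(A)       # count each unordered pair once

reciprocity_pairs <- sum(mutual[upper]) / sum(related[upper])   # related pairs
reciprocity_edges <- sum(A * t(A)) / sum(A)                     # directed edges
print(c(pairs = reciprocity_pairs, edges = reciprocity_edges))
```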

Density

[Figure: the 4-vertex example drawn as an undirected graph (density = 5/6 = 0.83) and as a directed graph (density = 5/12 = 0.42); legend: edge present in network, possible but not present]

• A network’s density is the ratio of the number of edges in the network over the total number of possible edges between all pairs of nodes (which is n(n-1)/2, where n is the number of vertices, for an undirected graph)
• In the example network, density = 5/6 = 0.83 (i.e. it is a fairly dense network; the opposite would be a sparse network)
• It is a common measure of how well connected a network is (in other words, how closely knit it is); a perfectly connected network is called a clique and has density = 1
• A directed graph with the same number of edges will have half the density of its undirected equivalent, because there are twice as many possible edges, i.e. n(n-1)
• Density is useful in comparing networks against each other, or in doing the same for different regions within a single network

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
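A base-R sketch that reproduces both numbers from the 4-vertex example (edge list as on the earlier slides), counting the edges from the adjacency matrix rather than by hand.

```r
# Undirected 4-vertex example: edges 1-2, 1-3, 2-3, 2-4, 3-4
edges <- matrix(c(1,2, 1,3, 2,3, 2,4, 3,4), ncol = 2, byrow = TRUE)
n <- 4
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

m <- sum(A[upper.tri(A)])                      # number of undirected edges
density_undirected <- m / (n * (n - 1) / 2)    # possible edges: n(n-1)/2

# The same 5 edges read as one-way arcs: n(n-1) arcs are possible
density_directed <- m / (n * (n - 1))
print(round(c(undirected = density_undirected, directed = density_directed), 2))
```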

Clustering

[Figure: the hypothetical 7-node graph divided into Cluster A and Cluster B, each node labelled with its clustering coefficient: node 1 = 1, node 2 = 0.67, node 3 = 0.33, node 5 = 0.17, nodes 4, 6 and 7 = N/a]

• A node’s clustering coefficient is the number of closed triplets in the node’s neighborhood over the total number of triplets in the neighborhood. It is also known as transitivity.

$$\text{clustering coefficient} = \frac{\text{closed triplets (CT)}}{\text{total number of triplets (TT)}}$$

Network clustering coefficient = 0.375 (3 nodes in each triangle × 2 triangles = 6 closed triplets, divided by 16 triplets in total)

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore

• E.g., node 1 has a value of 1 because it is only connected to 2 and 3, and these nodes are also connected to one another (i.e. the only triplet in the neighborhood of 1 is closed). We say that nodes 1, 2, and 3 form a clique.

$$\text{clustering coefficient (node 1)} = \frac{CT}{TT} = \frac{1}{1} = 1$$

Clustering coefficient of a single node, where:

K_n = degree of node n
N_n = number of links between the neighbors of node n
CC_n = clustering coefficient of node n

$$CC_n = \frac{2N_n}{K_n(K_n - 1)}$$

$$CC_{\text{node 2}} = \frac{2(2)}{3(3 - 1)} = \frac{4}{6} \approx 0.67$$
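Both the node-level formula and the network figure of 0.375 can be checked in base R. The sketch relies on the fact that the diagonal of A^3 counts closed walks of length 3 through each node, i.e. twice the number of links among its neighbours (2N_n); the edge list is the reconstructed one assumed on the earlier slides.

```r
# Hypothetical 7-node graph; edge list reconstructed from the slide values (assumption)
edges <- matrix(c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7), ncol = 2, byrow = TRUE)
n <- 7
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

k <- rowSums(A)                    # K_n: degree of each node
two_N <- diag(A %*% A %*% A)       # 2 * N_n: closed 3-walks = twice the links among neighbours

# CC_n = 2 N_n / (K_n (K_n - 1)); undefined (NA) for nodes with fewer than 2 neighbours
local_cc <- ifelse(k >= 2, two_N / (k * (k - 1)), NA)

# Network level: closed triplets / all triplets (sum of K_n choose 2)
global_cc <- sum(two_N / 2) / sum(k * (k - 1) / 2)
print(round(local_cc, 2))
print(global_cc)
```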

Clustering

[Figure: the hypothetical 7-node graph labelled with betweenness values and split into Cluster A and Cluster B]

• Clustering algorithms identify clusters or ‘communities’ within networks based on network structure and specific clustering criteria
• Hierarchical clustering
• Similarity-based clustering
• Betweenness clustering (the example with two clusters is based on edge betweenness, an equivalent for edges of the betweenness centrality presented earlier for nodes)

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
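Betweenness clustering can be sketched as a simplified Girvan-Newman procedure: repeatedly delete the edge with the highest edge betweenness until the graph falls apart. The base-R version below (reconstructed edge list assumed as before) stops at the first split into two clusters; igraph's cluster_edge_betweenness() instead continues the process and keeps the split with the best modularity, so results can differ on other graphs.

```r
# Hypothetical 7-node graph; edge list reconstructed from the slide values (assumption)
n <- 7
edges <- matrix(c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7), ncol = 2, byrow = TRUE)

# Edge betweenness (Brandes-style accumulation, unweighted, undirected)
edge_betweenness <- function(n, edges) {
  adj <- vector("list", n)
  for (k in seq_len(nrow(edges))) {
    a <- edges[k, 1]; b <- edges[k, 2]
    adj[[a]] <- c(adj[[a]], b)
    adj[[b]] <- c(adj[[b]], a)
  }
  credit <- matrix(0, n, n)   # credit[v, w]: shortest-path share carried by edge v-w
  for (s in seq_len(n)) {
    sigma <- numeric(n); sigma[s] <- 1
    dist <- rep(-1, n); dist[s] <- 0
    preds <- vector("list", n); visited <- integer(0); queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]; visited <- c(visited, v)
      for (w in adj[[v]]) {
        if (dist[w] < 0) { dist[w] <- dist[v] + 1; queue <- c(queue, w) }
        if (dist[w] == dist[v] + 1) {
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    delta <- numeric(n)
    for (w in rev(visited)) {
      for (v in preds[[w]]) {
        share <- sigma[v] / sigma[w] * (1 + delta[w])
        credit[v, w] <- credit[v, w] + share
        delta[v] <- delta[v] + share
      }
    }
  }
  # Each pair was counted from both endpoints, hence the division by 2
  (credit[edges] + credit[edges[, 2:1, drop = FALSE]]) / 2
}

# Connected-component label for every node
component_labels <- function(n, edges) {
  comp <- integer(n); current <- 0L
  for (s in seq_len(n)) {
    if (comp[s] != 0L) next
    current <- current + 1L
    comp[s] <- current
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      nb <- c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1])
      for (w in nb) {
        if (comp[w] == 0L) { comp[w] <- current; queue <- c(queue, w) }
      }
    }
  }
  comp
}

eb_initial <- edge_betweenness(n, edges)
print(cbind(edges, betweenness = eb_initial))

# Remove the highest-betweenness edge until the graph splits in two
removed <- list()
while (max(component_labels(n, edges)) < 2) {
  eb <- edge_betweenness(n, edges)
  top <- which.max(eb)                  # ties go to the first edge listed
  removed[[length(removed) + 1]] <- edges[top, ]
  edges <- edges[-top, , drop = FALSE]
}
membership <- component_labels(n, edges)
print(removed)
print(membership)
```

On this graph the sketch removes edge 3-5 and then edge 2-5, leaving the groups {1, 2, 3, 4} and {5, 6, 7}.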

Average and longest distance

[Figure: the hypothetical 7-node graph with its diameter marked]

• The longest shortest path (distance) between any two nodes in a network is called the network’s diameter
• The diameter of the example network is 3; it is a useful measure of the reach of the network (as opposed to looking only at the total number of vertices or edges)
• It also indicates how long it will take at most to reach any node in the network (sparser networks will generally have greater diameters)
• The average of all shortest paths in a network is also interesting because it indicates how far apart any two nodes will be on average (average distance)

Source: Dr. Giorgos Cheliotis, (201x). Social Network Analysis (SNA) including a tutorial on concepts and methods. Communications and New Media, National University of Singapore
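Both quantities come from the matrix of shortest-path lengths. A base-R sketch on the reconstructed 7-node graph (edge list assumed as before), running a breadth-first search from every node; it assumes the graph is connected, since unreachable pairs would have infinite distance.

```r
# Hypothetical 7-node graph; edge list reconstructed from the slide values (assumption)
edges <- matrix(c(1,2, 1,3, 2,3, 2,5, 3,4, 3,5, 5,6, 5,7), ncol = 2, byrow = TRUE)
n <- 7
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Breadth-first search: number of hops from s to every node
bfs_dist <- function(A, s) {
  d <- rep(Inf, nrow(A))
  d[s] <- 0
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(A[v, ] == 1)) {
      if (is.infinite(d[w])) {
        d[w] <- d[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  d
}

D <- t(sapply(seq_len(n), function(s) bfs_dist(A, s)))
diameter <- max(D)                        # longest shortest path
avg_distance <- mean(D[upper.tri(D)])     # each unordered pair counted once
print(diameter)
print(round(avg_distance, 2))
```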

Social Network Analysis with R & Visualizing Social Networks

Katherine Ognyanova, www.kateto.net
NetSciX 2016 School of Code Workshop, Wroclaw, Poland

Assistant Professor at the School of Communication and Information at Rutgers University.

Network Analysis and Visualization with R and igraph

Basic SNA with R and igraph

• Networks
• Tie Strength
• Key Players
• Cohesion

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Basic SNA with R and igraph: Networks

• Create network
• Edges, vertices and attributes
• Read network data from files
• Turn networks into igraph objects

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

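Create Network

library(igraph)
# Undirected triangle: edges 1-2, 2-3 and 3-1 on n = 3 vertices
# (newer igraph releases prefer make_graph() over graph())
g1 <- graph(edges = c(1,2, 2,3, 3,1), n = 3, directed = FALSE)
plot(g1)

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)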

Create Network

# Directed by default; n = 10 adds vertices 4 to 10 without any edges
g2 <- graph(edges = c(1,2, 2,3, 3,1), n = 10)
plot(g2)

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Create Network

# Named vertices: consecutive pairs form the directed edges John->Jim, Jim->Jill, Jill->John
g3 <- graph(c("John", "Jim", "Jim", "Jill", "Jill", "John"))
plot(g3)

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Create Network

# The edge list contains a repeated edge (Jim->Jack twice) and a self-loop (John->John);
# isolates adds vertices that have no edges
g4 <- graph(c("John", "Jim", "Jim", "Jack", "Jim", "Jack", "John", "John"),
            isolates = c("Jesse", "Janis", "Jennifer", "Justin"))
plot(g4, edge.arrow.size = .5, vertex.color = "gold", vertex.size = 15,
     vertex.frame.color = "gray", vertex.label.color = "black",
     vertex.label.cex = 0.8, vertex.label.dist = 2, edge.curved = 0.2)

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Create Network

# Undirected edges a-b and b-c
plot(graph_from_literal(a---b, b---c))
# Directed edges a->b and c->b ('+' marks the arrowhead)
plot(graph_from_literal(a--+b, b+--c))
# Mutual (two-way) edges between a and b, and between b and c
plot(graph_from_literal(a+-+b, b+-+c))

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Edge, Vertex and Attributes

# The edges of the object
E(g4)
# The vertices of the object
V(g4)
# The network matrix
g4[]

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Edge, Vertex and Attributes

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

[Figure: g4 with its vertex attributes (vertex_attr: name, gender) and edge attributes (edge_attr: type, weight)]

Read Network from files

Dataset 1: edge list
› Dataset1-Media-Example-NODES.csv
› Dataset1-Media-Example-EDGES.csv

Dataset 2: matrix
› Dataset2-Media-User-Example-NODES.csv
› Dataset2-Media-User-Example-EDGES.csv

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Turning networks into igraph objects

We start by converting the raw data to an igraph network object. Here we use igraph’s graph.data.frame function (also available under the newer name graph_from_data_frame), which takes two data frames: d and vertices.

d describes the edges of the network. Its first two columns are the IDs of the source and the target node for each edge. The following columns are edge attributes (weight, type, label, or anything else).

vertices starts with a column of node IDs. Any following columns are interpreted as node attributes.

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)
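A small sketch of that input layout, with hypothetical node IDs and attributes. The base-R part checks the two requirements described above (every edge endpoint must appear in the node table, and node IDs must be unique); the igraph call only runs if the package is installed.

```r
# Hypothetical data: `nodes` starts with the vertex IDs,
# `links` starts with the source and target IDs of each edge
nodes <- data.frame(id    = c("s01", "s02", "s03", "s04"),
                    label = c("Outlet A", "Outlet B", "Outlet C", "Outlet D"),
                    stringsAsFactors = FALSE)
links <- data.frame(from   = c("s01", "s01", "s02", "s03"),
                    to     = c("s02", "s03", "s03", "s04"),
                    weight = c(10, 5, 2, 7),
                    type   = c("hyperlink", "mention", "hyperlink", "mention"),
                    stringsAsFactors = FALSE)

# igraph stops with an error if an edge refers to an ID missing from `vertices`
stopifnot(all(c(links$from, links$to) %in% nodes$id),
          !anyDuplicated(nodes$id))

if (requireNamespace("igraph", quietly = TRUE)) {
  net <- igraph::graph_from_data_frame(d = links, vertices = nodes, directed = TRUE)
  print(net)
  # Extra columns of `links` become edge attributes, extra columns of `nodes` vertex attributes
  print(igraph::edge_attr(net, "weight"))
  print(igraph::vertex_attr(net, "label"))
}
```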


Basic SNA with R and igraph: Tie Strength

• Plotting networks with igraph
• Assigning weights
• Assigning tie types
• Deleting ties

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Plotting networks with igraph

We can set the node and edge options in two ways:
› Specify them in the plot() call
› Set them as attributes and add them to the igraph object

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Basic SNA with R and igraph: Key Players

Network centrality with igraph

› Degree centrality
› Closeness centrality
› Betweenness centrality
› Eigenvector centrality

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Network centrality with igraph

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Basic SNA with R and igraph: Cohesion

• Density and reciprocity
• Clustering
• Averages & longest distance

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Transitivity with igraph

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Community clustering based on edge betweenness

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Averages & longest distance

Source: Asst. Prof. Katherine Ognyanova. Network Analysis with R and igraph: NetSci X Tutorial, the NetSciX 2016 School of Code Workshop, Wroclaw, Poland (www.kateto.net)

Using SNA to predict subscriber churn in the telecommunication industry

What is Customer Churn?

Customer Churn

“Churn represents the loss of an existing customer to a competitor”

A prevalent problem in:
› Telecommunication services
› Home mortgage refinance
› Credit cards

Churn is especially important to mobile phone service providers:
› It is easy for a subscriber to switch services.
› Mobile Number Portability (MNP) will remove the last important obstacle.

Core CRM

The CRM core objectives include:
– Customer Acquisition
– Customer Retention
  › Churn Prediction
– Cross-sell/Up-sell
– Maximizing Lifetime Customer Value

Core CRM in Telecommunication

§ Segmentation
§ Predictive Analytics
§ Customer Acquisition
  § Costs of Customer Acquisition (COCA)
§ Servicing
  § QoS (Network)
  § Call Center, Service Hall/Center (Net Promoter Score)
§ Customer Retention
  § Churn Prediction
§ Customer Lifetime Value (CLV)

Segmentation

Most telcos define their customer segments using some of the following:
› Payment type (prepaid vs. postpaid)
› ARPU (Average Revenue Per User, i.e. revenue generated)
› Tenure (age of user, AOU)
› Demographics (location, income, job, gender, age, etc.)
› Usage: voice, data, other non-voice, roaming
› Handsets/devices: 2G, 3G or 4G device; smartphone vs. feature phone
› Package: packages, price plans

Predictive Analytics

Process current and historical data in order to make predictions about future events:
› Making customer decisions
› Next Best Offer
› Package & price plan
› Cross-sell & up-sell opportunities
› Credit scoring for setting dynamic limits (risk management)
› Fraud detection (postpaid only)
› Revenue allocation
› Customer Lifetime Value

Servicing

› Quality of Service
  – Network utilization
  – Dropped calls
› Call Center, Service Hall
  – The number of times the customer makes contact (complains) via the Call Center/Service Hall
  – Service scoring (Net Promoter Score)

Customer Retention

› Types of churn
  – Voluntary churn
  – Involuntary churn
› Types of customer retention
  – Reactive
  – Proactive

The Churn Taxonomy

Source: Mohammed Akram Ayyubi Define - The Churn Reasons Taxonomy

Data Source

Customer Demographic:
• Zip code
• Income
• Occupation
• Age
• Gender
• Living address
• Occupation address

Order:
• Customer type (Corp/SME/Indy)
• Payment type (Pre/Post)
• Current package
• Package plan
• ARPU
• Additional product/service

Customer Relation:
• Number of questions about the services, e.g. from IVR
• Number of visits to retail shops or the online website
• Number of complaints solved
• Total number of complaints

Service Usage (CDR):
• Number of calls
• Volume of data usage
• Number of outgoing calls
• Number of incoming calls
• Number of roaming calls
• Number of international calls
• Total minutes of usage (MOU)/volume
• Number of dropped calls

Data Source

Billing Data:
• Total amount of bill
• Total number of times barred (one-way barred)
• Total number of times fully barred (two-way barred)

Network:
• Cell site location
• Network type (4G/3G/2G)
• Network utilization
• QoS

Data Mining Functionalities, Techniques & CRM Applications

Churn Modeling Techniques

Subscribers: 25M
Prepaid subs: 4M
Postpaid subs: 21M
Prepaid ARPU: 550 THB
Postpaid ARPU: 160 THB

Source: NBTC (http://ttid.nbtc.go.th/mobile_db.html)

Study population: individual (consumer) postpaid subscribers of one mobile network operator whose service was cancelled (churned) between July and August 2015 (B.E. 2558).

July to August

Subscribers: 1.5M; Churners: 8.2K

1. Subscriber Centrality
– X1: Subscriber Centrality Degree
– X2: Subscriber Centrality Closeness
– X3: Subscriber Centrality Betweenness
– X4: Subscriber Centrality Eigenvector

2. Number of Neighbors
– X5: Number of Churn Neighbors
– X6: Number of Non-Churn Neighbors

3. Churn Neighbors Centrality
– X7: Churn Neighbors Centrality Degree
– X8: Churn Neighbors Centrality Closeness
– X9: Churn Neighbors Centrality Betweenness
– X10: Churn Neighbors Centrality Eigenvector

4. Non-Churn Neighbors Centrality
– X11: Non-Churn Neighbors Centrality Degree
– X12: Non-Churn Neighbors Centrality Closeness
– X13: Non-Churn Neighbors Centrality Betweenness
– X14: Non-Churn Neighbors Centrality Eigenvector

Figure: the four predictor groups (Subscriber Centrality, Number of Neighbors, Churn Neighbors Centrality, Non-Churn Neighbors Centrality) used to explain Subscriber Churn Status.

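Figure: process overview in six steps: (1) churner and CDR data, (2) K-means clustering of the call links into Cluster 1, 2 and 3, (3) cluster selection, (4) adjacency matrix, (5) centrality calculation and (6) churn prediction. The panels are labelled with the months June, July and August and show a base of 1.5M subscribers with 8.2K churners.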

The call duration (minutes) and the number of calls are summarized per subscriber pair, i.e. one row per caller (MO) and callee (MT).

This yields 1,747,835 relationships (links).

MO (caller) | MT (callee) | Call Duration (min) | Number of Calls
A | B | 3 | 1
A | C | 2 | 4
A | D | 16 | 2
B | D | 23 | 9
E | D | 1 | 1
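A minimal sketch of this summarization step follows. It assumes each raw CDR row is a (caller, callee, minutes) tuple. That layout and the function name summarize_cdr are hypothetical, because the slides do not show the raw CDR schema:

```python
from collections import defaultdict

def summarize_cdr(records):
    """Collapse raw CDR rows into one link per (MO, MT) pair.

    records: iterable of (mo, mt, minutes) tuples (hypothetical layout).
    Returns {(mo, mt): (total_call_duration, number_of_calls)}.
    """
    duration = defaultdict(float)
    count = defaultdict(int)
    for mo, mt, minutes in records:
        duration[(mo, mt)] += minutes   # total minutes for this caller/callee pair
        count[(mo, mt)] += 1            # one CDR row = one call
    return {pair: (duration[pair], count[pair]) for pair in duration}

cdr = [("A", "B", 3.0), ("A", "C", 0.5), ("A", "C", 1.5), ("A", "D", 16.0)]
links = summarize_cdr(cdr)
print(links[("A", "C")])  # (2.0, 2)
```

This sketch keeps the call direction, so A→B and B→A would be separate links. The adjacency matrix used later is symmetric, which means the direction is dropped at that stage.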

The links are grouped with K-means clustering, using Call Duration and Number of Calls as the clustering variables and k = 3. The resulting clusters are shown in the figure.

Blue: Cluster 1; Yellow: Cluster 2; Red: Cluster 3
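The clustering step can be sketched with Lloyd's k-means algorithm. This is an illustration under stated assumptions, not the authors' code. The seeding method (farthest-first traversal), the use of unscaled (call duration, number of calls) features and the toy points are all hypothetical choices. For comparison, R's kmeans() defaults to random starting centers and the Hartigan-Wong algorithm.

```python
import math

def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption, not k-means++): start from the
    first point, then repeatedly add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on (call duration, number of calls) points."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each link joins its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels

# Hypothetical links: (call duration in minutes, number of calls)
links = [(2, 2), (3, 1), (2.5, 2), (2000, 350), (1990, 340), (210, 100), (220, 102)]
centers, labels = kmeans(links, k=3)
print(labels)  # [0, 0, 0, 1, 1, 2, 2]
```

Pure Python is far too slow for 1.7 million links. The sketch only shows the two alternating steps (assign, then re-center) that any k-means implementation performs.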

1. Consider the number of members in each cluster as the number of clusters increases.

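Number of transactions (links) in each cluster group

Cluster group | K=2 | K=3 | K=4 | K=5 | K=6
1 | 1,737,166 | 1,725,216 | 1,705,381 | 1,675,572 | 1,638,302
2 | 10,669 | 21,801 | 38,463 | 61,628 | 88,167
3 | - | 818 | 3,831 | 9,388 | 16,924
4 | - | - | 160 | 1,160 | 3,701
5 | - | - | - | 87 | 664
6 | - | - | - | - | 77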

2. Consider the ratio of inter-cluster distance to intra-cluster distance as the number of clusters increases.
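One way to quantify this criterion is the ratio of the between-cluster sum of squares to the total within-cluster sum of squares. These are the quantities R's kmeans() returns as betweenss and tot.withinss. The slides do not define their distances precisely, so this particular ratio is an assumption:

```python
def ss_ratio(points, labels):
    """Between-cluster SS divided by total within-cluster SS (higher = better separated)."""
    dims, n = len(points[0]), len(points)
    grand = [sum(p[d] for p in points) / n for d in range(dims)]
    between = within = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Between: cluster size times squared distance from centroid to grand mean.
        between += len(members) * sum((centroid[d] - grand[d]) ** 2 for d in range(dims))
        # Within: squared distances from each member to its own centroid.
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dims)) for p in members)
    return between / within if within else float("inf")

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(ss_ratio(points, [0, 0, 1, 1]))  # 25.0
```

A higher ratio means tighter, better-separated clusters. You could compute it for K = 2 to 6 and look for the point where the gains level off to support the choice of K.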

Why choose K = 3?

Blue: Cluster 1; Yellow: Cluster 2; Red: Cluster 3

Cluster | Median Call Duration (min) | Median Number of Calls | Group Description
Cluster#1 | 2.13 | 2.00 | Few calls, short duration
Cluster#2 | 1,998.03 | 348.50 | Frequent calls, long duration
Cluster#3 | 215.97 | 101.00 | Frequent calls, short duration

Combining Cluster#2 and Cluster#3 gives the clusters shown in the figure.

Yellow: Cluster 2; Red: Cluster 3

Prepaid Customer Heavy Voice Call on Net

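Adjacency matrix:
  | A | B | C | D | E
A | 0 | 1 | 1 | 1 | 0
B | 1 | 0 | 0 | 1 | 0
C | 1 | 0 | 0 | 0 | 0
D | 1 | 1 | 0 | 0 | 1
E | 0 | 0 | 0 | 1 | 0

Edge list:
MO | MT
A | B
A | C
A | D
B | D
E | D

Figure: adjacency matrix, social network graph and edge list for the example, with nodes marked as churner or non-churner.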
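The conversion from edge list to adjacency matrix can be sketched as follows. The matrix on the slide is symmetric, so the sketch treats every call link as an undirected tie. The function name adjacency_matrix is hypothetical:

```python
def adjacency_matrix(edges):
    """Symmetric 0/1 adjacency matrix from an undirected edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for mo, mt in edges:
        i, j = index[mo], index[mt]
        matrix[i][j] = matrix[j][i] = 1  # a call in either direction links both subscribers
    return nodes, matrix

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("E", "D")]
nodes, adj = adjacency_matrix(edges)
print(" ", *nodes)
for name, row in zip(nodes, adj):
    print(name, *row)
```

The printed rows match the example matrix. For 1.5 million subscribers, a sparse representation (adjacency sets or a sparse matrix) would be needed in practice instead of a dense list of lists.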

Using the adjacency matrix obtained in Step 4, four centrality measures are computed for every subscriber, based on graph theory, as follows:

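Centrality | Description | Meaning of the value
Degree | Number of direct connections of a node; indicates how much information flows through it | Low: little information flows through the node; high: much information flows through it
Closeness | Indicates how central a node is within the network; used to measure how quickly information spreads from that node to the rest of the network | Read as total shortest-path distance to the other nodes: low = very important, information spreads fast; high = less important, information spreads slowly (under the reciprocal definition, e.g. igraph's closeness(), the direction is reversed)
Betweenness | Indicates a node's importance or role within the network, i.e. how often it lies on shortest paths between other nodes | Low: less important; high: more important
Eigenvector | Measures a node's influence within the network, giving more weight to connections with well-connected nodes | Low: little influence; high: much influence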
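The four measures can be sketched in plain Python on the five-subscriber example (edges A-B, A-C, A-D, B-D, E-D). These are standard textbook definitions chosen for illustration, not necessarily the exact options the authors used:

- Closeness here is the reciprocal of the total distance, so higher means more central, as in igraph's closeness().
- Betweenness is unnormalized and computed with Brandes' algorithm.
- Eigenvector scores are scaled so that the largest is 1.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree(graph):
    return {u: len(nbrs) for u, nbrs in graph.items()}

def closeness(graph):
    """Reciprocal of the total distance to reachable nodes (higher = more central)."""
    result = {}
    for u in graph:
        total = sum(bfs_distances(graph, u).values())
        result[u] = 1.0 / total if total else 0.0
    return result

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS that counts shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends

def eigenvector(graph, iters=1000, tol=1e-12):
    """Power iteration on (A + I), which has the same eigenvectors as A but
    also converges on bipartite graphs; scaled so the largest score is 1."""
    x = dict.fromkeys(graph, 1.0)
    for _ in range(iters):
        new = {u: x[u] + sum(x[v] for v in graph[u]) for u in graph}
        top = max(new.values())
        new = {u: val / top for u, val in new.items()}
        if max(abs(new[u] - x[u]) for u in graph) < tol:
            return new
        x = new
    return x

g = build_graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("E", "D")])
print(degree(g))       # {'A': 3, 'B': 2, 'C': 1, 'D': 3, 'E': 1}
print(betweenness(g))  # {'A': 3.0, 'B': 0.0, 'C': 0.0, 'D': 3.0, 'E': 0.0}
```

On this graph, A and D act as the bridges. Each has betweenness 3 and the highest closeness (1/5) and eigenvector score.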

5.1 Compute all Subscriber Centrality measures for each subscriber; this gives part (1).
5.2 Then compute the Neighbors Centrality measures for every subscriber; this gives parts (2), (3), (4) and (5), respectively.

Figure: Step 5.1 derives (1) Subscriber Centrality from the adjacency matrix; Step 5.2 derives the Neighbors Centrality parts, including (2) Number of Neighbors, (4) Churn Neighbors Centrality and (5) Non-Churn Neighbors Centrality.

Centrality measures are computed to create 14 independent variables in total:

1. Subscriber Centrality
– X1: Subscriber Centrality Degree
– X2: Subscriber Centrality Closeness
– X3: Subscriber Centrality Betweenness
– X4: Subscriber Centrality Eigenvector

2. Number of Neighbors
– X5: Number of Churn Neighbors
– X6: Number of Non-Churn Neighbors


3. Churn Neighbors Centrality
– X7: Churn Neighbors Centrality Degree
– X8: Churn Neighbors Centrality Closeness
– X9: Churn Neighbors Centrality Betweenness
– X10: Churn Neighbors Centrality Eigenvector

4. Non-Churn Neighbors Centrality
– X11: Non-Churn Neighbors Centrality Degree
– X12: Non-Churn Neighbors Centrality Closeness
– X13: Non-Churn Neighbors Centrality Betweenness
– X14: Non-Churn Neighbors Centrality Eigenvector
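The slides do not say how the centralities of a subscriber's neighbors are combined into X7 to X14. The sketch below assumes a simple mean over churn and non-churn neighbors. This is a hypothetical choice; a sum or a maximum would be equally plausible. The sketch takes per-subscriber centrality scores as plain dictionaries. The churner labels {"B", "C"} are made up for the example.

```python
def neighbor_features(graph, churned, centrality):
    """Hypothetical neighbor features for each subscriber.

    graph:      {subscriber: set of neighbors}
    churned:    set of subscribers labelled as churners
    centrality: {measure name: {subscriber: score}}
    Aggregation over neighbors is a simple mean (an assumption).
    """
    features = {}
    for u, nbrs in graph.items():
        churn_nbrs = [v for v in nbrs if v in churned]
        other_nbrs = [v for v in nbrs if v not in churned]
        row = {"n_churn_neighbors": len(churn_nbrs),      # X5
               "n_nonchurn_neighbors": len(other_nbrs)}   # X6
        for name, scores in centrality.items():
            for label, group in (("churn", churn_nbrs), ("nonchurn", other_nbrs)):
                values = [scores[v] for v in group]
                # 0.0 when the subscriber has no neighbors in this group (assumption).
                row[f"{label}_nbr_{name}"] = sum(values) / len(values) if values else 0.0
        features[u] = row
    return features

graph = {"A": {"B", "C", "D"}, "B": {"A", "D"}, "C": {"A"},
         "D": {"A", "B", "E"}, "E": {"D"}}
degree = {u: len(nbrs) for u, nbrs in graph.items()}
features = neighbor_features(graph, churned={"B", "C"}, centrality={"degree": degree})
print(features["A"])
# {'n_churn_neighbors': 2, 'n_nonchurn_neighbors': 1, 'churn_nbr_degree': 1.5, 'nonchurn_nbr_degree': 3.0}
```

To fill X7 to X14, you would pass all four measures (degree, closeness, betweenness, eigenvector) in the centrality dictionary.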

Principal Component Analysis: PCA is applied to the independent variables X4–X14, retaining 3 factors so that each factor can be given a clear interpretation.
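A minimal PCA sketch on standardized variables (that is, on the correlation matrix) follows. It uses power iteration with deflation so that it needs only the standard library. It is an illustration, not the authors' procedure. The labels PA1 to PA3 and the loadings above 1 in the factor table suggest principal axis factoring (for example psych::fa(..., fm = "pa") in R), where pattern loadings above 1 can occur, rather than PCA. No rotation is applied here, and the data are made up.

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length numeric columns (no constant columns)."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        z.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def pca(columns, n_components, iters=10_000, tol=1e-12, seed=0):
    """Leading principal components of the standardized variables.

    Returns (eigenvalue, loadings) pairs; loadings = eigenvector * sqrt(eigenvalue),
    i.e. correlations between each variable and the component.
    """
    m = correlation_matrix(columns)
    p = len(m)
    rng = random.Random(seed)  # a random start is almost never orthogonal to the target
    components = []
    for _ in range(n_components):
        v = [rng.random() for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in m]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < tol:  # no variance left to explain
                break
            w = [x / norm for x in w]
            converged = max(abs(a - b) for a, b in zip(w, v)) < tol
            v = w
            if converged:
                break
        mv = [sum(a * b for a, b in zip(row, v)) for row in m]
        eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
        components.append((eigenvalue, [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]))
        # Deflation: remove this component before looking for the next one.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(p)] for i in range(p)]
    return components

# Hypothetical data: two pairs of perfectly correlated variables.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
columns = [x, [2 * v for v in x], y, [3 * v + 1 for v in y]]
for eigenvalue, loadings in pca(columns, n_components=2):
    print(round(eigenvalue, 3), [round(ld, 3) for ld in loadings])
```

In practice a library routine (eigendecomposition plus a rotation such as varimax or oblimin) would be used. The loop above only shows what "extracting" a component means.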

Principal Component Analysis: the PCA extracts the following factors (factor extraction):

Variable | Factor 1: Churner Neighbors Centrality | Factor 2: Non-Churner Neighbors Centrality | Factor 3: Social Network Eigenvalue
X8 Churn Neighbors Centrality Closeness | 1 | |
X5 Number of Churn Neighbors | 0.998 | |
X9 Churn Neighbors Centrality Betweenness | 0.701 | |
X7 Churn Neighbors Centrality Degree | 0.555 | |
X6 Number of Non-Churn Neighbors | | 1.013 |
X12 Non-Churn Neighbors Centrality Closeness | | 1.013 |
X11 Non-Churn Neighbors Centrality Degree | | 0.596 |
X13 Non-Churn Neighbors Centrality Betweenness | | 0.315 |
X10 Churn Neighbors Centrality Eigenvector | | | 1.048
X4 Subscriber Centrality Eigenvector | | | 0.866
X14 Non-Churn Neighbors Centrality Eigenvector | | | 0.407

Figure: X1 (Subscriber Centrality Degree), X2 (Subscriber Centrality Closeness) and X3 (Subscriber Centrality Betweenness) are kept as individual predictors; X8, X5, X9 and X7 form PA1: Churn Neighbors Centrality; X6, X12, X11 and X13 form PA2: Non-Churn Neighbors Centrality; X10, X4 and X14 form PA3: Social Network Eigenvalue. The outcome is Y: Subscriber Churn Status.

In this step, the churn prediction models are built and evaluated.

Figure: churn model evaluation. The predictors X1: Subscriber Centrality Degree, X2: Subscriber Centrality Closeness, X3: Subscriber Centrality Betweenness, PC1: Churn Neighbors Centrality, PC2: Non-Churn Neighbors Centrality and PC3: Social Network Eigenvalue feed the churn models. The models are trained on a training set, used for churn prediction on a test set, and compared with ROC curves.
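The Train, Test and ROC flow can be sketched with a plain logistic regression fitted by batch gradient ascent on the log-likelihood. The reported AIC values equal -2·llh + 2·4, which fits a logistic model with an intercept and three predictors. However, the optimizer, learning rate, epoch count, 70/30 split and toy data below are hypothetical. In R, the usual tool would be glm(..., family = binomial).

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(beta, X):
    """P(churn = 1) for each row, given beta = [intercept, w1, w2, ...]."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row))) for row in X]

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximize the log-likelihood with batch gradient ascent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, yi, pi in zip(X, y, predict_proba(beta, X)):
            err = yi - pi              # gradient of the log-likelihood w.r.t. the linear predictor
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def train_test_split(X, y, test_fraction=0.3, seed=42):
    """Shuffle row indices with a fixed seed, then cut them into train and test."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# Hypothetical one-feature data: churners (1) have higher scores.
X = [[v / 10] for v in range(-20, 21) if abs(v) >= 5]
y = [1 if row[0] > 0 else 0 for row in X]
X_train, y_train, X_test, y_test = train_test_split(X, y)
beta = fit_logistic(X_train, y_train)
y_pred = [1 if p >= 0.5 else 0 for p in predict_proba(beta, X_test)]
accuracy = sum(a == b for a, b in zip(y_pred, y_test)) / len(y_test)
print(beta, accuracy)
```

The predicted probabilities on the test set, rather than the 0/1 labels, are what an ROC curve is drawn from.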

Item | Model#1 | Model#2 | Model#3
Dependent variable (DV) | Y: Subscriber Churn Status | Y: Subscriber Churn Status | Y: Subscriber Churn Status
Independent variables (IV) | X1: Subscriber Centrality Degree; PC1: Churn Neighbors Centrality; PC2: Non-Churn Neighbors Centrality | X1: Subscriber Centrality Degree; X2: Subscriber Centrality Closeness; X3: Subscriber Centrality Betweenness | X1: Subscriber Centrality Degree; X2: Subscriber Centrality Closeness; PC3: Social Network Eigenvalue
Number of independent variables | 3 | 3 | 3
Akaike information criterion (AIC) | 22710 | 23546 | 19540

Pseudo R-squared
llh (log-likelihood) | -11351.1129 | -11769.0146 | -9765.7565
llhNull (null-model log-likelihood) | -14840.6155 | -14840.6155 | -14840.6155
McFadden's pseudo R-squared | 0.2351 | 0.2070 | 0.3420
Cragg and Uhler's pseudo R-squared | 0.3687 | 0.3305 | 0.5009
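The AIC and McFadden figures can be reproduced from the reported log-likelihoods. This assumes each model has four parameters (an intercept plus three predictors), which matches the reported AIC values. Cragg and Uhler's measure also needs the training-set size n, which the slides do not report, so it is included only as a formula. The llh/llhNull labels match the output of pscl::pR2 in R, and the definitions below follow that convention.

```python
import math

def aic(llh, k):
    """Akaike information criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2 * llh + 2 * k

def mcfadden_r2(llh, llh_null):
    """1 minus the ratio of model to null-model log-likelihood."""
    return 1 - llh / llh_null

def cragg_uhler_r2(llh, llh_null, n):
    """Cox-Snell (maximum-likelihood) R^2 rescaled so that its maximum is 1."""
    r2_ml = 1 - math.exp(2 * (llh_null - llh) / n)
    return r2_ml / (1 - math.exp(2 * llh_null / n))

# Model#3 figures from the table (4 parameters: intercept + 3 predictors).
print(round(aic(-9765.7565, 4)))                       # 19540
print(round(mcfadden_r2(-9765.7565, -14840.6155), 4))  # 0.342
```

Because all three models share the same null log-likelihood and the same number of parameters, AIC and McFadden's R-squared rank them identically: Model#3 fits best.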

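Model evaluation

Metric | Model#1 | Model#2 | Model#3
Accuracy | 0.7247 | 0.7201 | 0.7270
95% CI | (0.7155, 0.7338) | (0.7108, 0.7292) | (0.7178, 0.736)
Kappa | 0.4072 | 0.3969 | 0.4115
Sensitivity | 0.9891 | 0.9864 | 0.9944
Specificity | 0.3938 | 0.3867 | 0.3923
Pos Pred Value | 0.6713 | 0.6681 | 0.6719
Neg Pred Value | 0.9666 | 0.9579 | 0.9824
Prevalence | 0.5559 | 0.5559 | 0.5559
Detection Rate | 0.5498 | 0.5483 | 0.5527
Detection Prevalence | 0.8191 | 0.8207 | 0.8226
Balanced Accuracy | 0.6915 | 0.6866 | 0.6934
AUC | 0.8556 | 0.8511 | 0.8581

The class-specific metrics (Sensitivity through Detection Prevalence) are computed with class 0 (non-churner) as the positive class, so the share of churners correctly detected is the Specificity row.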

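Confusion Matrix (1 = churner, 0 = non-churner; rows: predicted class, columns: reference class)

Predicted | Model#1 Ref. 1 | Model#1 Ref. 0 | Model#2 Ref. 1 | Model#2 Ref. 0 | Model#3 Ref. 1 | Model#3 Ref. 0
1 | 1,622 | 56 | 1,593 | 70 | 1,616 | 29
0 | 2,497 | 5,099 | 2,526 | 5,085 | 2,503 | 5,126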
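The evaluation figures can be recomputed from the confusion-matrix counts. For Model#1, the reported Sensitivity (0.9891), Specificity (0.3938) and Kappa (0.4072) are reproduced only when class 0 (non-churner) is taken as the positive class. This is consistent with R's caret::confusionMatrix(), whose positive class defaults to the first factor level. The function name below is hypothetical:

```python
def confusion_metrics(tp, fn, fp, tn):
    """Binary classification metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the predicted and reference totals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Model#1 with class 0 (non-churner) as positive:
# predicted 0 & actual 0 = 5,099 (tp); predicted 1 & actual 0 = 56 (fn);
# predicted 0 & actual 1 = 2,497 (fp); predicted 1 & actual 1 = 1,622 (tn).
m1 = confusion_metrics(tp=5099, fn=56, fp=2497, tn=1622)
print({name: round(value, 4) for name, value in m1.items()})
```

Read from the churner's side, Model#1 catches about 39% of churners (1,622 of 4,119), but about 97% of the subscribers it flags as churners really are churners (1,622 of 1,678).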

Figure: ROC curves for Model#1, Model#2 and Model#3 (AUC 0.8556, 0.8511 and 0.8581), with test cases marked as churner/predicted churner, non-churner/predicted non-churner, non-churner/predicted churner, and churner/predicted non-churner.
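AUC can be computed without plotting the ROC curve. It equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner (the Mann-Whitney statistic), which is the area under the empirical ROC curve. A minimal sketch with hypothetical scores:

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney).

    scores: predicted churn probabilities; labels: 1 = churner, 0 = non-churner.
    A tie between a churner and a non-churner counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The double loop costs O(n_pos × n_neg). For the 9,274 test cases here, a sort-based rank computation would be faster, but the result is the same.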