
Fast community structure identification of small world networks

(Fast Discovery of Community Structure in Small World Networks)

Department of Human System Science, Graduate School of Decision Science and Technology, Tokyo Institute of Technology

Student ID: 12M55087, 池 光龍

2014 Master's Thesis. Supervisor: Associate Professor 脇田 建 (Ken Wakita)

Submitted: July 24, 2014

September 10, 2014

Abstract

Detecting communities is of great importance for research and applications in various disciplines where systems are often represented as networks. Existing works mainly focus on creating new metrics or making technical improvements to increase the processing power (mainly speed) and accuracy of community detection. This paper focuses on the efficiency of community detection, which is another way of improving processing power. For this purpose, we utilize the small world property, which is also the reason community structures exist, and propose a measure that indicates this efficiency. In addition, by proposing two further countermeasures, we finally built a heuristic that detects communities with very high efficiency, although it performs better only on some of the networks. As a result, although our method did not improve processing power, we saw the possibility of improvement using only the small world property and, furthermore, the possibility of creating a new high-quality algorithm.


Contents

1 Introduction
1.1 Main contribution
1.2 Outline of the thesis
2 Social networks
2.1 Social networks
2.2 Small world networks
2.3 Scale-free networks
3 Community detection
3.1 Community detection
3.2 Modularity-based community detection
3.3 Louvain method
4 Proposed method
4.1 Basic idea
4.2 Modularity computation efficiency
4.3 Max neighbor community selection
4.4 Changed neighbor community selection
4.5 Hybrid heuristic
4.6 Implementation
5 Evaluation
5.1 Modularity computation efficiency
5.2 Time and modularity
5.3 Community contents


6 Summary
6.1 Summary
6.2 Future works
Bibliography


Glossary

$Q$: Modularity of the whole network.

$\Delta Q_{v_i c_i}$: Modularity change for one node-neighbor community pair when node $v_i$ moves to the neighbor community $c_i$.

$C$: The community set of a network, $V = \bigcup_{i=0}^{|C|-1} C_i$, with $C_i \cap C_j = \emptyset$ for $i \neq j$, $i, j < |C|$.

$E$: The edge set of a network, $E \subseteq V \times V$.

$G$: Always denotes a network, $G = \{V, E\}$.

$K$: The degree set of a network, $K = \{k_i \mid k_i = \sum_{j=0}^{|V|-1} e_{i,j},\ i < |V|\}$, where $e_{i,j} = 1$ if $(v_i, v_j) \in E$ and $e_{i,j} = 0$ otherwise.

$V$: The node set of a network.

$|U \to W|$: The number of edges between node sets $U$ and $W$, where $U, W \subseteq V$ and $|U \to W| = |(U \times W) \cap E|$.

$|\{c_i\} \to V|$: The total number of edges of community $c_i$.

$|\{c_i\} \to \{c_j\}|$: The total weight of the edges that connect nodes in community $c_i$ with nodes in community $c_j$.

$|\{v_i\} \to \{c_j\}|$: The total weight of the edges that connect node $v_i$ with community $c_j$.

$|\{v_i\} \to \{v_j\}|$: The weight of the edge between nodes $v_i$ and $v_j$.


List of Tables

3.1 Comparison of some community optimization methods
4.1 Changes in the number of moved nodes per Pass
5.1 Empirical Datasets
5.2 Synthetic Datasets
5.3 Modularity of all datasets
5.4 Time of dataset scale free
5.5 Datasets1
5.6 Datasets2
5.7 Datasets3


List of Figures

2.1 An example of network
2.2 An example of social network
2.3 An example of adjacency matrix
2.4 Community Structure in a network
2.5 An example of power-law distribution
3.1 Examples of modularity
3.2 Louvain method
3.3 Computation details of Louvain method
4.1 A virtual network including three communities
4.2 Intermediate state of Louvain method
4.3 Modularity change comparison between Louvain and Basic idea
4.4 MCE comparison between Louvain and Basic idea
4.5 The possible movement of one node's unconnected nodes
5.1 Modularity of DBLP
5.2 MCE of DBLP
5.3 Modularity of Web-Google
5.4 MCE of Web-Google
5.5 Modularity of YouTube
5.6 MCE of YouTube
5.7 Modularity of Pokec
5.8 MCE of Pokec
5.9 Modularity of Small world
5.10 MCE of Small world


5.11 Modularity of Scale free
5.12 MCE of Scale free
5.13 MCE changing of a high modularity scale-free network
5.14 Time(s) of each Heuristics
5.15 NMI when only average degree is different
5.16 NMI when only modularity is different


Chapter 1

Introduction

Scientists are interested in a wide variety of interacting systems made up of individuals or components (collections of individuals) connected in some way. For instance, human societies are collections of people or organizations connected by acquaintance or social interaction, such as business relationships, friendships [12], and sexual relationships [33]. The Internet [18] is a collection of computers and other devices connected by physical links such as optical fiber lines. On the World Wide Web, pages are connected by hyperlinks, while in an electric power grid [47], generating stations and switching substations are connected by high-voltage lines.

Faced with these systems, some people are interested in the individuals or components themselves, for example how a company should approach sustainable development [16, 36], or how to design a web page to increase its page views. Others are interested in the connections, for example how to build and maintain good human relationships [8], or what the optimal distance between two substations of an electric power grid in a specific area is. Still others are interested in the pattern of connections between individuals or components, for example which power grid topology best withstands natural disasters, or what kinds of people or organizations readily connect with each other.

To comprehend these patterns of connections easily, it is convenient to represent a given system as a graph. A graph is a simple structure made up of just points and lines connecting pairs of points.


Here, we call the points nodes and the lines edges [See figure 2.1]. A network can also be represented as an adjacency matrix: an $n \times n$ adjacency matrix represents a network with $n$ nodes, where the element $a_{i,j}$ is the number of edges from node $i$ to node $j$. The interacting systems above can then be represented as graphs or adjacency matrices, with the individuals or components of a system as nodes and their connections as edges (graphs are described in detail in Section 2.1). Using these graphs and adjacency matrices, one can observe the nature of networks and learn about their structure by analyzing them in various ways.
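As a small illustration of the adjacency matrix representation, the following Python sketch builds the matrix from an edge list. It counts parallel edges, so that $a_{i,j}$ is the number of edges from node $i$ to node $j$. The function name `adjacency_matrix` and the 0-based node numbering are our own assumptions and are not part of the thesis implementation.

```python
from typing import Iterable, List, Tuple


def adjacency_matrix(n: int, edges: Iterable[Tuple[int, int]],
                     directed: bool = False) -> List[List[int]]:
    """Return an n x n matrix whose entry a[i][j] is the number of edges from node i to node j."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] += 1
        if not directed and i != j:
            a[j][i] += 1  # an undirected edge also runs from j to i
    return a


if __name__ == "__main__":
    # Edges read off the matrix in Figure 2.3, with nodes renumbered 0..4.
    edges = [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (3, 4)]
    for row in adjacency_matrix(5, edges):
        print(row)
```

For an undirected network the matrix is symmetric. For a directed one only $a_{i,j}$ is incremented.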

The first use of a graph dates back to Euler's solution of the Seven Bridges of Konigsberg problem in 1736 [17]. Since then, scientists from a wide variety of fields have studied networks and tried to solve all kinds of problems [6]. For example:

- The shortest path problem [4, 13] asks for the shortest path (the minimum number of edges) between two nodes.
- The graph coloring problem [28, 23] asks for a coloring of the nodes of a graph such that no two adjacent nodes share the same color.
- The graph partitioning problem [32, 20] asks for a division of a graph into several components such that the number of edges between separated components is small.

On the other hand, some mathematicians and physicists focus on statistical properties that most networks seem to share.

One of them is the scale-free property [3]. The degree of a node in a network is the number of other nodes to which it is connected, and the degree distribution is the probability distribution of these degrees over the whole network. A network has the scale-free property when its degree distribution follows a power law, with a small number of nodes having very high degree while the rest have low degree [3, 15].

Another is the small world property, which consists of two parts: short average distance and community structure. Short average distance means that the distance between two nodes in most networks is in fact short [38, 62]. Community structure [24] means that most networks contain many groups of nodes, where nodes in the same group are densely connected but nodes in different groups are sparsely connected.

Community structure is one of the most important properties of many networks. A community is a subset of the nodes in a network. As mentioned above, nodes in the same community are densely connected, while nodes in different communities are sparsely connected [See figure 2.4]. A network consists of many such communities.

The earliest paper on communities is Stuart Rice's 1927 paper, which found communities (then called parties or blocs) in small political bodies based on the similarity of their voting patterns [53]. Since then, scientists in a variety of fields have found community structures in many networks. These structures turn out to be meaningful groups rather than arbitrary collections of nodes, for example:

- video topics in a YouTube network [22];
- research topics in a collaboration network [42];
- global economic entities in a worldwide air transportation network [27];
- genres in an IMDb (Internet Movie Database) network [19];
- cycles and pathways in metabolic networks [26, 48].

Since the community structures in networks are meaningful, it is necessary to identify and analyze them. Identifying communities by hand can give accurate results, but its capacity is limited, especially when networks become large. Therefore, many community detection methods that run automatically on computers have been proposed.

Research on community detection can be classified by network type: bipartite networks, multilayer networks, and homogeneous networks.

A bipartite network is a network whose nodes can be divided into two disjoint sets U and V such that every edge connects a node $u_i \in U$ to a node $v_j \in V$. For instance, an Amazon network consists of a set of users and a set of goods, connected by purchase relationships. Community detection methods for bipartite networks can detect communities that contain nodes from both U and V [35, 57].

A multilayer network is a more complicated concept with several subclasses. A simple subclass is a network whose nodes are connected in different patterns under different viewpoints or at different times. For instance, the employees of a company form both an acquaintance network and a superior-subordinate network, and these two networks together may make up a multilayer network [61]. Since relationships change over time, snapshots of these networks taken at different times form another multilayer network. Community detection methods for multilayer networks can detect communities that span multiple layers [39, 40].
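Whether a network is bipartite, and if so how its nodes split into $U$ and $V$, can be checked by two-coloring it with breadth-first search. Each node receives the opposite color of the node that discovered it. An edge between two nodes of the same color shows that no such split exists. The sketch below assumes an undirected network given as a symmetric adjacency dictionary. The function name and the user/goods node names in the example are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple


def bipartition(adj: Dict[Hashable, List[Hashable]]) -> Optional[Tuple[Set[Hashable], Set[Hashable]]]:
    """Split the nodes into sets U and V such that every edge joins U and V; return None if impossible."""
    side: Dict[Hashable, int] = {}
    for start in adj:                      # handle every connected component
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in side:
                    side[w] = 1 - side[u]  # neighbors go to the opposite set
                    queue.append(w)
                elif side[w] == side[u]:
                    return None            # an edge inside one set: not bipartite
    U = {v for v, s in side.items() if s == 0}
    V = {v for v, s in side.items() if s == 1}
    return U, V


if __name__ == "__main__":
    # Hypothetical users (u*) linked to the goods (g*) they purchased.
    purchases = {"u1": ["g1", "g2"], "u2": ["g2"], "g1": ["u1"], "g2": ["u1", "u2"]}
    print(bipartition(purchases))
```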

The third type is the homogeneous network. Homogeneous networks are the traditional networks commonly used in research. Unlike in bipartite networks, any pair of nodes can be connected. A homogeneous network has only one layer, and its nodes are connected by a single kind of interaction. Such networks are worth analyzing because they simplify the problem, and the topology of many real networks does not change often.

Community detection is one of the hottest topics in the network literature, and there are several kinds of approaches:

- Spectral bisection is based on the eigenvectors of the Laplacian matrix built from the adjacency matrix of a network.
- Belief propagation divides networks according to the probability that each node belongs to each partition [46].
- Hierarchical clustering builds communities from the similarity of node pairs [31].
- Newman et al. proposed edge betweenness, the number of shortest paths between pairs of nodes that run along an edge. They also proposed modularity, which compares the fraction of edges inside communities with the fraction expected at random [24, 45].
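The sketch below illustrates spectral bisection in its simplest form. It approximates the Fiedler vector, the eigenvector of the second-smallest eigenvalue of the Laplacian $L = D - A$, and splits the nodes by its sign. To stay within the Python standard library, we assume plain power iteration on $cI - L$, with the constant vector projected out at every step. A practical implementation would use a sparse eigensolver instead. The function name and the iteration count are illustrative.

```python
from typing import List, Set, Tuple


def spectral_bisection(n: int, edges: List[Tuple[int, int]],
                       iterations: int = 1000) -> Tuple[Set[int], Set[int]]:
    """Split nodes 0..n-1 by the sign of an approximate Fiedler vector of L = D - A."""
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    deg = [len(nb) for nb in nbrs]
    c = 2.0 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    x = [i - (n - 1) / 2.0 for i in range(n)]  # arbitrary start vector with zero mean
    for _ in range(iterations):
        # y = (cI - L) x, where (L x)_i = deg_i * x_i - sum of x over the neighbors of i
        y = [c * x[i] - (deg[i] * x[i] - sum(x[j] for j in nbrs[i])) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # remove the constant eigenvector (eigenvalue 0 of L)
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    negative = {i for i in range(n) if x[i] < 0}
    return negative, set(range(n)) - negative


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(spectral_bisection(6, edges))
```

Because $c$ is at least the largest eigenvalue of $L$, every eigenvalue of $cI - L$ is non-negative. After the constant vector is removed, the dominant direction is therefore the Fiedler vector.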

Detection methods for homogeneous networks mainly focus on accuracy and processing power (mainly time and scale). One wants to identify the communities in networks correctly, while the size of network data keeps growing [24, 5, 55, 60, 10, 43].

In this research, we focus on processing power. Existing works mainly create new metrics or make technical improvements to increase the processing power of community detection. Unlike them, we focus on the efficiency of community detection as a way to improve processing power, and we use the small world property (one of the most important properties of networks themselves) to achieve this goal. In addition, we introduce some technical countermeasures to improve processing power further.

1.1 Main contribution

The main contributions of this thesis can be summarized as follows:

Small world

As mentioned above, the small world property is one of the most important properties of networks, since it is the reason community structures exist. However, most previous approaches [25, 30, 9, 44, 10, 60, 5, 55] focus on the community structures themselves rather than on the reason for their existence, which is also important and valuable when detecting these structures. Thus, in this thesis, we approach the community detection problem from the small world property itself rather than from an engineering viewpoint. We explore the possibility of using the small world property in community detection, and verify this idea by building several heuristics and analyzing their results.

Modularity Computation Efficiency (MCE)

So far, the common ways to enhance the processing power of community detection have been creating new metrics [44, 51, 6, 46, 49, 30, 24] and seeking technical improvements [44, 10, 60, 5, 55]. In this thesis, we instead pursue this goal by improving the efficiency of community detection, which is simple yet effective.

For instance, in the Louvain method [5], many of the computations performed while detecting communities are in fact useless, and these reduce processing power. Processing power can therefore be improved simply by reducing these useless computations. We take several countermeasures to reduce them, and propose the Modularity Computation Efficiency (MCE) measure to evaluate whether these countermeasures are effective.

Here, modularity [45] is a metric that evaluates how good the detected community structures are. To obtain the best modularity value, which indicates a good detection result, modularity must be computed iteratively. Many useless computations occur during this iteration, and we aim to eliminate them while obtaining the same result as the original method. Modularity computation efficiency is therefore defined as the amount of modularity change divided by the number of computations. With this measure, one can evaluate the efficiency of a countermeasure independently of hardware, such as computer memory or hard disk.
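For reference, the sketch below computes the standard Newman-Girvan modularity of an unweighted, undirected network,

$$Q = \sum_c \left[ \frac{l_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right],$$

where $m$ is the number of edges, $l_c$ is the number of edges inside community $c$, and $d_c$ is the total degree of its nodes. The sketch also includes a hypothetical `mce` helper that follows the verbal definition above. It assumes that "number of computations" means the number of $\Delta Q$ evaluations; the precise definition used in this thesis is given in Chapter 4.

```python
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity Q of an unweighted, undirected network."""
    m = len(edges)
    inside: Dict[int, int] = {}  # l_c: number of edges with both ends in community c
    degree: Dict[int, int] = {}  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def mce(q_before: float, q_after: float, delta_q_evaluations: int) -> float:
    """Hypothetical MCE: modularity gained per modularity-change (Delta Q) evaluation."""
    if delta_q_evaluations <= 0:
        raise ValueError("at least one evaluation is required")
    return (q_after - q_before) / delta_q_evaluations


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3), split into the two triangles.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(modularity(edges, split))  # 5/14, about 0.357
```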

Effectiveness

By adding three valid countermeasures, we greatly improved modularity computation efficiency over the whole community detection process, while keeping modularity at the same level as the Louvain method.



1.2 Outline of the thesis

The rest of this thesis is organized as follows. In Chapter 2, we first introduce social networks, including small world networks and scale-free networks; small world networks are the subject of our research. Chapter 3 reviews representative existing works in the community detection literature. The main work of our research is presented in Chapters 4 and 5. Chapter 4, the heart of this thesis, introduces the basic idea of our research, two countermeasures for overcoming its drawbacks, and a metric for evaluating community detection efficiency. Through the work in Chapter 4, we aim to make community detection more efficient. Chapter 5 demonstrates the effect of the proposed method in terms of modularity computation efficiency, time, modularity, and the quality of the detected communities, using both synthetic and empirical networks. Finally, the summary and future work are given in Chapter 6.


Chapter 2

Social networks

In this chapter, we discuss social networks, including their definition and features. We describe small world networks in detail, because this feature not only explains why social networks include community structure but is also the basis of our proposal. In Section 2.3, we also discuss scale-free networks, another important feature of social networks.

2.1 Social networks

Figure 2.1: An example of a network with 5 vertices and 7 edges.

Figure 2.2: A classic social network representing friendships between members of a karate club [65].

A graph is a simple representation of structure used in a variety of scientific fields. It consists of points and lines that connect the points in pairs. In network science, the points are called nodes or vertices, and the lines are called edges. Mathematically, a graph is written as $G = \{V, E\}$, where $V$ is the node set and $E$ is the edge set ($E \subseteq V \times V$). When doing calculations or analysis, one can also use an adjacency matrix to record which nodes of a network are adjacent to which other nodes [See figure 2.3].

$$G = \begin{pmatrix} 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 & 0 \end{pmatrix}$$

Figure 2.3: An example of an adjacency matrix. The network on the left is represented by the matrix on the right, where $g_{i,j} = 1$ if nodes $i$ and $j$ are connected and $g_{i,j} = 0$ otherwise.

This simple form lets one represent complex problems in a simple way, for instance the famous Seven Bridges of Konigsberg problem, and thinking in this way often leads to new and useful insights. Graphs are used in many fields. Examples include food chains, networks linked by predator-prey relationships, and the World Wide Web, a network of web pages linked by hyperlinks. In human societies, networks linked by friendship or social interaction are used to analyze the pattern of connections between individuals. We call these social networks.

A social network is a social structure made up of individuals or organizations and the relationships formed by their social activity. The individuals or organizations are the nodes (vertices) of the network, and the relationships between them are its edges. Sociologists also refer to nodes as actors and to edges as ties. In this thesis, we use vertices and edges.



Social networks are created and changed by human social activity and interaction. Conversely, the connections in a social network affect how people learn, form opinions, and gather news, as well as other less obvious phenomena, such as the spread of disease [41]. Analyzing social networks is therefore necessary to understand human society better and to provide services for it. Many methods are used for such research and applications, including community detection, which we describe in detail in Chapter 3.

One property of social networks is that, compared with other types of networks, they vary widely with the type of connection of which they are composed. Because there are many possible definitions of an edge, different kinds of networks can be built from different points of view. Examples include friendships, acquaintance patterns, contacts between business people, movie actors, and musicians, and even criminal contacts between terrorists or drug users [41].

A second property is that social networks exhibit the small world property and scale-free degree distributions. Strictly speaking, these may not be properties unique to social networks, because, according to Watts and Strogatz's research, they also exist in other types of networks [62].

2.2 Small world networks

As mentioned in Section 2.1, the small world property is not unique to social networks. Nevertheless, we consider it only for social networks, because they are the main subject of our research.

In human society, we tend to associate and bond with similar others, which is called homophily. On the other hand, we also tend to connect with different individuals, which is called heterophily. Homophily has a stronger effect on human activity than heterophily. Because of this, we are more likely to become friends with a friend's friend than with a complete stranger, and groups form among people who are similar to each other. Naturally, we are more strongly connected with others inside our group than with those outside it. In other words, if we represent a specific society as a network, with individuals as nodes and friendships as edges, there will be many such groups whose nodes probably share common properties and/or play similar roles within the network. This group structure, called community structure, is in fact one of the most important components of the small world property. Figure 2.4 shows a schematic example of a network with community structure.

Figure 2.4: Community structure in a network.

The relationship between communities and the network can also be expressed mathematically. If $C$ is the set of communities in a network $G$, then

$$V = \bigcup_{i=0}^{|C|-1} C_i, \qquad C_i \cap C_j = \emptyset \quad (i \neq j,\ i, j < |C|).$$
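The two conditions above can be checked directly: the communities must cover $V$, and no node may belong to two communities. In the following sketch each community is a Python set. The function name is hypothetical.

```python
from typing import Hashable, Iterable, Set


def is_partition(nodes: Set[Hashable], communities: Iterable[Set[Hashable]]) -> bool:
    """True if the communities are pairwise disjoint and together cover exactly the node set V."""
    covered: Set[Hashable] = set()
    for c in communities:
        if covered & c:  # some node already belongs to an earlier community
            return False
        covered |= c
    return covered == nodes


if __name__ == "__main__":
    print(is_partition({1, 2, 3, 4}, [{1, 2}, {3, 4}]))     # disjoint and covering
    print(is_partition({1, 2, 3, 4}, [{1, 2}, {2, 3, 4}]))  # node 2 is shared
```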

Another important component of the small world property is short path length. In a network, two randomly selected nodes can be connected through a short path; that is, the average shortest path length is short. Here, the length of a path is the number of edges on it. This property was first verified by Stanley Milgram [38]. Since then, many researchers have conducted experiments to verify it. For instance, Peter Dodds repeated Milgram's experiment on a large scale, involving more than 60,000 e-mail users [14]. There are also some interesting applications of this property, such as the Erdos number in the mathematics community and the Bacon number among actors.
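The average shortest path length can be computed with one breadth-first search per node, since in an unweighted network BFS visits nodes in order of their distance in edges. This sketch assumes an undirected network stored as an adjacency dictionary. It averages only over pairs of distinct nodes that can reach each other. The function name is illustrative.

```python
from collections import deque
from typing import Dict, Hashable, List


def average_path_length(adj: Dict[Hashable, List[Hashable]]) -> float:
    """Mean shortest-path length (in edges) over ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())  # dist[source] is 0, so it adds nothing
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(average_path_length(path))  # (1 + 2 + 1 + 1 + 1 + 2) / 6 = 4/3
```

Running one BFS from every node costs $O(|V|(|V| + |E|))$. For very large networks, this quantity is usually estimated from a sample of source nodes instead.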

Networks with the small world property are called small world networks. A certain category of small world networks was identified by Duncan Watts and Steven Strogatz in 1998 [63]. The small world property appears not only in social networks but also in other networks, such as road maps, food chains, and electric power grids. In addition, not all social networks have the small world property. Some have scale-free degree distributions, described in the next section, and some have both.

In this thesis, we use the small world property to implement our fast community structure identification algorithm. We use the WS model, the small world network generation model proposed by Duncan J. Watts and Steven Strogatz, to verify our proposal and to evaluate the performance of our algorithm.
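The WS model can be sketched as follows. Start from a ring of $n$ nodes in which every node is linked to its $k$ nearest neighbors ($k/2$ on each side). Then rewire each edge with probability $p$ to a uniformly chosen node, avoiding self-loops and duplicate edges. This is a minimal sketch under these textbook assumptions. The function name and parameters are illustrative, and it is not necessarily the generator used for the experiments in Chapter 5.

```python
import random
from typing import Optional, Set, Tuple


def watts_strogatz(n: int, k: int, p: float, seed: Optional[int] = None) -> Set[Tuple[int, int]]:
    """Ring lattice of n nodes, each linked to its k nearest neighbors, then rewired with probability p."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    rng = random.Random(seed)
    ring = [(i, (i + j) % n) for i in range(n) for j in range(1, k // 2 + 1)]
    edges = {tuple(sorted(e)) for e in ring}  # store each undirected edge as (smaller, larger)
    for u, v in ring:
        if rng.random() < p:
            # move the far end of (u, v) to a random node, avoiding self-loops and duplicate edges
            candidates = [w for w in range(n) if w != u and tuple(sorted((u, w))) not in edges]
            if candidates:
                edges.discard(tuple(sorted((u, v))))
                edges.add(tuple(sorted((u, rng.choice(candidates)))))
    return edges


if __name__ == "__main__":
    g = watts_strogatz(100, 6, 0.1, seed=1)
    print(len(g))  # rewiring keeps the edge count at n * k / 2 = 300
```

With $p = 0$ the result is the regular ring lattice, with high clustering and long paths. As $p$ grows, a few random shortcuts make paths short while much of the local clustering remains, which is the small world regime.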

2.3 Scale-free networks

In the study of networks, the degree of a node is the number of edges it has. The probability distribution of these degrees over the whole network is called the degree distribution. Let $K$ denote the set of degrees; then $K = \{k_i \mid k_i = \sum_{j=0}^{n-1} e_{i,j},\ n = |V|,\ i < n\}$, where $e_{i,j} = 1$ if $(v_i, v_j) \in E$ and $e_{i,j} = 0$ otherwise. The degree distribution $P(k)$ is the probability that a randomly chosen node has degree $k$. For instance, a Bernoulli random graph has a binomial degree distribution. In such a graph, each pair of its $n$ nodes is connected independently with probability $p$, and

$$P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k} \qquad (2.1)$$
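To make Equation (2.1) concrete, the sketch below evaluates the binomial degree distribution and samples a Bernoulli random graph to obtain its empirical degree distribution. The helper names are hypothetical, and the $O(n^2)$ loop over node pairs is only suitable for small $n$.

```python
import random
from collections import Counter
from math import comb
from typing import Dict


def binomial_degree_pmf(n: int, p: float, k: int) -> float:
    """Equation (2.1): probability that a node of a Bernoulli random graph G(n, p) has degree k."""
    return comb(n - 1, k) * p ** k * (1 - p) ** (n - 1 - k)


def empirical_degree_distribution(n: int, p: float, seed: int = 0) -> Dict[int, float]:
    """Sample G(n, p) and return the fraction of nodes that have each degree k."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # each pair is linked independently with probability p
                degree[i] += 1
                degree[j] += 1
    counts = Counter(degree)
    return {k: c / n for k, c in sorted(counts.items())}


if __name__ == "__main__":
    n, p = 1000, 0.01
    observed = empirical_degree_distribution(n, p)
    for k in range(5, 16):
        print(k, round(observed.get(k, 0.0), 3), round(binomial_degree_pmf(n, p, k), 3))
```

For large $n$ the sampled fractions cluster tightly around the binomial values. Almost no node lies far from the mean degree $(n-1)p$, in contrast to the power-law case described next.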

Regarding the degree distribution, in 1999 Albert-Laszlo Barabasi found that in some social and biological networks and in the World Wide Web, some nodes have many more connections than others, and the whole network has a power-law degree distribution [1]. Since then, a large portion of real-world networks have been found to have power-law degree distributions of the form

$$P(K = k) \propto k^{-\gamma} \qquad (2.2)$$

where $\gamma$ is a parameter that is typically greater than 2.

In a random graph, there is a typical scale of degree, embodied by the "average node". Most nodes in a random graph have almost the same degree, and very few nodes have a degree much greater than the average. Under a power-law distribution, however, no "average node" characterizes the degrees of the network, because a few nodes have extremely high degrees. Thus, no single scale can represent this class of networks [3]. When the degree distribution of a network follows a power law, we call it a scale-free network [2]. In a scale-free network, a few nodes have degrees that greatly exceed the average, while most other nodes have low degree [See figure 2.5]. These highest-degree nodes are called hubs.

Many real networks are conjectured to be scale-free, and some have been claimed to be scale-free, such as the World Wide Web, biological networks, airline networks, and some financial networks [56]. Among social networks, researchers have also found scale-free examples. These include the network of sexual relationships among people in Sweden, e-mail networks [15], networks of scientific papers connected by citations [52], and collaboration networks connected by co-authorship [42].

Figure 2.5: An example of a power-law distribution [3]. Here, the number of links k is the node degree. Most nodes have small degrees, while a few nodes (hubs) have a large number of connections.

In this thesis, we use the BA model, the scale-free network generation model proposed by Albert-Laszlo Barabasi, to generate scale-free networks for evaluating the performance of our algorithm.
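The following is a minimal sketch of preferential attachment in the spirit of the BA model. Nodes are added one at a time, and each new node links to $m$ distinct existing nodes chosen with probability proportional to their current degree. We assume that the first new node attaches to $m$ seed nodes. Degree-proportional sampling is done by drawing uniformly from a list in which each node appears once per incident edge. The function name and parameters are illustrative and are not taken from the thesis implementation.

```python
import random
from typing import List, Optional, Tuple


def barabasi_albert(n: int, m: int, seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Grow a network to n nodes; each new node links to m distinct nodes chosen proportionally to degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges: List[Tuple[int, int]] = []
    # every node appears here once per incident edge, so a uniform draw is degree-proportional
    pool: List[int] = []
    targets = list(range(m))  # the first new node (node m) links to the m seed nodes
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        pool.extend(targets)
        pool.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # pick m distinct targets for the next node
            chosen.add(rng.choice(pool))
        targets = list(chosen)
    return edges


if __name__ == "__main__":
    edges = barabasi_albert(2000, 2, seed=42)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    print(len(edges), max(degree.values()))  # average degree is about 4, but hubs are far above it
```

Because early nodes accumulate edges faster, a few hubs emerge whose degrees greatly exceed the average, as described for scale-free networks above.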


Chapter 3

Community detection

In this chapter, we give an overview of community detection, including its necessity, objectives, types, and applications. We then introduce modularity-based community detection, one of the most active lines of research in this area. Finally, we describe in detail the Louvain method, which is the base algorithm of our research.

3.1 Community detection

In Section 2.2, we noted that community structures exist in small world networks. Community structure refers to sets of nodes that are connected more densely internally than with the rest of the network [Figure 2.4] and that probably share common properties and/or play similar roles within the network. In social networks, community structure is formed by social interaction, so communities may represent real social groupings. Examples include a music interest group in a university friendship network, a research topic in a citation network, or a category in a web network. Identifying these communities can therefore help us understand and exploit these networks more effectively. Furthermore, identifying communities lets us discover previously unknown groups, reduce the scale of problems, and create data structures that store and handle large-scale networks efficiently. It can even reduce computational cost and improve accuracy in other systems. One example is community detection used as a preprocessing step in the item recommendation systems of online retailers (e.g., www.amazon.com).


Community structures in small-scale networks can be detected by hand, but in large-scale networks this is impossible. Moreover, community structures in large-scale networks may hold more information than those in small ones. Therefore, it is desirable to detect these structures with computer algorithms, which are called community detection, community finding, community identification, or clustering algorithms.

Community detection is one of the hottest research topics in network science. Weiss and Jacobson first carried out an analysis of community structure, trying to separate groups by studying the relationships between members of a government agency [21]. Since then, the problem has appeared in various forms in several disciplines, and many kinds of community detection algorithms have been proposed.

• Hierarchical clustering

Hierarchical clustering is a traditional technique for finding communities in social networks. Its basic idea comes from the hierarchical structure found in many social networks. For example, a company is composed of departments, a department of teams, and a team of staff members. The method repeats two steps until the whole network has been merged into a single cluster. First, it computes the similarity (or distance) between clusters or nodes. Second, it merges the most similar clusters or nodes into one cluster.

• Statistical inference

Methods based on statistical inference build a generative model that captures the properties of network data sets, then fit it to actual networks in order to find community structures. The stochastic block model [29] is one of the most referenced models in this literature. It divides a network into groups of equivalent nodes. One notion is structural equivalence, in which nodes that have the same neighbors are regarded as equivalent. Another is regular equivalence, in which nodes have similar patterns of connections to nodes of other classes. Other methods in this class also perform well, such as belief propagation [25] and agglomerative Monte Carlo [50]. This class of methods can find community structures in social networks, and it can also be used to generate network data.

• Dynamic methods

This class of methods uses processes running on the network, such as random walks. The basic idea is that in a network with strong community structure, a random walker will spend a long time inside a community because of the high density of internal edges. Various approaches have been proposed on this basis, for instance Zhou's [67] "global attractor" and "local attractor", Zhou and Lipowsky's net-walk, and Van Dongen's [59] Markov Cluster Algorithm.

• Divisive methods

Divisive methods focus on the edges of a network. They repeatedly find the edges that connect nodes of different communities and remove them. These two steps stop when the network has been decomposed into several disconnected components, and each component forms a community.

A classic divisive method is the algorithm proposed by Girvan and Newman [24]. They introduced the concept of betweenness in three variants: edge betweenness, random-walk betweenness and current-flow betweenness.
- The edge betweenness of an edge is the number of shortest paths between all node pairs that pass through the edge.
- The random-walk betweenness of an edge is the frequency with which a random walker running on the network crosses the edge.
- Current-flow betweenness treats the network as a resistor network in which every edge has unit resistance. It is defined as the amount of current carried by the edge when a voltage difference is applied between two nodes of the network.

Using these betweenness measures, the algorithm repeatedly finds and removes edges between communities, thereby revealing the community structure of the network.

• Others

There are many other kinds of community detection methods as well. Depending on the network type, there are methods for bipartite networks [66] and for multipartite networks. Overlapping community detection is also a hot research topic. Modularity maximization methods are among the most widely used community detection methods, although their drawbacks are known. Sections 3.2 and 3.3 describe these methods in more detail, because our proposal extends this line of work.
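The sketch below makes the edge betweenness used by the divisive methods concrete for a small unweighted, undirected graph. It uses Brandes-style path counting, a technique not described in the original text. Each node pair's credit is split equally among its shortest paths. The adjacency-set input and the function name `edge_betweenness` are illustrative assumptions, and random-walk and current-flow betweenness are not covered.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph.

    adj maps each node to a set of neighbours. Every unordered node pair
    contributes a total of 1, shared equally among its shortest paths.
    """
    scores = {}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and remembering the predecessors of each node on those paths.
        dist = {s: 0}
        sigma = {s: 1}
        preds = {s: []}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, pushing path credit onto edges.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                scores[edge] = scores.get(edge, 0.0) + share
                delta[v] += share
    # Each unordered pair was counted once from each of its two endpoints.
    return {edge: value / 2.0 for edge, value in scores.items()}


# Two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
eb = edge_betweenness(adj)
print(max(eb, key=eb.get))  # frozenset({2, 3}): the bridge between the triangles
```

In a Girvan and Newman style loop, the edge with the highest score would be removed and the scores recomputed. This repeats until the network splits into components.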



3.2 Modularity-based community detection

As mentioned in the previous section, modularity-based methods are among the most widely researched community detection methods. Before discussing these methods, we start with modularity itself [45].

Community detection methods partition a network into a number of communities. Because a machine detects these communities, the question of quality naturally arises. We hope to detect a community structure as close as possible to the existing one, but the quality of the result differs from method to method. We also do not know in advance how good a given algorithm is. The problem becomes even more serious when the community structure of a network is unknown. To distinguish "good" from "bad" community detection results, we therefore need measures that represent quality.

Modularity [45], proposed by Newman and Girvan, is one of the most popular quality functions. It evaluates how good a detected community structure is. The basic idea is as follows. A random graph is not expected to have community structure. One can therefore compare two quantities: the fraction of intra-community (within-community) edges under the current partition of a network, and the expected value of that fraction in a random network. Modularity can then be written as follows:

Q = \sum_{i=1}^{c} \left( e_{i,i} - a_i^2 \right) \qquad (3.1)

The symbols are defined as follows:
- The sum runs over all c communities into which the network is divided.
- ei,j is the fraction of edges in the network that connect nodes in community i to nodes in community j.
- ai = ∑j ei,j is the fraction of edge ends attached to nodes in community i.

In a random network, edges fall between nodes without regard to the communities those nodes belong to. There we would have ei,j = ai aj, so ai² is the expected value of ei,i. If a detected community is close to a real one, ei,i will be greater than its expected value ai²; otherwise it will be smaller. The closer the detected communities are to the real ones, the larger their contributions, and as a result the greater the modularity of the whole network.

Figure 3.1: The greater the modularity value, the better the partition. The parts enclosed by dotted lines, each containing some nodes, are communities.

Modularity was originally defined as a stopping criterion for Girvan and Newman's algorithm. It is now employed directly or indirectly in many community detection methods, and one major application is modularity optimization. The motivation for this class of methods is that high modularity values indicate good partitions. The partition whose modularity is the maximum over all possible partitions should therefore be the best one, and every community in it should be a good community as well. However, Brandes [7] proved that modularity optimization is an NP-complete problem. Most modularity optimization methods therefore try to find good approximations of the modularity maximum in a reasonable time.

For ease of understanding, we fix some notation that is used throughout this thesis:
- |U → W| denotes the number of edges between node sets U and W.
- Q denotes modularity, and ∆Q denotes the change in modularity (∆modularity).
- |{ci} → {cj}| = |E ∩ (ci × cj)| denotes the weight of the edges between communities ci and cj.

With this notation, modularity Q can be represented as:

Q = \sum_{i=1}^{|C|} \left( \frac{|\{c_i\} \to \{c_i\}|}{|E|} - \left( \sum_{j} \frac{|\{c_i\} \to \{c_j\}|}{|E|} \right)^2 \right) \qquad (3.2)

Here both the edge counts and |E| are taken over ordered node pairs. The two terms then correspond to ei,i and ai² in Equation 3.1.

Table 3.1: Comparison of some community optimization methods

Method              Characteristic              Community formation   Scale
Newman+ (2004)      Modularity definition       Merge                 10^4 nodes
Clauset+ (2004)     Efficient data structure    Merge                 10^6 nodes
Wakita+ (2007)      Consolidation ratio         Merge                 10^7 nodes
Blondel+ (2008)     Partial optimization        Move                  10^8 nodes
Shiokawa+ (2013)    Incremental aggregation     Merge                 10^8 nodes
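The following Python function is a minimal sketch of Equation 3.1 for an unweighted, undirected graph. It reads ei,i as the fraction of edges inside community i and ai as the fraction of edge ends attached to it, which is the standard reading of Newman and Girvan's definition. The edge-list input format and the function name are assumptions for illustration.

```python
def modularity(edges, community):
    """Modularity Q = sum_i (e_ii - a_i^2), Equation 3.1.

    edges:     undirected (u, v) pairs, each listed once, all with unit weight
    community: dict mapping every node to its community label
    """
    m = len(edges)
    internal = {}  # community -> number of edges with both ends inside it
    ends = {}      # community -> number of edge ends attached to its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
    q = 0.0
    for c, n_ends in ends.items():
        e_cc = internal.get(c, 0) / m  # fraction of edges inside community c
        a_c = n_ends / (2 * m)         # fraction of edge ends attached to c
        q += e_cc - a_c ** 2
    return q


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, two_triangles), 4))  # 0.3571, i.e. 5/14
```

Putting every node into one community gives Q = 0. Leaving every node alone gives a negative value, which is why modularity optimization starts from singletons and looks for merges or moves that raise Q.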

The first modularity optimization method was a greedy method proposed by Newman [44]. It works as follows:
1. Every node in the network is initially treated as a community of its own.
2. ∆Q is computed for all pairs of communities that are connected to each other. Here ∆Q is the increase in modularity obtained if the pair became one community.
3. The pair with the largest ∆Q is merged into one community.

The compute and merge steps are repeated until all communities have merged into one, or until the number of communities reaches a given value. This method can handle networks of up to 10^4 nodes. In a later work [10], Clauset et al. accelerated it with a special data structure based on max-heaps. This data structure removed many unnecessary computations in Newman's method and extended the manageable network size to 10^6 nodes.
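The greedy merging described above can be sketched in Python as follows. This is a minimal, unaccelerated version that assumes unit edge weights, comparable (e.g. integer) node labels and no self-loops. It scans every connected community pair before each merge, which is exactly the cost that Clauset et al.'s max-heaps avoid. It uses the merge gain ∆Q = 2(eij − ai aj), where eij is half the fraction of edges joining communities i and j, following Newman's formulation. The function name is hypothetical, and the best partition along the merge sequence is returned.

```python
def newman_greedy(edges):
    """Greedy agglomerative modularity maximisation in the style of Newman [44].

    edges: undirected (u, v) pairs with comparable node labels, each listed
    once, unit weights and no self-loops. Returns (best Q, best partition).
    """
    m = len(edges)
    # e[i][j]: half the fraction of edges joining communities i and j (i != j);
    # e[i][i]: fraction of edges inside community i.
    e = {}
    for u, v in edges:
        for x, y in ((u, v), (v, u)):
            e.setdefault(x, {})
            e[x][y] = e[x].get(y, 0.0) + 1.0 / (2 * m)
    a = {i: sum(row.values()) for i, row in e.items()}
    members = {i: {i} for i in e}
    q = -sum(ai * ai for ai in a.values())  # Q of the all-singletons partition
    best_q, best = q, [set(s) for s in members.values()]
    while len(members) > 1:
        # Delta Q of merging connected communities i and j is 2 (e_ij - a_i a_j).
        pairs = [(2 * (e[i][j] - a[i] * a[j]), i, j)
                 for i in e for j in e[i] if i < j]
        if not pairs:
            break  # the remaining communities are not connected to each other
        dq, i, j = max(pairs)
        # Merge community j into community i and update the e and a tables.
        row_j = e.pop(j)
        e_ij = e[i].pop(j)
        e_ii = e[i].get(i, 0.0) + row_j.get(j, 0.0) + 2 * e_ij
        for k, w in row_j.items():
            if k in (i, j):
                continue
            e[i][k] = e[i].get(k, 0.0) + w
            e[k][i] = e[k].get(i, 0.0) + w
            del e[k][j]
        e[i][i] = e_ii
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
        if q > best_q:
            best_q, best = q, [set(s) for s in members.values()]
    return best_q, best


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
q, parts = newman_greedy(edges)
print(round(q, 4), sorted(sorted(p) for p in parts))  # 0.3571 [[0, 1, 2], [3, 4, 5]]
```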

Soon afterward, researchers found a tendency that often led to poor values of the maximum modularity: greedy methods quickly form large communities at the expense of small ones. Wakita and Tsurumi [60] observed this problem in the form of a bias of the Clauset method towards large communities. To resolve it, they proposed the consolidation ratio, which balances ∆Q against community size. As a result, the size of analyzable networks was extended to 10^7 nodes.



Blondel et al. [5] took a different approach, the Louvain method, which can handle networks of up to 10^8 nodes. In this method, communities are formed by moving nodes instead of merging communities. In addition, the method does not select the best community pair in the whole network; it selects only the best pair within a local part of the network. Because our work is based on the Louvain method, Section 3.3 describes it in detail. Shiokawa et al. [55] proposed a heuristic that combines community merging with partial modularity computation.

There are also other kinds of modularity-based methods, but we do not describe them in this thesis.

3.3 Louvain method

In this section, we explain the Louvain method in detail because, as mentioned above, our proposal is based on it.

The Louvain method differs from other modularity optimization methods in two respects.
- Selection is local. The method does not select the community pair with the globally largest ∆Q, over all community pairs in the whole network, at every step. It selects the node-community pair with the locally largest ∆Q, over one node and its neighbor communities.
- Nodes are moved rather than communities merged. The selected node is moved to the selected community, whereas in the Newman method the two selected communities are merged into one.

Owing to these two differences, the Louvain method achieves higher modularity than the other methods as well as shorter detection times.

The Louvain method consists of two phases that are repeated iteratively.

In the first phase, as in the Newman method, every node in the network initially forms its own community. The initial number of communities therefore equals the number of nodes. For each node i, the method then computes ∆Q between node i and each of its neighbor communities. A neighbor community of i is a community to which a node connected to i belongs. ∆Q is the modularity increment that would result if node i moved from its current community to that neighbor community. Node i then moves to the community with the largest ∆Q; in other words, every node takes the best move among its neighbor communities. If the value for the node's current community is the largest, the node stays in its current community.

A node-community pair whose ∆Q is less than 0 never takes part in the comparison, because a negative ∆Q means that modularity would decrease if the node joined that community. A node therefore stops moving when ∆Q with all of its neighbor communities becomes negative or smaller than a threshold (e.g., 10^-5). The ∆Q computation stops when no node in the network moves.

A node's move updates the information of the communities involved, such as their member nodes and total weight, and this changes the ∆Q values of other nodes. Every node therefore takes part in the computation over several Passes, where a Pass is one round of computation over all nodes in the network. The first phase is complete when no movement takes place.

Figure 3.2: Visualization of the Louvain method. Starting from a raw network (left), the method computes until no node moves. Node colors indicate community labels. Here, phase two is described as community aggregation. Figure reprinted from Blondel et al. [5].
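The first phase can be sketched in Python as follows. The sketch assumes a weighted adjacency-dict input in which a self-loop (as produced by the second phase) is stored once and counted twice in the node's degree. The gain used is k_{i,in} − Σtot·k_i/(2m). This quantity is proportional to the modularity change obtained directly from Equation 3.1 when node i joins a community, so ranking by it ranks by ∆Q. Several details are hypothetical choices of the sketch: the function name, the random node order with a fixed seed, and the small tolerance `min_gain` that stands in for the threshold mentioned above.

```python
import random


def one_level(adj, seed=0, min_gain=1e-12):
    """Phase one of the Louvain method: repeated local moving of nodes.

    adj: dict node -> dict neighbour -> weight, symmetric for u != v; a
         self-loop (as produced by phase two) is stored once as adj[v][v].
    Returns a dict node -> community label (a representative node).
    """
    rng = random.Random(seed)
    # Weighted degree k_v; a self-loop counts twice, as usual.
    k = {v: sum(nb.values()) + nb.get(v, 0.0) for v, nb in adj.items()}
    two_m = sum(k.values())
    comm = {v: v for v in adj}  # each node starts in its own community
    tot = dict(k)               # Sigma_tot of every community
    order = list(adj)
    moved = True
    while moved:                # one iteration of this loop is one Pass
        moved = False
        rng.shuffle(order)
        for v in order:
            old = comm[v]
            links = {}          # neighbour community -> k_{v,in}
            for u, w in adj[v].items():
                if u != v:
                    links[comm[u]] = links.get(comm[u], 0.0) + w
            tot[old] -= k[v]    # take v out of its community first
            # Gain of inserting v into c; proportional to the exact Delta Q.
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * k[v] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * k[v] / two_m
                if gain > best_gain + min_gain:
                    best, best_gain = c, gain
            tot[best] += k[v]
            comm[v] = best
            moved = moved or best != old
    return comm


adj = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
       3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0}}
comm = one_level(adj)
groups = sorted(sorted(v for v in comm if comm[v] == c) for c in set(comm.values()))
print(groups)  # [[0, 1, 2], [3, 4, 5]]
```

A node only moves when its gain strictly exceeds that of staying, so modularity increases with every move and the Pass loop terminates.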

The second phase builds a new network whose nodes are the communities found during the first phase. The weight of an edge between two new nodes is the sum of the weights of the edges between the corresponding two communities in the first phase. Edges between nodes of the same community become a self-loop, whose weight is used in the ∆Q computation. Once the second phase is completed, the method applies phase one and phase two again to the new network. In this way, the method alternates between the computation step and the rebuilding step until modularity no longer improves.

Figure 3.3: Visualization of the ∆Q computation for node-community pairs. Nodes in one cloud form one community. ∆Q is computed between the center node and each of its neighbor communities.
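The second phase can be sketched as follows. The sketch assumes a weighted adjacency-dict format in which each self-loop is stored once and counted twice in a node's degree. Each internal edge is visited from both of its endpoints, so each side contributes half of its weight to the new self-loop. The function name is hypothetical.

```python
def aggregate(adj, comm):
    """Phase two of the Louvain method: build the graph of communities.

    adj:  dict node -> dict neighbour -> weight (symmetric for u != v; a
          self-loop is stored once as adj[v][v])
    comm: dict node -> community label found in phase one
    Returns a graph in the same format whose nodes are the community labels.
    """
    new_adj = {}
    for v, nbrs in adj.items():
        row = new_adj.setdefault(comm[v], {})
        for u, w in nbrs.items():
            if u == v:
                add = w        # an existing self-loop, stored only once
            elif comm[u] == comm[v]:
                add = w / 2.0  # internal edge, visited once from each end
            else:
                add = w        # edge between communities, one direction
            row[comm[u]] = row.get(comm[u], 0.0) + add
    return new_adj


adj = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
       3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0}}
comm = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(aggregate(adj, comm))  # {'A': {'A': 3.0, 'B': 1.0}, 'B': {'B': 3.0, 'A': 1.0}}
```

Each community keeps the total degree of its members. For example, A has 3.0 + 1.0 plus its self-loop counted twice, which gives 7, the sum of the degrees of nodes 0, 1 and 2. This keeps modularity unchanged when phase one is run again on the new graph.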

To put it another way, consider a friendship network. Each person tentatively selects the best of the communities his friends belong to. If another community that some of his other friends belong to becomes better than his current one, he moves to that community. He continues moving until no community his friends belong to is better than his current one. In the Louvain method, each person can freely select and move among neighbor communities. In the Newman method, by contrast, a person joins a community only if his best ∆Q is also the best one in the whole network. This is the whole process of the Louvain method.

∆Q has been mentioned several times above, described as the modularity increment under the assumption that a node moves from its original community to a neighbor community. In this part, we discuss ∆Q in more detail.

As mentioned above, modularity is a quality function that evaluates a partition of a network, so each partition of a network has its own modularity value. Each network can also be said to have its own modularity value, since any network (for example, one rebuilt in the second phase) can be regarded as a partition of another network. In the Newman and Louvain methods, nodes are gradually merged or moved into neighbor communities, so the partition of the network also changes gradually. We can therefore evaluate the impact of one node's merge or move on modularity by comparing the modularity before and after it. This difference is ∆Q. From the definition of modularity (Equation 3.1), ∆Q can be written as follows:

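\Delta Q = \left[ \frac{\Sigma_{in} + k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right] = \frac{1}{2m} \left( k_{i,in} - \frac{k_i \Sigma_{tot}}{m} \right) \qquad (3.3)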

The symbols in Equation 3.3 are:
- Σin: the sum of the weights of the edges inside the neighbor community.
- Σtot: the sum of the weights of the edges incident to nodes in the neighbor community.
- ki: the sum of the weights of the edges incident to node i.
- ki,in: the sum of the weights of the edges between node i and nodes in the neighbor community.
- m: the sum of the weights of all edges in the network.

For convenience, we define some further symbols and rewrite ∆Q:
- ∆Qvici is the change in modularity when node vi moves to community ci.
- |{vi} → {ci}| = |E ∩ ({vi} × ci)| is the number of edges that connect node vi and community ci.
- |{ci} → V| = |E ∩ (ci × V)| is the total number of edges of community ci.
- |{vi} → {vj}| is the weight of the edge between nodes vi and vj, so that |{vi} → {vi}| = |E ∩ ({vi} × {vi})| gives the weight of node vi itself.

With these symbols, ∆Q can be rewritten in an easier-to-understand form as follows:

\Delta Q_{v_i c_i} = \frac{1}{2|E|} \left( |\{v_i\} \to \{c_i\}| - \frac{|\{v_i\} \to \{v_i\}| \times |\{c_i\} \to V|}{|E|} \right) \qquad (3.4)
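As a worked check of Equation 3.4, the following Python function transcribes it term by term. The argument names are hypothetical, and unit edge weights are assumed so that |E| is a plain edge count. For example, with |{vi} → {ci}| = 2, |{vi} → {vi}| = 3, |{ci} → V| = 4 and |E| = 7, the equation gives (2 − 12/7)/14 = 1/49 ≈ 0.0204.

```python
def delta_q(links_to_c, node_weight, c_total, num_edges):
    """Equation 3.4, transcribed term by term.

    links_to_c  -- |{v_i} -> {c_i}|, edges between node v_i and community c_i
    node_weight -- |{v_i} -> {v_i}|, the weight of node v_i
    c_total     -- |{c_i} -> V|, total number of edges of community c_i
    num_edges   -- |E|
    """
    return (links_to_c - node_weight * c_total / num_edges) / (2 * num_edges)


print(delta_q(2, 3, 4, 7))   # positive: v_i is well connected to c_i
print(delta_q(1, 3, 10, 7))  # negative: c_i is large and weakly linked to v_i
```

The second call shows why a large community with few links to vi is rejected: its penalty term outweighs the single connecting edge.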


Chapter 4

Proposed method

In this chapter, we present our proposal. It consists of four heuristics, each proposed to overcome a particular weakness of the Louvain method. Finally, we propose a hybrid heuristic that merges these proposals to make community detection more efficient.

4.1 Basic idea

In the Newman method, all community-community pairs in the network must be computed for a single merge. In fact these are node-node pairs, because each selected community pair is merged into one node as soon as it is selected. For a network with n nodes and m edges, one merge therefore requires m computations. In the Louvain method, only the pairs formed by a node and its neighbor communities need to be computed for one move, which is O(m/n) computations on average. Mainly because of this difference in the number of computations, the Louvain method outperforms the other methods in both modularity and speed. Following this natural progression, we ask whether some idea could reduce the number of computations even further.

On the other hand, the related research described in Chapters 2 and 3 shows that all of these methods approach the problem from an engineering viewpoint. These methods can detect meaningful and useful community structures. However, the objective is to detect community structures that actually exist in networks. We therefore wonder whether it would be more effective and realistic to use the characteristics of social networks themselves. These two points are the original motivations of this research.

Figure 4.1: Assume a network including three communities.

Figure 4.2: This figure represents one intermediate state of the Louvain method.

Let us look again at the small world property. First, the small world property is the reason community structures exist in networks, and it is also the theoretical rationale of community detection methods. However, the property has another consequence.

Assume a node i belongs to a real community K, that is, the true community rather than one found by a detection method (see Figure 4.1). Because of homophily (see Section 2.2), most of node i's neighbor nodes should then belong to the same community K with high probability. At an intermediate state of the Louvain method, node i's neighbor communities that will finally become part of K (Communities A, B and C in Figure 4.2) should therefore outnumber those that will not.

Moreover, we only need node i to end up in community K. At intermediate states, it is enough to make sure that node i always (or at least finally) belongs to one of K's sub-communities, e.g. Community A, B or C. In other words, computing ∆Q for node i and all of its neighbor communities is not necessary for selecting the best move. Computing only some of them and selecting the best among those may also yield the correct community structure, for two reasons:
1. Even if only some neighbor communities are selected, the communities that will finally belong to K are likely to be among them.
2. Node i will then always belong to a sub-community of the real community K, even though that sub-community may not be the best of all its neighbor communities.

This is our basic proposal.

The idea can be illustrated with a real social network. In the Louvain method, a person looks for a community to join by considering all the communities his friends belong to, and selects the best one. He also keeps observing changes in these neighbor communities. Whenever one of them becomes better than his current one, he moves to it, and this continues until no change takes place in his neighbor communities. Under our basic idea, however, a person observes changes in only some of his neighbors and selects the best community among them. He always belongs to a community that is best only among those he observes. Even so, because of the small world property, he can finally become a member of the best of all his neighbor communities.

In summary, we propose that the small world property may be used to reduce the number of computations. This is our basic idea, and we add further measures on top of it to optimize it.
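The basic idea can be sketched as a replacement for the "best neighbor community" step of the first phase. Instead of evaluating every neighbor community, the node evaluates a random sample of them. The sample size is a hypothetical parameter, since the text does not fix it, and the function name is illustrative. The gain k_{v,in} − Σtot·k_v/(2m) is used because it is proportional to the ∆Q derived from Equation 3.1.

```python
import random


def best_of_sample(links, tot, k_v, two_m, sample_size, rng):
    """Basic idea of Section 4.1: evaluate only a random subset of neighbour communities.

    links: neighbour community -> weight of edges from node v into it
    tot:   community -> Sigma_tot, with v already removed from its own community
    k_v:   weighted degree of v;  two_m: twice the total edge weight
    Returns the best sampled community, or None if no sampled gain is positive.
    """
    candidates = rng.sample(sorted(links), min(sample_size, len(links)))
    best, best_gain = None, 0.0
    for c in candidates:
        # Proportional to the Delta Q of inserting v into community c.
        gain = links[c] - tot[c] * k_v / two_m
        if gain > best_gain:
            best, best_gain = c, gain
    return best


rng = random.Random(42)
links = {"A": 2.0, "B": 1.0}
tot = {"A": 4.0, "B": 3.0}
print(best_of_sample(links, tot, k_v=3.0, two_m=14.0, sample_size=2, rng=rng))  # A
```

With `sample_size=1`, the result depends on which community is drawn. The node may then join B for now, which is the trade-off the basic idea accepts in exchange for fewer ∆Q evaluations.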

4.2 Modularity computation efficiency

In Section 4.1, we described our basic proposal. Here we introduce another proposal, Modularity Computation Efficiency (MCE).

Various measures can evaluate clustering methods from different viewpoints:
- Modularity measures the quality of the detected communities.
- Running time represents the data processing capability of a community detection algorithm.
- The Rand index measures the similarity between two partitions.
- Normalized mutual information (NMI) measures how much information is shared between a detected partition and ground-truth data, that is, data whose correct community structure is known.
- The pair-counting F-measure can compare community detection results that have different numbers of communities.
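The Rand index mentioned above checks, for every pair of nodes, whether two partitions agree on placing the pair together or apart. The following minimal Python sketch assumes that both partitions are given as node-to-label dicts over the same nodes. The function name is illustrative.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two partitions agree.

    A pair agrees if both partitions put it in the same community, or both
    put it in different communities. Both dicts must cover the same nodes.
    """
    agree = total = 0
    for u, v in combinations(list(labels_a), 2):
        same_a = labels_a[u] == labels_a[v]
        same_b = labels_b[u] == labels_b[v]
        agree += same_a == same_b
        total += 1
    return agree / total


detected = {0: "x", 1: "x", 2: "y", 3: "y"}
truth = {0: 1, 1: 2, 2: 2, 3: 2}
print(rand_index(detected, truth))  # 0.5: three of the six pairs agree
```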

These measures can evaluate a single community detection method and can also compare the quality of several methods. However, they assess only the accuracy of the result. Tools that represent the quality or efficiency of the detection process itself are also needed. Programmers and analysts in particular need to know the efficiency of community detection for further optimization or more detailed analysis.

Here we consider a simple measure of the efficiency of the community detection process, which we call Modularity Computation Efficiency (MCE). Its value is independent of the programming language, the implementation technique and the hardware (e.g., memory, CPU or hard disk). The measure is defined as follows:

\mathrm{MCE} = \frac{\text{Degree of Modularity Increase}}{\text{Compute Times}} \qquad (4.1)

Degree of Modularity Increase is the increase in modularity over a fixed number of computations (e.g., 10000). Compute Times is that fixed number of computations.
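Equation 4.1 can be evaluated from a modularity trace recorded during detection. The sketch below assumes that modularity is sampled after every fixed number of ∆Q computations (the 10000 of the example above) and returns one MCE value per window. The function name and the sample values are made up for illustration.

```python
def mce_series(modularity_samples, interval):
    """MCE (Equation 4.1) for each window between consecutive samples.

    modularity_samples: modularity recorded every `interval` Delta Q computations
    interval:           the fixed number of computations per window (e.g. 10000)
    """
    return [(q_next - q_prev) / interval
            for q_prev, q_next in zip(modularity_samples, modularity_samples[1:])]


samples = [0.0, 0.3, 0.5, 0.55]
print(mce_series(samples, 10000))  # MCE falls as detection slows down
```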

In this research, we use this measure to compare and evaluate the detection efficiency of the Louvain method and of the heuristics we propose.

4.3 Max neighbor community selection

In Section 4.1 we proposed a new idea that uses the small world property of social networks, and on that basis we proposed a new evaluation metric. Using these two ideas, we conducted a preliminary experiment to verify our basic idea.

Figure 4.3 shows the result. The heuristic based on our basic idea attains almost the same modularity as the Louvain method, with higher MCE. However, this advantage appears only in the early part of the computation. Afterwards, the efficiency of the heuristic changes along the same curve as that of the Louvain method. Measures are therefore needed to improve modularity computation efficiency in the latter part.

To this end, we examine the causes from two viewpoints: MCE, and the formula of ∆Q itself.

First, to compare how modularity changes in the Louvain method and in our heuristic, we plotted MCE against compute times (Figure 4.4). Figure 4.4 shows two peaks in the modularity change. Some other data sets show more than two, but the peaks do not occur consecutively. Even the first peak lasts only a short time, and the total number of computations is larger than that of the Louvain method. On the other hand, the final modularity is a stable value. The figure therefore suggests that the heuristic could be optimized by measures that make its peaks last longer and reach higher modularity changes.

[Figure 4.3 plot: Modularity (y-axis) against Compute Times (×10000) (x-axis) for the Louvain method and the basic idea heuristics]

Figure 4.3: Modularity change comparison between the Louvain method and the basic idea heuristic. The x-axis shows compute times and the y-axis shows modularity. The curve represents how modularity changes, and its slope corresponds to MCE.

Second, we consider the ∆Q equation itself:

\Delta Q_{v_i c_i} = \frac{1}{2|E|} \left( |\{v_i\} \to \{c_i\}| - \frac{|\{v_i\} \to \{v_i\}| \times |\{c_i\} \to V|}{|E|} \right) \qquad (4.2)

During some part of the whole computation, the following condition should hold:

|\{v_i\} \to \{v_i\}| \times |\{c_i\} \to V| \ll |E| \qquad \text{hence} \qquad \frac{|\{v_i\} \to \{v_i\}| \times |\{c_i\} \to V|}{|E|} \ll |\{v_i\} \to \{c_i\}|

The second inequality follows from the first because |{vi} → {ci}| ≥ 1 for every neighbor community ci. One cannot know in advance when, or for how long, the condition holds. While it does hold, the greater |{vi} → {cj}| is, the greater ∆Q is. It is therefore reasonable to select part of the neighbor communities in decreasing order of |{vi} → {cj}|.

Selecting neighbor communities with high |{vi} → {cj}| should also improve community detection efficiency, for the following reason. The basic heuristic reduces the number of computations by referring to only part of the neighbor communities. However, the selected neighbor community does not always have the greatest ∆Q, whereas the Louvain method always selects the best one. As a result, the number of times each node participates in the computation increases, which is observed as an increase in the number of Passes (see Section 3.3).

[Figure 4.4 plot: Delta Modularity (y-axis) against compute times (x-axis) for the Louvain method and the basic idea heuristics]

Figure 4.4: MCE comparison between the Louvain method and the basic idea heuristic

Based on these considerations, we propose the following countermeasure for improving MCE in the latter part of the computation. Unlike the basic heuristic, we select part of the neighbor communities in decreasing order of |{vi} → {cj}|. This idea may be applied from Pass 2 to the last Pass. In Pass 1, most node-community pairs have |{vi} → {cj}| = 1. In the last several Passes, according to our observation, the community with the greatest ∆Q also tends to have a large |{vi} → {cj}|.
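The following sketch shows this max neighbor community selection. It assumes the gain k_{v,in} − Σtot·k_v/(2m), which is proportional to the modularity change derived from Equation 3.1. The parameter `top_k`, which sets how many communities are evaluated, is hypothetical, since the text does not fix that number. The function name is illustrative.

```python
import heapq


def best_of_top_links(links, tot, k_v, two_m, top_k):
    """Section 4.3: evaluate only the top_k neighbour communities by |{v_i} -> {c_j}|.

    links maps neighbour community -> edge weight from v, tot maps
    community -> Sigma_tot (v removed), k_v is v's weighted degree and
    two_m is twice the total edge weight.
    """
    top = heapq.nlargest(top_k, links.items(), key=lambda item: item[1])
    best, best_gain = None, 0.0
    for c, k_in in top:
        gain = k_in - tot[c] * k_v / two_m  # proportional to Delta Q
        if gain > best_gain:
            best, best_gain = c, gain
    return best


links = {"A": 3.0, "B": 1.0, "C": 1.0}
tot = {"A": 20.0, "B": 2.0, "C": 2.0}
print(best_of_top_links(links, tot, 4.0, 28.0, top_k=1))  # A: only A is evaluated
print(best_of_top_links(links, tot, 4.0, 28.0, top_k=3))  # B: larger gain, fewer links
```

The two calls illustrate the point made in the text. The community with the most links (A) is not always the one with the greatest ∆Q, because a large Σtot can outweigh the extra links.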

4.4 Changed neighbor community selection

In Section 4.3, we proposed an idea to improve MCE in the latter part of the whole computation. However, another problem remains to be solved.

Figure 4.3 shows the modularity change over the whole community detection process. Modularity changes drastically in the early part of the computation, while in the latter part it changes slowly but steadily. Hereafter we call these the drastically changing part and the slowly changing part. The number of computations in the slowly changing part is about five to six times that in the drastically changing part, which is undesirable. Far fewer nodes move in the slowly changing part, so its modularity computation efficiency is much lower.

Table 4.1: Changes in the number of moved nodes per Pass

Pass   Moved nodes
1      43426
2      15926
3      8912
4      3765
5      1618
6      799
7      380
8      128
9      32
10     25
11     24
12     13

To solve this problem, we first observed how the number of moved nodes changes. Table 4.1 shows the result for an SNS data set with 58228 nodes and 214078 edges from http://urx.nu/9NE5. The number of moved nodes decreases gradually; in Pass 12, only 13 of the 58228 nodes (about 0.02%) moved. Yet the community detection method computes all nodes in every Pass, in both the drastically changing part and the slowly changing part.

Call a computation effective when computing a node and its neighbor communities results in the node moving. By this definition, many ineffective computations are performed in the slowly changing part. We therefore expect that MCE in the slowly changing part can be improved simply by identifying and computing only the effective nodes, that is, the nodes that are likely to move from one community to another.

Next, we examined the moving nodes in detail to find characteristics that would let us identify them directly. We soon found the following phenomenon, which may be unsurprising. Once the slowly changing part sets in, a node's movement causes movements of its neighbor nodes, and these neighbor nodes are the only movable nodes. Other nodes that are not connected to the moved node never move. This holds throughout the slowly changing part.

In fact, the phenomenon occurs throughout the whole computation. The difference is that in the drastically changing part, unconnected nodes also move, whereas in the slowly changing part only connected nodes move. MCE in the latter part could therefore be improved by computing, in the next round, only the neighbors of the nodes that moved in the current round.

[Figure 4.5: two panels. Left: node v and communities C1, C2 and C3. Right: node v and communities C1, C2, C3 and C4.]

Figure 4.5: The figure shows two cases of the possible movement of nodes not connected to node v when node v moves from one community to another. In these two cases, nodes that are not connected to node v never move in the next step; we have proved this. The remaining cases have not yet been proved, for instance whether nodes in the lower community of the left panel can move to the upper two communities.

Because of the complexity of the problem and the thesis deadline, we have proved only part of this phenomenon mathematically, namely the two cases shown in Figure 4.5. We therefore leave the remainder as future work and adopt a less effective countermeasure instead. Simply skipping, during computation, the nodes that satisfy the two proven cases might still improve efficiency.

The countermeasure we finally adopted is to mark each community as changed or unchanged. When a node is taken up, the method first checks whether at least one of its neighbor communities has changed since the node was last computed. If none has changed, the node is skipped without computing any node-community pair. The reasoning is that without any change in its neighbor communities, the node should not move. This is our third proposal.
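The changed-community check can be sketched with logical timestamps. Each community records when it last gained or lost a node, and each node records when it was last computed. A node is recomputed only if one of the communities it watches changed afterwards. The class and method names are hypothetical. Watching the node's own community in addition to its neighbor communities is our assumption, since a change in the node's own community also alters ∆Q.

```python
class ChangeTracker:
    """Skip nodes whose neighbour communities have not changed (Section 4.4 sketch)."""

    def __init__(self):
        self.clock = 0
        self.community_changed_at = {}  # community -> time of its last change
        self.node_computed_at = {}      # node -> time of its last computation

    def mark_move(self, old, new):
        """Record that a node left community `old` and joined `new`."""
        self.clock += 1
        self.community_changed_at[old] = self.clock
        self.community_changed_at[new] = self.clock

    def mark_computed(self, v):
        """Record that node v has just been computed."""
        self.clock += 1
        self.node_computed_at[v] = self.clock

    def needs_computation(self, v, adj, comm):
        last = self.node_computed_at.get(v)
        if last is None:
            return True  # never computed yet
        watched = {comm[u] for u in adj[v]} | {comm[v]}
        return any(self.community_changed_at.get(c, 0) > last for c in watched)


adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
comm = {0: "A", 1: "A", 2: "B", 3: "C"}
tracker = ChangeTracker()
for v in adj:                 # first Pass: every node is computed
    tracker.mark_computed(v)
comm[2] = "A"                 # suppose node 2 then moves from B to A
tracker.mark_move("B", "A")
print([v for v in adj if tracker.needs_computation(v, adj, comm)])  # [0, 1, 2]
```

Node 3 is skipped because its community C did not change after its last computation.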



4.5 Hybrid heuristic

In this section, we describe a hybrid heuristic that automatically combines and switches among the countermeasures proposed so far, in order to detect communities with maximum modularity computation efficiency.

Let us review these countermeasures to decide what this heuristic should do.

First, we proposed an idea that reduces the number of computations by using the small world property. This is our basic proposal. It is effective all the time, so it should be applied throughout the whole community detection.

Next, we gave a countermeasure that selects part of the neighbor communities in decreasing order of |{vi} → {cj}| (see Equation 4.2), to improve modularity computation efficiency in the latter part. This latter part is not the slowly changing part mentioned in Section 4.4; it is the latter part of the drastically changing part (see Section 4.4). This countermeasure cannot be used in the early part of the computation. All edge weights are 1 at first, so all |{vi} → {cj}| values are 1. Neighbors are therefore selected randomly in the early part, and the heuristic must switch automatically from random selection to this countermeasure.

Finally, we considered an idea for improving MCE in the slowly changing part (see Section 4.4). This idea is also effective at all times, so there is no need to switch it on or off during the computation.

Based on this review, the whole process of the hybrid heuristic can be designed as follows:

1. Detect communities by randomly selecting part of the neighbor communities.

2. Switch the selection method for neighbor communities.

3. Detect communities by selecting part of the neighbor communities in decreasing order of |{vi} → {cj}|.

4. Compute only the nodes connected to previously moved nodes [ See section 4.4 ].

The switch from random selection to decreasing-order selection is triggered by comparing slopes of the modularity curve [ Figure 4.3 ]. The slope is easy to obtain: we record the modularity value after every fixed number of ∆modularity computations (e.g., every 10000 computations). If the modularity values Modularity_1 and Modularity_2 are recorded at computation counts t_1 and t_2, the slope of the curve is

$$m = \frac{\mathrm{Modularity}_2 - \mathrm{Modularity}_1}{t_2 - t_1}$$

When m begins to decrease, the program switches to decreasing-order selection mode.
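The switching rule above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the implementation used in the experiments. The class name SwitchDetector and its parameters are hypothetical. The sampling window defaults to the 10000 computations mentioned above. The first sampled slope is only recorded, because there is no earlier slope to compare it with.

```python
class SwitchDetector:
    """Hypothetical helper: decides when to leave random neighbor selection.

    Sketch assumptions: modularity is sampled every `window` Delta-modularity
    computations, the slope is m = (Modularity_2 - Modularity_1) / (t_2 - t_1),
    and the switch fires the first time a slope is smaller than the previous one.
    """

    def __init__(self, window=10000, initial_modularity=0.0):
        self.window = window
        self.count = 0                        # computations since the last sample
        self.last_modularity = initial_modularity
        self.last_slope = None                # no slope before the first sample
        self.switched = False

    def tick(self, current_modularity, n=1):
        """Register n computations; return True once max-neighbor mode is on."""
        self.count += n
        if self.count >= self.window:
            slope = (current_modularity - self.last_modularity) / self.count
            if self.last_slope is not None and slope < self.last_slope:
                self.switched = True          # the modularity curve has started to flatten
            self.last_slope = slope
            self.last_modularity = current_modularity
            self.count = 0
        return self.switched
```

Consider window=2 with modularity readings 0.1, 0.2, 0.35, 0.5, 0.55 and 0.6. The sampled slopes are 0.10, 0.15 and 0.05, so the detector switches at the last reading.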

4.6 Implementation

This section describes the details of our implementation, including pseudocode for all heuristics. To observe and evaluate each heuristic separately, we implement four heuristics and give each of them a name.

First, we introduce the overall process of the Louvain method and one important function, so that the differences among our heuristics can be expressed more clearly.

Algorithm 1 Louvain Method
Require: G = {V, E}
Ensure: community detection result of G
1: Read G;
2: improvement ⇐ false;
3: do
4: improvement ⇐ one_level();
5: ResetGraph();
6: while improvement
7: Print result;

Algorithm 1 shows the Louvain method. The functions one_level() and ResetGraph() correspond to Phase one and Phase two of the Louvain method [ See section 3.3 ]. In one_level(), every node is visited in turn. For each node, ∆modularity is computed for the node paired with each of its neighbor communities, and the best neighbor community, i.e., the one with the greatest ∆modularity value, is selected. After that, the whole network is rebuilt as a new network whose nodes are the communities found during the first phase. The variable improvement becomes true when running one_level() increases the modularity of the whole network.
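Phase two ( ResetGraph() ) can be illustrated with a short Python sketch. The function name reset_graph and the edge-list representation are assumptions made for illustration. The sketch only shows how each community collapses into a single node. Edge weights between communities are summed, and weights inside a community are kept as a self-loop. In Algorithm 1 this step alternates with one_level() until improvement stays false.

```python
from collections import defaultdict


def reset_graph(edges, community):
    """Phase-two sketch (hypothetical stand-in for ResetGraph()).

    edges: iterable of (u, v, w) tuples of an undirected weighted graph.
    community: dict mapping each node to its community id after phase one.
    Returns {(a, b): weight} over community ids with a <= b; the weight of
    edges inside one community is summed into a self-loop (a, a).
    """
    new_edges = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((community[u], community[v]))
        new_edges[(a, b)] += w
    return dict(new_edges)
```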

The function one_level() is the most important function of the Louvain method, because the ∆modularity computations are performed in it. Our proposals are therefore also implemented in this function.


Algorithm 2 one_level()
Require: G = {V, E}
Ensure: improvement
1: improvement ⇐ false, moves ⇐ 0;
2: newMod ⇐ getModularity(G);
3: currentMod ⇐ newMod;
4: do
6: for ∀v ∈ V do
7: bestComm ⇐ currentComm;
8: ▷ currentComm means node v's current community
9: remove(v, currentComm);
10: ∆Modmax ⇐ getDeltaMod(v, currentComm);
11: C′ ⇐ getNeighborCommunity(v);
12: for Ci ∈ C′ do
13: ∆Mod ⇐ getDeltaMod(v, Ci);
14: if ∆Mod > ∆Modmax then
15: ∆Modmax ⇐ ∆Mod;
16: bestComm ⇐ Ci;
17: end if
18: end for
19: insert(v, bestComm);
20: if bestComm ≠ currentComm then
21: moves++;
22: end if
23: end for
24: if moves > 0 then
25: improvement ⇐ true;
26: end if
27: newMod ⇐ getModularity(G);
28: while moves > 0 and (newMod − currentMod) > minMod
29: return improvement;

Algorithm 2 shows the whole computation in detail. It repeats two steps: it computes ∆modularity for each node-neighbor community pair ( Lines 12-18 ), and it selects the best neighbor community ( Lines 7, 9, 14, 19 ). This continues until no node moves or the change in modularity of the whole network falls below a threshold ( Lines 2, 27, 28 ). The variable improvement is set to true when at least one node has moved ( Lines 24-25 ). It is returned to Algorithm 1, where it decides whether another iteration is performed after the network is reset ( Line 6 of Algorithm 1 ). The function getDeltaMod ( Lines 10, 13 ) computes ∆modularity, and the number of times it is called is what we count when computing MCE [ See section 4.2 ].

The part we especially want to explain is Lines 12-18. Here, the Louvain method selects all of node v's neighbor communities for computation and selection. Our basic heuristic, supported by the small world property, selects only some of them.
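To make Algorithm 2 concrete, the following Python sketch implements phase one under simplifying assumptions:

- The graph is undirected and weighted, with no self-loops.
- Candidates are ranked by the gain expression k_{v,in}(C) − Σtot(C)·k_v/2m that Louvain implementations commonly use. This ranks candidate communities in the same order as ∆modularity.
- min_mod is a hypothetical threshold.

The optional select argument is a hypothetical hook that marks where our heuristics restrict Lines 12-18. Leaving it as None evaluates every neighbor community, as the Louvain method does. The sketch recomputes modularity from scratch after each sweep and is not optimized.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected weighted adjacency; this sketch assumes no self-loops."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    return adj


def modularity(adj, comm):
    """Q = sum over communities c of [ inside_c / 2m - (tot_c / 2m)^2 ]."""
    m2 = sum(sum(nbrs.values()) for nbrs in adj.values())   # 2m
    inside = defaultdict(float)   # twice the internal edge weight of c
    tot = defaultdict(float)      # total degree of c
    for u, nbrs in adj.items():
        tot[comm[u]] += sum(nbrs.values())
        for v, w in nbrs.items():
            if comm[u] == comm[v]:
                inside[comm[u]] += w
    return sum(inside[c] / m2 - (tot[c] / m2) ** 2 for c in tot)


def gain(weights, tot, c, k_v, m2):
    """Gain of putting v into c; ranks candidates like Delta-modularity."""
    return weights.get(c, 0.0) - tot[c] * k_v / m2


def one_level(adj, comm, select=None, min_mod=1e-7):
    """Phase-one sketch of Algorithm 2; mutates comm and returns improvement."""
    m2 = sum(sum(nbrs.values()) for nbrs in adj.values())
    k = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    tot = defaultdict(float)
    for u in adj:
        tot[comm[u]] += k[u]
    improvement = False
    current_mod = modularity(adj, comm)
    while True:
        moves = 0
        for v in adj:
            current = comm[v]
            tot[current] -= k[v]                      # remove(v, currentComm)
            weights = defaultdict(float)              # ||v -> C|| per neighbor community
            for u, w in adj[v].items():
                weights[comm[u]] += w
            best = current
            best_gain = gain(weights, tot, current, k[v], m2)
            candidates = weights if select is None else select(weights)  # Lines 12-18
            for c in candidates:
                g = gain(weights, tot, c, k[v], m2)
                if g > best_gain:
                    best, best_gain = c, g
            tot[best] += k[v]                         # insert(v, bestComm)
            comm[v] = best
            if best != current:
                moves += 1
        if moves > 0:
            improvement = True
        new_mod = modularity(adj, comm)
        if moves == 0 or new_mod - current_mod <= min_mod:
            return improvement
        current_mod = new_mod
```

As an example, take two triangles joined by a single edge and start from singleton communities. The sketch ends with the two triangles as the two communities, with a modularity of 5/14 ≈ 0.357.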



Algorithm 3 one_level() in Basic Heuristic


1: ......

2: do



5: ......


7: if |C′| > 3 then

8: C′′ ⇐ a randomly chosen element of {A : A ⊆ C′, |A| = 3};

9: else

10: C′′ ⇐ C′;

11: end if

12: for Ci ∈ C′′ do





17: end if

18: end for

19: ......

20: end for

21: ......



Algorithm 3 describes our basic heuristic. As mentioned in section 4.1, it randomly selects 3 neighbor communities per node for computation ( Lines 7-8 ). When a node has 3 or fewer neighbor communities, all of them are computed ( Lines 9-10 ).
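The random selection of Lines 7-10 can be sketched in Python as follows. The function name and the dictionary argument, which maps each neighbor community to ∥v → C∥, are illustrative assumptions. rng can be a seeded random.Random for reproducibility.

```python
import random


def select_random_neighbors(neighbor_weights, k=3, rng=random):
    """Basic-heuristic sketch: pick at most k neighbor communities at random.

    neighbor_weights: dict mapping neighbor community -> ||v -> C||.
    rng: any object with a `sample` method, e.g. random.Random(seed).
    """
    candidates = list(neighbor_weights)
    if len(candidates) <= k:
        return candidates                  # Lines 9-10: few neighbors, compute them all
    return rng.sample(candidates, k)       # Line 8: uniformly random k-subset
```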

The second heuristic, which selects neighbor communities in decreasing order of ∥vi → Ci∥, is shown below. We call it the max neighbor heuristic.



Algorithm 4 one_level() in Max Neighbor Heuristic


1: ......

2: do



5: ......


7: if |C′| > 3 then

8: C′′ ⇐ getMaxNeighborCommunity(v, 3);

9: else

10: C′′ ⇐ C′;

11: end if

12: for Ci ∈ C′′ do





17: end if

18: end for

19: ......

20: end for

21: ......



Unlike the basic heuristic, Algorithm 4 selects the 3 neighbor communities with the largest ∥v → Ci∥ values. The function getMaxNeighborCommunity returns the set C′′ of these 3 communities [ Line 8 ], which can be written as

$$C'' \subseteq C',\quad |C''| = 3,\quad \forall C_i \in C'',\ \forall C_j \in C' \setminus C'':\ \lVert v \to C_i \rVert \ge \lVert v \to C_j \rVert.$$
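getMaxNeighborCommunity can be sketched with heapq.nlargest from the Python standard library, which returns the k largest items of an iterable under a key function. As before, the dictionary representation (neighbor community mapped to ∥v → C∥) is an assumption of this sketch, not the data structure used in our implementation.

```python
import heapq


def get_max_neighbor_community(neighbor_weights, k=3):
    """Sketch of getMaxNeighborCommunity(v, k).

    neighbor_weights: dict mapping neighbor community -> ||v -> C||.
    Returns the k communities with the largest weights (all of them if there
    are fewer than k), in decreasing order of weight.
    """
    return heapq.nlargest(k, neighbor_weights, key=neighbor_weights.get)
```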

The third heuristic is the changed neighbor heuristic. In this heuristic, we select only the nodes whose neighbor communities changed during the previous computation. To identify these changed communities, each community is given a flag that records whether it has changed. The pseudocode is as follows.


Algorithm 5 one_level() in Changed Neighbor Heuristic


1: ......

2: Init Array isChanged ⇐ false;

3: isCompute = false;

4: do



7: ......


9: if |C′| > 3 then

10: C′′ ⇐ a randomly chosen element of {A : A ⊆ C′, |A| = 3};

11: else

12: C′′ ⇐ C′;

13: end if

14: if ∃Ci ∈ C′′ such that isChanged[Ci]==true then

15: isCompute = true;

16: end if

17: for Ci ∈ C′′ and isCompute==true do





22: end if

23: end for

24: ......


26: moves++;

27: isChanged[bestComm]=true;

28: isChanged[currentComm]=true;

29: end if

30: end for

31: ......


33: return improvement;


Here, Line 2 marks every community in the network as unchanged. Lines 27-28 mark as changed the communities in which a node movement ( removal or insertion ) took place. Using the variable isChanged, Lines 14-16 identify whether a node's neighbor communities have changed, and a node with at least one changed neighbor community is selected for computation. This countermeasure reduces unnecessary computation. It is more effective in the slowly changing part [ See section 4.4 ], where the number of movable nodes keeps decreasing.
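The bookkeeping of Lines 2, 14-16 and 27-28 can be sketched with a small hypothetical class. Algorithm 5 elides how the flags are reset (the "......" lines), so the policy here is our own assumption:

- Before the first pass, every community counts as changed, so every node is computed once.
- Afterwards, a node is computed only if one of its candidate communities changed during the previous pass or earlier in the current pass.

```python
class ChangeTracker:
    """Hypothetical helper for the changed neighbor heuristic.

    Sketch assumption: before the first pass every community counts as
    changed; afterwards a community counts as changed if a node left or
    joined it in the previous pass or earlier in the current pass.
    """

    def __init__(self):
        self.previous = None   # None: first pass, treat everything as changed
        self.current = set()   # communities changed so far in this pass

    def should_compute(self, candidate_comms):
        """Lines 14-16: compute v only if a candidate community changed."""
        if self.previous is None:
            return True
        return any(c in self.previous or c in self.current for c in candidate_comms)

    def record_move(self, old_comm, new_comm):
        """Lines 27-28: flag the community v left and the one it joined."""
        self.current.update((old_comm, new_comm))

    def end_pass(self):
        """Start a new pass: this pass's changes become the previous ones."""
        self.previous, self.current = self.current, set()
```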

Finally, we implement a hybrid heuristic that merges all of the above ideas into one heuristic, so that it can absorb the advantages of each. Max neighbor selection needs special handling at the start. At the initial time, ∥v → Ci∥ has the same value for every v ∈ V and every Ci ∈ C ( 1 when the network is unweighted ). The heuristic therefore first selects neighbor communities randomly for a period of time and then switches to max neighbor selection at an appropriate time.

Algorithm 6 details the hybrid heuristic. The variables m and m′ store the network's rate of modularity change [ See section 4.5 ]. This rate is computed after every fixed number of ∆modularity computations ( here, every 10000 computations; Lines 31-35 ). Comparing the two rates tells us whether it is time to switch to max neighbor community selection mode.


Algorithm 6 one_level() in Hybrid Heuristic


1: ......

2: computeTime ⇐ 0; ▷ store ∆modularity’s compute time

3: switch ⇐ false; ▷ control whether switch or not

4: m,m′ ⇐ 0; ▷ store network’s modularity changing rate

5: beforeMod ⇐ 0; ▷ store modularity 10000 computation times before

6: do



9: ......

10: if |C′| > 3 then

11: if switch==false then

12: C′′ ⇐ a randomly chosen element of {A : A ⊆ C′, |A| = 3};

13: else

14: C′′ ⇐ getMaxNeighborCommunity(v, 3);

15: end if

16: else

17: C′′ ⇐ C ′;

18: end if

19: if ∃Ci ∈ C′′ such that isChanged[Ci]==true then

20: isCompute = true;

21: end if

22: for Ci ∈ C′′ and isCompute==true do

23: ......

24: computeTime++;

25: end for

26: ......


28: moves++;

29: isChanged[bestComm]=true; isChanged[currentComm]=true;

30: end if

31: if computeTime ≥ 10000 then

32: m′ ⇐ (getModularity() − beforeMod) / computeTime;

33: if m′ < m then switch ⇐ true; end if

34: m ⇐ m′; beforeMod ⇐ getModularity(); computeTime ⇐ 0;

35: end if

36: end for

37: ......




Chapter 5

Evaluation

In this chapter, we conduct several experiments to compare and evaluate our heuristics. The evaluation covers three aspects: MCE, time and modularity, and community contents. For community contents, normalized mutual information ( NMI ) is used to measure how much information is shared between the extracted communities and the true communities existing in the network. Both synthetic and empirical networks are studied. For the synthetic networks, we use the Watts-Strogatz model to obtain a small world network and the Barabasi-Albert model to obtain a scale-free network. We also use another generation model to build networks with a known planted community structure. The empirical datasets are described in detail in each section before they are used.

5.1 Modularity computation efficiency

In this section, we evaluate the effectiveness of our heuristics in terms of Modularity Computation Efficiency ( MCE ). The experiments use several empirical networks that are commonly used for efficiency comparisons. They also use two synthetic networks, generated by the Watts-Strogatz model and the Barabasi-Albert model.

All of the empirical datasets used here were downloaded from the Stanford Large Network Dataset Collection ( https://snap.stanford.edu/data/ ). Each dataset is described below. In table 5.1, 2|E|/|V| is the average degree of the network.

• Web-Google: Web-Google is a web network in which nodes represent web pages and edges represent hyperlinks between them. It consists of 875,713 nodes and 5,105,039 edges. [37]

• DBLP: DBLP is a co-authorship network in which two authors are connected if they have published at least one paper together. It consists of 317,080 nodes and 1,049,866 edges. [64]

• YouTube: YouTube is a video-sharing website. On it, a user can create a group for sharing videos on a particular theme, person or event with users who have similar hobbies or interests. The communities in this network are such user-defined groups, and users are connected by friendship links. It consists of 1,134,890 nodes and 2,987,624 edges. [64]

• Pokec: This dataset also comes from an online social network. Friendships between users form the edges. It consists of 1,632,803 nodes and 30,622,564 edges. [58]

Table 5.1: Empirical Datasets

              Web-Google    DBLP         YouTube      Pokec
|V|           875,713       317,080      1,134,890    1,632,803
|E|           5,105,039     1,049,866    2,987,624    30,622,564
2|E|/|V|      11.7          6.6          5.3          37.5
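The 2|E|/|V| row can be recomputed directly from |V| and |E|, since every undirected edge contributes 2 to the total degree. The short Python check below is only an arithmetic illustration.

```python
def average_degree(n_nodes, n_edges):
    """Each undirected edge adds 2 to the total degree, hence 2|E| / |V|."""
    return 2 * n_edges / n_nodes


datasets = {  # (|V|, |E|) as listed for the empirical datasets
    "Web-Google": (875_713, 5_105_039),
    "DBLP": (317_080, 1_049_866),
    "YouTube": (1_134_890, 2_987_624),
    "Pokec": (1_632_803, 30_622_564),
}
for name, (v, e) in datasets.items():
    print(f"{name}: {average_degree(v, e):.1f}")
```

It prints 11.7, 6.6, 5.3 and 37.5 for the four datasets.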

In addition, to evaluate performance on a pure small world network and on a scale-free network, we generate network data using the Watts-Strogatz model and the Barabasi-Albert model. There is no proof or analytical result for this, but empirically, scale-free networks generated by the Barabasi-Albert model also contain community structure. Their average path length, however, increases with the size of the network. Moreover, they include some high-degree nodes ( called hubs ). Hubs are common in real-world networks but absent from networks generated by the Watts-Strogatz model. We therefore observe the performance of our heuristics both on networks with the pure small world property and on networks with hubs.

The Watts-Strogatz model is a network generation model that produces networks with the so-called small world property. The generated networks are therefore expected to contain community structure and to have short average path lengths. The model was proposed by Duncan J. Watts and Steven Strogatz, and the generation process is as follows.

• Set the number of nodes N, the mean degree K ( an even number ), and a probability p.

• Build a ring lattice of N nodes in which each node is connected to its K nearest neighbors, K/2 on each side. If the nodes are labeled n_0, ..., n_{N−1}, two nodes n_i and n_j are initially connected if and only if 0 < min(|i − j|, N − |i − j|) ≤ K/2.

• For every node n_i, each of its edges (n_i, n_j) with i < j is rewired with probability p.

Here, p can be regarded as the probability that a node connects to nodes outside its own community, where a node and its lattice neighbors are regarded as forming a community. We generate networks with the parameters N = 1,000,000, K = 20 and p = 0.1.
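The construction above can be sketched in Python as follows. This is an illustration under stated assumptions (k even, n > k, nodes labeled 0..n−1), not the igraph routine used in our experiments. During rewiring, the new endpoint is chosen uniformly among nodes that are neither the node itself nor already adjacent, so the edge count stays at n·k/2. The candidate list is rebuilt for every rewiring, which is fine for small examples but far too slow for the 1,000,000-node network.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Sketch of the Watts-Strogatz construction (k even, n > k).

    Returns an adjacency dict {node: set of neighbors}.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Ring lattice: connect each node to the k/2 nearest nodes on each side.
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    # Rewiring: each "rightward" lattice edge (i, i + d) is moved with
    # probability p to a uniformly chosen node, avoiding self-loops and
    # duplicate edges.
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if j in adj[i] and rng.random() < p:
                choices = [x for x in range(n) if x != i and x not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj
```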

The Barabasi-Albert model is another generation model, which generates scale-free networks. It was proposed by Albert-László Barabási and Réka Albert. The model incorporates two general mechanisms, growth and preferential attachment, both of which are widely observed in real networks. Growth means that the number of nodes in the network increases over time. Preferential attachment means that, as the network grows, more highly connected nodes are more likely to receive new links. As a result, some nodes acquire significantly high degrees, and the network as a whole has a power-law ( or scale-free ) degree distribution. The algorithm of the model is as follows.

• Start from a connected network of N_0 nodes as the initial input.

• Add new nodes to the network one at a time.

• Connect each new node to an existing node n_i with probability

$$p_i = \frac{k_i}{\sum_j k_j},$$

where k_i is the degree of node n_i and the denominator is the sum of the degrees of all nodes in the network.

Table 5.2: Synthetic Datasets

              small world          scale free
|V|           1,000,000            10,000
|E|           40,000,000           99,970
2|E|/|V|      80.0                 20.0
Parameters    k = 20, p = 0.1      m = 10, power = 0.6

Thus, high-degree nodes have a high probability of being connected to a new node, while low-degree nodes have less chance. New nodes therefore have a "preference" for attaching themselves to already heavily connected nodes.
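A minimal Python sketch of linear preferential attachment follows. It is illustrative only and rests on several assumptions:

- The initial network is a complete graph on m + 1 nodes.
- Degree-proportional sampling is done by drawing from a list in which each node appears once per incident edge.
- Repeated draws are discarded until m distinct targets are found, which is a common simplification of p_i.

It covers only the linear rule p_i = k_i / Σ_j k_j. It does not reproduce the nonlinear attachment ( power = 0.6 ) of the igraph generator used for our scale-free dataset.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Sketch: each new node attaches to m distinct existing nodes chosen with
    probability proportional to degree. Returns a list of (u, v) edges."""
    rng = random.Random(seed)
    # Initial connected network: complete graph on m + 1 nodes (assumption).
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list picks node i with probability k_i / sum_j k_j.
    targets = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((t, new))
            targets.extend((t, new))
    return edges
```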

Table 5.2 lists the datasets we generated. We generated them with the igraph package for R, a free software environment for statistical computing and graphics. The parameters in the table are the generation parameters. k and p have the same meanings as in the Watts-Strogatz description above. m is the number of edges added at each step, and power is the power of the preferential attachment.

Using the six datasets above, we run community detection with our four heuristics and the Louvain method. To compare MCE, each program records modularity and ∆modularity after every 10000 computations, and we plot these records. The stored data cover only the first execution of phase one in the whole detection process. The experiments are run on a TSUBAME interactive node with 6GB of RAM and an Intel Xeon X5670 2.93GHz CPU.

Figures 5.1 to 5.12 show the results. Overall, the hybrid heuristic shows outstanding modularity computation efficiency. It reaches the final modularity value earliest and with the fewest computations. This comes from two countermeasures: computing only part of the neighbor communities, and selecting them in decreasing order of node-neighbor community connection strength. It also converges faster than the other heuristics by reducing invalid calculations. An invalid calculation is one in which node-neighbor pairs are computed but no node moves.

Figure 5.1: Modularity of DBLP ( y-axis: modularity; methods: Louvain Method, Basic Heuristic, Max Neighbor Heuristic, Change Neighbor Heuristic, Hybrid Heuristic )

Figure 5.2: MCE of DBLP ( y-axis: modularity computation efficiency )

Figure 5.3: Modularity of Web-Google ( y-axis: modularity )

Figure 5.4: MCE of Web-Google ( y-axis: modularity computation efficiency )

Figure 5.5: Modularity of YouTube ( y-axis: modularity )

Figure 5.6: MCE of YouTube ( y-axis: modularity computation efficiency )

Figure 5.7: Modularity of Pokec ( y-axis: modularity )

Figure 5.8: MCE of Pokec ( y-axis: modularity computation efficiency )

Figure 5.9: Modularity of Small world ( y-axis: modularity )

Figure 5.10: MCE of Small world ( y-axis: modularity computation efficiency )

Figure 5.11: Modularity of Scale free ( y-axis: modularity )

Figure 5.12: MCE of Scale free ( y-axis: modularity computation efficiency )

The Pokec and Small world datasets show the largest MCE gains over the Louvain method, because these two datasets have a higher average degree than the others. This is expected. The Louvain method evaluates every neighbor community of a node, whereas our heuristics evaluate at most three. So the higher the average degree of the network, the more computations are saved, provided that the network has a clear community structure ( i.e., a high modularity value ).
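The reasoning above can be illustrated by counting the getDeltaMod calls at Line 13 of Algorithm 2 for one sweep. In this hypothetical sketch, the input is the number of distinct neighbor communities per node. At the very start every node is its own community, so this number equals the node's degree.

```python
def delta_mod_calls(neighbor_comm_counts, cap=None):
    """Count getDeltaMod calls at Line 13 in one sweep over all nodes.

    neighbor_comm_counts[i]: number of distinct neighbor communities of node i.
    cap=None models the Louvain method (every neighbor community is evaluated);
    cap=3 models our heuristics (at most three per node).
    """
    if cap is None:
        return sum(neighbor_comm_counts)
    return sum(min(count, cap) for count in neighbor_comm_counts)


# At the very start every node is alone, so the counts equal the degrees.
degrees = [2, 5, 10, 40]
print(delta_mod_calls(degrees), delta_mod_calls(degrees, cap=3))  # 57 11
```

The node with degree 2 costs the same in both cases. The node with degree 40 costs 40 calls under the Louvain method but only 3 under the heuristics, which is why high-degree datasets benefit most.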

On the Scale free dataset [ Figure 5.11 and Figure 5.12 ], which has almost no community structure, our heuristics also maintain a high MCE. This means our basic idea remains effective even when the network has no clear community structure. However, its modularity [ See Figure 5.11 ] is lower than that of the Louvain method.

This can be explained by a characteristic of scale-free networks. They contain a few nodes with extremely high degree [ See section 2.3 ], called hubs in the network literature. Although hubs are few in number, they account for a large share of the degrees in the network. When the network has no clear community structure, the probability that a hub's neighbor communities finally belong to the same community is therefore very low. Consequently, when a hub h1 is computed, the neighbor communities that should finally belong to the same community as h1 ( like communities A, B and C in figure 4.2 ) are selected with low probability. This may lead the hub into an incorrect community with high probability. Furthermore, once a hub belongs to an incorrect community, it affects the connection strength between that community and the nodes connected to it.

This is why our heuristics reach a lower modularity value than the Louvain method in the first phase. To verify this argument, we also ran community detection on another scale-free network, this time one with a clear community structure. The result [ See figure 5.13 ] shows the same tendency in modularity.

Figure 5.13: Modularity and MCE changes on another scale-free dataset consisting of 1,000,000 nodes and 1,999,998 edges. It contains a clear community structure with a high modularity value of 0.98. ( Panels: modularity; modularity computation efficiency. )

The YouTube dataset also deserves attention. Its modularity changes differently from the other datasets, especially under the basic and changed neighbor heuristics [ Figures 5.5 and 5.6 ]. In figure 5.5, modularity grows slowly after reaching about 0.3, although it finally reaches the same modularity as the other heuristics. We believe the reason again lies in a characteristic of the dataset. As mentioned in the dataset descriptions, communities in YouTube consist of users with similar interests or hobbies, and groups share videos on a particular theme, person or event. There are likely to be a small number of users ( for example, accounts of governments or organizations ) who are connected to a very large number of other users because of the popularity of the videos they share. These users can become hubs. The community structure around them is unclear even though the modularity of the whole network is high, since they are connected to many kinds of users but almost never take the initiative to connect to others. In contrast, most users connect with only a few others, since most users do not join a particular group and share various types of videos freely. As a result, the YouTube dataset shows a tendency similar to that of the scale-free network with low modularity, even though YouTube itself has a clear community structure ( i.e., a high modularity value ).

5.2 Time and modularity

For the comparison of time and modularity between the Louvain method and our four heuristics, we continue to use the six datasets of section 5.1.

First, regarding modularity, all of our heuristics finally reach almost the same modularity values as the Louvain method [ Table 5.3 lists the modularity of each dataset ]. This indicates that our proposal does not degrade the final result. Notably, they also reach the same final values on the scale-free network with low modularity ( i.e., almost no community structure ), even though they reach a lower modularity value in the first execution of phase one [ Figure 5.11 ].

Table 5.3: Modularity of all datasets

Dataset        Modularity
Web-Google     0.98
DBLP           0.82
YouTube        0.72
Pokec          0.72
Small world    0.77
Scale free     0.27

On the other hand, there is a large gap in running time between the heuristics and the Louvain method as the data size grows [ See figure 5.14 ]. For datasets with around a million nodes, the gap is 3 to 6 times. Interestingly, however, on the YouTube dataset our hybrid and max neighbor heuristics run at a speed close to that of the Louvain method. This may also be related to the hubs in YouTube, since the Small world dataset, which has no hubs, runs about 6 times slower than the Louvain method.

The Louvain method is the fastest in most cases, even though it has the worst modularity computation efficiency. This means that our heuristics compute and detect communities with high efficiency, but the cost of each node-neighbor community computation is far higher than in the Louvain method. Reducing this per-computation cost is part of our future work.

Figure 5.14: Time ( s ) of each heuristic on Web-Google, DBLP, YouTube, Pokec and Small world ( bars: Louvain, Basic Heuristic, Max Neighbor Heuristic, Change Neighbor Heuristic, Hybrid Heuristic )

Meanwhile, on datasets with hundreds of thousands of nodes, our hybrid heuristic completes community detection in almost the same time as the Louvain method on DBLP. On Web-Google it is even faster than the Louvain method, although Web-Google is larger than DBLP. This may be because Web-Google has a higher average degree and a clear community structure.

The runtime on the Scale free dataset is too short to show in figure 5.14, so we give the times numerically in Table 5.4.

Table 5.4: Time of dataset scale free

Method                       Time ( s )
Louvain                      0.22
Basic Heuristic              0.44
Max Neighbor Heuristic       0.34
Change Neighbor Heuristic    0.36
Hybrid Heuristic             0.36


5.3 Community contents

In this section, we evaluate performance by quantitatively comparing the community assignments found by the community detection algorithms.

As the metric, we use Normalized Mutual Information ( NMI ), which is based on concepts from information theory. It is defined as follows [11]. Let C_d be a detected community and C_r a true community existing in the network ( which we call a real community ). Let n_{d,r} = |C_d ∩ C_r| be the number of nodes shared by C_d and C_r. Then

$$P(X = d, Y = r) = \frac{n_{d,r}}{|V|}$$

is the joint probability that a randomly selected node belongs to both C_d and C_r. Using this joint probability over the random variables X and Y, one can compute the mutual information, which measures the mutual dependence of two random variables:

$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \tag{5.1}$$

Here p(x) = |C_x|/|V| is the probability that a randomly selected node belongs to community C_x ( detected or real ). Mutual information measures the similarity between the detected and real community assignments. To bring its value into the range between 0 and 1, it is normalized by the Shannon entropy [54]. The normalized value is 1 if the two assignments are identical and 0 if they are independent. The normalized mutual information is given by Equation 5.4, where H(X) and H(Y) are the Shannon entropies of the random variables X and Y.

$$H(X) = -\sum_{x \in X} p(x) \log p(x) \tag{5.2}$$

$$H(Y) = -\sum_{y \in Y} p(y) \log p(y) \tag{5.3}$$

$$I_{norm} = \frac{2\,I(X;Y)}{H(X) + H(Y)} \tag{5.4}$$
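Equations 5.1 to 5.4 translate directly into Python. In this sketch, each partition is given as a list of community labels in the same node order. The natural logarithm is used, and the choice of base does not affect the normalized value. By our own assumption, two single-community partitions (both entropies zero) are treated as identical.

```python
from collections import Counter
from math import log


def nmi(detected, real):
    """Normalized mutual information (Eq. 5.4) between two partitions.

    detected, real: sequences of community labels, one per node, same order.
    """
    n = len(detected)
    joint = Counter(zip(detected, real))   # n_{d,r}
    size_d = Counter(detected)             # |C_d|
    size_r = Counter(real)                 # |C_r|
    mi = sum(c / n * log((c / n) / ((size_d[d] / n) * (size_r[r] / n)))
             for (d, r), c in joint.items())            # Eq. 5.1
    h_d = -sum(c / n * log(c / n) for c in size_d.values())  # Eq. 5.2
    h_r = -sum(c / n * log(c / n) for c in size_r.values())  # Eq. 5.3
    if h_d + h_r == 0:   # both partitions have a single community
        return 1.0
    return 2 * mi / (h_d + h_r)
```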

To compare NMI, we need datasets whose real community assignments are known in advance. We therefore generate synthetic networks with planted community structure. We want to observe how NMI changes when only one condition changes ( e.g., when only the degree or only the modularity varies ), and no single real dataset can satisfy this need. As the generation model, we use the model of Lancichinetti et al. [34], which generates networks that have both the scale-free and the small world property.

Its parameters include the average degree k and the mixing parameter µ, which is the fraction of each node's links that go to nodes outside its own community. µ determines the probabilities p1 and p2: p1 is the probability of a connection between two nodes in the same community, and p2 is the probability of a connection between two nodes in different communities. In addition, one can set the number of nodes ( N ), the range of degrees ( k ), the range of community sizes ( minc, maxc ), and the exponents of the degree distribution ( t1 ) and of the community size distribution ( t2 ). In our experiments, we mainly vary k, µ and C to generate networks whose degree or modularity changes gradually. We generate three groups of datasets. Datasets1 consists of three networks ( Data1 to Data3 ). Datasets2 and Datasets3 consist of five networks each ( Data4 to Data8 and Data9 to Data13 ).
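As a rough check on a generated network, one can measure the fraction of edges that connect different communities. When every node shares a fraction µ of its links with other communities, as in the model above, this fraction equals µ. It is only approximately µ when the model realizes µ approximately. The function below is a hypothetical helper for unweighted edge lists, not part of the benchmark software.

```python
def mixing_fraction(edges, community):
    """Fraction of edges whose endpoints lie in different communities.

    edges: iterable of (u, v) pairs of an unweighted undirected graph.
    community: dict mapping each node to its community label.
    """
    edges = list(edges)
    external = sum(1 for u, v in edges if community[u] != community[v])
    return external / len(edges)
```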

Table 5.5: Datasets1 ( where two values are given, they are Louvain / Hybrid )

              Data1                  Data2                  Data3
N             100,000                30,000                 1,000
k             10                     100                    10
µ             0.1                    0.5                    0.01
minc          3,000                  1,000                  10
maxc          4,000                  4,000                  20
|V|           100,000                30,000                 1,000
|E|           1,000,000              2,988,272              4,999
2|E|/|V|      10                     199.2                  19.8
p1 / p2       0.26e-02 / 0.10e-06    0.27e-01 / 0.18e-02    0.68 / 0.10e-03
|C|           29                     14                     62
|C′|          11 / 27                14 / 11                62 / 62
Modularity    0.73 / 0.76            0.42 / 0.40            0.97 / 0.97
NMI           0.32 / 0.34            0.25 / 0.25            0.61 / 0.61



Datasets2 ( N = 10,000, µ = 0.08, minc = 100, maxc = 300 for all networks )

              Data4      Data5      Data6      Data7        Data8
k             50         100        150        200          400
|V|           10,000     10,000     10,000     10,000       10,000
|E|           250,000    500,000    750,000    1,000,000    2,000,000
2|E|/|V|      50         100        150        200          400
p1            0.30       0.55       0.65       0.74         0.76
p2            0.41e-03   0.82e-03   0.12e-02   0.1e-02      0.34e-02
|C|           60         54         45         39           20
|C′|          60         54         45         39           20
Modularity    0.90       0.90       0.90       0.89         0.87


Datasets3 ( N = 10,000, k = 30, minc = 100, maxc = 300 for all networks )

              Data9      Data10     Data11     Data12       Data13
µ             0.01       0.1        0.2        0.5          0.7
|V|           10,000     10,000     10,000     10,000       10,000
|E|           150,000    150,000    150,000    150,000      150,000
2|E|/|V|      30         30         30         30           30
p1            0.18       0.16       0.13       0.09         0.56e-01
p2            0.31e-06   0.31e-03   0.92e-03   0.15e-02     0.21e-02
|C|           56         54         55         56           56
|C′|          57         54         56         56           13
Modularity    0.96       0.88       0.68       0.48         0.25

Using Datasets1 [ table 5.5 ], we compared NMI between the Louvain method and our hybrid heuristic, where |C| and |C′| are the numbers of real and detected communities. The data show that the modularity and NMI of the two methods are nearly the same. This indicates that our proposal does not degrade detection quality, so from here on we consider only the hybrid heuristic.

Figure 5.15: The comparison of NMI when only average degree is different ( x-axis: average degree of network, 50, 100, 150, 200, 400; y-axis: normalized mutual information )

Figure 5.16: The comparison of NMI when only modularity is different ( x-axis: modularity, 0.96, 0.88, 0.68, 0.48, 0.25; y-axis: normalized mutual information )

Next, with Datasets2 and Datasets3, we run two experiments to observe how NMI changes. With Datasets2, we observe the change in NMI as the average degree increases while all other conditions stay the same. In Datasets3, modularity is the only element that differs. The results are shown in Figures 5.15 and 5.16.

The results reveal some interesting points. First, NMI is higher when a network has a low degree and a clear community structure [ See data 2 and datasets 2 ]. This means that the accuracy of the Louvain method ( or our heuristic ) becomes higher when a network has a stronger small world property, since a low degree and a clear community structure are characteristics of that property. This is to be expected, because community detection methods are designed to detect existing communities.

Second, in some cases the method obtains a high NMI value even though it produces fewer communities than the real ones [see data 4, 1, 13]. Although we cannot draw a conclusion on this point, we find it very interesting. It means that, in some cases, the method cannot identify invalid node movements. This may be caused by the network itself, by a drawback of the Louvain method (or our heuristic), or even by hubs. This point might also suggest ideas for improving accuracy, which is one of our future works.


Chapter 6

Summary

This chapter concludes our work and discusses possible future works.

6.1 Summary

In this thesis, we have proposed several heuristics for improving the processing power of community detection methods, focusing mainly on their efficiency. As the main countermeasure, we used a property of the network itself, the small world property. This successfully reduced the number of computations without affecting accuracy or modularity. In addition, we proposed two ideas for selecting more meaningful neighbor communities. These improve the efficiency of community detection in the latter part of the computation and shorten the convergence time. Both ideas also improved the modularity computation efficiency (MCE), a measure proposed in this thesis. In particular, we found an interesting phenomenon in the changed neighbor selection part [Section 4.4]. Although we could not complete a mathematical proof of it, it may help to improve MCE further. Finally, we implemented a hybrid heuristic that combines the benefits of each idea.

In the second part of this research, we evaluated our heuristics in three ways: MCE, time/modularity, and the quality of detected communities. Our heuristics outperformed the Louvain method in MCE (by up to 3.5 times on the small world dataset), while modularity and quality remained about the same. The advantage in MCE was especially pronounced when the dataset had a large average degree. Another interesting point is that the heuristic performed just as well on scale-free networks. We also found some problems. For instance, the Louvain method (or our heuristic) creates far bigger communities than the real ones in some cases. Regarding time, the computation completed quickly on datasets such as Web-Google, DBLP and YouTube, where community structures are clear or hubs are present. Our heuristic's run time is slower than that of the Louvain method. However, our heuristic performs fewer computations, so we attribute this slowness to the cost of each node-neighbor community computation. In other words, our basic idea of utilizing the small world property was effective.
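The discussion above attributes the slower run time to the cost of each node-neighbor community evaluation. For reference, below is a minimal Python sketch of the standard modularity gain that the Louvain method [5] evaluates for one such pair. It is not the hybrid heuristic itself. The sketch assumes an undirected, unweighted simple graph, and assumes node i has first been removed into a singleton community. The names `modularity` and `louvain_gain` are hypothetical. The following symbols are used:

- Σ_tot: the total degree of the target community C
- k_i: the degree of i
- k_i,in: the number of edges between i and C
- m: the number of edges

With these, the gain is ΔQ = k_i,in / m − Σ_tot · k_i / (2m²).

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of an undirected, unweighted simple graph.

    edges: list of (u, v) pairs, each edge listed once, no self-loops.
    community: dict mapping every node to a community label.
    Q = sum_c [ L_c / m - (d_c / 2m)^2 ], with L_c internal edges and d_c total degree.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c (Sigma_tot in Louvain notation)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def louvain_gain(edges, community, i, target):
    """Gain in Q from moving node i, currently alone in its own community, into `target`.

    dQ = k_i_in / m - sigma_tot * k_i / (2 * m**2)
    """
    m = len(edges)
    k_i = 0        # degree of i
    k_i_in = 0     # edges between i and nodes of `target`
    sigma_tot = 0  # total degree of the nodes already in `target`
    for u, v in edges:
        if i in (u, v):
            k_i += 1
            other = v if u == i else u
            if community[other] == target:
                k_i_in += 1
        sigma_tot += (community[u] == target) + (community[v] == target)
    return k_i_in / m - sigma_tot * k_i / (2 * m * m)


# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3); node 2 starts alone.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
community = {0: "a", 1: "a", 2: "x", 3: "b", 4: "b", 5: "b"}
for c in ("a", "b"):
    print(c, round(louvain_gain(edges, community, 2, c), 4))
```

Moving node 2 into its own triangle gives a positive gain (about 0.163). Moving it into the other triangle gives a negative one (about -0.071). The sketch rescans the whole edge list, so each pair costs O(m). Practical Louvain implementations instead keep Σ_tot per community and make one pass over the neighbors of i, after which each node-neighbor community pair costs O(1).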

6.2 Future works

This thesis leaves a number of directions open for future investigation. The first stems from the phenomenon mentioned in Section 4.4. Suppose the phenomenon occurs every time, irrespective of network structure, and can be proved mathematically. Then one could directly identify the nodes likely to move next, without computing every node-neighbor community pair in the slowly changing part of the computation. This should lead to a further improvement in MCE. The second concerns run time. As mentioned above, our heuristic is slower than the Louvain method despite its higher modularity computation efficiency. This means each of its computations costs more than in the Louvain method, so we need to reduce this cost through implementation-level improvements. Finally, we also hope to investigate why the Louvain method (or our heuristic) creates far bigger communities than the real ones in some cases.


Acknowledgement

Foremost, I would like to offer my sincerest gratitude to my supervisor, Professor Ken Wakita, who has supported me since three years ago, when I was a research student. Without his patience and advice on my research and study, it would have been impossible for me to write this thesis and obtain a master’s degree. Thank you for reaching out to me when I failed the entrance examination and was wondering what to do next. I attribute the level of my master’s degree to his encouragement and effort.

In addition to my advisor, I would also like to express my gratitude to all members of the Wakita Laboratory. This research proceeded smoothly thanks to their insightful comments and valuable discussions.

Last but not least, I would like to thank my parents for their unconditional support throughout my life. I also thank the families of my two uncles for their financial and emotional support throughout my degree.


Bibliography

[1] Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

[2] Albert-Laszlo Barabasi, Reka Albert, and Hawoong Jeong. Mean-field theory for scale-free random networks. Physica A: Statistical Mechanics and its Applications, 272(1):173–187, 1999.

[3] A.L. Barabasi. Linked: The New Science of Networks. Perseus Pub., 2002.

[4] Richard Bellman. On a routing problem. Technical report, DTIC Document, 1956.

[5] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.

[6] Bela Bollobas. Modern graph theory, volume 184. Springer, 1998.

[7] U. Brandes. On Modularity - NP-completeness and Beyond. Interner Bericht. Univ., Fak. fur Informatik, Bibliothek, 2006.

[8] Dale Carnegie. How to Win Friends and Influence People. Simon and Schuster, 2009.

[9] Eunjoon Cho, Seth A Myers, and Jure Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1082–1090. ACM, 2011.


[10] Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding community structure in very large networks. Physical Review E, 70(6):066111, 2004.

[11] Leon Danon, Albert Diaz-Guilera, Jordi Duch, and Alex Arenas. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09):P09008, 2005.

[12] Jorn Davidsen, Holger Ebel, and Stefan Bornholdt. Emergence of a small world from local interactions: Modeling acquaintance networks. Physical Review Letters, 88(12):128701, 2002.

[13] Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271, 1959.

[14] Peter Sheridan Dodds, Roby Muhamad, and Duncan J Watts. An experimental study of search in global social networks. Science, 301(5634):827–829, 2003.

[15] Holger Ebel, Lutz-Ingo Mielsch, and Stefan Bornholdt. Scale-free topology of e-mail networks. arXiv preprint cond-mat/0201476, 2002.

[16] Kee-Cheok Cheong, Edmund Terence Gomez, and Francois Bafoil. Government-Linked Companies and Sustainable, Equitable Development (Routledge Malaysian Studies Series). Routledge, 2014.

[17] Leonhard Euler. Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Scientiarum Imperialis Petropolitanae, 8:128–140, 1736.

[18] Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, volume 29, pages 251–262. ACM, 1999.

[19] Maryam Fatemi and Laurissa Tokarchuk. An empirical study on IMDb and its communities based on the network of co-reviewers. In Proceedings of the First Workshop on Measurement, Privacy, and Mobility, page 7. ACM, 2012.

[20] Charles M Fiduccia and Robert M Mattheyses. A linear-time heuristic for improving network partitions. In Design Automation, 1982. 19th Conference on, pages 175–181. IEEE, 1982.


[21] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.

[22] Ullas Gargi, Wenjun Lu, Vahab S Mirrokni, and Sangho Yoon. Large-scale community detection on YouTube for topic discovery and exploration. In ICWSM, 2011.

[23] Fanica Gavril. Algorithms for minimum coloring, maximum clique, minimum covering by cliques, and maximum independent set of a chordal graph. SIAM Journal on Computing, 1(2):180–187, 1972.

[24] Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

[25] Prem K Gopalan and David M Blei. Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences, 110(36):14534–14539, 2013.

[26] Roger Guimera and Luis A Nunes Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028):895–900, 2005.

[27] Roger Guimera, Stefano Mossa, Adrian Turtschi, and LA Nunes Amaral. The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proceedings of the National Academy of Sciences, 102(22):7794–7799, 2005.

[28] Magnus M Halldorsson. A still better performance guarantee for approximate graph coloring. Information Processing Letters, 45(1):19–23, 1993.

[29] Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

[30] Xiufang Jiang, Guiquan Liu, and Zhiting Lin. A fast algorithm for finding community structure based on community closeness. In Proceedings of the 2010 Third International Joint Conference on Computational Science and Optimization - Volume 01, CSO ’10, pages 436–439, Washington, DC, USA, 2010. IEEE Computer Society.


[31] Leonard Kaufman and Peter J Rousseeuw. Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons, 2009.

[32] Brian W Kernighan and Shen Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291–307, 1970.

[33] Mirjam Kretzschmar. Sexual network structure and sexually transmitted disease prevention: a modeling perspective. Sexually Transmitted Diseases, 27(10):627–635, 2000.

[34] Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. Benchmark graphs for testing community detection algorithms. Physical Review E, 78(4):046110, 2008.

[35] Daniel B Larremore, Aaron Clauset, and Abigail Z Jacobs. Efficiently inferring community structure in bipartite networks. arXiv preprint arXiv:1403.2933, 2014.

[36] Chris Laszlo. The Sustainable Company: How to Create Lasting Value through Social and Environmental Performance. Island Press, 2013.

[37] Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.

[38] Stanley Milgram. The small world problem. Psychology Today, 67(1):61–67, 1967.

[39] Peter J Mucha and Mason A Porter. Communities in multislice voting networks. Chaos, 20(4):041108, 2010.

[40] Peter J Mucha, Thomas Richardson, Kevin Macon, Mason A Porter, and Jukka-Pekka Onnela. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980):876–878, 2010.

[41] Mark Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.


[42] Mark EJ Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001.

[43] Mark EJ Newman. Detecting community structure in networks. The European Physical Journal B-Condensed Matter and Complex Systems, 38(2):321–330, 2004.

[44] Mark EJ Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69(6):066133, 2004.

[45] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.

[46] Mikael Onsjo and Osamu Watanabe. A simple message passing algorithm for graph partitioning problems. In Algorithms and Computation, pages 507–516. Springer, 2006.

[47] Giuliano Andrea Pagani and Marco Aiello. The power grid as a complex network: a survey. arXiv preprint arXiv:1105.3338, 2011.

[48] Gergely Palla, Imre Derenyi, Illes Farkas, and Tamas Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.

[49] Christopher R Palmer and Christos Faloutsos. Electricity based external similarity of categorical attributes. In Advances in Knowledge Discovery and Data Mining, pages 486–500. Springer, 2003.

[50] Tiago P Peixoto. Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Physical Review E, 89(1):012804, 2014.

[51] Arnau Prat-Perez, David Dominguez-Sal, and Josep-Lluis Larriba-Pey. High quality, scalable and parallel community detection for large real graphs. In Proceedings of the 23rd international conference on World wide web, pages 225–236. International World Wide Web Conferences Steering Committee, 2014.


[52] Sidney Redner. How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B-Condensed Matter and Complex Systems, 4(2):131–134, 1998.

[53] Stuart A Rice. The identification of blocs in small political bodies. American Political Science Review, 21(03):619–627, 1927.

[54] Claude Elwood Shannon. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1):3–55, 2001.

[55] Hiroaki Shiokawa, Yasuhiro Fujiwara, and Makoto Onizuka. Fast algorithm for modularity-based graph clustering. In Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.

[56] Kimmo Soramaki, Morten L Bech, Jeffrey Arnold, Robert J Glass, and Walter E Beyeler. The topology of interbank payment flows. Physica A: Statistical Mechanics and its Applications, 379(1):317–333, 2007.

[57] Kenta Suzuki and Ken Wakita. Extracting multi-facet community structure from bipartite networks. In Computational Science and Engineering, 2009. CSE’09. International Conference on, volume 4, pages 312–319. IEEE, 2009.

[58] Lubos Takac and Michal Zabovsky. Data analysis in public social networks. In International Scientific Conference AND International Workshop Present Day Trends of Innovations, 2012.

[59] Stijn van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, Utrecht, May 2000.

[60] Ken Wakita and Toshiyuki Tsurumi. Finding community structure in mega-scale social networks [extended abstract]. In Proceedings of the 16th international conference on World Wide Web, pages 1275–1276. ACM, 2007.

[61] Stanley Wasserman. Social network analysis: Methods and applications, volume 8. Cambridge University Press, 1994.

[62] D.J. Watts. Six Degrees: The Science of a Connected Age. William Heinemann, London, 2003.


[63] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442, 1998.

[64] Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, page 3. ACM, 2012.

[65] Wayne Zachary. An information flow model for conflict and fission in small groups. J. of Anthropological Research, 33:452–473, 1977.

[66] Peng Zhang, Jinliang Wang, Xiaojia Li, Menghui Li, Zengru Di, and Ying Fan. Clustering coefficient and community structure of bipartite networks. Physica A: Statistical Mechanics and its Applications, 387(27):6869–6875, 2008.

[67] Haijun Zhou. Distance, dissimilarity index, and network community structure. Physical Review E, 67(6):061901, 2003.
