Network analysis and applications

Sushmita Roy
BMI/CS 576
www.biostat.wisc.edu/bmi576
sroy@biostat.wisc.edu
Dec 2nd, 2014


Computational problems in networks

• Network reconstruction
  – Infer the structure and parameters of networks
  – We examined this problem in the context of “expression-based network inference”
• Network evaluation/analysis
  – Properties of networks
• Network applications
  – Prioritization of genes
  – Interpretation of gene sets
  – Identify new biological pathways
    • Densely connected subnetworks


Using networks as tools for discovery

• So far we have considered problems in network inference

• Topological properties of networks can be informative

• Biological networks can also be used for numerous applications
  – Prioritization of genes
  – Identify new biological pathways
    • Densely connected subnetworks


Network properties

• Degree distribution
• Average shortest path length
• Clustering coefficient
• Modularity
• Network motifs


Why should we care about network measures?

• From Barabasi and Oltvai 2004: “Probably the most important discovery of network theory was the realization that despite the remarkable diversity of networks in nature, their architecture is governed by a few simple principles that are common to most networks of major scientific and technological interest”


Node degree

• Undirected network
  – Degree, $k$: number of neighbors of a node
• Directed network
  – In degree, $k_{in}$: number of incoming edges
  – Out degree, $k_{out}$: number of outgoing edges

[Figure: a directed network on nodes A, B, C, D, E and F, with one directed edge labeled. In degree of F is 4. Out degree of E is 0.]
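As a minimal sketch of these definitions, in degree and out degree can be counted directly from a list of directed edges. The edge list below is a made-up example (it is not the network drawn on the slide), and `in_out_degree` is an illustrative name.

```python
from collections import Counter

def in_out_degree(edges):
    """Return (k_in, k_out) Counters for a list of directed (source, target) edges."""
    k_in = Counter(target for _, target in edges)
    k_out = Counter(source for source, _ in edges)
    return k_in, k_out

# Hypothetical directed network, used only for illustration.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
k_in, k_out = in_out_degree(edges)

# A Counter returns 0 for nodes with no incoming (or outgoing) edges.
print("in degree of C:", k_in["C"])
print("out degree of D:", k_out["D"])
```

For an undirected network the same idea applies with each edge counted once for both of its endpoints.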


Average degree

• Consider an undirected network with $N$ nodes
• Let $k_i$ denote the degree of node $i$
• Average degree is $\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i$


Degree distribution

• $P(k)$: the probability that a node has $k$ edges
• Different networks can have different degree distributions
• A fundamental property that can be used to characterize a network
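A small sketch of how the average degree and an empirical $P(k)$ could be computed for an undirected network. The adjacency-set representation, the example graph and the name `degree_distribution` are assumptions made for illustration.

```python
from collections import Counter

def degree_distribution(adj):
    """Return (average degree, P(k)) for an undirected graph {node: set(neighbors)}."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    n = len(degrees)
    average = sum(degrees) / n
    # P(k) is the fraction of nodes whose degree equals k.
    p_k = {k: count / n for k, count in sorted(Counter(degrees).items())}
    return average, p_k

# Hypothetical undirected network: a triangle A-B-C plus a pendant node D.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
avg_k, p_k = degree_distribution(adj)
print(avg_k, p_k)
```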


Different degree distributions

• Poisson distribution
  – The mean is a good representation of the degree $k_i$ of all nodes
  – Networks that have a Poisson degree distribution are called Erdos Renyi or random networks
• Power law distribution
  – Networks with this degree distribution are also called scale free
  – There is no “typical” node that captures the degree of the other nodes


Poisson distribution

• A discrete distribution: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
• The Poisson is parameterized by $\lambda$, which can be easily estimated by maximum likelihood

[Figure: plot of $P(X = k)$ against $k$]
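For the Poisson, the maximum likelihood estimate of the rate parameter is simply the sample mean. The sketch below illustrates this on a hypothetical list of node degrees; the function names are illustrative.

```python
import math

def poisson_mle(samples):
    """Maximum likelihood estimate of the Poisson rate: the sample mean."""
    return sum(samples) / len(samples)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical observed node degrees.
degrees = [2, 3, 1, 4, 2, 3, 3, 2]
lam_hat = poisson_mle(degrees)
print("estimated rate:", lam_hat)
print("P(X = 2):", poisson_pmf(2, lam_hat))
```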


Power law distribution

• Used to capture the degree distribution of most real networks
• $P(k) \propto k^{-\gamma}$
• Typical value of $\gamma$ is between 2 and 3
• MLE exists for $\gamma$ but is more complicated
  – See “Power-Law Distributions in Empirical Data”, Clauset, Shalizi and Newman, 2009, for details
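As a rough sketch of what the estimate looks like, the continuous-data estimator described by Clauset, Shalizi and Newman is $\hat{\gamma} = 1 + n \left[ \sum_{i=1}^{n} \ln (x_i / x_{min}) \right]^{-1}$, computed over the $n$ values with $x_i \ge x_{min}$. The code assumes that $x_{min}$ is already known and treats the data as continuous; node degrees are discrete, so for real degree data this is only an approximation, and choosing $x_{min}$ is part of what makes the full procedure more complicated. The name `power_law_mle` is illustrative.

```python
import math
import random

def power_law_mle(values, x_min):
    """Continuous power law exponent estimate for the values >= x_min.

    Assumes x_min is given and at least one value is strictly above it.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum

# Synthetic check: draw from a continuous power law by inverse transform sampling.
rng = random.Random(0)
gamma_true, x_min = 2.5, 1.0
samples = [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0))
           for _ in range(20000)]
gamma_hat = power_law_mle(samples, x_min)
print("estimated exponent:", gamma_hat)
```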


Erdos Renyi random graphs

• Dates back to 1960, due to two mathematicians, Paul Erdos and Alfred Renyi
• Provides a probabilistic model to generate a graph
• Starts with $N$ nodes and connects each pair of nodes with probability $p$
• Node degrees follow a Poisson distribution (approximately, for large $N$)
• Tail falls off exponentially, suggesting that nodes with degrees far from the mean are very rare
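The generative model can be sketched in a few lines: visit every pair of nodes and add an edge with probability $p$. The function name, the adjacency-set representation and the parameter values below are illustrative choices.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=0):
    """Generate an undirected random graph on n nodes as {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Each pair of nodes is connected independently with probability p.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

adj = erdos_renyi(200, 0.05)
mean_degree = sum(len(neighbors) for neighbors in adj.values()) / len(adj)
# The expected degree of a node is (n - 1) * p, here 9.95.
print("mean degree:", mean_degree)
```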


Scale free networks

• Degree distribution is captured by a power law distribution

• There is no “typical” node that describes the degree of all other nodes

• Such networks are ubiquitous in nature


Poisson versus Scale free

Barabasi & Oltvai 2004, Nature Genetics Review


Yeast protein interaction network is believed to be scale free

• “Whereas most proteins participate in only a few interactions, a few participate in dozens”

• Such high degree nodes are called hubs

Barabasi & Oltvai 2004, Nature Genetics Review


Degree of a node is correlated with the functional importance of a node

[Figure: yeast protein-protein interaction network]

Deleting a red node causes the organism to die. Red nodes are also among the most degree-central nodes.


Origin of scale free networks

• Scale free networks are ubiquitous in nature
• How do such networks form?
• Such networks are the result of two processes
  – A growth process where new nodes join the network over an extended period of time
    • Think about how the internet has grown
  – Preferential attachment: new nodes tend to connect to nodes with many neighbors
    • Rich get richer


Growth and preferential attachment in scale free networks

A new node (red) is more likely to connect to node 1 than to node 2
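Growth and preferential attachment can be simulated with a short sketch along the lines of the Barabasi-Albert model. The details below are assumptions for illustration: the network starts from a small fully connected core, and each new node attaches to `m` distinct existing nodes chosen with probability proportional to their current degree.

```python
import random

def preferential_attachment(n_nodes, m, seed=0):
    """Grow an undirected network; return its list of (new_node, target) edges."""
    rng = random.Random(seed)
    # Start from a fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` selects a node with probability proportional to its degree.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = preferential_attachment(500, 2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print("edges:", len(edges), "max degree:", max(degree.values()))
```

Early nodes have more time to accumulate edges and are favored at every step, which is the "rich get richer" effect that produces hubs.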


Path lengths

• The shortest path length between two nodes A and B:
  – The smallest number of edges that need to be traversed to get from A to B
• Mean path length is the average of the shortest path lengths over all pairs of nodes
• Diameter of a graph is the longest of all shortest paths in the network
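For an unweighted network, shortest path lengths can be found with breadth-first search. The sketch below computes the mean path length and the diameter; it assumes an undirected, connected graph stored as adjacency sets, and the function names and example graph are illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def path_length_stats(adj):
    """Return (mean shortest path length, diameter) over all reachable pairs."""
    lengths = [d for s in adj
               for t, d in bfs_distances(adj, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical path graph 0 - 1 - 2 - 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
mean_len, diameter = path_length_stats(adj)
print(mean_len, diameter)
```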


Scale-free networks tend to be ultra-small

• Two nodes on the network are connected by a small number of edges
• Average path length grows on the order of $\log(\log(N))$, where $N$ is the number of nodes in the network
• In a random network (Erdos Renyi network) the average path length grows on the order of $\log(N)$


Modularity in networks

• Modularity “refers to a group of physically or functionally linked nodes that work together to achieve a distinct function” -- Barabasi & Oltvai
• Two questions
  – Given a network, is it modular?
    • Modularity can be assessed using the “clustering coefficient”
    • Modularity can also be assessed using the difference between the number of edges within and between a given grouping of nodes
  – Given a network, what are the modules in the network?
    • Graph clustering


A modular network

[Figure: a network with three modules, labeled Module 1, Module 2 and Module 3]


Clustering coefficient

• Measure of transitivity in the network that asks
  – If A is connected to B, and B is connected to C, how often is A connected to C?
• Clustering coefficient $C_i$ for each node $i$ is $C_i = \frac{2 n_i}{k_i (k_i - 1)}$
• $k_i$ is the degree of node $i$
• $n_i$ is the number of edges among neighbors of $i$
• Average clustering coefficient gives a measure of “modularity” of the network

[Figure: nodes A, B and C, with a “?” marking the possible edge between A and C]


Clustering coefficient example

[Figure: example network on nodes A, B, C, D and G]
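A worked sketch of the computation, using the standard definition $C_i = \frac{2 n_i}{k_i (k_i - 1)}$, where $k_i$ is the degree of node $i$ and $n_i$ is the number of edges among its neighbors. The graph below is a made-up example (not the one drawn on the slide), and nodes with fewer than two neighbors are given a coefficient of 0, which is a convention assumed here.

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """C_i = 2 * n_i / (k_i * (k_i - 1)) for node i of an undirected graph."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined cases count as 0
    # n_i: number of edges between pairs of neighbors of i.
    n_i = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * n_i / (k * (k - 1))

def average_clustering(adj):
    """Mean of C_i over all nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)

# Hypothetical network: a triangle A-B-C plus a pendant node D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
for node in sorted(adj):
    print(node, clustering_coefficient(adj, node))
print("average:", average_clustering(adj))
```

Here A has three neighbors and only one of the three possible edges among them (B-C) is present, so its coefficient is 1/3, while B and C each have both neighbors connected and get 1.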


Finding modules in a graph

• Given a graph, find the modules
  – Modules are represented by densely connected subgraphs
• The graph can be partitioned into modules using “graph clustering”
  – Hierarchical or flat clustering using a notion of similarity between nodes
  – Markov clustering algorithm
  – Spectral clustering
  – Girvan-Newman algorithm


Girvan-Newman algorithm

• General idea: “If two communities are joined by only a few inter-community edges, then all paths through the network from vertices in one community to vertices in the other must pass along one of those few edges.”

• Betweenness of an edge e is defined as the number of shortest paths between pairs of nodes that include e

• Edges that lie between communities tend to have high betweenness

M. E. J. Newman and M. Girvan. Finding and evaluating community structure


Girvan-Newman algorithm

• Initialize
  – Compute betweenness for all edges
• Repeat until convergence criteria are met
  1. Remove the edge with the highest betweenness
  2. Recompute betweenness of affected edges
• Convergence criteria can be
  – No more edges
  – Desired modularity
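The loop above can be sketched as follows. This is a simplified illustration, not a reference implementation: betweenness is recomputed for all edges after each removal (rather than only the affected ones), pairs with several shortest paths split their count fractionally among those paths, and the stopping rule is a target number of groups `n_groups`, which is an assumption standing in for the convergence criteria listed above. The graph is assumed undirected, without self-loops, and stored as adjacency sets.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge, via BFS from each node."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):       # accumulate from the farthest nodes back
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Each unordered pair was counted from both of its endpoints.
    return {e: b / 2.0 for e, b in bet.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(comp)
    return comps

def girvan_newman(adj, n_groups):
    """Remove highest-betweenness edges until n_groups components remain."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    comps = components(adj)
    while len(comps) < n_groups and any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
    return comps

# Hypothetical network: two triangles joined by the single edge 2 - 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman(adj, 2))
```

In this example every shortest path between the two triangles passes through the edge 2 - 3, so it has the highest betweenness and is removed first.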


Evaluating the “modularity” of the clusters

• Given $K$ groups of nodes, we can also compute modularity ($Q$) as
  – the difference between within group (community) connections and expected connections within a group

$Q = \sum_{i=1}^{K} \left[ e_{ii} - \left( \sum_{j=1}^{K} e_{ij} \right)^2 \right]$

• $K$: number of groups
• $e_{ij}$: fraction of total edges that link nodes in group $i$ to group $j$
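A minimal sketch of this computation, following the convention of Newman and Girvan: $e_{ii}$ is the fraction of edges with both ends in group $i$, and $\sum_j e_{ij}$ is the fraction of edge ends attached to group $i$ (an edge between two different groups is split equally between $e_{ij}$ and $e_{ji}$). The function name, the edge-list representation and the example grouping are illustrative.

```python
def modularity(edges, group_of):
    """Q = sum_i [e_ii - (sum_j e_ij)^2] for undirected edges and {node: group}."""
    m = len(edges)
    within = {}  # group -> number of edges with both ends in the group
    ends = {}    # group -> number of edge ends attached to the group
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        ends[gu] = ends.get(gu, 0) + 1
        ends[gv] = ends.get(gv, 0) + 1
        if gu == gv:
            within[gu] = within.get(gu, 0) + 1
    return sum(within.get(g, 0) / m - (ends[g] / (2 * m)) ** 2 for g in ends)

# Hypothetical network: two triangles joined by the single edge 2 - 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}
print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Placing every node in a single group gives $Q = 0$, while splitting the example at the bridge gives a positive value, which is the behavior the measure is meant to capture.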


Zachary’s karate club study

Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.

Node grouping based on betweenness


Network motifs

• Network motifs are defined as small recurring subnetworks that occur much more often than in a randomized network
• A subgraph is called a network motif of a network if its occurrence in randomized networks is significantly less frequent than in the original network
• Some motifs are associated with specific network dynamics

Milo et al., Science 2002


Network motifs of size three in a directed network


Finding network motifs

• Enumerating motifs
  – Subgraph enumeration
• Calculating the number of occurrences in randomized networks

Milo et al. 2002
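The two steps can be illustrated for one motif, the feed-forward loop. The sketch below counts occurrences in a directed network and compares the count with randomized networks produced by edge swaps that preserve each node's in degree and out degree. The swap-based randomization, the number of swaps, the z-score summary and all function names are assumptions made for illustration; the count includes every triple containing the three required edges, whether or not extra edges are present among the same nodes.

```python
import random
from statistics import mean, pstdev

def count_ffl(edges):
    """Count feed-forward loops: x->y, y->z and x->z with x, y, z distinct."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    count = 0
    for x, targets in succ.items():
        for y in targets:
            for z in succ.get(y, ()):
                if z != x and z != y and z in targets:
                    count += 1
    return count

def randomize(edges, n_swaps, rng):
    """Degree-preserving randomization: swap targets of random edge pairs."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def motif_zscore(edges, n_random=100, seed=0):
    """Return (observed count, mean random count, z-score) for the motif."""
    rng = random.Random(seed)
    observed = count_ffl(edges)
    counts = [count_ffl(randomize(edges, 10 * len(edges), rng))
              for _ in range(n_random)]
    mu, sd = mean(counts), pstdev(counts)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    return observed, mu, z

# Hypothetical directed network containing one feed-forward loop.
edges = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
print(motif_zscore(edges))
```

A large positive z-score would suggest that the subgraph occurs more often than expected by chance; on a network as small as this example the comparison is not meaningful and only shows the mechanics.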


Network motifs found in many complex networks

The occurrence of the feed-forward loop in both networks suggests a fundamental similarity in the design of these networks


Common structural motifs seen in the yeast regulatory network

[Figure: six motif types: Auto-regulation, Multi-component, Feed-forward loop, Single Input, Multi Input, Regulatory Chain]

Feed-forward loops are involved in speeding up the response of the target gene

Lee et al. 2002, Mangan & Alon, 2003


Summary of network analysis

• Given a network, its topology can be characterized using different measures
  – Degree distribution
  – Average path length
  – Clustering coefficient
• Degree distribution can be
  – Poisson
  – Power law
    • Such networks are called scale free
• Network modularity
  – Clustering coefficient
  – Edge betweenness
• Network motifs
  – Overrepresentation of subgraphs of specific types