PPI Network Alignment


Transcript of PPI Network Alignment

Page 1: PPI Network Alignment

PPI Network Alignment

陳琨、朱安強、林晏禕、翁翊鐘、陳縕儂、呂哲安、楊孟翰

Page 2: PPI Network Alignment

PROTEIN-PROTEIN INTERACTION NETWORK ALIGNMENT

Page 3: PPI Network Alignment

Protein Biosynthesis

Page 4: PPI Network Alignment

From DNA to life

Page 5: PPI Network Alignment

Biology Technology

How do we measure protein interaction?

◦ Two-hybrid screens

◦ Co-immunoprecipitation

Page 6: PPI Network Alignment

Two-hybrid screens

A. Regular transcription of the reporter gene

[Figure: UAS and reporter gene (LacZ)]

Page 7: PPI Network Alignment

Two-hybrid screens

B. One fusion protein only (Gal4-BD + Bait) – no transcription

[Figure: UAS and reporter gene (LacZ); no transcription]

Page 8: PPI Network Alignment

Two-hybrid screens

C. One fusion protein only (Gal4-AD + Prey) – no transcription

[Figure: UAS and reporter gene (LacZ); no transcription]

Page 9: PPI Network Alignment

Two-hybrid screens

D. Two fusion proteins with interacting Bait and Prey

[Figure: UAS and reporter gene (LacZ)]

Page 10: PPI Network Alignment

Co-immunoprecipitation

[Figure labels: Known viral protein; Protein A; Antibody; Unknown protein; X; Y]

Page 11: PPI Network Alignment

Protein-Protein Interaction Networks?

Proteins are nodes

Interactions are edges

Yeast PPI network

Page 12: PPI Network Alignment

Network comparisons

Query for a module

Predict functions of a module

Predict protein functions

Validate protein interactions

Predict protein interactions

Page 13: PPI Network Alignment

Random network

Connect each pair of nodes with probability p

The expected number of edges is pN(N-1)/2

The degree distribution is approximately Poisson

◦ Nodes with high degree are rare
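The random network model can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names, the seed, and the values N = 200 and p = 0.05 are illustrative assumptions. It samples one network and compares the observed edge count with pN(N-1)/2.

```python
import random
from itertools import combinations


def random_network(n, p, rng):
    """Connect each unordered pair of n nodes independently with probability p."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def expected_edges(n, p):
    """Expected number of edges in the random network: p * N * (N - 1) / 2."""
    return p * n * (n - 1) / 2


rng = random.Random(0)          # fixed seed, chosen only for repeatability
n, p = 200, 0.05                # illustrative values, not from the slides
edges = random_network(n, p, rng)

# Degree of every node; in this model degrees cluster around p * (N - 1).
degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

print("observed edges:", len(edges), "expected:", expected_edges(n, p))
print("mean degree:", sum(degree) / n, "max degree:", max(degree))
```

Because each pair is sampled independently, the maximum degree usually stays close to the mean, which is the sense in which high-degree nodes are rare.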

Page 14: PPI Network Alignment

Scale-free network

Power-law degree distribution

Hubs and nodes

When a node is added to the network, it prefers to link to hubs
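The "prefers to link to hubs" rule can be sketched as a growth process. This is an illustrative simplification (one link per new node, a two-node starting network, hypothetical function name), not a model taken from the slides.

```python
import random


def preferential_attachment(n, rng):
    """Grow a network node by node; each new node links to one existing node
    chosen with probability proportional to that node's current degree."""
    edges = [(0, 1)]
    # Every node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


rng = random.Random(1)           # fixed seed, chosen only for repeatability
pa_edges = preferential_attachment(1000, rng)
pa_degree = {}
for u, v in pa_edges:
    pa_degree[u] = pa_degree.get(u, 0) + 1
    pa_degree[v] = pa_degree.get(v, 0) + 1
print("largest degrees:", sorted(pa_degree.values(), reverse=True)[:5])
```

Early nodes accumulate links and become hubs, while most nodes keep degree 1.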

Page 15: PPI Network Alignment

The Network Alignment Problem

Given k different protein interaction networks belonging to different species, we wish to find conserved sub-networks within these networks

Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)

Page 16: PPI Network Alignment

General Framework For Network Alignment Algorithms

Page 17: PPI Network Alignment

PATHBLAST

Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Brian P. Kelley, Roded Sharan, Richard M. Karp, Taylor Sittler, David E. Root, Brent R. Stockwell, and Trey Ideker (2003)

Page 18: PPI Network Alignment

Protein Similarity

Homologous proteins: two proteins that have common ancestry.

Orthologous proteins: two proteins from different species that diverged after a speciation event.

Paralogous proteins: two proteins from the same species that diverged after a duplication event.

Source: Roded Sharan, Protein-protein Interaction: Network Alignment Lecture Note

Page 19: PPI Network Alignment

Path Blast

PathBlast is a strategy for aligning two protein interaction networks to elucidate their conserved pathways.

This method identifies pairs of interaction paths, drawn from the networks of different species or from different processes within a species, where proteins at equivalent path positions share strong sequence homology.

Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.

Page 20: PPI Network Alignment

Alignment Graph

Vertical solid line: protein-protein interactions.

Horizontal dotted line: significant sequence similarity.

Node: a homologous protein pair.

Link: protein interaction relations of three types: direct, gap, and mismatch.

Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.
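A small sketch of how such an alignment graph could be built. The slide does not define the three link types, so the rule below is an assumption: "direct" when both protein pairs interact directly, "gap" when one pair interacts directly and the other only through a common neighbor, and "mismatch" when both pairs are connected only through a common neighbor. All names and the toy networks are hypothetical.

```python
from itertools import combinations


def distance_class(net, a, b):
    """Return 1 if a and b interact directly, 2 if they share a neighbor, else None."""
    if b in net.get(a, set()):
        return 1
    if net.get(a, set()) & net.get(b, set()):
        return 2
    return None


def alignment_graph(net1, net2, homologs):
    """net1, net2: dict protein -> set of interaction partners.
    homologs: list of (protein in net1, protein in net2) pairs; each pair is a node.
    Returns a dict mapping a pair of nodes to its link type."""
    links = {}
    for (a, a2), (b, b2) in combinations(homologs, 2):
        if a == b or a2 == b2:
            continue  # a link must join distinct proteins in both networks
        d1 = distance_class(net1, a, b)
        d2 = distance_class(net2, a2, b2)
        if d1 is None or d2 is None:
            continue
        if d1 == 1 and d2 == 1:
            kind = "direct"
        elif d1 == 2 and d2 == 2:
            kind = "mismatch"
        else:
            kind = "gap"
        links[((a, a2), (b, b2))] = kind
    return links


# Toy example: A-B-C is a chain in network 1; a is a hub for b and c in network 2.
net1 = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
net2 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
homologs = [("A", "a"), ("B", "b"), ("C", "c")]
ag_links = alignment_graph(net1, net2, homologs)
for nodes, kind in ag_links.items():
    print(nodes, kind)
```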

Page 21: PPI Network Alignment

Yeast & Bacteria PPI Alignment graph

The yeast and bacteria global alignment graph vs. randomized networks obtained by permuting the protein names.

This suggests that both species share conserved interaction pathways.

“Direct interactions” are rare.

“Mismatches” and “gaps” were permitted, which helps to overcome false negatives.

Source: Roded Sharan, Protein-protein Interaction: Network Alignment Lecture Note

Page 22: PPI Network Alignment

Scoring Function

p(v) is the probability of true homology within the protein pair represented by v.

q(e) is the probability that the protein-protein interactions represented by e are real.

The background probabilities are the expected values of p(v) and q(e) over global alignment graph.
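The scoring formula itself is not present in this transcript. A log-odds form consistent with the definitions above sums log10(p(v) / p_background) over the nodes of a path and log10(q(e) / q_background) over its edges. The sketch below assumes that form; the function name and the numbers are illustrative only.

```python
import math


def path_score(p_nodes, q_edges, p_background, q_background):
    """Log-odds score of a path in the alignment graph (assumed form).

    p_nodes: p(v) for each node on the path
    q_edges: q(e) for each edge on the path
    p_background, q_background: expected p(v) and q(e) over the whole graph
    """
    node_term = sum(math.log10(p / p_background) for p in p_nodes)
    edge_term = sum(math.log10(q / q_background) for q in q_edges)
    return node_term + edge_term


# A path whose nodes and edges are each ten times more probable than background
# gains one unit per node and per edge.
print(path_score([0.5, 0.5], [0.8], p_background=0.05, q_background=0.08))
```

Under this form a path scores above zero only when its homology and interaction probabilities beat the background on balance.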

Page 23: PPI Network Alignment

Pathways & Protein Complexes

PathBLAST is used to find conserved paths and then overlapping paths are merged into complexes.

Source: Roded Sharan, Protein-protein Interaction: Network Alignment Lecture Note

Page 24: PPI Network Alignment

Yeast vs. Bacteria

Orthologous Pathways

Select the 150 highest-scoring pathways of length four from the alignment graph.

Combining overlapping pathways, they were found to fall into 5 network regions.

The right figure involves the union of 6 paths with similar function.

Solid link: direct interactions; dotted link: gaps or mismatches.

Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.

Page 25: PPI Network Alignment

Yeast vs. Yeast

Paralogous Pathways

Proteins were not allowed to pair with themselves or their neighbors.

Analyzed the 150 highest-scoring pathway alignments of length 4 from the alignment graph.

Distinct alignments but homologous in function.

Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.

Page 26: PPI Network Alignment

Pathway Queries

PATHBLAST identified two other well-known MAPK pathways as the highest-scoring hits, indicating that the algorithm was sufficiently sensitive and specific to identify known paralogous pathways.

Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.

Page 27: PPI Network Alignment

Identification of Protein Complexes

Roded Sharan, Trey Ideker, Brian P. Kelley, Ron Shamir, Richard M. Karp:

Identification of Protein Complexes by Comparative Analysis of Yeast and Bacterial Protein Interaction Data.

Journal of Computational Biology 12(6): 835-846 (2005)

Page 28: PPI Network Alignment

State-of-The-Art

Page 29: PPI Network Alignment

Flashback

[Input] The alignment graph of 2 PPI networks.

We can already handle the problem of finding conserved linear pathways.

But this is not the end: how can we go a step further?

Page 32: PPI Network Alignment

Motivation

Finding more complex conserved structures is of practical interest.

[Reduction] Now we can merge overlapping paths into complexes.

Or we can develop another model to identify conserved complexes.

Page 33: PPI Network Alignment

A New Model: The Main Idea

How do you recognize protein complexes?

◦ Dense Subgraphs

◦ Comparative Analysis

"When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean -- neither more nor less."

Lewis Carroll, Through the Looking-Glass

Page 34: PPI Network Alignment

Dense Subgraph: Likelihood

Likelihood Formula 0.1: given an induced subgraph,

◦ L(C) = |Ec|/ { ½ * |Vc| * ( |Vc| - 1 ) }

It makes sense: graphs with more edges have higher likelihood.

Page 35: PPI Network Alignment

Dense Subgraph: Likelihood(Cont.)

Likelihood Formula 0.1: given an induced subgraph,

◦ L(C) = |Ec|/ { ½ * |Vc| * ( |Vc| - 1 ) }

It makes sense: graphs with more edges have higher likelihood.

We only consider the structure of graphs.

Problems of link analysis are often data-dependent.
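Formula 0.1 is the edge density of the induced subgraph and can be computed directly. A minimal sketch with hypothetical names and a toy graph:

```python
def density(vertices, edges):
    """L(C) = |Ec| / (1/2 * |Vc| * (|Vc| - 1)) for the subgraph induced by `vertices`."""
    vs = set(vertices)
    # Keep only edges with both endpoints inside the vertex set; ignore self-loops.
    induced = {frozenset(e) for e in edges if set(e) <= vs and len(set(e)) == 2}
    possible = len(vs) * (len(vs) - 1) / 2
    return len(induced) / possible if possible else 0.0


toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(density("abc", toy_edges))   # triangle: every possible edge is present
print(density("abd", toy_edges))   # only a-b is present among three possible pairs
```

The score depends only on edge counts, which is the limitation the slide points out: it says nothing about how surprising each edge is in the given data.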

Page 36: PPI Network Alignment

Dense Subgraph: Likelihood(Cont.)

Likelihood Formula 0.1: given an induced subgraph,

◦ L(C) = |Ec|/ { ½ * |Vc| * ( |Vc| - 1 ) }

Likelihood Formula 0.2: given an induced subgraph,

What the hell is it?

Page 37: PPI Network Alignment

Dense Subgraph: Likelihood(Cont.)

What do you expect about the behavior of revised formulas?

Higher likelihood: The scores of dense graphs are higher.

Adjustment: The weakest link

◦ Bonus: an interaction with low probability happens.

Page 38: PPI Network Alignment

Dense Subgraph: Likelihood(Cont.)

Higher likelihood: The scores of dense graphs are higher.

We assume that every two proteins in a complex interact with some probability p (0.8 is used in this work).

We can use the model as a baseline for comparing density.

Page 39: PPI Network Alignment

Dense Subgraph: Likelihood(Cont.)

Adjustment: The weakest link!

p(u,v) is defined to be the fraction of graphs in FG that include this edge.

◦ FG: the family of graphs with vertex set V and the same degree sequence.

Edges incident on vertices with higher degrees have higher probability.

Page 40: PPI Network Alignment

Dense Subgraph: Likelihood(Cont.)

Likelihood Formula 0.2: given an induced subgraph,

What the hell is it?

◦ For p(u,v) = 0.2, we have 4 and ¼ on the two sides.

◦ For p(u,v) = 0.6, we have 4/3 and 1/2 on the two sides.

◦ It makes sense! We emphasize the weakest link.
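The image of Likelihood Formula 0.2 is missing from the transcript. The worked numbers are consistent with a likelihood ratio in which an observed edge contributes β / p(u,v) and a missing edge contributes (1 − β) / (1 − p(u,v)), where β = 0.8 is the within-complex interaction probability from the earlier slide. The sketch below assumes that reading; the symbol β and all function names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

BETA = Fraction(4, 5)   # within-complex interaction probability (0.8 on the slides)


def pair_factors(p_uv, beta=BETA):
    """Return (factor if the edge is present, factor if the edge is absent)."""
    return beta / p_uv, (1 - beta) / (1 - p_uv)


def likelihood_ratio(vertices, edges, p, beta=BETA):
    """Product of the per-pair factors over all vertex pairs of the subgraph.
    p maps frozenset({u, v}) to the background probability p(u, v)."""
    present = {frozenset(e) for e in edges}
    ratio = Fraction(1)
    for u, v in combinations(sorted(vertices), 2):
        pair = frozenset((u, v))
        edge_factor, gap_factor = pair_factors(p[pair], beta)
        ratio *= edge_factor if pair in present else gap_factor
    return ratio


print(pair_factors(Fraction(1, 5)))   # p(u,v) = 0.2
print(pair_factors(Fraction(3, 5)))   # p(u,v) = 0.6
```

An unlikely edge (p = 0.2) is rewarded by a factor of 4 when present, while a likely edge (p = 0.6) earns only 4/3, which is how the weakest links are emphasized.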

Page 41: PPI Network Alignment

Dense Subgraph: Likelihood(Cont.)

Likelihood Formula 0.1: given an induced subgraph,

◦ L(C) = |Ec|/ { ½ * |Vc| * ( |Vc| - 1 ) }

Likelihood Formula 0.2: given an induced subgraph,

Likelihood Formula 0.3: given an induced subgraph,

Page 42: PPI Network Alignment

The Main Idea Revisited

How do you recognize protein complexes?

◦ Dense Subgraphs

We have some revised formula for density in a PPI network.

◦ Comparative Analysis

Page 43: PPI Network Alignment

Comparative Analysis

Idea: If some structure occurs in different species, it is highly likely to be a meaningful structure.

How do you define dense substructures on alignment graphs?

Page 44: PPI Network Alignment

Comparative Analysis(Cont.)

Consider two subsets U1 ={ u1,..., uk}, V2 ={ v1,..., vk} and Θ: U1 → V2 is a many-to-many correspondence.

Since you already have

You may derive the formula 1.1 as follows:

Does it make sense?

Page 45: PPI Network Alignment

Comparative Analysis(Cont.)

Θ is useful information:

You have the formula 1.2:

{ A/(A+B) }/ {X/(X+Y)}

Page 46: PPI Network Alignment

The Main Idea Revisited

How do you recognize protein complexes?

◦ Dense Subgraphs

We have some revised formula for density in a PPI network.

◦ Comparative Analysis

We have some revised formula for density in an alignment network.

Page 47: PPI Network Alignment

Search the Complexes

Now we only need to find heavy subgraphs in the alignment graph.

The problem is NP-Hard.

Page 48: PPI Network Alignment

Search the Complexes(Cont.)

[Seed] Compute a seed around each node v.

[Refined Seed] Enumerate all subsets of the seed that have size 3 and contain v.

[Local Search] Iteratively modify the refined seed.

[Output Heavy Subgraphs] For each node, we record at most k heaviest subgraphs.

Page 49: PPI Network Alignment

Search the Complexes(Cont.)

[Seed] Compute a seed around each node v.

[Restrict the Size] Keep seeds small!

[Refined Seed] Enumerate all subsets of the seed that have size 3 and contain v.

[Local Search] Iteratively modify the refined seed.

[Output Heavy Subgraphs] For each node, we record at most k heaviest subgraphs.

[Filtering overlapping ones] Greedy method is used!
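The steps above can be sketched as a seed-and-extend heuristic. Everything concrete here is an assumption made for illustration: the toy weight (+1 per edge, −0.5 per missing pair) stands in for the likelihood scores, the seed is a node plus a few of its neighbors, all distinct results are pooled instead of keeping k per node, and the overlap threshold is arbitrary.

```python
from itertools import combinations


def score(nodes, adj, miss_penalty=0.5):
    """Toy subgraph weight: +1 per induced edge, -miss_penalty per missing pair."""
    total = 0.0
    for u, v in combinations(sorted(nodes), 2):
        total += 1.0 if v in adj[u] else -miss_penalty
    return total


def local_search(start, adj, max_size=6):
    """Repeatedly apply the best single add/remove move while it improves the score."""
    current = set(start)
    improved = True
    while improved:
        improved = False
        frontier = set().union(*(adj[u] for u in current)) - current
        moves = [current | {x} for x in frontier if len(current) < max_size]
        moves += [current - {x} for x in current if len(current) > 3]
        best = max(moves, key=lambda s: score(s, adj), default=None)
        if best is not None and score(best, adj) > score(current, adj):
            current, improved = best, True
    return frozenset(current)


def heavy_subgraphs(adj, seed_size=5, max_overlap=0.5):
    found = set()
    for v in adj:
        seed = [v] + sorted(adj[v])[: seed_size - 1]      # keep seeds small
        for others in combinations(seed[1:], 2):          # size-3 subsets containing v
            found.add(local_search({v, *others}, adj))
    ranked = sorted(found, key=lambda s: (-score(s, adj), sorted(s)))
    kept = []
    for cand in ranked:                                   # greedy overlap filter
        if all(len(cand & k) / min(len(cand), len(k)) <= max_overlap for k in kept):
            kept.append(cand)
    return kept


# Toy graph: two 4-cliques (abcd and efgh) joined by the single edge d-e.
adj = {n: set() for n in "abcdefgh"}
for group in ("abcd", "efgh"):
    for u, v in combinations(group, 2):
        adj[u].add(v)
        adj[v].add(u)
adj["d"].add("e")
adj["e"].add("d")
kept = heavy_subgraphs(adj)
print([sorted(s) for s in kept])
```

The local search stops at a local optimum, which is the price paid for avoiding the NP-hard exact search.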

Page 50: PPI Network Alignment

The Main Idea Revisited

How do you recognize protein complexes?

◦ Dense Subgraphs

We have some revised formula for density in a PPI network.

◦ Comparative Analysis

We have some revised formula for density in an alignment network.

Finally, we have some practical method to search complexes!

Page 51: PPI Network Alignment

PATH QUERIES

Page 52: PPI Network Alignment

Path Queries

Problem definition

Input

◦ A target network represented as an undirected weighted graph G(V, E), with a weight function on the edges w: E→R

◦ A path query Q=(q1,…,qk)

Scoring function of node similarity H: Q×V→R

Page 53: PPI Network Alignment

Output: a set of best matching pathways P=(p1,…,pl) in G, where a good match is measured in two respects:

1. The matched nodes are similar by scoring function H.

2. The reliability of edges in the matched pathway is high.

Page 54: PPI Network Alignment
Page 55: PPI Network Alignment

Algorithm

1. Introduce a mapping M from Q to P∪{0} where deleted query nodes are mapped to 0 by M.

2. Path Scoring: interaction score and sequence score

$$\sum_{i=1}^{l-1} w(p_i, p_{i+1}) + \sum_{\substack{i=1 \\ M(q_i) \neq 0}}^{k} H(q_i, M(q_i))$$

Page 56: PPI Network Alignment

Interaction score

◦ Edge weights represent the logarithm of the reliability of the interaction between two proteins.

Sequence score

◦ BLAST E-value for the two proteins, normalized by the maximal E-value over all pairs of proteins from the two networks.

Page 57: PPI Network Alignment

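Algorithm

Avoiding cycles

◦ N. Alon, R. Yuster, and U. Zwick: Color-coding. J. ACM, 1995.

Finding the best matching paths:

$$
W(i, j, S, \theta_{del}) = \max_{m \in V}
\begin{cases}
W(i-1, m, S - c(j), \theta_{del}) + w(m, j) + H(q_i, j), & (m, j) \in E \\
W(i, m, S - c(j), \theta_{del}) + w(m, j), & (m, j) \in E \\
W(i-1, m, S, \theta_{del} - 1) + \delta_{del}, & \theta_{del} \le N_{del}
\end{cases}
$$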
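The color-coding idea can be illustrated with a simplified version of the search. This sketch handles exact matches only (no insertions or deletions), so it is not the full algorithm of the slides. Each vertex gets a random color, and the dynamic program extends a path only to a vertex whose color is unused, which guarantees that no vertex repeats. All names and the toy network are hypothetical.

```python
import random


def best_colorful_match(adj, weight, query, sim, color):
    """Best score of a path p1..pk with pairwise distinct colors, where the score
    is the sum of edge weights plus the similarity of each query node to its match."""
    # table[(j, S)] = best score of a partial path ending at vertex j with color set S
    table = {(v, frozenset([color[v]])): sim(query[0], v) for v in adj}
    for i in range(1, len(query)):
        nxt = {}
        for (m, used), score in table.items():
            for j in adj[m]:
                if color[j] in used:
                    continue          # reusing a color could mean revisiting a vertex
                key = (j, used | {color[j]})
                cand = score + weight[frozenset((m, j))] + sim(query[i], j)
                if cand > nxt.get(key, float("-inf")):
                    nxt[key] = cand
        table = nxt
    return max(table.values(), default=float("-inf"))


def query_path(adj, weight, query, sim, trials=50, seed=0):
    """Repeat the dynamic program over random colorings and keep the best score."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(trials):
        color = {v: rng.randrange(len(query)) for v in adj}
        best = max(best, best_colorful_match(adj, weight, query, sim, color))
    return best


# Toy target network: triangle x-y-z with a pendant vertex w; weights are log-reliabilities.
cc_adj = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y", "w"}, "w": {"z"}}
cc_weight = {frozenset(e): -0.1 for e in [("x", "y"), ("y", "z"), ("z", "x"), ("z", "w")]}
cc_query = ["X", "Y", "Z"]


def cc_sim(q, v):
    """Toy node similarity: 1 when the names match up to case, else 0."""
    return 1.0 if q.lower() == v else 0.0


# Giving every vertex its own color makes a single pass exact on this tiny graph.
exact = best_colorful_match(cc_adj, cc_weight, cc_query, cc_sim,
                            {v: i for i, v in enumerate(sorted(cc_adj))})
print(exact, query_path(cc_adj, cc_weight, cc_query, cc_sim))
```

A random coloring can miss the optimum when two of its vertices share a color, which is why the search is repeated over many colorings.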

Page 58: PPI Network Alignment

Dataset and Results

Yeast and fly PPI networks

◦ The yeast (S. cerevisiae) PPI network contains 4,726 proteins and 15,166 known interactions between them.

◦ The fly (D. melanogaster) PPI network contains 7,028 proteins and 22,837 interactions.

271 pathways were discovered in the yeast PPI network that scored better than 99% of randomly chosen pathways; these were then used as queries against the fly PPI network.

Page 59: PPI Network Alignment

Results

Page 60: PPI Network Alignment

APPLICATION OF PPI NETWORK ALIGNMENT: ORTHOLOGY MAPPING

S. Bandyopadhyay, R. Sharan, and T. Ideker. Systematic identification of functional orthologs based on protein network comparison. Genome Research, 16(3):428–435, 2006

Page 61: PPI Network Alignment

Introduction

Annotating protein function across species is often complicated by the presence of paralogous proteins

Most of the methods of dealing with this problem are sequence-based models, thus sequences of proteins from different species were compared to find a group of proteins that have the same functional annotation

A protein and its functional ortholog are likely to interact with proteins in their respective networks that are themselves functional orthologs

This introduced a strategy for identifying functionally related proteins that supplements sequence-based comparisons with information on conserved protein-protein interactions

Page 62: PPI Network Alignment

Introduction (cont’d)

[Figure: node labels a, b, a’, b’]

Page 63: PPI Network Alignment

Functional orthology

When the protein in question has similarity to not one but many paralogous proteins, it is harder to distinguish which of these is the true ortholog, the protein that is directly inherited from a common ancestor

Definite functional orthologs are defined as proteins that are functionally equivalent as a result of direct ancestry

Page 64: PPI Network Alignment

Model review

The protein interaction networks of two species are aligned by assigning proteins to sequence homology groups using the Inparanoid algorithm

Networks are aligned into a merged graph representation

Probabilistic inference is performed on the aligned networks to identify pairs of proteins, one from each species, that are likely to retain the same function based on conservation of their interacting partners

A logistic function is used to compute the probability of functional orthology for a protein pair i given the states of functional orthology for its network neighbors

The previous probability is updated for each pair over successive iterations of Gibbs sampling

Page 65: PPI Network Alignment

Model review (cont’d)

Page 66: PPI Network Alignment

Conservation index

Consider an alignment graph G

◦Nodes represent sequence-similar protein pairs

◦Edges link nodes (a, b) and (a’, b’) if one of (a, a’) or (b, b’) directly interacts, and the other interacts via a neighbor, which is directly connected to them

◦An edge is strongly conserved if its endpoints are true functional orthologs

Page 67: PPI Network Alignment

Conservation index (cont’d)

$$c(i) = \frac{2\,d(i)}{d(a) + d(b)}$$

c(i): the conservation index of a node i

d(i): the number of strongly conserved links involving node i

d(a): the degree of protein a in its network

d(b): the degree of protein b in its network

Page 68: PPI Network Alignment

Probabilistic model

The probability of functional orthology for a pair of proteins is influenced by the probabilities of functional orthology for their network neighbors, which in turn depend on their network neighbors, and so on

This type of probabilistic model is known as a Markov random field

Page 69: PPI Network Alignment

Probabilistic model (cont’d)

Positive training examples: the definite functional orthologs having at least one conserved interaction

Negative training examples: a protein paired with its best BLAST E-value match that is not in the same cluster according to the Inparanoid algorithm

$$P(z_i \mid Z_{N(i)}) = \frac{1}{1 + \exp\{-\alpha - \beta\, c(i)\}}$$

zi: the state of node i

N(i): the set of neighbors of node i

ZN(i): the set of all zj such that j ∈ N(i)

Parameters α and β are optimized by maximizing the product of P(zi|ZN(i)) over all positive training examples and (1 − P(zi|ZN(i))) over all negative training examples

Page 70: PPI Network Alignment

Orthology inference

The above model was used to estimate the final posterior probabilities P(zi) using Gibbs sampling

Nodes representing ambiguous functional orthologs are each assigned a temporary state z=0 or z=1, initially at random

At each iteration, a node i is sampled (with replacement) and its value zi is updated given the states of its neighbors, ZN(i). The new value of zi is set to 1 with probability P(zi|ZN(i)) and to 0 otherwise

Over all iterations, the nodes designated as definite functional orthologs and non-orthologs are forced to states of 1 and 0, respectively
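The sampling loop can be sketched as follows. Several choices are assumptions made for illustration: the conservation index is taken as c(i) = 2 d(i) / (d(a) + d(b)), with d(i) counted as the number of neighbors currently in state 1; the update uses a logistic function of c(i) with parameters alpha and beta; the parameter values and the toy graph are invented; and no burn-in period is applied.

```python
import math
import random


def conservation_index(i, state, neighbors, degree_sum):
    """c(i) = 2 * d(i) / (d(a) + d(b)); d(i) counts neighbors currently in state 1."""
    d_i = sum(state[j] for j in neighbors[i])
    return 2.0 * d_i / degree_sum[i]


def p_ortholog(c_i, alpha, beta):
    """Logistic probability that node i is a functional ortholog pair."""
    return 1.0 / (1.0 + math.exp(-alpha - beta * c_i))


def gibbs(neighbors, degree_sum, clamped, alpha, beta, iters=2000, seed=0):
    """Estimate P(z_i = 1) for the unclamped nodes by Gibbs sampling.
    clamped: dict node -> fixed state (1 for definite orthologs, 0 for non-orthologs)."""
    rng = random.Random(seed)
    free = [i for i in neighbors if i not in clamped]
    state = dict(clamped)
    state.update({i: rng.randint(0, 1) for i in free})   # random initial states
    ones = {i: 0 for i in free}
    visits = {i: 0 for i in free}
    for _ in range(iters):
        i = rng.choice(free)                              # sample a node with replacement
        c_i = conservation_index(i, state, neighbors, degree_sum)
        state[i] = 1 if rng.random() < p_ortholog(c_i, alpha, beta) else 0
        ones[i] += state[i]
        visits[i] += 1
    return {i: ones[i] / visits[i] for i in free if visits[i]}


# Toy alignment graph: n2 is a definite ortholog pair, n1 and n3 are ambiguous.
gs_neighbors = {"n1": ["n2", "n3"], "n2": ["n1"], "n3": ["n1"]}
gs_degree_sum = {"n1": 4, "n2": 3, "n3": 5}   # d(a) + d(b) for each node (invented)
posterior = gibbs(gs_neighbors, gs_degree_sum, {"n2": 1}, alpha=-1.0, beta=3.0)
print(posterior)
```

Clamped nodes are never resampled, so their states of 1 or 0 hold over all iterations while still influencing their neighbors.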

Page 71: PPI Network Alignment

Experimental results

A total of 2244 clusters were generated by the Inparanoid algorithm, covering 2834 proteins in yeast and 3881 proteins in fly

Of these, 1552 clusters contained only a single yeast and fly protein pair and were assumed to represent definite functional orthologs

They applied the above method to resolve the remaining 692 clusters, which were assumed to represent ambiguous functional orthologs, and found that 121 contained at least one protein pair with conserved interactions between networks

In 60 of these, the highest probability was assigned to the protein pair that was also the most sequence-similar via BLAST

Page 72: PPI Network Alignment

Experimental results (cont’d)

Page 73: PPI Network Alignment

Conclusion

These findings confirm that yeast/fly proteins classified as definite functional orthologs are more likely to have equivalent functional roles in the protein network

The conserved network context could be used to help discriminate functional orthology from general sequence similarity

Page 74: PPI Network Alignment

MULTIPLE NETWORK ALIGNMENT

R. Sharan, S. Suthram, R.M. Kelley, T. Kuhn, S. McCuine, P. Uetz, T. Sittler, R.M. Karp, and T. Ideker. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974–1979, 2005

Page 75: PPI Network Alignment

The alignment graph

Each node in this graph consists of a group of sequence-similar proteins, one from each species

Each link between a pair of nodes in the alignment graph represents conserved protein interactions between the corresponding protein groups

A search over the alignment graph is performed to identify:

1. Short linear paths of interacting proteins, which model signal transduction pathways

2. Dense clusters of interactions, which model protein complexes

Page 76: PPI Network Alignment

The alignment graph (cont’d)

Page 77: PPI Network Alignment

Experimental results

They applied the multiple network alignment framework to three PPI networks:

◦ Yeast: 14319 interactions among 4389 proteins

◦ Worm: 3926 interactions among 2718 proteins

◦ Fly: 20720 interactions among 7038 proteins

It identified 183 protein clusters and 240 paths conserved at a significance level of P < 0.01; groups of conserved clusters overlap to define 71 distinct network regions

Page 78: PPI Network Alignment

Experimental results (cont’d)

Page 79: PPI Network Alignment

Experimental results (cont’d)

Page 80: PPI Network Alignment

Prediction of protein function

Whenever the set of proteins in a conserved cluster or path (over all species) was significantly enriched for a particular GO annotation and at least half of the proteins in the cluster or path had that annotation, all remaining proteins in the sub-network were predicted to have that annotation
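The transfer rule can be written down compactly. In this sketch the enrichment test is not implemented: the caller passes in the terms already judged significantly enriched, and only the "at least half" condition is checked. Function, protein, and term names are hypothetical.

```python
def predict_annotations(members, annotations, enriched_terms):
    """Transfer annotations within one conserved cluster or path.

    members: proteins in the cluster or path (over all species)
    annotations: dict protein -> set of known GO terms
    enriched_terms: terms judged significantly enriched (test not shown here)
    Returns a dict protein -> set of newly predicted terms.
    """
    predictions = {}
    for term in enriched_terms:
        annotated = [p for p in members if term in annotations.get(p, set())]
        if 2 * len(annotated) >= len(members):        # at least half carry the term
            for p in members:
                if term not in annotations.get(p, set()):
                    predictions.setdefault(p, set()).add(term)
    return predictions


cluster = ["P1", "P2", "P3", "P4"]
known = {"P1": {"termA"}, "P2": {"termA", "termB"}, "P3": set()}
predicted = predict_annotations(cluster, known, ["termA", "termB"])
print(predicted)
```

Here termA is carried by two of four members and is transferred to P3 and P4, while termB is carried by only one member and is not transferred.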

Page 81: PPI Network Alignment

Fast and accurate alignment of multiple PPI networks

By Maxim Kalaev, Vineet Bafna, and Roded Sharan, 2007

Drawback of the alignment graph: exponential growth of the graph with the number of species

They introduced a new algorithm avoiding the explicit representation of every set of potentially orthologous proteins, thereby reducing time and memory requirements

Page 82: PPI Network Alignment

The layered alignment graph (1/3)

Given k PPI networks (for k species respectively)

A layered alignment graph: each layer corresponds to a species and contains the corresponding network. Additional edges connect proteins from different layers if they are sequence similar

A k-spine: a sub-graph of size k which includes a vertex from each of the layers. A k-spine corresponds to a set of truly orthologous proteins

A collection of connected k-spines induces a candidate conserved sub-network

Page 83: PPI Network Alignment

The layered alignment graph (2/3)

[Figure: layers Species 1, Species 2, …, Species k with vertex sets U1, U2, U3, …, Uk; a k-spine U[3]; legend: inter-layer edge, PPI edge]

Page 84: PPI Network Alignment

The layered alignment graph (3/3)

If considering every k-spine to be a node in a graph

An m-subnet: a collection U of k multi-sets Ui = {ui[1],…, ui[m]}

◦ For all 1 ≤ i ≤ k and 1 ≤ j ≤ m, ui[j] belongs to Vi

◦ For all 1 ≤ j ≤ m, the set U[j] = {u1[j], u2[j],…, uk[j]} is a k-spine

The task is to look for high scoring m-subnets, for a fixed m