Using structure in protein function annotation: predicting protein interactions Donald Petrey, Cliff...

Post on 13-Dec-2015


Using structure in protein function annotation: predicting protein interactions

Donald Petrey, Cliff Qiangfeng Zhang, Raquel Norel, Barry Honig

Howard Hughes Medical Institute

Department of Biochemistry and Molecular Biophysics

Center for Computational Biology and Bioinformatics

Columbia University

[Figure: Classification (Fold, Superfamily, Family)]

Discrete islands

Thioredoxin

Q8L5D4

Glutaredoxin-4

protein disulfide oxidoreductase

L-VVVDFS-A-----TWCGPCKMI-KPFFH-SLSEK
KSSLVVLY-A-----PWCSFSQAM-DESYN-DVAEK
P--ILLYM-KGSPKLPSCGFSAQA-VQALA-AC---

Iron-sulfur cluster assembly
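The three aligned rows on this slide can be compared column by column. The following is a minimal sketch, not the authors' method: it assumes the rows appear in the same order as the labels (Thioredoxin, Q8L5D4, Glutaredoxin-4), and the function names `percent_identity` and `conserved_columns` are hypothetical helpers introduced here for illustration.

```python
# Aligned rows copied from the slide. The label-to-row assignment is an
# assumption based on the order in which the labels appear.
ALIGNMENT = {
    "Thioredoxin":    "L-VVVDFS-A-----TWCGPCKMI-KPFFH-SLSEK",
    "Q8L5D4":         "KSSLVVLY-A-----PWCSFSQAM-DESYN-DVAEK",
    "Glutaredoxin-4": "P--ILLYM-KGSPKLPSCGFSAQA-VQALA-AC---",
}


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def conserved_columns(rows: list[str]) -> list[int]:
    """0-based indices of columns where every row has the same non-gap residue."""
    return [
        i
        for i, column in enumerate(zip(*rows))
        if column[0] != "-" and all(res == column[0] for res in column)
    ]


if __name__ == "__main__":
    names = list(ALIGNMENT)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            pid = percent_identity(ALIGNMENT[first], ALIGNMENT[second])
            print(f"{first} vs {second}: {pid:.1f}% identity")
    print("Conserved columns:", conserved_columns(list(ALIGNMENT.values())))
```

Identity is computed only over columns where both rows have a residue, which is one of several common conventions; dividing by the full alignment length would give lower values.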

[Figure: Cro repressor structures labeled P22 Cro repressor, λ Cro repressor, Afe1, Xfaso 1, and Pfl6, with percentage labels 25%, 42%, 39%, 44%, 42%]

Continuous space

Putative active site (SCREEN)

Formyl-CoA transferase from O. formigenes

NESG Target TM1055 from T. maritima

Coenzyme-A

CoA from Formyl-CoA transferase

SAH from DNA methyltransferase

Tyrosine from tyrosyl-tRNA synthetase

Thiamin diphosphate from DXP synthetase

TM1055

Structural neighbors of TM1055

• 1793 proteins

• 70 SCOP folds

• 3 CATH architectures

• 10 CATH topologies

• 48 CATH homologous superfamilies

• ~500 distinct ligands

“jelly roll” “β-propeller” “β-prism”

virus cell bacterium cell

“jelly roll” “β-propeller”

phagosome lysosome

“β-prism”

Experimental interactions (from BIND+Cellzome)

Modeled interactions: Davis FP, Braberg H, et al. (2006). Nucleic Acids Research 34(10): 2943-52

19,424 12,867

409

target sequences?

sequence similarity

structural similarity

template complex

Modeled complex

Structures from the same SCOP family (non-redundant): 8 (SCOP domain d.17.4.2)

Structures from the same SCOP superfamily (non-redundant): 23 (SCOP domain d.17.4)

Structures from the same SCOP fold (non-redundant): 44 (SCOP domain d.17)

Structural neighbors by structure alignment: 420 (PSD < 0.8; the SCOP domain ID of the green structure is d.17.4.4)
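The counts above contrast neighbors found through the SCOP hierarchy with neighbors found by structure alignment under a PSD cutoff. As a rough sketch of that bookkeeping, the code below splits a SCOP identifier of the form class.fold.superfamily.family into its levels and tallies, for neighbors under the cutoff, the deepest level each shares with the query. The neighbor list and its PSD values are invented for illustration, and `scop_levels`, `closest_shared_level`, and `tally_neighbors` are hypothetical names, not part of any published tool.

```python
from collections import Counter

PSD_CUTOFF = 0.8  # threshold quoted on the slide (PSD < 0.8)


def scop_levels(sccs: str) -> dict[str, str]:
    """Split a SCOP identifier such as 'd.17.4.2' into its hierarchy levels."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    return {
        "class": parts[0],
        "fold": ".".join(parts[:2]),
        "superfamily": ".".join(parts[:3]),
        "family": sccs,
    }


def closest_shared_level(query: str, other: str) -> str:
    """Deepest SCOP level shared by two identifiers, or 'none'."""
    q, o = scop_levels(query), scop_levels(other)
    for level in ("family", "superfamily", "fold", "class"):
        if q[level] == o[level]:
            return level
    return "none"


def tally_neighbors(query: str, neighbors: list[tuple[str, float]]) -> Counter:
    """Count neighbors with PSD below the cutoff by their closest shared level."""
    return Counter(
        closest_shared_level(query, sccs)
        for sccs, psd in neighbors
        if psd < PSD_CUTOFF
    )


if __name__ == "__main__":
    # Hypothetical (identifier, PSD) pairs; not data from the talk.
    example_neighbors = [
        ("d.17.4.2", 0.10),
        ("d.17.4.4", 0.45),
        ("d.17.1.3", 0.70),
        ("c.37.1.1", 0.75),
        ("d.58.1.1", 0.95),
    ]
    print(tally_neighbors("d.17.4.2", example_neighbors))
```

Neighbors tallied under "none" or "class" are the ones a purely hierarchical lookup would miss, which is the contrast the slide draws between 8, 23, and 44 classified relatives and 420 structural neighbors.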

Structure model

Overlap of the modeled interface with the predicted interface (shown in red)

good bad

B. subtilis lethal factor

Pelle

B. subtilis lethal factor

$$LR(c_1, \ldots, c_n) = \prod_{i=1}^{n} LR(c_i) = \prod_{i=1}^{n} \frac{P(c_i \mid I_{xy})}{P(c_i \mid \sim I_{xy})}$$

Gene co-expression profiles

RGS4 block RASD1

CKS1A interact SKP2

CD4 bind TFAP2A

GPNMB contain PPFIBP1

TACR1 require PARP1

GeneWays (literature) Structures

Figure 8. Using a Bayesian method to integrate PPI evidence from various sources. The likelihood ratio of an interaction between two proteins (x and y), $LR(c_1, \ldots, c_n)$, is inferred from different pieces of evidence ($c_i$). Here $P(c_i \mid I_{xy})$ and $P(c_i \mid \sim I_{xy})$ represent the probability that a “clue”, $c_i$, is observed for proteins x and y that are known to interact or not (represented as $I_{xy}$ and $\sim I_{xy}$).
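The evidence integration described in the Figure 8 caption can be sketched in a few lines. This is an illustrative sketch only: it assumes the clues are conditionally independent, so that the combined likelihood ratio is the product of the per-clue ratios, and the probabilities in the example are made up rather than taken from the talk. The function names are hypothetical.

```python
from math import prod


def likelihood_ratio(p_given_interact: float, p_given_not: float) -> float:
    """LR(c_i) = P(c_i | I_xy) / P(c_i | ~I_xy) for a single clue."""
    if p_given_not <= 0:
        raise ValueError("P(c_i | ~I_xy) must be positive")
    return p_given_interact / p_given_not


def combined_likelihood_ratio(clues: list[tuple[float, float]]) -> float:
    """Product of per-clue likelihood ratios.

    Each clue is a pair (P(c_i | I_xy), P(c_i | ~I_xy)). Multiplying the
    ratios assumes the clues are conditionally independent.
    """
    return prod(likelihood_ratio(p, q) for p, q in clues)


if __name__ == "__main__":
    # Hypothetical conditional probabilities for three evidence sources.
    example_clues = {
        "structural model": (0.30, 0.01),
        "gene co-expression": (0.20, 0.05),
        "literature (GeneWays)": (0.10, 0.02),
    }
    for name, (p, q) in example_clues.items():
        print(f"{name}: LR = {likelihood_ratio(p, q):.1f}")
    total = combined_likelihood_ratio(list(example_clues.values()))
    print(f"combined LR = {total:.1f}")
```

A combined ratio above 1 means the clues, taken together, are more probable for interacting pairs than for non-interacting pairs; the conditional probabilities would in practice be estimated from reference sets of known interacting and non-interacting pairs.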

Thioredoxin

Q8L5D4

Glutaredoxin-4

protein disulfide oxidoreductase

L-VVVDFS-A-----TWCGPCKMI-KPFFH-SLSEK
KSSLVVLY-A-----PWCSFSQAM-DESYN-DVAEK
P--ILLYM-KGSPKLPSCGFSAQA-VQALA-AC---

Iron-sulfur cluster assembly

Conclusions

• Structural information needs to be leveraged

• Interactively combining overall function annotation with analysis that depends on local bioinformatic/biophysical features.

• Infrastructure applies equally to analyzing subtle differences within families.

Acknowledgements

NIH grant U54-GM074958

Honig Lab

Markus Fischer

Cliff Zhang

Kely Norel