YMIB
Maze in biology: the pathway problem
Ueng-Cheng Yang (楊永正)
Institute of Bioinformatics
National Yang-Ming University
Nov. 14, 2003
http://www.flint.umich.edu/Departments/ITS/crac/mazeorig.form.html
YMIB
embryonic development
oogenesis
mRNA localization
sperm
oocyte
fertilization
1st cleavage: 2 identical cells
2nd cleavage: 4 identical cells
3rd cleavage: 8 cells with 2 different cell types
Genome is the complete set of genetic material, which is similar to the programs in the ROM
YMIB
Gene expression of eukaryotes
Picture taken from Lehninger’s “Principles of Biochemistry”
YMIB
A microarray (gene chip) is a high-throughput technique that can measure the expression of thousands of genes at a time
Black box
Changes in gene expression
Perturbation
YMIB
Presentation of life and knowledge management
Sequence information
decompress
Expression level
Tissue (spatial)
Development (temporal)
Genes
YMIB
Transform or out of the game?
http://www.sciencemag.org/cgi/content/full/291/5507/1221/F1
Global: high-throughput analysis
Local: individual analysis
YMIB
Bioinformatics should provide the direction for future biology
Bioinformatics research
Genome, transcriptome and proteome research
Collect data
Interpret data
tatttctctactgatttgaacaagattgtcgagaaattcccaaaacaagccgaaaaattg
Data => Information => Knowledge => Technique => Economy
YMIB
Are there rules in biology?
* Picture made from screenshot of http://www.shef.ac.uk/~chem/web-elements/
YMIB
Should there be rules in biology?
Gene duplication
Variation (mutation)
Gene duplication
Recombination
+
YMIB
Pathway study is one of the most fundamental problems in biological research at the molecular level
• Metabolism
• Signal transduction
• Biosynthesis of macromolecules (mechanism study)
– Replication
– Transcription
– RNA processing
– Translation
YMIB
Similar chemistry can be re-used in different enzymes
α-ketoglutarate + NAD+ + CoASH → succinyl CoA + CO2 + NADH + H+
HOOC-CH2-CH2-CO-COOH → HOOC-CH2-CH2-CO-S-CoA
YMIB
Paralogous genes may have similar functions
Linear molecule
pyruvate (3) → acetyl CoA (2) + CO2
α-ketobutyrate (4) → propionyl CoA (3) + CO2
α-ketoglutarate (5) → succinyl CoA (4) + CO2
α-ketoadipic acid (6) → glutaryl CoA (5) + CO2
Branched molecule
α-ketoisovalerate (5) → isobutyryl CoA (4) + CO2
α-ketoisocaproic acid (6) → isovaleryl CoA (5) + CO2
α-keto-β-methylvalerate (6) → α-methylbutyryl CoA (5) + CO2
YMIB
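Observation (III): “Dehydrogenation, hydration, dehydrogenation” is a pathway module
TCA cycle
OAA + acetyl CoA → citrate
citrate → isocitrate
isocitrate → α-ketoglutarate (-2H, -CO2)
α-ketoglutarate + CoA → succinyl CoA (-2H, -CO2)
succinyl CoA → succinate (CoA + GTP)
succinate → fumarate (-2H)
fumarate → malate (+H2O)
malate → OAA (-2H)
release CO2: isocitrate to succinyl CoA
reforming the carrier: succinate to OAA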
YMIB
A set of reactions can be “re-used” together
RCH2CH2CH2C(=O)-S-CoA
-2H →
RCH2CH=CHC(=O)-S-CoA
+H2O →
RCH2CH(OH)CH2C(=O)-S-CoA
-2H →
RCH2C(=O)CH2C(=O)-S-CoA
RCH2CH2CH2CH2CH2CH2C(=O)-S-CoA → RCH2CH2CH2CH2C(=O)-S-CoA + Acetyl CoA
RCH2CH2CH2CH2C(=O)-S-CoA → RCH2CH2C(=O)-S-CoA + Acetyl CoA
YMIB
A single reaction may create a new pathway
[Figure: carbon-number rearrangements of sugars by transketolase and transaldolase (for example 5 + 5 → 3 + 7 and 3 + 7 → 4 + 6), compared between photosynthesis and the pentose phosphate cycle]
YMIB
The pathway problems that might be obvious to physicists
Pathway simulation => hypothetical cell
– Flux balance analysis
– S-system
– … etc.
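As a rough illustration of the S-system approach named above, the sketch below integrates the standard power-law form dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij with a fixed-step forward Euler scheme. The two-metabolite chain, all parameter values, and the function names are hypothetical and chosen only for illustration; they do not come from the talk.

```python
def s_system_rates(x, alpha, g, beta, h):
    """Return dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    rates = []
    for i in range(len(x)):
        production = alpha[i]
        degradation = beta[i]
        for j, xj in enumerate(x):
            production *= xj ** g[i][j]
            degradation *= xj ** h[i][j]
        rates.append(production - degradation)
    return rates


def simulate(x0, alpha, g, beta, h, dt=0.01, steps=5000):
    """Integrate the S-system with fixed-step forward Euler."""
    x = list(x0)
    for _ in range(steps):
        dx = s_system_rates(x, alpha, g, beta, h)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical chain (assumption): constant input -> X1 -> X2 -> output.
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, 0.0],   # X1 is produced at a constant rate
     [1.0, 0.0]]   # X2 production is proportional to X1
h = [[1.0, 0.0],   # X1 is consumed in proportion to X1
     [0.0, 1.0]]   # X2 is consumed in proportion to X2

steady = simulate([0.5, 0.5], alpha, g, beta, h)
```

With these toy parameters both concentrations settle near 2.0, the point where production and degradation balance. A real model would need measured kinetic orders and a stiffer integrator.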
YMIB
Complicated feedback regulation
A B C D
W X Y Z (-)
(-)
"X" (such as ADP) will accumulate if this reaction is inhibited.
YMIB
M
G1
S
G2
Cell cycle and simulation of complex biological events
M G1 S G2 M
interphase
YMIB
Other types of pathway problems
• Pathway discovery
– From protein-protein interaction and microarray
• Pathway reconstruction
– Genome annotation and interpretation
• Pathway simulation => hypothetical cell
– Flux balance analysis
– S-system
YMIB
Information integration is the first step for data mining
Modification, expression, interaction, structure
DNA
RNA
transcription
translation
protein
Genomic seq.
EST, SAGE, Gene chips
Annotation, comparison
YMIB
Different cells have the same genome, but they express different sets of genes after differentiation
Gene    Colon  Kidney  Lung  Ovary  Small intestine  Testis  Thyroid  …  Total
EGF     0      15      1     0      0                0       0        …  19
EGFR    3      4       19    9      0                0       0        …  103
PLCG1   1      3       7     1      2                1       0        …  68
SHC1    4      10      22    1      0                3       1        …  249
GRB2    1      1       3     2      0                0       2        …  77
SOS1    4      3       0     2      0                0       0        …  36
HRAS    1      7       10    0      2                1       0        …  58
RAF1    4      6       28    1      3                4       0        …  197
MAP3K1  2      8       2     2      0                0       0        …  44
MAP2K4  5      6       1     3      1                4       0        …  81
MAP2K1  4      10      3     2      0                2       0        …  82
MAPK8   1      2       2     0      0                1       0        …  33
STAT1   13     32      14    6      4                6       3        …  260
STAT3   3      7       17    7      0                1       0        …  135
MAPK3   9      10      9     4      1                1       0        …  181
YMIB
Organizing the known information: Integrating different types of pathways
Signal transduction
Gene regulatory network
Metabolic pathway
CDK E2F PFK
F6P
F1,6P
EGF
Glycolysis
YMIB
Steps in pathway discovery
Factors involved => Components
Molecular interaction => Events
Order of events => Pathways
Pathway interaction => Circuits
YMIB
The dream of molecular biologists
?
Cell., 100(1):57–70 Review, 2000.
PNAS, Vol. 95, 14863-14868
Science. Vol 292. May,2001
YMIB
Appropriate presentation format is essential for computation
[EGFR]+[EGF] <-> [EGF-EGFR]
[EGF-EGFR]+[EGF-EGFR] <->[(EGF-EGFR)2]
[(EGF-EGFR)2]<->[(EGF-EGFR*)2]
[(EGF-EGFR*)2]+[GAP]<->[(EGF-EGFR*)2-GAP]
[(EGF-EGFR*)2-GAP]+[Grb2]<->[(EGF-EGFR*)2-GAP-Grb2]
[(EGF-EGFR*)2-GAP-Grb2]+[Sos]<->[(EGF-EGFR*)2-GAP-Grb2-Sos]
[(EGF-EGFR*)2-GAP-Grb2-Sos]+[Ras-GDP]<->[(EGF-EGFR*)2-GAP-Grb2-Sos-Ras-GDP]
[(EGF-EGFR*)2-GAP-Grb2-Sos-Ras-GDP]<->[(EGF-EGFR*)2-GAP-Grb2-Sos]+[Ras-GTP]
[Raf]+[Ras-GTP]<->[Raf-Ras-GTP]
[Raf-Ras-GTP]<->[Raf*]+[Ras-GTP*]
Nature biotechnology 20, 370-375
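The bracketed reaction notation above is regular enough to be parsed by a program. The following is a minimal sketch, assuming every species is wrapped in square brackets and `<->` separates the two sides; the helper names `parse_reaction` and `species_index` are hypothetical, and only four of the listed reactions are used as input.

```python
import re
from collections import defaultdict

# A few of the reactions listed on the slide, copied as written.
REACTIONS = [
    "[EGFR]+[EGF] <-> [EGF-EGFR]",
    "[EGF-EGFR]+[EGF-EGFR] <->[(EGF-EGFR)2]",
    "[(EGF-EGFR)2]<->[(EGF-EGFR*)2]",
    "[Raf]+[Ras-GTP]<->[Raf-Ras-GTP]",
]


def parse_reaction(line):
    """Split '[A]+[B] <-> [C]' into (reactants, products) lists of species names."""
    left, right = line.split("<->")
    reactants = re.findall(r"\[([^\]]+)\]", left)
    products = re.findall(r"\[([^\]]+)\]", right)
    return reactants, products


def species_index(lines):
    """Map each species to the indices of the reactions it takes part in."""
    index = defaultdict(list)
    for i, line in enumerate(lines):
        reactants, products = parse_reaction(line)
        for name in set(reactants + products):
            index[name].append(i)
    return dict(index)


index = species_index(REACTIONS)
```

Once reactions are stored this way, shared species such as `EGF-EGFR` link consecutive steps, which is what makes the format usable for computation.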
YMIB
Strategy
Nucleus
cell membrane
Z
outward reconstruction
Y
X
?
?
inward reconstruction
Receptor
adaptor
?
?
connector
YMIB
Reconstructing pathways based on protein-protein interaction
Receptor
adaptor
… etc.
inward reconstruction
YMIB
Identifying new receptors is the starting point for inward reconstruction
YMIB
The distribution of death domain-containing genes in the human genome
[Figure: genome map with numbered markers for the death domain-containing genes]
YMIB
Phylogenetic clusters correlate with protein functions
[Figure: phylogenetic tree of death domain-containing genes, grouped into clusters A to F (scale bar 0.1). Labeled genes: UNC5D, UNC5A, UNC5B, UNC5C, NFKB2, NFKB1, DAPK1, NY-REN-64, MALT1, IRAK2, IRAK1, IRAK-M, EDAR, NGFR, CRADD, FADD, TRADD, RIPK1, TNFRSF21, LRDD, TNFRSF12, TNFRSF1A, TNFRSF10A, TNFRSF10B, TNFRSF11B, TNFRSF6, P84, MYD88, ANK3, ANK1, ANK2]
YMIB
Functional correlation: Tissue specificity of gene expression
brain tissues
Paralogous genes
YMIB
Specificity of protein-protein interaction
[Figure: phylogenetic tree of death domain-containing genes with clusters A to F, repeated from the slide “Phylogenetic clusters correlate with protein functions”]
TNFRSF1A, 12 --- TRADD --- FADD
TNFRSF6, 10A, 10B --- FADD
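Inward reconstruction from a receptor can be pictured as a path search over a protein-protein interaction graph. The sketch below assumes the interactions listed on this slide are undirected edges and uses a breadth-first search; the function names are hypothetical and this is not presented as the method used in the talk.

```python
from collections import deque

# Interactions read off the slide (receptor --- adaptor --- adaptor).
INTERACTIONS = [
    ("TNFRSF1A", "TRADD"), ("TNFRSF12", "TRADD"), ("TRADD", "FADD"),
    ("TNFRSF6", "FADD"), ("TNFRSF10A", "FADD"), ("TNFRSF10B", "FADD"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from interaction pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def inward_path(graph, receptor, target):
    """Return the shortest interaction path from receptor to target, or None."""
    queue = deque([[receptor]])
    seen = {receptor}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(INTERACTIONS)
path = inward_path(graph, "TNFRSF1A", "FADD")
```

Here TNFRSF1A reaches FADD through TRADD, while TNFRSF6 binds FADD directly, which mirrors the specificity shown on the slide.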
YMIB
Reconstructing pathways based on gene expression and pathway information
Nucleus
cell membrane
Jun
outward reconstruction
MAPK8-P*
MAPK8-P*
MAP2K4-P*
?
YMIB
Related pathways in heart
YMIB
Related pathways can be discovered by looking for shared components among pathways
[Figure: network of pathways; edge labels give the number of shared components, ranging from 13 to 25]
Pathway1      Pathway2         Index
pdgfPathway   egfPathway       1.96e-40
pdgfPathway   tpoPathway       9.89e-27
pdgfPathway   igf1Pathway      2.26e-22
pdgfPathway   insulinPathway   2.26e-22
egfPathway    igf1Pathway      2.26e-22
egfPathway    insulinPathway   2.26e-22
pdgfPathway   ngfPathway       2.20e-22
…             …                …
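The slide does not say how the Index column is computed. One common way to score the overlap between two component sets is a hypergeometric tail probability, sketched below purely as an assumption; the function name and the example numbers are hypothetical and are not taken from the table.

```python
from math import comb


def shared_component_pvalue(n_total, n_a, n_b, n_shared):
    """Probability of sharing at least n_shared components by chance.

    n_total: size of the component universe (assumption: all known genes)
    n_a, n_b: sizes of the two pathways
    n_shared: observed number of shared components
    """
    upper = min(n_a, n_b)
    denominator = comb(n_total, n_b)
    numerator = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_shared, upper + 1)
    )
    return numerator / denominator


# Toy numbers only: two 3-component pathways drawn from 10 components.
p_all_shared = shared_component_pvalue(10, 3, 3, 3)
```

A smaller value means the overlap is less likely to be accidental, so pathway pairs can be ranked by it in the same way the table ranks them by Index.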
YMIB
To die, or not to die? It’s a signaling problem
YMIB
If the PDGF receptor does not exist in the colon, why do we need the downstream components of the PDGF signaling pathway?
YMIB
“MAP2K4, MAPK8, Jun” is a pathway module shared by at least 3 pathways
PDGF 11
EGF 11
TNF 21
EGF/PDGF 16
ALL 4
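Finding a module shared by several pathways amounts to counting how many pathways each component belongs to. In the sketch below the three component lists are illustrative placeholders, not the memberships used in the talk, and `shared_module` is a hypothetical helper.

```python
from collections import Counter

# Illustrative memberships only (assumption); real lists would come
# from a pathway database.
PATHWAYS = {
    "PDGF": {"PDGFRB", "GRB2", "SOS1", "HRAS", "MAP3K1", "MAP2K4", "MAPK8", "JUN"},
    "EGF": {"EGF", "EGFR", "GRB2", "SOS1", "HRAS", "MAP3K1", "MAP2K4", "MAPK8", "JUN"},
    "TNF": {"TNFRSF1A", "TRADD", "TRAF2", "MAP2K4", "MAPK8", "JUN"},
}


def shared_module(pathways, min_pathways):
    """Return components that occur in at least min_pathways pathways."""
    counts = Counter(c for members in pathways.values() for c in members)
    return {c for c, n in counts.items() if n >= min_pathways}


module = shared_module(PATHWAYS, 3)
```

With these placeholder sets the components present in all three pathways are MAP2K4, MAPK8 and JUN; lowering `min_pathways` to 2 also returns components shared only by the PDGF and EGF lists.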
YMIB
Pathway modules
MAP3K1 (MEKK1) module
RAF1 (RAF) module
MAP3K7 (TAK) module
Death signal
Growth signal
Stress signal
HRAS
TRAF2
FOS JUN ATF2 SP1
Gene expression regulation (including transcription and splicing), translation and protein modification…
RPS6KA5
YMIB
Connector
Factors involved => Components
Molecular interaction => Events
Order of events => Pathways
YMIB
Inducible gene sets are co-regulated.
Picture taken from http://genomics.stanford.edu/yeast/additional_figures_link.html
YMIB
Most constitutively expressed genes are not regulated
Pyruvate kinase
Rate-limiting step is usually the target for regulation
YMIB
A microarray experiment is nature’s way to classify genes
Collect sections from different angles
Image reconstruction
http://www.npcc.gov.tw/npcc/chn/imaging/imaging.htm
Tomography
YMIB
In extreme environments, the whole pathway can be turned on/off
ALPHA = alpha factor arrest 18
ELU = centrifugal elutriation 14
CDC15 = cdc15 ts 15
SPO = sporulation 7
HT = shock by high temp 6
D = reducing agent 4
C = low temp 4
DX = diauxic shift 7
Clustering is driven by these features
ALPHA ELU CDC15 SPO HT D C DX
Conflicts?
YMIB
Unrelated sequences of similar function cluster together
Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868.
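Clustering of this kind rests on a similarity score between expression profiles. The following is a minimal sketch, assuming a plain Pearson correlation as the score; the three profiles are hypothetical values over the eight condition groups (ALPHA, ELU, CDC15, SPO, HT, D, C, DX) and the function names are illustrative.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally long expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def most_similar_pair(profiles):
    """Return (correlation, gene1, gene2) for the best-correlated pair."""
    names = sorted(profiles)
    best = None
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(profiles[first], profiles[second])
            if best is None or r > best[0]:
                best = (r, first, second)
    return best


# Hypothetical log-ratio profiles (assumption, not measured data).
PROFILES = {
    "gene_a": [0.1, 0.2, 0.1, 1.5, 2.0, 1.8, 1.9, 2.2],
    "gene_b": [0.0, 0.3, 0.2, 1.4, 2.1, 1.7, 2.0, 2.4],
    "gene_c": [1.0, 0.9, 1.1, -0.5, -1.0, -0.8, -0.9, -1.2],
}

best = most_similar_pair(PROFILES)
```

Hierarchical clustering repeats this step, merging the most similar pair and recomputing similarities, so genes with unrelated sequences but matching profiles end up side by side.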
YMIB
How good is the classification?
• In microarray clustering
– hexokinase II
– phosphofructokinase
– aldolase
– triose phosphate isomerase
– GAPDH 1, 2, 3
– phosphoglycerate kinase
– phosphoglycerate mutase
– Enolase II
– pyruvate kinase
• In glycolysis, 10 enzymes are involved in total
• The microarray experiment missed only phosphoglucose isomerase
• Pyruvate (de)carboxylase and transaldolase are misplaced
Pretty good
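The "pretty good" verdict can be put into numbers as precision and recall. The sketch below assumes that isoform labels (hexokinase II, GAPDH 1, 2, 3, Enolase II) collapse to one enzyme each, and that the two misplaced enzymes sit inside the cluster as a single entry each; both are assumptions, since the slide does not spell them out.

```python
GLYCOLYSIS = {
    "hexokinase", "phosphoglucose isomerase", "phosphofructokinase",
    "aldolase", "triose phosphate isomerase", "GAPDH",
    "phosphoglycerate kinase", "phosphoglycerate mutase",
    "enolase", "pyruvate kinase",
}

# Cluster contents: nine glycolytic enzymes plus the two misplaced ones.
CLUSTER = (GLYCOLYSIS - {"phosphoglucose isomerase"}) | {
    "pyruvate (de)carboxylase", "transaldolase",
}


def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted set against a reference set."""
    hits = predicted & reference
    return len(hits) / len(predicted), len(hits) / len(reference)


precision, recall = precision_recall(CLUSTER, GLYCOLYSIS)
```

Under these assumptions the cluster recovers 9 of the 10 glycolytic enzymes (recall 0.9) and 9 of its 11 members are correct.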
YMIB
Pathway is a subset of components in a regulatory network
How can we reconstruct the network from partial pathways?
YMIB
Tri-component relation is better than bi-component relation
YMIB
Distinguishing branch and linear structures is sufficient
YMIB
Distinguish the branch and linear structures
YMIB
Exact order within a subset is not essential to reconstruct the pathway
3 4 5 6 7
{3,4,5} {4,5,6} {5,6,7}
{4,5,3} {5,4,6} {7,5,6}
3=>4=>5=>6=>7
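The example above can be checked mechanically: unordered triples of neighboring components are enough to recover a linear order, provided the chain has at least five components and one end is known. The sketch below is one simple way to do this under those assumptions; `reconstruct_chain` is a hypothetical helper, not the algorithm used in the talk.

```python
from collections import Counter


def reconstruct_chain(triples, start):
    """Order the nodes of a linear pathway from unordered overlapping triples.

    start must be a terminal node, i.e. one that occurs in exactly one triple.
    The direction of the chain cannot be read from unordered sets, so the
    caller has to supply it.
    """
    sets = [frozenset(t) for t in triples]
    counts = Counter(node for s in sets for node in s)
    if counts[start] != 1:
        raise ValueError("start must occur in exactly one triple")
    current = next(s for s in sets if start in s)
    ordered = [current]
    remaining = [s for s in sets if s is not current]
    while remaining:
        # Neighboring triples on a linear chain share exactly two nodes.
        nxt = next((s for s in remaining if len(s & current) == 2), None)
        if nxt is None:
            raise ValueError("triples do not form a single linear chain")
        ordered.append(nxt)
        remaining.remove(nxt)
        current = nxt
    first, last = {}, {}
    for i, s in enumerate(ordered):
        for node in s:
            first.setdefault(node, i)
            last[node] = i
    # A node's first and last triple index together fix its position.
    return sorted(first, key=lambda n: (first[n], last[n]))


order = reconstruct_chain([{5, 4, 6}, {7, 5, 6}, {4, 5, 3}], start=3)
# -> [3, 4, 5, 6, 7]
```

Starting from the other end (`start=7`) returns the reversed list, which is why the order inside each subset does not matter but one anchor for the direction does.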
YMIB
Integrating discontinuous tri-component relation
YMIB
Summary
• Inward reconstruction
– Look for novel receptors by protein domain search
– Look for possible pathways by protein-protein interaction information
• Connector
– Look for tri-component relations by learning a Bayesian network
• Outward reconstruction
– Look for pathway modules
– Establish the transcription regulation network
Need a user-centric environment for information-driven biomedical research
YMIB
Acknowledgements
• Yuh-Fan Liu: Genome wide motif scanning
• Yung-Wen Deng: Death domain resource and cross talks among pathways
• Yu-Tai Wang: Pathway knowledge management system
• Kai-Lung Tang: Pathway visualization
• Shih-Te Yang: Pathway prediction
• Collaborator: Dr. Der-Ming Liou
YMIB
Complications in regulation
Alternative pathways caused by alternative splicing events
YMIB
Differential Processing of The Calcitonin Gene Transcript in Rats
Picture taken from Lehninger’s “Principles of Biochemistry”
YMIB
A tumor necrosis factor receptor that lacks a transmembrane region
YMIB
A FADD protein that lacks the DED domain
YMIB
Information-driven biomedical research
Make observations and working hypotheses by comparing information