Machine Reading for Cancer Panomics · Problem: Hard to scale U.S. 2014: 1.6 million new cases,...

Page 1

Machine Reading for Cancer Panomics

Hoifung Poon

Page 2

Overview

[Figure: High-Throughput Data (DNA sequences) · KB · Cancer Systems Modeling · Disease Genes · Drug Targets]

Page 3

Overview

[Figure: High-Throughput Data (DNA sequences) · KB · Disease Genes · Drug Targets · Extract Pathways from PubMed · Grounded Semantic Parsing]

Page 5

Panomics

[Figure: Genome · Transcriptome · Epigenome]

Page 6

Genotype Phenotype

[Figure: High-Throughput Data (DNA sequences) · ? · Disease Genes · Drug Targets]

Page 7

Precision Medicine

Page 8

Vemurafenib on BRAF-V600 Melanoma

[Figure panels: Before Treatment · 15 Weeks]

Page 9

Vemurafenib on BRAF-V600 Melanoma

[Figure panels: Before Treatment · 15 Weeks · 23 Weeks]

Page 10

Cancer

Hundreds of mutations

Most are “passengers”, not drivers

Can we identify likely drivers?

[Figure: DNA sequences from Normal cells vs. Tumor cells]

Page 11

Traditional Biology

[Figure: Targeted Experiments · One hypothesis · Discovery]

Page 12

Genomics

[Figure: High-Throughput Experiments (DNA sequences) · Many hypotheses · ? · Discovery]

Page 13

Genomics

[Figure: High-Throughput Experiments (DNA sequences) · Discovery]

Bottleneck #1: Knowledge

Bottleneck #2: Reasoning

Page 14

Example: Tumor Molecular Board

www.ucsf.edu/news/2014/11/120451/bridging-gap-precision-medicine

Page 15

Example: Tumor Molecular Board

10-20 highly trained specialists

Tens of hours on each patient

Problem: Hard to scale

U.S. 2014: 1.6 million new cases, 585K deaths

Wanted: Decision support for clinical genomics

Page 16

Pathway Knowledge

Genes work synergistically in pathways

Page 17

Why Hard to Identify Drivers?

Complex diseases perturb multiple pathways

Hanahan & Weinberg [Cell 2011]

Page 18

Why Does Cancer Come Back?

Subtypes with alternative pathway profiles

Compensatory pathways can be activated

[Figure: EphA2 · EphB2 · Ovarian Cancer]

Page 19

Why Does Cancer Come Back?

Subtypes with alternative pathway profiles

Compensatory pathways can be activated

[Figure: EphA2 · EphB2 · Ovarian Cancer · X]

Page 20

Cancer Systems Modeling

[Figure: Gene A: DNA → mRNA → Protein → Protein Active, via Transcription, Translation, Activation; labels: … ATTCGGATATTTAAGGC …, Functional activity, Mutation effect, Drug Target, ……]

Page 21

Knowledge Model

[Figure: Genes A, B and C, each shown as DNA → mRNA → Protein → Protein Active; labels: Transcription Factor, Protein Kinase]

Page 22

Knowledge Model

[Figure: Genes A, B and C, each shown as DNA → mRNA → Protein → Protein Active; labels: Transcription Factor, Protein Kinase; marked “?”]

Page 24

Knowledge Model

[Figure: Genes A, B and C, each shown as DNA → mRNA → Protein → Protein Active; labels: Transcription Factor, Protein Kinase; marked “!”]

Page 25

Approach: Graph HMM

[Figure: Genes A, B and C, each shown as DNA → mRNA → Protein → Protein Active; labels: Transcription Factor, Protein Kinase]
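The slide names the approach (a Graph HMM over each gene's DNA → mRNA → protein → active chain, with regulators linking genes) but gives no equations. As intuition only, here is a minimal chain-structured HMM for a single gene, assuming binary on/off hidden states per stage, made-up transition and emission probabilities, and an observation (e.g., a high or low RNA-Seq level) only at the mRNA stage. It is a hypothetical sketch of exact forward inference on a chain, not the model from the talk; `forward_posterior` and every parameter value are illustrative assumptions.

```python
# Hypothetical single-gene chain: DNA -> mRNA -> Protein -> Active.
# Hidden state per stage: 0 = off, 1 = on. Observations: 0 = low, 1 = high.
STAGES = ["DNA", "mRNA", "Protein", "Active"]

# Illustrative parameters (assumptions, not estimated from any data).
PRIOR_ON = 0.5            # P(DNA stage on)
TRANS = [[0.9, 0.1],      # P(next state | previous state = off)
         [0.2, 0.8]]      # P(next state | previous state = on)
EMIT = [[0.8, 0.2],       # P(observation | state = off)
        [0.3, 0.7]]       # P(observation | state = on)


def forward_posterior(observations, prior_on=PRIOR_ON, trans=TRANS, emit=EMIT):
    """Return P(last stage on | observations) with the forward algorithm.

    `observations` has one entry per stage; None marks an unobserved stage.
    """
    if len(observations) != len(STAGES):
        raise ValueError("need one observation slot per stage")
    alpha = [1.0 - prior_on, prior_on]
    for i, obs in enumerate(observations):
        if i > 0:
            # Propagate the joint belief one step along the chain.
            alpha = [sum(alpha[p] * trans[p][n] for p in (0, 1)) for n in (0, 1)]
        if obs is not None:
            # Weight each hidden state by the likelihood of the observation.
            alpha = [alpha[s] * emit[s][obs] for s in (0, 1)]
    return alpha[1] / (alpha[0] + alpha[1])


no_data = forward_posterior([None, None, None, None])
mrna_high = forward_posterior([None, 1, None, None])
mrna_low = forward_posterior([None, 0, None, None])
print(f"P(active): no data {no_data:.3f}, mRNA high {mrna_high:.3f}, mRNA low {mrna_low:.3f}")
```

Evidence at the mRNA stage shifts the belief about the downstream "active" stage in the expected direction; a graph-structured model would additionally pass such evidence across genes through regulator edges.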

Page 26

Extract Pathways from PubMed

[Figure: High-Throughput Data (DNA sequences) · KB · Disease Genes · Drug Targets]

Page 27

PubMed

24 million abstracts

Two new abstracts every minute

Adds over one million every year
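The two rates on this slide agree: at two abstracts per minute, a 365-day year (ignoring leap years) comes to just over one million new abstracts. A quick check:

```python
# Rate quoted on the slide: two new PubMed abstracts every minute.
ABSTRACTS_PER_MINUTE = 2
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes; ignores leap years

abstracts_per_year = ABSTRACTS_PER_MINUTE * MINUTES_PER_YEAR
print(f"{abstracts_per_year:,} abstracts per year")  # 1,051,200 abstracts per year
```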


Page 28

Machine Reading

[Figure: text snippets “VDR+ binds to SMAD3 to form …” and “JUN expression is induced by SMAD3/4 …”, from PMID: 123, PMID: 456, ……]

Page 29

Machine Reading

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

Page 30

Machine Reading

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

[Figure: entity mentions IL-10, gp41, p70(S6)-kinase (PROTEIN) and human monocyte (CELL)]

Page 31

Machine Reading

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

[Figure: event structure. Entities: IL-10, gp41, p70(S6)-kinase (PROTEIN); human monocyte (CELL). Events (REGULATION): “up-regulation” (Theme: IL-10, Cause: gp41, Site: human monocyte); “activation” (Theme: p70(S6)-kinase); “Involvement” (Theme: up-regulation, Cause: activation)]

Page 32

Machine Reading

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

[Figure: event structure. Entities: IL-10, gp41, p70(S6)-kinase (PROTEIN); human monocyte (CELL). Events (REGULATION): “up-regulation” (Theme: IL-10, Cause: gp41, Site: human monocyte); “activation” (Theme: p70(S6)-kinase); “Involvement” (Theme: up-regulation, Cause: activation)]

Semantic Parsing

Page 33

Long Tail of Variations

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……

Negative regulation: 532 inhibited, 252 inhibition, 218 inhibit, 207 blocked, 175 inhibits, 157 decreased, 156 reduced, 112 suppressed, 108 decrease, 86 inhibitor, 81 Inhibition, 68 inhibitors, 67 abolished, 66 suppress, 65 block, 63 prevented, 48 suppression, 47 blocks, 44 inhibiting, 42 loss, 39 impaired, 38 reduction, 32 down-regulated, 29 abrogated, 27 prevents, 27 attenuated, 26 repression, 26 decreases, 26 down-regulation, 25 diminished, 25 downregulated, 25 suppresses, 22 interfere, 21 absence, 21 repress ……
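To make the long tail concrete, the sketch below tallies the 35 (count, trigger) pairs listed on the slide. The list itself is truncated (“……”), so the shares are relative to the listed triggers only and overstate how much the head covers. Case variants such as “inhibition” and “Inhibition” are kept separate, as on the slide.

```python
from itertools import accumulate

# (count, trigger) pairs for negative regulation, as listed on the slide (truncated list).
TRIGGERS = [
    (532, "inhibited"), (252, "inhibition"), (218, "inhibit"), (207, "blocked"),
    (175, "inhibits"), (157, "decreased"), (156, "reduced"), (112, "suppressed"),
    (108, "decrease"), (86, "inhibitor"), (81, "Inhibition"), (68, "inhibitors"),
    (67, "abolished"), (66, "suppress"), (65, "block"), (63, "prevented"),
    (48, "suppression"), (47, "blocks"), (44, "inhibiting"), (42, "loss"),
    (39, "impaired"), (38, "reduction"), (32, "down-regulated"), (29, "abrogated"),
    (27, "prevents"), (27, "attenuated"), (26, "repression"), (26, "decreases"),
    (26, "down-regulation"), (25, "diminished"), (25, "downregulated"),
    (25, "suppresses"), (22, "interfere"), (21, "absence"), (21, "repress"),
]

counts = [c for c, _ in TRIGGERS]  # already sorted from most to least frequent
total = sum(counts)
top_share = counts[0] / total

# Smallest k such that the k most frequent triggers cover half of the listed mentions.
k_half = next(k for k, cum in enumerate(accumulate(counts), start=1) if cum >= total / 2)

print(f"{len(TRIGGERS)} triggers, {total} mentions")
print(f"top trigger covers {top_share:.1%}; {k_half} triggers needed for 50%")
```

Even the most frequent trigger accounts for under a fifth of these mentions, and six different words are needed to cover half of them.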

Page 34

Bottleneck: Annotated Examples

GENIA (BioNLP Shared Task 2009-2013)

1999 abstracts

MeSH: human, blood cell, transcription factor

Challenge for “supervised” machine learning

Can we breach this bottleneck?


Page 35

Free Lunch #1: Distributional Similarity

Similar context → probably similar meaning

Annotation as latent variables

Textual expression → recursive clusters

Unsupervised semantic parsing


Poon & Domingos, “Unsupervised Semantic Parsing”, EMNLP 2009. Best Paper Award.
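The intuition behind “similar context → probably similar meaning” can be shown with plain context-count vectors. This is a much simpler stand-in, not the USP algorithm from the cited paper (which clusters expressions recursively over dependency structures); the toy corpus, the one-token context window, and the helper names are assumptions for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus of short regulation statements.
CORPUS = [
    "TP53 inhibits BCL2",
    "TP53 suppresses BCL2",
    "MDM2 inhibits TP53",
    "MDM2 suppresses TP53",
    "TP53 activates CDKN1A",
]


def context_vector(word, corpus=CORPUS):
    """Count the left and right neighbours (window of one token) of `word`."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            if i > 0:
                vec["L:" + tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                vec["R:" + tokens[i + 1]] += 1
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


inhibits = context_vector("inhibits")
print(cosine(inhibits, context_vector("suppresses")))  # 1.0: identical contexts
print(cosine(inhibits, context_vector("activates")))   # lower: only "L:TP53" is shared
```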

Page 36

Recursive Clustering

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……

Page 39

Recursive Clustering

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……

BCL2, BCL-2 proteins, B-cell CLL/Lymphoma 2, ……

TP53, Tumor suppressor P53, ……

inhibits, down-regulates, suppresses, inhibition, …

[Figure labels: Theme, Cause]

Page 40

Free Lunch #2: Existing KBs

Many KBs available

Gene/Protein: GenBank, UniProt, …

Pathways: NCI, Reactome, KEGG, BioCarta, …

Annotation as latent variables

Textual expression → table, column, join, …

Grounded semantic parsing


Page 41

Relation Extraction

NCI-PID Pathway KB

Regulation  Theme  Cause
Positive    A2M    FOXO1
Positive    ABCB1  TP53
Negative    BCL2   TP53
…           …      …

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……

Page 42

Relation Extraction

NCI-PID Pathway KB

Regulation  Theme  Cause
Positive    A2M    FOXO1
Positive    ABCB1  TP53
Negative    BCL2   TP53
…           …      …

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……

Distant Supervision
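The distant-supervision idea on this slide can be sketched in a few lines: treat a sentence as a (noisy) positive example of a KB relation whenever it mentions both the theme and the cause of that KB row. The KB rows and sentences below come from the slide; the synonym lexicon and the function names `genes_in` and `distant_labels` are hypothetical, and a real system would use proper gene-name normalization rather than string matching.

```python
import re

# KB rows from the NCI-PID excerpt on the slide: (regulation, theme, cause).
KB = [
    ("Positive", "A2M", "FOXO1"),
    ("Positive", "ABCB1", "TP53"),
    ("Negative", "BCL2", "TP53"),
]

# Hypothetical synonym lexicon: gene symbol -> surface strings seen in text.
SYNONYMS = {
    "TP53": ["TP53", "P53"],
    "BCL2": ["BCL2", "BCL-2", "B-cell CLL/Lymphoma 2"],
    "A2M": ["A2M"],
    "FOXO1": ["FOXO1"],
    "ABCB1": ["ABCB1"],
}


def genes_in(sentence):
    """Return gene symbols with at least one synonym occurring as a standalone span."""
    found = set()
    for gene, names in SYNONYMS.items():
        for name in names:
            # Lookarounds stop "P53" from matching inside "TP53".
            if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", sentence):
                found.add(gene)
                break
    return found


def distant_labels(sentence, kb=KB):
    """Label a sentence with every KB row whose theme and cause it both mentions."""
    genes = genes_in(sentence)
    return [row for row in kb if row[1] in genes and row[2] in genes]


SENTENCES = [
    "TP53 inhibits BCL2.",
    "Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.",
    "BCL2 transcription is suppressed by P53 expression.",
    "The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 ...",
]

for s in SENTENCES:
    print(distant_labels(s), "<-", s)
```

All four sentences receive the label (Negative, BCL2, TP53) without any hand annotation. The labels are noisy, since co-occurrence alone does not guarantee that a sentence actually states the KB relation.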

Page 43

Nested Events

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

[Figure: event structure. Entities: IL-10, gp41, p70(S6)-kinase (PROTEIN); human monocyte (CELL). Events (REGULATION): “up-regulation” (Theme: IL-10, Cause: gp41, Site: human monocyte); “activation” (Theme: p70(S6)-kinase); “Involvement” (Theme: up-regulation, Cause: activation)]
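A nested event can be represented as a small tree in which an event's arguments may themselves be events. The sketch below encodes the slide's example under our reading of its role labels (Involvement: Theme = up-regulation, Cause = activation; up-regulation: Theme = IL-10, Cause = gp41, Site = human monocyte; activation: Theme = p70(S6)-kinase). The `Entity` and `Event` classes and the `depth`/`render` helpers are hypothetical, not the GENIA data format.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Entity:
    text: str
    type: str  # e.g. "PROTEIN" or "CELL"


@dataclass
class Event:
    trigger: str
    type: str  # e.g. "REGULATION"
    # Role name -> argument; an argument may itself be an Event (nesting).
    args: Dict[str, Union[Entity, "Event"]] = field(default_factory=dict)


def depth(node):
    """Nesting depth: 0 for an entity, 1 + deepest argument for an event."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(arg) for arg in node.args.values()), default=0)


def render(node):
    """Bracketed string form of an event tree."""
    if isinstance(node, Entity):
        return f"{node.type}({node.text})"
    inner = ", ".join(f"{role}={render(arg)}" for role, arg in node.args.items())
    return f"{node.type}[{node.trigger}]({inner})"


kinase = Entity("p70(S6)-kinase", "PROTEIN")
il10 = Entity("IL-10", "PROTEIN")
gp41 = Entity("gp41", "PROTEIN")
monocyte = Entity("human monocyte", "CELL")

activation = Event("activation", "REGULATION", {"Theme": kinase})
upregulation = Event("up-regulation", "REGULATION",
                     {"Theme": il10, "Cause": gp41, "Site": monocyte})
involvement = Event("Involvement", "REGULATION",
                    {"Theme": upregulation, "Cause": activation})

print(depth(involvement))  # 2: two events nested inside the top-level event
print(render(involvement))
```

A flat (theme, cause) triple, as used for simple relation extraction, cannot express this structure, which is why extracting nested events calls for a richer target representation.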

Page 44

GUSPEE

Generalize distant supervision to extracting nested events

Prior: Favor semantic parse grounded in KB

Outperformed 19 out of 24 participants in GENIA Shared Task [Kim et al. 2009]

Parikh, Poon, Toutanova, “Grounded Semantic Parsing for Complex Knowledge Extraction”, NAACL-15.

Page 45

GUSPEE

Semantic parser for pathway extraction

Page 46

Tree HMM

Page 48

Expectation Maximization

Virtual Evidence

Key challenge: non-local evidence

Page 49

Syntax-Semantics Mismatch

Page 52

Best Supervised System

Page 53

Preliminary Results

Page 54

Prototype-Driven Learning

Page 55

Prototype-Driven Learning

Outperformed 19 out of 24 supervised participants

Page 56

Incomplete KB

Page 57

Literome

http://literome.azurewebsites.net

Poon et al., “Literome: PubMed-Scale Genomic Knowledge Base in the Cloud”, Bioinformatics-14.

Page 58

PubMed-Scale Extraction

Preliminary pass:

1.5 million instances

13,000 genes, 838,000 unique regulations

Applications:

UCSC Genome Browser, MSR Interactions Track

Expression profile modeling

Validate de novo pathway prediction

Etc.

Poon, Toutanova, Quirk, “Distant Supervision for Cancer Pathway Extraction from Text”, PSB-15.

Page 59

Machine Science

Evans & Rzhetsky, “Machine Science”, Science, Vol. 329, 2010.

Page 60

Machine Science

[Figure: Big Data]

Page 61

Machine Science

[Figure: Big Data · Rich Knowledge (KB)]

Page 62

Machine Science

[Figure: Big Data · Rich Knowledge (KB) · Deep Model]

Page 63

Machine Science

[Figure: Big Data · Rich Knowledge (KB) · Deep Model · Hypotheses]

Page 64

Machine Science

[Figure: Big Data · Rich Knowledge (KB) · Deep Model · Hypotheses · Experiments]

Page 66

Roadmap

Extract richer knowledge:

Cell type, experimental condition, …

Hedging, negation, …

Formulate coherent models:

Supporting evidence, contradiction, …

Intellectual gaps, hypotheses, …

Integrate with data & experiments:

Cancer panomics → driver genes / pathways

Single-drug response → drug combo prioritization

Page 67

[Figure: Berkeley AMP Lab · OHSU · Microsoft Research]

Page 69

Decision Support for Clinical Genomics

[Figure: Raw Reads · Variant Call · RNA-Seq · Clinical Observation · Literature · Knowledge Graph · Decision Support · Clinicians]

Page 70

Decision Support for Clinical Genomics

[Figure: Raw Reads · Variant Call · RNA-Seq · Clinical Observation · Literature · NLP · Knowledge Graph · Decision Support · Clinicians]

Page 74

We Have Digitized Life

Page 75

Next: Digitize Medicine

Knock down genes A, B, C → Cure

Pennisi, “The CRISPR Craze”, Science, Vol. 341, 2013.

Page 76

Summary

Precision medicine is the future

Cancer systems modeling

Graphical model: Pathways + Panomics data

Extract pathways from PubMed

Machine reading by grounded semantic parsing

Literome: KB for genomic medicine


Page 77

Collaborators

Chicago: Andrey Rzhetsky, Kevin White

OHSU: Brian Drucker, Jeff Tyner

Berkeley AMP Lab: David Patterson

Wisconsin: Mark Craven, Anthony Gitter

UCSC: Max Haeussler, David Haussler

Microsoft Research: Chris Quirk, Kristina Toutanova, David Heckerman, Scott Yih, Lucy Vanderwende, Bill Bolosky, Ravi Pandya

Page 78

Summary

[Figure: High-Throughput Data (DNA sequences) · KB · Disease Genes · Drug Targets]