Approaches for the Integration of Visual and Computational Analysis of Biomedical Data

Post on 11-Feb-2017


HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS

NILS GEHLENBORG

@nils_gehlenborg

http://gehlenborglab.org

FRITZ LEKSCHAS HARVARD MEDICAL SCHOOL

BIG PILES OF DATA …

Data Repositories

general specialized

ArrayExpress GEO

Metabolights PRIDE

dbGAP …

ENCODE Roadmap Epigenomics

… OFFER OPPORTUNITIES …

SINGLE OR FEW DATA SETS

Test hypotheses without generating new data.

Use published data as supporting evidence for findings based on your own data sets.

MANY DATA SETS

Conduct meta-analyses, e.g. to characterize expression patterns in human tissues or to link diseases.

M. Lukk, et al., Nature Biotechnology, 28(4):322–324 (2010)

S. Suthram et al., PLoS Computational Biology, 6(2) (2010)

COMMON BEHAVIOR OF RESEARCH PARASITES!

[Figure, built up over several slides. Labels: DATA REPOSITORY; VISUALIZATION TOOLS; ANALYSIS PIPELINES; GALAXY (Toolshed, Workflow Editor, Tools, REST API); Workflow Inputs; Workflow Outputs.]

N Gehlenborg et al., manuscript in preparation

http://www.refinery-platform.org

… BUT NOT SO FAST!

[Figure, built up over several slides. Labels: Text-Based Search; Semantic Visual Exploration; SATORI; Data Sets (A1 to A4); Metadata; Data Files; Free Text; Keywords (K, L, M); Annotation Mapping; Ontologies with terms X, Y, Z linked by "subclass of" from Root to Terminal.]

SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories

http://satori.refinery-platform.org

[Figure, repeated over several slides. Labels: Data set; Repository; Collection of interest. User roles: Data Analyst · Group Leader · Data Curator.]

Need 1: Find data sets that match certain experimental characteristics.

Need 2: Find data sets that are similar (or dissimilar) to given data sets.

Need 3: Get an overview of the distribution of the experimental characteristics across a collection of data sets.

Need 4: Get an overview of the annotation term hierarchy and term usage.
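Need 1 can be illustrated with a minimal Python sketch of ontology-guided retrieval: a query for a term also returns data sets annotated with any of its subclasses. The term names (X, Y, Z) and data set names (A1 to A4) echo the placeholder labels on the slides, but the "subclass of" edges, the annotations, and the function names are assumptions made up for illustration. This is not SATORI's implementation.

```python
from collections import defaultdict

# Hypothetical "subclass of" edges (child -> parents); not taken from the slides.
SUBCLASS_OF = {
    "X": ["Z"],
    "Y": ["Z"],
    "Z": [],
}

# Hypothetical annotations: data set -> ontology terms it is annotated with.
ANNOTATIONS = {
    "A1": {"X", "Y"},
    "A2": {"X"},
    "A3": {"Z"},
    "A4": set(),
}


def descendants(term, subclass_of):
    """Return the term itself plus every transitive subclass of it."""
    # Invert the child -> parents mapping so we can walk downwards.
    children = defaultdict(set)
    for child, parents in subclass_of.items():
        for parent in parents:
            children[parent].add(child)

    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def find_datasets(term, subclass_of, annotations):
    """Return the data sets annotated with the term or any of its subclasses."""
    wanted = descendants(term, subclass_of)
    return sorted(name for name, terms in annotations.items() if terms & wanted)


if __name__ == "__main__":
    print(find_datasets("Z", SUBCLASS_OF, ANNOTATIONS))
    print(find_datasets("Y", SUBCLASS_OF, ANNOTATIONS))
```

Querying the more general term Z picks up data sets annotated only with X or Y, which a plain keyword match on "Z" would miss.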

Peter Pirolli and Stu Card

[Figure. Labels: List graph; Tree; Tree map; Data sets; Annotations; Term; Scenario 1; Scenario 2; Scenario 3.]

The Art Institute of Chicago

Acknowledgements

HARVARD MEDICAL SCHOOL Peter J Park & all members of the Computational Genomics Lab, Fritz Lekschas, Jennifer K Marx, Scott Ouellette, Anton Xue, Psalm Haseley

HARVARD SCHOOL OF PUBLIC HEALTH Ilya Sytchev, Shannan Ho Sui

JOHANNES KEPLER UNIVERSITY LINZ Stefan Luger, Holger Stitz, Marc Streit

UNIVERSITY OF SHEFFIELD David R Jones, Winston Hide

Funding NIH/NHGRI R00 HG007583, Harvard Stem Cell Institute

Web http://satori.refinery-platform.org · http://refinery-platform.org

We are hiring postdocs & developers!

HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS

See http://gehlenborglab.org or http://dbmi.med.harvard.edu for details.

Data visualization, analysis, and management for:
• genomic structural variants
• dynamics of the 3D genome
• cancer subtypes in patient cohorts
• exploration tools for data repositories
• provenance graphs

[Figure (backup slide). Labels: Term; Terminal term; To be deleted; To be duplicated; Term size; Cumulative size; 1. Global; 2. Tree Map; 3. Node-Link Diagram.]
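The "Term size" and "Cumulative size" labels in the backup figure are not defined in the transcript. The Python sketch below assumes one plausible reading: term size is the number of data sets annotated directly with a term, and cumulative size is the number of distinct data sets annotated with the term or any of its subclasses. The hierarchy, the annotations, and the function names are hypothetical.

```python
# Hypothetical "subclass of" edges (child -> parents); D has two parents.
SUBCLASS_OF = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}

# Hypothetical annotations: data set -> ontology terms it is annotated with.
ANNOTATIONS = {
    "ds1": {"D"},
    "ds2": {"B"},
    "ds3": {"C"},
    "ds4": {"C", "D"},
}


def term_sizes(subclass_of, annotations):
    """Return {term: (term_size, cumulative_size)} under the assumed definitions."""
    # Invert child -> parents into parent -> children.
    children = {}
    for child, parents in subclass_of.items():
        children.setdefault(child, set())
        for parent in parents:
            children.setdefault(parent, set()).add(child)

    # Data sets annotated directly with each term.
    direct = {term: set() for term in children}
    for dataset, terms in annotations.items():
        for term in terms:
            direct.setdefault(term, set()).add(dataset)

    cache = {}

    def datasets_below(term):
        # Collect distinct data sets so that a term reachable through
        # several parents is not counted twice.
        if term not in cache:
            found = set(direct.get(term, set()))
            for child in children.get(term, ()):
                found |= datasets_below(child)
            cache[term] = found
        return cache[term]

    return {
        term: (len(direct.get(term, set())), len(datasets_below(term)))
        for term in children
    }


if __name__ == "__main__":
    for term, (size, cumulative) in sorted(term_sizes(SUBCLASS_OF, ANNOTATIONS).items()):
        print(term, size, cumulative)
```

Counting distinct data sets rather than summing child sizes matters here because the hierarchy is a graph, not a strict tree: D sits under both B and C.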