Deep learning based multi-omics integration, a survey


Deep Learning in Bioinformatics

Min, Seonwoo, Byunghan Lee, and Sungroh Yoon. "Deep learning in bioinformatics." Briefings in Bioinformatics (2016)

Outline
• Summarize three related works on deep learning based feature extraction / survival prediction on omics data
  • Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders
  • A deep learning approach for cancer detection and relevant gene identification
  • Deep Learning based multi-omics integration robustly predicts survival in liver cancer

Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders
Pacific Symposium on Biocomputing, 2015

Denoising Auto-Encoder (DAE)
• Build features that reconstruct initial input data from corrupted data
• Generate robust features
• Unsupervised learning
• Extract features in the non-linear space
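The idea of reconstructing clean input from corrupted input can be sketched in a few lines. This is a minimal illustration and not the implementation from the paper: masking noise, sigmoid units, squared-error loss, untied encoder/decoder weights, the learning rate and the toy data are all assumptions made here for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, rate, rng):
    """Masking noise: zero each input value with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

def forward(x, W1, b1, W2, b2):
    """Encode x into hidden features h, then decode h into a reconstruction."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
    x_hat = [sigmoid(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, x_hat

def mse(data, params):
    """Mean squared reconstruction error on clean (uncorrupted) inputs."""
    total = 0.0
    for x in data:
        _, x_hat = forward(x, *params)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return total / len(data)

def train_dae(data, n_hidden, rate=0.2, lr=0.5, epochs=300, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0])
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    b1, b2 = [0.0] * n_hidden, [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            x_noisy = corrupt(x, rate, rng)
            h, x_hat = forward(x_noisy, W1, b1, W2, b2)
            # The error is measured against the CLEAN input x, not x_noisy.
            d_out = [(xh - xi) * xh * (1 - xh) for xh, xi in zip(x_hat, x)]
            d_hid = [sum(d_out[i] * W2[i][j] for i in range(n_in)) * h[j] * (1 - h[j])
                     for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    W2[i][j] -= lr * d_out[i] * h[j]
                b2[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    W1[j][k] -= lr * d_hid[j] * x_noisy[k]
                b1[j] -= lr * d_hid[j]
    return W1, b1, W2, b2

# Toy "expression" profiles scaled to [0, 1]; the values are made up for illustration.
toy = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
       [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 1.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]]
params = train_dae(toy, n_hidden=3)
untrained = train_dae(toy, n_hidden=3, epochs=0)
```

Comparing `mse(toy, params)` with `mse(toy, untrained)` shows whether training reduced the reconstruction error; the hidden activations `h` returned by `forward` play the role of the constructed features.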

Data
• Two largest breast cancer datasets
• Train DAEs and identify predictive features with METABRIC dataset
  • 2137 samples, 3000 2520 genes
  • gene expression data from European Genome-phenome Archive
• Evaluate with TCGA dataset independently
  • 547 samples, 2520 genes

Features to clinical characteristics
• Genes are not linked to their neighbors
• Genes are linked by transcription factors, pathway memberships
• Are constructed features linked to clinical and molecular features of the samples?
  • Categorize tumor / normal samples
  • Categorize ER+/- samples
  • Categorize samples into molecular subtypes (Luminal A/B, Basal-like, HER2-enriched, Normal-like)

Features to clinical characteristics
• Classifying tumor from normal samples
• Classifying ER+ from ER- samples

Robust performance across datasets

Features to transcription factor
• Breast cancer related transcription factors are linked to these high-weight features (Node58)
• It contained genes that reflect activity of key ER-associated TFs
[Figure: most genes gave zero or low weight to a hidden node; labels: high positive weight, high negative weight]

Features to patient survival
• Node whose activities best separated two high / low survival groups (Node5)
• Highly predictive of patient survival
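One way to check whether a node separates survival groups is to split samples on the node's activity and compare Kaplan-Meier curves for the two halves. The sketch below assumes a median split and uses made-up activities, follow-up times and event flags; the slides do not say how the groups were formed, and a significance test such as the log-rank test is left out.

```python
import statistics

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = death, 0 = censored)."""
    order = sorted(zip(times, events))
    at_risk = len(order)
    surv = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def split_by_activity(activity):
    """Median split: True for samples whose node activity is above the median."""
    cutoff = statistics.median(activity)
    return [a > cutoff for a in activity]

# Hypothetical node activities, follow-up times (months) and event flags.
activity = [0.9, 0.8, 0.7, 0.85, 0.1, 0.2, 0.3, 0.15]
times = [5, 8, 12, 20, 30, 45, 50, 60]
events = [1, 1, 1, 0, 1, 0, 1, 0]

is_high = split_by_activity(activity)
high_curve = kaplan_meier([t for t, g in zip(times, is_high) if g],
                          [e for e, g in zip(events, is_high) if g])
low_curve = kaplan_meier([t for t, g in zip(times, is_high) if not g],
                         [e for e, g in zip(events, is_high) if not g])
```

Repeating this for every hidden node and ranking nodes by how far apart the two curves are would be one way to find a node like the one highlighted on the slide.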

Features to Biological pathways
• Pathways significantly associated with genes that consistently gave high weights to a node
[Figure: PID pathways enriched in Node5 (5th feature)]
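Pathway association of this kind is commonly assessed with an over-representation test. The sketch below uses a one-sided hypergeometric test as an assumption; the slides do not name the statistic that was used, and the pathway size, selection size and overlap counts are hypothetical.

```python
from math import comb

def enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_background genes, of which n_pathway belong to the pathway."""
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Hypothetical counts: 2520 background genes, a 40-gene pathway,
# 100 high-weight genes for one node, 8 of them in the pathway.
p_value = enrichment_p(2520, 40, 100, 8)
```

In practice the p-values across all tested pathways would also need a multiple-testing correction.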

Summary
• Unsupervised feature construction based on DAEs and interpretation
• Applied to breast cancer gene expression data
• Consistent results across different datasets
• In the future..
  • Multiple layers of stacked DAEs
  • Consistency across datasets will be useful for data integration
  • Limitations for large-scale data integration

A deep learning approach for cancer detection and relevant gene identification
Pacific Symposium on Biocomputing, 2016

Overview
[Workflow diagram: RNA-seq samples from TCGA (healthy / cancer), 1210 breast cancer samples, split into train and test; SDAE features and weights; DCGs; model validation. Two tasks: supervised classification (cancer detection) and highly interactive genes identification.]

Stacked Denoising Auto-Encoder
• Extract functional features from high dimensional, noisy gene expression profiles with reduced loss of information
• Select a layer that has both low dimension and low validation error
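The layer choice trades dimension against validation error. One simple rule, given here purely as an assumption since the slides do not state the actual criterion, is to take the smallest layer whose validation error is within a tolerance of the best one. Layer names, sizes and errors below are made up.

```python
def pick_layer(candidates, tolerance=0.01):
    """candidates: list of (layer_name, dimension, validation_error).
    Returns the lowest-dimensional layer whose error is within
    `tolerance` of the best validation error."""
    best_error = min(err for _, _, err in candidates)
    acceptable = [c for c in candidates if c[2] <= best_error + tolerance]
    return min(acceptable, key=lambda c: c[1])

# Hypothetical layers of a stacked DAE with illustrative validation errors.
layers = [
    ("layer1", 15000, 0.020),
    ("layer2", 10000, 0.018),
    ("layer3", 2000, 0.022),
    ("layer4", 500, 0.060),
]
chosen = pick_layer(layers)
```

With these numbers the rule skips the smallest layer because its error is well above the rest; a looser `tolerance` would favour smaller layers.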

Classification result
• Classify cancer samples from healthy control samples
• Feature extraction
  • SDAE
  • Differentially expressed genes (DIFFEXP)
  • PCA
  • KPCA (RBF kernel)
• Classification model
  • SVM
  • SVM (RBF kernel)
  • single-layer ANN

Deeply connected genes
• Genes with the largest weights in W (the product of the weight matrices for each layer) are the most strongly connected to the extracted and highly predictive features
But lower performance than SDAE features
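The product W can be sketched as a chain of matrix multiplications, followed by a ranking of genes by their weights. The scoring rule used here (largest absolute weight to any feature), the gene names and the weight values are assumptions for illustration; the sketch also ignores biases and non-linearities, as the product of weight matrices does.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gene_to_feature_weights(layer_weights):
    """Multiply per-layer weight matrices into one genes x features matrix W."""
    W = layer_weights[0]
    for W_next in layer_weights[1:]:
        W = matmul(W, W_next)
    return W

def top_genes(W, gene_names, k):
    """Rank genes by their largest absolute weight to any top-layer feature."""
    scores = [max(abs(w) for w in row) for row in W]
    ranked = sorted(zip(scores, gene_names), reverse=True)
    return [name for _, name in ranked[:k]]

# Hypothetical two-layer network: 4 genes -> 3 hidden units -> 2 features.
W1 = [[0.9, 0.0, 0.1],
      [0.0, 0.1, 0.0],
      [-0.8, 0.2, 0.0],
      [0.1, 0.1, 0.1]]
W2 = [[1.0, 0.0],
      [0.0, 1.0],
      [0.5, -0.5]]
names = ["gene_a", "gene_b", "gene_c", "gene_d"]

W = gene_to_feature_weights([W1, W2])
dcgs = top_genes(W, names, k=2)
```

Using the absolute value keeps genes with strong negative weights, which can matter as much as strong positive ones.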

Summary
• SDAE to transform high-dimensional, noisy gene expression data to a lower dimensional, meaningful representation
• Classify breast cancer samples from the healthy control samples using new compact features
• Identify a set of highly interactive genes critical for the diagnosis of breast cancer
• In the future..
  • Need to improve the extraction of DCGs
  • Limitation on the requirement for large data sets
  • Identify cross-cancer biomarkers through the analysis of aggregated heterogeneous cancer data

Deep Learning based multi-omics integration robustly predicts survival in liver cancer
preprint, 2017

[Workflow diagram: 360 tumor samples; inputs of 15629 genes, 365 miRNAs, 19883 genes; 100 features; 37 features; high/poor survival]

Why Autoencoders?
• Produce features linked to clinical outcomes
• Analyze high-dimensional gene expression data
• Integrate heterogeneous data
• Interpret the biological functions (aggregate genes sharing similar pathways)
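Integrating heterogeneous data for an autoencoder needs the omics layers on a common scale and aligned by sample. The sketch below assumes the simplest option: z-score each feature within its own omics block, keep only samples present in every block, and concatenate the blocks into one input vector per sample. The slides do not say how the inputs were actually combined, and the block names and values are hypothetical.

```python
import statistics

def zscore_columns(rows):
    """Scale each feature (column) to zero mean and unit variance across samples."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant features: divide by 1
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def integrate(omics_blocks):
    """omics_blocks: {omics_name: {sample_id: [feature values]}}.
    Returns the shared sample ids and one concatenated vector per sample."""
    blocks = list(omics_blocks.values())
    shared = sorted(set(blocks[0]).intersection(*blocks[1:]))
    scaled = [zscore_columns([block[s] for s in shared]) for block in blocks]
    combined = [[v for block_rows in scaled for v in block_rows[i]]
                for i in range(len(shared))]
    return shared, combined

# Hypothetical blocks; sample "s3" is missing from omics_b and is dropped.
toy_blocks = {
    "omics_a": {"s1": [1.0, 10.0], "s2": [3.0, 10.0], "s3": [5.0, 12.0]},
    "omics_b": {"s1": [0.2], "s2": [0.4]},
    "omics_c": {"s1": [7.0, 8.0, 9.0], "s2": [9.0, 8.0, 7.0], "s3": [1.0, 1.0, 1.0]},
}
samples, X = integrate(toy_blocks)
```

Each row of `X` could then serve as the input to an autoencoder whose bottleneck layer yields the compressed features.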

Classification result

Single-omics based DL models

Validation in five cohorts
• Robustness of the model at predicting survival outcomes

Adding clinical information
• Age, Stage, Grade, Race, Risk factors (HBV, HCV, Alcohol, …)
• DL-based multi-omics model performs sufficiently well even without clinical features

Functional analysis of the survival-subgroups
• KEGG pathway analysis to pinpoint the pathways enriched in two subtypes
• Two subtypes have different and disjoint active pathways

Enriched pathway-gene analysis for upregulated genes
• S1 aggressive tumor sub-group
• Enriched with cancer related pathways

Enriched pathway-gene analysis for upregulated genes
• S2 less aggressive tumor sub-group
• Activated metabolism related pathways

Summary
• Contributions
  • Identified two subtypes from the molecular level
  • Consistent performance implying the reliability and robustness of the model
  • Sufficient performance without adding clinical features
  • AE is much more efficient at inferring features linked to survival
  • Validated in five additional cohorts
• Challenges
  • The absence of cluster label information in original reports
  • Lack of survival data in some cases

Conclusion
• Feature extraction with SDAE
  • Robust to noisy datasets
  • Extract meaningful features and reflect both linear and non-linear relationships
  • Consistent performance, good for multi-omics integration
• Multi-omics integration
  • More sophisticated strategy to combine multiple features
  • May incorporate pathways, handle overlapping genes

Thank you!
Q & A
