Deep learning based multi-omics integration, a survey
Uploaded by so-yeon-kim
Category: Data & Analytics
Deep Learning based Multi-omics integration
A survey
Deep Learning in Bioinformatics
Min, Seonwoo, Byunghan Lee, and Sungroh Yoon. "Deep learning in bioinformatics." Briefings in Bioinformatics (2016).
Outline
• Summarize three related works on deep learning based feature extraction / survival prediction on omics data
  • Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders
  • A deep learning approach for cancer detection and relevant gene identification
  • Deep Learning based multi-omics integration robustly predicts survival in liver cancer
Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders
Pacific Symposium on Biocomputing, 2015
Denoising Auto-Encoder (DAE)
• Build features that reconstruct initial input data from corrupted data
• Generate robust features
• Unsupervised learning
• Extract features in the non-linear space
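To make the idea concrete, here is a minimal NumPy sketch of a one-layer denoising autoencoder. It is an illustration, not the implementation used in the surveyed paper: the tied weights, sigmoid units, masking noise, squared-error loss, and every hyperparameter (`n_hidden`, `corruption`, `lr`, `epochs`) are assumptions chosen to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden=4, corruption=0.1, lr=0.5, epochs=200, seed=0):
    """Train a tied-weight denoising autoencoder on X (values in [0, 1])."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(n_genes)
    losses = []
    for _ in range(epochs):
        # Corrupt the input: zero out a random fraction of the entries.
        mask = rng.random(X.shape) >= corruption
        X_tilde = X * mask
        H = sigmoid(X_tilde @ W + b_h)      # encode the corrupted input
        X_hat = sigmoid(H @ W.T + b_o)      # decode with the transposed weights
        err = X_hat - X                     # compare against the CLEAN input
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the squared error through both sigmoid layers.
        d_out = err * X_hat * (1.0 - X_hat)
        d_hid = (d_out @ W) * H * (1.0 - H)
        grad_W = (X_tilde.T @ d_hid + d_out.T @ H) / n_samples
        W -= lr * grad_W
        b_h -= lr * d_hid.mean(axis=0)
        b_o -= lr * d_out.mean(axis=0)
    return W, b_h, b_o, losses

def encode(X, W, b_h):
    """Map samples to hidden-node activities (the constructed features)."""
    return sigmoid(X @ W + b_h)
```

Each column of `W` holds the gene weights of one hidden node, and `encode` returns the per-sample node activities that the following slides relate to clinical characteristics.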
Data
• The two largest breast cancer datasets
• Train DAEs and identify predictive features with the METABRIC dataset
  • 2137 samples, 3000 / 2520 genes
  • Gene expression data from the European Genome-phenome Archive
• Evaluate with the TCGA dataset independently
  • 547 samples, 2520 genes
Features to clinical characteristics
• Genes are not linked to their neighbors
• Genes are linked by transcription factors, pathway memberships
• Are constructed features linked to clinical and molecular features of the samples?
  • Categorize tumor / normal samples
  • Categorize ER+/- samples
  • Categorize samples into molecular subtypes (Luminal A/B, Basal-like, HER2-enriched, Normal-like)
Features to clinical characteristics
• Classifying tumor from normal samples
• Classifying ER+ from ER- samples
Robust performance across datasets
Features to transcription factor
• Breast cancer related transcription factors are linked to these high-weight features (Node58)
• It contained genes that reflect activity of key ER-associated TFs
Most genes gave zero or low weight to a hidden node
High positive weight / High negative weight
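A short sketch of how high-weight genes for one hidden node could be picked out of a weight matrix. The cutoff of `n_std` standard deviations from the node's mean weight is an assumption made for illustration; the slides do not state the threshold, and `high_weight_genes` is a hypothetical helper name.

```python
import numpy as np

def high_weight_genes(W, gene_names, node, n_std=2.0):
    """Split genes into high-positive and high-negative weight sets for one node.

    W has shape (n_genes, n_nodes). The n_std cutoff is an illustrative
    assumption, not a value taken from the slides.
    """
    w = W[:, node]
    mu, sd = w.mean(), w.std()
    positive = [g for g, x in zip(gene_names, w) if x > mu + n_std * sd]
    negative = [g for g, x in zip(gene_names, w) if x < mu - n_std * sd]
    return positive, negative
```

Because most genes give zero or low weight to a node, a cutoff relative to the spread of that node's weights leaves only the few genes at either extreme.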
Features to patient survival
• Node whose activities best separated two high / low survival groups (Node5)
• Highly predictive of patient survival
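One way to check whether a node's activity separates survival groups is to split the samples by activity and compare Kaplan-Meier curves. The sketch below assumes a median split and right-censored data given as `times` and `events` arrays; both the split rule and the function names are illustrative assumptions rather than the procedure from the paper.

```python
import numpy as np

def split_by_median(activity):
    """Boolean mask: True for samples whose node activity is above the median."""
    activity = np.asarray(activity, dtype=float)
    return activity > np.median(activity)

def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    times: follow-up time per sample; events: 1/True if the event was observed.
    Returns the distinct event times and the survival probability after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)            # still under observation at t
        deaths = np.sum((times == t) & events)  # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)
```

Calling `kaplan_meier` once on the high-activity samples and once on the rest gives two curves whose separation can then be inspected or tested.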
Features to Biological pathways
• Pathways significantly associated with genes that consistently gave high weights to a node
PID pathways enriched in Node5 (5th feature)
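Pathway association of a gene set is commonly scored with a one-sided hypergeometric test. The sketch below is offered under that assumption; the slides do not say which test was used, and the argument names are hypothetical.

```python
from math import comb

def hypergeom_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without
    replacement from n_universe genes, n_pathway of which are in the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Here `n_selected` would be the number of high-weight genes of a node and `n_overlap` the number of those genes that belong to the pathway. In practice the p-values would also need correction for testing many pathways.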
Summary
• Unsupervised feature construction based on DAEs and interpretation
• Applied to breast cancer gene expression data
• Consistent results across different datasets
• In the future..
  • Multiple layers of stacked DAEs
  • Consistency across datasets will be useful for data integration
  • Limitations for large-scale data integration
A deep learning approach for cancer detection and relevant gene identification
Pacific Symposium on Biocomputing, 2016
Overview
Figure: TCGA RNA-seq samples (healthy / cancer) split into train and test sets; SDAE features, weights and DCGs; model validation
Supervised classification (cancer detection)
Highly interactive genes identification
1210 breast cancer samples
Stacked Denoising Auto-Encoder
• Extract functional features from high dimensional, noisy gene expression profiles with reduced loss of information
• Select a layer that has both low dimension and low validation error
Classification result
• Classify cancer samples from healthy control samples
• Feature extraction
  • SDAE
  • Differentially expressed genes (DIFFEXP)
  • PCA
  • KPCA (RBF kernel)
• Classification model
  • SVM
  • SVM (RBF kernel)
  • single-layer ANN
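As an illustration of one of the baseline feature extractors listed above, PCA features can be computed directly from a singular value decomposition. This is a generic sketch, not the code from the paper; `pca_features` and `k` are names chosen for the example.

```python
import numpy as np

def pca_features(X, k):
    """Project samples (rows of X) onto the first k principal components."""
    Xc = X - X.mean(axis=0)                        # center each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # shape (n_samples, k)
```

The resulting low-dimensional matrix would then be passed to the same classifiers (SVM, ANN) as the SDAE features, so that only the feature extraction step differs between the compared pipelines.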
Deeply connected genes
• Genes with the largest weights in W (the product of the weight matrices for each layer) are the most strongly connected to the extracted and highly predictive features
But lower performance than SDAE features
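The definition of W above can be sketched in a few lines. Multiplying the per-layer weight matrices follows the slide; ranking genes by their largest absolute entry in W and the `top_k` cutoff are assumptions made for the example, since the slide does not specify how the "largest weights" are selected.

```python
from functools import reduce
import numpy as np

def deeply_connected_genes(weight_matrices, gene_names, top_k=3):
    """Rank genes by their strongest connection to the final features.

    weight_matrices: list [W1, W2, ...] with W1 of shape (n_genes, h1),
    W2 of shape (h1, h2), and so on. Scoring by max |weight| is an
    illustrative assumption.
    """
    W = reduce(np.matmul, weight_matrices)     # (n_genes, n_final_features)
    score = np.abs(W).max(axis=1)
    order = np.argsort(-score)[:top_k]
    return [gene_names[i] for i in order], W
```

Note that the product of weight matrices ignores the non-linear activations between layers, so W is only a linear summary of how each gene connects to the final features.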
….
Summary
• SDAE to transform high-dimensional, noisy gene expression data to a lower dimensional, meaningful representation
• Classify breast cancer samples from the healthy control samples using new compact features
• Identify a set of highly interactive genes critical for the diagnosis of breast cancer
• In the future..
  • Need to improve the extraction of DCGs
  • Limitation on the requirement for large data sets
  • Identify cross-cancer biomarkers through the analysis of aggregated heterogeneous cancer data
Deep Learning based multi-omics integration robustly predicts survival in liver cancer
preprint, 2017
Figure: 360 tumor samples; inputs of 15629 genes, 365 miRNAs and 19883 genes → 100 features → 37 features → high/poor survival
Why Autoencoders?
• Produce features linked to clinical outcomes
• Analyze high-dimensional gene expression data
• Integrate heterogeneous data
• Interpret the biological functions (aggregate genes sharing similar pathways)
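A simple way to present heterogeneous omics blocks to a single autoencoder is to scale each block and concatenate the columns. The sketch below assumes per-feature z-scoring and that all blocks share the same samples in the same row order; both are assumptions for illustration and may differ from the preprocessing in the preprint.

```python
import numpy as np

def stack_omics(*blocks):
    """Z-score every feature within each omics block, then join the columns.

    Each block is an array of shape (n_samples, n_features_in_block) with
    rows in the same sample order (an assumption of this sketch).
    """
    scaled = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        sd = X.std(axis=0)
        sd[sd == 0] = 1.0                  # leave constant features at zero
        scaled.append((X - X.mean(axis=0)) / sd)
    return np.hstack(scaled)
```

The stacked matrix has one row per tumor sample and one column per feature from every omics type, which is the form an autoencoder would take as input.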
Classification result (figure: PCA)
Classification result (figure: Single-omics based DL models)
Validation in five cohorts
• Robustness of the model at predicting survival outcomes
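Survival predictions on a validation cohort are often summarized with a concordance index. The slide does not name the metric, so the following Harrell-style C-index is an assumed example of how such robustness could be quantified; `risk` stands for any model output where a higher value means poorer expected survival.

```python
def concordance_index(times, events, risk):
    """Fraction of comparable pairs in which the higher-risk sample fails first.

    A pair (i, j) is comparable when sample i had an observed event and a
    shorter follow-up time than sample j. Ties in risk count as 0.5.
    Raises ZeroDivisionError if no pair is comparable.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored samples cannot fail first
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, and 0.5 is what random risk scores would give on average.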
Adding clinical information
• Age, Stage, Grade, Race, Risk factors (HBV, HCV, Alcohol, …)
• DL-based multi-omics model performs sufficiently well even without clinical features
Functional analysis of the survival-subgroups
• KEGG pathway analysis to pinpoint the pathways enriched in two subtypes
• Two subtypes have different and disjoint active pathways
Enriched pathway-gene analysis for upregulated genes
• S1 aggressive tumor sub-group
  • Enriched with cancer related pathways
Enriched pathway-gene analysis for upregulated genes
• S2 less aggressive tumor sub-group
  • Activated metabolism related pathways
Summary
• Contributions
  • Identified two subtypes at the molecular level
  • Consistent performance implying the reliability and robustness of the model
  • Sufficient performance without adding clinical features
  • AE is much more efficient at inferring features linked to survival
  • Validated in five additional cohorts
• Challenges
  • The absence of cluster label information in original reports
  • Lack of survival data in some cases
Conclusion
• Feature extraction with SDAE
  • Robust to noisy datasets
  • Extract meaningful features and reflect both linear and non-linear relationships
  • Consistent performance, good for multi-omics integration
• Multi-omics integration
  • More sophisticated strategy to combine multiple features
  • May incorporate pathways, handle overlapping genes
Thank you!
Q & A