![Page 1: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/1.jpg)
Genomic and epigenomic signatures for interpreting complex disease
Manolis Kellis
MIT Computer Science & Artificial Intelligence Laboratory
Broad Institute of MIT and Harvard
![Page 2: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/2.jpg)
ATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAACTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTGATATGCTTTGCGCCGTCAAAGTTTTGAACGAGAAAAATCCATCCATTACCTTAATAAATGCTGATCCCAAATTTGCTCAAAGGAATCGATTTGCCGTTGGACGGTTCTTATGTCACAATTGATCCTTCTGTGTCGGACTGGTCTAATTACTTTAAATGTGGTCTCCATGTTGCACTCTTTTCTAAAGAAACTTGCACCGGAAAGGTTTGCCAGTGCTCCTCTGGCCGGGCTGCAAGTCTTCTGTGAGGGTGATGTACCATGGCAGTGGATTGTCTTCTTCGGCCGCATTCATTTGTGCCGTTGCTTTAGCTGTTGTTAAAGCGAATATGGGCCCTGGTTATCATATCCAAGCAAAATTTAATGCGTATTACGGTCGTTGCAGAACATTATGTTGGTGTTAACAATGGCGGTATGGATCAGGCTGCCTCTGTTTGGTGAGGAAGATCATGCTCTATACGTTGAGTTCAAACCGCAGTTGAAGGCTACTCCGTTTAAATTTCCGCAATTAAAAAACCATGAATAGCTTTGTTATTGCGAACACCCTTGTTGTATCTAACAAGTTTGAAACCGCCCCAACCAACTATAATTTAAGAGTGGTAGAAGTCACCAGCTGCAAATGTTTTAGCTGCCACGTACGGTGTTGTTTTACTTTCTGGAAAAGAAGGATCGAGCACGAATAAAGGTAATCTAAGAGTTCATGAACGTTTATTATGCCAGATATCACAACATTTCCACACCCTGGAACGGCGATATTGAATCCGGCATCGAACGGTTAACAAAGGCTAGTACTAGTTGAAGAGTCTCTCGCCAATAAGAAACAGGGCTTTAGTGTTGACGATGTCGCACAATCCTTGAATTGTTCTCGCGAAATTCACAAGAGACTACTTAACAACATCTCCAGTGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGCGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGAT
CATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAATTGGGCAGCTGTCTATATGAATTATAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGCTTGGCAAGTTGCCAACTGACGAGATGCAGTAAAAAGAGATTGCCGTCTTGAAACTTTTTGTCCTTTTTTTTTTCCGGGGACTCTACGAACCCTTTGTCCTACTGATTAATTTTGTACTGAATTTGGACAATTCAGATTTTAGTAGACAAGCGCGAGGAGGAAAAGAAATGACAAAAATTCCGATGGACAAGAAGATAGGAAAAAAAAAAAGCTTTCACCGATTTCCTAGACCGGAAAAAAGTCGTATGACATCAGAATGAAATTTTCAAGTTAGACAAGGACAAAATCAGGACAAATTGTAAAGATATAATAAACTATTTGATTCAGCGCCAATTTGCCCTTTTCCATTCCATTAAATCTCTGTTCTCTCTTACTTATATGATGATTAGGTATCATCTGTATAAAACTCCTTTCTTAATTTCACTCTAAAGCATCCCATAGAGAAGATCTTTCGGTTCGAAGACATTCCTACGCATAATAAGAATAGGAGGGAATAATGCCAGACAATCTATCATTACATTAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACACAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCCACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCT
![Page 3: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/3.jpg)
Genes
Encode proteins
Regulatory motifs
Control gene expression
![Page 4: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/4.jpg)
Building systems-level views of genomes and disease
Goal: A systems-level understanding of genomes and gene regulation:
• The regulators: Transcription factors, microRNAs, sequence specificities
• The regions: enhancers, promoters, and their tissue-specificity
• The targets: TFs → targets, regulators → enhancers, enhancers → genes
• The grammars: Interplay of multiple TFs → prediction of gene expression
The parts list = Building blocks of gene regulatory networks
Our tools: Comparative genomics & large-scale experimental datasets.
• Evolutionary signatures for coding/non-coding genes, microRNAs, motifs
• Chromatin signatures for regulatory regions and their tissue specificity
• Activity signatures for linking regulators → enhancers → target genes
• Predictive models for gene function, gene expression, chromatin state
Integrative models = Define roles in development, health, disease
![Page 5: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/5.jpg)
Challenge: interpreting disease-associated variants
• GWAS, case-control,… reveal disease-associated variants
→ Molecular mechanism, cell-type specificity, drug targets
• Challenges towards interpreting disease variants
– Find ‘true’ causative SNP among many candidates in LD
– Use ‘causal’ variant: predict function, pathway, drug targets
– Non-coding variant: type of function, cell type of activity
– Regulatory variant: upstream regulators, downstream targets
• This talk: genomics tools for addressing these challenges
CATGACTG CATGCCTG
Disease-associated variant (SNP/CNV/…)
Gene annotation (Coding, 5’/3’UTR, RNAs) Evolutionary signatures
Non-coding annotation Chromatin signatures
Roles in gene/chromatin regulation Activator/repressor signatures
Other evidence of function Signatures of selection (sp/pop)
![Page 6: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/6.jpg)
Recombination breakpoints
Family inheritance (Me vs. my brother; My dad; Dad’s mom; Mom’s dad)
Human ancestry
Disease risk
Genomics: Regions → mechanisms → drugs
Systems: genes → combinations → pathways
Goal: Towards personal systems genomics
![Page 7: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/7.jpg)
Systems-level views of disease epigenomics
• Evolutionary signatures → gene/genome annotation
– High-resolution annotation: genes, RNAs, motif instances
– Measuring selection within the human population
• Chromatin states for interpreting disease association
– Annotate dynamic regulatory elements in multiple cell types
– Activity-based linking of regulators → enhancers → targets
• Interpreting disease-associated sequence variants
– Mechanistic predictions for individual top-scoring SNPs
– Functional roles of 1000s of disease-associated SNPs
• Systematic manipulation of 2000+ human enhancers
– Test effect of single-motif and single-nucleotide disruptions
– Role of activator/repressor motifs, disease-associated SNPs
• Personal genomes/epigenomes in health and disease
– Allele-specific activity. Alzheimer’s: brain methylation, SNP
– Global repression of distal enhancers. NRSF, ELK1, CTCF
![Page 8: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/8.jpg)
Large-scale comparative genomics datasets
29 mammals, 17 fungi (8 Candida, 9 Yeasts), 12 flies
Tree labels: Post-duplication, Pre-duplication; Diploid, Haploid
Kellis Nature 2003, Nature 2004; Stark Nature 2007; Clark Nature 2007; Butler Nature 2009; Lindblad-Toh Nature 2011
![Page 9: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/9.jpg)
Comparative genomics and evolutionary signatures
• Comparative genomics can reveal functional elements
– For example: exons are deeply conserved to mouse, chicken, fish
– Many other elements are also strongly conserved: exons / regulatory?
• Can we also pinpoint specific functions of each region? Yes!
– Patterns of change distinguish different types of functional elements
– Specific function → Selective pressures → Patterns of mutation/insertion/deletion
• Develop evolutionary signatures characteristic of each function
Kellis Nature 2003, Nature 2004; Stark Nature 2007; Clark Nature 2007; Butler Nature 2009; Lindblad-Toh Nature 2011
![Page 10: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/10.jpg)
Evolutionary signatures for diverse functions
Protein-coding genes
- Codon Substitution Frequencies
- Reading Frame Conservation
RNA structures
- Compensatory changes
- Silent G-U substitutions
microRNAs
- Shape of conservation profile
- Structural features: loops, pairs
- Relationship with 3’UTR motifs
Regulatory motifs
- Mutations preserve consensus
- Increased Branch Length Score
- Genome-wide conservation
Stark et al, Nature 2007
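As a rough illustration of the "Reading Frame Conservation" signature listed above, the sketch below walks a pairwise alignment and asks what fraction of aligned nucleotides keep the same frame offset. It is a simplified stand-in, not the exact test from the cited papers; the function name and the toy alignments are hypothetical.

```python
def reading_frame_conservation(ref_aln, other_aln):
    """Fraction of aligned nucleotide pairs that share the most common
    frame offset, given two gapped sequences of equal length ('-' = gap)."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = other_pos = 0
    offset_counts = [0, 0, 0]  # counts of (ref_pos - other_pos) mod 3
    for r, o in zip(ref_aln, other_aln):
        if r != "-" and o != "-":
            offset_counts[(ref_pos - other_pos) % 3] += 1
        if r != "-":
            ref_pos += 1
        if o != "-":
            other_pos += 1
    total = sum(offset_counts)
    return max(offset_counts) / total if total else 0.0


# Toy alignments (made up): a 3-bp gap preserves the frame, a 1-bp gap shifts it.
in_frame = reading_frame_conservation("ATGAAACCCGGG", "ATG---CCCGGG")
shifted = reading_frame_conservation("ATGAAACCCGGG", "ATG-AACCCGGG")
print(in_frame, shifted)
```

Gaps whose length is a multiple of three leave the score at 1.0, while frame-shifting gaps lower it, which is the intuition behind using this signature to separate coding from non-coding regions.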
![Page 11: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/11.jpg)
Implications for genome annotation / regulation
- Novel protein-coding genes
- Revised gene annotations
- Unusual gene structures

- Novel structural families
- Targeting, editing, stability
- Riboswitches in mammals

- Novel/expanded miR families
- miR/miR* arm cooperation
- Sense/anti-sense miR switches

- Novel regulatory motifs
- Regulatory motif instances
- TF/miRNA regulatory networks
- Single binding site resolution
Stark et al, Nature 2007
![Page 12: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/12.jpg)
Translational read-through in human & fly
Figure labels: Protein-coding conservation; Stop codon read through; Continued protein-coding conservation; 2nd stop codon; No more conservation
Jungreis, Genome Research 2011

Overlapping selection in human exons
Reveal splicing signals, RNA structures, enhancer motifs, dual-coding genes
Axis: Synonymous substitution rate
Lin, Genome Research 2011

RNA structure families: ortholog/paralog conservation
Ex: MAT2A S-adenosyl-methionine level detection
Structure/loop sequence deep conservation
Parker Gen. Res. 2011

Regions of codon-level positive selection
Distributed vs. localized positive selection
Immunity/taste vs. retinal/bone/secretion
Panels: distributed, localized
Lindblad-Toh Nature 2011
![Page 13: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/13.jpg)
Measuring constraint at individual nucleotides
• Reveal individual transcription factor binding sites
• Within motif instances reveal position-specific bias
• More species: motif consensus directly revealed
NRSF motif
![Page 14: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/14.jpg)
Detect SNPs that disrupt conserved regulatory motifs
• Functionally-associated SNPs enriched in states, constraint
• Prioritize candidates, increase resolution, disrupted motifs
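One common way to ask whether a SNP disrupts a motif is to score the reference and alternate alleles against a position weight matrix and compare. The sketch below does that with a log-odds score; the 4-position matrix, the sequences, and the function names are all hypothetical, and the talk does not specify that this particular scoring was used.

```python
import math

# Hypothetical 4-position motif: base probabilities at each position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_odds_score(site, pwm, background=0.25):
    """Sum of log2(motif probability / background) over the site."""
    if len(site) != len(pwm):
        raise ValueError("site length must match motif length")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, site))


def motif_score_change(ref_site, alt_site, pwm):
    """Negative values mean the alternate allele weakens the motif match."""
    return log_odds_score(alt_site, pwm) - log_odds_score(ref_site, pwm)


# Made-up example: a G>T change at the third motif position.
delta = motif_score_change("ACGT", "ACTT", TOY_PWM)
print(f"score change: {delta:.2f} bits")
```

A strongly negative change at a conserved position would be one reason to prioritize that SNP among candidates in LD.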
![Page 15: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/15.jpg)
Measuring selection within the human lineage
![Page 16: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/16.jpg)
Human constraint outside conserved regions
• Non-conserved regions:
– ENCODE-active regions show reduced diversity
– Lineage-specific constraint in biochemically-active regions
• Conserved regions:
– Non-ENCODE regions show increased diversity
– Loss of constraint in human when biochemically-inactive
Average diversity (heterozygosity) Aggregate over the genome
Active regions
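To make "average diversity (heterozygosity)" concrete, the sketch below uses the standard expected heterozygosity for a biallelic site, 2p(1-p), and averages it over the sites in a region set. That the slide's values were computed exactly this way is an assumption, and the allele frequencies are made up.

```python
def site_heterozygosity(allele_freq):
    """Expected heterozygosity 2p(1-p) for a biallelic site with frequency p."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 2.0 * allele_freq * (1.0 - allele_freq)


def mean_heterozygosity(allele_freqs):
    """Average heterozygosity aggregated over a collection of sites."""
    freqs = list(allele_freqs)
    if not freqs:
        raise ValueError("need at least one site")
    return sum(site_heterozygosity(p) for p in freqs) / len(freqs)


# Made-up frequencies: rarer variants in active regions give lower diversity.
active_sites = [0.05, 0.10, 0.02]
inactive_sites = [0.30, 0.45, 0.20]
print(mean_heterozygosity(active_sites), mean_heterozygosity(inactive_sites))
```

Lower average heterozygosity in a class of regions is then read as a sign of constraint within the human population.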
![Page 17: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/17.jpg)
Strongest: motifs, short RNA, Dnase, ChIP, lncRNA
• Significant derived allele depletion in active features
![Page 18: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/18.jpg)
Bound motifs show increased human constraint
Position-specific reduction in bound motif heterozygosity Aggregate across thousands of CTCF motif instances
![Page 19: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/19.jpg)
Most constrained human-specific enhancer functions
Regulatory genes: Transcription, Chromatin, Signaling. Developmental enhancers: embryo, nerve growth
Transcription initiation from Pol2 promoter Transcription coactivator activity
Transcription factor binding Chromatin binding
Negative regulation of transcription, DNA-dependent Transcription factor complex
Protein complex Protein kinase activity
Nerve growth factor receptor signaling pathway Signal transducer activity
Protein serine/threonine kinase activity Negative regulation of transcription from Pol2 prom
Protein tyrosine kinase activity In utero embryonic development
![Page 20: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/20.jpg)
![Page 21: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/21.jpg)
Chromatin signatures for genome annotation
Epigenomic maps
1. DNA methylation
2. Histone modifications
3. DNA accessibility
Ernst et al Nature Biotech 2010
See also: Amos Tanay, Bill Noble.
![Page 22: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/22.jpg)
ENCODE: Study nine marks in nine human cell lines
9 marks: H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K9ac, H3K27me3, H4K20me1, H3K36me3, CTCF (+WCE, +RNA)
9 human cell types:
HUVEC: Umbilical vein endothelial
NHEK: Keratinocytes
GM12878: Lymphoblastoid
K562: Myelogenous leukemia
HepG2: Liver carcinoma
NHLF: Normal human lung fibroblast
HMEC: Mammary epithelial cell
HSMM: Skeletal muscle myoblasts
H1: Embryonic
9 marks x 9 cell types = 81 Chromatin Mark Tracks (2^81 combinations)
• Learned jointly across cell types (virtual concatenation)
• State definitions are common
• State locations are dynamic
Ernst et al, Nature 2011
Brad Bernstein ENCODE Chromatin Group
![Page 23: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/23.jpg)
Chromatin states dynamics across nine cell types
• Single annotation track for each cell type
• Summarize cell-type activity at a glance
• Can study 9-cell activity pattern across
Predicted linking
Correlated activity
![Page 24: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/24.jpg)
Link enhancers to target genes
Introducing multi-cell activity profiles
Cell types: HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1
Profiles shown in the figure:
• Gene expression: ON / OFF
• Chromatin States: Active enhancer / Repressed
• Active TF motif enrichment: Motif enrichment / Motif depletion
• TF regulator expression: TF On / TF Off
• Dip-aligned motif biases: Motif aligned / Flat profile
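The idea of linking an enhancer to a gene through correlated activity across the nine cell types can be sketched with a plain Pearson correlation. This is a deliberately simple stand-in and is not claimed to be the linking model used in the talk; the threshold, gene names, and all activity values below are hypothetical.

```python
import math

CELL_TYPES = ["HUVEC", "NHEK", "GM12878", "K562", "HepG2",
              "NHLF", "HMEC", "HSMM", "H1"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def link_enhancer(enhancer_activity, gene_expression, threshold=0.8):
    """Return {gene: r} for genes whose expression tracks the enhancer."""
    links = {}
    for gene, expression in gene_expression.items():
        r = pearson(enhancer_activity, expression)
        if r >= threshold:
            links[gene] = r
    return links


# Made-up profiles, one value per entry of CELL_TYPES.
enhancer = [0, 0, 1, 1, 0, 0, 0, 0, 0]  # active in GM12878 and K562 only
genes = {
    "gene_a": [0.1, 0.2, 3.0, 2.8, 0.3, 0.1, 0.2, 0.1, 0.0],
    "gene_b": [2.0, 2.1, 0.1, 0.2, 1.9, 2.2, 2.0, 1.8, 2.1],
}
print(link_enhancer(enhancer, genes))
```

In practice the candidate genes would be restricted to a window around the enhancer, and the score would be compared against a background of unlinked pairs.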
![Page 25: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/25.jpg)
Enhancer-gene links supported by eQTL-gene links
Validation rationale:
• Expression Quantitative Trait Loci (eQTLs) provide independent SNP-to-gene links
• Do they agree with activity-based links?
Example: Lymphoblastoid (GM) cells study
• Expression/genotype across 60 individuals (Montgomery et al, Nature 2010)
• 120 eQTLs are eligible for enhancer-gene linking based on our datasets
• 51 actually linked (43%) using predictions → 4-fold enrichment (10% exp. by chance)
• Independent validation of links.
• Relevance to disease datasets.
Figure (eQTL study): sequence variant at distal position (15kb) vs. expression level of gene, across individuals
Individuals: Indiv. 1, Indiv. 2, Indiv. 3, Indiv. 4, Indiv. 5, Indiv. 6, Indiv. 7, Indiv. 8, Indiv. 9, …
Sequence variant: A A A C A A A C C …
Expression level: -1.4, 3.2, 4.4, -1.8, 1.1, 3.1, -1.8, -1.5, -0.5, …
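The enrichment quoted on this slide (51 of 120 eligible eQTLs linked, against 10% expected by chance) can be checked with a few lines of arithmetic. The binomial tail probability is an addition for illustration only: it assumes the eQTLs are independent, which the slide does not state, and the function names are hypothetical.

```python
from math import comb


def fold_enrichment(observed, total, expected_fraction):
    """Observed fraction divided by the fraction expected by chance."""
    return (observed / total) / expected_fraction


def binomial_upper_tail(observed, total, p):
    """P(X >= observed) for X ~ Binomial(total, p)."""
    return sum(comb(total, k) * p**k * (1 - p) ** (total - k)
               for k in range(observed, total + 1))


linked, eligible, chance = 51, 120, 0.10  # numbers taken from the slide
print(f"{linked / eligible:.1%} linked")
print(f"{fold_enrichment(linked, eligible, chance):.2f}-fold over chance")
print(f"tail probability under chance: {binomial_upper_tail(linked, eligible, chance):.2e}")
```

The computed fraction is 42.5%, which the slide rounds to 43%, and the fold enrichment comes out slightly above 4.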
![Page 26: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/26.jpg)
Visualizing 10,000s of predicted enhancer-gene links
• Overlapping regulatory units, both few and many
• Both upstream and downstream elements linked
• Enhancers correlate with sequence constraint
![Page 27: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/27.jpg)
Link TFs to target enhancers
Predict activators vs. repressors
![Page 28: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/28.jpg)
Coordinated activity reveals activators/repressors
Ex1: Oct4 predicted activator of embryonic stem (ES) cells
Ex2: Gfi1 repressor of K562/GM cells
• Enhancer networks: Regulator → enhancer → target gene
Figure labels: Activity signatures for each TF; Enhancer activity
![Page 29: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/29.jpg)
Causal motifs supported by dips & enhancer assays
Dip evidence of TF binding (nucleosome displacement)
Enhancer activity halved by single-motif disruption
Motifs bound by TF, contribute to enhancers
Predicted causal HNF motifs (that also showed dips) in HepG2 enhancers
Tarjei Mikkelsen
![Page 30: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/30.jpg)
![Page 31: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/31.jpg)
Interpreting disease-association signals
Figure labels: Genotype, Disease, GWAS
CATGACTG CATGCCTG
Interpret variants using Epigenomics
- Chromatin states: Enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
Epigenome changes in disease
![Page 32: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/32.jpg)
Revisiting disease-associated variants
• Disease-associated SNPs enriched for enhancers in relevant cell types
• E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator
![Page 33: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/33.jpg)
Mechanistic predictions for top disease-associated SNPs
Lupus erythematosus in GM lymphoblastoid
• Disrupt activator Ets-1 motif
• Loss of GM-specific activation
• Loss of enhancer function
• Loss of HLA-DRB1 expression
Erythrocyte phenotypes in K562 leukemia cells
• Creation of repressor Gfi1 motif
• Gain K562-specific repression
• Loss of enhancer function
• Loss of CCDC162 expression
![Page 34: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/34.jpg)
Allele-specific chromatin marks: cis-vs-trans effects
• Maternal and paternal GM12878 genomes sequenced
• Map reads to phased genome, handle SNPs and indels
• Correlate activity changes with sequence differences
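Once reads are assigned to the maternal or paternal haplotype at a heterozygous site, one simple way to flag allele-specific activity is an exact two-sided binomial test against a 50/50 split. The sketch below assumes that test and uses made-up read counts; the talk does not say which statistic was actually used, and the function name is hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(maternal_reads, paternal_reads):
    """Exact two-sided binomial p-value for a 50/50 allelic split."""
    n = maternal_reads + paternal_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    pmf = [comb(n, k) * 0.5**n for k in range(n + 1)]
    observed = pmf[maternal_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Made-up read counts at two heterozygous sites.
print(allelic_imbalance_pvalue(10, 10))  # balanced
print(allelic_imbalance_pvalue(20, 0))   # strongly skewed to one haplotype
```

Sites with small p-values are candidates for cis-acting effects, since both alleles share the same trans environment in the cell.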
![Page 35: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/35.jpg)
HaploReg: systematic ENCODE mining of variants (compbio.mit.edu/HaploReg)
• Start with any list of SNPs or select a GWA study
– Mine publicly available ENCODE data for significant hits
– Hundreds of assays, dozens of cells, conservation, motifs
– Report significant overlaps and link to info/browser
![Page 36: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/36.jpg)
Functional enrichment for 1000s of SNPs
![Page 37: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/37.jpg)
Full T1D association spectrum 1000s of causal SNPs
• Rank all SNPs by P-value
• Find chromatin states with enrichment in high ranks
• Signal spans 1000s of SNPs
[Plots: enrichment along the SNP ranking for GM12878 (lymphoblastoid) and K562 (myelogenous leukemia)]
GM12878 enhancer enrichment now seen
• Cell type specific: GM and K562 enhancers
• Chromatin state specific: enhancers/promoters
Could bias in array design contribute to these enrichments? Evaluate all 1000 Genomes SNPs by imputing those in LD
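The ranking procedure on this slide (rank SNPs by P-value, then look for chromatin states enriched at high ranks) can be sketched as a running "observed minus expected" count. This is a minimal illustration under assumed inputs: `pvalues` and `in_state` are hypothetical parallel lists, and the expected count simply uses the overall fraction of SNPs in the state as background. It is not the statistic used in the talk.

```python
def running_excess(pvalues, in_state):
    """Observed minus expected number of annotated SNPs at each rank cutoff.

    pvalues  : association P-value per SNP (hypothetical input)
    in_state : 1/True if the SNP overlaps the chromatin state, else 0/False
    """
    if len(pvalues) != len(in_state) or not pvalues:
        raise ValueError("inputs must be non-empty and of equal length")
    # Most significant SNPs first.
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    background = sum(in_state) / len(in_state)
    excess, observed = [], 0
    for rank, idx in enumerate(order, start=1):
        observed += in_state[idx]
        excess.append(observed - rank * background)
    return excess


if __name__ == "__main__":
    toy_p = [0.001, 0.5, 0.01, 0.9]
    toy_enhancer = [1, 0, 1, 0]
    print(running_excess(toy_p, toy_enhancer))
```

A curve that rises well above zero in the top ranks and returns to zero at the end of the list indicates that the state is over-represented among the most strongly associated SNPs.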
![Page 38: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/38.jpg)
Imputing SNPs in LD → stronger cell/state separation
• Excess of 30,000 SNPs → 2049 enhancers (excess 392)
• Mostly found in independent loci (1730 with R^2<0.2)
Systematically measure their regulatory contributions
Enhancers across cell types Chromatin states in GM12878
Enhancers: 2049 (excess 392) 1940 distinct loci (R^2<.8)
Promoters: 462 (excess 81)
Transcribed: 4740 (excess 522)
Repressed: 1351 (excess 76)
Insulator: 240 (excess 23)
Other: 21k (deplete 1093)
![Page 39: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/39.jpg)
Systems-level views of disease epigenomics
• Evolutionary signatures → gene/genome annotation
  – High-resolution annotation: genes, RNAs, motif instances
  – Measuring selection within the human population
• Chromatin states for interpreting disease association
  – Annotate dynamic regulatory elements in multiple cell types
  – Activity-based linking of regulators → enhancers → targets
• Interpreting disease-associated sequence variants
  – Mechanistic predictions for individual top-scoring SNPs
  – Functional roles of 1000s of disease-associated SNPs
• Systematic manipulation of 2000+ human enhancers
  – Test effect of single-motif and single-nucleotide disruptions
  – Role of activator/repressor motifs, disease-associated SNPs
• Personal genomes/epigenomes in health and disease
  – Allele-specific activity. Alzheimer’s brain methylation / SNP
  – Global repression of distal enhancers. NRSF, ELK1, CTCF
![Page 40: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/40.jpg)
High-throughput experiments: 10,000s enhancers
• Experiment features:
  – Multiplexed enhancer assays
  – 10,000s of elements
  – Each w/ unique barcode
  – Multiple human cell types
  – Repeat experiments on same array / diff barcodes
• Applied to:
  – Test enhancer offsets
  – Test causal motifs
• With: Tarjei Mikkelsen
  – Broad Institute, ARRA funds
  – See also: Barak Cohen, Jay Shendure, Eran Segal
Melnikov, Nature Biotech 2012
![Page 41: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/41.jpg)
Systematic motif disruption for 5 activators and 2 repressors in 2 human cell lines
54000+ measurements (x2 cells, 2x repl)
![Page 42: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/42.jpg)
Example activator: conserved HNF4 motif match
• WT expression specific to HepG2
• Non-disruptive changes maintain expression
• Motif match disruptions reduce expression to background
• Effect of random changes depends on their effect on the motif match
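The idea that a sequence change matters to the extent that it changes the motif match can be illustrated by scoring a wild-type and a mutant sequence against a position frequency matrix. The matrix below is a made-up 4-bp toy, not the real HNF4 motif, and the log-odds scoring against a uniform background is an assumption chosen for illustration.

```python
from math import log2

# Hypothetical 4-bp position frequency matrix (NOT the real HNF4 motif).
TOY_PFM = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]


def motif_score(site, pfm=TOY_PFM, background=0.25):
    """Log-odds score of one site (uppercase ACGT) against the matrix."""
    if len(site) != len(pfm):
        raise ValueError("site length must equal motif width")
    return sum(log2(col[base] / background) for col, base in zip(pfm, site))


def best_match(seq, pfm=TOY_PFM):
    """Highest-scoring window of the sequence (forward strand only)."""
    width = len(pfm)
    if len(seq) < width:
        raise ValueError("sequence shorter than motif")
    return max(motif_score(seq[i:i + width], pfm)
               for i in range(len(seq) - width + 1))


def mutation_effect(wild_type, mutant, pfm=TOY_PFM):
    """Change in best motif score caused by the mutation (negative = weaker)."""
    return best_match(mutant, pfm) - best_match(wild_type, pfm)


if __name__ == "__main__":
    wt = "CCAGTCGG"  # contains the toy consensus AGTC
    print(mutation_effect(wt, "TCAGTCGG"))  # change outside the motif
    print(mutation_effect(wt, "CCAGACGG"))  # change inside the motif
```

In this toy setting a substitution outside the match leaves the score unchanged, while a substitution inside it lowers the score, mirroring the contrast between non-disruptive and disruptive changes on the slide.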
![Page 43: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/43.jpg)
Results hold across 2000+ enhancers
• Scramble abolishes reporter expression
• Neutral mutations show no change
• Increasing mutations show more expression
• However, only 40% show wild-type expression: context?
![Page 44: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/44.jpg)
Features of functional wildtype enhancers
• Nucleosome exclusion, motif conservation, other TFs
• Each of these features is encoded in primary sequence
![Page 45: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/45.jpg)
Repressors of HepG2 enhancer act in K562
Repressor disruption → aberrant expression in opposite cell types
![Page 46: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/46.jpg)
Testing effect of SNP change in enhancer constructs
• SNPs in enhancer regions can lead to expression changes in downstream reporter genes
• Currently testing all T1D-associated enhancer SNPs
![Page 48: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/48.jpg)
Interpreting disease-association signals
[Diagram: Genotype, Epigenome, Disease; links labeled GWAS, mQTLs, MWAS; example variant CATGACTG vs. CATGCCTG]
(1) Interpret variants using Epigenomics
- Chromatin states: Enhancers, promoters, motifs
- Enrichment in individual loci, across 1000s of SNPs in T1D
(2) Epigenome changes in disease
- Intermediate molecular phenotypes associated with disease
- Variation in brain methylomes of Alzheimer’s patients
![Page 49: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/49.jpg)
Phil de Jager: Methylation in 750 Alzheimer patients
[Data matrix: 500,000 methylation probes × 750 individuals]
• Patients followed for 10+ years with cognitive evaluations
• Brain samples donated post-mortem → methylation/genotype
• Seek predictive features: SNPs, QTLs, mQTLs, regulation
Phil de Jager, Roadmap disease epigenomics
Brad Bernstein, REMC mapping
[Diagram: (1) Genome → Epigenome: meQTL; (2) Epigenome → Phenotype: Classification, MWAS]
![Page 50: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/50.jpg)
2,500 mQTLs for neighboring SNPs at 10^-14
• Overlay Manhattan plots of 450,000 methylation probes
• Cutoff of 10^-14 (10^-8 after Bonferroni correction)
• Use to pinpoint disrupted motifs, predict epigenome
[Plots: P-value exponent (-log10 P) vs. chromosome and genomic position; P-value exponent vs. distance from CpG (Mb), from -1 to 1]
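The two cutoffs quoted on this slide (10^-14 and 10^-8 after Bonferroni correction) differ by a factor of 10^6. As a worked arithmetic sketch, assuming 10^-8 is the corrected equivalent of a raw 10^-14, that factor is the number of tests implied by the correction. The slide does not state the test count, so the helper names and this reading are assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value cutoff that controls family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_tests(corrected_cutoff, raw_cutoff):
    """Number of tests implied by a raw cutoff and its Bonferroni-corrected value."""
    return round(corrected_cutoff / raw_cutoff)


if __name__ == "__main__":
    n = implied_tests(1e-8, 1e-14)
    print(n, bonferroni_threshold(1e-8, n))
```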
![Page 51: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/51.jpg)
Focusing on 2831 most variable probes
[Plot: probe intensity distribution (x-axis) vs. inter-individual variability (y-axis)]
• Hemi-methylated probes are also the most variable
• Tiny fraction (0.6%) of all probes
• Promoters: Stable low (active)
• Gene bodies: Stable high (active)
• Enhancers/poised: Most variable
![Page 52: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/52.jpg)
[Venn diagram: multimodal probes (~3K) vs. SNP-associated probes (29% of all): 184 multimodal only, 2,647 in both, 138,731 SNP-associated only]
[Bar chart: % of CpG probes per chromatin state, SNP-associated vs. all probes; states: 1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal]
Multimodal → SNP-associated → Promoter-depleted
• 93.5% of multimodal probes are SNP-associated
• SNP-associated probes depleted in promoters (driven epigenetically > genetically, open chromatin)
• Importance of distinguishing contribution of genotype to disease associations
![Page 54: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/54.jpg)
Global hyper-methylation trend in AD-associated probes
[Plot: 480,000 probes ranked by Alzheimer’s association (x-axis); P-value and methylation (y-axes); Alzheimer’s vs. normal shown for hypomethylated probes (active) and hypermethylated probes (repressed); top 7000 probes marked]
Alzheimer’s-associated probes are hypermethylated
• Global effect across 1000s of probes
  – Rank all probes by Alzheimer’s association
  – Observe functional changes down ranklist
  – 7000 probes show shift in methylation
Complex disease: genome-wide effects, 1000s of loci
![Page 55: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/55.jpg)
Chromatin state breakdown reveals ↓ activity
[Bar chart: % probes per chromatin state (1 Active promoter, 2 Promoter flanking, 3 Active enhancer, 4 Weak enhancer, 5 Gene bodies, 6 Active gene bodies, 7 Repetitive, 8 Heterochromatin, 9 Low signal). Red: more methylated in Alzheimer’s. Blue: less methylated in Alzheimer’s. * = Fisher exact test, p-value <= 0.001]
Significant probes are in enhancers, not promoters
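The Fisher exact test mentioned in the chart legend can be sketched for a single chromatin state as a 2x2 table: significant probes inside and outside the state versus all other probes inside and outside the state. The version below is a minimal one-sided (enrichment) implementation using only the standard library. The table layout and function name are assumptions for illustration, and the slide does not say whether a one- or two-sided test was used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: significant probes in the state     b: significant probes elsewhere
    c: other probes in the state           d: other probes elsewhere
    Returns P(X >= a) under the hypergeometric null (enrichment test).
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Exact integer arithmetic for the tail, one division at the end.
    tail = sum(comb(row1, x) * comb(total - row1, col1 - x)
               for x in range(a, upper + 1))
    return tail / comb(total, col1)


if __name__ == "__main__":
    print(fisher_exact_greater(3, 0, 0, 3))
```

Running this once per chromatin state, and once for each direction of methylation change, would produce the kind of per-state significance marks shown in the chart. Correction for testing nine states is left out of this sketch.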
![Page 56: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/56.jpg)
Alzheimer’s prediction vs. likely biological pathways
Predictive power: 6k probes + APOE
Regulatory motifs associated with Alzheimer-associated probes suggest potential pathways: CTCF, NRSF, ELK1
We have not solved Alzheimer’s, but new insights gained
[Plots: x-axis in both panels is all probes, ranked by AD association P-value]
![Page 58: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/58.jpg)
Goal: A systems-level understanding of genomes and gene regulation:
• The regulators: Transcription factors, microRNAs, sequence specificities
• The regions: enhancers, promoters, and their tissue-specificity
• The targets: TFs → targets, regulators → enhancers, enhancers → genes
• The grammars: Interplay of multiple TFs → prediction of gene expression
The parts list = Building blocks of gene regulatory networks
Understanding human variation and human disease
Disease-associated variant (SNP/CNV/…): CATGACTG vs. CATGCCTG
• Gene annotation (Coding, 5’/3’ UTR, RNAs): Evolutionary signatures
• Non-coding annotation: Chromatin signatures
• Roles in gene/chromatin regulation: Activator/repressor signatures
• Other evidence of function: Signatures of selection (sp/pop)
• Challenge: from loci to mechanism, pathways, drug targets
![Page 59: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/59.jpg)
Collaborators and Acknowledgements
• ENCODE
  – Brad Bernstein, Tarjei Mikkelsen, Noam Shoresh, David Epstein
• Massively parallel enhancer reporter assays
  – Tarjei Mikkelsen, Broad Institute
• Epigenome Roadmap
  – Bing Ren, Brad Bernstein, John Stam, Joe Costello
• 2X mammals
  – Kerstin Lindblad-Toh, Eric Lander, Manuel Garber, Or Zuk
• Funding
  – NHGRI, NIH, NSF, Sloan Foundation
![Page 60: Genomic and epigenomic signatures for interpreting complex ...compbio.mit.edu/slides/ManolisKellis_TeachersProgram.pdf · Genomic and epigenomic signatures for interpreting complex](https://reader033.fdocument.pub/reader033/viewer/2022052717/5f045ddb7e708231d40da038/html5/thumbnails/60.jpg)
MIT Computational Biology group (compbio.mit.edu)
Manolis Kellis, Daniel Marbach, Mike Lin, Jason Ernst, Jessica Wu, Rachel Sealfon, Pouya Kheradpour, Chris Bristow, Loyal Goff, Irwin Jungreis, Sushmita Roy, Luke Ward, Louisa DiStefano, Dave Hendrix, Angela Yen, Ben Holmes, Soheil Feizi, Mukul Bansal, Bob Altshuler, Stefan Washietl, Matt Eaton