20140613 Analysis of High Throughput DNA Methylation Profiling
Analysis of High-Throughput DNA Methylation Profiling
張益峯 (Yi-Feng Chang), PhD Candidate, Biomedical Informatics, NYMU
Outline
DNA methylation
Fundamentals of bisulfite sequencing technology
Current status of published BS-Seq resources
Information that can be presented in a BS-Seq study
Published tools for analyzing BS-Seq data
A comprehensive BS-Seq analysis tool: MethPipe
Epigenetics Overview
Source: http://commonfund.nih.gov/epigenomics/figure.aspx
DNA Methylation
Singal, R. & Ginder, G.D. DNA Methylation. Blood 93, 4059-4070 (1999).
DNA Methylation Pathway
Moore, L.D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).
Singal, R. & Ginder, G.D. DNA Methylation. Blood 93, 4059-4070 (1999).
DNA Demethylation Pathway
Moore, L.D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).
• Tet: Ten-eleven translocation enzymes
• AID/APOBEC: activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme complex
• TDG: Thymine DNA glycosylase
• SMUG1: Single-strand-selective monofunctional uracil-DNA glycosylase 1
• 5mC: 5-methylcytosine
• 5hmC: 5-hydroxymethylcytosine
• 5hmU: 5-hydroxymethyluracil
• 5fC: 5-formylcytosine
• 5caC: 5-carboxylcytosine
Timeline of DNA Methylation Analysis
Harrison, A. & Parle-McDermott, A. DNA methylation: a timeline of methods and applications. Front Genet 2, 74 (2011).
Methods marked on the timeline: MS-HRM, MeDIP-Seq, BS-Seq, MethylC-Seq, TAB-Seq
Bisulfite Sequencing Technology
Steps for Determining the Methylation Status of Cytosine in a Known DNA Sequence by the Bisulfite Conversion Method
Singal, R. & Ginder, G.D. DNA Methylation. Blood 93, 4059-4070 (1999).
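The conversion logic of this method can be simulated in a few lines. This is a minimal Python sketch, not part of the original material: `bisulfite_convert` and `call_methylation` are hypothetical helper names, and the sketch assumes complete conversion, a single strand, and a read that is already aligned to the reference without gaps.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """Simulate bisulfite treatment plus PCR on one strand of DNA.

    Unmethylated cytosines are deaminated to uracil, which PCR copies as
    thymine. Cytosines whose 0-based positions are in `methylated` are
    protected and stay C. All other bases are unchanged.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")  # C -> U (bisulfite) -> T (PCR)
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference: str, read: str) -> dict:
    """Compare a converted read with its reference, position by position.

    A C in the read at a reference C means methylated; a T means
    unmethylated. Any other base at a reference C (SNP or sequencing
    error) is left uncalled.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls


reference = "ACGTCCGATCG"
read = bisulfite_convert(reference, methylated={1, 9})
print(read)                               # ACGTTTGATCG
print(call_methylation(reference, read))  # {1: True, 4: False, 5: False, 9: True}
```

Note that a T in the read at a reference C is indistinguishable from a C/T SNP, which is one reason variant-aware tools such as Bis-SNP exist.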
Techniques for Enrichment of Methylated or Target Regions Prior to BS Sequencing
Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).
(Figure: workflows from genomic DNA to deep sequencing)
Harrison, A. & Parle-McDermott, A. DNA methylation: a timeline of methods and applications. Front Genet 2, 74 (2011).
Techniques for Genome-Wide Sequencing of Cytosine Methylation Sites
Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).
(Figure: workflows from genomic DNA to deep sequencing)
TAB-Seq: Tet-Assisted BS-Seq
Yu, M. et al. Tet-assisted bisulfite sequencing of 5-hydroxymethylcytosine. Nat Protoc 7, 2159-70 (2012).
Yu, M. et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 149, 1368-80 (2012).
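Why TAB-Seq is paired with ordinary BS-Seq becomes clear when you tabulate what each protocol reads at C, 5mC and 5hmC. In TAB-Seq, 5hmC is first protected by β-glucosylation, then Tet oxidizes 5mC to 5caC, which bisulfite converts like an unmodified C. The Python sketch below assumes complete bisulfite conversion, complete protection and complete oxidation; real TAB-Seq analyses must also correct for incomplete reactions. The dictionary names and `estimate_levels` are hypothetical, for illustration only.

```python
# Base read at a cytosine position after each protocol ("C" = protected,
# "T" = converted). Simplified: assumes every reaction goes to completion.
BS_SEQ_READOUT = {
    "C": "T",     # unmodified C is deaminated to U and sequenced as T
    "5mC": "C",   # 5mC resists bisulfite deamination
    "5hmC": "C",  # 5hmC also resists, so BS-Seq alone cannot tell it from 5mC
}

TAB_SEQ_READOUT = {
    "C": "T",     # unmodified C behaves as in BS-Seq
    "5mC": "T",   # Tet oxidizes 5mC to 5caC, which bisulfite converts like C
    "5hmC": "C",  # beta-glucosylation protects 5hmC from Tet oxidation
}


def estimate_levels(bs_ratio: float, tab_ratio: float) -> dict:
    """Split the BS-Seq methylation ratio into 5mC and 5hmC fractions.

    bs_ratio:  fraction of reads showing C at the site in BS-Seq (5mC + 5hmC)
    tab_ratio: fraction of reads showing C at the site in TAB-Seq (5hmC only)
    """
    hmc = tab_ratio
    mc = max(0.0, bs_ratio - tab_ratio)  # clip: sampling noise can make this negative
    return {"5mC": mc, "5hmC": hmc, "C": max(0.0, 1.0 - mc - hmc)}


for state in ("C", "5mC", "5hmC"):
    print(state, "BS-Seq:", BS_SEQ_READOUT[state], "TAB-Seq:", TAB_SEQ_READOUT[state])
print(estimate_levels(bs_ratio=0.8, tab_ratio=0.2))
```

The subtraction step is the design point: neither assay alone separates 5mC from 5hmC, but their difference at the same site does.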
Genomic Coverage of MeDIP-seq, MethylCap-seq, RRBS and Infinium
Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).
MeDIP-seq and MethylCap-seq provide broad coverage of the genome, whereas RRBS and Infinium are more restricted to CpG islands and promoter regions.
Key Metrics of the Technology Comparison
Beck, S. Taking the measure of the methylome. Nat Biotechnol 28, 1026-8 (2010).
Sequencing Coverage of NGS Platforms
Sims, D., Sudbery, I., Ilott, N.E., Heger, A. & Ponting, C.P. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet 15, 121-32 (2014).
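Expected sequencing depth for a given platform output follows from simple arithmetic. The Python sketch below uses the Lander-Waterman expectation c = N × L / G and a Poisson model of per-base depth. Both functions, the `usable_fraction` parameter and all input numbers are hypothetical. Because bisulfite libraries cover the genome unevenly, the Poisson estimate should be read as optimistic.

```python
import math


def mean_depth(n_reads: int, read_length: int, genome_size: int, usable_fraction: float = 1.0) -> float:
    """Expected mean depth c = N * L / G (Lander-Waterman).

    usable_fraction scales N down to the reads that survive mapping and
    duplicate removal (a hypothetical parameter you would measure).
    """
    return n_reads * usable_fraction * read_length / genome_size


def fraction_covered(depth: float, min_depth: int = 1) -> float:
    """Fraction of positions with at least `min_depth` reads, assuming reads
    land uniformly so per-position depth is Poisson(depth)."""
    below = sum(math.exp(-depth) * depth ** k / math.factorial(k) for k in range(min_depth))
    return 1.0 - below


# Hypothetical run: 600 million 100 bp reads, 70% usable, ~3.1 Gb genome.
c = mean_depth(600_000_000, 100, 3_100_000_000, usable_fraction=0.7)
print(f"mean depth ~ {c:.1f}x")
print(f"positions with >= 5 reads ~ {fraction_covered(c, 5):.3f}")
```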
Whole Genome Bisulfite Sequencing Library Construction
VYM Genome Research Center, National Yang-Ming University (陽明大學榮陽基因體研究中心)
Library preparation uses the PE sample prep kit.
• Purify 3-5 μg of genomic DNA
• Fragmentation
• End repair
• 3′ end adenylation
• Ligation of C-methylated adapters
• Purify the ligation product
• Fragment size selection, 200-400 bp, split into 3 libraries: 200-250 bp, 250-300 bp and 300-350 bp
• Bisulfite conversion with the Zymo EZ DNA Methylation Kit (or the Qiagen EpiTect Kit): methylated C remains C, unmethylated C is converted to U
• Purify, using 3 separate tubes for each library
• PCR, 4 to 8 cycles, with PfuTurbo Cx Hotstart DNA polymerase
• Purify
• Validate the library
• Sequencing
IVC (Intensity versus Cycle) Plot of Bisulfite Sequencing
(Figure: % Base and % Intensity per cycle for Read 1 and Read 2, comparing a PhiX control with bisulfite libraries of 250 bp, 350 bp and 430 bp; the GC contents shown across the panels are 45%, 29%, 40% and 22%. Annotation: reading into adapter.)
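The % Base track of an IVC plot is the base composition at each sequencing cycle. In a directional bisulfite library, unmethylated Cs are read as T, so Read 1 is strongly C-depleted (and Read 2, which reads the complementary strand, G-depleted) compared with the balanced PhiX control. A minimal Python sketch of that computation from raw read sequences; `per_cycle_composition` is a hypothetical helper and the reads are made up.

```python
from collections import Counter


def per_cycle_composition(reads: list) -> list:
    """Return, for each sequencing cycle, the percentage of A, C, G and T."""
    if not reads:
        return []
    table = []
    for cycle in range(max(len(r) for r in reads)):
        counts = Counter(r[cycle] for r in reads if len(r) > cycle)
        total = sum(counts[b] for b in "ACGT")  # N calls are ignored
        table.append({b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"})
    return table


# Hypothetical Read 1 sequences from a directional BS library: few Cs remain.
reads = ["TTGATGTTAG", "ATGTTGATTT", "TTTAGTGATC", "GTTTTAGGTA"]
for cycle, pct in enumerate(per_cycle_composition(reads), start=1):
    print(cycle, {b: round(v) for b, v in pct.items()})
```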
Fragment Size Effects
(Figure: IVC plots for a PhiX control and libraries of 300 bp, 400 bp and 500 bp; read length 2×75 bp.)
• DNA fragments < 250 bp (library size < 350 bp; library size = insert + 121 bp): reads run into the adapter and genomic coverage will be uneven.
• Amplification bias, bisulfite conversion bias, sequencing bias
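When the insert is shorter than the read, the read continues into adapter sequence, and that tail must be removed before alignment or it will be scored as mismatches against the genome. A minimal exact-match trimming sketch in Python, assuming the adapter begins with AGATCGGAAGAGC (the common Illumina adapter prefix; substitute the adapter your kit actually uses). `trim_adapter` is a hypothetical helper; dedicated tools such as cutadapt or Trim Galore also tolerate mismatches and trim low-quality tails.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed Illumina adapter prefix; replace with the kit's adapter


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove 3' adapter read-through from a read using exact matching only."""
    # Case 1: the whole adapter prefix appears inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends part-way into the adapter; find the longest
    # read suffix that equals an adapter prefix of at least min_overlap bases.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


insert = "TTGATTTGTAGTT"
print(trim_adapter(insert + "AGATCGGAAGAGCACAC"))  # TTGATTTGTAGTT
print(trim_adapter(insert + "AGATC"))              # TTGATTTGTAGTT
print(trim_adapter(insert))                        # TTGATTTGTAGTT
```

The `min_overlap` threshold trades sensitivity for false trimming: a read that happens to end in "AGA" is clipped by 3 bases even if no adapter is present.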
Public BS-Seq Resources from MethBase
http://smithlabresearch.org/software/methbase/
Human
Acute Myeloid Leukemia
B Cells
BCell/Fibro/iPSC
Blood Cells from Different Ages
Brains (Chimp)
Breast Cancer
Buccal cells
Chronic Lymphocytic Leukemia
Colon Cancer
Colorectal Cancer and Adenomatous Polyp
Developing human brain
ENCODE RRBS Dataset
ESC Differentiation
Fetal Lung Fibroblasts
Fibroblasts
Hematopoietic Stem Cells (Chimp)
Induced Pluripotent Stem Cells
Leukocytes
LuWen-Brain-2014
Lymphoblastoid
Multiple tissues
Neuroepithelium Cells
Neuronal Cells
Peripheral Blood Mononuclear Cells
Placenta, kidney, etc
Sperm (Chimp)
Mouse
5hmC in ESC
Aid Deficiency
Colon Epithelial Cells
Developing human brain
Early Embryo
Embryonic Fibroblasts
Embryonic Stem Cells and Neuroprogenitors
Frontal Cortex
Gamete and Early Embryo
Normal liver vs HCC (HBx TG mouse liver) GEO: GSE48052
Hematopoietic Cells, DNMT3A KO
Hematopoietic Cells, IDH1-R132H KI
Intestinal stem cell
Lung Tissue
mESC
mESC (Tet1)
Mouse B Lymphocyte
Multiple tissues (17)
Nucleus-transferred Zygotes
Oocyte
Oocytes and Preimplantation Embryos
Primordial Germ Cells
Plant
Endosperm, embryo and aerial tissue
Floral and leaf (IDN mutant)
Floral buds methylome: C24 and Ler hybrid
IDM1 regulates active demethylation
Leaf: ATXR5/ATXR6 mutants
Leaf: spontaneous epimutation
Rosettes: spontaneous epimutation
Seedling: hybrid
Other Organisms (from NCBI GEO)
Glycine max (Soy beans)
Schistocerca gregaria (Locust)
Rattus norvegicus (Rat)
Danio rerio (Zebra fish)
Drosophila melanogaster (Fruit fly)
Oryza sativa (Rice)
Pan troglodytes (Chimp)
Macaca mulatta (Rhesus monkey)
Mus musculus domesticus (Western European house mouse)
Xenopus (Silurana) tropicalis (Frog)
Cynoglossus semilaevis (Tongue sole, bony fish)
Bombyx mori (Silkworm)
Harpegnathos saltator (Jerdon's jumping ant)
Camponotus floridanus (Florida carpenter ant)
To Access MethBase
http://smithlabresearch.org/software/methbase/
Information That Can Be Presented in a BS-Seq Study
Sequencing depth
Coverage of genome length and of CpG sites
Bisulfite conversion rate (lambda phage DNA control)
CHG and CHH sites (H = A, C or T, i.e. not G)
Statistics of methylation ratios at CpG, CHG and CHH sites
Methylation ratios across gene structures
Association with regulatory elements
Differentially methylated regions (DMRs)
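Several of these statistics reduce to counting C and T calls at reference cytosines. The Python sketch below classifies a cytosine's context (CpG, CHG, CHH), pools the methylation ratio C / (C + T) per context, and computes a conversion rate from a control that is assumed to be fully unmethylated (such as spiked-in lambda DNA), where every C should read as T. Assumptions: forward-strand cytosines only (reverse-strand sites would come from reads covering the G positions), and per-position (C, T) counts are already extracted; all function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pileup = Dict[int, Tuple[int, int]]  # 0-based position -> (reads showing C, reads showing T)


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the forward-strand C at index i as CpG, CHG or CHH (H = A, C or T)."""
    seq = seq.upper()
    if seq[i] != "C":
        return None
    next1 = seq[i + 1] if i + 1 < len(seq) else "N"
    next2 = seq[i + 2] if i + 2 < len(seq) else "N"
    if next1 == "G":
        return "CpG"
    if next1 in "ACT" and next2 == "G":
        return "CHG"
    if next1 in "ACT" and next2 in "ACT":
        return "CHH"
    return None  # undetermined: N or end of sequence


def methylation_by_context(reference: str, pileup: Pileup) -> Dict[str, float]:
    """Pooled methylation ratio C / (C + T) for each sequence context."""
    totals = {"CpG": [0, 0], "CHG": [0, 0], "CHH": [0, 0]}
    for pos, (n_c, n_t) in pileup.items():
        context = cytosine_context(reference, pos)
        if context is not None:
            totals[context][0] += n_c
            totals[context][1] += n_t
    return {ctx: c / (c + t) for ctx, (c, t) in totals.items() if c + t > 0}


def conversion_rate(control_pileup: Pileup) -> float:
    """Fraction of cytosines read as T in an unmethylated control such as lambda DNA."""
    n_c = sum(c for c, _ in control_pileup.values())
    n_t = sum(t for _, t in control_pileup.values())
    return n_t / (n_c + n_t)


reference = "ACGTCAGTCTTA"  # C at 1 is CpG, at 4 is CHG, at 8 is CHH
pileup = {1: (9, 1), 4: (1, 3), 8: (0, 10)}
print(methylation_by_context(reference, pileup))  # {'CpG': 0.9, 'CHG': 0.25, 'CHH': 0.0}
print(conversion_rate({0: (1, 99), 5: (0, 100)}))  # 0.995
```

Pooling counts weights well-covered sites more heavily; averaging per-site ratios instead is an equally common choice and can give different numbers when coverage is uneven.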
DNA Methylome Studies
Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-22 (2009).
Cokus, S.J. et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452, 215-9 (2008).
(Figure panels: methylome only; methylome/transcriptome)
Contrast Studies
17 tissues: Hon, G.C. et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet 45, 1198-206 (2013).
Human/mouse brain development: Lister, R. et al. Global epigenomic reconfiguration during mammalian brain development. Science 341, 1237905 (2013).
Association with Regulatory Elements
Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-22 (2009).
Hon, G.C. et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet 45, 1198-206 (2013).
Differentially Methylated Regions (DMRs)
Hon, G.C. et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet 45, 1198-206 (2013).
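A common building block behind differential methylation calls is a per-site test on methylated versus unmethylated read counts in two samples. The Python sketch below applies a two-sided Fisher's exact test to one CpG. It is a generic illustration, not the method of Hon et al. or of any tool listed later: DMR callers such as BSmooth, DSS or MethPipe's `dmr` use their own statistical models, aggregate neighbouring CpGs into regions, and require multiple-testing correction. Function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of the table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def compare_cpg(meth_a: int, unmeth_a: int, meth_b: int, unmeth_b: int, alpha: float = 0.05):
    """Compare one CpG between samples A and B from methylated/unmethylated read counts."""
    diff = meth_a / (meth_a + unmeth_a) - meth_b / (meth_b + unmeth_b)
    p = fisher_exact_two_sided(meth_a, unmeth_a, meth_b, unmeth_b)
    return diff, p, p < alpha


print(compare_cpg(18, 2, 4, 16))
```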
DNA Methylome Analysis Using BS-Seq Data
Effect and Problems of Bisulfite Treatment of DNA
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009).
Mapping bisulfite reads to the 4 possible bisulfite strands (BSW/BSWR/BSC/BSCR) is equivalent to mapping the bisulfite read and its reverse-complementary read to both the Watson and Crick strands of the original reference sequence.
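The four strands can be generated directly from the reference. In this Python sketch (hypothetical helper names; assumes complete conversion and ignores methylation, so every C becomes T), BSW and BSC are the converted Watson and Crick strands and BSWR and BSCR are their reverse complements. The final assertion checks the equivalence described above on one example read.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def bisulfite_strands(reference: str) -> dict:
    """Fully converted versions of the four possible bisulfite strands.

    BSW  = Watson strand with every C -> T
    BSC  = Crick strand (reverse complement of Watson) with every C -> T
    BSWR = reverse complement of BSW
    BSCR = reverse complement of BSC (equals Watson with every G -> A)
    """
    watson = reference.upper()
    crick = revcomp(watson)
    bsw = watson.replace("C", "T")
    bsc = crick.replace("C", "T")
    return {"BSW": bsw, "BSWR": revcomp(bsw), "BSC": bsc, "BSCR": revcomp(bsc)}


strands = bisulfite_strands("ACCGTA")
print(strands)  # {'BSW': 'ATTGTA', 'BSWR': 'TACAAT', 'BSC': 'TATGGT', 'BSCR': 'ACCATA'}

# Searching a read against BSWR is the same as searching its reverse
# complement against BSW (likewise BSCR versus BSC).
read = "ACAA"
assert (read in strands["BSWR"]) == (revcomp(read) in strands["BSW"])
```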
How to Align BS Reads Against the Reference Genome?
Krueger, F. & Andrews, S.R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics (2011).
Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705-19 (2012).
(Figure: wild-card alignment, where each reference C is treated as Y = C or T, versus three-letter alignment; both approaches can produce multiple hits. Example sequences shown: TCGA, TCGT, ACGTATGA, TTGT, ATGT.)
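The two alignment strategies can be contrasted with naive exact matching on one strand. This Python sketch uses hypothetical function names and omits mismatches, gaps, indexing and the reverse strand that real aligners handle. The wild-card rule lets a read T match a reference C; the three-letter rule converts C to T in both read and reference, so it additionally lets a read C match a reference T.

```python
def wildcard_hits(read: str, reference: str) -> list:
    """Positions where the read matches if each reference C is treated as the
    wild card Y (C or T); all other bases must match exactly."""
    read, reference = read.upper(), reference.upper()
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        if all(r == g or (g == "C" and r == "T") for r, g in zip(read, window)):
            hits.append(start)
    return hits


def three_letter_hits(read: str, reference: str) -> list:
    """Positions where the read matches after converting C -> T in both read and reference."""
    read_c2t = read.upper().replace("C", "T")
    ref_c2t = reference.upper().replace("C", "T")
    return [s for s in range(len(ref_c2t) - len(read_c2t) + 1)
            if ref_c2t[s:s + len(read_c2t)] == read_c2t]


reference = "TCGTTTGT"
print(wildcard_hits("TTGT", reference), three_letter_hits("TTGT", reference))  # [0, 4] [0, 4]
print(wildcard_hits("TCGT", reference), three_letter_hits("TCGT", reference))  # [0] [0, 4]
```

The fully converted read TTGT gives multiple hits under both strategies, illustrating how conversion reduces sequence complexity.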
Recommended Workflow for the Primary Analysis of BS-Seq Data
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
http://omictools.com/bisulfite-seq/
Published Tools
Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
http://omictools.com/bisulfite-seq/
B-SOLANA: Bisulphite aligner for processing bisulphite-sequencing color-space data. http://code.google.com/p/bsolana
BatMeth: Base- and color-space data. http://code.google.com/p/batmeth
Bicycle: Lister et al. 2009 workflow. http://sing.ei.uvigo.es/bicycle/howitworks.html
BiQ Analyzer HT: Locus-specific analysis and visualization of high-throughput bisulfite sequencing data. http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de
BiSeq: DMR detection for RRBS data. R/Bioconductor package BiSeq
BISMA: Supports analysis of repetitive sequences. http://biochem.jacobs-university.de/BDPC/BISMA
Bismark: Probably the most widely used three-letter bisulphite aligner; supports both Bowtie (fast, gap-free alignment) and Bowtie 2 (sensitive, gapped alignment). http://www.bioinformatics.babraham.ac.uk/projects/bismark
Bis-SNP: Variant caller for inferring DNA methylation levels and genomic variants from BS-Seq reads that have been aligned by other tools. http://epigenome.usc.edu/publicationdata/bissnp2011
Bisulfighter: Uses LAST for mapping and an HMM for DMR detection. http://epigenome.cbrc.jp/bisulfighter
BRAT: Highly configurable and well-documented three-letter BS-Seq aligner. http://compbio.cs.ucr.edu/brat
BS-Seeker / BS-Seeker 2: Three-letter BS-Seq aligner based on Bowtie. http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html
BSMAP: Probably the most widely used wild-card BS-Seq aligner. http://code.google.com/p/bsmap
BSmooth: Mapping, quality control and DMR analysis pipeline. http://rafalab.jhsph.edu/bsmooth
COHCAP: Integration with gene expression data. https://sourceforge.net/projects/cohcap/
CpG_MPs: Methylation patterns of genomic regions. http://202.97.205.78/CpG_MPs/
DMAP: DMR detection for BS-Seq and RRBS data. http://biochem.otago.ac.nz/research/databases-software/
DSS: Bayesian hierarchical model to detect differentially methylated loci (DML). R/Bioconductor package DSS
Epidiff: DMR detection. http://bioinfo.hrbmu.edu.cn/epidiff
Published Tools (cont.)
GSNAP: Wild-card BS-Seq aligner included in a widely used general-purpose alignment tool. http://share.gene.com/gmap
GBSA: Analysis pipeline with a gene-centric or gene-independent focus. http://ctrad-csi.nus.edu.sg/gbsa
FadE: Mapping for base and color space. http://code.google.com/p/fade
Kismeth: Designed for use with plants. http://katahdin.mssm.edu/kismeth
LAST: Recent and well-validated wild-card BS aligner included in a general-purpose alignment tool. http://last.cbrc.jp
MethPipe: Pipeline for mapping, BS conversion rate, HMR and DMR analysis. http://smithlabresearch.org/software/methpipe
Methyl-MAPS / Methyl-Analyzer: Base- and color-space data plus post-analysis. http://epigenomicspub.columbia.edu/methylanalyzer_data.html
MethylCoder: Three-letter BS-Seq aligner that can be used with either Bowtie (high speed) or GSNAP (high sensitivity). https://github.com/brentp/methylcode
MethylExtract: Detects variation. http://bioinfo2.ugr.es/MethylExtract
MethylSig: R package pipeline for BS-Seq and RRBS. http://sartorlab.ccmb.med.umich.edu/software
MOABS: DMR detection. http://code.google.com/p/moabs
Pash: Wild-card BS aligner included in a general-purpose alignment tool. http://brl.bcm.tmc.edu/pash
RMAP / RMAPBS: Wild-card BS aligner included in a general-purpose alignment tool. http://www.cmb.usc.edu/people/andrewds/rmap and http://smithlabresearch.org/software/methpipe
RRBSMAP: Variant of BSMAP specialized for reduced-representation bisulphite sequencing (RRBS) data. http://rrbsmap.computational-epigenetics.org
SAAP-RRBS: RRBS mapping. http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm
segemehl: Wild-card bisulphite aligner included in a general-purpose alignment tool. http://www.bioinf.uni-leipzig.de/Software/segemehl
SOCS-B: Rabin-Karp hashing; color-space data. http://solidsoftwaretools.com/gf/project/socs
How to Select a BS-Seq Analysis Tool?
• Actively updated
• Good support from authors or communities: BS-Seeker 2, Bismark
• Post-analysis tools: MethPipe
• Mapping performance: Bismark (balanced speed and genome coverage), BSMAP (low genome coverage), Pash (high genome coverage, slow)
Kunde-Ramamoorthy, G. et al. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res 42, e43 (2014).
MethPipe
(Figure: MethPipe workflow, listing the program used at each step)
Quality trimming: fastq_masker
Mapping: rmapbs, rmapbs-pe
Removing duplicate reads: duplicate-remover
Bisulfite conversion rate: bsrate
Methylation calling: methcounts
Hypo/hyper-methylated regions: hmr, hmr_plant, pmr
Large hypo/hyper-methylated domains: pmd
Differentially methylated regions: dmr
Allele-specific methylated regions: amrfinder, allelicmeth
Calculating methylation ratios for genomic regions: roimethstat, bigWigAverageOverBed, bwtool
Generating methylation BED files: bedtools, bedGraphToBigWig
Cross-species comparison of methylomes: liftOver
fastx toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
MethPipe: http://smithlabresearch.org/software/methpipe/
Bedtools: https://github.com/arq5x/bedtools2
Programs from UCSC Genome Browser: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64
bwtool: https://github.com/CRG-Barcelona/bwtool/wiki
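Genome Technology and Data Analysis Handbook (基因體技術與資料分析手冊), published by the VYM Genome Research Center, National Yang-Ming University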
Questions?