High Throughput Sequencing Technologies: What We Can Know
Brian Krueger, PhD
Duke University
Center for Human Genome Variation
2nd Generation Sequencing Overview
• Genomic DNA
• Fragmented DNA
• Ligate Adaptors
• Bind Library and create clusters
• Sequencing Cycle
  – Add Bases
  – Wash
  – Image
  – Cleave
  – Wash
• Repeat hundreds of times on billions of clusters
• Align reads to a reference genome
2nd Generation Sequencing Advances
• V3 System Chemistry
  – 300GB per Flowcell
  – 11 Days to Data
  – Genome: $4700, Exome: $790
• V4 System Chemistry
  – 600GB per Flowcell
  – 6 Days to Data
  – Genome: $3000, Exome: $640
• X System Chemistry
  – 1TB per Patterned Flowcell
  – 3 Days to Data
  – Genome: $1500, Exome: $500
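One rough way to read the per-flowcell yields is to ask how many human genomes fit on a single flowcell. The sketch below does that arithmetic for the V3 and V4 figures. It assumes that "GB" means gigabases, that a human genome is about 3.1 gigabases, and that 30x average coverage is the target. None of these three assumptions is stated on the slides, and the function name is hypothetical.

```python
def genomes_per_flowcell(yield_gb: float,
                         genome_size_gb: float = 3.1,   # assumed human genome size in gigabases
                         coverage: float = 30.0) -> float:  # assumed target average coverage
    """Estimate how many genomes one flowcell can sequence at the given coverage."""
    bases_needed_per_genome = genome_size_gb * coverage
    return yield_gb / bases_needed_per_genome


# Flowcell yields taken from the slide (interpreted as gigabases).
for chemistry, yield_gb in [("V3", 300), ("V4", 600)]:
    estimate = genomes_per_flowcell(yield_gb)
    print(f"{chemistry}: about {estimate:.1f} genomes per flowcell at 30x")
```

The same function can be applied to any other flowcell yield by passing a different `yield_gb`. Real runs lose some yield to duplicates and low-quality reads, so these numbers are upper bounds.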
Techniques for Acquiring Data
• Whole Genome Sequencing
  – Obtain whole blood or tissue sample
  – Create sequencing libraries of all DNA fragments
• Whole Exome Sequencing
  – Utilizes a selection protocol to fish out ONLY coding DNA sequences
  – Create sequencing libraries from enriched DNA
  – Reduces cost and analysis time
• Custom Capture
  – Same protocol as Exome sequencing
  – Only target desired DNA sequences
• Amplicon Sequencing
  – Use PCR to amplify target DNA
  – Sequence amplified DNA (Amplicon)
• RNA-Seq
  – Extract RNA, capture mRNA, convert to cDNA
  – Used for differential gene expression analyses, RNA isoform detection
Common DNA Mutations
Sequence variants (reference: ATCGGGTCATGTCA)
• Single nucleotide variant: ATCGGGTCATATCA
• Small insertion: ATCGGGTCATGACGTCA
• Small deletion: ATCGGGTCAT
Structural variants (reference chromosome: A B C D)
• Deletion: A C D
• Translocation: A B G E
• Duplication: A B C C D
• Inversion: A B D C
Credit: Elizabeth Ruzzo, PhD, CHGV
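The three sequence-variant examples in the figure can be told apart by comparing each one to the reference string. The sketch below is a toy illustration, not how a production variant caller works: it trims the shared prefix and suffix and labels whatever remains. The function name and return format are hypothetical. Where an insertion could be placed at more than one position, the code reports the rightmost one, since it trims the prefix first.

```python
REFERENCE = "ATCGGGTCATGTCA"  # reference sequence from the figure


def classify_small_variant(reference: str, sample: str) -> tuple:
    """Return (kind, position, ref_allele, alt_allele); position is 0-based."""
    shortest = min(len(reference), len(sample))

    # Length of the shared prefix.
    prefix = 0
    while prefix < shortest and reference[prefix] == sample[prefix]:
        prefix += 1

    # Length of the shared suffix, never overlapping the prefix.
    suffix = 0
    while (suffix < shortest - prefix
           and reference[-1 - suffix] == sample[-1 - suffix]):
        suffix += 1

    ref_allele = reference[prefix:len(reference) - suffix]
    alt_allele = sample[prefix:len(sample) - suffix]

    if not ref_allele and not alt_allele:
        kind = "identical"
    elif len(ref_allele) == 1 and len(alt_allele) == 1:
        kind = "single nucleotide variant"
    elif not ref_allele:
        kind = "insertion"
    elif not alt_allele:
        kind = "deletion"
    else:
        kind = "complex"
    return kind, prefix, ref_allele, alt_allele


# The three example sequences from the figure.
for sample in ["ATCGGGTCATATCA", "ATCGGGTCATGACGTCA", "ATCGGGTCAT"]:
    print(sample, classify_small_variant(REFERENCE, sample))
```

Real pipelines work from aligned reads rather than whole-string comparison, and structural variants such as translocations and inversions need different evidence entirely.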
Disadvantages of Current Techniques
• Amplification errors
  – All polymerases have an inherent error rate (10^-6 to 10^-7)
• GC bias
  – PCR bias against GC rich sequences
  – Exome capture bias against GC rich sequences
• Trouble detecting small insertions and deletions
  – Capture baits may not hybridize well
  – Capture cannot be used to reliably detect large CNVs
• Cannot be used for de novo assembly
  – Read length too short to span long repeat regions
  – Not good for detecting trinucleotide repeat expansions
• Miss large structural variations
  – Translocations and inversions likely will be missed
  – Require significant read depth at break points for these variations to be detected
• Trouble with RNA-seq isoform detection
  – Like large structural variations, hard to accurately detect all splice isoforms using short read technology
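GC bias is usually reasoned about in terms of the GC fraction of a region. As a minimal sketch, the code below computes that fraction and flags GC-rich windows along a sequence. The window size, the 0.6 threshold, and the function names are illustrative assumptions, not values from the slides.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq: str, window: int = 10, threshold: float = 0.6) -> list:
    """Return (start, gc_fraction) for non-overlapping windows at or above threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1, window):
        fraction = gc_fraction(seq[start:start + window])
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits


# Toy sequence: a GC-only stretch followed by an AT-only stretch.
print(gc_rich_windows("GCGCGCGCGCATATATATAT"))
```

Regions flagged this way are the ones where, per the list above, PCR and exome capture would be expected to under-represent reads.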
Solutions!
• Solutions for many of these problems exist
  – As always, come at a cost
• Whole Genome Sequencing - $1500
  – Reduce Exome Artifacts
    • Better Indel Detection and higher coverage in high GC regions
    • Can be used to detect large copy number variations
• PCR Free Whole Genome Sequencing
  – Reduces amplification bias and polymerase error artifacts
• WGS will miss large structural variations (Inversions, Translocations, microsatellites)
  – Combine with long read technologies
  – Added cost of $1000-$10,000
  – Higher cost = better detection
Long-ish Read Sequencing Technologies
• Mate-Pair Sequencing
  – Insert size increased from 300bp to 3-8KB
  – Sequence ends of mate-pairs to pair reads over much longer distances
  – Use short reads to fill gaps
  – Adds $1000 to Genome cost
• Illumina Synthetic Long Reads
  – Fragment Genomic DNA to 10KB
  – Dilute across a 384 well plate
  – Fragment clonal 10KB fragments into 300bp fragments and barcode
  – Sequence fragments and use barcodes to re-create the long reads synthetically
  – Use as a short read scaffold to perform de novo sequencing
  – Has been used in HLA sequencing and de novo assembly of the Drosophila genome including accurate mapping of 80% of the transposable elements
  – Adds $1800 to Genome cost
Workflow: 10kb fragmentation → Barcoding and clonal amp → Nextera prep → Sequencing
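The barcode step is what makes the synthetic long reads recoverable: short reads that share a barcode came from the same well, so they can be pooled before any assembly is attempted. The sketch below shows only that grouping step. The read representation as (barcode, sequence) pairs, the barcode labels, and the function name are all hypothetical, and assembling each group into a long read is a separate problem not shown here.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Collect short-read sequences under the barcode they were tagged with."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Toy input: (barcode, short-read sequence) pairs in arbitrary order.
toy_reads = [
    ("WELL_001", "ATCGGGTC"),
    ("WELL_002", "TTGACCA"),
    ("WELL_001", "GGTCATGT"),
]

for barcode, sequences in group_reads_by_barcode(toy_reads).items():
    print(barcode, sequences)
```

Each group would then be assembled independently, which is a much smaller task than assembling the whole genome at once.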
True Long Read Sequencing Technologies
• Defined as single molecule sequencing
• Less complex sample prep and much longer read length (1-100kb) compared to 200-400bp for 2nd Gen
• Two categories
  – Sequencing by synthesis
    • Pioneered by Pacific Biosciences
    • Sequencer uses super microscopes and polymerase bound nanowells to WATCH DNA as it is sequenced in real time
    • Nanowells filled with DNA bases
    • Fluorescence of base only detected at the polymerase
  – Direct sequencing by passing DNA through a nanopore
    • Bases fed through a membrane bound nanopore
    • Ionic difference between both sides of the membrane
    • Detect how ion flow changes at the pore as each base passes through
    • Oxford Nanopore, Base4, Stratos Genomics, Genia
• Bleeding edge technology
  – Many technical hurdles with very high error rates (10-40%)
  – Current best use is to create scaffolds for de novo assembly
  – Very expensive technology
    • Costs 3-10x as much as Illumina to do whole genome sequencing
PacBio
Oxford Nanopore
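To put the quoted costs side by side, the sketch below does the arithmetic using only figures from the slides: a $1500 whole genome, the $1000 mate-pair and $1800 synthetic long read add-ons, and the 3-10x multiplier for true long reads. Treating $1500 as the Illumina baseline for that multiplier is an assumption, and the function names are hypothetical.

```python
BASE_WGS_COST = 1500  # whole genome cost quoted on the slides, in dollars


def cost_with_addon(base_cost: int, addon_cost: int) -> int:
    """Total cost of a short-read genome plus a long-ish read add-on."""
    return base_cost + addon_cost


def long_read_cost_range(base_cost: int, low: int = 3, high: int = 10) -> tuple:
    """Cost range if true long reads cost low-to-high times the baseline."""
    return base_cost * low, base_cost * high


print("WGS + mate-pair:", cost_with_addon(BASE_WGS_COST, 1000))
print("WGS + synthetic long reads:", cost_with_addon(BASE_WGS_COST, 1800))
print("True long read WGS range:", long_read_cost_range(BASE_WGS_COST))
```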
Questions??
• Reading/Viewing Material:
• Sequencing Methods Ecosystem - http://res.illumina.com/documents/products/research_reviews/sequencing-methods-review.pdf
• Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements - http://biorxiv.org/content/early/2014/01/19/001834
• Characterization of the human ESC transcriptome by hybrid sequencing - http://www.pnas.org/content/110/50/E4821.short
• Nanopore Sequencing Web Conference - http://www.youtube.com/watch?v=UtXlr19xTh8