20140711 3 t_clark_ercc2.0_workshop

FIND MEANING IN COMPLEXITY © Copyright 2012 by Pacific Biosciences of California, Inc. All rights reserved.

Tyson Clark 7/11/14

Single Molecule, Real-Time Sequencing of Full-length cDNA Transcripts

Transcript of 20140711 3 t_clark_ercc2.0_workshop

Page 1: 20140711 3 t_clark_ercc2.0_workshop


Page 2: 20140711 3 t_clark_ercc2.0_workshop

Single Molecule, Real-Time (SMRT) DNA Sequencing

PacBio RS II

Page 3: 20140711 3 t_clark_ercc2.0_workshop

P5-C3 Sequencing Chemistry

Page 4: 20140711 3 t_clark_ercc2.0_workshop

Transcript Diversity

Page 5: 20140711 3 t_clark_ercc2.0_workshop

Current State of Transcript Assembly

“The way we do RNA-seq now is… you take the transcriptome, you blow it up into pieces and then you try to figure out how they all go back together again… If you think about it, it’s kind of a crazy way to do things.”

Michael Snyder, Stanford University

Tal Nawy (2013) End-to-end RNA sequencing. Nature Methods 10: 1144–1145.

Ian Korf (2013) Genomics: the state of the art in RNA-seq analysis. Nature Methods 10: 1165–1166.

Page 6: 20140711 3 t_clark_ercc2.0_workshop

PacBio Iso-Seq for High-quality, Full-length Transcripts

[Figure 1. Iso-Seq workflow.
(a) Experimental pipeline: polyA mRNA → cDNA synthesis with adapters → size partitioning & PCR amplification → SMRTbell ligation → PacBio RS II sequencing.
(b) Informatics pipeline: PacBio raw sequence reads (carrying 5’ and 3’ primers) → remove adapters, remove artifacts → clean sequence reads → reads clustering → isoform clusters → consensus calling → nonredundant transcript isoforms → quality filtering → final isoforms → map to reference genome → evidence-based gene models.
Inset: SMRTbell template with SMRT adapters flanking the cDNA insert (5’ UTR, coding sequence, 3’ UTR, polyA tail); reads of insert.]

https://github.com/PacificBiosciences/cDNA_primer/
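The "remove adapters / remove artifacts" step separates reads of insert that cover a whole transcript from partial ones. A minimal sketch, assuming the usual Iso-Seq notion of full-length (5’ primer at one end, polyA tail plus 3’ primer at the other, on either strand). The primer sequences, the `MIN_POLYA` threshold and the function names are hypothetical placeholders, and exact string matching stands in for the error-tolerant primer alignment a real pipeline would need.

```python
from dataclasses import dataclass

# Hypothetical primer sites for illustration only (NOT the real Clontech/PacBio
# primers). THREE_PRIME_SITE is the 3' primer as it appears on the sense strand,
# i.e. immediately downstream of the polyA tail.
FIVE_PRIME_SITE = "GATTACAGGC"
THREE_PRIME_SITE = "CCTGAGTCAT"
MIN_POLYA = 8  # assumed minimum number of trailing A's to call a polyA tail

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class ReadCall:
    full_length: bool
    insert: str  # sense-strand transcript with primers and polyA removed (if FL)


def classify(read: str) -> ReadCall:
    """Call a read of insert full-length if, on either strand, it starts with the
    5' primer site and ends with a polyA tail followed by the 3' primer site.
    Exact matching only; sequencing errors in the primers would cause misses."""
    for seq in (read, revcomp(read)):
        if seq.startswith(FIVE_PRIME_SITE) and seq.endswith(THREE_PRIME_SITE):
            core = seq[len(FIVE_PRIME_SITE):len(seq) - len(THREE_PRIME_SITE)]
            transcript = core.rstrip("A")  # trims the polyA tail (all trailing A's)
            if len(core) - len(transcript) >= MIN_POLYA:
                return ReadCall(True, transcript)
    return ReadCall(False, read)
```

Checking both orientations matters because SMRTbell templates are read from either strand, so a full-length read may arrive as the reverse complement of the transcript.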

Page 7: 20140711 3 t_clark_ercc2.0_workshop

Detailed Clontech workflow for conversion of cDNA into SMRTbell libraries


1. Total RNA, with optional poly-A selection to polyA+ RNA
2. Reverse transcription to full-length 1st-strand cDNA
3. PCR optimization, then large-scale amplification to amplified cDNA
4. Size selection (BluePippin or gel) into 1–2 kb, 2–3 kb and 3–6 kb fractions
5. Re-amplification of each fraction
6. SMRTbell template preparation for each fraction
7. Optional size selection (BluePippin) of the 3–6 kb SMRTbell library
8. SMRT Sequencing

Page 8: 20140711 3 t_clark_ercc2.0_workshop

Brain Amplified cDNA – Testing PCR Enzymes


Enzymes compared: Phusion, Kapa HiFi, SeqAmp

Page 9: 20140711 3 t_clark_ercc2.0_workshop

Brain Amplified cDNA (zoom)


Enzymes compared: Phusion, Kapa HiFi, SeqAmp

Page 10: 20140711 3 t_clark_ercc2.0_workshop

2nd Amplification (after Blue Pippin size selection)


[Gel image, brain: Kapa polymerase re-amplification of BluePippin fractions 1–2, 2–3, 3–6, 5–10, 6–10, 8–12 and 10–15 kb; ladder marks 4000, 2000, 1250, 800, 500 bp.]

Page 11: 20140711 3 t_clark_ercc2.0_workshop

2nd Amplification (after Blue Pippin size selection)


[Gel images: Kapa polymerase re-amplification of BluePippin fractions. Heart: 1–2, 2–3, 3–6, 5–10 and 8–12 kb. Liver: 1–2, 2–3, 3–6 and 5–10 kb. Ladder marks 4000, 2000, 1250, 800, 500 bp.]

Page 12: 20140711 3 t_clark_ercc2.0_workshop

Amplified cDNA from Multiple Human Tissues


Tissues: brain, heart, liver

Page 13: 20140711 3 t_clark_ercc2.0_workshop

SageELF


Page 14: 20140711 3 t_clark_ercc2.0_workshop

Brain Amplified cDNA – Size Selected


[Gel image: SageELF lanes 12 through 1 (M = marker) alongside BluePippin fractions 800–1600, 1600–2700, 2700–4800 and 4800–8000; ladder marks 3000, 1500, 800, 500, 300, 100 bp; Kapa polymerase.]

Page 15: 20140711 3 t_clark_ercc2.0_workshop


SageELF – 12 size bins (Amplified cDNA)

Page 16: 20140711 3 t_clark_ercc2.0_workshop

SageELF – 12 size bins (Amplified cDNA)


Page 17: 20140711 3 t_clark_ercc2.0_workshop

Brain cDNA – ELF Size Selected – 2nd Amplification


Page 18: 20140711 3 t_clark_ercc2.0_workshop

Actual FL Lengths from each ELF Fraction


ELF 12 (400 bp) Actual: 181 – 266 bp

ELF 11 (550 bp) Actual: 370 – 480 bp

ELF 10 (800 bp) Actual: 617 – 727 bp

(25th–75th percentile)
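The "Actual" ranges are the 25th and 75th percentiles of full-length read lengths in each fraction. A minimal sketch of that summary, assuming the full-length read lengths for each fraction are already available as a list of integers (hypothetical input; the talk does not say which percentile interpolation was used, and this uses the Python standard library default).

```python
from statistics import quantiles


def iqr_range(lengths: list[int]) -> tuple[float, float]:
    """25th and 75th percentile of full-length read lengths (bp)."""
    q1, _median, q3 = quantiles(lengths, n=4)  # needs at least two lengths
    return q1, q3


def summarize(fractions: dict[str, list[int]]) -> dict[str, tuple[float, float]]:
    """Map each fraction name (e.g. a hypothetical "ELF12") to its 25th-75th percentile range."""
    return {name: iqr_range(lens) for name, lens in fractions.items()}
```

Reporting the interquartile range rather than min/max keeps a few stray short or chimeric inserts from dominating the apparent size of a fraction.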

Page 19: 20140711 3 t_clark_ercc2.0_workshop

Actual FL Lengths from each ELF Fraction


ELF 9 (1.2 kb) Actual: 955 – 1113 bp

ELF 8 (1.5 kb) Actual: 1355 – 1544 bp

ELF 7 (1.8 kb) Actual: 1800 – 2033 bp

Page 20: 20140711 3 t_clark_ercc2.0_workshop

Actual FL Lengths from each ELF Fraction


ELF 6 (2.5 kb) Actual: 2398 – 2737 bp

ELF 5 (3 kb) Actual: 3193 – 3574 bp

ELF 4 (4 kb) Actual: 2127 – 4664 bp

Page 21: 20140711 3 t_clark_ercc2.0_workshop

Actual FL Lengths from each ELF Fraction


ELF 3 (5.5 kb) Actual: 1342 – 6075 bp

ELF 2 (7 kb) Actual: 1229 – 7446 bp

Page 22: 20140711 3 t_clark_ercc2.0_workshop

Actual FL Lengths from each ELF Fraction

ELF 1 (9 kb), 180 min. Actual: 1295 – 1814 bp

Page 23: 20140711 3 t_clark_ercc2.0_workshop

Summarizing ELF for Size Selection

ELF lane      Actual FL range
ELF12-400bp   181 – 266 bp
ELF11-500bp   370 – 480 bp
ELF10-800bp   617 – 727 bp
ELF9-1.2kb    955 – 1113 bp
ELF8-1.5kb    1355 – 1544 bp
ELF7-1.8kb    1800 – 2033 bp
ELF6-2.5kb    2398 – 2737 bp
ELF5-3kb      3193 – 3574 bp
ELF4-4kb      2127 – 4664 bp
ELF3-5.5kb    1342 – 6075 bp
ELF2-7kb      1229 – 7446 bp
ELF1-9kb      1295 – 1814 bp

The Good:
1. One run, 12 fractions
2. Finer size fractions (~200 bp)
3. 100 bp – 10 kb spread

The Not-Good-Yet:
1. Fractions > 4 kb still contain small inserts that compete with the long ones

To Work On:
1. New beta machine
2. Combining fractions
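The "> 4 kb" problem can be read directly off the summary table. A minimal sketch that flags fractions whose full-length size range does not track the nominal size; the numbers are copied from the table above, while the two thresholds (IQR wider than half the nominal size, or the whole IQR below half the nominal size) are arbitrary illustrative choices, not criteria from the talk.

```python
# Nominal size (bp), 25th and 75th percentile FL length (bp), from the summary table.
ELF_FRACTIONS = {
    "ELF12": (400, 181, 266),
    "ELF11": (500, 370, 480),
    "ELF10": (800, 617, 727),
    "ELF9": (1200, 955, 1113),
    "ELF8": (1500, 1355, 1544),
    "ELF7": (1800, 1800, 2033),
    "ELF6": (2500, 2398, 2737),
    "ELF5": (3000, 3193, 3574),
    "ELF4": (4000, 2127, 4664),
    "ELF3": (5500, 1342, 6075),
    "ELF2": (7000, 1229, 7446),
    "ELF1": (9000, 1295, 1814),
}


def flag_fraction(nominal: int, q1: int, q3: int, max_rel_width: float = 0.5) -> bool:
    """True if the IQR is broader than max_rel_width * nominal (mixed sizes),
    or the entire IQR sits below half the nominal size (dominated by small inserts)."""
    too_broad = (q3 - q1) > max_rel_width * nominal
    undersized = q3 < nominal / 2
    return too_broad or undersized


flagged = [name for name, (nom, q1, q3) in ELF_FRACTIONS.items() if flag_fraction(nom, q1, q3)]
print(flagged)  # ['ELF4', 'ELF3', 'ELF2', 'ELF1']
```

With these thresholds only the four largest fractions are flagged: ELF4 to ELF2 because their ranges are very broad, and ELF1 because its range sits far below the nominal 9 kb.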

Page 24: 20140711 3 t_clark_ercc2.0_workshop

Targeted Sequencing


Page 25: 20140711 3 t_clark_ercc2.0_workshop

Targeted Sequencing


Page 26: 20140711 3 t_clark_ercc2.0_workshop

Targeted Sequencing


Page 27: 20140711 3 t_clark_ercc2.0_workshop

ERCC 2.0 Controls (from the PacBio perspective)

• Long Transcripts (>10 kb, if possible)

• Transcript Isoforms that span size bins

• Complex alternative splicing patterns

• Diversity of GC contents
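These wishes translate into simple screening checks for a candidate control. A minimal sketch, assuming a candidate is given as a hypothetical mapping of isoform name to sequence and reusing the 1–2, 2–3 and 3–6 kb BluePippin bins from the workflow earlier in the talk (bin edges treated as half-open, an assumption). The function names are illustrative and not part of any ERCC or PacBio tool.

```python
from typing import Optional

# Size bins (bp) from the BluePippin workflow in this talk; half-open [low, high).
SIZE_BINS = [(1_000, 2_000), (2_000, 3_000), (3_000, 6_000)]
LONG_THRESHOLD = 10_000  # "Long Transcripts (>10 kb, if possible)"


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def size_bin(length: int) -> Optional[int]:
    """Index of the size bin containing length, or None if outside all bins."""
    for i, (low, high) in enumerate(SIZE_BINS):
        if low <= length < high:
            return i
    return None


def describe_control(isoforms: dict[str, str]) -> dict:
    """Summarize one candidate control: long?, isoforms span >1 bin?, GC per isoform."""
    lengths = [len(s) for s in isoforms.values()]
    bins = {size_bin(n) for n in lengths} - {None}
    return {
        "long": max(lengths) > LONG_THRESHOLD,
        "spans_bins": len(bins) > 1,
        "gc": {name: round(gc_fraction(s), 3) for name, s in isoforms.items()},
    }
```

For example, a gene with a 1.5 kb and a 2.5 kb isoform would report `spans_bins` as True, which is the case that tests whether a size-partitioned library recovers both isoforms.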


Page 28: 20140711 3 t_clark_ercc2.0_workshop

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.