Taxonomic identification and phylogenetic profiling
(tandy.cs.illinois.edu/nam_michigan.pdf)
Nam-phuong Nguyen, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign
Joint work with Siavash Mirarab, Mihai Pop, and Tandy Warnow
Metagenomics
Courtesy of Human Microbiome Project
• Culture-independent method for studying a microbiome
• Extract genetic material directly from the environment
• Applications to biofuel production, agriculture, human health
• Sequencing technology produces millions of short reads from unknown species
• Fundamental steps in the analysis are identifying the taxon of each read and estimating the population profile of a sample
Taxonomic Identification and Profiling
• Taxonomic identification
  • Objective: Given a query sequence, identify the taxon (species, genus, family, etc.) of the sequence
  • Classification problem
• Taxonomic profiling
  • Objective: Given a set of query sequences collected from a sample, estimate the population profile of the sample
  • Estimation problem
  • Can be solved via taxonomic identification
Taxonomic Identification Methods
• Sequence similarity search
  • Classifies by finding most similar sequence
  • Classifies fragments from any region of genome
  • BLAST
• Composition-based methods
  • Typically use k-mers
  • Classify fragments from any region of genome
  • PhymmBL, NBC
• Phylogeny-based methods
  • Classify fragments by using a phylogeny
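To make the k-mer idea behind composition-based methods concrete, here is a minimal sketch of a k-mer frequency profile. It is illustrative only and does not reproduce how PhymmBL or NBC work internally; the function name `kmer_profile`, the choice of k = 2, and the toy read are assumptions made for this example.

```python
from collections import Counter


def kmer_profile(sequence: str, k: int = 2) -> dict[str, float]:
    """Return the relative frequency of every overlapping k-mer in a sequence."""
    # Slide a window of width k across the sequence, one position at a time.
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    total = sum(counts.values())
    # A sequence shorter than k has no k-mers, so the profile is empty.
    return {kmer: n / total for kmer, n in counts.items()}


# Toy read (hypothetical data, not taken from the talk).
profile = kmer_profile("TAGATAGA", k=2)
print(profile)
```

A composition-based classifier would compare such a profile against profiles built from reference genomes; that comparison step is left out here.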
Phylogeny-based taxonomic identification
Fragmentary reads (60-200 bp long):
• ACCG • CGAG • CGG • GGCT • TAGA • GGGGG • TCGAG • GGCG • GGG • . • . • . • ACCT
Known full-length gene sequences (500-10,000 bp long), and an alignment and a tree:
ACT..TAGA..A (species5)
AGC...ACA (species4)
TAGA...CTT (species3)
TAGC...CCA (species2)
AGG...GCAT (species1)
Phylogenetic Placement
• Input: (Backbone) alignment and tree on full-length sequences and a query sequence (short read)
• Output: Placement of the query sequence on the backbone tree
• Use placement to infer relationship between query sequence and full-length sequences in backbone tree
• Applications in metagenomic analysis
  • Millions of reads
  • Reads from different genomes mixed together
  • Use placement to identify read
Align Sequence
(Figure: backbone tree with leaves S1, S4, S2, S3.)
S1 = -AGGCTATCACCTGACCTCCA-AA
S2 = TAG-CTATCAC--GACCGC--GCA
S3 = TAG-CT-------GACCGC--GCT
S4 = TAC----TCAC--GACCGACAGCT
Q1 = TAAAAC
After aligning Q1 to the backbone alignment:
S1 = -AGGCTATCACCTGACCTCCA-AA
S2 = TAG-CTATCAC--GACCGC--GCA
S3 = TAG-CT-------GACCGC--GCT
S4 = TAC----TCAC--GACCGACAGCT
Q1 = -------T-A--AAAC--------
Place Sequence
(Figure: the same backbone tree with Q1 added next to S3.)
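The Align Sequence step can be checked mechanically: the aligned query must have the same width as the backbone alignment, and removing its gaps must give back the original read. The sketch below does only that bookkeeping on the sequences from the slides. It is not an aligner, and the function name `extend_alignment` is an assumption made for this example.

```python
# Backbone alignment and query, copied from the slides.
backbone = {
    "S1": "-AGGCTATCACCTGACCTCCA-AA",
    "S2": "TAG-CTATCAC--GACCGC--GCA",
    "S3": "TAG-CT-------GACCGC--GCT",
    "S4": "TAC----TCAC--GACCGACAGCT",
}
query_read = "TAAAAC"
aligned_query = "-------T-A--AAAC--------"


def extend_alignment(backbone, name, aligned_row, raw_read):
    """Add an aligned query row to a copy of the backbone after sanity checks."""
    width = len(next(iter(backbone.values())))
    if any(len(row) != width for row in backbone.values()):
        raise ValueError("backbone rows differ in length")
    if len(aligned_row) != width:
        raise ValueError("aligned query does not match the backbone width")
    # Stripping gaps from the aligned row must recover the unaligned read.
    if aligned_row.replace("-", "") != raw_read:
        raise ValueError("aligned query does not spell the original read")
    extended = dict(backbone)
    extended[name] = aligned_row
    return extended


extended = extend_alignment(backbone, "Q1", aligned_query, query_read)
print(len(extended), "rows of width", len(extended["Q1"]))
```

The resulting extended alignment is what a placement tool would take as input together with the backbone tree.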
Phylogenetic Placement
• Align each query sequence to backbone alignment:
  • HMMALIGN (Eddy, Bioinformatics 1998)
  • PaPaRa (Berger and Stamatakis, Bioinformatics 2011)
• Place each query sequence into backbone tree, using extended alignment:
  • pplacer (Matsen et al., BMC Bioinformatics 2010)
  • EPA (Berger et al., Systematic Biology 2011)
HMMER and PaPaRa results
(Figure: results under an increasing rate of evolution. Backbone size: 500; 5000 fragments; 20 replicates.)
Old approach using single HMM
(Figure labels: HMM 1; large evolutionary diameter.)
New approach
(Figure labels: smaller evolutionary diameter; HMM 1, HMM 2, HMM 3, HMM 4.)
SEPP (10% rule) Simulated Results
(Figure: results under an increasing rate of evolution. Backbone size: 500; 5000 fragments; 20 replicates.)
Using SEPP
(Figure: fragmentary unknown reads (60-200 bp long) and known full-length sequences (500-10,000 bp long) with an alignment and a tree. Label: ML placement 40%.)
Taxonomic Identification using Phylogenetic Placement: Adding Uncertainty
(Figure: same setup. Labels: ML placement 40%; 2nd highest likelihood placement 38%.)
TIPP
Nguyen et al. Bioinformatics 2014
TIPP for Taxonomic Profiling
• Marker-based abundance profiler
  • Uses a collection of single-copy housekeeping genes
  • Only fragments binned to marker genes are classified
• Profiling algorithm
  • Bin fragments to marker genes
  • Classify fragments binned to each marker
  • Pool all classified reads
  • Estimate abundance profile on pooled reads
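The four steps of the profiling algorithm can be sketched as follows. This is a toy outline under stated assumptions, not TIPP's implementation: the binning and classification results are supplied as hypothetical lookup tables (`read_to_marker`, `read_to_taxon`) instead of being computed, and the read, marker, and taxon names are invented for illustration.

```python
from collections import Counter

# Hypothetical inputs: the marker gene each read was binned to (None = no marker),
# and the taxon a per-marker classifier assigned to each binned read.
read_to_marker = {"r1": "markerA", "r2": "markerA", "r3": None,
                  "r4": "markerB", "r5": "markerB"}
read_to_taxon = {"r1": "taxonX", "r2": "taxonY", "r4": "taxonX", "r5": "taxonX"}


def estimate_profile(read_to_marker, read_to_taxon):
    """Bin reads to markers, classify per marker, pool, and estimate abundances."""
    # Step 1: bin fragments to marker genes; unbinned reads are dropped.
    bins = {}
    for read, marker in read_to_marker.items():
        if marker is not None:
            bins.setdefault(marker, []).append(read)
    # Steps 2 and 3: classify the reads in each bin (a table lookup stands in
    # for the classifier here) and pool the classified reads across markers.
    pooled = []
    for reads in bins.values():
        pooled.extend(read_to_taxon[r] for r in reads if r in read_to_taxon)
    # Step 4: the abundance profile is the relative frequency of each taxon.
    counts = Counter(pooled)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


profile = estimate_profile(read_to_marker, read_to_taxon)
print(profile)
```

Read `r3` does not map to any marker gene, so it contributes nothing to the profile, which mirrors the point that only fragments binned to marker genes are classified.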
5/14/14
Taxonomic Profiling Experimental Design
• Datasets
  • Easy conditions (low error rates, known genomes)
  • Hard conditions (novel genomes, high error rates)
• Methods
  • Marker-based: TIPP, Metaphyler, mOTU, MetaPhlAn
  • Genome-based: NBC, PhymmBL
• Measured distance to true profile as error metric
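The slide does not say which distance was used. As one possible illustration, the sketch below computes the Hellinger distance between an estimated and a true abundance profile; the choice of metric, the function name `hellinger_distance`, and the toy profiles are all assumptions made for this example.

```python
import math


def hellinger_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Hellinger distance between two abundance profiles (0 = identical, 1 = disjoint)."""
    # Taxa missing from one profile count as abundance 0 in that profile.
    taxa = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2 for t in taxa
    )
    return math.sqrt(squared / 2.0)


# Toy profiles (hypothetical values, not results from the talk).
true_profile = {"taxonX": 0.6, "taxonY": 0.4}
estimated_profile = {"taxonX": 0.5, "taxonY": 0.4, "taxonZ": 0.1}
print(hellinger_distance(estimated_profile, true_profile))
```

A smaller value means the estimated profile is closer to the true one.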
“Easy” genome datasets
“Hard” genome datasets
High indel datasets containing known genomes
Note: NBC, MetaPhlAn, and Metaphyler cannot classify any sequences from at least one of the high indel long sequence datasets. mOTU terminates with an error message on all the high indel datasets.
“Novel” genome datasets
Note: mOTU terminates with an error message on the long fragment datasets and high indel datasets.
Summary
• TIPP: marker-based taxonomic identification and classification method through phylogenetic placement
• Very robust to sequencing errors and novel genomes
• Results in overall more accurate profiles
• Accurate profiles can be obtained by classifying reads from the marker genes
Acknowledgements
Siavash Mirarab, Tandy Warnow, Mihai Pop, Bo Liu
Supported by NSF DEB 0733029, University of Alberta
SEPP/TIPP/UPP
SEPP/UPP/TIPP site: https://github.com/smirarab/sepp/
Instructions for installing UPP: https://github.com/smirarab/sepp/blob/master/tutorial/upp-tutorial.md
Instructions for installing TIPP: https://github.com/smirarab/sepp/blob/master/tutorial/tipp-tutorial.md
References:
1) N. Nguyen, S. Mirarab, K. Kumar, and T. Warnow. Ultra-large alignments using phylogeny-aware profiles. Proceedings of Research in Computational Molecular Biology (RECOMB) 2015 and to appear in Genome Biology 2015.
2) N. Nguyen, S. Mirarab, B. Liu, M. Pop, and T. Warnow. TIPP: Taxonomic Identification and Phylogenetic Profiling. Bioinformatics, 2014, 30 (24): 3548-3555.
3) S. Mirarab, N. Nguyen, and T. Warnow. SEPP: SATe-Enabled Phylogenetic Placement. Pacific Symposium on Biocomputing, 2012.
Place Sequence
(Figure: backbone tree with leaves S1, S4, S2, S3 and the query sequences Q1, Q2, Q3 added to it.)
Query sequences are aligned and placed independently
16S Identification
(Figure: a sample with four A and two B; marker: 16S gene.)
True Abundance A: 67% B: 33%
Estimated Abundance A: 50% B: 50%
Single copy gene
(Figure: a sample with four A and two B; marker: single copy gene.)
True Abundance A: 67% B: 33%
Estimated Abundance A: 67% B: 33%
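The arithmetic behind the two slides can be reproduced with a small sketch. It assumes that reads are sampled in proportion to the number of marker-gene copies present, and that B carries two 16S copies per cell while A carries one. Those copy numbers are hypothetical, chosen only because they reproduce the 50%/50% estimate on the slide; the function name `marker_based_abundance` is also an assumption.

```python
def marker_based_abundance(cell_counts, copies_per_cell):
    """Abundance estimate when reads are proportional to marker-gene copies."""
    marker_copies = {t: cell_counts[t] * copies_per_cell[t] for t in cell_counts}
    total = sum(marker_copies.values())
    return {t: n / total for t, n in marker_copies.items()}


# Four cells of A and two of B, as drawn on the slides.
cells = {"A": 4, "B": 2}

# Hypothetical 16S copy numbers: one copy in A, two copies in B.
estimate_16s = marker_based_abundance(cells, {"A": 1, "B": 2})

# A single-copy gene has exactly one copy per cell in both taxa.
estimate_single_copy = marker_based_abundance(cells, {"A": 1, "B": 1})

print(estimate_16s)
print(estimate_single_copy)
```

With a single-copy marker the estimate equals the true cell proportions, which is the motivation for profiling on single-copy genes.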
TIPP: Taxonomic identification and Phylogenetic Profiling
• Developers: Nguyen, Mirarab, Pop, and Warnow
• SEPP takes the best extended alignment and finds the ML placement.
• Modify SEPP to use uncertainty:
  • Take as many alignments as necessary to reach the alignment support threshold
  • Classify the query sequence at the node whose placement support reaches the placement support threshold
• Nguyen et al. Bioinformatics 2014
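One way to read the placement support rule is sketched below: sum the support of all placements that fall under each taxon, and report the most specific taxon whose summed support reaches the threshold. This covers only the placement side, not the alignment support threshold, and it is an interpretation and not TIPP's code. The 40% and 38% values echo the earlier slides; the third placement, the lineage names, the threshold values, and the function name `classify_with_support` are assumptions.

```python
from collections import defaultdict


def classify_with_support(placements, threshold):
    """Return (lineage, support) for the deepest taxon reaching the threshold.

    placements: list of (lineage, support) pairs, where lineage is a tuple of
    taxon names from the highest rank down to the lowest.
    Returns None if no taxon reaches the threshold.
    """
    if not placements:
        return None
    depth = max(len(lineage) for lineage, _ in placements)
    # Walk from the most specific rank up to the most general one.
    for level in range(depth, 0, -1):
        support = defaultdict(float)
        for lineage, weight in placements:
            if len(lineage) >= level:
                support[lineage[:level]] += weight
        best, best_support = max(support.items(), key=lambda item: item[1])
        if best_support >= threshold:
            return best, best_support
    return None


placements = [
    (("genusG", "species1"), 0.40),  # ML placement
    (("genusG", "species2"), 0.38),  # 2nd highest likelihood placement
    (("genusH", "species3"), 0.22),  # hypothetical remainder
]
result = classify_with_support(placements, threshold=0.75)
print(result)
```

No single species reaches the 0.75 threshold here, but the two best placements share a genus, so the read is classified at the genus level instead of being forced to a species.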