Reading a metagenome is finding your way through a forest
Bachar Cheaib, Derome Laboratory, 7 April 2015
Is reading a metagenome finding your way through a forest?
From the comfortable sky (bio) to the uncomfortable forest (bioinfo)
NGS
Making complexity simple and readable
Metagenomics: a method for accessing genetic resources
Cell culture
Sampling
Adapted microfiltration
Total DNA extraction
Fragmentation
Library preparation, amplification
Massive sequencing
Short fragments
From sequencing to bioinformatics
Metagenome reads
Preprocessing (Trimming, Quality Control, Decontamination)
Assembly
Annotation
Taxonomic abundance
Function abundance
Polymorphism
Metabolic abundance
Metagenome read features
Data exploration before filtering (e.g. SGA preqc):
• Per-base error rates
• Sequence coverage
• Repeat content
• Metagenome size
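One of the read features listed above, the per-base error rate, can be approximated directly from FASTQ quality strings. The sketch below is only an illustration: it assumes unwrapped 4-line FASTQ records and Phred+33 encoding, and it is not how SGA preqc derives its estimates. The function name and the toy reads are hypothetical.

```python
from collections import defaultdict

def per_base_error_rates(fastq_lines, phred_offset=33):
    """Return the mean expected error probability at each read position."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, char in enumerate(line.strip()):
            q = ord(char) - phred_offset      # Phred score Q
            sums[pos] += 10 ** (-q / 10)      # P(error) = 10^(-Q/10)
            counts[pos] += 1
    return [sums[p] / counts[p] for p in sorted(counts)]

# Toy input (hypothetical reads): 'I' = Q40, '5' = Q20, '+' = Q10
example = [
    "@read1", "ACGT", "+", "II5+",
    "@read2", "ACG", "+", "I5+",
]
rates = per_base_error_rates(example)
```

A rising curve toward the 3' end of the reads is the usual motivation for the trimming step that follows.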
Metagenome reads (NGS technologies)
Preprocessing (Trimming, Quality Control, Decontamination)
• Fastx-toolkit for short-read FASTA/FASTQ (Hannon Lab)
• Trimmomatic (Bolger et al., 2014, Bioinformatics)
• Sickle (Joshi NA and Fass JN, 2011)
• ERNE-filter, from ERNE (Extended Randomized Numerical alignEr) (Del Fabbro et al., 2013)
• DeconSeq (DECONtamination of SEQuence data) (Schmieder and Edwards, 2011)
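Several of the trimming tools named above rely on a sliding-window quality criterion. The following is a simplified sketch of that idea, assuming Phred+33 qualities; it does not reproduce the exact behavior of Trimmomatic or Sickle, and the function name and default thresholds are hypothetical.

```python
def sliding_window_trim(seq, qual, window=4, min_mean_q=20, phred_offset=33):
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean_q. Reads shorter than the window are returned as-is."""
    scores = [ord(c) - phred_offset for c in qual]
    for start in range(0, len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual

# Toy read: six Q40 bases ('I') followed by four Q10 bases ('+')
trimmed = sliding_window_trim("ACGTACGTAC", "IIIIII++++")
untouched = sliding_window_trim("ACGT", "IIII")
```

Here the window starting at position 5 is the first with a mean below 20, so the read is cut to its first five bases.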
Assembly
Ray Meta, SOAP, SGA, Fermi, MetaVelvet, Newbler
Annotation
Unknown
Known
Recovering the known and predicting the unknown
Databases: generalist and specialist, each either curated/verified or not verified/curated
Generalist: GenBank, EMBL, …; SwissProt, …
Specialist: SEED, model organism databases (FlyBase), local databases, …; Pfam, ProDom, …
Sequence/function/structure
Functional classification
• The gene/protein family approach, or Clusters of Orthologous Groups (COGs) (similarity-based clustering algorithms)
• The subsystem approach to genome annotation (Overbeek et al., 2005)
• Bio-ontologies (rarely used in metagenomics)
http://portal.nersc.gov/project/m1317/FOAM/
Recovering the known and predicting the unknown in the metagenome
• Importance of graph algorithms
• Assembly of new genomes
• Discovery of new organisms
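To make the role of graph algorithms in assembly concrete, here is a minimal de Bruijn graph sketch. It is a teaching toy under strong assumptions (error-free reads, a single strand, no repeats, forward branching only) and not the algorithm of any particular assembler; the function names and reads are hypothetical.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles, which repeats would create
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

# Three overlapping toy reads drawn from the sequence ATGGCGTGCA
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
```

In a real metagenome the graph branches wherever strains or repeats diverge, which is where the choice of assembler and of k starts to matter.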
Function abundance
FindORFs
• Ab initio/de novo (based on gene content)
• Similarity methods
• Combined
• Comparative
Annotation against databases
• Local and global similarity search
• Exact matching, etc.
Functional annotation and classification
ORF-finding tools: GLIMMER, METAGENE-ANNOTATOR, GeneMark, FragGeneScan, …
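As a point of comparison for the ORF finders listed above, the sketch below scans the six reading frames for plain ATG-to-stop open reading frames. This is a deliberately naive assumption: the tools named in the slides use statistical gene models and can handle partial genes on short fragments, which this code ignores. The function names are hypothetical, and coordinates are reported on the scanned strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_len=30):
    """Yield (strand, frame, start, end) for ATG...stop ORFs of at least min_len nt."""
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open an ORF at the first ATG
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:   # length includes the stop codon
                        yield strand, frame, start, i + 3
                    start = None

# Toy sequence: CC + ATG AAA TTT GGG TAA + CC
example = "CCATGAAATTTGGGTAACC"
orfs = list(find_orfs(example, min_len=12))
```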
Functional annotation and classification
ORFs (FragGeneScan), BLAT/BLAST against the SEED database (Overbeek et al., 2005; Overbeek et al., 2014)
SEED: 4 hierarchical levels; Level 1 has 27 categories
RAST: Rapid Annotations using Subsystems Technology
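Once each ORF has a best hit in a hierarchical classification such as the four SEED levels, function abundance is a roll-up of hit counts to the chosen level. The sketch below assumes a lookup table from function to its four-level path; the table, the category names (`L1_A`, `func_1`, ...) and the function name are all hypothetical placeholders, not real SEED entries.

```python
from collections import Counter

def abundance_by_level(hits, hierarchy, level):
    """Count annotated ORFs per category at the given hierarchy level (1 to 4)."""
    counts = Counter()
    for function in hits:
        path = hierarchy.get(function)
        if path is None:
            counts["Unclassified"] += 1   # hit with no entry in the hierarchy
        else:
            counts[path[level - 1]] += 1
    return counts

# Hypothetical hierarchy: function -> (level 1, level 2, level 3, level 4)
hierarchy = {
    "func_1": ("L1_A", "L2_A1", "L3_A1a", "func_1"),
    "func_2": ("L1_A", "L2_A2", "L3_A2a", "func_2"),
    "func_3": ("L1_B", "L2_B1", "L3_B1a", "func_3"),
}
hits = ["func_1", "func_1", "func_2", "func_3", "func_9"]  # one best hit per ORF
level1 = abundance_by_level(hits, hierarchy, level=1)
level2 = abundance_by_level(hits, hierarchy, level=2)
```

Keeping an explicit "Unclassified" bin makes the share of unknown functions visible instead of silently dropping it.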
Taxonomic abundance
Taxonomic assignment (binning)
Amplicons of markers from metagenomes: gene markers (16S, housekeeping genes, specific biomarkers)
Whole metagenome content: all genes
• LCA (Lowest Common Ancestor)
• Best hit
• Representative hit
Taxon assignment (Bazinet and Cummings, BMC Bioinformatics, 2012): similarity-based, phylogenetic-based, composition-based
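The LCA rule mentioned above assigns a read to the deepest taxon shared by all of its significant hits. A minimal sketch, assuming each hit is already resolved to a root-to-leaf lineage; how hits are selected (score thresholds and so on) is left out, and the function name is hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages (root-to-leaf lists),
    or None if they share nothing."""
    lca = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:   # lineages diverge at this rank
            break
        lca = ranks[0]
    return lca

escherichia = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Enterobacterales", "Enterobacteriaceae", "Escherichia"]
salmonella = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
              "Enterobacterales", "Enterobacteriaceae", "Salmonella"]
vibrio = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
          "Vibrionales", "Vibrionaceae", "Vibrio"]

close_hits = lowest_common_ancestor([escherichia, salmonella])
mixed_hits = lowest_common_ancestor([escherichia, salmonella, vibrio])
```

Compared with a best-hit rule, LCA trades resolution for caution: ambiguous reads end up at a higher rank rather than at a possibly wrong genus.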
Taxonomic assignment (binning)
OTU profiling based on 16S gene markers
• OTU (Operational Taxonomic Unit) clustering: Mothur (Uclust), QIIME (Usearch)
• Alpha-diversity estimation
• For amplicons only, not metagenomes
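Alpha diversity is typically summarized from the per-sample OTU count vector. The sketch below computes two common summaries, observed richness and the Shannon index H' = -Σ p_i ln p_i; it is a plain illustration with hypothetical function names, not the mothur or QIIME implementation.

```python
import math

def observed_richness(otu_counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in otu_counts if c > 0)

def shannon_index(otu_counts):
    """Shannon diversity H' using natural logarithms."""
    total = sum(otu_counts)
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)

even_sample = [10, 10, 10, 10]     # four equally abundant OTUs
dominated_sample = [40, 0, 0, 0]   # a single OTU takes every read
h_even = shannon_index(even_sample)
h_dominated = shannon_index(dominated_sample)
```

For a perfectly even community of S OTUs the index equals ln(S), which gives a quick sanity check on real data.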
OTUs based on gene markers (amplicons of markers from metagenomes): 16S, housekeeping genes, specific biomarkers
Biodiversity sampling? Rarefaction curves
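A rarefaction curve answers the sampling question above by asking how many OTUs would have been seen at smaller sequencing depths. The sketch below estimates it by repeated random subsampling without replacement; the function name, iteration count and toy counts are hypothetical, and each depth must not exceed the total number of reads.

```python
import random

def rarefaction_curve(otu_counts, depths, n_iter=10, seed=0):
    """Mean number of OTUs observed when subsampling `depth` reads without replacement."""
    rng = random.Random(seed)
    # Expand counts into one OTU label per read
    pool = [otu for otu, count in enumerate(otu_counts) for _ in range(count)]
    curve = []
    for depth in depths:
        richness = [len(set(rng.sample(pool, depth))) for _ in range(n_iter)]
        curve.append(sum(richness) / n_iter)
    return curve

otu_counts = [50, 30, 15, 4, 1]   # toy sample: 100 reads, 5 OTUs
depths = [1, 10, 100]
curve = rarefaction_curve(otu_counts, depths)
```

A curve that is still climbing at the full depth suggests that additional sequencing would reveal more OTUs.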
OTU profiling based on housekeeping genes
MetaPhlAn (Segata et al., 2012)
Mapping/similarity search: BOWTIE/BLAST
Polymorphism
Environmental biomarkers annotated from contigs
Mapping reads to contigs
Calling variants (SNPs)
SAMtools, VCFtools, FreeBayes, Picard, GATK, Platypus
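The mapping-then-calling workflow above can be illustrated with a naive pileup. This sketch assumes ungapped alignments given as (start, sequence) pairs and uses fixed depth and allele-fraction thresholds; the callers listed above rely on probabilistic models and base/mapping qualities, so treat this only as a picture of the idea. All names and thresholds are hypothetical.

```python
from collections import Counter

def call_snps(reference, aligned_reads, min_depth=4, min_alt_fraction=0.2):
    """Return (position, ref_base, alt_base, alt_fraction) for each candidate SNP."""
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:          # too few reads to say anything
            continue
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_alt_fraction:
                snps.append((pos, reference[pos], base, n / depth))
    return snps

reference = "ACGTACGT"                 # toy contig
aligned_reads = [
    (0, "ACGTACGT"), (0, "ACGAACGT"), (0, "ACGAACGT"),
    (0, "ACGTACGT"), (2, "GTACGT"),
]
snps = call_snps(reference, aligned_reads)
```

In a metagenome the alternative-allele fraction reflects the mix of strains in the population rather than a diploid genotype, which is why a frequency threshold is used here.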
Next club discussion?
Evaluation of mapping-based tools (Ruffalo et al., 2011)
SEAL evaluator: Seal is available as open source at http://compbio.case.edu/seal/
Indexing runtime versus alignment runtime
Metabolic abundance
Functional metagenome annotations
Mapping metabolic pathways: KEGG MAP TOOL
Abundance of pathways and enzymes
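Pathway abundance can be derived from enzyme abundance once enzymes are mapped to pathways. The sketch below assumes a simple enzyme-to-pathways lookup and splits an enzyme's count evenly across the pathways it belongs to; that splitting rule, the identifiers (`enzyme_1`, `pathway_A`) and the function name are hypothetical choices, not the behavior of the KEGG mapping tool.

```python
from collections import defaultdict

def pathway_abundance(enzyme_counts, enzyme_to_pathways):
    """Sum enzyme counts per pathway, sharing multi-pathway enzymes evenly."""
    totals = defaultdict(float)
    for enzyme, count in enzyme_counts.items():
        pathways = enzyme_to_pathways.get(enzyme, [])
        for pathway in pathways:       # unmapped enzymes contribute nothing
            totals[pathway] += count / len(pathways)
    return dict(totals)

# Hypothetical counts and mapping
enzyme_counts = {"enzyme_1": 10, "enzyme_2": 4, "enzyme_3": 6}
enzyme_to_pathways = {
    "enzyme_1": ["pathway_A"],
    "enzyme_2": ["pathway_A", "pathway_B"],
}
abundance = pathway_abundance(enzyme_counts, enzyme_to_pathways)
```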
Analysis pipeline from reads to metabolites
Preprocessing
Taxonomy: taxon assignment (LCA, MG-RAST)
Function
Polymorphism
Metabolic abundance
Metagenome web servers and software
• MG-RAST
• METAGENassist
• AMPHORA2
• QIIME (amplicons)
• MOTHUR (amplicons)
• MEGAN
etc.
A few tips…
• Look for comparative studies or evaluation software
• Look for references on parameter optimization
• Choose the right annotation resources
• Run additional experiments to assess the relevance and reliability of the methods
• Open the black box of tools, software, web servers, etc.
“Whether you want to uncover the secrets of
the universe, or you just want to pursue a
career in the 21st century, basic computer
programming is an essential skill to learn.”
Stephen Hawking
Club of “Biocoders” idea ?
R, Python, PERL, AWK …
“Scripts developed over time and (sometimes) still useful”
Questions?
Mande et al., 2012
U = Unknown
K = Known