Coffee Shop
F91921025 黃仁暐
F92921029 戴志華
F92921041 施逸優
R93921142 吳於芳
R94921035 林與絜
2005/12/14 2
Menu
Coffee Shop Opening
  Why coffee shop?
Three Flavors
  COFFEE
  T-Coffee
  3DCoffee
Remarks
Recipes
2005/12/14 3
Multiple Sequence Alignment
Multiple sequence alignment is one of the most important tools for analyzing biological sequences. Applications include:
  structure prediction
  phylogenetic analysis
  function prediction
  polymerase chain reaction (PCR) primer design
2005/12/14 4
Multiple Sequence Alignment
However, the accuracy is not good enough.
  It is difficult to evaluate the quality of a multiple alignment
  It is algorithmically very hard to produce the optimal alignment
In order to increase the accuracy of multiple sequence alignment, we opened a coffee shop to share three kinds of coffee.
2005/12/14 5
Before (drinking) COFFEE
For comparative genomics, and why?
  Understand the process of evolution at the gross level and the local level
  Translate DNA sequence data into proteins of known function
  Find the meaning of conserved regions
E. coli, C. elegans, Drosophila, human…
What is their relationship?
2005/12/14 6
Arabidopsis
E. coli
Yeast
Synechocystis (blue-green algae)
Nematode
Fruit fly
Human
Classification for genes of different function
Adapted from “Principles of genome analysis and genomics” Fig. 7.5 (p.129), by S. B. Primrose and R. M. Twyman, 3rd edition
2005/12/14 7
Comparative genomics vs. multiple sequence alignment
Alignment → conserved region
Conserved region → gene location
Evidence of evolution
http://www.public.iastate.edu/~semrich/compgen/
2005/12/14 8
http://gchelpdesk.ualberta.ca/news/02jun05/cbhd_news_02jun05.php
A: human chromosome I
B: human chromosome II
C: human chromosome III
Chromosome III region 125-128 Mb was magnified 120X
The alignment between the chromosomes
2005/12/14 9
Our Flavors
COFFEE: A New Objective Function For Multiple Sequence Alignment.
  C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), 407-422, 1998
T-Coffee: A novel method for multiple sequence alignments.
  C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp 205-217, 2000
3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments.
  O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp 385-395, 2004
COFFEE
2005/12/14 11
COFFEE
An objective function for multiple sequence alignments
Cédric Notredame, Liisa Holm and Desmond G. Higgins
SAGA with COFFEE score
2005/12/14 12
Introduction
COFFEE - Consistency based Objective Function For alignmEnt Evaluation
An objective function, the COFFEE score, is proposed to measure the quality of multiple sequence alignments
Optimize the COFFEE score of a multiple sequence alignment with the genetic algorithm package SAGA (Sequence Alignment Genetic Algorithm)
2005/12/14 13
Overview of their method
Given
  a set of sequences to be aligned
  a library containing all pairwise alignments between them,
the COFFEE score reflects the level of consistency between a multiple sequence alignment and the library.
2005/12/14 14
COFFEE score
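\[
\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}
\]
with $\mathrm{SCORE}(A_{i,j})$: number of aligned pairs of residues that are shared between $A_{i,j}$ and the library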
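The COFFEE score (a weighted count of library-supported residue pairs divided by a weighted alignment length, summed over all sequence pairs) can be sketched in Python. This is only an illustration under stated assumptions: the data layout (a library as sets of residue-index pairs, weights as a dictionary), the function names, and the choice to take LEN as the number of columns of the pairwise projection that are not gap-only are all assumptions made here, not details given on the slide.

```python
from itertools import combinations


def pairwise_projection(row_i, row_j, gap="-"):
    """Project two rows of a multiple alignment onto a pairwise alignment.

    Returns (set of aligned residue-index pairs, projected length).
    Columns that hold a gap in both rows are skipped (assumption).
    """
    pairs, length = set(), 0
    pos_i = pos_j = 0  # residue indices within the ungapped sequences
    for a, b in zip(row_i, row_j):
        if a == gap and b == gap:
            continue
        length += 1
        if a != gap and b != gap:
            pairs.add((pos_i, pos_j))
        if a != gap:
            pos_i += 1
        if b != gap:
            pos_j += 1
    return pairs, length


def coffee_score(alignment, library, weights):
    """Weighted fraction of aligned residue pairs that also occur in the library.

    alignment: list of equal-length gapped strings
    library:   {(i, j): set of (pos_i, pos_j)} for every i < j (hypothetical layout)
    weights:   {(i, j): W_ij}
    """
    numerator = denominator = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        pairs, length = pairwise_projection(alignment[i], alignment[j])
        shared = len(pairs & library[(i, j)])  # SCORE(A_ij)
        numerator += weights[(i, j)] * shared
        denominator += weights[(i, j)] * length  # W_ij * LEN(A_ij)
    return numerator / denominator if denominator else 0.0
```

With this layout, an alignment that agrees with every library pair still scores below 1 whenever it contains gaps, because gapped columns count toward the length but not toward the shared pairs.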
2005/12/14 15
COFFEE score
2005/12/14 16
Using COFFEE in SAGA
Iteratively, SAGA generates a multiple sequence alignment with a higher COFFEE score until the score cannot be improved.
SAGA follows the general principle of genetic algorithms: the notion of survival of the fittest.
SAGA iteratively does:
  Evaluate the score of the alignments
  The fitter an alignment, the more likely it is to survive and produce offspring
  Surviving alignments may be kept unchanged, randomly modified (mutation), or combined with another alignment (cross-over)
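The evaluate / select / mutate / cross-over cycle can be sketched as a generic genetic-algorithm loop. This is not SAGA itself: the operator probabilities, the `patience` stopping rule, and the toy bit-list operators are assumptions chosen only to keep the example self-contained. In SAGA the individuals would be alignments and the fitness would be the COFFEE score.

```python
import random


def evolve(population, fitness, mutate, crossover,
           patience=20, max_generations=500, rng=None):
    """Evolve until the best fitness has not improved for `patience` generations."""
    rng = rng or random.Random(0)
    best_score, stale = max(map(fitness, population)), 0
    for _ in range(max_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Fitter individuals are more likely to be picked (fitness must be >= 0).
        weights = [fitness(ind) + 1e-9 for ind in ranked]
        next_gen = [ranked[0]]  # the fittest individual survives unchanged
        while len(next_gen) < len(population):
            parent = rng.choices(ranked, weights=weights, k=1)[0]
            roll = rng.random()
            if roll < 0.4:    # mutation (probability is an assumption)
                child = mutate(parent, rng)
            elif roll < 0.8:  # cross-over (probability is an assumption)
                mate = rng.choices(ranked, weights=weights, k=1)[0]
                child = crossover(parent, mate, rng)
            else:             # kept unchanged
                child = parent
            next_gen.append(child)
        population = next_gen
        top = max(map(fitness, population))
        if top > best_score:
            best_score, stale = top, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return max(population, key=fitness)


# Toy stand-ins for alignment operators: individuals are lists of 0/1.
def toy_fitness(bits):
    return sum(bits)


def toy_mutate(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


def toy_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]
```

Because the best individual is always carried over unchanged, the best score in the population can never decrease from one generation to the next.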
2005/12/14 17
Results
COFFEE function
SAGA
Optimization of COFFEE function
Effect of optimization
Comparison: COFFEE and others
Others: PRRP, Clustal W, PILEUP, SAGA MSA, SAM
COFFEE score & alignment accuracy
(A lot of tables are coming up and they are rather dry, so please bear with us…)
2005/12/14 18
Optimization
COFFEE function was optimized by SAGA
  Using ClustalW alignments
  Using SAGA alignments
2005/12/14 19
Comparison
Multiple alignments of SAGA COFFEE and 5 other methods
  PRRP, ClustalW, PILEUP, SAGA MSA, SAM
Performance of SAGA and ClustalW
Comparison of other 5 methods
Even when SAGA-COFFEE does not give the best result, it is not far from the best
Lower identity level → better SAGA-COFFEE results
2005/12/14 20
2005/12/14 21
Ratio of (E+H) residues correctly aligned
Better or worse alignment? SAGA-COFFEE & others
NO such thing as an ideal method
Correctly aligned ratio
  Better than PRRP
  Worse than PRRP
2005/12/14 22
COFFEE score and alignment accuracy
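[Figure: two scatter plots of E+H accuracy (%), against COFFEE sequence score and against average identity (%); annotated r = 0.65]
The COFFEE score can be used to predict alignment accuracy
Average identity cannot predict alignment accuracy
Accuracy can be predicted for more than 85% of the sequences (error ~ ±10%)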
2005/12/14 23
Correlation between score and accuracy
Higher score → higher accuracy
SAGA produces more high-scoring sequences than ClustalW
Coffee Break ?
T-Coffee
2005/12/14 26
T-Coffee
A novel method for multiple sequence alignments
C.Notredame, D. Higgins, J. Heringa
ClustalW with extended library
2005/12/14 27
ClustalW
ClustalW is the core alignment strategy of T-Coffee; it follows the procedure below:
  Pairwise Alignment: calculate distance matrix
  Guide Tree
    Unrooted Neighbor-Joining Tree
    Rooted Neighbor-Joining Tree: guide tree with sequence weights
  Progressive Alignment: align following the guide tree
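The first step of the procedure, turning pairwise alignments into a distance matrix, can be sketched as follows. This is a simplified stand-in rather than ClustalW's implementation: the scoring values (match 1, mismatch -1, linear gap -2), the function names, and the definition of distance as one minus the fraction of identical residues among gap-free columns are assumptions made for the illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scores)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - identity, where identity is measured over gap-free columns (assumption)."""
    row_a, row_b = global_align(a, b)
    compared = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not compared:
        return 1.0
    identical = sum(1 for x, y in compared if x == y)
    return 1.0 - identical / len(compared)


def distance_matrix(sequences):
    """Symmetric matrix of pairwise distances, the input to guide-tree building."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = distance(sequences[i], sequences[j])
    return matrix
```

A Neighbor-Joining implementation would take this matrix as input to build the unrooted tree that is later rooted into the guide tree.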
2005/12/14 28
Calculate distance matrix
2005/12/14 29
Guide tree
Use the Neighbor-Joining method to build the guide tree from the distance matrix.
First construct an unrooted Neighbor-Joining tree, then convert it to a rooted Neighbor-Joining tree, which serves as the guide tree.
2005/12/14 30
Unrooted Neighbor-Joining Tree
2005/12/14 31
Rooted Neighbor-Joining Tree
2005/12/14 32
Progressive Alignment: align following the guide tree
Seq1 Seq2 Seq3 Seq4 Seq5
Alignment 1 Alignment 2
Alignment 3 Final alignment
2005/12/14 33
Progressive-alignment strategy
Pros
  Faster and uses less space (compared with computing all possible multiple alignments)
Cons
  May not find the optimal solution.
  Errors made in the first alignments cannot be rectified later as the rest of the sequences are added in.
  “Once a gap, always a gap!”
T-Coffee is an attempt to minimize that effect!
2005/12/14 34
T-Coffee Algorithm
Generating a primary library of alignments
Derivation of the primary library weights
Combination of the libraries
Extending the library
Progressive alignment strategy
2005/12/14 35
ClustalW Primary Library (Global)
Lalign Primary Library (Local)
Weighting
Primary Library
2005/12/14 36
Primary Library
2005/12/14 37
ClustalW Primary Library (Global)
Lalign Primary Library (Local)
Weighting
Primary Library
Extension
Extended Library
2005/12/14 38
Extended Library
A
Weight(A-C-B) = min( Weight(A-C), Weight(B-C) ) = min( 77, 100 ) = 77
Weight(A-D-B) = min( Weight(A-D), Weight(B-D) ) = min( 100, 100 ) = 100
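The two triplet weights on this slide follow one rule: the support that an intermediate sequence gives to an A-B residue pair is the smaller of its two pairwise weights. A minimal Python sketch of that rule is given below. Adding the contributions to a direct A-B weight is how the T-Coffee paper describes library extension; the dictionary layout and function name are assumptions, and the real method applies this per residue pair rather than per sequence pair. The direct A-B weight is not shown on the slide, so it is left out of the example data.

```python
def extended_weight(primary, x, y):
    """Return (extended weight of the x-y pair, per-intermediate contributions).

    primary: {frozenset({seq1, seq2}): weight} for one residue pair
             (hypothetical layout chosen for this sketch).
    """
    names = {name for pair in primary for name in pair}
    direct = primary.get(frozenset((x, y)), 0)  # 0 if no direct entry is known
    contributions = {}
    for z in sorted(names - {x, y}):
        w_xz = primary.get(frozenset((x, z)))
        w_yz = primary.get(frozenset((y, z)))
        if w_xz is not None and w_yz is not None:
            # The weaker of the two links limits the support through z.
            contributions[z] = min(w_xz, w_yz)
    return direct + sum(contributions.values()), contributions


# Weights taken from the slide; the direct A-B weight is not given there.
slide_weights = {
    frozenset(("A", "C")): 77,
    frozenset(("B", "C")): 100,
    frozenset(("A", "D")): 100,
    frozenset(("B", "D")): 100,
}
```

Calling `extended_weight(slide_weights, "A", "B")` yields the contributions 77 through C and 100 through D, matching the two lines above.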
2005/12/14 39
Extended Library
SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT

SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT
2005/12/14 40
Extended Library
SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT

SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT
2005/12/14 41
Progressive Alignment
ClustalW Primary Library (Global)
Lalign Primary Library (Local)
Weighting
Primary Library
Extension
Extended Library
Multiple Alignment Information
2005/12/14 42
Progressive Alignment
2005/12/14 43
Complexity Analysis
Complexity of the whole procedure: $O(N^2L^2) + O(N^3L) + O(N^3) + O(NL^2)$
  $O(N^2L^2)$: computation of the pair-wise library
  $O(N^3L)$: computation of the extended pair-wise library
  $O(N^3)$: computation of the NJ tree
  $O(NL^2)$: computation of the progressive alignment
for N sequences that can be aligned in a multiple alignment of length L
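To see which term dominates for a given input size, the four terms can simply be evaluated. The sketch below ignores constant factors, so the numbers are only relative magnitudes; the function name and dictionary keys are made up for this illustration.

```python
def tcoffee_cost_terms(n, length):
    """Evaluate the four complexity terms for n sequences of alignment length `length`."""
    return {
        "pairwise library": n**2 * length**2,
        "extended library": n**3 * length,
        "NJ tree": n**3,
        "progressive alignment": n * length**2,
    }


def dominant_term(n, length):
    """Name of the largest term (constant factors ignored)."""
    terms = tcoffee_cost_terms(n, length)
    return max(terms, key=terms.get)
```

For example, with 100 sequences of length 300 the pairwise library is the largest term, while with 1000 sequences of the same length the extended library takes over, since $N^3L$ exceeds $N^2L^2$ once N is larger than L.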
2005/12/14 44
Experiment
Implementation environment
Result 1: Effect of combining local and global alignments without extension; effect of the library extension
Result 2: Comparison with other multiple sequence alignment methods
2005/12/14 45
Implementation environment
Programming language: ANSI C
Hardware: LINUX platform with Pentium II processors (330 MHz)
Test case: BAliBASE database of multiple sequence alignments
2005/12/14 46
Result 1
Table 1: The effect of combining local and global alignments
Name  Library (global/local/extend)  Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total(141)  Significance
C     ClustalW pw/.../...            70.6      26.7      43.0     56.0      60.0      58.9        7.8
CE    ClustalW pw/.../ex             77.1      33.6      47.6     64.8      75.9      66.3        17.7
L     .../Lalign pw/...              65.4      12.1      22.8     53.9      66.0      52.0        7.8
LE    .../Lalign pw/ex               72.6      25.6      47.2     77.5      85.5      64.2        16.3
CL    ClustalW pw/Lalign pw/...      76.2      32.0      48.3     76.2      74.6      66.5        12.1
CLE   ClustalW pw/Lalign pw/ex       80.6      37.1      52.9     83.2      88.6      72.0
2005/12/14 47
Result 2
Table 2: T-Coffee compared with other multiple sequence alignment methods
Method    Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total1(141)  Total2(141)  Significance
Dialign   71.0      25.2      35.1     74.7      80.4      61.5         57.3         11.3
ClustalW  78.5      32.2      42.5     65.7      74.3      66.4         58.6         26.2
Prrp      78.6      32.5      50.2     51.1      82.7      66.4         59.0         36.9
T-Coffee  80.6      37.1      52.9     83.2      88.6      72.0         68.6
3DCoffee
2005/12/14 49
3DCoffee
Combining protein sequences and structures within multiple sequence alignments
O. O'Sullivan, K Suhre, C. Abergel, D.G. Higgins, C. Notredame
T-Coffee with structure information
2005/12/14 50
3DCoffee
Structural information can help to improve the quality of multiple sequence alignments
3DCoffee
  Combines protein sequences and structures
  Is based on T-Coffee version 2.00
  Uses a mixture of pairwise sequence alignments and pairwise structure comparison methods.
2005/12/14 51
3DCoffee
Use T-Coffee to compile
  A primary library: a list of weighted pairs of residues.
  An extended library: uses the column consistency relationship between all sequences
According to the structure information
  Fugue, SAP, LSQman
2005/12/14 52
3DCoffee
Fugue: a threading method that aligns a protein sequence with a 3D structure
SAP: uses DP to compute a pairwise alignment based on a non-rigid structure superposition
LSQman: a rigid-body structure superposition package
2005/12/14 53
3DCoffee
Set the weight of each new alignment to 100, which is the highest score in the primary library
Add the weighted alignments into the library
Carry out progressive alignment the same as T-Coffee
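The step of adding structure-based alignments to the library can be sketched as below. The weight of 100 comes from the slide; everything else is an assumption made for illustration: the library is modelled as a dictionary keyed by residue pairs, and a pair that already exists has the new weight added to its old one, following the way T-Coffee merges duplicated pairs when it combines libraries.

```python
STRUCTURE_WEIGHT = 100  # from the slide: weight given to each structure-based alignment


def add_structure_pairs(library, structure_pairs, weight=STRUCTURE_WEIGHT):
    """Return a new library with the structure-derived residue pairs added.

    library:         {((seq_a, pos_a), (seq_b, pos_b)): weight} (hypothetical layout)
    structure_pairs: residue pairs aligned by a structure-based method
    Duplicated pairs have their weights summed (assumption, see lead-in).
    """
    merged = dict(library)  # leave the caller's library untouched
    for pair in structure_pairs:
        merged[pair] = merged.get(pair, 0) + weight
    return merged
```

The merged library would then be extended and used for progressive alignment exactly as in T-Coffee.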
2005/12/14 54
Remarks
COFFEE: An objective function for multiple sequence alignments
  SAGA with COFFEE score
T-Coffee: A novel method for multiple sequence alignments
  ClustalW with extended library
3DCoffee: Combining protein sequences and structures within multiple sequence alignments
  T-Coffee with structure information
2005/12/14 55
Recipes
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
  Julie D. Thompson, Desmond G. Higgins and Toby J. Gibson, 1994
COFFEE: A New Objective Function For Multiple Sequence Alignment.
  C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), 407-422, 1998
T-Coffee: A novel method for multiple sequence alignments.
  C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp 205-217, 2000
3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments.
  O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp 385-395, 2004
2005/12/14 56
Q & A
2005/12/14 57
Thank You
2005/12/14 58
Residue score
Sequence score measurement
Global measurement
A residue is scored 9 if more than 90% of the pairs it is involved in are also present in the reference library
Residue score evaluated → substitution defined
Class 5 substitution → residue score ≥ 5
2005/12/14 59
5566677788888888899999877- - - - -66666666788888888887
vsdvprdlevvaatptslliswdap gslevvaatptslliswdap
2005/12/14 60
• Correct substitution: SAGA > ClustalW
• Lower accuracy: more false positives in SAGA alignment
2005/12/14 61
High-scoring residues with high accuracy
Higher substitution category → smaller number of predictions