Post on 24-Jan-2016
Multiple sequence alignment
Lesson 4
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
Like pairwise alignment, but comparing n sequences instead of 2
Each row represents an individual sequence
Each column represents the same position
Some sequences may contain gaps
MSA & Evolution
MSA can give you a picture of the forces that shape evolution!
Important amino acids or nucleotides rarely change (mutations at these positions are selected against)
Less important positions change more easily
Conserved positions
Columns where all the sequences contain the same amino acid or nucleotide
Important for the function or structure
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGSSSNIGS--ITVNWYQQLPG
LRLSCTGSGFIFSS--YAMYWYQQAPG
LSLTCTGSGTSFDD-QYYSTWYQQPPG
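As a minimal sketch of the definition above, the following Python snippet scans the four-sequence example column by column and reports the columns where every sequence carries the same residue. The function name `conserved_columns` and the choice to ignore all-gap columns are assumptions made for illustration, not part of the lecture.

```python
# The four aligned sequences from the slide, one string per row.
alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGSSSNIGS--ITVNWYQQLPG",
    "LRLSCTGSGFIFSS--YAMYWYQQAPG",
    "LSLTCTGSGTSFDD-QYYSTWYQQPPG",
]


def conserved_columns(rows):
    """Return 0-based indices of columns where all rows share one non-gap character."""
    conserved = []
    for index, column in enumerate(zip(*rows)):
        if len(set(column)) == 1 and column[0] != "-":
            conserved.append(index)
    return conserved


columns = conserved_columns(alignment)
# Show conserved residues in place and mask every other column with a dot.
mask = "".join(
    alignment[0][i] if i in columns else "." for i in range(len(alignment[0]))
)
print(columns)
print(mask)
```

The masked line makes the two conserved blocks easy to see next to the alignment itself.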
Consensus sequence
A consensus sequence holds the most frequent character of the alignment at each column
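A small Python sketch of this definition, using a hypothetical three-sequence toy alignment. Two choices here are assumptions rather than anything stated in the lecture: gaps are counted like any other character, and ties go to the character seen first in the column (the behaviour of `Counter.most_common`).

```python
from collections import Counter


def consensus(rows):
    """Build the consensus by taking the most frequent character of each column."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*rows)
    )


# Hypothetical toy alignment (not from the lecture).
toy_alignment = ["ACGT", "ACGA", "TCGA"]
result = consensus(toy_alignment)
print(result)
```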
Profile
Profile = PSSM: Position-Specific Score (probability) Matrix
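To make the idea of a profile concrete, here is a hedged Python sketch that turns a toy nucleotide alignment into per-column probabilities and scores a sequence against them. The pseudocount of 1, the uniform background, the skipping of gaps, and the names `build_profile` and `log_odds_score` are all illustrative assumptions; real tools use more refined weighting schemes.

```python
import math


def build_profile(rows, alphabet="ACGT", pseudocount=1.0):
    """Return one dict per column mapping each character to its probability."""
    profile = []
    for column in zip(*rows):
        counts = {ch: pseudocount for ch in alphabet}
        for ch in column:
            if ch in counts:  # gaps and characters outside the alphabet are skipped
                counts[ch] += 1
        total = sum(counts.values())
        profile.append({ch: counts[ch] / total for ch in alphabet})
    return profile


def log_odds_score(sequence, profile):
    """Sum of log2(column probability / uniform background) over all positions.

    Assumes the sequence is as long as the profile and uses only alphabet characters.
    """
    score = 0.0
    for ch, column in zip(sequence, profile):
        background = 1.0 / len(column)
        score += math.log2(column[ch] / background)
    return score


# Hypothetical toy alignment (not from the lecture).
toy_rows = ["AC", "AC", "AG"]
profile = build_profile(toy_rows)
print(profile[0])
print(log_odds_score("AC", profile), log_odds_score("TT", profile))
```

A sequence that matches the frequent characters of each column gets a positive score, while one that matches only rare characters gets a negative score.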
Alignment methods
No practical optimal solution is available for MSA; all methods are heuristics:
Progressive/hierarchical alignment (Clustal)
Iterative alignment (MAFFT, MUSCLE)
Progressive alignment
(Figure: sequences A, B, C, D, E)
First step:
Compute the pairwise alignments, all against all (n(n-1)/2 alignments for n sequences, e.g. 10 for the five sequences A to E)
The similarities are stored in a table
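The all-against-all table can be sketched in Python as follows. This is a simplification under stated assumptions: instead of running real pairwise alignments, it uses the fraction of mismatching positions between equal-length strings as a stand-in distance, and the four toy sequences and the names `p_distance` and `distance_table` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def distance_table(seqs):
    """Map every unordered pair of names to the distance between their sequences."""
    return {
        frozenset((name1, name2)): p_distance(seqs[name1], seqs[name2])
        for name1, name2 in combinations(seqs, 2)
    }


# Hypothetical toy sequences (not from the lecture).
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACGA", "D": "TTGAACCA"}
table = distance_table(toy)
for pair, distance in table.items():
    print(sorted(pair), distance)
```

With n sequences the table holds n(n-1)/2 entries, so the four toy sequences give 6 pairs.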
Second step:
Cluster the sequences to create a tree (guide tree):
represents the order in which pairs of sequences are to be aligned
similar sequences are neighbors in the tree
distant sequences are distant from each other in the tree
The guide tree is imprecise and is NOT the tree which truly describes the relationship between the sequences!
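One way to picture the clustering is a greedy average-linkage procedure (in the spirit of UPGMA), sketched below in Python. The lecture does not say which clustering method is used, so the method, the hypothetical distances, and the name `guide_tree` are assumptions for illustration only.

```python
from itertools import combinations


def guide_tree(names, dist):
    """Greedy average-linkage clustering.

    names: list of sequence labels.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a nested tuple that records the order in which groups are joined.
    """
    # Each cluster is (subtree, list of leaf names in that subtree).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two closest clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical distances between four sequences (not from the lecture).
toy_dist = {
    frozenset(("A", "B")): 0.10,
    frozenset(("C", "D")): 0.20,
    frozenset(("A", "C")): 0.60,
    frozenset(("A", "D")): 0.70,
    frozenset(("B", "C")): 0.65,
    frozenset(("B", "D")): 0.75,
}
tree = guide_tree(["A", "B", "C", "D"], toy_dist)
print(tree)
```

Similar sequences end up as neighbors in the nested tuple, which is all a guide tree is meant to provide.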
Third step:
1. Align the most similar (neighboring) pairs (sequence to sequence)
2. Align pairs of pairs (sequence to profile)
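The order of alignments implied by a guide tree can be sketched by walking the tree from the leaves upward. The Python snippet below only lists the merge steps and labels each one; it does not perform any alignment. The nested-tuple tree format, the example tree, and the name `merge_order` are assumptions for illustration.

```python
def merge_order(tree):
    """List the merges of a guide tree, children before parents.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    Returns a list of (left_leaves, right_leaves, kind) tuples.
    """
    steps = []

    def visit(node):
        if isinstance(node, str):
            return [node]
        left = visit(node[0])
        right = visit(node[1])
        if len(left) == 1 and len(right) == 1:
            kind = "sequence-sequence"
        elif len(left) == 1 or len(right) == 1:
            kind = "sequence-profile"
        else:
            kind = "profile-profile"
        steps.append((left, right, kind))
        return left + right

    visit(tree)
    return steps


# Hypothetical guide tree over five sequences (not from the lecture).
example_tree = ((("A", "B"), "C"), ("D", "E"))
steps = merge_order(example_tree)
for left, right, kind in steps:
    print(left, "+", right, "->", kind)
```

Each group that has already been aligned is treated as a profile in later steps, which is why an early misalignment is carried forward.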
Main disadvantages:
Sub-optimal tree topology
Misalignments made when globally aligning a pair of sequences are never corrected and only cause further deterioration in later steps
Iterative alignment
(Figure: a cycle of pairwise distance table, guide tree, and MSA for sequences A to E)
Iterate until the MSA doesn't change (convergence)
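The convergence criterion can be sketched as a simple loop in Python. The refinement step is passed in as a function; the toy `drop_one_gap_column` used here is a hypothetical stand-in and not what MAFFT or MUSCLE do, since real tools recompute distances, rebuild the guide tree, and realign in each round.

```python
def drop_one_gap_column(msa):
    """Toy refinement: remove the first column that consists only of gaps."""
    for i, column in enumerate(zip(*msa)):
        if all(ch == "-" for ch in column):
            return [row[:i] + row[i + 1:] for row in msa]
    return msa


def iterate_until_converged(msa, refine, max_rounds=100):
    """Apply refine repeatedly until the MSA stops changing.

    Returns the final MSA and the number of rounds that were run.
    """
    for rounds in range(1, max_rounds + 1):
        refined = refine(msa)
        if refined == msa:  # convergence: nothing changed in this round
            return msa, rounds
        msa = refined
    return msa, max_rounds


# Hypothetical toy MSA with two all-gap columns (not from the lecture).
final_msa, rounds = iterate_until_converged(["A--C", "A--G"], drop_one_gap_column)
print(final_msa, rounds)
```

The `max_rounds` cap is a safety net for refinement steps that oscillate instead of settling.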
Searching for remote homologs
Sometimes BLAST isn't enough.
For a large protein family, BLAST only finds close members.
We want more distant members
PSI-BLAST
Profile HMMs
Profile HMM
Similar to PSI-BLAST: also uses a profile
Takes into account:
Dependence among sites (if site n is conserved, it is likely that site n+1 is also conserved, as part of a domain)
The probability of a certain column in an alignment
PSI-BLAST vs. profile HMM
Profile HMM: more exact, slower
PSI-BLAST: less exact, faster
Case study: using homology searching
The human kinome
Kinases and phosphatases
Multi-tasking enzymes
Signal transduction
Metabolism
Transcription
Cell cycle
Differentiation
Function of the nervous and immune systems
And more
How many kinases in the human genome?
1950s: discovery that reversible phosphorylation regulates the activity of glycogen phosphorylase
1970s: the advent of cloning and sequencing led to speculation that the vertebrate genome encodes as many as 1001 kinases
2001: the human genome sequence, as well as the GenBank, Swiss-Prot, and dbEST databases
How can we find out how many kinases are out there?
The human kinome
In 2002, Manning, Whyte, Martinez, Hunter and Sudarsanam set out to:
Search and cross-reference all these databases for all kinases
Characterize all the kinases found
ePKs and aPKs
ePKs: eukaryotic protein kinases (the majority). Sequence homology of the catalytic domain; additional regulatory domains are non-homologous
aPKs: atypical protein kinases. No sequence homology to ePKs; some aPK subfamilies have structural similarity to ePKs
The search
Several profiles were built, based on the catalytic domain of:
(a) 70 known ePKs from yeast, worm, fly, and human with >50% identity in the ePK domain
(b) each subfamily of known aPKs
HMM-profile searches and PSI-BLAST searches were performed
The results
478 ePKs
40 aPKs
Total of 518 kinases in the human genome (about half of the 1970s prediction)
Classifying the kinases
Classification based on the catalytic domain
Classification based on the regulatory domains
189 sub-families of kinases
Comparison to other species
209 subfamilies of ePKs in human, worm, yeast and fly
The human genome has about twice as many kinases as fly or worm. Many are aPKs. Most of them are receptor tyrosine kinases (RTKs)
The human-expanded kinase families function predominantly in processes of the:
Nervous system
Immune system
Angiogenesis
Hemopoiesis
The discovery of new kinases: a new front for battling human diseases
Correlating with human diseases
160 kinases mapped to amplicons seen in tumors
80 kinases mapped to amplicons in other major illnesses
Kinases are usually over-expressed in cancer and other diseases
6 kinase inhibitors have been approved to date for use against cancer
>70 other inhibitors are in clinical trials