EVOLUTION OF GLOBINS Evolution of Globins Evolution of visual pigments and related molecules.
Date posted: 19-Dec-2015
EVOLUTION OF GLOBINS
Evolution of Globins
Evolution of visual pigments and related molecules
Evolution of gene clusters
• Many genes occur as multigene families (e.g., actin, tubulin, globins, Hox)
– Inference is that they evolved from a common ancestor
– Families can be
  • Clustered – nearby on chromosomes (α-globins, HoxA)
  • Dispersed – on various chromosomes (actin, tubulin)
  • Both – related clusters on different chromosomes (α,β-globins, HoxA,B,C,D)
– Members of clusters may show stage- or tissue-specific expression
  • Implies means for coregulation as well as individual regulation
Evolution of gene clusters
• Multigene families (contd)
– Gene number tends to increase with evolutionary complexity
  • Globin genes increase in number from primitive fish to humans
– Clusters evolve by duplication and divergence
• History of gene families can be traced by comparing sequences
– Molecular clock model holds that rate of change within a group is relatively constant
• Not totally accurate – check rat genome sequence paper
– Distance between related sequences combined with clock leads to inference about when duplication took place
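The clock-based inference above can be illustrated with a small calculation. This is a minimal sketch assuming a strict molecular clock; the function name `duplication_time` and the numbers in the usage example are hypothetical, not values from the slides.

```python
def duplication_time(distance, rate):
    """Estimate time since two sequences diverged under a strict clock.

    distance: substitutions per site separating the two sequences
    rate:     substitutions per site per year along ONE lineage (assumed constant)

    Both lineages accumulate changes after the duplication, so the
    distance between them grows at twice the single-lineage rate.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if distance < 0:
        raise ValueError("distance cannot be negative")
    return distance / (2.0 * rate)


# Hypothetical example: 0.8 substitutions per site, 1e-9 per site per year
years = duplication_time(0.8, 1e-9)
print(f"Estimated time since duplication: {years:.2e} years")
```

Because the clock is only approximately constant (as noted above), an estimate like this is best read as a rough date, not a precise one.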
Classic phylogenetic studies of sequence conservation: the globins
The globins are the best studied family in terms of sequence conservation, partly because they were one of the first families for which multiple members were sequenced, and partly because some of the earliest protein structures (in fact, the earliest) solved were globins. The classic papers of Perutz, Kendrew and Watson were the first to correlate sequence conservation with aspects of protein structure and function. They drew their conclusion based on only a few aligned sequences. Later globin studies, such as that of Bashford, Chothia and Lesk, expanded the analyses of globin sequence conservation to include hundreds of sequences.
Perutz, Kendrew & Watson J Mol Biol 13, 669 (1965)
Bashford, Chothia & Lesk J Mol Biol 196, 199 (1987)
Scapharca inaequivalvis oxygenated hemoglobin
Conservation of functional residues
Phe 43
His 87
heme
There were only 2 perfectly conserved residues among the 8 known globin structures at the time of the Bashford et al. study. These are residues critical in binding of heme and/or interaction with heme-bound oxygen. It will often be found that the best conserved residues in related proteins are those involved in critical aspects of the general function.
Residues involved in more specific aspects of function may or may not be conserved, depending upon the relationship between the proteins under consideration. For example, residues involved in substrate specificity for serine proteases may be conserved among orthologs, such as the chymotrypsins, but not between paralogs, such as chymotrypsins and trypsins.
yellow = small neutral/polar
green = hydrophobic
red/pink = polar/acidic
blue = basic
buried
human hemoglobin beta chain
Conservation at buried positions
• core residues, which are usually hydrophobic, often tolerate conservative substitutions, i.e. to other hydrophobics
• overall core volume is well-conserved (Lim & Ptitsyn, 1970) though individual core positions tolerate variation in volume
• this reflects what we know about packing and the effects of core mutations on stability; thus sequence conservation is partly related to maintaining a stable structure
Y140
H156
portion of alignment of prokaryotic and eukaryotic globins
yellow = small neutral/polar
green = hydrophobic
red/pink = polar/acidic
blue = basic
human hemoglobin beta chain
Y140
H156
Conservation at solvent-exposed positions
• solvent-exposed (surface) positions are mutable and usually tolerate mutation to many residue types including hydrophobics. Bashford et al., however, noted that for globins at least, some surface positions do not tolerate large hydrophobics. Since polar-to-hydrophobic mutations on protein surfaces do not reduce stability, this conservation could reflect constraints on solubility. Indeed, it is clear that the overall polar character of the surface is conserved for soluble, globular proteins, even though a certain number of hydrophobics may be tolerated.
examples of surface residues
Conservation of loops and turns
• “Spacer” regions between secondary structures, such as loops and turns, are often hypermutable and vary not only in sequence but in length, tolerating insertion and deletion events (Insertions and deletions are much less often found within secondary structure elements. Why?)
part of alignment of animal hemoglobin α and β chains
human chain
Are the α and β chains related to each other by paralogy or orthology?
Sequence identity and homology: poor coverage
The two proteins have the same fold and both bind heme and oxygen in the same place: good independent structural/functional evidence for homology...
Yet alignments of their sequences reveal only 24% identity. There are also many examples of related globins and other proteins with much lower identity than this.
1MBO and 1HBB: myoglobin and hemoglobin
Any reasonable sequence identity criterion, whether it is a flat percent cutoff or a length-dependent cutoff, will give incomplete coverage--in other words, it will fail to identify many distant but true relationships.
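To make the coverage problem concrete, here is a minimal sketch of a percent-identity calculation with a flat cutoff. The function names, the toy aligned sequences and the 30% cutoff are illustrative assumptions, not taken from the slides.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def passes_flat_cutoff(identity, cutoff=30.0):
    """Flat percent cutoff; the default of 30% is a hypothetical choice."""
    return identity >= cutoff


# Toy aligned pair (hypothetical sequences)
print(round(percent_identity("MVLSPADK", "MV-TPEEK"), 2))

# A pair at 24% identity, like hemoglobin and myoglobin, fails this cutoff
print(passes_flat_cutoff(24.0))
```

Under this hypothetical cutoff the hemoglobin/myoglobin pair would be rejected even though structure and function argue for homology, which is the incomplete coverage described above.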
Evolutionary analysis: one step into the a priori prediction
Consensus: AAT GGC TCT TTT GAA AAA ...
           N   G   S   F   E   K
Seq2:      AAC GGA TGT TTC GAG AAA ...
           N   G   C   F   E   K
Synonymous
Non-synonymous
[Figure: number of individuals (y-axis) against number of mutations (x-axis), with regions labelled neutrally fixed, purifying selection and positive selection]
AAT GGC TGT TTT GAA AAA ...
N   G   C   F   E   K
[Figure: alignment of Seq1 to Seq11 with the consensus sequence]
Amino acid replacements
Non-synonymous nucleotide substitution
Protein function or structure changes
Neutral evolution vs selection
[Figure: axes labelled biological fitness (W) and amino acid changes, with regimes labelled neutrality, purifying selection and positive selection]
Neutral Theory of molecular evolution
Measuring the strength of selection
$$\omega = \frac{d(\text{Non-synonymous})}{d(\text{Synonymous})} = \frac{d_N}{d_S} = \frac{n/N}{s/S}$$
ω = 1: Neutrality
ω < 1: Purifying selection
ω > 1: Positive selection
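The ratio of non-synonymous to synonymous change can be sketched on the consensus/Seq2 example used in these slides. This is a deliberately crude illustration, not a published estimator: it assumes the standard genetic code, counts synonymous sites per codon as the fraction of single-base changes that leave the amino acid unchanged, handles only codon pairs differing at one position, and applies no multiple-hit correction. All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Sum over the 3 positions of the fraction of changes that are synonymous."""
    total = 0.0
    for pos in range(3):
        same = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                same += 1
        total += same / 3.0
    return total


def count_differences(codons_a, codons_b):
    """Return (synonymous, non-synonymous) counts of codon differences."""
    syn = nonsyn = 0
    for a, b in zip(codons_a, codons_b):
        ndiff = sum(x != y for x, y in zip(a, b))
        if ndiff == 0:
            continue
        if ndiff > 1:
            raise ValueError("sketch handles single-nucleotide codon differences only")
        if CODE[a] == CODE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def dn_ds(codons_a, codons_b):
    """Uncorrected (n/N)/(s/S), with sites averaged over the two sequences."""
    s, n = count_differences(codons_a, codons_b)
    S = (sum(map(synonymous_sites, codons_a)) + sum(map(synonymous_sites, codons_b))) / 2.0
    N = 3 * len(codons_a) - S
    if s == 0:
        raise ValueError("no synonymous differences; ratio undefined")
    return (n / N) / (s / S)


CONSENSUS = "AAT GGC TCT TTT GAA AAA".split()
SEQ2 = "AAC GGA TGT TTC GAG AAA".split()
print(count_differences(CONSENSUS, SEQ2))
print(dn_ds(CONSENSUS, SEQ2))
```

Six codons are far too few for a meaningful estimate; the point is only to show how the synonymous and non-synonymous counts and site totals enter the ratio.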
Two ways of testing the functional importance of peptide regions
Experimental (Functional Biologists): Serial deletions and random directed mutagenesis
Predictive (Evolutionary Biologists): Evolutionary and structural analysis
Consensus: AAT GGC TCT TTT GAA AAA ...
           N   G   S   F   E   K
Seq2:      AAC GGA TGT TTC GAG AAA ...
           N   G   C   F   E   K
Methods to detect adaptive evolution using DNA divergence data
Maximum-likelihood models
Models to detect adaptive evolution at single codon sites
Models to detect adaptive evolution at specific lineages of the tree
Kimura-based models
Parsimony method to detect selection at single sites
Sliding-window based methods
Multiple alignment
Sq1: ...ATGGGCGTC...
Sq2: ...ATGGACGTA...
Sq3: ...ATGGGAGAG...
Sq4: ...ATGAGCGTC...
[Figure: tree relating Sq1, Sq2, Sq4 and Sq3, with labels a, b and 1, 2, 4, 5, 6]
[Figure: A and B, with labels A1, A2, A3, A4 and B1, B2, B3]
Different levels of protein’s function and evolution
Intra-molecular co-evolution
Tully and Fares (2006) Evol. Bioinf.
Inter-protein/gene co-evolution
Co-evolution/interaction between two different biological systems
Covariation analysis
Substitution patterns at different positions in a sequence alignment are not necessarily independent. This is sometimes referred to as covariation or correlated evolution.
name  sequence
A     YADLGRIKS
B     YSDLGSEKE
C     IDDFGEIAA
D     IDDFGVIGT
For example, in the mini multiple alignment shown at left, the identity of the residue at the 4th position is correlated to the identity of the residue at the 1st position.
A statistical perturbation analysis can be used to characterize this covariation. An alignment of related sequences is “perturbed” by only considering sequences at which, for example, the first position is Y. The effect of this perturbation on the residue distribution observed at other positions is then measured. If the distribution changes significantly, covariation between sequence changes at the first site and other sites in the alignment is inferred.
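The perturbation idea can be sketched on the four-sequence mini alignment from this section. The measure of change used here (total variation distance between residue frequency distributions) and the function names are assumptions chosen for illustration; the slides do not specify the statistic, and no significance test is attempted.

```python
from collections import Counter

# Mini alignment from the slide
ALIGNMENT = {
    "A": "YADLGRIKS",
    "B": "YSDLGSEKE",
    "C": "IDDFGEIAA",
    "D": "IDDFGVIGT",
}


def column_frequencies(seqs, pos):
    """Residue frequencies observed at one alignment column."""
    counts = Counter(s[pos] for s in seqs)
    return {res: c / len(seqs) for res, c in counts.items()}


def perturbation_shift(seqs, site, residue):
    """Perturb the alignment by keeping only sequences with `residue` at `site`,
    then measure how much each other column's distribution changes."""
    subset = [s for s in seqs if s[site] == residue]
    if not subset:
        raise ValueError("no sequence has that residue at the perturbed site")
    shifts = {}
    for pos in range(len(seqs[0])):
        if pos == site:
            continue
        full = column_frequencies(seqs, pos)
        sub = column_frequencies(subset, pos)
        residues = set(full) | set(sub)
        # Total variation distance: 0 = unchanged, 1 = completely different
        shifts[pos] = 0.5 * sum(abs(full.get(r, 0.0) - sub.get(r, 0.0)) for r in residues)
    return shifts


shifts = perturbation_shift(list(ALIGNMENT.values()), site=0, residue="Y")
for pos, value in sorted(shifts.items()):
    print(f"position {pos + 1}: shift = {value:.2f}")
```

With the first position fixed to Y, the 4th position shifts (it becomes all L) while the invariant 3rd and 5th positions do not, matching the correlation described for this mini alignment. With only four sequences the numbers are illustrative at best.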
The hydrophobic core residues in related proteins tend to be covariant due to constraints on core packing. One sees compensatory volume changes at different positions.
Davidson and coworkers found that for 266 aligned SH3 domain sequences, the strongest covariation was observed for a cluster of central hydrophobic residues.
For example, substitution of a smaller residue (Ala->Gly) at 39 was strongly correlated to substitution of a larger residue (Ile->Phe) at 50.
Hydrophobic core of SH3 domains, with most frequently covarying residues shown in yellow
Covariation and hydrophobic core packing
S.M. Larson, A.A. DiNardo and A.R. Davidson, J Mol Biol 303, 433 (2000)
Some recent studies (Suel et al) have suggested a connection between covarying clusters of residues and transduction of signals between distant sites in proteins.
For example, G-protein coupled receptors bind a ligand on one side of a membrane, and then transduce that signal to the other side through conformational change. Suel et al. showed that the main clusters of covarying residues tended to connect the ligand and G-protein binding sites.
ligand
G-protein binding sites
membrane
covarying networks (brown)
Suel et al. Nat Struct Biol 2003
A novel method to detect co-evolution in protein-coding genes
(Fares and Travers, Genetics 2006)
AAMWCGPCPNDEE
CAMCCGMCMNDEE
CAMDCGACANDEE
AAMMCGCCCNDEE
$$(\theta_{ek})_{ij} = B_{ek} \times t_{ij}^{-1}$$
$$\bar{\theta}_A = \frac{1}{T}\sum_{S=1}^{T} (\theta_{ek})_S$$
$$\bar{\theta}_B = \frac{1}{T}\sum_{S=1}^{T} (\theta_{ek})_S$$
$$\hat{D}_{ek} = \left[(\theta_{ek})_{ij} - \bar{\theta}_B\right]^2$$
$$\bar{D}_A = \frac{1}{T}\sum_{S=1}^{T}\left[(\theta_{ek})_S - \bar{\theta}_A\right]^2$$
$$\bar{D}_B = \frac{1}{T}\sum_{S=1}^{T}\left[(\theta_{ek})_S - \bar{\theta}_B\right]^2$$
$$\rho_{AB} = \frac{\sum_{S=1}^{T}\left[(\hat{D}_{ek})_S - \bar{D}_A\right]\left[(\hat{D}_{ek})_S - \bar{D}_B\right]}{\sqrt{\sum_{S=1}^{T}\left[(\hat{D}_{ek})_S - \bar{D}_A\right]^2 \sum_{S=1}^{T}\left[(\hat{D}_{ek})_S - \bar{D}_B\right]^2}}$$
Testing the significance of the correlation coefficient
i
ii
Z
RP
95.0
,1000
1)95.0(
1000
1
$$\hat{D}_{ek} = \left[(\theta_{ek})_{ij} - \bar{\theta}_A\right]^2$$
[Figure: sequence alignment, 3D, and tree with Clade 1 and Clade 2 (each labelled > 75%)]
Molecular co-evolution analyses: CAPS (Fares and McNally, Bioinformatics 2006)
Collate results from ‘re-sampling’ and ‘real’ data and sort by ρ
Re-sampling: ρ1 = 0.1, ρ2 = 0.15, ρ3 = 0.35, ..., ρi = 0.40, ρi+1 = 0.55, ..., ρN-1 = 0.98, ρN = 0.99
Real: ρ1 = 0.55, ρ2 = 0.98
Calculate probabilities of R-values applying the step-down permutational correction
N
iP
155.0
Identify groups of co-evolving pairs with P > 0.95
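The collate, sort and threshold steps above can be sketched as an empirical ranking of each real correlation value within the re-sampled values. This is a simplified illustration under stated assumptions: it takes the probability to be the fraction of re-sampled values strictly below the real value, and it does not implement the step-down permutational correction. The function names, pair labels and re-sampled values are hypothetical.

```python
from bisect import bisect_left


def empirical_probability(real_value, resampled):
    """Fraction of re-sampled correlation values strictly below the real value."""
    ordered = sorted(resampled)
    if not ordered:
        raise ValueError("need at least one re-sampled value")
    return bisect_left(ordered, real_value) / len(ordered)


def significant_pairs(real, resampled, threshold=0.95):
    """Keep pairs whose real correlation ranks above `threshold` of the re-samples."""
    kept = {}
    for pair, value in real.items():
        p = empirical_probability(value, resampled)
        if p > threshold:
            kept[pair] = p
    return kept


# Hypothetical re-sampled values (0.00, 0.01, ..., 0.99) and two real pairs
resampled = [i / 100 for i in range(100)]
real = {("site 3", "site 7"): 0.55, ("site 4", "site 9"): 0.98}
print(significant_pairs(real, resampled))
```

In this toy data only the pair with a real value of 0.98 passes the 0.95 threshold; a corrected procedure would generally be more conservative than this uncorrected ranking.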
Flow of information in CAPS
Comparative analysis of sensitivities
[Figure: sensitivity (true positives, 0 to 100) against distance (0 to 1) for CAPS, MICK, Dependency and lnLCorr]
[Figure: mean sensitivity (0 to 100) against divergence (0.1, 0.2, 0.5, 1) for CAPS, MICK, Dep. and LnLCorr]
[Figure: mean sensitivity (0 to 100) against number of sequences (10, 20, 30) for CAPS, MICK, Dep. and LnLCorr]
Three-dimensional spheres to detect protein-protein interfaces
Co-evolving amino acid sites
Highly conserved sites at overlapping areas
Spheres of 4Å radius
Co-evolving Amino acids share properties of hydrophobicity and molecular weight
Protein-protein interfaces could be predicted with greater accuracy