Chapter 5
PHYLOGENETIC ANALYSIS OF σ32
AND FtsH
Chapter 5
71
5.1 EVOLUTIONARY ANALYSIS OF PROKARYOTIC HEAT SHOCK
TRANSCRIPTION REGULATORY PROTEIN SIGMA32
Induction of heat-shock proteins (HSPs) at elevated temperature is a universal cellular
response. The HSPs function as chaperones and proteases (Gross,1996; Yura et al.,2000).
In Escherichia coli HSPs are usually regulated by a single transcription factor, e.g. the heat
shock regulator protein σ32 (the rpoH gene product). On the other hand in eukaryotic
cells the heat shock factors (HSFs) are also involved in the regulation process (Guisbert et
al.,2008). From lower prokaryotes to humans some HSPs are conserved and act as
response to the environmental stress. These HSPs include mainly the chaperones, which
help proteins to fold properly (Guisbert et al.,2008). In Escherichia coli, expression of
more than 30 heat shock ge es is u der the co trol of the alter ative sig a factor σ ,
the product of the rpoH gene. Cellular σ level is very lo duri g gro th at 0C and
increases transiently when temperature increases. Moreover, the half-life of Escherichia
coli σ is e tre ely s all, less tha . i at C; upon temperature up-shift from 30
to 42 C, σ ecomes transiently stabilized at least 8-fold (Gruber and Gross, 2003).
Three regulatory echa is s are respo si le for the σ ediated regulatio of heat
shock responses (HSRs) in Escherichia coli (Yura and Nakahigashi, 1999). They are:
1. Control of synthesis of σ : σ sy thesis is co trolled y te perature.
2. Co trol of activity of σ : The u folded protei titratio odel has ee proposed
to e plai the regulatio of σ . Over-expression of either the DnaK/J or GroEL/S
chaperone machine inactivates σ . Co versely, depletio of either of the chapero e
achi e leads to accu ulatio of active σ .
3. Co trol of degradatio of σ : The post tra slatio al level regulatio occurs via
co trolled proteolysis of σ i volvi g differe t proteases e.g. FtsH, Lon, HslVU) and
chaperones (e.g. DnaKJ, GroELS) [Herman et al.,1995; Tomoyasu et al.,1995; Guisbert et
al., 2004; Kanemori et al., 1997). Normally at 30 C, the DnaK/J heat-shock chaperone
system binds with σ32, limiting its binding to core RNA polymerase (Liberek et al., 1995).
Chapter 5
72
The FtsH, an ATP-dependent heat-shock metalloprotease, degrades σ32 (bound with
DnaK/J) (Ito and Akiyama, 2005; Blaszczak et al., 1999). Upon heat stress, the chaperone
system DnaK/J becomes engaged with the increased cellular level of unfolded proteins
and thus makes the σ32 free to bind with core RNA polymerase, ultimately inducing HSPs
[Arsène et al., 2000). The interaction of σ32 with RNA polymerase core enzyme prevents
degradation of σ32 by the protease FtsH (Blaszczak et al., 1999), finally increasing its
stability.
Usually acteria co tai a si gle housekeepi g σ factor σ70) and several alternative σ s
σ , σ , σ 4 etc. , hich ediate respo se to differe t altered e viro e tal
conditions (Guisbert et al.,2008; Ohnuma et al., 2000; Raina et al., . σ factors ai ly
contains two to four functional regions in their sequences set depending on the particular
group to hich they elo gs. σ is a e er of the group σ s; they co tai regio s ,
3 and 4 (Guisbert et al.,2008). The regions contain recognition determinants for RNA
polymerase binding and for promoter recognition. In Escherichia coli, region2 i) consists
of sub-regions 1.2 to 2.4 (amino acid residue number 16 to 126), ii) recognizes the -10
region of promoter and iii) carries the major RNA polymerease recognition determinants.
On the other hand, region 4 is made up of sub-regions 4.1 and 4.2 (amino acid residue
number 213 to 280) and recognizes -35 region of promoter (Gruber and Gross, 2003). Our
sequence analysis study revealed that the amino acid sequences in these two regions are
conserved throughout the proteobacterial kingdom barring some mutations here and
there. The rpoH o regio i σ a i o acid residue u er to 4 has also ee
proved to play a role in binding to RNA polymerase. In addition to carrying out the
functions common to all σ s, σ must contain determinants that allow it to bind several
heat-shock chaperones like DnaK/J and heat-shock protease like FtsH , so that its activity
and stability can be properly regulated.
“i ce σ is prese t i al ost all acterial species a d ecause they are uite co served
we have attempted to study the evolutionary pattern of this important protein across the
proteobacterial kingdom using their phylogenetic relationships across the proteobacterial
Chapter 5
73
kingdom. The proteobacteria being the largest and the most diversified class of
eubacteria, are classified in different categories on the basis of their 16s rRNA sequences
(Prescott et al., 1999). We have investigated the se ue ces of σ of 4 differe t
proteobacterial species. Our aim is (1) to study the sequence diversity, length variation in
the conserved regions 2 and 4, (2) to determine the secondary structure variations and
(3) to draw the phylogenetic relatio ships a d a alyze the evolutio ary patter of σ
throughout the proteobacterial kingdom. Several groups used forward genetic screens to
study the effects of the utatio s i σ O rist et al., . It as o served that poi t
mutations in region 2.1 led to the stabilization of the protein. In addition, region 2.1 was
rece tly descri ed to e i porta t for chapero e ediated i activatio of σ Yura.,
2007). But the possible roles of the mutations which confer different functionalities to the
σ 2 protein have not yet been reported. No such study, using just the protein sequence
data, has ee reported earlier. Our results o the differe t regio s of σ ay shed
so e light i a alyzi g the roles of differe t utatio s i σ .
5.1.1 METHODS:
5.1.1.1 DATABASES:
Database is a collection of searchable, structured and updated data. Retrieval of dataset
and sketch the new conclusion are major reasons behind the database formation.
Retrieval of data may be done manually or by using computer programme.
5.1.1.2 PROTEIN SEQUENCE DATABASE AND UNIPROTKB
Protein sequence databases are a collection of annotated sequence information from
several source organisms. This is actually the fundamental determinants of biological
structure and functions.
Chapter 5
74
The Universal Protein Resource (UniProt) is a comprehensive annotated resource of
protein sequence. It serves through the collaboration among the European Bioinformatics
Institute (EMBL-EBI), Swiss Institute of Bioinformatics (SIB) and the Protein Information
Resource (PIR). UniProtKB is the central hub for the collection of functional information of
proteins with accurate, consistent, and rich information.
5.1.1.3 PAIRWISE SEQUENCE ALIGNMENT
Pairwise Sequence Alignment is used to identify regions of similarity that confer
functional, structural and/or evolutionary relationship between two protein or nucleotide
sequences. In 1970 Needleman and Wunsch demonstrated dynamic algorithm for global
alignment of two protein sequences. In 1981 Smith and Waterman developed a local
implementation of the dynamic algorithm and remove the obligations of the previous
one. Dynamic programming algorithm provides perfect output alignment for a given
scoring matrix, though is slow to some extent. Almost a decade latter Altschul et al (in
1990) introduce a heuristic approach Basic Local Alignment Search Tool (BLAST) for the
use on large datasets.
5.1.1.4 SEQUENCE HOMOLOGY SEARCH AND PAIR WISE ALIGNMENT OF
SEQUENCES
I itially a i o acid se ue ces of σ32 proteins from 24 different organisms were
extracted from refseq of NCBI, fro hich 4 a i o acid se ue ces of σ32 proteins were
chose for our study. “i ce for the sa e orga is there ere ultiple e tries for the σ32
proteins, each having the same amino acid sequence compositions, we chose one
represe tative se ue ce of σ32 protein for each of the 24 different organisms to eliminate
the data redundancies. NCBI refseq was selected for collecting our required sequences
because it provides comprehensive, integrated, non-redundant, well-annotated set of
sequences. The accession numbers of the proteins were tabulated in Table 5.1.
Chapter 5
75
Species Protein Accession Number
Salmonella enterica subsp. enterica serovar Typhi str. CT18 [GenBank:NP_458353.1]
Yersinia enterocolitica subsp. enterocolitica 8081 [GenBank:YP_001004614.1]
Haemophilus influenzae Rd KW20 [GenBank:NP_438438.1]
Citrobacter koseri ATCC BAA-895 [GenBank:YP_001456366.1]
Dickeya dadantii 3937 [GenBank:YP_003880910.1]
Pectobacterium carotovorum subsp. brasiliensis PBR1692 [GenBank:ZP_03827501.1]
Edwardsiella tarda EIB202 [GenBank:YP_003297368.1]
Xenorhabdus bovienii SS-2004 [GenBank:YP_003466496.1
Enterobacter cloacae SCF1 [GenBank:YP_003939845.1]
Pantoea sp. At-9b [GenBank:YP_004114141.1]
Erwinia amylovora CFBP1430 [GenBank:YP_003532845.1]
Photorhabdus luminescens subsp. laumondii TTO1 [GenBank:NP_931293.1]
Shigella flexneri 2a str. 301 [GenBank:NP_709231.1]
Cronobacter sakazakii ATCC BAA-894 [GenBank:YP_001440292.1]
Klebsiella pneumoniae subsp. pneumoniae MGH 78578 [GenBank:YP_001337481.1]
Sodalis glossinidius str. 'morsitans' [GenBank:YP_453767.1]
Serratia symbiotica str. Tucson [GenBank:ZP_08039036.1]
Proteus mirabilis HI4320 [GenBank:YP_002153283.1]
Rahnella sp. Y9602 [GenBank:YP_004214910.1]
Photobacterium damselae subsp. damselae CIP 102761 [GenBank:ZP_06156576.1]
Mannheimia succiniciproducens MBEL55E [GenBank:YP_087217.1]
Pasteurella multocida subsp. multocida str. Pm70 [GenBank:NP_246523.1]
Tolumonas auensis DSM 9187 [GenBank:YP_002894185.1]
Escherichia coli K-12 [GenBank: AAC76486.1]
Table 5.I: List of σ32 proteins of the 24 different proteobactreria used in our analysis
Chapter 5
76
Only those amino acid sequences for which there were clear annotations and no
ambiguities were chosen. These sequences were used as inputs to run BLAST (Altschul et
al., 1990), using the default parameters, in order to find out the homologous sequences.
The BLAST results again produced the same set of sequences as obtained previously. This
could be considered as a double check of our initial results of downloading the
sequences.
5.1.1.5 MULTIPLE SEQUENCE ALIGNMENT (MSA)
Multiple sequence alignments are an essential tool for protein structure and function
prediction, domain determination/characterization, phylogenetic inference and regular
sequence analysis. Due to requirement of high computational infrastructure dynamic
programming cannot easily utilized here for large number of sequences. So progressive
approach used for global multiple alignments are employed. CLUSTALW, a weighted
variant of CLUSTAL program, till date most widely used software program for multiple
sequence alignment. CLUSTALW constructs a distance matrix reflecting the relatedness of
each pair of sequences through pairwise alignment.
Here a sequence profile was generated by MSA, using the default parameters in the
software tool ClustalW (Thompson et al., 1994). The result shown in fig. 5.1. The MSA, so
generated, showed the presence of mutations in the sequences.
5.1.1.6 PREDICTION OF SECONDARY STRUCTURE
Secondary Structure prediction aims to predict the local secondary structures of
sequences based on the knowledge of their primary structure. Primary generation
prediction methods of early sixties and nineties mainly based on statistically calculated
single and multiple residue propensities. Modern day protein secondary structure
prediction methods based on evolutionary information contained multiple sequence
alignments viz, sequence conservation, hydrophobicity pattern, etc. Dynamic
Chapter 5
77
conformational changes related to the function of proteins and envioronmental
complications.
The secondary structures of the 32 family of proteins were predicted from their amino
acid sequences using the GOR IV, DSC of University of Wisconsin Genetics Computer
Group (GCG) package (Wisconsin Package Version 9.0., 1997). The secondary structures
were classified as helix, sheet and others. The generalized secondary structural patterns
of the 24 proteins were presented in figures 5.2a and 5.2b. The distribution of secondary
structural elements in the proteins has been found to be conserved, albeit a few
differences here and there. Therefore, figures 3a and 3b show a consensus view of the
distribution of secondary structure within the 32 family.
5.1.1.7 PROTEIN FUNCTION PREDICTION
The functions of the different parts of the 32 family of proteins were predicted from the
outputs of Pfam (Bateman et al., 2004). The Pfam results were obtained from the basis of
a MSA of closely related homologous proteins. The amino acid sequences of the
aforementioned 24 proteins were incorporated one by one to Pfam and the results were
analyzed. Pfam also demarcated the amino acid sequences of the proteins on the basis of
their structural components.
5.1.1.8 DISTANCE MATRIX CALCULATION AND CONSTRUCTION OF
PHYLOGENETIC TREE
A distance matrix was generated using MEGA (Tamura et al., 2007). This tool used
Maximum Composite Likelihood (MCL) method to estimate the evolutionary distances
between sequences. The MCL approach was different from the existing approaches for
evolutionary distance estimation, wherein each distance was estimated independent of
others, either by analytical formulae or by likelihood methods (independent estimation
[IE] approach). This implementation of the MCL method allowed for the consideration of
Chapter 5
78
substitution rate variation from site to site, using an approximation of the gamma
distribution of evolutionary rates.
5.1.2 RESULT & DISCUSSION:
5.1.2.1 SEQUENCE CONSERVATION OF σ32
Being the major heat shock regulatory protei σ32 showed great conservatism in their
functional regions among proteobacterial organisms. Both the terminal regions i.e., the
carboxyl terminal (C-ter i al a d the a i o ter i al N ter i al regio s of the σ32
proteins, contain the co served σ70 super family specific conserved regions. Initially 95
a i o acid se ue ces of σ32 proteins from 24 different organisms were extracted from
refse of NCBI, fro hich 4 a i o acid se ue ces of σ32 proteins were chosen for our
study. Since for the sa e orga is there ere ultiple e tries for the σ32 proteins each
having the same amino acid sequence compositions, we chose one representative
se ue ce of σ32 protein for each of the 24 different organisms to eliminate data
redundancies. NCBI refseq was selected for collecting our required sequences because it
provides comprehensive, integrated, non-redundant, well-annotated set of sequences.
As already e tio ed σ32 , a e er of the group σ s, contains regions 2, 3 and 4 in its
amino acid sequence (Gruber and Gross, 2003). The amino acid sequences of region 2
(amino acid residue numbers 53-126) in all the 24 different organisms were highly
conserved barring 14 amino acid residue positions of which at nine amino acid positions
(amino acid residue numbers 54, 55, 62, 64, 91, 94, 96, 97 and 123) there were
synonymous substitutions. There were non-synonymous substitutions at the following 4
amino acid sequence positions 56, 69, 75 and 116. The remaining amino acid sequence
position (position 67) was highly varia le i all of the σ32 proteins of the 24 different
organisms. This was presented in the sequence alignment of the corresponding regions of
σ32 in figure 5.1.
Chapter 5
79
Figure 5.1 - Multiple Sequence Alignment of sigma32 from 24 different proteobacteria
using ClustalW default parameters, showing their conserved/variable regions, functionally
important residues and functional regions shown in the box by CINEMA-MX (Lord et al.,
2002) .
Chapter 5
80
Compared to region 2, region 4 (amino acid sequence positions 216-280) showed more
sequence diversity (figure 5.1). In all the 24 different organisms, out of the 65 amino acid
residues in region 4, 28 positions had exactly the same amino acid residue. On the other
ha d at positio s i the a i o acid se ue ces i regio 4 of σ32 proteins there were
synonymous substitutions in all the 24 different organisms. Again non-synonymous
substitutions were found to be present at 6 different positions in amino acid sequences in
regio 4 of σ32 proteins in all the 24 different organisms. The amino acid sequence
positions of non-synonymous substitutions were 220, 223, 227, 228, 238 and 261. These
mutations led to functional divergences in the proteins as presented in the following
section.
5.1.2.2 FUNCTIONAL AND MUTATIONAL ANALYSIS
σ32 protein plays the regulatory role in heat-shock transcription. To search whether there
as a y fu ctio al diversity i σ32 of different proteobacterial species, we ventured into
Pfam (Bateman et al., 2004)-based functional studies. The Pfam search results showed
that the a i o acid se ue ces of σ32 proteins bore resemblances to other different
functionally important domains of various other proteins.
Pfam search results revealed that the ost co served regio of the σ32 proteins in all the
24 different organisms had the signature sequence of the -10 promoter recognition
motif, being helical in nature, as well as the signature sequence of the primary core RNA
polymerase binding determinant (Malhotra et al., 1996). The aromatic residues at the C-
terminus of this region take part in the transcription initiation by mediating strand
separation (Malhotra et al., 1996; Campbell et al., 2002). The other most conserved
region had the signature sequence of -35 core promoter binding motif (Campbell et al.,
2002).
Pfam search results also revealed the presence of other sequence motifs. A short stretch
of amino acids at positions 60 – 96 of region 2 had similarity with ubiquinol-cytochrome C
reductase, a helper protein of electron transport system (Iwata et al., 1998). This motif
Chapter 5
81
was found to be present in the amino acid sequences of all the 24 different organisms. In
some of organisms the signature sequences of Opi1 family of proteins were present. This
Opi1 family of proteins takes part in phospholipid biosynthesis (White et al., 1991) and
also acts as a repressor of several genes through phosphorylation (Chang et al., 2006).
The amino acid sequences at positions 130 – 171 of the σ32 proteins in all the 24 different
organisms were found to have sequence homology with respiratory-chain NADH
dehydrogenase, which catalyzed the transfer of electron from NADH to coenzyme Q
(Nakamaru-Ogiso et al., 2010).
The amino acid sequences at positions 218 – 262, which lie in the region 4 of the σ32
proteins in all the 24 different organisms, had signature sequence of the double strand
recombinant repair protein of Mei5 family, which played a role in the recombination of
homologous chromosomes during meiosis (Hayase et al., 2004).
A conserved patterns of amino acids were found at the C-terminal parts of amino acid
sequences of the σ32 proteins in all the 24 different organisms resembling the signature
sequence of RHH1 (ribbon helix helix) family of proteins. This RHH1 family of proteins
played a role in nucleotide incorporation in DNA. However, this family of proteins did not
function as DNA binding proteins (Gomis et al., 1998).
In Photobacterium damselea (ZP_06156576.1), the amino acid sequence at positions 166-
217 showed similarity with phospho-protein family which modulated the polymerase
protein (Pattnaik et al., 1997).
5.1.2.3 CONSERVED SECONDARY STRUCTURE/PATTERN ANALYSIS
“eco dary structures of the σ32 proteins in all the 24 different organisms were predicted
using both Chou Fasman (CF), and Garnier Osguthorpe Robson (GOR) algorithms present
in GCG package (Wisconsin Package Version 9.0., 1997). The results showed a great deal
of se ue ce co servatio i the ai fu ctio al regio s regio s a d 4 of the σ32
proteins, although there were a few mutations in the amino acid sequences in some parts
of region 2 (positions 26,27,45,53,,68,70) and region 4 (positions 221,222,224,228,229) of
the σ32 proteins (shown in fig5.2a, 5.2 . The regio of the σ32 proteins in all the 24
different organisms was found to be comprised of 4 long helices interspersed with sheet
Chapter 5
82
regions. On the other hand, the region 4 of the σ32 proteins in all the 24 different
organisms was made up of 4 helices and one sheet joined together by loops and turn
regions. The figures 3a, 3b depicting the arrangements of the secondary structural
ele e ts i regio a d regio 4 of σ32 proteins were prepared using Geneious v5.3
(Drummond et al., 2010).
Figure: 5.2(a)
Figure: 5.2(b)
Figure 5.2 - (a) Secondary structure of region 2 (above) (b) Secondary structure of region
4(below) Both were developed in GeniousPro. Mauve, Ivory white, blue coloured regions
denote alpha helix, beta sheet, and turns respectively.
Chapter 5
83
5.1.2.4 PHYLOGENETIC RELATIONSHIPS OF σ32 BETWEEN DIFFERENT SPECIES
We used Multiple Sequence Alignment (MSA) to detect the sequence conservation /
variatio s i the σ32 proteins in all the 24 different organisms. In order to derive a
phylogenetic relationship between these proteins a phylogenetic tree of the 24 different
proteobacterial species was constructed [Fig. 5.3]. In the top branch of the tree
Salmonella enterica subspecies Enterica serovar Typhi strain CT18 and Shigella flexneri 2a
strain 301 clubbed together. With them Enterobacter cloacae SCF1 and Klebsiella
pneumoniae subspecies Pneumoniae MGH 78578 formed a sub group. Since these
proteobacterial species had similar biochemical functionalities, therefore their grouping
together was not surprising. The same trend followed throughout the tree. The bottom of
the tree had Photobacterium damselae subspecies damselae CIP 102761 and Tolumonas
auensis DSM 9187. The taxonomic details of Photobacterium damselae is Bacteria;
Proteobacteria; Gammaproteobacteria; Vibrionales; Vibrionaceae; Photobacterium
whereas the taxonomic details of Tolumonas auensis DSM 9187 is Bacteria;
Proteobacteria; Gammaproteobacteria; Aeromonadales; Aeromonadaceae; Tolumonas.
The Vibrionaceae family of Proteobacteria produces their own order. They are inhabitants
of fresh or salt water, several species are pathogenic (Fischer-Romero et al., 1996). It is
clearly evident that these two bacterial species are quite far from being related in terms
of taxonomy. This is reflected in the comparatively larger distance of separation between
them relative to the other proteobacterial species in the phylogenetic tree.
Chapter 5
84
Figure 5.3 - Phylogenetic tree drawn using method Neighbor Joining with
1000 bootstrap
Chapter 5
85
5.1.3 CONCLUSIONS:
σ -bound RNA polymerase involves in the bacterial heat-shock gene transcription. In our
study, the diversity in sequence, variations in the secondary structure and function
amongst the major functional regions of the proteo acterial σ fa ily of protei s, a d
their phylogenetic relationships have been established. Bacterial σ protei s ca e
subdivided into mainly three functional regions which are referred to as regions 2, 3, and
4. We have found a great deal of sequence conservation among the functional region 2 of
proteo acterial σ fa ily of proteins though some mutations are also present in these
regions while region 4 has comparatively more variable sequences. In this study, we want
to explore the effects of mutations in these regions. Our study suggests that the sequence
diversities due to natural mutations i the differe t regio s of proteo acterial σ fa ily
lead to different functions. I this ork e a alyzed the se ue ces of σ32 of 24 different
proteobacterial species. We used the NCBI refseq in our analysis. We observed that there
exists a great deal of sequence conservation in the regions 2 and 4 of the proteins though
there ere so e utatio s i regio s a d 4 as ell of σ32 proteins. The mutations in
the proteins confer different additional functionalities to the proteins. These
fu ctio alities ay help the σ32 protein to perform some additional functions other that
its usual function. There were no previous reports regarding the bioinformatic analysis of
the se ue ces of the σ32 proteins. This study also revealed the possible effects of
utatio s i the σ32 proteins. Our study would therefore be useful to elucidate the effects
of mutations in different amino acid sequence positions in this family of proteins. Since
the differe t utatio s ere respo si le for so e additio al fu ctio alities of the σ32
proteins, our study would therefore be the first of its kind to predict the possible
ioche ical roles of utatio s of σ32 proteins. This may shed light to identify the
molecular biochemistry of this family of proteins. So far, this study is the first
ioi for atic approach to ards the u dersta di g of the echa istic details of σ
family of proteins using the protein sequence information only. This study will help in
elucidating the unknown molecular mechanism and fu ctio alities of σ family of
proteins.
Chapter 5
86
5.2 PHYLOGENETIC ANALYSIS OF PROKARYOTIC PROTEASE FtsH
In a cell proteases are primarily required for the maintenance of cell homeostasis. They
act in protein quality control and are indispensable for the integrity and performance of a
cell. Most of the cytoplasmic membrane proteins or cytosolic proteins are assumed to be
stable under normal growth conditions whereas their abnormal, unfolded and non-
functional states are subjected to rapid degradation by proteases. Otherwise they
accumulate inside the cell and arrest cell growth. Escherichia coli cells contain several
ATP-dependent proteases, including ClpAP, ClpXP, HslUV, Lon and FtsH (Gottesman1996).
Among these FtsH is the only membrane bound protease (Akiyama and Ito 2003). It plays
crucial roles in the quality control of membrane proteins like YccA (Kihara and Akiyama
1999) by eliminating nonfunctional membrane proteins in prokaryotic organisms as well
as in the mitochondria and the chloroplasts of eukaryotic cells. It also degrades various
protei s like the heat shock sig a σ factor σ Ito a d Akiya a , To oyasu et al.
, λ phage tra scriptio factor cII hich plays a key role in lysogeny (Banuett et al.
1986), LpxC (Ito and Akiyama 2005, Tomoyasu et al. 1995), a fundamental enzyme that
co trols the a ou t of e ra e lipopolysaccharides, su u it α of F part of proto
ATPase(Ito and Akiyama 2005), uncomplexed SecY(Kihara et al.1999).
FtsH is a cytoplasmic membrane protein which can be subdivided into two major
segments, the N terminal and the C-terminal segment (Ito and Akiyama 2005). In case of
E. coli, the N-terminal segment (approximately 120 residues long) consists of a small
cytoplasmic region followed by two transmembrane regions and a periplasmic stretch.
The i teractio of FtsH ith HflKC co ple is attri uted to the periplas ic regio of FtsH
Kihara et al. . HflK a d HflC HflKC pair of e ra e proteins is known to be
required for proteolytic activities of FtsH against different substrates (Saikawa et al.
2004). The membrane-anchoring domain of FtsH is required for its oligomerization, hence
it is essential for its protease activity.
The C-terminal segment (approximately 525 amino acid long) consists of two
major domains, AAA+ ATPase domain and the metalloprotease domain (Accession No:
Chapter 5
87
sp_P0AAI3). The AAA+ protein modules are ubiquitously found in biological kingdoms and
known to be involved in a wide variety of cellular processes.
The ATPase domain is composed of the conserved Walker A, having the GXXGXGK (T/S)
box (Hanson and Whiteheart 2005) and WalkerB motifs which are essential for nucleotide
binding and hydrolysis. Apart from these two motifs this domain has another region of
homology (SRH), which carries conserved arginine residues argi i e fi gers hich
assists in oligomerization and nucleotide hydrolysis. The conserved FGV pore motif helps
in the substrate recognition and its translocation. The protease domain shows the zinc
binding HEXXH finger print and a coiled-coil leucine zipper sequence (Bieniossek et al.
2009).
Crystal structure of the C-terminal domain shows it forms a hexameric ring like
assembly (Bieniossek et al. 2006, Suno et al. 2006). The crystal structure of the ADP state
of the constructs of FtsH, spanning most of the cytosolic region, thus containing AAA and
protease domains from Thermotogamaritima, showed a hexameric assembly with the
protease region having a 6-foldsymmetry and a AAA ring having 2 –fold symmetry
(Bieniossek et al.2009). The 2.6Å resolution structure of the cytosolic region of apo-FtsH
however reveals a new arrangement where the entire ATPase region has perfect 6-fold
symmetry and the crucial pore residues outline an open circular entrance (Bieniossek et
al.2009).This conformational change in the ADP bound form compared to the apo
structure generates a model of substrate translocation by AAA_ proteases (Bieniossek et
al.2009). In addition to ATP hydrolysis, proton motive force (PMF) has been shown to
stimulate proteolytic activity of membrane-bound FtsH (Akiyama2002).
In this report we aim to draw phylogenetic relationships of the FtsH from 109 different
organisms and study their amino acid sequence diversity, length variation of conserved
region together with the structural diversity between different clusters of the
phylogenetic tree. Our results show that the FtsH proteins have mainly three domains
arising out of different mutations. The presence of the domains correlates well with the
habitat-distributions of the organisms taken into considerations. So far there were no
Chapter 5
88
previous reports that analyze the effects of the mutations on the FtsH proteins. This work
is therefore the first of its kind that shed light in the yet to be identified evolutionary
history of this family of proteins. Thus, the sequence analysis and hence the evolutionary
history of FtsH and its homologues may provide clues as to which activities are shared by
members of this family and may reveal the basis of their diverse cellular functions.
5.2.1 METHODS
5.2.1.1 SEQUENCE HOMOLOGY SEARCH AND PAIR-WISE ALIGNMENT OF
SEQUENCES
Uniprot database (UniProt Consortium 2013) which specifically provides a
comprehensive, integrated, non-redundant, well-annotated and properly reviewed set of
sequences was used to analyze the sequences of FtsH proteins. Initially amino acid
sequences of about 250 FtsH proteins were extracted. Out of these sequences, 109 FtsH
sequences were chosen for our study. During the sequence file preparation we removed
all fragmented sequences, having gene name other than FtsH to eliminate the data
redundancies and finally took one representative sequence among the multiple entries
for the same organism where they have the same amino acid sequence compositions. The
accession numbers of the proteins were tabulated in Table 5.2. Since sequence file
preparation is the most fundamental step for sequence based analysis these sequences
were used as inputs to run psi-BLAST (Altschul et al. 1990), using the default parameters,
in order to find out the homologous sequences including all possible of sequence
information. This step can be considered as a double check of our initial result
downloading the sequences.
Chapter 5
89
SL.
No.
NAME ACCESSION
NUMBER
SL. No. NAME ACCESSION
NUMBER
1. Acaryochloris marina Sp A8ZNZ4 56. Pelodictyon luteolum Sp Q3B6R3
2. Acholeplasma laidlawii Sp A9NE17 57. Protochlamydia amoebophila Sp Q6MDI5
3. Acidimicrobium
ferrooxidans
Sp C7M0M0 58. Pelobacter carbinolicus Sp Q3A579
4. Acidobacterium capsulatum Sp C1F8X6 59. Photobacterium profundum Sp Q6LUJ8
5. Acidothermus cellulolyticus Sp A0LR74 60. Parabacteroides distasonis Sp A6LD25
6. Akkermansia muciniphila Sp B2UMY1 61. Pirellula staleyi Sp D2QZ34
7. Alkaliphilus metalliredigens Sp A6TSZ1 62. Magnetococcus sp. Sp.A0L4S0
8. Anaeromyxobacter
dehalogenans
Sp B8J992 63. Propionibacterium acnes Sp D4HA34
9. Aquifex aeolicus Sp O67077 64. Porphyra purpurea Sp P51327
10. Arabidopsis thaliana Sp Q39102 65. Marinobacter aquaeolei Sp A1TZE0
11. Atopobium parvulum Sp C8W731 66. Ralstoniametallidurans Sp Q1LLA9
12. Bacillus subtilis Sp P37476 67. Pyropia yezoensis Sp Q1XDF9
13. Bartonella bacilliformis Sp A1URA3 68. Pseudomonas putida Sp A5W382
14. Bdellovibrio bacteriovorus Sp Q6MLS7 69. Medicago sativa Sp Q9BAE0
15. Borrelia burgdorferi Sp B7J0N5 70. Rickettsia bellii Sp Q1RGP0
16. Brachybacterium faecium Sp C7MC16 71. Rhodococcus erythropolis Sp C0ZPK5
17. Buchnera aphidicola Sp Q8K9G8 72. Rhodothermus marinus Sp D0MGU8
18. Burkholderia pseudomallei SpQ3JMH0 73. Salmonella typhi Sp P63344
19. Caldicellulosiruptor bescii Sp B9MPK5 74. Staphylococcus haemolyticus Sp Q4L3G8
20. Caulobacter crescentus Sp B8H444 75. Rubrobacter xylanophilus Sp Q1AV13
21. Chlamydia trachomatis Sp B0B970 76. Slackia heliotrinireducens Sp C7N1I1
22. Chloroflexus aggregans Sp B8G4Q6 77. Rothia mucilaginosa Sp D2NQQ7
23. Clostridium sp. Sp A0PXM8 78. Shigella flexneri Sp P0AAI4
24. Conexibacter woesei Sp D3F124 79. Mesoplasma florum Sp Q6F0E5
25. Corynebacterium
glutamicum
Sp Q6M2F0 80. Streptobacillus moniliformis Sp D1AXT4
26. Cyanidioschyzon merolae Sp Q9TJ83 81. Streptococcus pneumoniae Sp O69076
27. Cyanidium caldarium Sp O19922 82. Sulcia muelleri Sp D5D8E3
28. Ectocarpus siliculosus Sp D1J722 83. Synechococcus sp Sp Q2JNP0
29. Escherichia coli-K12 Sp P0AAI3 84. Syntrophus aciditrop
hicus
Sp Q2LUQ1
30. Guillardia theta Sp O78516 85. Syntrophobacter fumaroxidans Sp A0LN68
31. Haemophilus influenzae Sp P71377 86. Sulfurovum sp. Sp A6QBN8
32. Hahella chejuensis Sp Q2SF13 87. Thermotoga lettingae Sp A8F7F7
33. Haliangium ochraceum Sp D0LWB8 88. Thermobaculum terrenum Sp D1CDT8
34. Halothermothrix orenii Sp B8D065 89. Thermus thermophilus Sp Q5SI82
35. Helicobacter pylori Sp P71408 90. Nicotiana tabacum Sp O82150
36. Heterosigma akashiwo Sp B2XTF7 91. Treponema pallidum Sp O83746
37. Hydrogenobaculum sp. Sp B4U7U4 92. Trichodesmium erythraeum Sp Q10ZF7
38. Kosmotoga olearia Sp C5CES8 93. Zymomonas mobilis Sp C8WEG0
39. Lactobacillus plantarum Sp C6VKW6 94. Ureaplasma parvum Sp B1AI94
40. Lactococcus lactis Sp P46469 95. Tropheryma whipplei Sp Q83FV7
41. Leptospira borgpetersenii Sp Q04Q03 96. Vaucheria litorea Sp B7T1V0
42. Leptotrichia buccalis Sp C7N914 97. Veillonella parvula Sp D1BLD0
43. Leuconostoc mesenteroides Sp Q03Z46 98. Methylibium petroleiphilum Sp A2SHH9
44. Methylacidiphilum
infernorum
Sp B3DV46 99. Mycoplasma arthritidis Sp B3PNH3
45. Oryza sativa Sp Q5Z974 100. Myxococcus xanthus Sp Q1D491
46. Petrotoga mobilis Sp A9BFL9 101. Methylococcus capsulatus Sp Q60AK1
47. Phytoplasma mali Sp B3R057 102. Mycobacterium tuberculosis Sp A5QU8T5
48. Rhodopirellula baltica Sp Q7UUZ7 103. Natranaerobius thermophilus Sp B2A3Q4
49. Salinibacter ruber Sp D5H7Z5 104. Nostoc sp. Sp Q8YMZ8
50. Sorangium cellulosum Sp A9GRC9 105. Nitrosococcus oceani Sp Q3JEE4
51. Sphaerobacter thermophilus Sp D1C1U7 106. Neorickettsia risticii Sp C6V4R9
52. Symbiobacterium
thermophilum
Sp Q67LC0 107. Odontella sinensis Sp P49825
53. Synechocystis sp. Sp P73179 108. Oenococcus oeni Sp Q83XX3
54. Thermoanaerobacter sp. Sp B0K5A3 109. Opitutus terrae Sp B1ZMG6
55. Thermomicrobium roseum Sp B9KXV3
Table 5.2 List of the species and their accession numbers
Chapter 5
90
5.2.1.2 MULTIPLE SEQUENCE ALIGNMENT (MSA)
Multiple protein sequences were aligned using Clustal W associated with MEGA (Tamura
et al. 2007). Pairwise alignment parameters were set as: slow/accurate alignment; gap
opening penalty 10; gap extension penalty 0.10; protein weight matrix PAM. Multiple
alignment parameters were set as: gap opening penalty 10; gap extension penalty 0.2;
gap separation distance 4, delay divergent sequences 30%. The MSA, so generated,
showed the presence of mutations in the sequences [Figure 5.4].
Figure: 5.4 Multiple sequence alignmentsof the FtsH proteins. The alignment
shows the presence of mutations; Consensus pattern of the alignment. It reflects
the conserved pattern observed in the AAA domain.
Chapter 5
91
5.2.1.3 PREDICTION OF SECONDARY STRUCTURE
The secondary structures of the FtsH family of proteins were predicted from their amino
acid sequences using psi PRED (Buchan et al. 2013; Jones 1999) and GOR4 (Garnier et al.
1996). The secondary structures were classified as helix, sheet and random coil.
5.2.1.4 PROTEIN FUNCTION PREDICTION
The domain architecture and functions of different parts of the FtsH family of proteins
were predicted from the outputs of Pfam (Bateman et al. 2004). Pfam gives the results on
the basis of an MSA of closely related homologous proteins. The amino acid sequences of
the proteins were incorporated one by one into Pfam sequence search and the results
were analyzed.
5.2.1.5 HOMOLOGY MODELING AND STRUCTURAL ALIGNMENT
Homology model of eight species (Slackia heliotrinireducens, Staphylococcus
haemolyticus, Acidobacterium capsulatum, Thermus thermophilus, Cyanidioschyzon
merolae, Syntrophus aciditrophicus, Atopobium parvulum, Phytoplasma mali from group
1, 2, 3, 4, 5, 6, 7, 8and denoted as SH1, SH2, AC, TT, CM, SA, AP, PM respectively) taken
from every group was build using Swiss model work space usingA chain of 2CE7 as
template structure (Bieniossek et al. 2006) which was searched by BLAST (using PDB
database). After getting the models we checked them in Procheck (Laskowski et al 1993),
verify 3D (Lüthy et al. 1992), ERRAT (Colovos and Yeates 1993). Three models passed
these checks successfully. Remaining five were refined either by using loop refinement
soft are ModBase Pieper et al. 4 or y i i izatio step usi g “teepest Desce d
algorith a d CHA‘M Brooks et al. polar H force field i Discovery “tudio .
Chapter 5
92
suite. Finally all the refined structures were aligned with each other usi g alig
structure protocol of Discovery “tudio . suite.
5.2.1.6 CODON USAGE DETERMINATION
Codon usage patterns of the mutated amino acids in the amino acid sequences of FtsH
proteins from eight species as mentioned in the section 5.2.1.5 were determined using
default parameters of OPTIMIZER (Puigbo et al. 2007) web server.
5.2.1.7 DISTANCE MATRIX CALCULATION AND CONSTRUCTION OF
PHYLOGENETIC TREE
A distance matrix was generated using MEGA (Tamura et al. 2007) and a phylogenetic
tree was built using the Neighbour Joining (NJ) method with 1000 bootstrap values,
Poison model and keeping uniform rate.
5.2.2 RESULT & DISCUSSION
5.2.2.1 SEQUENCE AND FUNCTIONAL DIVERSITY DUE TO MUTATIONS
Pfam results suggest some resemblance with few additional functional domains other
than the FtsH N-terminal domain, AAA domain, and Peptidase M 41 major domains. 201-
277 amino acids of Vaucheria litorea shows similarity with iron-containing alcohol
dehydrogenase, 566-612 amino acids of Thermotoga lettingae with fumarase C, 111-158
of Opitutus terrae with cytochrome B561, 310-412 of Alkaliphilus metallirudigens with
purine nucleoside permease, 511-635 of Autophobium pervulum with choline kinase and
in addition to these there are other few sequences which show some correspondence
with few uncharacterized domains.
Chapter 5
93
5.2.2.2 STRUCTURAL ANALYSIS
5.2.2.2.1 SECONDARY STRUCTURE ANALYSIS
Typical pattern of secondary structure of FtsH shown in the Figure 5.5 using Geneious Pro
[Drummond et al. 2010].We need to mention here that total secondary structural analysis
reveals that secondary structure in the AAA domain is mainly conserved but in other
regions there are some variations. If we consider the analysis in details using psi PRED and
Gor4 [Table 5.3] it has been found that proportion of alpha helix ranges from
approximately 35% to 54%, beta sheet 7% to 21%, and random coil 11% to 52% of the
total stretch of protein [Figure 5.5].
Figure: 5.5 Distribution of the predicted secondary structural patterns of the FtsH
proteins. Helical regions are presented as cylinders (violet). Sheet regions are
presented with a arrowhead. The rest of the parts are loops.
Chapter 5
94
STRAIN ALPHA STRAND RANDOM
COIL
STRAIN ALPHA STRAND RANDOM
COIL
Acaryochloris
marina
43.97%
13.59%
42.44%
Nicotiana tabacum 42.58% 11.20% 46.22%
Acholeplasma
laidlawii
44.15%
14.51%
41.34%
Nitrosococcus
oceani
49.14% 13.62% 37.25%
Acidimicrobium
ferrooxidans
36.21%
17.12%
46.67%
Nostoc sp. 49.70% 11.28% 39.02%
Acidobacterium
capsulatum
45.38%
12.52%
42.10%
Odontella sinensis 38.35% 15.84% 45.81%
Acidothermus
cellulolyticus
43.24%
10.21%
46.55%
Oenococcus oeni 47.83% 12.87% 39.30%
Akkermansia
muciniphila
37.81%
17.61%
44.58%
Opitutus terrae 52.93% 11.73% 35.34%
Alkaliphilus
metalliredigens
45.25%
15.59%
39.15% Oryza sativa
42.42%
12.54%
45.04%
Anaeromyxobacter
dehalogenans
42.92%
13.74%
43.34%
Parabacteroides
distasonis
45.32% 14.77% 39.91%
Aquifex aeolicus
44.64%
15.77%
39.59%
Pelobacter
carbinolicus
46.28% 14.09% 39.63%
Table 5.3 Distribution patterns of secondary structural elements.
Chapter 5
95
Arabidopsis
thaliana
36.45%
14.53%
49.02%
Pelodictyon
luteolum
38.10% 16.29% 45.61%
Atopobium
parvulum
45.65%
10.56%
43.79%
Petrotoga mobilis
49.92%
15.47%
34.61%
Bacillus subtilis
40.19%
16.95%
42.86%
Photobacterium
profundum
45.40% 15.80% 38.79%
Bartonella
bacilliformis
48.56%
11.52%
39.92%
Phytoplasma mali
48.00%
12.50%
39.50%
Bdellovibrio
bacteriovorus
48.99% 9.61%
41.40%
Pirellula staleyi 45.14% 11.14% 43.71%
Borrelia
burgdorferi
42.88%
17.68%
39.44%
Porphyra purpurea 45.70% 11.15% 43.15%
Brachybacterium
faecium
46.31%
10.65%
43.04%
Propionibacterium
acnes
38.63% 11.58% 49.79%
Buchnera
aphidicola
40.46%
21.21%
38.34%
Protochlamydia
amoebophila
52.95% 10.59% 36.46%
Burkholderia
pseudomallei
47.15%
10.96%
41.89%
Pseudomonas putida 53.66% 8.46% 37.89%
Caldicellulosirupto
r bescii
46.92%
14.29%
38.80%
Pyropia yezoensis 42.04% 12.58% 45.38%
Caulobacter
crescentus
49.84%
12.62%
37.54%
Ralstonia
metallidurans
40.22% 18.64% 41.14%
Chlamydia
trachomatis
48.41%
11.17% 40.42% Rhodococcus
erythropolis
36.53% 10.66% 52.81%
Chapter 5
96
Chloroflexus
aggregans
50.46%
8.99%
40.55%
Rhodopirellula
baltica
43.15%
15.62%
41.22%
Clostridium sp.
43.93%
16.72%
39.35%
Rhodothermus
marinus
46.34% 13.34% 40.32%
Conexibacter
woesei
39.97%
15.77%
44.26%
Rickettsia bellii 39.03% 19.12% 41.85%
Corynebacterium
glutamicum
45.02%
7.50%
47.48%
Rothia mucilaginosa 44.58% 10.32% 45.11%
Cyanidioschyzon
merolae
46.93%
16.58%
36.48%
Rubrobacter
xylanophilus
45.78% 11.98% 42.24%
Cyanidium
caldarium
46.58%
15.64%
37.79%
Salinibacter ruber
41.40%
14.43%
44.17%
Ectocarpus
siliculosus
40.85%
15.28%
43.87%
Salmonella typhi 45.03% 16.30% 38.66%
Escherichia coli-
K12
43.94%
15.68%
40.37%
Shigella flexneri 43.94% 15.68% 40.37%
Guillardia theta
43.26%
14.74%
42.00%
Slackia
heliotrinireducens
43.30% 10.73% 45.98%
Haemophilus
influenzae
35.59%
22.20%
42.20%
Sorangium
cellulosum
60.10%
7.11%
32.79%
Hahella chejuensis
45.40%
10.66%
43.94%
Sphaerobacter
thermophilus
43.49%
13.94%
42.57%
Chapter 5
97
Haliangium
ochraceum
47.51%
12.32%
40.18%
Staphylococcus
haemolyticus
38.93% 13.62% 47.46%
Halothermothrix
orenii
46.83%
10.63%
42.54%
Streptobacillus
moniliformis
52.71% 10.25% 37.04%
Helicobacter pylori
47.15%
11.39%
41.46%
Streptococcus
pneumoniae
44.33% 15.64% 40.03%
Heterosigma
akashiwo
42.53%
12.52%
44.95%
Sulcia muelleri 43.78% 14.38% 41.84%
Hydrogenobaculu
m sp.
41.82%
16.67%
41.51%
Sulfurovum sp 50.67% 12.82% 36.51%
Kosmotoga olearia
39.84%
15.35%
44.81%
Symbiobacterium
thermophilum
61.62%
7.41%
30.98%
Lactobacillus
plantarum
44.30%
14.50%
41.21%
Synechococcus sp. 50.63% 10.97% 38.40%
Lactococcus lactis
35.11%
21.15%
43.74%
Synechocystis sp.
46.17%
11.58%
42.26%
Leptospira
borgpetersenii
41.26%
16.41%
42.33%
Syntrophobacter
fumaroxidans
44.51% 14.06% 41.42%
Leptotrichia
buccalis
46.09%
9.24%
44.66%
Syntrophus
aciditrophicus
47.83% 9.78% 42.39%
Leuconostoc
mesenteroides
48.57%
12.00%
39.43%
Thermoanaerobacter
sp.
42.06%
14.08%
43.86%
Magnetococcus sp. 45.47% 13.67% 40.86% Thermobaculum
terrenum
44.43% 13.31% 42.26%
Chapter 5
98
Marinobacter
aquaeolei
51.03% 8.06% 40.92%
Thermomicrobium
roseum
43.40%
13.96%
42.64%
Medicago sativa 40.79% 11.33% 47.88%
Thermotoga
lettingae
46.33% 15.97% 37.70%
Mesoplasma
florum
50.15% 15.23% 34.62%
Thermus
thermophilus
46.47% 12.66% 40.87%
Methylacidiphilum
infernorum
45.75%
13.36%
40.88%
Treponema pallidum 53.69% 11.82% 34.48%
Methylibium
petroleiphilum
51.70% 12.32% 35.98% Trichodesmium
erythraeum
46.63% 13.04% 40.33%
Methylococcus
capsulatus
51.96% 12.24% 35.79%
Tropheryma
whipplei
46.70% 12.76% 40.54%
Mycobacterium
tuberculosis
37.63% 13.95% 48.42% Ureaplasma parvum 51.73% 11.10% 37.17%
Mycoplasma
arthritidis
41.67% 15.19% 43.15%
Vaucheria litorea 40.22% 14.44% 45.34%
Myxococcus
xanthus
38.71% 17.87% 43.42%
Veillonella parvula 44.08% 13.71% 42.21%
Natranaerobius
thermophilus
44.44% 12.55% 43.00% Zymomonas mobilis 41.24% 13.75% 45.02%
Neorickettsia
risticii
45.28% 15.88% 38.84%
Chapter 5
99
5.2.2.2.2 TERTIARY STRUCTURE ANALYSIS
Results of structural alignments (in RMSD) [Table 5.4] of these proteins revealed that
though there are differences in some aspects but their overall conformation remains
almost same which in-turn implies their functional conservation.
SH1 SH2 AC TT CM SA AP PM
SH1 - 0.428 0.408 0.426 0.486 0.428 0.408 0.327
SH2 0.428 - 0.479 0.403 0.514 0.459 0.322 0.365
AC 0.408 0.479 - 0.421 0.298 0.434 0.428 0.271
TT 0.426 0.403 0.421 - 0.466 0.403 0.480 0.396
CM 0.486 0.514 0.298 0.466 - 0.509 0.488 0.443
SA 0.428 0.459 0.434 0.403 0.509 - 0.445 0.286
AP 0.408 0.322 0.428 0.480 0.488 0.445 - 0.405
PM 0.327 0.365 0.271 0.396 0.443 0.286 0.405 -
Table 5.4 Deviations in three dimensional structures measured in RMSD
Chapter 5
100
5.2.2.3 CODON USAGE ANALYSIS
A high codon usage value indicated a higher rate of appearance of that particular amino
acid obtained from the codon. This would give an insight into why that particular amino
acid is favored at that position. Codon usage analysis revealed that it varies widely among
different species (each case numerical values indicated the number of repetitions). The
range of variations of the codons representing the amino acids is presented in the
following section. The codon usage value was presented for only the organism with a high
codon usage value for a particular amino acid [Table 5.5].
Figure 5.6 Distribution of %GC content in the different organisms. for SH1,
SH2, AC, TT, CM, SA, AP, PM denoted as Slackia heliotrinireducens,
Staphylococcus haemolyticus, Acidobacterium capsulatum, Thermus
thermophilus, Cyanidioschyzon merolae, Syntrophus aciditrophicus,
Atopobium parvulum, Phytoplasma mali from group 1, 2, 3, 4, 5, 6, 7, 8
respectively.
Chapter 5
101
for TTA (L) Thermus thermophilus (65), Atopobium parvulum (54); for GCT (A)
Phytoplasma mali (42) , Syntrophus aciditrophicus (83); for GGT (G) Phytoplasma mali
(36), Syntrophus aciditrophicus (75); for GAA (E) Phytoplasma mali (42), Slackia
heliotrinireducens (76); for ATT (I) Thermus thermophilus (22), Phytoplasma mali (54); for
GAT (D) Cyanidioschyzon merolae (29), Staphylococcus haemolyticus (58); for TTC (F) CAC
(H) and ATG (M) there are no major change in the overall distribution observed; for AAA
(K) Thermus thermophilus (27), Phytoplasma mali (61); for AAC (N) Thermus
thermophilus(11), Atopobium parvulum (33); for CCA (P) Phytoplasma mali (15) Slackia
heliotrinireducens (47); for CAA (Q) Thermus thermophilus (13), Slackia heliotrinireducens
(37); for CGT (R) Thermus thermophilus (58) Phytoplasma mali (24); for TCT (S) Thermus
thermophilus (19) have lower Atopobium parvulum and Slackia heliotrinireducens (41)
have similar higher values; for ACT (T) Slackia heliotrinireducens (50) Staphylococcus
haemolyticus (25); for GTT (V) Phytoplasma mali (34) Acidobacterium capsulatum (55).
TAT (Y) TGT (C) and TGG (W) appeared at lowest frequency in the usage pattern but TGG
(W) is entirely absent for Atopobium parvulum , Staphylococcus haemolyticus and TGT (C)
for Syntrophus aciditrophicus.
Variation in this respect reflected in their positions in the phylogenetic tree and total %GC
content. From Fig: 5.6 it was evident that though major strains had very much the equal
proportion of GC population but for the case of Thermus thermophilus as expected for
the thermo-tolerant strains the %GC content was much higher and on the other hand
Phytoplasma mali a plant pathogen had lowest %GC values.
Chapter 5
102
GCT TGT GAT GAA TTC GGT CAC ATT AAA TTA
SH1 77 3 43 76 26 57 12 35 37 59
SH2 43 3 58 63 29 67 14 48 54 55
AC 58 4 37 46 27 56 16 33 35 57
TT 78 1 33 58 26 55 11 22 27 65
CM 63 3 29 41 20 46 7 42 29 61
SA 83 - 45 56 28 75 11 43 34 64
AP 57 4 36 53 27 56 10 44 38 54
PM 42 13 32 42 33 36 13 54 61 62
ATG AAC CCA CAA CGT TCT ACT GTT TGG TAT
SH1 29 36 47 37 42 41 50 58 2 16
SH2 19 38 26 36 45 40 25 47 - 17
AC 21 21 27 29 42 33 29 55 5 8
TT 14 11 34 13 58 19 30 54 5 10
Table 5.5 Codon usage patterns of the mutated amino acids in the
different organisms
Chapter 5
103
CM 25 25 21 26 36 31 29 52 12 5
SA 17 14 41 27 55 35 35 53 5 14
AP 19 33 21 26 36 31 38 41 - 11
PM 16 25 15 23 24 37 34 34 1 13
5.2.2.4 PHYLOGENETIC RELATIONSHIP
In the sequence set there were sequences of FtsH proteins of various species. For
establishing the mutual relationship between these species we constructed a phyogenetic
tree (shown in figure: 5.7) using Neighbour Joining method. To analyze the tree we first
divided the tree into eight major groups and group 6 is then subdivided into two
subgroups 6A, 6B. Our search to find out the reason behind the diversified pattern of
these species yielded some surprising facts. Extremophiles are those which can live in
physically and geochemically extreme conditions, were mainly present at group 4. Some
thermophilic strains that can grow in hot spring are also located at group 6 such as
Thermus thermophilus, Symbiobacterium thermophilum, Thermobaculum terrenum,
Marinobactor aquaeolei etc. Pathogenic i.e., disease causing harmful strains were present
in group 3, last portion of 4 and 7 i.e, Shigella flexneri, Salmonella typhi, Bertonella
bacilliformis, Chlamydia trachomatis etc. Marine microbes.
Chapter 5
104
5.2.3 CONCLUSIONS:
Cellular proteases maintain the cell homeostasis. Among the proteases, FtsH is a
membrane bound protease and is basically involved in the quality controlling of
membrane bound proteins. FtsH proteins are widely distributed among different
organisms with several mutations. In the present work we have tried to analyze the
effects of those mutations and to correlate the mutations with the habitat of the
organisms. We also analyzed the changes in the patterns of secondary structures in the
FtsH proteins due to mutations. Finally we constructed a phylogenetic tree to detect the
effects of the mutations in the evolutionary pathway of the FtsH proteins. This report is
the first of its kind. So our study would therefore be beneficial to determine the probable
modes of activities of the FtsH proteins.
Figure 5.7 Phylogenetic trees of the various species having the FtsH
domains. The tree can be divided into various categories.
Top Related