Bioinformatics Advance Access originally published online on June 28, 2007
Bioinformatics 2007 23(17):2231-2238; doi:10.1093/bioinformatics/btm345

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Different packing of external residues can explain differences in the thermostability of proteins from thermophilic and mesophilic organisms

Anna V. Glyakina 1, Sergiy O. Garbuzynskiy 2, Michail Yu. Lobanov 2 and Oxana V. Galzitskaya 2,*

1Institute of Mathematical Problems of Biology and 2Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: Understanding the basis of protein stability in thermophilic organisms raises a general question: what structural properties of proteins are responsible for the higher thermostability of proteins from thermophilic organisms compared to proteins from mesophilic organisms?

Results: A unique database of 373 structurally well-aligned protein pairs from thermophilic and mesophilic organisms is constructed. Comparison of proteins from thermophilic and mesophilic organisms has shown that the external, water-accessible residues of the first group are more closely packed than those of the second. Packing of interior parts of proteins (residues inaccessible to water molecules) is the same in both cases. The analysis of amino acid composition of external residues of proteins from thermophilic organisms revealed an increased fraction of such amino acids as Lys, Arg and Glu, and a decreased fraction of Ala, Asp, Asn, Gln, Thr, Ser and His. Our theoretical investigation of folding/unfolding behavior confirms the experimental observations that the interactions that differ in thermophilic and mesophilic proteins form only after the passing of the transition state during folding. Thus, different packing of external residues can explain differences in thermostability of proteins from thermophilic and mesophilic organisms.

Availability: The database of 373 structurally well-aligned protein pairs is available at http://phys.protres.ru/resources/termo_meso_base.html

Contact: ogalzit{at}vega.protres.ru

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
The importance of the various factors that contribute to a protein's thermostability is the subject of intense study. Hydrogen bonding and the hydrophobic effect are the major stabilizing forces, but their relative contributions are still debated (Dill, 1990; Honig, 1999). In relation to this uncertainty, a question of great interest is how proteins from thermophilic organisms retain their structure and function at high temperature.

The study of protein stability can help explore the sequence–structure–stability relationship (Kumar et al., 2001; Olofsson et al., 2007; Perl et al., 1998; Razvi and Scholtz, 2006; Schuler et al., 2002). It has been shown that no correlation exists between a protein's melting temperature and parameters such as the change in heat capacity, the change in accessible surface area upon folding and the number of residues in the protein (Kumar et al., 2001). The experimental data supporting this conclusion were obtained from studies of nine proteins from thermophiles and 10 proteins from mesophiles, all of which show reversible two-state folding/unfolding kinetics. However, the change in heat capacity correlates both with the difference in solvent-accessible surface area between the unfolded and native states and with the number of residues in the protein (Kumar et al., 2001). The authors concluded that higher thermostability is achieved by specific interactions, particularly electrostatic ones, which is supported by an increased enthalpy change at the melting temperature (Kumar et al., 2001).

In a recent review, thermodynamic data for 26 homologous proteins obtained from thermophiles and mesophiles are compared (Razvi and Scholtz, 2006). The authors attempted to classify the proteins according to three different approaches (Nojima et al., 1977) to modulating the protein stability curve in order to achieve higher thermostability. These three approaches are: (1) increasing the value of ΔH without compensating for changes in ΔS, which results in a similar stability curve but with higher ΔG values at all temperatures; (2) reducing ΔCp, which results in a broadened stability curve; and (3) lowering the change in entropy for the folding transition, which increases the temperature of maximum stability. All three ways of achieving higher thermostability have been observed in nature, sometimes independently and in other cases in combination. Moreover, most proteins use different combinations of these three general approaches, so a simple classification of proteins into three separate groups was not possible (Razvi and Scholtz, 2006).

It has been shown that the folding rates of cold shock proteins from Thermotoga maritima (a thermophilic organism) and Bacillus subtilis (a mesophilic organism) are similar, while the unfolding rate of the thermophilic protein is two orders of magnitude lower than that of its mesophilic homologue (Perl et al., 1998; Schuler et al., 2002). Thus, the difference in stability arises from the difference in the unfolding rate constants of thermophilic and mesophilic proteins. For thermophilic proteins, the activation barrier of unfolding is higher (Schuler et al., 2002). Stabilization of the native state by the mutation of amino acid residues situated on the protein surface is one possible explanation for this phenomenon. These mutations lead to a large number of enthalpic interactions that form only after the free-energy barrier has been overcome in the course of folding (Schuler et al., 2002).

Recently, a similar result has been demonstrated for the folding of S6 ribosomal proteins from the thermophilic bacterium Thermus thermophilus and from the hyperthermophilic bacterium Aquifex aeolicus (Olofsson et al., 2007). The folding rate constants for these proteins are identical, but the unfolding rate of the protein from the hyperthermophilic organism is an order of magnitude lower than that of the protein from the thermophilic organism (Olofsson et al., 2007).

Along with experimental works, theoretical studies of sequences and 3D structures of proteins provide another way to find correlations between structural characteristics and stabilizing forces (Berezovsky and Shakhnovich, 2005; Berezovsky et al., 2005; Fukuchi and Nishikawa, 2001; Liang et al., 2005; Robinson-Rechavi et al., 2006; Zeldovich et al., 2007).

Two physical mechanisms for the increase of protein thermostability have been suggested (Berezovsky and Shakhnovich, 2005). One of these mechanisms relates to structural factors (homologous thermophilic and mesophilic proteins have different structures, and the thermophilic proteins are more compact), and the other involves essential modifications of the amino acid sequences of proteins (homologous thermophilic and mesophilic proteins have similar structures but different sequences). It was shown that the mechanism used to increase stability depends on the evolutionary history of an organism (Berezovsky and Shakhnovich, 2005). The proteins from organisms that originated in an extremely hot environment are more compact, and their increased stability is provided by structural factors. In contrast, organisms that evolved as mesophiles but later recolonized a hot environment have proteins that were thermostabilized by the modification of amino acid sequences (Berezovsky and Shakhnovich, 2005).

Since amino acid composition plays an important role in thermostabilization, the sequences of thermophilic and mesophilic proteins have been studied (Berezovsky et al., 2005; Fukuchi and Nishikawa, 2001; Zeldovich et al., 2007). In one of these studies, the amino acid composition of the surface, the interior and the entire chain of 279 proteins from thermophilic and mesophilic bacteria with known spatial structures was analyzed, which demonstrated that polar residues are scarce and charged residues are abundant in thermophilic proteins (Fukuchi and Nishikawa, 2001). This difference is more apparent in the surface composition than in that of the interior (Fukuchi and Nishikawa, 2001). Different sets of scarce and abundant amino acid types have been obtained for different databases (Kumar et al., 2000; Szilagyi and Zavodszky, 2000).

A set of amino acid residues whose total fractions in the proteomes are correlated with optimal growth temperatures has been recently found (Ile, Val, Tyr, Trp, Arg, Glu and Leu) (Zeldovich et al., 2007).

An interesting result was obtained in a theoretical investigation of the mesophilic and thermophilic hydrolase H proteins from Escherichia coli (mesophilic organism) and T. thermophilus (thermophilic organism) (Berezovsky et al., 2005). Although Lys and Arg are chemically similar amino acid residues, it was shown that Lys has a much greater number of accessible rotamers in the folded state than Arg. To demonstrate the stabilizing role of lysine residues, the authors replaced Arg with Lys in silico and analyzed the unfolding simulations. These simulations showed that the modified structure was more stable. Consequently, Lys (in contrast to Arg) stabilizes the native state of the protein, predominantly entropically (Berezovsky et al., 2005).

Analysis of a database consisting of 94 homologous pairs of protein structures from thermophilic and mesophilic organisms demonstrates that thermophilic proteins are on average shorter than mesophilic proteins, and that thermophilic proteins contain a larger number of salt bridges per residue (0.0401 versus 0.0322, p ≈ 10⁻⁴), a larger number of hydrogen bonds per residue (1.42 versus 1.40, p ≈ 10⁻²), a smaller fraction of amino acid residues that are not involved in regular secondary structure (0.370 versus 0.382, p ≈ 10⁻²), and a smaller ratio of the accessible surface area to the surface of a sphere of the same volume as the protein (2.38 versus 2.43, p ≈ 10⁻⁴) (Robinson-Rechavi et al., 2006). (p is the probability that the two distributions have the same average values, as calculated by Student's paired t-test. If this probability approaches zero, then these two average values are different, and if it approaches unity, they are identical.)

Although it has been shown that π-cationic interactions [i.e. interactions between an aromatic side chain (Phe, Tyr or Trp) and a cationic side chain (Lys or Arg) (Gallivan and Dougherty, 1999)] play an important role in increasing the thermostability of proteins (Chakravarty and Varadarajan, 2002), the difference in this parameter between thermophiles and mesophiles is not statistically reliable (0.00984 for thermophiles versus 0.00749 for mesophiles, p ≈ 10⁻²) (Robinson-Rechavi et al., 2006). Nor is there a high correlation between the number of salt bridges and the number of π-cationic interactions (correlation coefficient: 0.21). The same authors also demonstrated that π-cationic interactions play a smaller role than salt bridges in the stabilization of thermophilic proteins. In addition, there is no correlation between changes in the properties that describe compactness and changes in electrostatic interactions (Robinson-Rechavi et al., 2006).

Differences in residue packing between inner and outer residues of 20 pairs of thermophilic and mesophilic proteins were explored by Pack and Yoo (2005). They found that exposed residues show no distinct difference in residue packing (we discuss this observation in the Results section).

Without a doubt, the 3D structure of any protein is determined by a delicate balance of different interactions. Hydrogen bonds, salt bridges and the hydrophobic effect, as well as the binding of multivalent ions and a large number of prolines, play roles in the folding of a protein and the establishment of its final structure (Barlow and Thornton, 1983; Kumar et al., 2000; Szilagyi and Zavodszky, 2000). However, on the basis of the known facts, it is difficult to say which type of interaction is dominant in maintaining the native structure of proteins from thermophilic organisms. Although many works are devoted to the search for differences between thermophilic and mesophilic proteins, this question remains unresolved. Moreover, in some of these works the authors compare only one pair of proteins (one thermophilic protein and one mesophilic protein), or pairs of proteins without any definite criterion for homology, or all proteins (both homologous and non-homologous) in the proteomes. In this work, we have created a unique database of proteins in order to look for differences between structurally well-aligned proteins from thermophilic and mesophilic organisms.


    2 METHODS
2.1 Creation of the database
To search for structural differences between thermophilic and mesophilic proteins, we constructed a database of pairs of homologous proteins (one protein in each pair was from a thermophilic organism and the other was from a mesophilic organism). This database was created in the following way. In September 2005, all available 3D structures of proteins were taken from the Protein Data Bank (PDB) (Berman et al., 2000). For each protein chain, the source organism of the protein was determined (its name is usually given in the ORGANISM_SCIENTIFIC record corresponding to this chain in the PDB file). From literature data, we determined which organisms are thermophilic. Pairwise local alignments of all thermophilic chains with each other were made with the program BLAST (Basic Local Alignment Search Tool) (Altschul et al., 1990) in order to find homologous sequences. The homologous sequences of thermophiles (BLAST E-value < 10⁻⁵) were joined into clusters. Then, the sequences of mesophiles that were homologous to at least one thermophilic sequence in a cluster were added to the corresponding clusters. Finally, all proteins in a cluster were structurally aligned by the program AligProf (M.Y. Lobanov, unpublished data), and the MaxSub value, which measures the quality of the 3D alignment of structures (Siew et al., 2000), was obtained for each pair of proteins. The MaxSub value was calculated as:


Formula 1

(1)
where N1 and N2 are the numbers of residues in the two proteins of the pair. The summation was made over all pairs of amino acid residues for which the distance di between Cα atoms (after superposition) was <3.5 Å. If the distance between Cα atoms was more than 3.5 Å, the residues were considered not to be aligned. Aligned fragments shorter than four residues were also ignored. The MaxSub value lies within the limits 0% ≤ MaxSub ≤ 100%; MaxSub = 100% means that the examined structures coincide completely. The thermophile–mesophile pair with the largest MaxSub value was selected from each cluster.
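A MaxSub-style score of this kind can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the AligProf implementation: it assumes the per-pair weight 1/(1 + (d/3.5)²) used in the MaxSub score of Siew et al. (2000) and a normalization by the shorter chain, and it omits the four-residue fragment filter. The function name and arguments are hypothetical.

```python
def maxsub_like_score(distances, n1, n2, d0=3.5):
    """Return a MaxSub-style alignment score in percent.

    distances: C-alpha distances (in angstroms) of superposed residue pairs.
    n1, n2: numbers of residues in the two proteins.
    Assumptions (not taken from the source): each aligned pair contributes
    1 / (1 + (d / d0)**2), and the sum is divided by min(n1, n2).
    """
    # Pairs at d0 or farther apart are treated as not aligned.
    aligned = [d for d in distances if d < d0]
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned)
    return 100.0 * total / min(n1, n2)
```

With these assumptions, two identical structures of equal length score 100%, and a pair with no residues closer than 3.5 Å scores 0%.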

To these best aligned pairs, the following criteria were applied: (1) multidomain proteins were divided into domains; (2) the length of a domain was not more than 400 amino acid residues; (3) if the N- or C-terminal region of one protein in a pair was longer than that of the other, the excess was cut; (4) the difference in length between the proteins in any pair was not more than 10% of the length of the shorter protein; (5) the number of residues without 3D coordinates did not exceed 10% of the total number of residues in the protein; (6) split domains (i.e. domains consisting of two or more separate regions of the chain) were excluded; and (7) the MaxSub value for each thermophile–mesophile pair was more than 70%. As a result, 373 pairs of structurally well-aligned proteins (one of the pairs is presented in Fig. 1) were selected.
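The numerical criteria (2), (4), (5), (6) and (7) amount to a simple filter over candidate pairs. The following sketch shows one way to express them; the DomainPair record and its field names are hypothetical, and domain splitting and terminus trimming (criteria 1 and 3) are assumed to have been done beforehand.

```python
from dataclasses import dataclass


@dataclass
class DomainPair:
    """Hypothetical record for one thermophile-mesophile domain pair."""
    len_thermo: int          # residues in the thermophilic domain
    len_meso: int            # residues in the mesophilic domain
    missing_thermo: int      # residues without 3D coordinates
    missing_meso: int
    maxsub: float            # MaxSub value in percent
    split_domain: bool = False


def passes_filters(pair, max_len=400, max_len_diff=0.10,
                   max_missing=0.10, min_maxsub=70.0):
    """Apply criteria (2), (4), (5), (6) and (7) to one pair."""
    shortest = min(pair.len_thermo, pair.len_meso)
    if max(pair.len_thermo, pair.len_meso) > max_len:
        return False                                   # criterion (2)
    if abs(pair.len_thermo - pair.len_meso) > max_len_diff * shortest:
        return False                                   # criterion (4)
    if (pair.missing_thermo > max_missing * pair.len_thermo
            or pair.missing_meso > max_missing * pair.len_meso):
        return False                                   # criterion (5)
    if pair.split_domain:
        return False                                   # criterion (6)
    return pair.maxsub > min_maxsub                    # criterion (7)
```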


Fig. 1. Structural alignment (performed with the program AligProf) of a pair of proteins from thermophilic and mesophilic organisms: 1 (gray curve), protein GroES from the thermophilic organism T. thermophilus (PDB entry 1we3; chain O; residues 6–100); 2 (black curve), protein GroES from the mesophilic organism E. coli (PDB entry 1pcq; chain O; residues 1–96).

 
2.2 Calculation of structural parameters
For each amino acid residue in the database, the type of secondary structure, number of hydrogen bonds, and surface area accessible to the solvent were obtained with the help of the program DSSP (Definition of Secondary Structure of Proteins) (Kabsch and Sander, 1983).

A residue was called internal if the area of its surface accessible to the solvent was equal to zero, and external if its accessible surface area was more than 25% of the maximal accessible surface area observed in PDB for this type of residue (see Table 1). The other amino acid residues were called intermediate residues (Fukuchi and Nishikawa, 2001).
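This three-way classification is straightforward to express in code. In the minimal sketch below, the function name is hypothetical and the maximal accessible surface area for the residue type is passed in by the caller, since the values of Table 1 are not reproduced in this text.

```python
def classify_residue(asa, max_asa, external_fraction=0.25):
    """Classify a residue by its solvent-accessible surface area (ASA).

    asa: ASA of the residue in the structure (e.g. from DSSP).
    max_asa: maximal ASA observed for this residue type (Table 1);
             supplied by the caller, no values are assumed here.
    """
    if asa == 0:
        return "internal"
    if asa > external_fraction * max_asa:
        return "external"
    return "intermediate"
```

Note that a residue with an ASA of exactly 25% of the maximum falls into the intermediate class, following the strict "more than 25%" wording above.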


Table 1. Maximum accessible surface area (ASA) of 20 types of amino acid residues, as observed in PDB*

 
The calculation of the number of atom–atom contacts per residue in proteins was carried out in the following way: two atoms were considered in contact with each other if their centers were at a distance of <6 Å (or 8 Å). The atom–atom contacts between adjacent residues as well as within one residue were not taken into account. Then, the total number of atom–atom contacts in a protein was divided by the number of residues in the protein.
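The counting rule can be illustrated with a brute-force sketch. The input format (a list of residue index and coordinate tuples) is hypothetical, residue indices are assumed to be consecutive along one chain, and each contacting atom pair is counted once; the source does not state whether pairs were counted once or twice, so that choice is an assumption.

```python
import math


def contacts_per_residue(atoms, n_residues, cutoff=6.0):
    """Average number of atom-atom contacts per residue.

    atoms: list of (residue_index, (x, y, z)) tuples, one per atom.
    Contacts within one residue and between adjacent residues are skipped.
    Assumption: every contacting atom pair is counted once.
    """
    contacts = 0
    for i in range(len(atoms)):
        res_i, pos_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            res_j, pos_j = atoms[j]
            if abs(res_i - res_j) <= 1:
                continue  # same or sequence-adjacent residue
            if math.dist(pos_i, pos_j) < cutoff:
                contacts += 1
    return contacts / n_residues
```

The double loop is quadratic in the number of atoms, which is acceptable for single domains of up to 400 residues; a cell list or k-d tree would be the usual choice for larger systems.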

The frequency of occurrence of each type of amino acid residue among interior (exterior) residues was determined as the ratio of the number of interior (exterior) residues of that type to the total number of interior (exterior) residues.

The energy of hydrogen bonds was calculated with the program DSSP. According to Kabsch and Sander (1983), a hydrogen bond exists if its energy is lower than –0.5 kcal/mol. A hydrogen bond was considered exterior (interior) if its donor and acceptor both belong to exterior (interior) residues. If the donor belongs to an exterior (interior) residue and the acceptor to an interior (exterior) residue, the hydrogen bond was considered neither interior nor exterior.
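The assignment rule for hydrogen bonds can be written as a small function. This is a sketch with hypothetical names; bonds that involve an intermediate residue are assumed to fall into the "neither" group, which the text does not state explicitly.

```python
def classify_hbond(donor_class, acceptor_class, energy, threshold=-0.5):
    """Classify one hydrogen bond by the location of its two residues.

    donor_class, acceptor_class: "interior", "exterior" or "intermediate".
    energy: DSSP hydrogen-bond energy in kcal/mol.
    Returns None when the energy does not satisfy the existence criterion.
    Assumption: any bond touching an intermediate residue is "neither".
    """
    if energy >= threshold:
        return None  # not a hydrogen bond by the Kabsch-Sander criterion
    if donor_class == acceptor_class and donor_class in ("interior", "exterior"):
        return donor_class
    return "neither"
```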

We consider a salt bridge to exist if the distance between oppositely charged side groups does not exceed 4 Å (Barlow and Thornton, 1983; Kumar et al., 2000). We also calculated salt-bridge contacts for 3 Å and the results were the same (data not shown).
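A distance-based salt-bridge count can be sketched as follows. Which atoms represent the charged side groups is not specified above; a common choice, assumed here, is the side-chain nitrogens of Lys and Arg and the carboxyl oxygens of Asp and Glu. The function name and input format are hypothetical.

```python
import math


def count_salt_bridges(positive_atoms, negative_atoms, cutoff=4.0):
    """Count residue pairs joined by a salt bridge.

    positive_atoms, negative_atoms: lists of (residue_id, (x, y, z)) for
    the charged side-chain atoms (the atom selection is the caller's
    assumption, e.g. Lys NZ, Arg NH1/NH2, Asp OD1/OD2, Glu OE1/OE2).
    A residue pair is counted once, however many of its atoms are close.
    """
    bridged = set()
    for res_p, pos_p in positive_atoms:
        for res_n, pos_n in negative_atoms:
            if math.dist(pos_p, pos_n) <= cutoff:
                bridged.add((res_p, res_n))
    return len(bridged)
```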

Numerical values of all the structural features considered in this work were calculated for all proteins in the database and compared (thermophiles versus mesophiles) by Student's pairwise t-test.
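For reference, the paired t statistic used in such a comparison can be computed with the standard library alone. The sketch below returns the statistic and the degrees of freedom; converting them to a P-value requires the Student t distribution, which is left to a statistics package. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(thermo, meso):
    """Student's paired t statistic for matched samples.

    thermo[i] and meso[i] are the values of one structural feature
    for the two proteins of pair i. Returns (t, degrees_of_freedom).
    """
    if len(thermo) != len(meso):
        raise ValueError("samples must be paired")
    diffs = [t - m for t, m in zip(thermo, meso)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1
```

Because the test works on within-pair differences, variation between protein families cancels out, which is the reason for comparing homologous pairs rather than two unmatched sets.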

2.3 Analysis of protein folding
To investigate protein folding and unfolding behavior, we used a previously described algorithm for the analysis of a network of protein folding/unfolding pathways (Galzitskaya and Finkelstein, 1999; Galzitskaya et al., 2005; Garbuzynskiy et al., 2004, 2005).

The method is based on investigation of pathways of stepwise reversible unfolding of a known 3D structure of the native state (taken from PDB) of the protein. Each step on each pathway is reversible and consists of a removal of one ‘chain link’ (a chain fragment consisting of a few amino acid residues) from the 3D structure. In this work, the size of the chain link was set to five residues (except for the C-terminal link which was smaller if the given protein size was non-divisible by five), which is the optimal value of this parameter (Garbuzynskiy et al., 2004). The removed links are assumed to form a coil: they lose all their native interactions but gain the entropy of a coil. In addition, if an unfolded loop with both ends fixed by the remaining structured part arises in the structure, some entropy is spent on fixation of these ends.

Considering all possible pathways of protein reversible unfolding, a network of protein folding/unfolding pathways is obtained. For each of the obtained structures, the free energy is calculated as follows:


Formula 2

(2)
where n_I is the number of native contacts (these are atom–atom contacts at a distance of 6 Å) in the folded part of structure I, η_I is the number of residues in the unfolded part of the structure, ε is the energy of one contact (all contacts are assumed to be equal), σ is the difference in entropy between the coil and the native state of a residue [we take σ = 2.3R (Privalov, 1979), where R is the gas constant], T is the temperature and Σ_{loops∈I} S_loop is the entropy spent on fixation of the ends of all closed unfolded loops existing in structure I:


Formula 3

(3)
where r_kl is the distance between the Cα atoms of residues k and l, a = 3.8 Å is the distance between neighboring Cα atoms in the chain, and A is the persistence length of a polypeptide (we take A = 20 Å) (Flory, 1969).

We model protein folding and unfolding processes at the mid-transition point (i.e. at the melting temperature) of the given protein. At the point of mid-transition, only two states are observed (the native state and the totally unfolded state) while the other possible structures (partially folded and misfolded) are destabilized; thus, the folding process is in its simplest form (Finkelstein and Badretdinov, 1997). The free energies of the native state and the totally unfolded state are equal, i.e. the average contact energy ε is


Formula 4

(4)
where Tm is the melting temperature of the protein, n0 is the number of contacts in the native structure and N is the total number of the protein chain residues.
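The mid-transition condition can be checked numerically. The sketch below assumes that the free energy of Equation (2) has the form F(I) = ε·n_I − T·(η_I·σ + Σ S_loop), which is inferred from the symbol definitions given above and is not a transcription of the original formula. Under that assumption, equating the free energy of the native state (n_0 contacts, no unfolded residues) with that of the fully unfolded state (no contacts, N unfolded residues) gives ε = −T_m·N·σ/n_0. All function names are hypothetical.

```python
R = 1.987e-3       # gas constant in kcal/(mol K)
SIGMA = 2.3 * R    # entropy difference per residue, sigma = 2.3R


def free_energy(n_contacts, n_unfolded, eps, temperature, loop_entropy=0.0):
    """Free energy of a partially unfolded structure (assumed form).

    loop_entropy is the summed entropy of closed unfolded loops; it is
    zero or negative because fixing loop ends costs entropy.
    """
    return eps * n_contacts - temperature * (n_unfolded * SIGMA + loop_entropy)


def midtransition_contact_energy(n_native_contacts, n_residues, t_melt):
    """Contact energy at which native and unfolded free energies are equal."""
    return -t_melt * n_residues * SIGMA / n_native_contacts
```

For a hypothetical protein of 100 residues with 1000 native contacts and a melting temperature of 350 K, the resulting ε is negative (contacts are stabilizing) and the two end states have equal free energy by construction.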

Thus, a network of protein folding/unfolding pathways on a free-energy landscape is obtained. Further, on each of the pathways, we search (using a dynamic programming method) (Aho et al., 1976; Finkelstein and Roytberg, 1993) for a free-energy maximum (i.e. the transition state of this pathway; the part of the molecule which is structured in the transition state is a folding nucleus). Then, the free energies of all pathways are compared and the pathway with the minimum free energy of the transition state Fmin is revealed (this is the optimal pathway since it has the lowest free-energy barrier).

Considering all possible pathways, we thus obtain the ensemble of transition states (each of which possesses its own free-energy) and the ensemble of folding nuclei.

The effective height of the free energy barrier was calculated as


Formula 5

(5)
where F(I#) is the free energy of a single transition state I#, and the summation is over all transition states.

The average number of residues in the folding nucleus ensemble was also calculated, with each transition state weighted according to its free energy. The Boltzmann probability of a transition state I# is


Formula 6

(6)
The effective size of the folding nucleus for a protein was calculated as follows:


Formula 7

(7)
where L(I#) is the number of folded residues in a transition state I#, and P(I#) is the Boltzmann probability of I# in the transition state ensemble.
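The ensemble averages described in this subsection can be sketched as follows. The code assumes the standard Boltzmann forms: the probability of a transition state is exp(−F/RT) normalized over the ensemble, the effective barrier is −RT·ln Σ exp(−F/RT), and the effective nucleus size is the probability-weighted mean number of folded residues. These forms are assumptions consistent with the text rather than transcriptions of Equations (5) to (7), and the function name is hypothetical.

```python
import math


def transition_state_ensemble(free_energies, nucleus_sizes, rt):
    """Boltzmann statistics over an ensemble of transition states.

    free_energies: free energy of each transition state.
    nucleus_sizes: number of folded residues in each transition state.
    rt: the product R*T, in the same energy units as free_energies.
    Returns (effective_barrier, probabilities, effective_nucleus_size).
    """
    # Shift by the minimum so the exponentials cannot overflow.
    f_min = min(free_energies)
    weights = [math.exp(-(f - f_min) / rt) for f in free_energies]
    z = sum(weights)
    probs = [w / z for w in weights]
    effective_barrier = f_min - rt * math.log(z)
    effective_size = sum(p * n for p, n in zip(probs, nucleus_sizes))
    return effective_barrier, probs, effective_size
```

With several transition states of comparable free energy, the effective barrier is lower than the lowest single barrier, because parallel pathways all contribute to the rate.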

The degree of involvement of a residue in the folding nucleus is measured by the Φ-value (Matouschek et al., 1989), which is the ratio of destabilization of the transition state ensemble induced by point mutation of the residue to destabilization of the native state induced by this mutation; the point mutations are typically changes from a larger residue to a smaller one. In our model, we calculated Φ-values (for each residue) as the number of contacts which are formed by the side chain of the residue in the transition state (averaged over the whole transition state ensemble) divided by the number of contacts of this residue in the native state (in other words, we made mutations to glycine) (Galzitskaya et al., 2005):


Formula 8

(8)
In the cases when a residue was a glycine in the wild-type protein, we took the probability of this residue to be folded in the transition state ensemble instead of its Φ-value.
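The contact-based Φ-value, including the glycine fallback, can be sketched as below. The function and argument names are hypothetical, and the Boltzmann probabilities of the transition states are assumed to have been computed beforehand and to sum to one.

```python
def phi_value(ts_probs, ts_contacts, native_contacts, ts_folded=None):
    """Contact-based estimate of the phi-value of one residue.

    ts_probs: Boltzmann probability of each transition state.
    ts_contacts: side-chain contacts of the residue in each transition state.
    native_contacts: side-chain contacts of the residue in the native state.
    ts_folded: per-state flags telling whether the residue is folded; used
               only when the residue has no side-chain contacts (glycine).
    """
    if native_contacts == 0:
        if ts_folded is None:
            raise ValueError("ts_folded is required for residues without contacts")
        # Glycine case: probability of being folded in the ensemble.
        return sum(p for p, folded in zip(ts_probs, ts_folded) if folded)
    averaged = sum(p * c for p, c in zip(ts_probs, ts_contacts))
    return averaged / native_contacts
```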


    3 RESULTS
In our work, we have studied proteins from thermophilic and mesophilic organisms. Here, we refer to proteins from thermophilic organisms as thermophilic proteins and proteins from mesophilic organisms as mesophilic proteins.

To carry out a comparative study of the properties of thermophilic and mesophilic proteins, we constructed a database which includes 373 thermophile–mesophile pairs of structurally well-aligned proteins (one structurally aligned pair is shown in Fig. 1).

The origin of the selected thermophilic proteins was the following: 269 of 373 proteins (72.1%) from bacteria, 2 (0.5%) from eukaryotes and 102 (27.3%) from archaea. The origin of the mesophilic counterparts was as follows: 283 (75.9%) from bacteria, 83 (22.3%) from eukaryotes and 7 (1.9%) from archaea (this database is available at http://phys.protres.ru/resources/termo_meso_base.html).

One of the advantages of our database is that it consists of homologous pairs of proteins. It should be mentioned that the fraction of amino acid residues involved in the regular secondary structure was identical in thermophilic and mesophilic proteins (80%).

Literature data indicate that thermophilic proteins should be more compact than mesophilic proteins (Robinson-Rechavi et al., 2006). One possible explanation is a greater number of contacts between amino acid residues in thermophilic proteins than in mesophilic proteins. In order to test this hypothesis, the number of atom–atom contacts per residue for the 373 thermophile–mesophile pairs of proteins was calculated with contact distance cutoffs of 6 and 8 Å (see Methods section). Indeed, a greater number of atom–atom contacts per residue was observed in thermophilic proteins than in mesophilic proteins (the difference: 1.4 ± 0.4, P = 1.1 × 10⁻⁹ for the 6 Å cutoff and 3.9 ± 1.1, P = 4.3 × 10⁻¹² for the 8 Å cutoff, see Table 2). To determine which amino acid residues contribute to the increased number of atom–atom contacts, the amino acid residues in each protein were divided into groups according to whether they were internal, external or intermediate (see Methods section). It should be mentioned that the thermophilic and mesophilic proteins from our database do not differ from each other in the fractions of interior, exterior and intermediate amino acid residues: 13% interior, 46% exterior and 41% intermediate. This is an advantage for our analysis. We calculated the number of atom–atom contacts per residue for interior and exterior amino acid residues from thermophilic and mesophilic proteins (Table 2).


Table 2. Average number of atom–atom contacts per residue for all, interior and exterior amino acid residues of thermophilic and mesophilic proteins

 
From the obtained data, it is evident that interior amino acid residues of thermophilic and mesophilic proteins do not differ in the number of atom–atom contacts per residue. This means that the interior parts of thermophilic and mesophilic proteins are similarly packed. On the other hand, exterior amino acid residues of thermophilic proteins have a greater number of atom–atom contacts per residue than exterior amino acid residues of mesophilic proteins (the difference is 1.5 ± 0.2, P = 2.0 × 10⁻¹¹ for 6 Å and 4.2 ± 0.6, P = 7.0 × 10⁻¹⁵ for 8 Å).

The average number of hydrogen bonds per residue within the main chain of the protein was calculated for interior residues only, for exterior residues only and for all residues. In all three cases (see Table 3), a greater number of hydrogen bonds per residue was observed in thermophilic proteins than in mesophilic (the difference: 0.03 ± 0.01, P = 0.02).


Table 3. Average number of hydrogen bonds per residue for all, interior and exterior amino acid residues of thermophilic and mesophilic proteins

 
Further, the amino acid composition of the interior and exterior residues from thermophilic and mesophilic proteins was analyzed. From Figure 2a, it is evident that the interior residues of thermophilic and mesophilic proteins do not differ in amino acid composition. The external residues of thermophilic proteins contain a higher fraction of Lys (a difference of 0.019 ± 0.003), Arg (a difference of 0.017 ± 0.003) and Glu (a difference of 0.039 ± 0.003) than those of their mesophilic homologues. In contrast, a higher fraction of Ala (a difference of 0.020 ± 0.002), Asp (a difference of 0.009 ± 0.002), Asn (a difference of 0.008 ± 0.002), Gln (a difference of 0.022 ± 0.002), Thr (a difference of 0.011 ± 0.002), Ser (a difference of 0.012 ± 0.002) and His (a difference of 0.007 ± 0.001) is observed in the exterior residues of mesophiles than in their thermophilic homologues (see Fig. 2b).


Fig. 2. Fraction of amino acid residues of each of 20 types in thermophilic and mesophilic proteins: (a) interior amino acid residues; (b) exterior amino acid residues. The error of the average is shown.

 
Since the number of Lys, Arg and Glu residues is greater in thermophilic than mesophilic proteins, the number of salt bridges was calculated. It turned out that thermophilic proteins contain on average 10.94 ± 0.41 salt bridges per protein, while mesophilic proteins contain only 9.20 ± 0.37 salt bridges per protein. In addition, the number of atom–atom contacts per residue was calculated at a contact distance of 4 Å (the distance cutoff for salt bridge formation) for external, internal and all amino acid residues (Table 4). One can see that thermophilic and mesophilic proteins differ in the number of atom–atom contacts per residue at 4 Å. Additionally, salt bridges evidently contribute to the stability of thermophilic proteins.


Table 4. Average number of atom–atom contacts per residue for all, interior and exterior amino acid residues of thermophilic and mesophilic proteins at a contact distance of 4 Å

 
Pack and Yoo (2005) explored 20 pairs of thermophilic and mesophilic proteins and reported that exposed residues show no distinct difference in residue packing. We calculated the number of contacts for their database using our approach, and the results are similar to those obtained for our database: thermophilic proteins show a greater degree of packing of exposed residues than mesophilic proteins (see Supplementary Material).

The t-test presented in that paper (Pack and Yoo, 2005) is not reliable. The authors treat both t_i > 1.282 and t_i < –1.282 as significant, which makes the test two-tailed rather than one-tailed; the critical level they actually apply is therefore t_0.2 = 1.29155 and not t_0.1. At this level, every fifth difference will exceed the threshold (|t| > 1.282) by chance alone, which is exactly the situation seen in Table 2 of that article. If the authors had worked at the critical level t_0.05, which is the level usually used in the literature rather than t_0.2, the results obtained would not be reliable for the data presented in the article.
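The arithmetic behind this argument can be checked with a short sketch. It uses the standard normal distribution as an approximation to the t-distribution, which is an assumption that holds only for many degrees of freedom; with few degrees of freedom the tails are heavier and the chance rate is somewhat larger.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def one_tailed_probability(threshold):
    """Chance that a null statistic exceeds +threshold."""
    return 1.0 - normal_cdf(threshold)


def two_tailed_probability(threshold):
    """Chance that a null statistic falls outside [-threshold, +threshold]."""
    return 2.0 * one_tailed_probability(threshold)


# A threshold of 1.282 gives about 0.1 in one tail but about 0.2 in two tails,
# so roughly one difference in five passes the two-sided criterion by chance.
print(one_tailed_probability(1.282), two_tailed_probability(1.282))
```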

One of the ways to analyze whether proteins with similar topologies fold in a similar way is to compare theoretically predicted free-energy barriers and Φ-values for all residues in the native structures. Therefore, we analyzed the folding and unfolding processes of proteins from our database and compared the predicted properties of these processes in thermophilic and mesophilic proteins. We simulated folding and unfolding behavior at the melting temperature of a protein using a method (Galzitskaya and Finkelstein, 1999; Garbuzynskiy et al., 2004) that represents protein folding and unfolding processes as a network of folding/unfolding pathways on a free-energy landscape (see Materials and Methods section). This method was previously (Galzitskaya and Finkelstein, 1999; Garbuzynskiy et al., 2004, 2005) tested on a set of proteins in which both folding rates and folding nuclei (the structured parts of protein molecules in the transition state) were already investigated experimentally, and the method was shown to be able to predict both folding nuclei and folding rates (Galzitskaya and Finkelstein, 1999; Galzitskaya et al., 2005; Garbuzynskiy et al., 2004, 2005).

For our calculations, we selected (from the database described above) those pairs in which both proteins had no breaks in their 3D structures. To finish the calculations within a reasonable time, we took only those protein pairs in which both proteins were smaller than 150 amino acid residues. After applying these two additional criteria, we had 154 pairs of proteins.

Table 5 presents the data averaged over all proteins. The height of the free-energy barrier is virtually the same in thermophilic and in mesophilic proteins; that is, the proteins do not differ in free-energy barrier at their mid-transition point. The size of the folding nucleus is also the same on average (59 ± 2 amino acid residues). Our results confirm the experimental observations (Olofsson et al., 2007; Schuler et al., 2002) that the interactions that differ between thermophilic proteins and their mesophilic counterparts form only after the transition state has been passed. The behavior of the proteins is similar at their respective melting temperatures (although thermophiles and mesophiles are characterized by different melting temperatures). These results agree with the experimental data (Kumar et al., 2001) showing no correlation between the living temperature of organisms and the height of the free-energy barrier on the folding pathway of their proteins at the living temperature of the source organism.


Table 5. Parameters of the folding/unfolding processes for thermophilic and mesophilic proteins averaged over the database of 154 pairs of proteins

 
Average Φ-values for thermophilic and mesophilic proteins are also virtually the same (see Table 5).

However, there is a difference in Φ-value distributions (see Fig. 3): thermophilic proteins on average have a slightly narrower distribution than mesophilic proteins have.


Fig. 3. Distribution of Φ-values of amino acid residues in thermophilic (solid curve) and mesophilic (dashed curve) proteins. Errors are calculated as n^(1/2), where n is the number of residues.

 
In other words, thermophilic proteins have fewer Φ-values below 0.2 (for which <20% of contacts are formed; residues that are virtually not involved in the folding nucleus) and greater than 0.9 (for which more than 90% of contacts are formed; residues that are inside the folding nucleus); instead, thermophilic proteins have a larger fraction of Φ-values in the range 0.2–0.4 (20–40% of contacts formed; these residues are partially involved in the folding nucleus). This difference is not large but is reliable (the χ²-test gives a probability below 10⁻¹⁸ that these distributions are the same).
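How a χ² comparison of two binned Φ-value distributions works can be sketched as follows. This is a standard χ² test of homogeneity applied to two histograms; the binning and the exact form of the test used for the result above are not specified in this section, so the function below is an illustrative assumption.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic and degrees of freedom for testing whether two
    histograms (same bins, raw residue counts) come from one distribution."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    statistic = 0.0
    used_bins = 0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # an empty bin carries no information
        used_bins += 1
        # Expected counts if both samples share one underlying distribution.
        expected_a = total_a * column / grand_total
        expected_b = total_b * column / grand_total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic, used_bins - 1
```

The statistic is then compared with the χ² distribution for the returned number of degrees of freedom. With many thousands of residues per histogram, even a small systematic shift between bins yields a very large statistic, which is consistent with a difference that is small but statistically reliable.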

A database that includes 373 homologous pairs of proteins from thermophilic and mesophilic organisms was created. Using this database, we found that proteins from thermophilic organisms contain more atom–atom contacts per residue than their mesophilic homologues do. Exterior residues that are accessible to the solvent make the main contribution to this difference. The amino acid composition of interior (inaccessible to the solvent) and exterior amino acid residues of proteins from thermophilic and mesophilic organisms was analyzed. We determined that the exterior residues of proteins from thermophilic organisms contain more Lys, Arg and Glu and fewer Ala, Asp, Asn, Gln, Ser and His residues than do those of proteins from mesophilic organisms. The amino acid compositions of the interior residues of the proteins considered do not differ. Modeling of the folding/unfolding behavior of the studied proteins at the melting temperature of each protein demonstrated that there is virtually no difference between thermophilic and mesophilic proteins in free-energy barrier height or in the average size of the folding nucleus under these conditions. Thus, the proteins of thermophilic and mesophilic organisms possess similar folding properties at the mid-transition point.


    ACKNOWLEDGEMENTS
We are grateful to D. Reifsnyder for assistance in preparation of the article. This work was supported by the programs ‘Molecular and cellular biology’ and ‘Fundamental sciences – medicine’, by the Russian Foundation of Basic Research (05-04-48750-a), by the INTAS grant (number 05-1000004-7747) and by the Howard Hughes Medical Institute (55005607).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on April 24, 2007; revised on June 25, 2007; accepted on June 25, 2007

    REFERENCES

    Aho A, et al. The Design and Analysis of Computer Algorithms (1976) Reading, MA: Addison-Wesley.

    Altschul SF, et al. Basic local alignment search tool. J. Mol. Biol (1990) 215:403–410.

    Barlow DJ, Thornton JM. Ion-pairs in proteins. J. Mol. Biol (1983) 168:867–885.

    Berezovsky IN, Shakhnovich EI. Physics and evolution of thermophilic adaptation. Proc. Natl Acad. Sci. USA (2005) 102:12742–12747.

    Berezovsky IN, et al. Entropic stabilization of proteins and its proteomic consequences. PLoS Comput. Biol (2005) 1:e47.

    Berman HM, et al. The Protein Data Bank. Nucleic Acids Res (2000) 28:235–242.

    Chakravarty S, Varadarajan R. Elucidation of factors responsible for enhanced thermal stability of proteins: structural genomics based study. Biochemistry (2002) 41:8152–8161.

    Dill KA. Dominant forces in protein folding. Biochemistry (1990) 29:7133–7155.

    Finkelstein AV, Badretdinov AY. Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Des (1997) 2:115–121.

    Finkelstein AV, Roytberg MA. Computation of biopolymers: a general approach to different problems. Biosystems (1993) 30:1–19.

    Flory PJ. Statistical Mechanics of Chain Molecules (1969) New York: Interscience.

    Fukuchi S, Nishikawa K. Protein surface amino acid compositions distinctively differ between thermophilic and mesophilic bacteria. J. Mol. Biol (2001) 309:835–843.

    Gallivan JP, Dougherty DA. Cation-pi interactions in structural biology. Proc. Natl Acad. Sci. USA (1999) 96:9459–9464.

    Galzitskaya OV, Finkelstein AV. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc. Natl Acad. Sci. USA (1999) 96:11299–11304.

    Galzitskaya OV, et al. Theoretical study of protein folding: outlining folding nuclei and estimation of protein folding rates. J. Phys. Condens. Matter (2005) 17:S1539–S1551.

    Garbuzynskiy SO, et al. Outlining folding nuclei in globular proteins. J. Mol. Biol (2004) 336:509–525.

    Garbuzynskiy SO, et al. On the prediction of folding nuclei in globular proteins. Mol. Biol (2005) 39:1032–1041.

    Honig B. Protein folding: from the Levinthal paradox to structure prediction. J. Mol. Biol (1999) 293:283–293.

    Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers (1983) 22:2577–2637.

    Kumar S, et al. Factors enhancing protein thermostability. Protein Eng (2000) 13:179–191.

    Kumar S, et al. Thermodynamic differences among homologous thermophilic and mesophilic proteins. Biochemistry (2001) 40:14152–14165.

    Liang H-K, et al. Amino acid coupling patterns in thermophilic proteins. Proteins (2005) 59:58–63.

    Matouschek JT, et al. Mapping the transition state and pathway of protein folding by protein engineering. Nature (1989) 340:122–126.

    Nojima H, et al. Reversible thermal unfolding of thermostable phosphoglycerate kinase. Thermostability associated with mean zero enthalpy change. J. Mol. Biol (1977) 116:429–442.

    Olofsson M, et al. Folding of S6 structures with divergent amino acid composition: pathway flexibility within partly overlapping foldons. J. Mol. Biol (2007) 365:237–248.

    Pack SP, Yoo YJ. Packing-based difference of structural features between thermophilic and mesophilic proteins. Int. J. Biol. Macromol (2005) 35:169–174.

    Perl D, et al. Conservation of rapid two-state folding in mesophilic, thermophilic and hyperthermophilic cold shock proteins. Nat. Struct. Biol (1998) 5:229–235.

    Privalov PL. Stability of proteins: small globular proteins. Adv. Protein Chem (1979) 33:167–241.

    Razvi A, Scholtz JM. Lessons in stability from thermophilic proteins. Protein Sci (2006) 15:1569–1578.

    Robinson-Rechavi M, et al. Contribution of electrostatic interactions, compactness and quaternary structure to protein thermostability: lessons from structural genomics of Thermotoga maritima. J. Mol. Biol (2006) 356:547–557.

    Schuler B, et al. Role of entropy in protein thermostability: folding kinetics of a hyperthermophilic cold shock protein at high temperatures using 19F NMR. Biochemistry (2002) 41:11670–11680.

    Siew N, et al. MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics (2000) 16:776–785.

    Szilagyi A, Zavodszky P. Structural differences between mesophilic moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey. Structure (2000) 8:493–504.

    Zeldovich KB, et al. Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput. Biol (2007) 3:e5.

