
Bioinformatics 2007 23(2):e225-e230; doi:10.1093/bioinformatics/btl318

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Structural Bioinformatics

A tale of two tails: why are terminal residues of proteins exposed?

Etai Jacob and Ron Unger *

The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University Ramat-Gan, 52900, Israel

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 

Motivation: It is widely known that terminal residues of proteins (i.e. the N- and C-termini) are predominantly located on the surface of proteins and exposed to the solvent. However, there is no good explanation as to the forces driving this phenomenon. The common explanation that terminal residues are charged, and charged residues prefer to be on the surface, cannot explain the magnitude of the phenomenon. Here, we survey a large number of proteins from the PDB in order to explore, quantitatively, this phenomenon, and then we use a lattice model to study the mechanisms involved.

Results: The location of the termini was examined for 425 small monomeric proteins (50–200 amino acids) and it was found that the average solvent accessibility of termini residues is 87.1% compared with 49.2% of charged residues and 35.9% of all residues. Using a cutoff of 50% of the maximal possible exposure, 80.3% of the N-terminal and 86.1% of the C-terminal residues are exposed compared to 32% for all residues. In addition, terminal residues are much more distant from the center of mass of their proteins than other residues. Using a 2D lattice, a large population of model proteins was studied on three levels: structural selection of compact structures, thermodynamic selection of conformations with a pronounced energy gap and kinetic selection of fast folding proteins using Monte-Carlo simulations. Progressively, each selection raises the proportion of proteins with termini on the surface, resulting in similar proportions to those observed for real proteins.

Contact: ron{at}biocom1.ls.biu.ac.il


    1 INTRODUCTION
 
Quite a few studies have been devoted to understanding the structural features of the first and last residues of proteins (i.e. the termini). Two lines of investigation have been pursued. One asks whether the two termini of a protein tend to be closer to each other than would be expected for a random distribution of distances. The other asks whether the properties of the N-terminal differ from those of the C-terminal. This is an important question since it bears on the controversial issue of sequential folding, i.e. whether folding, for example on the ribosome, is a sequential process that proceeds from the N-terminal to the C-terminal. In pioneering work, Thornton and Sibanda (1983) evaluated the distances between termini in 52 proteins and concluded that they are smaller than expected for random chains. Christopher and Baldwin (1996) examined a much larger set of proteins and reached a different conclusion: the distance between termini is not statistically different from the random expectation. A recent study (Krishna and Englander, 2005) contributed the interesting observation that proteins that fold with two-state kinetics have their termini close together, while proteins that fold with non-two-state kinetics have their termini separated.

The different environments of the two termini were first studied by Thornton and Chakauya (1982), who observed that, for the proteins in the PDB at that time, the N-terminal region tends to adopt an extended beta-strand conformation while C-terminal regions are often helical. Alexandrov (1993) argued that N-terminal residues tend to have more intramolecular contacts than C-terminal residues, suggesting that the N-terminal folds before the C-terminal. Laio and Micheletti (2006) re-examined the data and did not see this tendency. They did find, however, that the C-terminal is significantly more compact and locally organized than the N-terminal, although they argue that the bias is not due to sequential folding.

All these studies are based on the observation that protein termini tend to be on the surface of proteins and not buried in the core. This fact is critical for all these studies since it supplies the background against which calculations are tested. For example, when comparing the expected distance between termini, it is critical to consider the fact that termini are mostly on the surface, since the average distance of random points on a surface of a sphere is very different from the expected distance between random points found anywhere within its volume.

Surprisingly, the tendency of termini to be located on the surface is commonly taken as a postulate without a sufficient explanation. For example, the paper by Christopher and Baldwin (1996) starts with the following statement: ‘The terminal regions of proteins differ in several ways from more internal segments. The termini are often surface exposed and flexible’.

We are not aware of studies that explore this issue and explain how the terminal residues come to be overwhelmingly located on the surface of proteins. At least for some proteins there is a need to bring the terminal residues to the surface to allow them to participate in post-translational processes (e.g. N-terminal acetylation or methylation). However, many proteins do not undergo such modifications, and in any case this functional reason does not supply a mechanism that would drive terminal residues to the surface of folded proteins.

A common explanation often given for this tendency is that terminal residues are charged: the first amino group (which is not bonded to a carboxyl group) is positively charged, and likewise the last carboxyl group which is not paired with an amino group is negatively charged. Charged residues would tend to be on the surface of proteins because of their favorable interactions with water which is a polar solvent. However, this argument is valid also for charged amino acids like lysine, arginine, aspartic acid and glutamic acid. While these residues tend indeed to be located on the surface of proteins, we show here that the terminal residues are much more exposed than charged amino acids.

In our study we first use the large collection of protein structures that currently exist in the PDB to measure, by various methods, the extent to which termini are indeed located on the surface and exposed to the solvent. Next, we ask which mechanisms lead to this behavior. Since running full-atom models of the thermodynamics of proteins, and especially of their dynamic properties, is not practical, we have chosen to use lattice models.

Despite the simplification of such models and the very short sequences that are usually used, they have been shown to capture generic properties of real proteins such as: collapse transitions, mutational properties, development of secondary and tertiary structure and folding kinetics (Dill et al., 1995; Sali et al., 1994; Unger and Moult, 1996).

Using a lattice, we generate a large population of model proteins and study their properties by selecting proteins on three levels: structural selection of compact structures, thermodynamic selection of conformations with strong energy preferences and kinetic selection of fast folding proteins using Monte-Carlo (MC) simulations. We show how, progressively, each selection raises the proportions of proteins with termini on the surface, resulting in very similar proportions to what is measured for real proteins.


    2 METHODS
 
2.1 The PDB dataset
PDB entries were taken from the non-redundant PDB set (http://www.ncbi.nlm.nih.gov/Structure/VAST/nrpdb.html) using the non-redundancy threshold of a p-value of 10^-40. From this list we took only monomeric structures of length between 50 and 200 amino acids that were solved by X-ray crystallography and for which no missing residues were reported. A total of 425 structures were considered.

2.2 Exposure analysis of residues of proteins
Two methods were used to determine the extent to which termini are located on the surface of proteins. The first measure is based on the exposure of termini residues to solvent, and the second on the distance of the termini from the center of mass of each protein.

2.2.1 Exposure calculations
The corresponding DSSP files for the PDB entries were downloaded from ftp://ftp.cmbi.ru.nl/pub/molbio/data/dssp/. We used the solvent accessibility value in the DSSP as the exposure measurement as described in Kabsch and Sander (1983). The relative solvent accessibility of each residue was calculated by normalizing its solvent accessibility to the maximum possible value for that amino acid (Shrake and Rupley, 1973).
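
As a minimal sketch of this normalization, the relative accessibility is the DSSP accessibility divided by a per-residue-type maximum. The function names and the numbers in the usage note are illustrative assumptions, not values from the paper; the 50% cutoff is the one used later in the Results.

```python
def relative_accessibility(acc, max_acc):
    """Return solvent accessibility as a fraction of the maximum for that residue type."""
    if max_acc <= 0:
        raise ValueError("max_acc must be positive")
    return acc / max_acc


def is_exposed(acc, max_acc, cutoff=0.5):
    """A residue counts as exposed when its relative accessibility exceeds the cutoff."""
    return relative_accessibility(acc, max_acc) > cutoff
```

For instance, with a hypothetical maximum of 200.0 for some residue type, an accessibility of 120.0 gives a relative value of 0.6, which would be classified as exposed.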

2.2.2 Distance from center of mass
Whereas solvent exposure is a very common way to measure the extent to which amino acids are on the surface of proteins, there might be a problem in using it for terminal residues. Some of the protection from the solvent is supplied by the main chain and the side chain of the two immediate neighbors of each amino acid. However, terminal residues are truncated and have only one neighboring residue. Thus, to enable independent assessment of the location of terminal residues we suggest measuring the distance of each amino acid to the center of mass of its protein. Residues with the highest distance will be on the surface. Since proteins are of different sizes, and hence expected distances, we normalized this measure for each protein in terms of standard deviation according to:

$$\mathrm{RED} = \frac{D - \mathrm{Avg}(D_s)}{\mathrm{SDV}}$$
where RED is the relative distance of a residue (Cα only), D is its absolute distance from the center of mass, Avg(Ds) is the average distance of all residues from the center of mass and SDV is the standard deviation of these distances.
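
The normalization (D - Avg(Ds)) / SDV can be sketched as follows. This is an illustration under stated assumptions: the unweighted centroid of the Cα atoms stands in for the center of mass, the population standard deviation is used, and the function name is hypothetical.

```python
import math


def relative_distances(ca_coords):
    """Return RED for each residue, given a list of (x, y, z) C-alpha coordinates."""
    n = len(ca_coords)
    if n < 2:
        raise ValueError("need at least two residues")
    # Assumption: the unweighted C-alpha centroid approximates the center of mass.
    center = tuple(sum(c[k] for c in ca_coords) / n for k in range(3))
    dists = [math.dist(c, center) for c in ca_coords]
    avg = sum(dists) / n
    # Assumption: population standard deviation of the distances within this protein.
    sdv = math.sqrt(sum((d - avg) ** 2 for d in dists) / n)
    if sdv == 0:
        raise ValueError("all residues are equidistant from the center")
    return [(d - avg) / sdv for d in dists]
```

A positive RED means the residue lies farther from the center than the average residue of the same protein, which makes values comparable across proteins of different sizes.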

2.3 Lattice model of proteins
The polypeptide chains in the simulations are modeled as a linear sequence of residues on a 2D lattice. In order to increase the spectrum of interactions relevant to our study, four different types of residues are used, instead of the common HP model with only two types of interactions (Dill et al., 1995; Chan and Dill, 1996). These are hydrophobic (H), neutral polar (P), positively charged (+) and negatively charged (–). Interactions are considered only between residues in neighboring lattice points (diagonal points are not considered neighboring). Interactions between consecutive residues in the sequence are not considered since they are always present and are independent of the conformation.

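The energy of sequence S of length N in conformation C is given by

$$E(S, C) = \sum_{i=1}^{N} \sum_{j=i+2}^{N} P_{ij}\,\Delta_{ij}$$
where

$$\Delta_{ij} = \begin{cases} 1 & \text{if residues } i \text{ and } j \text{ occupy neighboring lattice points} \\ 0 & \text{otherwise} \end{cases}$$
and Pij represents the energy of the contact interaction between the residue types at positions i and j, as given in Table 1.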


Table 1 The energy of the contact interactions

 
Those values were chosen to reflect the average strength of interactions in empirical mean force potentials (Miyazawa and Jernigan, 1993), where HH interactions are stronger than PP interactions, HP and H(+)/(–) interactions are neutral, P(+)/(–) interactions are weakly favorable, (+)(+) or (–)(–) are repulsive and (+)(–) interactions are the strongest attractor. Repeating the experiments described here with variations of this potential yielded similar results.
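
The contact energy of a conformation can be sketched as below. The numbers in CONTACT are placeholders chosen only to follow the qualitative ordering described above; they are not the values of Table 1, and the function names are hypothetical. The ASCII character "-" stands for the negatively charged type.

```python
# Placeholder contact energies: NOT the values of Table 1.
# They only respect the qualitative ordering described in the text.
CONTACT = {
    ("H", "H"): -2.0, ("P", "P"): -1.0, ("H", "P"): 0.0,
    ("H", "+"): 0.0, ("H", "-"): 0.0,
    ("P", "+"): -0.5, ("P", "-"): -0.5,
    ("+", "+"): 1.0, ("-", "-"): 1.0, ("+", "-"): -3.0,
}


def pair_energy(a, b, table=CONTACT):
    """Look up a symmetric contact energy for residue types a and b."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def conformation_energy(sequence, coords, table=CONTACT):
    """Sum contact energies over non-consecutive residues on adjacent lattice points."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    total = 0.0
    for i in range(len(sequence)):
        # Start at i + 2: contacts between chain neighbors are not counted.
        for j in range(i + 2, len(sequence)):
            (xi, yi), (xj, yj) = coords[i], coords[j]
            # Manhattan distance 1 means lattice neighbors; diagonals are excluded.
            if abs(xi - xj) + abs(yi - yj) == 1:
                total += pair_energy(sequence[i], sequence[j], table)
    return total
```

With these placeholder values, a four-residue chain folded into a square has a single contact between its first and last residues, while the fully extended chain has none.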

2.4 Generation of model sequences
A total of 3342 unique model sequences were generated by random rearrangements of 25 residues, drawn from a distribution of 45% H, 30% P, 12.5% (+) and 12.5% (–). This composition is similar to the composition of amino acid groups in the PDB (http://us.expasy.org/sprot/relnotes/relstat.html, PFB release 49.1) which is 44.3% neutral, 30.7% polar and 12% positively charged amino acids and 13% negatively charged amino acids. To be consistent with the fact that protein termini are charged, oppositely charged amino acids (+/–) were assigned to both termini.
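
One way to draw such a sequence is sketched below. It assumes that the two termini are fixed first and that the 23 interior residues are drawn independently with the stated proportions; the paper does not say whether the terminal charges count toward the composition, and the function name is hypothetical.

```python
import random


def random_model_sequence(length=25, rng=None):
    """Return a random sequence over H, P, +, - with oppositely charged termini."""
    rng = rng or random.Random()
    # Assumption: which terminus carries the positive charge is chosen at random.
    first, last = rng.choice([("+", "-"), ("-", "+")])
    # Interior residues follow the 45 / 30 / 12.5 / 12.5 composition.
    interior = rng.choices("HP+-", weights=[45, 30, 12.5, 12.5], k=length - 2)
    return first + "".join(interior) + last
```

Collecting the results in a set until it reaches the desired size would give a population of unique sequences.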

2.5 Generation of compact structures
Conformations of 25 residues that fit into a 6 x 6 square are considered compact. For each sequence of 25 residues, all 9 646 215 possible compact non-symmetric conformations were recursively generated, and the energy of each conformation was calculated based on the potential. We assume that the minimal-energy conformation of the chain is one of the 9 646 215 compact structures, and thus the compact conformation with the minimal energy is considered the native structure of that particular sequence (Fig. 1). If more than one conformation has the same minimal energy, one is chosen arbitrarily. Sequences for which the simulation (described later) demonstrated that the minimal-energy conformation is not compact (i.e. the simulation found a conformation that cannot be contained within a 6 x 6 square and has lower energy than all the compact conformations) were excluded retroactively from consideration. We encountered very few (<1%) such cases.
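
A recursive enumeration of this kind can be sketched as follows. This simplified version yields every directed self-avoiding walk placed inside a fixed box and does not remove symmetric or translated copies, so it is not expected to reproduce the count of 9 646 215; the function name and parameters are hypothetical.

```python
def enumerate_walks(n_residues, width=6, height=6):
    """Yield every self-avoiding walk of n_residues lattice points inside the box."""
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")

    def extend(path, occupied):
        if len(path) == n_residues:
            yield tuple(path)
            return
        x, y = path[-1]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and (nx, ny) not in occupied:
                path.append((nx, ny))
                occupied.add((nx, ny))
                yield from extend(path, occupied)
                # Backtrack before trying the next direction.
                occupied.remove((nx, ny))
                path.pop()

    for sx in range(width):
        for sy in range(height):
            yield from extend([(sx, sy)], {(sx, sy)})
```

Because the function is a generator, each conformation can be scored and discarded as it is produced, which avoids holding millions of conformations in memory.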


Fig. 1 An example of a model sequence structure. The structure is the native conformation of the 25 residues sequence +HHPP-PPHPP-HPHHH+PP+HHH- with minimum energy of –11. Buried residues appear in black and exposed residues in gray.

 
2.5.1 Exposure measurement for the model
In the model, a residue is considered exposed if one of its four neighboring lattice points is part of the exterior, or if there is a path from this point to the exterior. Otherwise, it is considered buried. Note that by this definition a residue that neighbors only an internal cavity point is not considered exposed. An example of a native structure is illustrated in Figure 1.
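
This definition can be sketched with a flood fill: empty lattice points reachable from outside the chain's bounding box are marked as exterior, and a residue is exposed if any of its four neighbors is marked. The function name is hypothetical, and the approach is one possible reading of the definition rather than the authors' code.

```python
from collections import deque


def exposed_residues(coords):
    """Return one boolean per residue: True when the residue is exposed."""
    occupied = set(coords)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Pad the bounding box by one so its border ring is guaranteed to be empty.
    x_lo, x_hi = min(xs) - 1, max(xs) + 1
    y_lo, y_hi = min(ys) - 1, max(ys) + 1
    exterior = {(x_lo, y_lo)}
    queue = deque(exterior)
    while queue:
        x, y = queue.popleft()
        for p in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_box = x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
            if in_box and p not in occupied and p not in exterior:
                exterior.add(p)
                queue.append(p)
    # Empty points never reached by the fill are internal cavities.
    return [
        any(n in exterior for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
        for x, y in coords
    ]
```

In a 3 x 3 spiral, for example, the eight residues on the rim are exposed and the residue at the center is buried.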

2.6 Compact structures with significant energy gap
There is a large variance of the spectrum of energy values of the conformational space for different proteins. As suggested in Sali et al. (1994), a significant energy gap is important in order to ensure kinetic accessibility of the native structure. For each sequence, we measure the difference between the minimal energy (i.e. the native conformation) and the average energy of all conformations in units of standard deviations of the average energy. The larger the difference between these two numbers, the more pronounced is the energy gap. We selected the 800 sequences (out of the 3342), with the largest difference of their native structure for the simulations in the kinetic accessibility stage.
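
The gap measure and the selection step can be sketched as follows, assuming the energies of all enumerated conformations of a sequence are available as a list. The population standard deviation and the function names are assumptions made for illustration.

```python
import math


def energy_gap(energies):
    """Difference between the mean and the minimal energy, in standard deviations."""
    n = len(energies)
    mean = sum(energies) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in energies) / n)
    if sd == 0:
        raise ValueError("all conformations have the same energy")
    return (mean - min(energies)) / sd


def select_largest_gaps(gap_by_sequence, k):
    """Return the k sequences with the largest energy gap, largest first."""
    return sorted(gap_by_sequence, key=gap_by_sequence.get, reverse=True)[:k]
```

Calling select_largest_gaps with k = 800 on the gaps of all 3342 sequences would correspond to the selection described above.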

2.7 Simulation technique
Folding dynamics is simulated using the MC method with the Metropolis criterion (Metropolis et al., 1953). A chain starts as a random conformation and folds by the following algorithm: from a conformation S1 with energy E1, a random change (a move) of conformation to S2 is performed and the energy E2 is evaluated. If E1 ≥ E2, then the move to conformation S2 is accepted, otherwise acceptance of the move depends on the following non-deterministic criterion:

$$\mathrm{Rnd} < \exp\left(\frac{E_1 - E_2}{C_k T_f}\right)$$
where Rnd is a random number between 0 and 1. Ck = 1 and Tf = 0.5 were used for all sequences and simulations. If the move is not accepted, the former conformation S1 is retained. Two types of moves are considered: a tail move, which is a random left or right turn of the first or last residue of the chain, and an internal move, which is performed as follows. Two residues are randomly selected in the conformation, with a sequence separation of up to L residues. Then, the trajectory between the two residues is replaced by another trajectory, taken from a pre-defined library of trajectories of the same length and the same relative translocation between their end points (Fig. 2). Only trajectories that do not collide with other parts of the chain are considered. L is a parameter that was varied between 3 and 11 in our simulations; for a given L, trajectories of length <= L are considered. This notion of local moves is a generalization of the standard local moves, for example corner flips and crankshafts for L = 3 (see review in Skolnick and Kolinski, 1991). The ratio between tail moves and internal moves was varied in the simulations, with no significant change in the results.
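
The acceptance step can be sketched as follows, assuming the standard Metropolis form in which an uphill move is accepted when Rnd < exp((E1 - E2) / (Ck * Tf)). The function name and the rng parameter are hypothetical; the defaults Ck = 1 and Tf = 0.5 are the values quoted above.

```python
import math
import random


def accept_move(e1, e2, ck=1.0, tf=0.5, rng=random):
    """Decide whether to move from energy e1 to energy e2."""
    # Moves that lower the energy or leave it unchanged are always accepted.
    if e1 >= e2:
        return True
    # Uphill moves are accepted with probability exp((e1 - e2) / (ck * tf)).
    return rng.random() < math.exp((e1 - e2) / (ck * tf))
```

With these defaults, a move that raises the energy by one unit would be accepted with probability exp(-2), roughly 13.5% of the time.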


Fig. 2 An example of a local move. The trajectory of length 6 between two residues (3 and 9) is replaced by another valid trajectory of the same length between these points. The rest of the structure is unchanged.

 
2.8 Kinetic accessibility
In order to examine and characterize the kinetic accessibility of a model sequence to its pre-calculated native structure, each of the 800 sequences with the largest energy gaps was simulated and analyzed by the following protocol. A single simulation of a model sequence consists of up to 10^6 MC steps (MCS). The simulation is terminated once the native conformation is found or after 10^6 MCS. Some flexibility is allowed in reaching the native conformation: we considered the native conformation as found if the simulation reached a conformation within a root mean square distance of <0.5 from the native conformation. (This distance is roughly equal to 2 out of the 25 residues being off by one lattice point from the corresponding position in the native conformation.) The number of MCS taken to find the structure is considered the first passage time (FPT). For each sequence, 50 independent simulations were run with the same folding parameters (simulation temperature, local moves library size and tail move probability). If a model sequence folded successfully in more than a defined threshold percentage of runs (e.g. 80%, 40 out of 50 runs), it is considered a fast folder; otherwise, it is considered a slow folder. This threshold, as well as other simulation parameters such as the tail move probability and the local move size (L), was varied in our simulations.
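
The classification can be sketched as follows, assuming each run is recorded as its first passage time, or as None when the native conformation was not reached within the step limit. The function names are hypothetical, and the comparison treats the quoted example (40 out of 50 runs at an 80% threshold) as meeting the threshold.

```python
def success_fraction(first_passage_times):
    """Fraction of runs that reached the native conformation (None marks a failure)."""
    if not first_passage_times:
        raise ValueError("at least one run is required")
    successes = sum(t is not None for t in first_passage_times)
    return successes / len(first_passage_times)


def is_fast_folder(first_passage_times, threshold=0.8):
    """Assumption: a sequence that meets the threshold exactly counts as a fast folder."""
    return success_fraction(first_passage_times) >= threshold
```

Averaging the non-None entries of the same list would give a mean FPT over the successful runs.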


    3 RESULTS
 
3.1 Analysis of PDB structures
We start by calculating the exposure of the termini in a dataset of 425 non-redundant monomeric proteins from the PDB. The average normalized solvent accessibility of terminal residues is 87.1% compared with 49.2% for charged residues and 35.9% for all residues. We consider a residue with solvent accessibility of >50% of its maximal surface area as exposed. Figure 3 shows the exposure of residues in the N- and C-terminal regions, i.e. the first and last 10 residues of each protein. It is clearly seen that the terminal residues are highly exposed (80.3 and 86.1% for N- and C-terminal residues, respectively), whereas the effect on the residues adjacent to the termini is much smaller. When the analysis is done by amino acid type (Fig. 4) we see, as expected, that charged residues are more exposed than hydrophobic and polar residues, but that terminal residues are much more exposed than charged residues.


Fig. 3 Exposure of terminal residues in PDB. The percent of residues, averaged over 425 proteins, that have >50% of their surface area accessible to solvent. Ten residues from the N-terminal (left) and C-terminal (right) are shown. The tendency of the terminal residues to be exposed is evident.

 


Fig. 4 Exposure of residues by type in PDB. For the dataset of 425 proteins, the percent of residues with >50% accessible surface area is shown by residue type. It is clear that terminal residues are much more exposed than charged residues.

 
It might be argued that solvent accessibility of terminal residues is large because they are missing one of their neighboring residues that could have provided additional shield from the solvent. Thus, in order to probe directly the location of the terminal residues we measured the distance of the terminal residues and all other residues from the center of mass of their proteins. The distance was normalized, in units of standard deviation, to the average distance of residues to the center of mass for each protein. The results, shown in Figure 5, indicate that indeed terminal residues are found much more on the exterior of proteins as compared to any other type of residues.


Fig. 5 The distance of residues from the center of PDB proteins. For the dataset of 425 proteins, the distance of residues to the center of mass of their proteins is presented. The average distances, in units of standard deviations of distances in each protein, are grouped by residue type. It is evident that terminal residues are most distant from the center of their proteins.

 
Thus, we can say that protein termini are indeed predominantly located on the surface. Out of the 425 proteins only 132 have one terminus buried (i.e. <50% exposure), and 13 have both termini buried. If we use a cutoff of 25% exposure then there are only 38 proteins with one buried terminus and 2 with both termini buried. With a 10% exposure cutoff, only 14 proteins have one terminus buried and none has both. An example of one of the 14 cases, staphopain, is shown in Figure 6.


Fig. 6 Terminal residues of proteins. For most proteins both termini are exposed to the solvent, as in cytochrome c552 (PDB code 1C52) (left, where terminal residues are shown as yellow space filling objects). Only in very few cases termini residues are buried as in (right) staphopain (code 1CV8), a cysteine proteinase where the C-terminal tyrosine is totally buried.

 
3.2 Analysis of model proteins
For extended conformations of model proteins, most residues are exposed. We collected data from 42 450 extended conformations produced by MC simulations and observed (Fig. 7) that all residues are exposed in >80% of the extended structures. For the three terminal residues on each side, >90% are exposed and the very terminal residues are >95% exposed. To gather statistics about compact conformations, 3342 unique random sequences of 25 residues were created. For each sequence, all possible 9 646 215 two dimensional compact non-symmetric conformations that fit into a 6 x 6 lattice were generated. For these compact structures, the exposure profile of the proteins is quite flat along the structure and all residues have ~70% exposure (Fig. 7).


Fig. 7 Exposure profile of extended and compact conformations. For extended (dark) conformation, all residues are >80% exposed, with terminal residues reaching >95% exposure. For compact (light) conformations the exposure profile is quite flat with all residues having ~70% exposure.

 
Next we turn to analyze the exposure profile of native structures (i.e. minimal energy structure). We used enumeration of compact structures of the 3342 sequences composed of an alphabet of four types: (H) hydrophobic, (P) polar, (+) positively charged and (–) negatively charged, in proportion similar to what is found in the PDB. For each sequence, using a table of mean force potential [reflecting an average of the strength of interactions between the corresponding amino acids (Miyazawa and Jernigan, 1993)] the energy of every compact conformation was evaluated. The conformation with the lowest energy was considered the native conformation. The percent of exposed residues was calculated for all the native structures.

Figure 8 shows the exposure by residue type and demonstrates that for native structures in our model, terminal residues are more exposed than other types of residues. While these exposures are higher than observed for real proteins (Fig. 4), they do show the same rank between residues type as in real proteins.


Fig. 8 Exposure of residues by type for compact structures. For the set of 3342 native conformations of model proteins, the percent of exposure is shown by residue type.

 
The percent of exposed residues was calculated for the entire set and for the 800 proteins whose native structure has the largest gap in energy from the average energy value. The tendency of the terminal residues to be exposed is slightly higher for those proteins (89.2%) than for the entire set (87%). If we use the top 200 sequences, the tendency increases slightly further, to 90.8%.

3.3 Analysis of kinetic folding
Proteins were divided into two groups, fast folders and slow folders. The separation was based on the ability of sequences to fold to their native conformation in an MC simulation of 10^6 steps. Each sequence was run 50 times; proteins that were able to find the native conformation in more than a threshold percentage of the simulations were considered fast folders, and proteins that found their native structure in less than that threshold percentage of runs were considered slow folders. A threshold of 80% (which was used in most simulations) yielded 355 fast folders and 445 slow folders. A comparison of the percent of exposed residues for fast and slow folding proteins is shown in Figure 9, showing a significant difference. The exposure by residue type for the 355 fast folders is shown in Figure 10.


Fig. 9 Percent of exposed residues of fast and slow folders. A probability of 0.15 was used for tail moves and L = 7 of maximum local moves size.

 


Fig. 10 Exposure of residues by type for fast folders. For 355 fast folding proteins, the percent of exposure is shown. As in real proteins, hydrophobic residues are most buried, followed by polar residues. Terminal residues are more exposed than charged residues.

 
The simulations were performed using different parameters of local move set, percent of tail moves, threshold between fast and slow folders and in all cases the conclusion was similar: in all simulations proteins that fold fast have a higher percentage of their termini exposed than slow folding proteins (Fig. 11).


Fig. 11 Termini exposure of fast and slow folding proteins as a function of different simulation parameters. The percent of exposed termini for fast folders is shown in dashed green line and slow folders are shown in solid blue. (Top) Changing the threshold separating slow and fast folders from success in 20% of runs to success in 90%. (Tail move probability is fixed to 0.15 and library move size L = 7). (Middle) The percent of tail moves compared with internal moves is varied from 0.05 to 0.75 (library move size is fixed to 7 and threshold is 0.8). (Bottom) Library move size (L) is varied from 3 to 11 (threshold is fixed to 0.8 and tail move probability equals 0.15). In all cases the fast folding proteins have significantly higher tendency to have their terminal residues exposed.

 
Furthermore, we performed longer simulations of 6 x 10^6 MC moves for two groups of proteins: 78 proteins for which the native conformation has both termini on the surface, and 78 proteins for which at least one of the termini is not exposed in the native structure. Again we saw that proteins with exposed termini fold faster: the average folding time (FPT) for proteins with exposed termini was 204 000 MCS compared with 404 000 MCS for proteins with at least one buried terminus.


    4 DISCUSSION
 
We set out to explain why terminal residues of proteins tend to be located on the surface. We first measured the location of the terminal residues in a dataset of 425 short monomeric proteins. We used two different measurements: first we checked the solvent accessibility of these residues, and second we checked the distance of these residues from the center of mass of their proteins. Taken together, the results clearly indicate that terminal residues are indeed overwhelmingly located on the surface of proteins.

Based on this finding, we wanted to understand the mechanisms that force terminal residues to be on the surface. It is clear that many proteins need to have their termini exposed in order to make them accessible to post-translational modifications, which are common for both termini (Dixon, 1984; Chung et al., 2002). Thus, it can be argued that the location of terminal residues on the surface is a desirable feature that can be selected for by evolution. This feature could have been selected for directly, or, as is common in evolutionary processes, could have arisen as a by-product of other considerations that favor it. We suggest that the latter is true, i.e. thermodynamic and kinetic considerations that are known to have an effect on proteins could lead to such a preference.

Using a simple lattice model, we demonstrate that a series of constraints that affect proteins will lead to the preference of terminal residues to be located on the surface. Clearly, for extended conformations of proteins, all residues tend to be exposed (Fig. 7). But even for compact conformations, our analysis shows that the exposure profile is quite flat, and all residues tend to be equally exposed (Fig. 7). When only conformations with minimal energy (i.e. native conformations) are considered, terminal residues start to prefer to be located on the surface. When only native conformations with a pronounced energy gap are considered, this tendency increases. If we look at proteins that can fold fast in kinetic simulations, the tendency of terminal residues to be exposed increases further (Fig. 10). Proteins that require terminal residues to be tucked inside the core may be prohibitively complicated to fold. To conclude, we suggest that the tendency of terminal residues of proteins to be located on the surface is a result of thermodynamic and kinetic selection processes. Indeed, model proteins that have been selected using these considerations (Fig. 10) exhibit an exposure profile similar to that of real proteins (Fig. 4). The lattice work presented here is based on small monomeric structures. In the future it might be of interest to examine the model on larger oligomeric structures.


    Acknowledgments
 
The authors thank Inna Myslyuk for assistance with the art work, and Tirza Doniger for useful comments on the manuscript.


    REFERENCES

    Alexandrov, N. (1993) Structural argument for N-terminal initiation of protein folding. Protein Sci., 2, 1989–1991.

    Chan, H.S. and Dill, K.A. (1996) A simple model of chaperonin-mediated protein folding. Proteins, 24, 345–351.

    Chung, J.J., et al. (2002) Functional diversity of protein C-termini: more than zipcoding? Trends Cell Biol., 12, 146–150.

    Christopher, J.A. and Baldwin, T.O. (1996) Implications of N and C-terminal proximity for protein folding. J. Mol. Biol., 257, 175–187.

    Dill, K.A., et al. (1995) Principles of protein folding—a perspective from simple exact models. Protein Sci., 4, 561–602.

    Dixon, H.B.F. (1984) N-terminal modification of proteins. J. Protein Chem., 3, 99–108.

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Krishna, M.M. and Englander, S.W. (2005) The N-terminal to C-terminal motif in protein folding and function. Proc. Natl Acad. Sci. USA, 102, 1053–1058.

    Laio, A. and Micheletti, C. (2006) Are structural biases at protein termini a signature of vectorial folding? Proteins, 62, 17–23.

    Metropolis, N., et al. (1953) Equations of state calculations by fast computing machines. J. Chem. Phys., 21, 1087–1091.

    Miyazawa, S. and Jernigan, R.L. (1993) A new substitution matrix for protein sequence searches based on contact frequencies in protein structures. Protein Eng., 6, 267–278.

    Sali, A., et al. (1994) How does a protein fold? Nature, 369, 248–251.

    Shrake, A. and Rupley, J.A. (1973) Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol., 79, 351–371.

    Skolnick, J. and Kolinski, A. (1991) Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J. Mol. Biol., 221, 499–531.

    Thornton, J.M. and Chakauya, B.L. (1982) Conformation of terminal regions in proteins. Nature, 298, 296–297.

    Thornton, J.M. and Sibanda, B.L. (1983) Amino and carboxy-terminal regions in globular proteins. J. Mol. Biol., 167, 443–460.

    Unger, R. and Moult, J. (1996) Local interactions dominate folding in a simple protein model. J. Mol. Biol., 259, 988–994.

