Bioinformatics Advance Access originally published online on April 3, 2008
Bioinformatics 2008 24(10):1271-1277; doi:10.1093/bioinformatics/btn114

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Prediction of the translocon-mediated membrane insertion free energies of protein sequences

Yungki Park and Volkhard Helms *

Center for Bioinformatics, Saarland University, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Helical membrane proteins (HMPs) play crucial roles in a variety of cellular processes. Unlike water-soluble proteins, HMPs need not only to fold but also get inserted into the membrane to be fully functional. This process of membrane insertion is mediated by the translocon complex. Thus, it is of great interest to develop computational methods for predicting the translocon-mediated membrane insertion free energies of protein sequences.

Results: We have developed Membrane Insertion (MINS), a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. In a benchmark test on 357 known cases, the correlation coefficient between predicted and observed free energies is 0.74, which corresponds to a mean unsigned error of 0.41 kcal/mol. These results are significantly better than those obtained by traditional hydropathy analysis. Moreover, the ability of MINS to reasonably predict membrane insertion free energies of protein sequences allows for effective identification of transmembrane (TM) segments. MINS was subsequently applied to predict the membrane insertion free energies of 316 TM segments found in known structures. An in-depth analysis of the predicted free energies reveals a number of interesting findings about the biogenesis and structural stability of HMPs.

Availability: A web server for MINS is available at http://service.bioinformatik.uni-saarland.de/mins

Contact: volkhard.helms@bioinformatik.uni-saarland.de

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
Helical membrane proteins (HMPs) play crucial roles in diverse cellular processes. Several studies have suggested that HMPs account for 20–30% of the open reading frames of sequenced genomes (Arkin and Brunger, 1998; Jones, 1998; Wallin and von Heijne, 1998). In spite of their biological importance and genomic abundance, only 1.5% of the proteins with known structure are HMPs (Tusnády et al., 2004), and thus the sequence–structure–function relationship of HMPs remains poorly understood.

Unlike water-soluble proteins, HMPs need not only to fold but also get inserted into the membrane to be fully functional. One of the most important steps in the biogenesis of HMPs is, therefore, the recognition and membrane insertion of their transmembrane (TM) segments by the translocon complex (Hessa et al., 2005a; van den Berg et al., 2004). Failure in this step is implicated in the pathology of several diseases, most notably cystic fibrosis (Tector and Hartl, 1999) and epilepsy (Gallagher et al., 2007). Recently, von Heijne and White have devised an experimental scheme for measuring the translocon-mediated membrane insertion free energies of protein sequences (Hessa et al., 2005a) [free energies are meant to be apparent free energies throughout this study, unless otherwise noted]. These experiments revealed that the translocon complex uses a position-dependent hydrophobicity scale in recognizing TM segments. In addition, both hydrophilic and hydrophobic amino acids have been shown to exhibit a significant populational bias at the N- versus C-terminal fractions of TM helices, which was deemed to arise from their distinct snorkelling preferences (Chamberlain and Bowie, 2004; Chamberlain et al., 2004). Taken together, the translocon complex appears to use an asymmetric, position-specific hydrophobicity scale for recognizing TM segments.

At the heart of TM segment recognition by the translocon complex lie the distinct membrane insertion behaviours of amino acids. Several computational studies have addressed this issue by deriving membrane insertion potentials of amino acids on the basis of statistical analysis of known HMP structures (Senes et al., 2007; Ulmschneider et al., 2005). Ideally, such statistical treatments would take into account both the asymmetric depth-dependent properties of biological membranes and the asymmetric position-dependent properties of TM segments, such as the populational biases of amino acids at their N- versus C-terminal fractions induced by the helix directionality. Ulmschneider's potential was the first to be derived from a computational analysis of known structures (Ulmschneider et al., 2005). Since its derivation took into account the asymmetry of membranes, some of the potentials are asymmetric in such a way that they are in good agreement with experimental findings (White and von Heijne, 2005). The procedure for deriving DeGrado's potential is insightful, in that Zmid and n in their Table 1 concisely capture the distinct membrane insertion behaviours of amino acids (Senes et al., 2007). To increase the number of data points, however, its derivation took into account neither asymmetric property. Overall, the two potentials agree closely, with His being a notable exception (Senes et al., 2007).


Table 1. Prediction performance of MINS and hydrophobicity scales

 
In this study, we present the development of Membrane Insertion (MINS), a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. Its development was motivated by the observation that the two previous potentials, due to their neglect of the helix directionality, average out the populational biases of both hydrophilic and hydrophobic amino acids at the N- versus C-terminal fractions of TM segments, resulting in symmetric potentials for many (Ulmschneider's) and all (DeGrado's) amino acids. Another potential advantage of MINS would be that it can be easily applied to large-scale tasks because of its sequence-based nature, unlike the two previous potentials, which require explicit molecular modelling and simulation for estimating membrane insertion free energies. On the other hand, the development of MINS did not take into account the asymmetry of membranes, the significance of which we discuss subsequently (Section 3.5).

A benchmark test on 357 known cases shows that the free energies predicted by MINS closely agree with those experimentally measured. Moreover, MINS is shown to be quite effective in identifying TM segments. MINS was then used to predict the membrane insertion free energies of 316 TM segments occurring in known structures. An in-depth analysis of the predicted free energies provides a number of interesting insights into the biogenesis and structural stability of HMPs.


    2 METHODS
2.1 Dataset
The current study is based on a non-redundant set of 73 protein chains (pairwise identity of 25% or less) extracted from known HMP structures with resolution better than 3.5 Å as of July 2007 (available in the Supplementary Material). Practically, this was done by filtering all the entries deposited in the OPM database (Lomize et al., 2006). For each protein chain, a multiple sequence alignment (MSA) was generated, from which profiles (frequencies of occurrence of the 20 amino acids for each sequence position) were derived using AL2CO (Pei and Grishin, 2001) as described previously (Park and Helms, 2007; Park et al., 2007). Briefly, for a given protein chain, a maximum of 1000 homologous sequences were retrieved from the nr database using BLAST (Altschul et al., 1997). Upon generating initial MSAs using ClustalW (Thompson et al., 1994), sequence fragments were removed, and the remaining sequences were re-aligned to yield a final MSA. When deriving profiles from an MSA, amino-acid frequencies were weighted using a modified method of Henikoff and Henikoff (1994) as implemented in PSI-BLAST (Altschul et al., 1997).

Mammalian secreted and cytoplasmic protein sequences were collected from the SwissProt database (O'Donovan et al., 2002) in the following way. First, SwissProt (version 54.6) was homology reduced at a sequence identity of 80% using the CD-HIT algorithm (Li and Godzik, 2006). Then all proteins annotated to have a signal peptide but no TM segments were taken as secreted proteins. Annotated signal peptides were removed before subsequent analysis. All proteins whose subcellular location is annotated as ‘cytoplasmic’ were taken as cytoplasmic proteins. Altogether, 195 082 sequence segments from 709 secreted mammalian proteins and 222 985 sequence segments from 453 cytoplasmic mammalian proteins were analyzed.

2.2 Derivation of MINS—dataset
For deriving MINS, we considered 13 836 sequence segments, each 19 residues long, belonging to the 73 protein chains. None of these segments has missing profile elements in the MSAs generated as above. Note that these segments also include those that do not have any membrane-embedded residues. We approximated the membrane insertion free energy of each segment by the distance of its middle residue from the membrane centre, as defined in the OPM database (Lomize et al., 2006). In this approximation, all segments with a distance larger than 20 Å were assigned 20 Å. Given the properties of the lipid bilayer-water interface (Granseth et al., 2005; White and Wimley, 1999), any cutoff between 20 Å and 30 Å appears reasonable. We examined all 11 values from 20 Å through 30 Å in steps of 1 Å and found that the results vary little in this range (data not shown). Throughout this study except in Section 3.4, we used the average of 73 matrices, each derived using only 72 protein chains with one protein chain left out. Averaging the 73 matrices has a stabilizing effect on the final matrix, which is sensible given the relatively modest size of the current dataset.
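The construction of regression targets described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `segment_targets` and the input format (a plain list of signed Z coordinates per residue) are assumptions, and the filtering of segments with missing profile elements is omitted.

```python
WINDOW = 19    # segment length used in the text
Z_CAP = 20.0   # distances beyond this value are assigned the cap (in angstrom)

def segment_targets(z_coords, window=WINDOW, z_cap=Z_CAP):
    """Return (start_index, target) for every full window of one chain.

    z_coords: signed Z coordinate (angstrom) of each residue, e.g. taken from OPM.
    The target is the unsigned Z of the window's middle residue, capped at z_cap.
    """
    half = window // 2
    targets = []
    for start in range(len(z_coords) - window + 1):
        middle = start + half
        targets.append((start, min(abs(z_coords[middle]), z_cap)))
    return targets
```

A chain of N residues yields N - 18 windows, which is why segments far from the membrane (capped at 20 Å) are also part of the training set.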

2.3 Derivation of MINS—technical aspects
The set of the 13 836 sequence segments is represented by a matrix X of 13 836 by 381 (380 profile elements and 1). The set of corresponding unsigned Z coordinates is represented by a matrix Y of 13 836 by 1. The matrix to be derived is represented by a matrix β of 381 (380 scale values and an intercept) by 1. For deriving an optimal β, it is pivotal to recognize Equation (1)


Formula 1

(1)
where SSE(β) is the sum of squared errors between the predicted and observed Z coordinates for the 13 836 segments [see Equation (2)], r(β) the correlation coefficient between them and k a constant ≥0. As indicated, SSE(β) and r(β) are functions of β.


$$\mathrm{SSE}(\beta) = (Y - X\beta)^{T}(Y - X\beta) \qquad (2)$$

where $(Y - X\beta)^{T}$ is the transpose of $(Y - X\beta)$. Equation (1) shows that maximization of r(β) is equivalent to minimization of SSE(β). Minimization of SSE(β) is the task of linear regression, and its analytical solution is given in Equation (3) (Hastie et al., 2001).


$$\beta = (X^{T}X)^{-1}X^{T}Y \qquad (3)$$

This means that the first 380 elements of β are the 380 optimal matrix entries. However, the β in Equation (3) might be an overfit to the dataset used whereas a generalizable matrix is desired. In addition, if some elements of β correlate, their values may be ill-defined, exhibiting high variance. To circumvent these possible problems, we used the ridge regression whose analytical solution is given in Equation (4) (Hastie et al., 2001).


$$\beta = (X^{T}X + cI)^{-1}X^{T}Y \qquad (4)$$

In Equation (4), c is a complexity parameter, and I an identity matrix of 381 by 381 where the last diagonal element is set to 0. Obviously, if c approaches infinity, β would approach a null matrix, assigning 0 to all amino acids wherever they occur in a sequence segment. In contrast, if c approaches 0, β in Equation (4) goes back to β in Equation (3). Complexity parameters in the range of 10⁻⁴ to 10² were found to yield nearly identical stable results on the dataset used. The β in Equation (4) with the complexity parameter set to 50 was named MINS.
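The ridge solution of Equation (4) can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions: the function name `ridge_fit` is hypothetical, the design matrix is assumed to carry the constant column of ones last (matching the identity matrix whose last diagonal element is 0), and a linear solve replaces the explicit matrix inverse for numerical stability.

```python
import numpy as np

def ridge_fit(X, y, c):
    """Closed-form ridge solution with an unpenalised intercept.

    X: (n_segments, n_features + 1) design matrix whose last column is all ones.
    y: (n_segments,) unsigned Z coordinates.
    c: complexity parameter (c = 0 gives ordinary least squares).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalty = np.eye(X.shape[1])
    penalty[-1, -1] = 0.0  # last diagonal element is 0: the intercept is not shrunk
    # Solve (X^T X + c I) beta = X^T y instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + c * penalty, X.T @ y)
```

With c = 0 the function reduces to Equation (3); with a very large c the scale values shrink towards 0 while the intercept tends to the mean of y, which mirrors the limiting behaviour described in the text.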

2.4 Predicting TM segments from protein sequences
For the 73 protein chains, TM segments were automatically defined as non-overlapping sequence segments 19 residues long whose average signed Z coordinate lies between −1 Å and 1 Å. All prediction methods were tested using single sequence inputs as defined in each respective PDB file. This particular testing scheme might have adversely affected the performance of some prediction methods. For hydrophobicity scales including MINS, a very simple scheme was adopted for predicting TM segments: a hydropathy plot was generated using the scoring function defined in Equation (5) with the window size set to 19 residues, and then all minima below a threshold t were taken as TM predictions. The threshold t was set by a jack-knife test. Namely, the optimal threshold for a given protein chain was set such that it resulted in the best performance (measured as the number of protein chains for which the number and locations of TM segments were correctly predicted) for the other 72 protein chains. For MINS, this meant that a double jack-knife test had to be performed, because the matrix for a given protein chain was itself derived using the other 72 protein chains. The 5329 (= 73 × 73) matrices for the double jack-knife testing of MINS are available upon request from the authors. Another point worth noting is that the four hydrophobicity scales tested in this study (WW, KD, EIS and GES, see Tables 1 and 2) have traditionally been used for predicting TM segments without scaling via Equation (5). Thus, the scaling via Equation (5) might adversely affect their performance. To ensure their optimal performance, we repeated the same benchmarking for them without the scaling. It turned out that the scaling has no effect on the performance of the Wimley–White octanol (WW), Kyte–Doolittle (KD) and Eisenberg (EIS) scales and improves the performance of the Goldman–Engelman–Steitz (GES) scale. The results in Table 2 are those with the scaling.
For HMMTOP, TMHMM, Phobius and MEMSAT3 with PSSMs from PSI-BLAST, predictions were obtained from each respective web server. A prediction was deemed correct if it overlaps with an experimental annotation for ≥5 residues. Care was taken to make sure that a given experimental annotation was matched with only one TM prediction during evaluation, as discussed before (Chen et al., 2002). The annotated 73 protein chains used in this benchmarking are available in the Supplementary Material.
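The prediction and evaluation scheme of this section can be sketched in a few lines. The helper names below are hypothetical, the input is assumed to be a list of already scaled window scores (one per window start), and the greedy one-to-one matching is only one plausible way to ensure that an annotation is matched with a single prediction; the text does not specify how overlapping minima or ties were handled.

```python
def local_minima_below(scores, threshold):
    """Indices i where scores[i] is a local minimum and lies below threshold."""
    hits = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("inf")
        right = scores[i + 1] if i < len(scores) - 1 else float("inf")
        # Strict on the left, non-strict on the right: a flat bottom is counted once.
        if s < threshold and s < left and s <= right:
            hits.append(i)
    return hits

def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def count_correct(predicted, annotated, min_overlap=5):
    """Count annotations matched by a not-yet-used prediction (greedy, in order)."""
    used = set()
    correct = 0
    for ann in annotated:
        for k, pred in enumerate(predicted):
            if k not in used and overlap(pred, ann) >= min_overlap:
                used.add(k)
                correct += 1
                break
    return correct
```

A minimum at window start i corresponds to the predicted segment (i, i + 18) for a 19-residue window.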


Table 2. Performance comparison for TM segment prediction

 
2.5 Prediction of the membrane insertion free energy for TM segments occurring in known structures
For each protein chain, TM segments were defined as in Section 2.4. Sometimes, the 19-residue long neighbours of such a TM segment (obtained by shifting it 1–3 residues to the left or right) possess lower free energies. Thus, we additionally considered six different boundaries and assigned the lowest free energy. The dihedral angles were computed for the residues within the boundary with the lowest free energy. Free energies were predicted in a jack-knife scheme. Namely, for predicting the free energies of TM segments of a given protein chain, the other 72 protein chains were utilized for deriving a matrix (MINS) and an associated scaling function [Equation (5)].

2.6 Miscellaneous computation
The relative exposure of TM segments was computed using the program suite VOLBL (Edelsbrunner, 1995; Edelsbrunner et al., 1995) as described previously (Park and Helms, 2007; Park et al., 2007). The only difference from previous computations is that, since we are here interested in how much a TM segment gets buried due to the rest of the protein structure, the relative exposure of a TM segment was defined as its accessibility in the protein structure divided by its accessibility when isolated from the rest of the protein structure.

The average of dihedral angles was computed by a weighted iteration procedure such that the average of –160° and 160° is 180° (or equivalently –180°) but not 0°. Standard deviations (SDs) of dihedral angles were computed in the same manner.
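The requirement that the average of –160° and 160° be 180° is the defining property of a circular mean. The sketch below uses the standard vector-sum (atan2) formulation, which is not the weighted iteration procedure used in the study but satisfies the same requirement; the function names and the use of a population SD over wrapped deviations are assumptions.

```python
import math

def circular_mean_deg(angles):
    """Mean direction of angles given in degrees, returned in [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def circular_sd_deg(angles):
    """Root-mean-square deviation from the circular mean, with wrapped differences."""
    mean = circular_mean_deg(angles)
    # Wrap each difference into [-180, 180) so that 170 and -170 are 20 degrees apart.
    devs = [(a - mean + 180.0) % 360.0 - 180.0 for a in angles]
    return math.sqrt(sum(d * d for d in devs) / len(devs))
```

For helical backbone angles clustered around –62° and –41°, the result is practically identical to the arithmetic mean; the circular treatment matters only when values straddle ±180°.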


    3 RESULTS AND DISCUSSION
3.1 MINS—motivation and derivation
The membrane insertion free energies of protein sequences have traditionally been predicted by sliding window-based hydropathy analysis. An implicit assumption of this approach is that the translocon complex applies the same hydrophobicity scale to all sequence positions. Even though some variant methods assign differential weights to the sequence positions, the same hydrophobicity scale is still applied to all sequence positions. Recent experimental and computational studies strongly suggest that (1) the translocon complex uses distinct scales for different sequence positions (Hessa et al., 2005a) and (2) the scales are asymmetric across sequence positions (Chamberlain and Bowie, 2004; Chamberlain et al., 2004). One of the simplest possible generalizations of the traditional constant hydrophobicity-based approach is a matrix with one scale per sequence position. Following the experimental studies in which the translocon-mediated membrane insertion free energies of protein sequences were measured (Hessa et al., 2005a, b, 2007), we based the derivation of the matrix on sequence segments 19 residues long. Whereas an ideal dataset for deriving such a matrix would be a large set of 19-residue long sequence segments and their free energies, we could identify such values for only 357 sequence segments in the literature (Hessa et al., 2005a, b, 2007). Obviously, this is not enough for an accurate derivation of a matrix with 380 elements. Thus, we were forced to take a detour. As described in Section 2.2, we utilized 13 836 sequence segments taken from 73 protein chains with known structure. The membrane insertion free energies of the 13 836 segments are, of course, unknown. Instead, we hypothesized that one may approximate the membrane insertion free energy of a sequence segment by the distance of its middle residue from the membrane centre, the so-called unsigned Z coordinate. The validity of this approach, of course, needs to be checked based on the compatibility of its predictions with known properties of HMPs and the translocon complex (see below). On the basis of this dataset, we derived MINS in such a way that the Z coordinates predicted from it are maximally correlated with the known Z coordinates (Section 2.3).

3.2 Benchmarking of MINS
As mentioned earlier, experimental membrane insertion free energies are known for 357 sequence segments. Since they were not used for the derivation of MINS, they can serve as a good external validation set. Given its derivation procedure, MINS can predict free energies only on a relative scale, so the raw MINS predictions must be scaled on the basis of known cases. To minimize the influence of the scaling function, we used the simplest one, namely linear regression (not ridge regression), as the scaling function [Equation (5)].


Formula 5

(5)
In Equation (5), r is the raw prediction by MINS [Equation (6)] for a given sequence segment of 19-residues long, and a and b are the free parameters of the scaling function having units of energy per mole.


$$r = \sum_{i=1}^{19} M_{ij} \qquad (6)$$

In Equation (6), the index i indicates the sequence position, running from 1 through 19, and the index j indicates the identity of the amino acid at position i. In other words, Mij is the entry in the ith row of the MINS matrix that corresponds to the amino acid at the ith position of the sequence segment. For an unbiased benchmarking of MINS using the 357 known cases, we performed a jack-knife test. Namely, for predicting the membrane insertion free energy of a given case, the other 356 cases were used for estimating a and b of the scaling function. The MINS matrix and the 357 sets of a and b values, as well as a plot of MINS-predicted and experimentally measured free energies for the 357 cases, are available in the Supplementary Material.
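The raw scoring of Equation (6) and the jack-knife linear scaling of Equation (5) can be sketched as below. The function names and the matrix layout (a list of 19 dictionaries keyed by one-letter amino-acid codes) are assumptions. Whether the raw score also includes the intercept of β cannot be told from the text; for the scaled prediction this does not matter, because a constant offset in r is absorbed by the fitted line.

```python
def raw_score(segment, matrix):
    """Sum the position-specific entries for a segment.

    matrix[i][aa] is the entry for amino acid aa at position i (0-based).
    """
    return sum(matrix[i][aa] for i, aa in enumerate(segment))

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def jackknife_predictions(raw, observed):
    """Predict each case with a line fitted to all the other cases.

    raw and observed are lists of equal length (raw scores, measured free energies).
    """
    preds = []
    for k in range(len(raw)):
        xs = raw[:k] + raw[k + 1:]
        ys = observed[:k] + observed[k + 1:]
        slope, intercept = fit_line(xs, ys)
        preds.append(slope * raw[k] + intercept)
    return preds
```

With 357 cases this produces 357 fitted lines, one per left-out case, which corresponds to the 357 sets of a and b values mentioned above.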

The prediction performance of MINS is shown in Table 1. The correlation coefficient (CC) between the experimentally measured ΔGapp and the ΔGapp predicted by MINS is 0.74, which corresponds to a mean unsigned error (MUE) of 0.41 kcal/mol. To assess how significant these results are, the same benchmark test was carried out using four widely known hydrophobicity scales: the WW scale (Wimley et al., 1996) as specified by Jayasinghe et al. (2001), the KD scale (Kyte and Doolittle, 1982), the GES scale (Engelman et al., 1986) and the normalized EIS scale (Eisenberg et al., 1982). Table 1 shows that the results obtained by MINS are significantly better than those obtained by the four hydrophobicity scales, suggesting that the asymmetric, position-specific scales in MINS better mimic the translocon complex.
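For reference, the two performance measures reported in Table 1 can be computed as in the following sketch (the function names are illustrative; the CC is taken to be the Pearson correlation coefficient, which is an assumption).

```python
import math

def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_unsigned_error(predicted, observed):
    """Average absolute difference, in the units of the inputs (e.g. kcal/mol)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```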

Another natural question is how well MINS identifies TM segments in protein sequences, a classical bioinformatics problem in membrane protein research. We stress that it is not our aim to develop a new method for predicting TM segments (or, more broadly, membrane topology), because a vast array of diverse methods has already been proposed for this purpose. Rather, we are interested in checking what practical value MINS would have, exemplified here by the ability to predict TM segments. To this end, 316 TM segments identified from the 73 protein chains were taken as the gold standard, as described in Section 2.4. Table 2 summarizes the benchmark results. To put the results of MINS in context, the same benchmarking was repeated using commonly used prediction methods [MEMSAT3 (Jones, 2007), TMHMM2 (Krogh et al., 2001), HMMTOP (Tusnady and Simon, 1998, 2001) and Phobius (Kall et al., 2004)] as well as the four hydrophobicity scales. MINS is among the best, displaying balanced predictions in the sense that it neither over-predicts nor under-predicts. In addition to its excellent performance for bitopic protein chains, its performance for polytopic protein chains also appears respectable, given the performance of the other tested methods. Thus, the ability of MINS to reasonably predict membrane insertion free energies appears to translate well to this prototypic practical task.

3.3 Interpretation of MINS
Even though the 380 entries of MINS are not to be interpreted as thermodynamic free energy values, they are expected to comply with known properties of the translocon complex. A plot of the 380 entries of MINS is available in the Supplementary Material.

First, we focus on the 10th row of MINS (the scale for sequence position 10 of a 19-residue long sequence segment, referred to as MINS10), because there is a gold standard to compare it with, namely, the biological hydrophobicity scale (Hessa et al., 2005a). Although the biological hydrophobicity scale has been the subject of lively debate (Dorairaj and Allen, 2007; Shental-Bechor et al., 2006), previous statistical studies yielded scales comparable to it (Senes et al., 2007; Ulmschneider et al., 2005), suggesting that the biological hydrophobicity scale is consistent with the biochemistry deduced from known structures. The correlation coefficient between MINS10 and the biological hydrophobicity scale is 0.96, indicating that at least the 10th row of MINS does make sense biologically.

Second, we discuss the overall pattern of the 19 position-specific scale values of MINS for each of the 20 amino acids. The overall V-shape for Phe, Val, Ile and Leu, and the overall inverted V-shape for Asp, Asn, Glu, Gln, Pro, Lys and Arg are compatible with previous observations (Hessa et al., 2005a, b; Senes et al., 2007; Ulmschneider et al., 2005, 2007). In addition, the occurrence of minima for Trp and Tyr at the peripheries of sequence segments is also consistent with their abundance at the polar/non-polar boundary of the membrane bilayer (Domene et al., 2003; Senes et al., 2007; Ulmschneider et al., 2005; Yau et al., 1998). Thr and Gly exhibit rather flat patterns, and Ala displays a shallow minimum at sequence position 10, both as reported previously (Senes et al., 2007; Ulmschneider et al., 2005). Thus, the overall patterns in MINS are in excellent agreement with current knowledge on membrane proteins.

Finally, we check the asymmetries of MINS for both hydrophobic and hydrophilic amino acids. Recently, Bowie and his coworkers noted that seven hydrophilic amino acids (Tyr, Trp, Gln, Asp, His, Arg and Lys) consistently show a strong populational bias at the N- versus C-termini of TM helices (Chamberlain et al., 2004). Among them, Tyr was found to exhibit an unusual populational bias in being favoured at the C-termini of TM helices over the N-termini, which was explained by its peculiar snorkelling preference. This peculiarity of Tyr is exactly captured by MINS. MINS also captures the observed populational biases for the other five hydrophilic amino acids (Trp, Asp, His, Arg and Lys). For Gln, however, a slight preference for the C-terminal positions is observed in MINS, which might be due to different datasets used in that study and this one. Hydrophobic amino acids were found to have a general preference for the C-termini of TM segments over the N-termini (Chamberlain et al., 2004), as in MINS. Thus, the asymmetries found for both hydrophilic and hydrophobic amino acids in MINS appear reasonable in light of previous findings.

3.4 Application of MINS
As described in Section 2.5, MINS was used to assign the membrane insertion free energies of 316 TM segments identified from known structures. The detailed results are available in the Supplementary Material. The distribution of the predicted ΔGapp values is shown in Figure 1. For comparison, the distributions of the predicted free energies for secreted and cytoplasmic proteins are also plotted. Two points are noteworthy. First, TM segments and non-TM proteins are clearly distinguished in an expected way. Second, large fractions of TM segments possess unfavourable free energies, which are not easily reconciled with the idea that TM segments get inserted into the membrane on their own. Instead, the data suggest (1) that some TM segments cooperate during the membrane insertion step, driving assisted membrane insertion of weakly hydrophobic TM segments (Heinrich and Rapoport, 2003; Meindl-Beinker et al., 2006) and/or (2) that some chaperone proteins are involved in the membrane insertion of marginally hydrophobic TM segments. The same analysis using Hessa's recent scoring function (Hessa et al., 2007) yielded nearly identical results (available in the Supplementary Material), lending support to the validity of the current analysis. On the other hand, it should be recalled that free energies here are not ‘true’ free energies but ‘apparent’ free energies, and thus should be interpreted with certain caveats. For this reason, we use such terms as ‘unfavourable’ and ‘favourable’ only in a relative sense.


Fig. 1. Distributions of the predicted membrane insertion free energies of 316 TM segments and secreted and cytoplasmic proteins.

 
Which other features besides their amino-acid sequences correlate with the membrane insertion free energies of TM segments? First, prompted by a suggestion that TM segments of bitopic proteins tend to be more hydrophobic than those of polytopic proteins (Arkin and Brunger, 1998), we computed the average free energy of the 22 TM segments of bitopic proteins and that of the 294 TM segments of polytopic proteins. We find the average free energy of bitopic proteins (1.29 kcal/mol) to be lower than that of polytopic proteins (1.69 kcal/mol). When evaluated by the Wilcoxon rank sum test, however, this difference is not statistically significant (P-value of 0.22). Yet, it is still attractive to speculate that TM segments of bitopic proteins should be hydrophobic enough to insert on their own, whereas those of polytopic proteins do not have to be that hydrophobic because their membrane insertion can be assisted by neighbouring TM segments. Along these lines, we hypothesized that N-terminal TM segments, C-terminal TM segments and internal TM segments of polytopic proteins may exhibit distinct membrane insertion behaviours. For the N-terminal TM segments, there is no preceding segment that can pull them into the membrane during their insertion. For the C-terminal TM segments, there is no following segment that can push them into the membrane during their insertion. In this regard, the N- and C-terminal TM segments of polytopic proteins would somewhat resemble the TM segments of bitopic proteins. Indeed, the average free energy of the N-terminal TM segments of polytopic proteins (51 in total) is 1.21 kcal/mol and that of the internal TM segments (192 in total) is 1.92 kcal/mol. This difference is statistically significant (P < 0.002). The difference between the C-terminal (1.31 kcal/mol) and internal TM segments is also significant (P < 0.006), suggesting that our hypothesis holds. As expected, when the TM segments of bitopic proteins are compared to the internal TM segments of polytopic proteins, the difference gets a bit more pronounced (P-value of 0.08). In summary, TM segments that should be able to insert on their own appear to have more favourable membrane insertion free energies than those whose membrane insertion can be assisted by others.
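The group comparisons above rely on the Wilcoxon rank sum test. A minimal sketch of that test is given below, using the large-sample normal approximation without continuity or tie corrections; this is an assumption, since the text does not state which variant or software was used, so P-values from this sketch need not match the reported ones exactly.

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction).

    Returns (z, p), where z < 0 means sample a tends to have lower values.
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # midrank: average of 1-based ranks i+1..j+1
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p
```

In this setting, `a` and `b` would hold the predicted free energies of two groups of TM segments, for example the 51 N-terminal and the 192 internal segments of polytopic proteins.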

Another characteristic that may correlate with the membrane insertion free energy of a TM segment is its degree of exposure to the membrane in the tertiary structure. The logic behind this hypothesis is as follows. If the folding of an HMP occurs concurrently with the insertion of its TM helices into the membrane, then the immediate environment that inserting TM helices face would be similar to that found in the fully folded tertiary structure. This reasoning predicts that TM segments that are exposed to the membrane in the tertiary structure would possess more favourable free energies than those that are buried. To investigate this issue, we computed the relative exposure of the TM segments of polytopic proteins as described in Section 2.6. As shown in Table 3, the difference in free energy between buried and exposed TM segments is very significant. For example, the average free energy of the 10% most exposed TM segments is 1.22 kcal/mol, while that of the 10% most buried TM segments is 2.65 kcal/mol. The P-value estimating the statistical significance of this difference is < 2.0 × 10⁻⁴. When comparing larger fractions, the difference gets much more pronounced. Thus, our analysis strongly suggests that HMPs fold concurrently with the insertion of their TM segments into the membrane via the translocon complex.


Table 3. Comparison of the membrane insertion free energy between buried and exposed TM segments of polytopic proteins

 
Finally, TM segments with unfavourable membrane insertion free energies are expected to adjust their conformations so that the free energy of the whole system is minimized. A visual inspection of TM segments with unfavourable free energies revealed that they often deviate from the ‘canonical’ helix conformation, which is defined as a conformation with backbone dihedral angles approaching –62° (phi) and –41° (psi), as previously suggested (Barlow and Thornton, 1988; Blundell et al., 1983). To quantify this observation, we compared the average and SD of the backbone dihedral angles. As shown in Table 4, the average dihedral angles of the TM segments with favourable free energies are closer to the canonical values than those of the TM segments with unfavourable free energies. In addition, the SDs are always larger for the TM segments with unfavourable free energies, indicating that their dihedral angles are more widely spread. It may be concluded that the TM configuration of TM segments with unfavourable membrane insertion free energies is stabilized by conformational adaptations.
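
The dihedral-angle comparison can be illustrated with a short sketch that computes the mean, the SD and the offset from the canonical values for two groups of residues. The (phi, psi) values below are hypothetical, the use of the sample SD is an assumption, and plain arithmetic averaging is only appropriate while the angles stay well away from the ±180° wrap-around, as is the case for helical residues.

```python
from statistics import mean, stdev

# Canonical helix dihedral angles in degrees, as quoted in the text.
CANONICAL_PHI, CANONICAL_PSI = -62.0, -41.0


def summarize(dihedrals):
    """Summarize a list of (phi, psi) pairs given in degrees."""
    phis = [phi for phi, _ in dihedrals]
    psis = [psi for _, psi in dihedrals]
    return {
        "phi_mean": mean(phis),
        "phi_sd": stdev(phis),   # sample SD (assumed convention)
        "psi_mean": mean(psis),
        "psi_sd": stdev(psis),
        "phi_offset": abs(mean(phis) - CANONICAL_PHI),
        "psi_offset": abs(mean(psis) - CANONICAL_PSI),
    }


# Hypothetical residues from segments with favourable / unfavourable free energies.
favourable = [(-63.0, -42.0), (-61.0, -40.0), (-64.0, -41.0), (-60.0, -43.0)]
unfavourable = [(-70.0, -35.0), (-55.0, -48.0), (-80.0, -28.0), (-58.0, -45.0)]

fav = summarize(favourable)
unfav = summarize(unfavourable)
for name, stats in (("favourable", fav), ("unfavourable", unfav)):
    print(
        f"{name}: phi = {stats['phi_mean']:.1f} ± {stats['phi_sd']:.1f}, "
        f"psi = {stats['psi_mean']:.1f} ± {stats['psi_sd']:.1f}"
    )
```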


Table 4. Comparison of the dihedral angles

 
3.5 Limitation of MINS
As mentioned at the end of Section 1, we did not distinguish TM segments with the N-in (N-terminus inside) topology from those with the C-in topology in deriving MINS, unlike the approach taken by Ulmschneider et al. (2005). Given the asymmetric properties of biological membranes, the membrane insertion free energy for the N-in topology may not be the same as that for the C-in topology. Thus, it would be desirable to also consider insertion topologies, along with the asymmetric properties of TM segments induced by the helix directionality, in deriving matrices. However, we could not carry out such a complete analysis, primarily because few experimental data are available on this point. Note that the 357 known free energies were all measured in the C-in topology. Thus, until more experimental data become available on how much the insertion topology affects membrane insertion free energies, MINS would represent a satisfactory approximation that is complementary to the potentials of Ulmschneider et al.


    4 CONCLUSION
In this study, we have developed MINS, a novel sequence-based computational method for predicting the translocon-mediated membrane insertion free energies of protein sequences. Benchmarking on 357 known cases shows that free energies predicted by MINS agree closely with those experimentally measured. MINS is also quite effective in predicting TM segments. The scale values of MINS are shown to be consistent with known biochemical features of the translocon complex. An in-depth analysis of the predicted free energies for 316 TM segments identified in known structures provides a number of interesting insights into the biogenesis and stability of HMPs.


    ACKNOWLEDGEMENTS
We thank the crystallographers of HMPs, without whose work the current study would have been impossible.

Funding: This work was supported by Grant I/80469 of the Volkswagen Foundation.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Anna Tramontano

Received on December 10, 2007; revised on March 4, 2008; accepted on March 29, 2008

    REFERENCES

    Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 25:3389–3402.

    Arkin IT, Brunger AT. Statistical analysis of predicted transmembrane alpha-helices. Biochim. Biophys. Acta (1998) 1429:113–128.

    Barlow DJ, Thornton JM. Helix geometry in proteins. J. Mol. Biol (1988) 201:601–619.

    Blundell T, et al. Solvent-induced distortions and the curvature of alpha-helices. Nature (1983) 306:281–283.

    Chamberlain AK, Bowie JU. Analysis of side-chain rotamers in transmembrane proteins. Biophys. J (2004) 87:3460–3469.

    Chamberlain AK, et al. Snorkeling preferences foster an amino acid composition bias in transmembrane helices. J. Mol. Biol (2004) 339:471–479.

    Chen CP, et al. Transmembrane helix predictions revisited. Protein Sci (2002) 11:2774–2791.

    Domene C, et al. Lipid/protein interactions and the membrane/water interfacial region. J. Am. Chem. Soc (2003) 125:14966–14967.

    Dorairaj S, Allen TW. On the thermodynamic stability of a charged arginine side chain in a transmembrane helix. Proc. Natl Acad. Sci. USA (2007) 104:4943–4948.

    Edelsbrunner H. The union of balls and its dual shape. Discrete Comput. Geom (1995) 13:415–440.

    Edelsbrunner H, et al. Measuring proteins and voids in proteins. Proceedings of the 28th Annual Hawaii International Conference on System Sciences, Vol. V: Biotechnology Computing (1995) 256–264.

    Eisenberg D, et al. The helical hydrophobic moment: a measure of the amphiphilicity of a helix. Nature (1982) 299:371–374.

    Engelman DM, et al. Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem (1986) 15:321–353.

    Gallagher MJ, et al. The GABAA receptor alpha1 subunit epilepsy mutation A322D inhibits transmembrane helix formation and causes proteasomal degradation. Proc. Natl Acad. Sci. USA (2007) 104:12999–13004.

    Granseth E, et al. A study of the membrane-water interface region of membrane proteins. J. Mol. Biol (2005) 346:377–385.

    Hastie T, et al. The Elements of Statistical Learning (2001) New York: Springer.

    Heinrich SU, Rapoport TA. Cooperation of transmembrane segments during the integration of a double-spanning protein into the ER membrane. EMBO J (2003) 22:3654–3663.

    Henikoff S, Henikoff JG. Position-based sequence weights. J. Mol. Biol (1994) 243:574–578.

    Hessa T, et al. Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature (2005a) 433:377–381.

    Hessa T, et al. Membrane insertion of a potassium-channel voltage sensor. Science (2005b) 307:1427.

    Hessa T, et al. Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature (2007) 450:1026–1030.

    Jayasinghe S, et al. Energetics, stability, and prediction of transmembrane helices. J. Mol. Biol (2001) 312:927–934.

    Jones DT. Do transmembrane protein superfolds exist? FEBS Lett (1998) 423:281–285.

    Jones DT. Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics (2007) 23:538–544.

    Kall L, et al. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol (2004) 338:1027–1036.

    Krogh A, et al. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol (2001) 305:567–580.

    Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol (1982) 157:105–132.

    Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (2006) 22:1658–1659.

    Lomize MA, et al. OPM: orientations of proteins in membranes database. Bioinformatics (2006) 22:623–625.

    Meindl-Beinker NM, et al. Asn- and Asp-mediated interactions between transmembrane helices during translocon-mediated membrane protein assembly. EMBO Rep (2006) 7:1111–1116.

    O'Donovan C, et al. High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Brief. Bioinform (2002) 3:275–284.

    Park Y, et al. Prediction of the burial status of transmembrane residues of helical membrane proteins. BMC Bioinformatics (2007) 8:302.

    Park Y, Helms V. On the derivation of propensity scales for predicting exposed transmembrane residues of helical membrane proteins. Bioinformatics (2007) 23:701–708.

    Pei J, Grishin NV. AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics (2001) 17:700–712.

    Senes A, et al. E(z), a depth-dependent potential for assessing the energies of insertion of amino acid side-chains into membranes: derivation and applications to determining the orientation of transmembrane and interfacial helices. J. Mol. Biol (2007) 366:436–448.

    Shental-Bechor D, et al. Has the code for protein translocation been broken? Trends Biochem. Sci (2006) 31:192–196.

    Tector M, Hartl FU. An unstable transmembrane segment in the cystic fibrosis transmembrane conductance regulator. EMBO J (1999) 18:6290–6298.

    Thompson JD, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 22:4673–4680.

    Tusnády GE, et al. Transmembrane proteins in the Protein Data Bank: identification and classification. Bioinformatics (2004) 20:2964–2972.

    Tusnády GE, Simon I. Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol (1998) 283:489–506.

    Tusnády GE, Simon I. The HMMTOP transmembrane topology prediction server. Bioinformatics (2001) 17:849–850.

    Ulmschneider MB, et al. Properties of integral membrane protein structures: derivation of an implicit membrane potential. Proteins (2005) 59:252–265.

    Ulmschneider MB, et al. A generalized Born implicit-membrane representation compared to experimental insertion free energies. Biophys. J (2007) 92:2338–2349.

    van den Berg B, et al. X-ray structure of a protein-conducting channel. Nature (2004) 427:36–44.

    Wallin E, von Heijne G. Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci (1998) 7:1029–1038.

    White SH, von Heijne G. Transmembrane helices before, during, and after insertion. Curr. Opin. Struct. Biol (2005) 15:378–386.

    White SH, Wimley WC. Membrane protein folding and stability: physical principles. Annu. Rev. Biophys. Biomol. Struct (1999) 28:319–365.

    Wimley WC, et al. Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides. Biochemistry (1996) 35:5109–5124.

    Yau WM, et al. The preference of tryptophan for membrane interfaces. Biochemistry (1998) 37:14713–14718.

