Bioinformatics Advance Access originally published online on January 18, 2007
Bioinformatics 2007 23(6):701-708; doi:10.1093/bioinformatics/btl653
On the derivation of propensity scales for predicting exposed transmembrane residues of helical membrane proteins
Center for Bioinformatics, Saarland University, Germany
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Helical membrane proteins (HMPs) play a crucial role in diverse physiological processes. Given the difficulty of determining their structures experimentally, it is desirable to develop computational methods for predicting the burial status of transmembrane residues. Deriving a propensity scale for the 20 amino acids to be exposed to the lipid bilayer from known structures is central to developing such methods. A fundamental problem in this regard is how to derive propensity scales optimally. Here, we show that this problem can be reformulated such that an optimal scale is obtained straightforwardly in an analytical fashion. The derived scale compares favorably with others in terms of both algorithmic optimality and practical prediction accuracy. It also allows interesting insights into the structural organization of HMPs. Furthermore, the presented approach can be applied to other bioinformatics problems of HMPs.
All the data sets and programs used in the study and detailed primary results are available upon request.
Contact: volkhard.helms{at}bioinformatik.uni-saarland.de
| 1 INTRODUCTION |
|---|
Helical membrane proteins (HMPs) play a crucial role in diverse physiological processes, including energy generation, signal transduction, the transport of solutes across the membrane, and the maintenance of ionic and proton gradients. Several studies have suggested that HMPs account for 20–30% of open reading frames of sequenced genomes (Liu et al., 2002; Wallin and von Heijne, 1998). In spite of their physiological importance and genomic abundance, < 1% of the proteins with known structure are HMPs (Chen and Rost, 2002).
Given this circumstance, it is desirable to develop computational methods for predicting structural aspects of HMPs. At the heart of such efforts lies the development of a propensity scale for the 20 amino acids to be exposed to the lipid bilayer (Adamian et al., 2005; Beuming and Weinstein, 2004; Pilpel et al., 1999). Based on the recently increased number of experimentally determined 3D structures, Beuming and Weinstein derived a knowledge-based scale (the BW scale), which in combination with sequence conservation patterns enabled them to predict the burial status of TM residues with an accuracy of 80% (Beuming and Weinstein, 2004). Remarkably, Adamian and Liang (the TMLIP1/TMLIP2 scales) achieved a prediction accuracy of 88% in a similar study by taking advantage of the fact that most helix–helix interactions in the TM region can be recapitulated as occurring between two heptad repeat frames originally developed for coiled coils (Adamian and Liang, 2006).
The ways the BW and TMLIP1/TMLIP2 scales were derived represent three different learning algorithms for deriving a propensity scale from known structures. Our previous study revealed that the three algorithms are not equally effective (Park and Helms, 2006). A natural question that arises is which algorithm works better and why. Or, more fundamentally, what would be the optimal way of deriving a propensity scale? This is an important problem not only from an algorithmic viewpoint but also from a practical viewpoint. An optimal algorithm would yield a propensity scale that faithfully captures the affinities of the 20 amino acids to preferentially interact with the lipid bilayer as reflected in experimental HMP structures. The knowledge of such a scale might allow interesting insights into the folding of HMPs. In this article, we show that one can reformulate this problem by selecting a sensible objective function such that an optimal scale (the MO scale) is obtained straightforwardly in an analytical fashion. A comparative analysis reveals that the MO scale compares favorably with others not only in terms of algorithmic optimality but also in terms of practical prediction accuracy. The MO scale also suggests interesting insights into the structural organization of HMPs compared with that of soluble proteins. Moreover, we show that the algorithm used for deriving the MO scale can also be applied to other bioinformatics problems of HMPs.
| 2 METHODS |
|---|
2.1 Generation of the data set
A non-redundant high-quality data set (<25% pairwise sequence identity and resolution better than 3.0 Å) was generated based on the lists of HMPs with known structure compiled by White (http://blanco.biomol.uci.edu) and by Michel (http://www.mpibp-frankfurt.mpg.de/michel/public/memprotstruct.html) as of September 2006. Some protein chains were omitted despite satisfying the above criteria, either because we could not retrieve a sufficient number of homologous sequences from sequence databases or because the average pairwise identity of the aligned sequences was greater than 80% (i.e. a diverse set of homologous sequences was not available). The final data set comprises 41 protein chains with 2901 TM residues (Table 1). To keep the data set homogeneous, residues located outside the hydrophobic core of the lipid bilayer were excluded. The hydrophobic core of each protein chain, defined as the region for which the probability of occurrence of the hydration waters of the lipid head groups is zero (White and Wimley, 1999), was derived from the location of the carbonyl groups of the lipid molecules along the membrane normal and the effective hydration profile obtained from the OPM database (Lomize et al., 2006a, b).
2.2 Computation of exposure patterns
The classification of a residue as being exposed or buried was based on its relative solvent-accessible surface area (rSASA) value. Several choices need to be made for the accurate computation of rSASA values. First, the probe radius should be properly chosen. Previous studies used probe radii of 1.4 Å (the approximate radius of a free water molecule) or 1.9 Å (the approximate radius of a –CH2– group) (Adamian et al., 2005; Beuming and Weinstein, 2004). Given that the solvents surrounding the hydrophobic core parts of HMPs are hydrocarbon chains of phospholipids, we believe that 1.4 Å is not a proper choice. Nor is 1.9 Å suitable, because the CH2 group of phospholipids is part of a long hydrocarbon chain and does not have the full mobility of a free –CH2– group. Thus, a value larger than 1.9 Å that approximates the effective radius of the CH2 group of hydrocarbon chains should be chosen. In this study, we empirically set the probe radius to 2.2 Å. Second, when necessary, the two faces of the TM region (the cytoplasmic and exoplasmic faces) were capped with dummy atoms before computing SASA values. Many HMPs contain large internal cavities, and, without capping, large SASA values were assigned to residues lining internal cavities, making these residues look as if they were facing the lipid bilayer. Upon capping, internal cavities that are inaccessible to the probe were identified and excluded in computing SASA values. Actual computations were carried out by using the program suite VOLBL (Edelsbrunner, 1995; Edelsbrunner et al., 1995). SASA values were normalized by dividing them by reference values to yield rSASA values. The reference value for an amino acid, X, is its SASA in the context of a nonapeptide helix GGGG-X-GGGG. Which reference state to use is an open issue. GGGG-X-GGGG and G-X-G have been used in similar work (Adamian et al., 2005; Beuming and Weinstein, 2004).
In our case, essentially the same results were obtained using G-X-G of a helical conformation as a reference state (see Supplementary Information).
Another point to be clarified in computing rSASA values is whether to use a monomeric or oligomeric form. Since there are few experimental data for the oligomerization of HMPs, it is not clear in most cases which form to choose. Presumably for this reason, different studies from different groups as well as different studies from the same group adopted HMPs of different oligomeric status in deriving propensity scales (Adamian and Liang, 2006; Adamian et al., 2005; Beuming and Weinstein, 2004). Our guiding principle was the degrees of conservation for the residues involved in the oligomerization. The very reason that buried residues tend to be more conserved than exposed ones (Baldwin et al., 1997; Donnelly et al., 1993; Stevens and Arkin, 2001; Yeates et al., 1987) is that they are central to maintaining structural and/or functional integrity. Thus, we reasoned that if oligomeric forms are absolutely necessary for whatever reasons, this obligatory nature should be reflected in the degrees of conservation for the residues involved in the oligomerization. The analysis revealed that the potassium channel (1R3J in Table 1) is the only one for which the use of oligomeric form is justified (see Supplementary Information). Cytochrome bc1 complex is known to function as a dimer, and the 1PP9 structure in fact reveals a dimeric form (Huang et al., 2005). However, the dimerization is mediated by residues located outside of the hydrophobic core, which is why a monomeric form is also adopted for this protein.
2.3 Computation of profiles, positional scores and conservation indices
In general, the use of a profile (the frequencies of the 20 amino acids for a sequence position) improves the performance of sequence-based prediction methods. Thus, we derived the profiles of the protein chains in Table 1 as described before (Park and Helms, 2006). Briefly, for each protein chain, a maximum of 1000 homologous sequences were retrieved from the non-redundant database using BLAST (Altschul et al., 1997). Initial MSAs were then built by using ClustalW (Thompson et al., 1994). Then, sequence fragments were deleted from the MSA. Sequences that are <25% identical to the query sequence were also removed. The remaining sequences were realigned using ClustalW to yield a final MSA, which was used to obtain the profiles. When deriving profiles from an MSA, amino acid frequencies were weighted using a modified method of Henikoff and Henikoff as implemented in PSI-BLAST (Altschul et al., 1997; Henikoff and Henikoff, 1994). Actual computations were performed using the program AL2CO (Pei and Grishin, 2001), which is freely available at ftp://iole.swmed.edu/pub/al2co.
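The profile construction described above can be illustrated with a minimal sketch. It assumes the original position-based weights of Henikoff and Henikoff (1994), not the modified PSI-BLAST variant implemented in AL2CO, and it treats gaps as ordinary symbols when weighting while skipping them in the profile; the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def henikoff_weights(msa: List[str]) -> List[float]:
    """Position-based sequence weights, normalized to sum to 1."""
    weights = [0.0] * len(msa)
    for col in range(len(msa[0])):
        column = [seq[col] for seq in msa]
        counts = Counter(column)   # occurrences of each symbol in this column
        n_types = len(counts)      # number of distinct symbols in this column
        for s, symbol in enumerate(column):
            # Rare symbols in diverse columns contribute the largest weight.
            weights[s] += 1.0 / (n_types * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_profile(msa: List[str], weights: List[float]) -> List[Dict[str, float]]:
    """Weighted amino acid frequencies per column (gaps '-' are skipped)."""
    profile = []
    for col in range(len(msa[0])):
        freqs: Dict[str, float] = {}
        for seq, w in zip(msa, weights):
            if seq[col] != "-":
                freqs[seq[col]] = freqs.get(seq[col], 0.0) + w
        total = sum(freqs.values())
        profile.append({aa: f / total for aa, f in freqs.items()} if total else {})
    return profile


# Toy alignment (hypothetical): the divergent third sequence gets the largest weight.
msa_demo = ["AL", "AL", "GL"]
w_demo = henikoff_weights(msa_demo)
profile_demo = weighted_profile(msa_demo, w_demo)
```

In this toy case the two identical sequences share their weight, so the single G at the first position counts for more than one third of the column.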
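For a given propensity scale P, the positional score of sequence position i, S_P(i), is computed as

$$S_P(i) = \sum_{j=1}^{20} f_{ij}\, P_j \tag{1}$$

where f_{ij} is the profile frequency of amino acid type j at sequence position i and P_j is the propensity value of amino acid type j.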
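As a small illustration of the positional score, the sketch below assumes that the score is the profile-weighted sum of propensity values, i.e. a linear combination of the profile elements as stated in Section 3.6. The toy scale and profile are hypothetical and are not values from any published scale.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positional_scores(profile: List[Dict[str, float]],
                      scale: Dict[str, float]) -> List[float]:
    """Return the sum over amino acid types of frequency times propensity, per position."""
    return [
        sum(freqs.get(aa, 0.0) * scale[aa] for aa in AMINO_ACIDS)
        for freqs in profile
    ]


# Hypothetical toy scale: only L and G carry non-zero propensities.
toy_scale = {aa: 0.0 for aa in AMINO_ACIDS}
toy_scale.update({"L": 1.0, "G": -1.0})

# Two-position toy profile: frequencies at each position sum to 1.
toy_profile = [{"L": 0.75, "G": 0.25}, {"G": 1.0}]
toy_scores = positional_scores(toy_profile, toy_scale)
```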
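Conservation indices were estimated by using the variance-based method (Pei and Grishin, 2001). Our previous study showed that this method performs slightly better than other alternatives. The conservation index of sequence position i is

$$C(i) = \sqrt{\sum_{j=1}^{20} \left(f_{ij} - \bar{f}_j\right)^2} \tag{2}$$

where f_{ij} is the frequency of amino acid type j at position i and \bar{f}_j is the overall frequency of amino acid type j in the alignment.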
2.4 Performance measures
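The correlation coefficient (cc) for a set of n data points (x_i, y_i) was computed as follows:

$$cc = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{3}$$

where \bar{x} and \bar{y} are the means of the x_i and y_i, respectively.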
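The accuracy of predicting the burial status of TM residues was computed as the fraction of correctly classified residues:

$$\text{accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}} \tag{4}$$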
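The two performance measures can be sketched as follows. Pearson's correlation coefficient is the standard definition; the accuracy function assumes that accuracy means the fraction of residues whose predicted burial status matches the observed one, which is an assumption about the exact form used in this study. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def accuracy(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


cc_up = pearson_cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])      # perfectly correlated
cc_down = pearson_cc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])    # perfectly anti-correlated
acc_demo = accuracy([True, False, True, True], [True, False, False, True])
```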
| 3 RESULTS AND DISCUSSION |
|---|
3.1 Problem statement
Given a set of known HMP structures, the task is to derive an optimal propensity scale of the 20 amino acids to be exposed to the lipid bilayer. The derived scale would faithfully capture the affinities of the 20 amino acids to preferentially interact with the lipid bilayer. Also, it would allow one to predict exposed residues from the sequence with the highest possible accuracy under a linear regime as represented by equation 1.
3.2 Overview of the previous algorithms
Before introducing our novel learning algorithm, it is helpful to review the previous algorithms that were used to derive the BW and TMLIP1/TMLIP2 scales. It is often difficult to directly compare performance values of different learning algorithms reported in different studies, because of the variety of data sets used and the discrepancy in state definitions. To facilitate a transparent performance comparison, we implemented the algorithms for the BW and TMLIP1/TMLIP2 scales and carried out comparisons on the common data set (Table 1).
The BW scale was derived in the following way (Beuming and Weinstein, 2004).
- For each amino acid type, compute its SF value, which is a sum of surface fraction values of exposed TM residues (defined as those with an rSASA >0.10 when computed by using a probe with radius of 1.4 Å and the reference value from a tripeptide G-X-G in extended conformation).
- Identify the highest and lowest SF values (SFhigh and SFlow).
- Compute a propensity value for amino acid type j as (SFj − SFlow)/(SFhigh − SFlow).
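The three steps of the BW derivation can be sketched as below. The sketch assumes that the "surface fraction" of a residue is its rSASA value and that the input is a flat list of (amino acid, rSASA) pairs; both the input layout and the function name are hypothetical, and the numbers are toy values.

```python
from typing import Dict, List, Tuple


def bw_scale(residues: List[Tuple[str, float]], cutoff: float = 0.10) -> Dict[str, float]:
    """Min-max normalized sum of surface fractions of exposed residues per type."""
    sf: Dict[str, float] = {}
    for aa, rsasa in residues:
        sf.setdefault(aa, 0.0)
        if rsasa > cutoff:          # only exposed residues contribute to SF
            sf[aa] += rsasa
    sf_low, sf_high = min(sf.values()), max(sf.values())
    if sf_high == sf_low:
        raise ValueError("all SF values are equal; the scale is undefined")
    return {aa: (value - sf_low) / (sf_high - sf_low) for aa, value in sf.items()}


# Toy input (hypothetical): L is mostly exposed, G is buried.
bw_demo = bw_scale([("L", 0.5), ("L", 0.3), ("G", 0.05), ("A", 0.2)])
```

By construction the most exposed type receives 1 and the least exposed type receives 0, so the scale carries no normalization by the abundance of each amino acid type.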
The TMLIP1 scale was derived as follows (Adamian et al., 2005).
- Compute Nj,s (the number of exposed TM residues of type j, with exposed being defined as rSASA >0.0 when computed by using a probe with radius of 1.9 Å), Ns (the number of exposed TM residues of all types), Nj (the number of TM residues of type j), N (the number of all TM residues).
- Pj,s = Nj,s/Ns, and Pj = Nj/N.
- Compute a propensity value for amino acid type j as log(Pj,s/Pj).
The TMLIP2 scale was derived analogously, with buried residues as the reference population.
- Compute Nj,s, Ns as for the TMLIP1 scale.
- Compute Nj (the number of buried TM residues of type j), N (the number of buried TM residues of all types).
- Pj,s = Nj,s/Ns, and Pj = Nj/N.
- Compute a propensity value for amino acid type j as log(Pj,s/Pj).
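The two log-odds recipes above differ only in the reference population: all TM residues for TMLIP1 and buried TM residues for TMLIP2. The sketch below assumes natural logarithms and a flat list of (amino acid, rSASA) pairs as input; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tmlip_scales(residues: List[Tuple[str, float]],
                 cutoff: float = 0.0) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (TMLIP1-like, TMLIP2-like) log-odds scales from exposure counts."""
    exposed = Counter(aa for aa, rsasa in residues if rsasa > cutoff)
    buried = Counter(aa for aa, rsasa in residues if rsasa <= cutoff)
    total = exposed + buried
    n_s, n_b, n = sum(exposed.values()), sum(buried.values()), len(residues)
    tmlip1: Dict[str, float] = {}
    tmlip2: Dict[str, float] = {}
    for aa in total:
        if exposed[aa] == 0:
            continue                # log(0) is undefined; skip this type
        p_js = exposed[aa] / n_s    # frequency among exposed residues
        tmlip1[aa] = math.log(p_js / (total[aa] / n))       # reference: all residues
        if buried[aa] > 0:
            tmlip2[aa] = math.log(p_js / (buried[aa] / n_b))  # reference: buried residues
    return tmlip1, tmlip2


# Toy data (hypothetical): L is exposed 3 times out of 4, G once out of 4.
residues_demo = [("L", 0.4)] * 3 + [("L", 0.0)] + [("G", 0.3)] + [("G", 0.0)] * 3
tmlip1_demo, tmlip2_demo = tmlip_scales(residues_demo)
```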
3.3 Derivation of an optimal propensity scale
Our starting point is fundamentally different from the approaches for the above three scales. We first ask what is meant by a propensity scale being optimal. In other words, how would one compare different propensity scales? Our answer is how strongly correlated the positional scores derived from a scale for given profiles (equation 1) are with the corresponding exposure patterns (rSASA values in our context). In fact, our answer is not novel. This measure (the Pearson's correlation coefficient between the positional scores and rSASA values) has been, for a long time, known to be a key property measuring the fundamental goodness of a prediction method and extensively used in the realm of bioinformatics of soluble proteins (Adamczak et al., 2004; Ahmad et al., 2003; Chen and Zhou, 2005; Li and Pan, 2001; Nguyen and Rajapakse, 2006; Pollastri et al., 2002; Rost and Sander, 1994; Sim et al., 2005; Thompson and Goldstein, 1996).
Now that the meaning of a scale being optimal is clear, our task is to derive a propensity scale in such a way that the positional scores derived from it for given profiles are maximally correlated with the corresponding exposure patterns. Optimization techniques such as gradient-based optimization and Monte Carlo techniques may be used for this purpose. However, they usually do not guarantee the optimality of obtained solutions. For this reason, they have to be run several times with different starting points.
We found, however, that an exact solution to this problem can be obtained straightforwardly in an analytical fashion. Equation 5 provides the essential hint for our finding.
|
| (5) |
0, and r(β) the Pearson's correlation coefficient between them.
|
| (6) |
|
| (7) |
We would like to extract a general picture of the structural characteristics of HMPs from the limited data set of Table 1. The MO scale obtained by Equation 7 might overfit the data set of Table 1. To obtain a generalizable picture, we used ridge regression (equivalent to weight decay methods) with the complexity parameter empirically set to 0.00001 (Hastie et al., 2001). Regarding the choice of the complexity parameter, note that values that are too small, e.g. 10⁻¹⁰, are likely to yield an MO scale that overfits the data set, while values that are too large might induce an unreasonably high degree of compression, assigning propensity values close to 0 to all amino acids.
The justification for using the correlation measure as an objective function is now clear. It is naturally connected to the sum of squared errors loss function, which enables one to treat the whole problem in an analytical fashion. Accordingly, the MO scale is guaranteed to be optimal in the linear regime, unlike others. Most importantly, this guaranteed optimality allows one to perform novel analyses with it (see Section 3.5).
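The link between the squared-error loss and the correlation objective can be illustrated with a minimal ridge-regression sketch. It is not the authors' implementation: it assumes the textbook centered ridge solution, in which the scale solves (XᵀX + λI)β = Xᵀy for centered profiles X and centered rSASA values y, and it relies on the standard result that least-squares fitted values are the linear combination of the predictors most strongly correlated with y. The three-letter alphabet, the data and all names are hypothetical.

```python
import math
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_scale(profiles: List[List[float]], rsasa: List[float],
                lam: float = 1e-5) -> List[float]:
    """Propensity values minimizing squared error plus lam times the squared norm."""
    n, k = len(profiles), len(profiles[0])
    col_mean = [sum(row[j] for row in profiles) / n for j in range(k)]
    y_mean = sum(rsasa) / n
    xc = [[row[j] - col_mean[j] for j in range(k)] for row in profiles]
    yc = [y - y_mean for y in rsasa]
    gram = [[sum(xc[i][p] * xc[i][q] for i in range(n)) + (lam if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    rhs = [sum(xc[i][p] * yc[i] for i in range(n)) for p in range(k)]
    return solve_linear(gram, rhs)


def pearson_cc(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Toy data: rSASA is exactly linear in the profile, with true values (0.6, 0.2, 0.0).
profiles_demo = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0], [0, 0.5, 0.5], [0.5, 0, 0.5]]
rsasa_demo = [0.6, 0.2, 0.0, 0.4, 0.1, 0.3]
scale_demo = ridge_scale(profiles_demo, rsasa_demo)
scores_demo = [sum(f * p for f, p in zip(row, scale_demo)) for row in profiles_demo]
cc_demo = pearson_cc(scores_demo, rsasa_demo)
```

Because profile rows sum to 1, a constant can be added to every propensity value without changing the correlation; the small ridge term removes this ambiguity, so only differences between propensity values should be compared with the generating values.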
3.4 Comparative analysis
A jackknife test was used for measuring the performances of the propensity scales. For each protein chain in Table 1, four different positional scores were derived from its profile and temporary BW, TMLIP1, TMLIP2 and MO scales that were derived from the data set of Table 1 excluding the protein chain in question. Then, the performance of each scale was assessed in two complementary ways. First, by Pearson's correlation coefficient between the computed positional scores and rSASA values, corresponding to algorithmic optimality since we define being good as being strongly correlated. Second, by the accuracy of predicting the burial status of TM residues (equation 4). This corresponds to what we mean by practical prediction accuracy. Upon deriving positional scores, residues whose scores are higher than a cutoff value are classified as being exposed while those with a lower score are classified as being buried. The cutoff value is objectively defined by a linear support vector machine on the basis of a training data set excluding the protein chain in question. We made use of the SVM implemented in R for this task with all parameters set to default values (Hsu and Lin, 2002; Karatzoglou et al., 2006; R Development Core Team, 2004).
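The leave-one-chain-out protocol can be sketched as follows. Several parts are stand-ins and not the procedure of this study: the scale is simply the mean rSASA per amino acid type computed from single sequences instead of profiles, the cutoff is the score threshold with the best training accuracy instead of a linear SVM, and the rSASA threshold that defines "exposed" is a free parameter. All names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Residue = Tuple[str, float]  # (amino acid type, rSASA)


def derive_scale(residues: List[Residue]) -> Dict[str, float]:
    """Stand-in scale: mean rSASA per amino acid type."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for aa, rsasa in residues:
        sums[aa] = sums.get(aa, 0.0) + rsasa
        counts[aa] = counts.get(aa, 0) + 1
    return {aa: sums[aa] / counts[aa] for aa in sums}


def choose_cutoff(scores: List[float], exposed: List[bool]) -> float:
    """Score threshold with the best training accuracy (stand-in for the SVM)."""
    def n_correct(cutoff: float) -> int:
        return sum((s > cutoff) == e for s, e in zip(scores, exposed))
    return max(sorted(set(scores)), key=n_correct)


def jackknife(chains: Dict[str, List[Residue]], exposed_above: float) -> Dict[str, float]:
    """Per-chain accuracy, with scale and cutoff derived without that chain."""
    results: Dict[str, float] = {}
    for held_out, test_residues in chains.items():
        train = [r for cid, rs in chains.items() if cid != held_out for r in rs]
        scale = derive_scale(train)
        default = sum(scale.values()) / len(scale)   # for types unseen in training
        cutoff = choose_cutoff([scale[aa] for aa, _ in train],
                               [rsasa > exposed_above for _, rsasa in train])
        hits = sum((scale.get(aa, default) > cutoff) == (rsasa > exposed_above)
                   for aa, rsasa in test_residues)
        results[held_out] = hits / len(test_residues)
    return results


chains_demo = {
    "c1": [("L", 0.6), ("G", 0.0), ("L", 0.5)],
    "c2": [("L", 0.7), ("G", 0.1), ("G", 0.0)],
    "c3": [("L", 0.4), ("G", 0.05), ("L", 0.8)],
}
jackknife_demo = jackknife(chains_demo, exposed_above=0.2)
```

The essential point is that neither the scale nor the cutoff ever sees the held-out chain, so the reported accuracy is not inflated by fitting to the test data.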
The results of the comparative analysis are shown in Table 2. (It is to be noted that figures in Table 2 are only for the purpose of comparing the intrinsic goodness of the four propensity scales. For predicting the burial status in real applications, one would rely on more advanced methods along with other information available, e.g. conservation indices.) Table 2 reveals that the MO scale outperforms the others in terms of algorithmic optimality. In terms of practical prediction accuracy, the MO scale is better than the BW and TMLIP1 scales and compares favorably with the TMLIP2 scale. Thus, Table 2 experimentally validates the practical virtues of the logic behind the derivation of the MO scale.
A detailed analysis revealed two trends in the prediction results (see Supplementary Information). One is that the MO scale achieved more balanced predictions than others. The other is that, as the proportion of buried residues in the data set increased, the accuracy of predicting buried residues as being buried improved. This is possibly due to the weakening of bias introduced in the partitioning of the data set. The better performance of the TMLIP1/TMLIP2 scales over the BW scale suggests that it pays off to include normalization steps in deriving a propensity scale. The better performance of the TMLIP2 scale over the TMLIP1 scale indicates that not every null model is equally effective for normalization. On the other hand, the overall weak correlation between the positional scores computed from the MO scale and rSASA values of TM residues supports the suggestion that the lipophobic effect does not play a dominant role in folding of HMPs (Faham et al., 2004).
3.5 The MO scale
In addition to checking how the performance of the MO scale compares with that of others, it is of interest to find out how the MO scale itself compares with other scales because the MO scale accurately captures the affinities of the 20 amino acids to preferentially interact with the hydrophobic core of the lipid bilayer as reflected in experimental HMP structures.
Table 3 lists the MO scale, and Table 4 shows its correlations with other scales. As shown in Table 4, there is a strong correlation between the MO scale and other structure-based propensity scales. In contrast, the MO scale correlates poorly with hydrophobicity scales such as KD, EIS, GES, WW and Hessa. This observation supports the suggestion that the scale used by the translocon for recognizing TM segments is not the same as that for constrained partitioning of TM residues between being buried and exposed to the lipid bilayer (Pilpel et al., 1999).
Perhaps most striking is the observation that the MO scale for HMPs exhibits the strongest correlation with partial specific volume (PSV). Since the propensity values captured by the MO scale should reflect a net result of numerous complex interactions involved in the folding of HMPs, they are not expected to display a strong correlation with a single scale. The correlation with PSV is even stronger than that with the structure-based ones. However, the correlation with a related scale, the bulkiness scale, does not stand out as strongly. As a control experiment, we derived an analogous MO scale for a representative set of soluble protein structures (see Supplementary Information). As expected, the MO scale for soluble proteins strongly correlates with hydrophobicity scales. Yet, it displays only a weak correlation with PSV.
In an effort to facilitate the interpretation of the MO scales for HMPs and soluble proteins in terms of more intuitive scales such as the hydrophobicity scales and the Cohn & Edsall partial specific volumes, they were decomposed into binary combinations of these scales using a linear regression analysis (for a meaningful analysis, each scale was standardized to have a mean of 0 and an SD of 1). As shown in Table 5, the MO scale for HMPs can be interpreted as a hybrid of 0.7 of PSV with 0.3 of a hydrophobicity scale. In contrast, the MO scale for soluble proteins appears almost the same as hydrophobicity scales.
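The decomposition into a binary combination of two scales can be sketched as a two-predictor least-squares fit on standardized values. The sketch assumes the population standard deviation for standardization and no intercept (all variables have mean 0 after standardization); the five-point toy scales are hypothetical and are not actual PSV or hydrophobicity values.

```python
import statistics
from typing import List, Tuple


def standardize(values: List[float]) -> List[float]:
    """Shift and rescale to mean 0 and (population) SD 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def decompose(target: List[float], scale_a: List[float],
              scale_b: List[float]) -> Tuple[float, float]:
    """Least-squares weights (w_a, w_b) such that target is close to w_a*a + w_b*b."""
    t, a, b = standardize(target), standardize(scale_a), standardize(scale_b)
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(x * y for x, y in zip(a, b))
    sat = sum(x * y for x, y in zip(a, t))
    sbt = sum(x * y for x, y in zip(b, t))
    det = saa * sbb - sab * sab          # zero only if a and b are collinear
    return (sat * sbb - sbt * sab) / det, (sbt * saa - sat * sab) / det


# Toy check: a target built as 0.7 of one scale plus 0.3 of the other.
a_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
b_demo = [2.0, 1.0, 4.0, 3.0, 6.0]
target_demo = [0.7 * x + 0.3 * y for x, y in zip(standardize(a_demo), standardize(b_demo))]
w_a, w_b = decompose(target_demo, a_demo, b_demo)
```

Standardizing the target rescales both weights by the same factor, so the recovered ratio w_a : w_b is 7 : 3 while the absolute values differ from 0.7 and 0.3.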
These analyses reveal the organizing principles of each type of protein structures. The strong correlation of the MO scale for soluble proteins with hydrophobicity scales and the decomposition analysis indicate that the hydrophobic effect is a major force behind the folding of soluble proteins, as has been long known (Pace et al., 1996). Regarding HMPs, two points are to be noted. First, the moderate correlation of the MO scale for HMPs with hydrophobicity scales and the decomposition analysis indicate that hydrophobicity still plays a role, albeit much weaker compared with the case of soluble proteins, in the thermodynamic stability of HMPs. Second, the strong correlation of the MO scale for HMPs with PSV and the decomposition analysis suggest, unexpectedly, that the structural organization of HMPs is better captured by PSV than by hydrophobicity. Our interpretation of this observation is as follows. In soluble proteins, functionally important residues are usually on the surface, and structural scaffolds are maintained by other residues buried inside. In this sense, functional and structural integrities are served by separate groups of residues. If this is not the case, a tradeoff between function and structural stability is to be made (Meiering et al., 1992; Schreiber et al., 1994; Shoichet et al., 1995; Zhang et al., 1992). In HMPs, functionally important residues are usually found buried inside. Also, HMPs are usually well-packed, presumably to compensate for the lack of the hydrophobic effect as a driving force for folding. Thus, functional and structural integrities are served by similar groups of residues, and one has to compromise between function and structural stability. One suitable way of doing so would be, whenever possible, to select amino acids with smaller partial specific volumes over those with larger partial specific volumes in the buried positions. 
As specific examples, we show the values from the MO and PSV scales for similarly charged/polar amino acids: S (MO: –0.19, PSV: 0.63) versus T (MO: –0.18, PSV: 0.70), R (MO: –0.21, PSV: 0.70) versus K (MO: –0.10, PSV: 0.82), D (MO: –0.27, PSV: 0.60) versus E (MO: –0.20, PSV: 0.66) and N (MO: –0.23, PSV: 0.62) versus Q (MO: –0.22, PSV: 0.67). In each pair, the amino acid with the stronger tendency to be buried (the more negative MO value) has the smaller partial specific volume. These analyses come with due caveats owing to the small size of the current data set.
3.6 Applications
The approach for the derivation of the MO scale can also be applied to other bioinformatics problems of HMPs. We use the ProperTM method (Beuming and Weinstein, 2004) as an example for demonstration. ProperTM combines positional scores from the BW scale and sequence conservation patterns for improved predictions of the burial status of TM residues. Technically, it computes the overall score for sequence position i, OS(i), as 0.5 × [C(i) − SBW(i)], where C(i) is the conservation index for sequence position i and SBW(i) the positional score for sequence position i computed from the BW scale via equation 1. Since SBW(i) is a linear combination of the profile elements, the overall approach of ProperTM to derive an overall score for sequence position i can be cast as follows:
|
| (8) |
|
| 4 CONCLUSION |
|---|
The current study introduces a novel way of deriving a propensity scale for the 20 amino acids to be exposed to the lipid bilayer from known structures. The derived scale (the MO scale) favorably compares with others in terms of both algorithmic optimality and practical prediction accuracy. The MO scale also suggests interesting insights into the structural organization of HMPs. In addition, the same approach can be applied to the problem of optimally combining a propensity scale and sequence conservation patterns under a linear regime, as well.
| ACKNOWLEDGEMENTS |
|---|
This work was supported by Grant I/80469 of the Volkswagen Foundation. We thank the crystallographers of HMPs, without whose work the current study would have been impossible.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Anna Tramontano
Received on November 20, 2006; revised on December 20, 2006; accepted on December 21, 2006
| REFERENCES |
|---|
Adamczak R, Porollo A, Meller J. Accurate prediction of solvent accessibility using neural networks-based regression. Proteins (2004) 56:753–767.
Adamian L, Liang J. Prediction of transmembrane helix orientation in polytopic membrane proteins. BMC Struct. Biol. (2006) 6:13.
Adamian L, Nanda V, Degrado WF, Liang J. Empirical lipid propensities of amino acid residues in multispan alpha helical membrane proteins. Proteins (2005) 59:496–509.
Ahmad S, Gromiha MM, Sarai A. Real value prediction of solvent accessibility from amino acid sequence. Proteins (2003) 50:629–635.
Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.
Baldwin JM, Schertler GF, Unger VM. An alpha-carbon template for the transmembrane helices in the rhodopsin family of G-protein-coupled receptors. J. Mol. Biol. (1997) 272:144–164.
Beuming T, Weinstein H. A knowledge-based scale for the analysis and prediction of buried and exposed faces of transmembrane domain proteins. Bioinformatics (2004) 20:1822–1835.
Chen CP, Rost B. State-of-the-art in membrane protein prediction. Appl. Bioinformatics (2002) 1:21–35.
Chen H, Zhou HX. Prediction of solvent accessibility and sites of deleterious mutations from protein sequence. Nucleic Acids Res. (2005) 33:3193–3199.
Cohn EJ, Edsall JT. Proteins, amino acids and peptides (1943) New York: Reinhold Publ. Corp.
Donnelly D, Overington JP, Ruffle SV, Nugent JH, Blundell TL. Modeling alpha-helical transmembrane domains: the calculation and use of substitution tables for lipid-facing residues. Protein Sci. (1993) 2:55–70.
Edelsbrunner H. The union of balls and its dual shape. Discrete Comput. Geom. (1995) 13:415–440.
Edelsbrunner H, Facello M, Fu P, Liang J. Measuring proteins and voids in proteins. Proc. 28th Ann. Hawaii Internat. Conf. System Sciences (1995) vol. V: Biotechnology Computing, 256–264.
Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. (1984) 179:125–142.
Engelman DM, Steitz TA, Goldman A. Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem. (1986) 15:321–353.
Faham S, Yang D, Bare E, Yohannan S, Whitelegge JP, Bowie JU. Side-chain contributions to membrane protein structure and stability. J. Mol. Biol. (2004) 335:297–305.
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning (2001) Springer.
Henikoff S, Henikoff JG. Position-based sequence weights. J. Mol. Biol. (1994) 243:574–578.
Hessa T, et al. Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature (2005) 433:377–381.
Hsu CW, Lin CJ. A comparison on methods for multi-class support vector machines. IEEE Trans. Neural Networks (2002) 13:415–425.
Huang LS, Cobessi D, Tung EY, Berry EA. Binding of the respiratory chain inhibitor antimycin to the mitochondrial bc1 complex: a new crystal structure reveals an altered intramolecular hydrogen-bonding pattern. J. Mol. Biol. (2005) 351:573–597.
Karatzoglou A, Meyer D, Hornik K. Support Vector Machines in R. Journal of Statistical Software (2006) 15.
Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. (1982) 157:105–132.
Li X, Pan XM. New method for accurate prediction of solvent accessibility from protein sequence. Proteins (2001) 42:1–5.
Liu Y, Engelman DM, Gerstein M. Genomic analysis of membrane protein families: abundance and conserved motifs. Genome Biol. (2002) 3:research0054.0051–research0054.0012.
Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI. Positioning of proteins in membranes: a computational approach. Protein Sci. (2006a) 15:1318–1333.
Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI. OPM: orientations of proteins in membranes database. Bioinformatics (2006b) 22:623–625.
Meiering EM, Serrano L, Fersht AR. Effect of active-site residues in barnase on activity and stability. J. Mol. Biol. (1992) 225:585–589.
Nguyen MN, Rajapakse JC. Two-stage support vector regression approach for predicting accessible surface areas of amino acids. Proteins (2006) 63:542–550.
Pace CN, Shirley BA, McNutt M, Gajiwala K. Forces contributing to the conformational stability of proteins. FASEB J. (1996) 10:75–83.
Park Y, Helms V. How strongly do sequence conservation patterns and empirical scales correlate with exposure patterns of transmembrane helices of membrane proteins? Biopolymers (2006) 83:389–399.
Pei J, Grishin NV. AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics (2001) 17:700–712.
Pilpel Y, Ben-Tal N, Lancet D. kPROT: a knowledge-based scale for the propensity of residue orientation in transmembrane segments. Application to membrane protein structure prediction. J. Mol. Biol. (1999) 294:921–935.
Pollastri G, Baldi P, Fariselli P, Casadio R. Prediction of coordination number and relative solvent accessibility in proteins. Proteins (2002) 47:142–153.
R Development Core Team. R: A Language and Environment for Statistical Computing (2004) Vienna, Austria: R Foundation for Statistical Computing.
Rost B, Sander C. Conservation and prediction of solvent accessibility in protein families. Proteins (1994) 20:216–226.
Schreiber G, Buckle AM, Fersht AR. Stability and function: two constraints in the evolution of barstar and other proteins. Structure (1994) 2:945–951.
Shoichet BK, Baase WA, Kuroki R, Matthews BW. A relationship between protein stability and protein function. Proc. Natl. Acad. Sci. USA (1995) 92:452–456.
Sim J, Kim S-Y, Lee J. Prediction of protein solvent accessibility using fuzzy k-nearest neighbor method. Bioinformatics (2005) 21:2844–2849.
Stevens TJ, Arkin IT. Substitution rates in alpha-helical transmembrane proteins. Protein Sci. (2001) 10:2507–2517.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
Thompson MJ, Goldstein RA. Predicting solvent accessibility: higher accuracy using Bayesian statistics and optimized residue substitution classes. Proteins (1996) 25:38–47.
Wallin E, von Heijne G. Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. (1998) 7:1029–1038.
White SH, Wimley WC. Membrane protein folding and stability: physical principles. Annu. Rev. Biophys. Biomol. Struct. (1999) 28:319–365.
Wimley WC, Creamer TP, White SH. Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides. Biochemistry (1996) 35:5109–5124.
Yeates TO, Komiya H, Rees DC, Allen JP, Feher G. Structure of the reaction center from Rhodobacter sphaeroides R-26: membrane-protein interactions. Proc. Natl. Acad. Sci. USA (1987) 84:6438–6442.
Zhang JH, Liu ZP, Jones TA, Gierasch LM, Sambrook JF. Mutating the charged residues in the binding pocket of cellular retinoic acid-binding protein simultaneously reduces its binding affinity to retinoic acid and increases its thermostability. Proteins (1992) 13:87–99.
Zimmerman JM, Eliezer N, Simha R. The characterization of amino acid sequences in proteins by statistical methods. J. Theor. Biol. (1968) 21:170–201.