Bioinformatics Advance Access originally published online on May 8, 2006
Bioinformatics 2006 22(14):1800-1802; doi:10.1093/bioinformatics/btl176
Prelude&Fugue, predicting local protein structure, early folding regions and structural weaknesses
Unité de Bioinformatique génomique et structurale, Université Libre de Bruxelles CP 165/61, Avenue Roosevelt 50, 1050 Bruxelles, Belgium
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Prelude&Fugue are bioinformatics tools that predict the local 3D structure of a protein from its amino acid sequence in terms of seven backbone torsion angle domains, using database-derived potentials. Prelude computes all lowest free energy conformations of a protein or protein region, ranked by increasing energy, optionally subject to interresidue distance constraints specified by the user. Fugue detects sequence regions whose predicted structure is significantly preferred relative to other conformations in the absence of tertiary interactions. These programs can be used for predicting secondary structure, tertiary structure of short peptides, flickering early folding sequences and peptides that adopt a preferred conformation in solution. They can also be used for detecting structural weaknesses, i.e. sequence regions that are not optimal with respect to the tertiary fold.
Availability: http://babylone.ulb.ac.be/Prelude_and_Fugue
Contact: Jean.Marc.Kwasigroch{at}ulb.ac.be
The programs Prelude&Fugue predict the local 3D structure of a peptide, protein or protein region in the absence of tertiary interactions (Rooman et al., 1991, 1992). The input is the amino acid sequence and the output contains the predicted local 3D structures described in terms of seven letters, each representing a (φ, ψ, ω) backbone torsion angle domain. The letter A represents the domain that includes the α-helices, C the 3₁₀-helices, B the β-type extended structures, P the poly-proline-type extended structures, G and E the positive-φ conformations mirror symmetrical to the A/C and B/P domains, respectively, and O the cis-conformations. Side-chain degrees of freedom are neglected. By assigning to each letter the central (or average) value of the corresponding (φ, ψ, ω) domain, and considering average bond lengths and angles, a sequence of letters uniquely defines a 3D structure. However, two structures represented by the same succession of (φ, ψ, ω) letters usually differ, as the actual and average (φ, ψ, ω) values are not identical. Only for short peptides does this representation yield a good approximation of the 3D backbone structure, whereas for longer peptides it describes the local 3D structure.
To estimate the folding free energy of a protein structure, Prelude&Fugue use database-derived potentials describing local interactions along the chain, computed from the propensities of amino acid pairs to be associated with a backbone torsion angle domain. More precisely, the folding free energy ΔG(S, C) of a sequence S in a conformation C is approximated by:
[Equation image not recovered; the expression includes a normalization factor.]
The sum goes over i, j and k values satisfying k − 8 ≤ i ≤ j ≤ k + 8. The P terms are joint probabilities, e.g. P(si, sj, tk) is the probability of finding two amino acids si and sj at positions i and j, and a backbone torsion angle domain tk at position k. They are evaluated from relative frequencies of occurrence in a dataset of 1403 well-resolved and refined protein chains displaying at most 20% pairwise sequence identity (see the Prelude&Fugue website for a list). To avoid biases towards the native structure when the sequence to be predicted is similar to a sequence included in the dataset, the user may exclude one protein from the set.
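The displayed formula is missing from this copy. As a hedged sketch only, log-odds potentials built from such joint probabilities are commonly written in the form below. The prefactor $-k_B T$ (Boltzmann constant times absolute temperature), the symbol $\zeta_k$ for the normalization factor and the exact arrangement of the terms are assumptions, not the authors' confirmed expression:

```latex
\Delta G(S, C) \simeq -k_B T
  \sum_{\substack{i,\,j,\,k \\ k-8 \,\le\, i \,\le\, j \,\le\, k+8}}
  \frac{1}{\zeta_k}\,
  \ln \frac{P(s_i, s_j, t_k)}{P(s_i, s_j)\, P(t_k)}
```

Under this assumed form, a triplet (s_i, s_j, t_k) observed more often than expected from the independent pair and domain frequencies has a ratio above 1 and contributes a negative, i.e. favorable, term.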
Prelude&Fugue use the same potentials and protein representation and receive the same protein sequence as input, but their goals and predictions are different. Prelude predicts the N backbone conformations of lowest free energy of the input sequence or of a segment of it, where N and the segment limits are specified by the user. Moreover, the user can impose up to four constraints on interresidue distances between any of the heavy backbone atoms or Cβ atoms. The predicted conformations are in this case the lowest free energy conformations satisfying the constraints. Note that for sufficiently loose constraints, the distances are simply monitored. The output contains the N lowest free energy conformations represented as (φ, ψ, ω) strings and ordered by increasing free energy, the values of the constrained interresidue distances (if any) and the backbone coordinate root mean square deviation (r.m.s.d.) relative to the lowest energy structure. The user can also ask for the lowest energy conformations up to a threshold value of the free energy or r.m.s.d. The user can moreover perform a steric hindrance test that keeps only the predicted structures whose Cα atoms do not come closer than 2.5 Å. For each predicted conformation, the main chain and Cβ coordinates are supplied in Protein Data Bank format (Berman et al., 2000).
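The ranking of the N lowest free energy conformations can be illustrated with a minimal sketch. It enumerates every string over the seven letters by brute force, which is only feasible for short peptides and is not claimed to be how the programs search. The function names and the `toy_energy` scoring are hypothetical placeholders, not the database-derived potentials:

```python
import heapq
from itertools import product

DOMAINS = "ACBPGEO"  # the seven torsion angle domain letters named in the text


def lowest_energy_conformations(sequence, energy, n):
    """Return the n lowest-energy (energy, letters) pairs, by increasing energy.

    `energy(sequence, conformation)` is a caller-supplied scoring function
    (hypothetical here). Exhaustive enumeration needs 7**len(sequence)
    evaluations, so this sketch only suits short peptides.
    """
    candidates = ("".join(p) for p in product(DOMAINS, repeat=len(sequence)))
    scored = ((energy(sequence, conf), conf) for conf in candidates)
    return heapq.nsmallest(n, scored)


def toy_energy(sequence, conformation):
    # Placeholder potential, NOT the real one: rewards A, and B to a lesser
    # extent, independently of the amino acid sequence.
    bonus = {"A": -1.0, "B": -0.5}
    return sum(bonus.get(letter, 0.0) for letter in conformation)


if __name__ == "__main__":
    for e, conf in lowest_energy_conformations("GAVL", toy_energy, 3):
        print(f"{conf}  {e:.2f}")
```

Distance constraints or a steric filter could be added by skipping candidates before scoring; that step is left out here because it requires building 3D coordinates from the letters.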
When applied to full-protein sequences, Prelude provides a local 3D structure prediction, similar to a secondary structure prediction but with seven (φ, ψ, ω) assignments. When applied to short peptides, Prelude yields a genuine 3D structure prediction, as the (φ, ψ, ω) letters can represent essentially all secondary structures and turn motifs. This prediction is however only valid if tertiary interactions within the peptide can be neglected, given that they are not taken into account in the potentials. The information about pairwise r.m.s.d. given in the output allows the user to assess the variability of the predicted lowest energy structures. In particular, regions where the predicted (φ, ψ, ω) assignments are conserved among lowest energy structures are likely to have a well-defined structure, whereas the others are likely to be more flexible. Moreover, the differences in free energy help to refine the assessment of the stability of the predicted conformations. For example, if the lowest energy structure displays a sizable free energy gap, of 0.5 kcal/mol or more, with respect to the next conformation in the ranking that is significantly different, as monitored by a large r.m.s.d., this structure can be considered as preferred and as displaying some (marginal) stability.
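The free energy gap criterion can be sketched as follows. The 0.5 kcal/mol threshold comes from the text; the r.m.s.d. cutoff that decides when a conformation counts as "significantly different" is not stated there, so `rmsd_cutoff` is a hypothetical parameter the caller must choose:

```python
def energy_gap(ranked, rmsd_cutoff):
    """Gap between the lowest-energy structure and the next distinct one.

    `ranked` is a list of (energy, rmsd_to_lowest) pairs sorted by increasing
    energy; the first entry is the lowest-energy structure itself.
    Returns None when no later entry reaches `rmsd_cutoff` (an assumed,
    user-chosen value in the same units as the r.m.s.d.).
    """
    lowest_energy = ranked[0][0]
    for energy, rmsd in ranked[1:]:
        if rmsd >= rmsd_cutoff:
            return energy - lowest_energy
    return None


def is_preferred(ranked, rmsd_cutoff, min_gap=0.5):
    """True when the gap to the next distinct structure is at least min_gap."""
    gap = energy_gap(ranked, rmsd_cutoff)
    return gap is not None and gap >= min_gap
```

For instance, with `ranked = [(-5.0, 0.0), (-4.9, 0.3), (-4.2, 2.1)]` and a cutoff of 1.5, the near-duplicate second entry is skipped and the gap is measured against the third entry.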
In contrast to Prelude, Fugue is designed to be applied to full-protein sequences. It compiles the predictions of Prelude on a given protein and identifies strongly predicted segments. It proceeds by dividing the sequence into short overlapping segments of 5–15 residues, and by applying Prelude to each segment. A segment is retained if its lowest free energy conformation displays an energy gap of at least 0.5 kcal/mol relative to the next best structure that is sufficiently different in terms of r.m.s.d. The number of retained segments that map onto each sequence position is called the confidence. It measures the strength of the prediction: the higher the confidence, the higher the probability that the predicted and native structures coincide. Segments with high confidence values are likely to adopt a preferred conformation when excised from the rest of the chain or to be formed at the very beginning of the folding process (Rooman et al., 1991, 1992; Rooman and Wodak, 1992). This hierarchic view of folding is supported by experimental data and theoretical considerations (Baldwin and Rose, 1999). The predicted segments typically correspond to a helical or extended stretch of 5–10 residues or to a turn. Note that the user can choose between taking into account or neglecting the sequence environment of the segments in the predictions, i.e. the eight residues upstream and the eight residues downstream. The former possibility entails considering the predicted segment covalently linked to the rest of the chain and thus predicting early folding events, whereas the latter possibility is akin to excising the segment from the chain and considering it in isolation. The prediction scores of Prelude&Fugue are shown in Table 1. More details are given on the website.
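The confidence described above can be sketched as a count of retained segments per position. This is a minimal illustration: whether every length from 5 to 15 and every start position is scanned is an assumption, and `segment_is_retained` is a hypothetical callback standing in for the per-segment prediction and energy gap test:

```python
def confidence_profile(sequence_length, segment_is_retained,
                       min_len=5, max_len=15):
    """Count, for each sequence position, the retained segments covering it.

    `segment_is_retained(start, length)` is a caller-supplied predicate
    (hypothetical) that returns True when the segment starting at the
    0-based index `start` passes the energy gap criterion.
    """
    confidence = [0] * sequence_length
    for length in range(min_len, max_len + 1):
        # Slide a window of this length over every possible start position.
        for start in range(sequence_length - length + 1):
            if segment_is_retained(start, length):
                for pos in range(start, start + length):
                    confidence[pos] += 1
    return confidence
```

Positions covered by many retained segments get high values, which matches the reading of confidence as the strength of the prediction.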
Prelude&Fugue have proven quite successful in several applications, first of all in predicting the location of flickering early folding units (Rooman and Wodak, 1992) and of peptides that adopt a certain amount of structure in solution (Rooman et al., 1992). For example, Fugue has been used to identify four protein fragments in cytochrome c2 and calcium-binding protein that are predicted to preferentially form helical conformations in solution. Prelude has then been applied to these fragments to get a more precise estimate of their (marginal) stability. These four peptides have been synthesized, and characterized by circular dichroism and nuclear magnetic resonance. A remarkable agreement between predictions and experiments has been observed, both in the relative stability of the peptides and in the limits of the structured regions (Pintar et al., 1994).
Though tertiary interactions are neglected, the conformations strongly predicted by Fugue, with high confidence values, generally coincide with the native structures (Rooman et al., 1992). This can be explained by the fact that these conformations are so strongly preferred by local interactions along the chain that tertiary forces are not sufficient to break them. In some sequence regions, however, predicted and native structures differ. We interpret these regions as structural weaknesses, defined as regions whose intrinsic structural preferences are in contradiction with the tertiary fold. Such regions might be expected to slow down folding and make the protein more subject to structural modifications or alternative folding. We often found such weaknesses in proteins related to conformational diseases, such as the prion protein (Gilis and Rooman, 2000), and in 3D domain swapping proteins (Dehouck et al., 2003). Other computational approaches aiming at understanding folding and misfolding are reviewed in Dokholyan (2006).
Prelude&Fugue present several advantages compared with more established local/secondary structure prediction methods. In summary, they offer the possibilities of (1) predicting alternate conformations ranked by their relative stabilities, (2) identifying flexible or stable sequence regions, and (3) yielding structures compatible with interresidue distance constraints.
| Acknowledgments |
|---|
J.-P. Kocher is acknowledged for interesting suggestions. Funding to pay the Open Access publication charges for this article was provided by the European Community through the Concerted Action Quality of Life 2001-3-8.4. M.R. is research director at the Belgian Fund for Scientific Research (F.N.R.S.).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Anna Tramontano
Received on March 31, 2006; revised on May 2, 2006
| REFERENCES |
|---|
Baldwin, R.L. and Rose, G.D. (1999) Is protein folding hierarchic? I. Local structure and peptide folding. TIBS, 24, 26–33.
Berman, H.M., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
Dehouck, Y., et al. (2003) Sequence-structure signals of 3D domain swapping in proteins. J. Mol. Biol., 330, 1215–1225.
Dokholyan, N.V. (2006) Studies of folding and misfolding using simplified models. Curr. Opin. Struct. Biol., 16, 79–85.
Gilis, D. and Rooman, M. (2000) PoPMuSiC, an algorithm for predicting protein mutant stability changes. Application to prion proteins. Protein Eng., 13, 849–856.
Pintar, A., et al. (1994) Conformational properties of four peptides corresponding to α-helical regions of Rhodospirillum cytochrome c2 and Bovine calcium binding protein. Biochemistry, 33, 11158–11173.
Rooman, M.J., et al. (1991) Prediction of protein backbone conformation based on seven structure assignments. J. Mol. Biol., 221, 961–979.
Rooman, M.J., et al. (1992) Extracting information on folding from the amino acid sequence: accurate predictions for protein regions with preferred conformation in the absence of tertiary interactions. Biochemistry, 31, 10226–10238.
Rooman, M.J. and Wodak, S.J. (1992) Extracting information on folding from the amino acid sequence: consensus regions with preferred conformations in homologous proteins. Biochemistry, 31, 10239–10249.
