Bioinformatics Advance Access originally published online on October 4, 2006
Bioinformatics 2006 22(23):2948-2949; doi:10.1093/bioinformatics/btl504
FoldUnfold: web server for the prediction of disordered regions in protein chain
Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Identification of disordered regions in polypeptide chains is important because such regions are essential for protein function. A new parameter, the mean packing density of residues, has been introduced to detect disordered regions in a protein sequence. We have demonstrated that regions with low expected packing density are responsible for the appearance of disordered regions. Our method (FoldUnfold) has been tested on datasets of globular proteins (559 proteins) and long disordered protein segments (129 proteins) and showed improved performance over some other widely used methods, such as DISOPRED, PONDR VL3H, IUPred and GlobPlot.
Availability: The FoldUnfold server is available for users at http://skuld.protres.ru/~mlobanov/ogu/ogu.cgi. There is a link to our server through the web site of DisProt (http://www.disprot.org/predictors.php).
Contact: ogalzit{at}vega.protres.ru
| 1 INTRODUCTION |
|---|
The formation of a sufficient number of interactions is necessary to compensate for the loss of conformational entropy during protein folding. Therefore, the structural uniqueness of a native protein results from the balance between the conformational entropy and the energy of residue interactions. It seems that disordered regions in a protein chain do not have enough interactions to compensate for the loss of conformational entropy that accompanies formation of a globular state (Galzitskaya et al., 2000). Such regions can therefore be stabilized only by additional interactions with other agents or by oligomerization.
It has been shown that disordered regions are involved in DNA binding and other types of molecular recognition, and that a large portion of the sequences of natively unfolded proteins contain segments of low complexity and high predicted flexibility (Wootton, 1994; Romero et al., 1998; Wright and Dyson, 1999; Galzitskaya et al., 2000; Obradovic et al., 2003; Radivojac et al., 2004). It has also been indicated that a combination of low overall hydrophobicity and a large net charge is a structural feature that distinguishes natively unfolded proteins from small globular proteins (Uversky et al., 2000). Several methods are now widely used to predict disordered regions in proteins: GlobPlot (Linding et al., 2003) is a simple propensity-based approach that evaluates the tendency of residues to be in a regular secondary structure; PONDR VL3H (Obradovic et al., 2003) was trained by various machine learning approaches to distinguish experimentally verified disordered proteins from globular proteins; DISOPRED (Ward et al., 2004) was trained specifically to recognize regions missing in X-ray structures; IUPred (Dosztanyi et al., 2005) assigns the order/disorder status to residues based on their ability to form favorable pairwise contacts. We were the first to use the number of contacts per residue as a parameter to distinguish folded from natively unfolded proteins (Garbuzynskiy et al., 2004). We have extended our method to predict disordered regions and compared it with the above-mentioned methods (Galzitskaya et al., 2006). In that comparison our method performed better than the other widely used methods.
| 2 BACKGROUND |
|---|
Mean packing density was calculated for each amino acid residue type from a database of 5829 3D structures as the average number of close residues (within a given distance). In our case a residue is considered close to a given residue if any pair of their heavy atoms is at a distance <8 Å; neighboring residues are excluded. The mean packing density in a globular state for each of the 20 types of amino acid residues is presented in our work (Galzitskaya et al., 2006).
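As an illustration only, the contact-counting definition above might be sketched as follows. The function names, the input layout (one list of heavy-atom coordinates per residue) and the reading of "neighboring residues" as the two sequence-adjacent residues are assumptions made for this sketch; they are not taken from the FoldUnfold implementation.

```python
import math
from collections import defaultdict


def packing_density(residues, cutoff=8.0):
    """Count the close residues for every residue of one chain.

    residues: list of residues, each a list of (x, y, z) heavy-atom
              coordinates in angstroms (hypothetical input layout).
    Two residues are close if any pair of their heavy atoms is nearer
    than `cutoff`. Sequence neighbors (i - 1 and i + 1) are skipped;
    this reading of "neighboring residues" is an assumption.
    """
    def close(a, b):
        return any(math.dist(p, q) < cutoff for p in a for q in b)

    counts = []
    for i, res_i in enumerate(residues):
        n = sum(
            1
            for j, res_j in enumerate(residues)
            if abs(i - j) > 1 and close(res_i, res_j)
        )
        counts.append(n)
    return counts


def mean_density_by_type(sequences, densities):
    """Average the per-residue counts over all residues of each type.

    sequences: list of one-letter sequences, one per structure.
    densities: list of count lists, aligned with `sequences`.
    """
    collected = defaultdict(list)
    for seq, counts in zip(sequences, densities):
        for aa, n in zip(seq, counts):
            collected[aa].append(n)
    return {aa: sum(v) / len(v) for aa, v in collected.items()}
```

The quadratic loop over residue pairs is kept for readability; a spatial grid or k-d tree would be the usual choice for a database of several thousand structures.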
To detect disordered regions, we construct a profile of the expected packing density along the protein sequence. The calculations are based on a sliding-window averaging technique. First, the expected packing density is determined for each residue (it equals the average packing density observed for this type of residue in a globular state); then, these numbers are averaged inside the window and assigned to the central residue of the window. The value of the averaged expected packing density at every position of the polypeptide chain gives the packing density profile.
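A minimal sketch of this sliding-window averaging is given below. The name `packing_density_profile`, the `scale` argument and the truncation of the window at the chain termini are assumptions of the sketch; the text does not say how the server treats the first and last residues. The scale values in the usage example are placeholders chosen for easy arithmetic, not the published packing densities (those are tabulated in Galzitskaya et al., 2006).

```python
def packing_density_profile(sequence, scale, window=41):
    """Return the windowed expected packing density for every position.

    sequence: one-letter amino acid string.
    scale:    dict mapping residue type to its mean packing density
              in a globular state (supplied by the caller).
    window:   odd window size; the average is assigned to the
              central residue of the window.
    Near the termini the window is truncated to the residues that
    exist (an assumption of this sketch).
    """
    half = window // 2
    expected = [scale[aa] for aa in sequence]
    profile = []
    for i in range(len(expected)):
        lo = max(0, i - half)
        hi = min(len(expected), i + half + 1)
        segment = expected[lo:hi]
        profile.append(sum(segment) / len(segment))
    return profile


# Placeholder scale, NOT the published values.
TOY_SCALE = {"A": 1.0, "G": 2.0, "L": 6.0}
print(packing_density_profile("AGL", TOY_SCALE, window=3))  # [1.5, 3.0, 4.0]
```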
Our method has been tested on datasets of globular proteins (559 proteins) and long disordered protein segments (129 proteins) (Dosztanyi et al., 2005). A receiver operating characteristic (ROC) curve has been obtained (Galzitskaya et al., 2006) to determine a threshold for our method. The true positive rate was calculated as the percentage of residues predicted as disordered in the set of disordered proteins and segments; the false positive rate is the percentage of residues predicted as disordered in the set of globular proteins. Our method showed improved performance over some other widely used methods, such as DISOPRED (Ward et al., 2004), PONDR VL3H (Obradovic et al., 2003), IUPred (Dosztanyi et al., 2005) and GlobPlot (Linding et al., 2003) (see Table 1).
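The construction of such a ROC curve can be illustrated with the short sketch below. It is a simplification under stated assumptions: a residue is called disordered whenever its profile value lies below the trial threshold, any minimum segment length is ignored, and the function names are hypothetical. The numbers in the usage example are invented for the demonstration and are not results.

```python
def fraction_below(profiles, threshold):
    """Fraction of all residues whose profile value is below threshold.

    profiles: list of per-protein profiles (lists of floats).
    Residues are pooled over the whole set before dividing.
    """
    total = sum(len(p) for p in profiles)
    below = sum(1 for p in profiles for v in p if v < threshold)
    return below / total


def roc_points(disordered_profiles, globular_profiles, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold.

    True positives are residues called disordered in the disordered set;
    false positives are residues called disordered in the globular set.
    """
    return [
        (fraction_below(globular_profiles, t),
         fraction_below(disordered_profiles, t))
        for t in thresholds
    ]


# Invented profiles, for demonstration only.
disordered = [[18.0, 19.0, 22.0]]
globular = [[21.0, 23.0, 19.0, 24.0]]
for fpr, tpr in roc_points(disordered, globular, [20.0, 25.0]):
    print(f"FPR={fpr:.3f}  TPR={tpr:.3f}")
```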
| 3 THE FOLDUNFOLD SERVER |
|---|
The web server takes an amino acid sequence in FASTA format as input and calculates the expected packing density profile along the sequence. We use this property, the mean packing density, to predict the state of a protein with an unknown 3D structure: either folded or unfolded (in other words, disordered). If the expected mean packing density of the protein is <20.4, the whole protein is predicted to be disordered. If the expected mean packing density exceeds 20.4, the program finds disordered segments that satisfy two criteria: the expected mean packing density within the segment is <20.4, and the size of the segment is equal to or larger than the size of the window used.
We have constructed ROC curves for our method with different sizes of the sliding window (see Fig. 1). Two databases were used for this construction: 427 disordered proteins and regions [DisProt database (Vucetic et al., 2005)] and 559 globular proteins (Dosztanyi et al., 2005). The size of the sliding window is now a user-selectable parameter, but we recommend a window size of 41 residues to find large disordered regions and a window size of 11 residues to find short unstructured loops (however, decreasing the window size increases the false positive rate, see Fig. 1). It should be emphasized that our program can predict only unfolded regions whose size is equal to or greater than the window size used.
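The decision rule described above might be sketched as follows. This is one possible reading, not the server's code: the whole-protein value is taken as the plain average of the per-residue expected densities, a disordered segment is taken as a contiguous run of profile positions below the threshold, and the function names are hypothetical. The text does not say how a value exactly equal to 20.4 is treated; here it counts as ordered.

```python
THRESHOLD = 20.4  # threshold on expected packing density quoted in the text


def disordered_segments(profile, window, threshold=THRESHOLD):
    """Return (start, end) runs, 0-based and end-exclusive, where the
    profile stays below `threshold` for at least `window` positions."""
    segments = []
    start = None
    # A sentinel equal to the threshold closes a run that reaches the end.
    for i, value in enumerate(list(profile) + [threshold]):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= window:
                segments.append((start, i))
            start = None
    return segments


def predict(expected, profile, window, threshold=THRESHOLD):
    """Apply the two-stage rule.

    expected: per-residue expected packing densities (unsmoothed).
    profile:  the same values after sliding-window averaging.
    """
    whole_mean = sum(expected) / len(expected)
    if whole_mean < threshold:
        # The whole protein is predicted to be disordered.
        return [(0, len(expected))]
    return disordered_segments(profile, window, threshold)
```

For example, with a window of 3, `predict([30, 19, 19, 19, 30], [21, 19, 19, 19, 21], 3)` gives `[(1, 4)]`, while a chain whose overall mean is below 20.4 is returned as a single segment covering every residue.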
We have also predicted disordered regions in 129 proteins (Dosztanyi et al., 2005) using the recently published method RONN (Yang et al., 2005). The true positive rate for this method (0.765 if averaging is done over residues and 0.694 if averaging is done over proteins) does not exceed that of our method (0.851 and 0.716, respectively, see Table 1). Comparison of our method with other newly published methods [PONDR VSL2 (Obradovic et al., 2005), PreLink (Coeytaux and Poupon, 2005), SPRITZ (Vullo et al., 2006)] will be presented in future publications.
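The two ways of averaging quoted above can differ noticeably when the proteins vary in length. The sketch below shows our reading of the two definitions; the function names and the example predictions are hypothetical and are not data from the benchmark.

```python
def rate_over_residues(predictions):
    """Pool all residues of the set, then take the fraction predicted
    as disordered. Long proteins weigh more than short ones."""
    flagged = sum(sum(p) for p in predictions)
    return flagged / sum(len(p) for p in predictions)


def rate_over_proteins(predictions):
    """Compute the fraction for each protein separately, then average
    the fractions. Every protein has the same weight."""
    return sum(sum(p) / len(p) for p in predictions) / len(predictions)


# Hypothetical predictions (True = residue predicted as disordered).
example = [[True] * 9 + [False], [False, True]]
print(round(rate_over_residues(example), 3))  # 0.833
print(round(rate_over_proteins(example), 3))  # 0.7
```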
| Acknowledgments |
|---|
This work was supported by the program MCB RAS, by the Russian Foundation for Basic Research (grant 05-04-48750), by the Howard Hughes Medical Institute (grant 55005607) and by an INTAS grant (05-1000004-7747).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Dmitrij Frishman
Received on July 12, 2006; revised on August 29, 2006; accepted on September 22, 2006
| REFERENCES |
|---|
Coeytaux, K. and Poupon, A. (2005) Prediction of unfolded segments in a protein sequence based on amino acid composition. Bioinformatics, 21, 1891-1900.
Dosztanyi, Z., et al. (2005) The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol., 347, 827-839.
Galzitskaya, O.V., et al. (2000) Optimal region of average side-chain entropy for fast protein folding. Protein Sci., 9, 580-586.
Galzitskaya, O.V., et al. (2006) Prediction of natively unfolded regions in protein chains. Mol. Biol. (Moscow), 40, 341-348.
Garbuzynskiy, S.O., et al. (2004) To be folded or to be unfolded? Protein Sci., 13, 2871-2877.
Linding, R., et al. (2003) Protein disorder prediction: implications for structural proteomics. Structure, 11, 1453-1459.
Obradovic, Z., et al. (2003) Predicting intrinsic disorder from amino acid sequence. Proteins, 53, 566-572.
Obradovic, Z., et al. (2005) Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins, 61, 176-182.
Radivojac, P., et al. (2004) Protein flexibility and intrinsic disorder. Protein Sci., 13, 71-80.
Romero, P., et al. (1998) Thousands of proteins likely to have long disordered regions. Pac. Symp. Biocomput., 437-448.
Uversky, V.N., et al. (2000) Why are natively unfolded proteins unstructured under physiologic conditions? Proteins, 41, 415-427.
Vucetic, S., et al. (2005) DisProt: a database of protein disorder. Bioinformatics, 21, 137-140.
Vullo, A., et al. (2006) Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic Acids Res., 34, W164-W168.
Ward, J.J., et al. (2004) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 337, 635-645.
Wootton, J.C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem., 18, 269-285.
Wright, P.E. and Dyson, H.J. (1999) Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol., 293, 321-331.
Yang, Z.R., et al. (2005) RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics, 21, 3369-3376.



