Bioinformatics Advance Access originally published online on October 4, 2006
Bioinformatics 2006 22(23):2948-2949; doi:10.1093/bioinformatics/btl504

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

FoldUnfold: web server for the prediction of disordered regions in protein chain

Oxana V. Galzitskaya *, Sergiy O. Garbuzynskiy and Michail Yu. Lobanov

Institute of Protein Research, Russian Academy of Sciences 142290, Pushchino, Moscow Region, Russia

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Identification of disordered regions in polypeptide chains is very important because such regions are essential for protein function. A new parameter, namely the mean packing density of residues, has been introduced to detect disordered regions in a protein sequence. We have demonstrated that regions with a low expected packing density are responsible for the appearance of disordered regions. Our method (FoldUnfold) has been tested on datasets of globular proteins (559 proteins) and long disordered protein segments (129 proteins) and showed improved performance over some other widely used methods, such as DISOPRED, PONDR VL3H, IUPred and GlobPlot.

Availability: The FoldUnfold server is available for users at http://skuld.protres.ru/~mlobanov/ogu/ogu.cgi. There is a link to our server through the web site of DisProt (http://www.disprot.org/predictors.php).

Contact: ogalzit{at}vega.protres.ru


    1 INTRODUCTION
The formation of a sufficient number of interactions is necessary to compensate for the loss of conformational entropy during protein folding. Therefore, the structural uniqueness of a native protein results from the balance between the conformational entropy and the energy of residue interactions. It seems that disordered regions in a protein chain do not have enough interactions to compensate for the loss of conformational entropy resulting from the formation of a globular state (Galzitskaya et al., 2000). Their stabilization can therefore be enhanced by additional interactions with other agents or by oligomerization.

Disordered regions have been shown to be involved in DNA binding and other types of molecular recognition, and a large portion of the sequences of natively unfolded proteins contain segments of low complexity and high predicted flexibility (Wootton, 1994; Romero et al., 1998; Wright and Dyson, 1999; Galzitskaya et al., 2000; Obradovic et al., 2003; Radivojac et al., 2004). It has also been noted that a combination of low overall hydrophobicity and a large net charge distinguishes natively unfolded proteins from small globular proteins (Uversky et al., 2000). Several methods are now widely used to predict disordered regions in proteins: GlobPlot (Linding et al., 2003) is a simple propensity-based approach that evaluates the tendency of residues to be in a regular secondary structure; PONDR VL3H (Obradovic et al., 2003) was trained by various machine learning approaches to distinguish experimentally verified disordered proteins from globular proteins; DISOPRED (Ward et al., 2004) was trained specifically to recognize regions missing in X-ray structures; IUPred (Dosztanyi et al., 2005) assigns order/disorder status to residues based on their ability to form favorable pairwise contacts. We were the first to use the number of contacts per residue as a parameter to distinguish folded from natively unfolded proteins (Garbuzynskiy et al., 2004). We have since extended our method to predict disordered regions and compared it with the above-mentioned methods (Galzitskaya et al., 2006). In that comparison, our method performed best among these widely used methods.


    2 BACKGROUND
Mean packing density was calculated for each amino acid residue from a database of 5829 3D structures as the average number of close residues (those within a given distance). In our case, a residue is considered close to a given residue if any pair of their heavy atoms is at a distance <8 Å; neighboring residues are excluded. The mean packing density in a globular state for each of the 20 types of amino acid residues is presented in our work (Galzitskaya et al., 2006).
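The counting rule described above can be illustrated with a short Python sketch. It is not the code used to build the published scale. The function names, the input layout (one list of heavy-atom coordinates per residue) and the reading of "neighboring residues" as only the immediate chain neighbors are assumptions made for illustration.

```python
import math
from collections import defaultdict

CUTOFF = 8.0        # Å, heavy-atom distance cutoff stated in the text
MIN_SEPARATION = 2  # assumption: only the immediate chain neighbors (i±1) are excluded


def residues_close(atoms_a, atoms_b, cutoff=CUTOFF):
    """True if any pair of heavy atoms from the two residues is closer than cutoff."""
    return any(math.dist(a, b) < cutoff for a in atoms_a for b in atoms_b)


def packing_density(residues, cutoff=CUTOFF, min_separation=MIN_SEPARATION):
    """Number of close residues for every residue of one chain.

    residues: list of residues, each a list of (x, y, z) heavy-atom coordinates.
    """
    n = len(residues)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if residues_close(residues[i], residues[j], cutoff):
                counts[i] += 1
                counts[j] += 1
    return counts


def mean_density_by_type(chains):
    """Average the per-residue counts by residue type.

    chains: iterable of (sequence, counts) pairs, one pair per structure.
    """
    totals = defaultdict(float)
    occurrences = defaultdict(int)
    for sequence, counts in chains:
        for res, count in zip(sequence, counts):
            totals[res] += count
            occurrences[res] += 1
    return {res: totals[res] / occurrences[res] for res in totals}


# Toy chain (made-up coordinates): four single-atom "residues" spaced 3 Å apart.
toy_chain = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)], [(9.0, 0.0, 0.0)]]
toy_counts = packing_density(toy_chain)
toy_means = mean_density_by_type([("AGAG", toy_counts)])
```

The double loop is quadratic in chain length, which is adequate for a sketch; a spatial grid or k-d tree would be the usual choice for a database of thousands of structures.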

To detect disordered regions, we construct a profile of the expected packing density for the protein sequence. The calculations are based on a sliding-window averaging technique. First, the expected packing density is determined for each residue (it equals the average packing density observed for this type of residue in a globular state); then, these numbers are averaged inside the window and the average is assigned to the central residue of the window. The value of the averaged expected packing density at every position of the polypeptide chain provides the packing density profile.
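The sliding-window averaging can be sketched in Python as follows. The density values in EXAMPLE_DENSITY are placeholders invented for this example; the actual scale is tabulated in Galzitskaya et al. (2006) and is not reproduced here. The treatment of the chain termini (a window truncated to the residues that exist) is also an assumption, since the text does not specify it.

```python
# Hypothetical expected packing densities for a few residue types.
# Placeholder numbers for illustration only, NOT the published scale.
EXAMPLE_DENSITY = {"A": 20.0, "G": 18.0, "L": 24.0, "W": 26.0}


def packing_density_profile(sequence, density, window):
    """Sliding-window average of the expected packing density.

    The average over each window is assigned to the window's central residue.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    expected = [density[res] for res in sequence]
    half = window // 2
    profile = []
    for i in range(len(expected)):
        # Assumption: near the termini the window is truncated to existing residues.
        lo = max(0, i - half)
        hi = min(len(expected), i + half + 1)
        chunk = expected[lo:hi]
        profile.append(sum(chunk) / len(chunk))
    return profile


profile = packing_density_profile("GALWG", EXAMPLE_DENSITY, window=3)
```

An odd window size is required here so that the window has a well-defined central residue, which matches the window sizes of 41 and 11 residues recommended for the server.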

Our method has been tested on datasets of globular proteins (559 proteins) and long disordered protein segments (129 proteins) (Dosztanyi et al., 2005). A receiver operating characteristic (ROC) curve for our method has been obtained (Galzitskaya et al., 2006) to determine a threshold for our method. The true positive rate was calculated as the percentage of residues predicted as disordered in the set of disordered proteins and segments; the false positive rate is the percentage of residues predicted as disordered in the set of globular proteins. Our method showed improved performance over some other widely used methods, such as DISOPRED (Ward et al., 2004), PONDR VL3H (Obradovic et al., 2003), IUPred (Dosztanyi et al., 2005) and GlobPlot (Linding et al., 2003) (see Table 1).
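As an illustration of how these two rates can be computed, the Python sketch below takes per-residue predictions (True meaning "predicted disordered") for each protein in a set. The function names and the toy data are invented for this example. The per-protein average is included because averaging over residues and over proteins are both reported for the 129-protein set; the exact averaging procedure used for Table 1 is assumed, not taken from the text.

```python
def rate_over_residues(predictions):
    """Fraction of residues predicted as disordered, pooled over all proteins."""
    total = sum(len(p) for p in predictions)
    flagged = sum(sum(1 for x in p if x) for p in predictions)
    return flagged / total if total else 0.0


def rate_over_proteins(predictions):
    """Mean of the per-protein fractions of residues predicted as disordered."""
    fractions = [sum(1 for x in p if x) / len(p) for p in predictions if p]
    return sum(fractions) / len(fractions) if fractions else 0.0


# Toy predictions (made-up data): one list of booleans per protein.
disordered_set = [[True, True, False, True], [True, False]]
globular_set = [[False, False, True, False]]

# On the disordered set the rate is the true positive rate;
# on the globular set it is the false positive rate.
tpr_residues = rate_over_residues(disordered_set)
tpr_proteins = rate_over_proteins(disordered_set)
fpr_residues = rate_over_residues(globular_set)
```

Repeating this calculation for a range of packing density thresholds gives the (false positive rate, true positive rate) pairs that make up one ROC curve.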


Table 1 Performance of disorder prediction methods on datasets of globular proteins (559 proteins) and long disordered protein segments (129 proteins) (Dosztanyi et al., 2005)

 

    3 THE FOLDUNFOLD SERVER
The web server takes an amino acid sequence in FASTA format as input and calculates the expected packing density profile along the sequence. We use this property, the mean packing density, to predict the state of a protein with an unknown 3D structure: either folded or unfolded (in other words, disordered). If the expected mean packing density of the protein is <20.4, the whole protein is predicted to be disordered. If the expected mean packing density exceeds 20.4, the program finds disordered segments that satisfy two criteria: the expected mean packing density within the segment is <20.4, and the segment is at least as long as the window used.
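The two-stage decision rule can be sketched in Python as shown below. This is one possible reading of the rule, not the server's implementation. It assumes that every full window whose average falls below the threshold marks all of its residues as disordered and that overlapping marked windows are merged, which guarantees segments at least one window long. The function name and the toy input are hypothetical.

```python
THRESHOLD = 20.4  # packing density threshold stated in the text


def predict_state(expected, window, threshold=THRESHOLD):
    """Classify a protein and list its disordered segments.

    expected: expected packing density of each residue, in sequence order.
    Returns (state, segments), with segments as 0-based inclusive (start, end) pairs.
    """
    n = len(expected)
    if n == 0:
        raise ValueError("empty sequence")
    # Stage 1: whole-protein average. A value exactly equal to the threshold is
    # not covered by the text; here it is treated as folded (assumption).
    if sum(expected) / n < threshold:
        return "disordered", [(0, n - 1)]
    # Stage 2 (assumed reading): flag every residue of each full window whose
    # average is below the threshold.
    flagged = [False] * n
    for start in range(0, n - window + 1):
        if sum(expected[start:start + window]) / window < threshold:
            for k in range(start, start + window):
                flagged[k] = True
    # Merge consecutive flagged residues into segments.
    segments = []
    seg_start = None
    for i, is_flagged in enumerate(flagged):
        if is_flagged and seg_start is None:
            seg_start = i
        elif not is_flagged and seg_start is not None:
            segments.append((seg_start, i - 1))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, n - 1))
    return "folded", segments


# Toy input (made-up densities): a low-density stretch inside a dense chain.
toy_expected = [25.0, 25.0, 25.0, 18.0, 18.0, 18.0, 25.0, 25.0, 25.0, 25.0]
toy_result = predict_state(toy_expected, window=3)
```

Under this reading a segment can extend slightly beyond the low-density residues themselves, because every residue of a low-scoring window is marked; a stricter variant would mark only the central residue of each window and then discard segments shorter than the window.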

We have constructed ROC curves for our method with different sizes of the sliding window (see Fig. 1). Two datasets were used for this construction: 427 disordered proteins and regions from the DisProt database (Vucetic et al., 2005) and 559 globular proteins (Dosztanyi et al., 2005). The size of the sliding window is now a user-selectable parameter. We recommend a window size of 41 residues to find large disordered regions and a window size of 11 residues to find short unstructured loops; note, however, that decreasing the window size increases the false positive rate (see Fig. 1). It should be emphasized that our program can only predict unfolded regions whose size is equal to or greater than the window size used.


Fig. 1 Each ROC curve corresponds to predictions made with the sliding window size specified in the legend. The open circle corresponds to the packing density value of 20.4, which is chosen as the threshold.

 
We have also predicted disordered regions in the 129 proteins (Dosztanyi et al., 2005) using the recently published method RONN (Yang et al., 2005). The true positive rate for this method (0.765 if averaging is done over residues and 0.694 if averaging is done over proteins) does not exceed that of our method (0.851 and 0.716, respectively; see Table 1). Comparison of our method with other recently published methods [PONDR VSL2 (Obradovic et al., 2005), PreLink (Coeytaux and Poupon, 2005), SPRITZ (Vullo et al., 2006)] will be presented in future publications.


    Acknowledgments
 
This work was supported by the program MCB RAS, by the Russian Foundation for Basic Research (grant 05-04-48750), by the Howard Hughes Medical Institute (55005607) and by INTAS grant (05-1000004-7747).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Dmitrij Frishman

Received on July 12, 2006; revised on August 29, 2006; accepted on September 22, 2006

    REFERENCES

    Coeytaux, K. and Poupon, A. (2005) Prediction of unfolded segments in a protein sequence based on amino acid composition. Bioinformatics, 21, 1891–1900.

    Dosztanyi, Z., et al. (2005) The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol., 347, 827–839.

    Galzitskaya, O.V., et al. (2000) Optimal region of average side-chain entropy for fast protein folding. Protein Sci., 9, 580–586.

    Galzitskaya, O.V., et al. (2006) Prediction of natively unfolded regions in protein chains. Mol. Biol. (Moscow), 40, 341–348.

    Garbuzynskiy, S.O., et al. (2004) To be folded or to be unfolded? Protein Sci., 13, 2871–2877.

    Linding, R., et al. (2003) Protein disorder prediction: implications for structural proteomics. Structure, 11, 1453–1459.

    Obradovic, Z., et al. (2003) Predicting intrinsic disorder from amino acid sequence. Proteins, 53, 566–572.

    Obradovic, Z., et al. (2005) Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins, 61, 176–182.

    Radivojac, P., et al. (2004) Protein flexibility and intrinsic disorder. Protein Sci., 13, 71–80.

    Romero, P., et al. (1998) Thousands of proteins likely to have long disordered regions. Pac. Symp. Biocomput., 437–448.

    Uversky, V.N., et al. (2000) Why are ‘natively unfolded’ proteins unstructured under physiologic conditions? Proteins, 41, 415–427.

    Vucetic, S., et al. (2005) DisProt: a database of protein disorder. Bioinformatics, 21, 137–140.

    Vullo, A., et al. (2006) Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic Acids Res., 34, W164–W168.

    Ward, J.J., et al. (2004) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 337, 635–645.

    Wootton, J.C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem., 18, 269–285.

    Wright, P.E. and Dyson, H.J. (1999) Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol., 293, 321–331.

    Yang, Z.R., et al. (2005) RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics, 21, 3369–3376.




This article has been cited by other articles:

B. W. Brandt, J. Heringa, and J. A. M. Leunissen
SEQATOMS: a web tool for identifying missing regions in PDB in sequence context
Nucleic Acids Res., July 1, 2008; 36(suppl_2): W255 - W259.

S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda, and T. Noguchi
POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions
Bioinformatics, August 15, 2007; 23(16): 2046 - 2053.

C.-T. Su, C.-Y. Chen, and C.-M. Hsu
iPDA: integrated protein disorder analyzer
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W465 - W472.

