Bioinformatics Advance Access originally published online on June 14, 2005
Bioinformatics 2005 21(16):3433-3434; doi:10.1093/bioinformatics/bti541
IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content
Institute of Enzymology, BRC, Hungarian Academy of Sciences PO Box 7, H-1518 Budapest, Hungary
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: Intrinsically unstructured/disordered proteins and domains (IUPs) lack a well-defined three-dimensional structure under native conditions. The IUPred server presents a novel algorithm for predicting such regions from amino acid sequences by estimating their total pairwise interresidue interaction energy, based on the assumption that IUP sequences do not fold because they cannot form sufficient stabilizing interresidue interactions. The server offers optional built-in parameter sets optimized for predicting short or long disordered regions and structured domains.
Availability: The IUPred server is available for academic users at http://iupred.enzim.hu
Contact: zsuzsa{at}enzim.hu
| INTRODUCTION |
|---|
Intrinsically unstructured proteins exist as an ensemble of alternative conformations, in contrast to folded, globular proteins, which have a unique native structure. A significant fraction of known genomes encodes proteins with regions of disordered structure. In some eukaryotic genomes >20% of the coded residues are predicted as disordered (Dunker et al., 2000; Ward et al., 2004a). In many cases a protein is fully disordered, while in many other cases there are long disordered segments in otherwise ordered, folded proteins (Tompa, 2002; Dyson and Wright, 2005). Despite their lack of a well-defined globular structure, these proteins carry out basic functions (Iakoucheva et al., 2002; Ward et al., 2004a), mostly associated with signal transduction, cell-cycle regulation and transcription. Several methods have been developed to predict the disordered character from amino acid sequences. Some are based on the special amino acid composition of fully disordered proteins, i.e. the abundance of hydrophilic residues and a high net charge (Uversky et al., 2000; Vucetic et al., 2003), whereas others use various machine learning approaches trained on specific datasets (Obradovic et al., 2003; Ward et al., 2004a; Linding et al., 2003b). Recently, it was suggested that these sequences do not have the capacity to properly wrap backbone hydrogen bonds (Fernandez and Berry, 2004), a property that has also been shown to be important for protein stability.
| BACKGROUND |
|---|
Our method is based on a physical explanation of the ordered/disordered nature of proteins. Globular proteins make a large number of interresidue interactions, providing the stabilizing energy to overcome the entropy loss during folding (Garbuzynskiy et al., 2004). In contrast, intrinsically unstructured/disordered proteins and domains (IUPs) have special sequences that do not have the capacity to form sufficient interresidue interactions. To discriminate between ordered and disordered regions in proteins, we have developed a new approach that estimates the potential of polypeptides to form such stabilizing contacts by using a statistical interaction potential (Thomas and Dill, 1996; Dosztányi et al., 2005). It was shown that the sum of interaction energies can be estimated by a quadratic expression in the amino acid composition, which takes into account that the contribution of an amino acid to order/disorder depends not only on its own chemical type, but also on its potential interaction partners (Dosztányi et al., 2005).
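The quadratic expression in the amino acid composition can be illustrated with a minimal Python sketch. It assumes the estimate has the form sum over i and j of n_i * P_ij * n_j, where n is the composition vector and P is a 20 × 20 matrix. The matrix used here (`toy_matrix`) holds hypothetical placeholder values chosen only so the example runs; it is not the published energy predictor matrix, and the function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping used only to build a placeholder matrix.
HYDROPHOBIC = set("AILMFVWC")


def toy_matrix():
    """Return a symmetric placeholder matrix P[a][b] (NOT the published values)."""
    return {
        a: {b: (-1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0)
            for b in AMINO_ACIDS}
        for a in AMINO_ACIDS
    }


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = {a: 0 for a in AMINO_ACIDS}
    for res in seq:
        if res in counts:
            counts[res] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def energy_per_residue(seq, matrix):
    """Quadratic form sum_i sum_j n_i * P_ij * n_j in the composition n."""
    n = composition(seq)
    return sum(n[a] * matrix[a][b] * n[b]
               for a in AMINO_ACIDS for b in AMINO_ACIDS)
```

Because the form is quadratic, the contribution of a residue type depends on how abundant its potential partners are: with the toy matrix, a sequence that is half alanine scores a quarter of the energy of a pure alanine sequence, not half.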
The calculation involves a 20 × 20 energy predictor matrix, parameterized by a statistical method to approximate the expected pairwise energy of globular proteins of known structure. Comparing globular proteins and disordered ones, a clear separation of their energy content is found (Dosztányi et al., 2005). As no training on disordered proteins is involved, this distinction underlines that the lack of a well-defined three-dimensional structure is an intrinsic property of certain evolved proteins. This approach was turned into a position-specific method to predict protein disorder by considering only the local sequential environment of residues within 2–100 residues in either direction. The score is then smoothed over a window size of 21. This prediction method (IUPred), when tested on datasets of globular proteins and long disordered protein segments, showed improved performance over some other widely used methods, such as DISOPRED2 (Ward et al., 2004a,b) and PONDR VL3H (Obradovic et al., 2003).
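The position-specific step can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's C implementation: the energy of a residue is taken as the average of its pair energies with neighbors 2 to 100 positions away in either direction, windows are simply truncated at the chain ends, and `toy_pair_energy` is a hypothetical stand-in for a lookup in the 20 × 20 matrix.

```python
HYDROPHOBIC = set("AILMFVWC")


def toy_pair_energy(a, b):
    """Hypothetical placeholder for a 20 x 20 matrix lookup (not real values)."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def local_energy_profile(seq, pair_energy, min_sep=2, max_sep=100):
    """Per-residue energy from neighbors min_sep..max_sep positions away."""
    profile = []
    for k, res in enumerate(seq):
        lo = max(0, k - max_sep)
        hi = min(len(seq), k + max_sep + 1)
        neighbors = [seq[j] for j in range(lo, hi) if abs(j - k) >= min_sep]
        if not neighbors:
            # Assumption: a residue with no eligible neighbors scores 0.
            profile.append(0.0)
            continue
        # Averaging over neighbors equals sum_j P[res][j] * c_j, where c is
        # the amino acid composition of the local environment.
        total = sum(pair_energy(res, other) for other in neighbors)
        profile.append(total / len(neighbors))
    return profile


def smooth(values, window=21):
    """Moving average over a centered window, truncated at the ends."""
    half = window // 2
    out = []
    for k in range(len(values)):
        chunk = values[max(0, k - half):k + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

A call such as `smooth(local_energy_profile(seq, toy_pair_energy))` yields one smoothed value per residue; how the server treats chain ends and converts energies to scores is not specified here.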
| THE IUPred SERVER |
|---|
The web server takes a single amino acid sequence as input and calculates the pairwise energy profile along the sequence. The energy values are then transformed into a probabilistic score ranging from 0 (complete order) to 1 (complete disorder). Residues with a score above 0.5 can be regarded as disordered. The user can choose between three prediction types (long disorder, short disorder and structured domains), each using slightly different parameters. The main purpose of the server is to predict context-independent global disorder that encompasses at least 30 consecutive residues of predicted disorder. A different set of parameters is suited for predicting short, probably context-dependent, disordered regions such as missing residues in the X-ray structure of an otherwise globular protein. For this application a sequential neighborhood of only 25 residues is considered. As chain termini of globular proteins are often disordered in X-ray structures, this is taken into account by an end-adjustment parameter that favors disorder prediction at the ends.
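For readers post-processing the per-residue scores, the 0.5 cutoff and the 30-residue length criterion can be applied with a short Python sketch. The function name, the 1-based inclusive coordinates and the strict "greater than" comparison are assumptions made for illustration; the server's own output handling may differ.

```python
def disordered_regions(scores, threshold=0.5, min_length=30):
    """Return 1-based inclusive (start, end) runs of scores above threshold.

    Runs shorter than min_length consecutive residues are dropped.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos  # a new disordered run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions
```

Setting `min_length=1` keeps every run above the cutoff, which may be closer to what one wants when looking at short disordered regions.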
The dependable identification of ordered regions is a crucial step in target selection for structural studies and structural genomics projects (Linding et al., 2003a). Finding putative structured domains suitable for structure determination is another potential application of this server. In this case the algorithm takes the energy profile and finds continuous regions confidently predicted to be ordered. Neighboring regions close to each other are merged, while regions shorter than the minimal domain size of 30 residues are ignored. When this prediction type is selected, the region(s) predicted to correspond to structured/globular domains are returned.
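The domain-finding steps described above (find ordered stretches, merge close neighbors, discard short regions) can be sketched in Python. The cutoff for "confidently ordered" (`order_cutoff`) and the maximum gap bridged when merging (`max_gap`) are not given in the text, so the defaults below are hypothetical; only the 30-residue minimal domain size comes from the description.

```python
def ordered_runs(scores, order_cutoff):
    """Return 1-based inclusive runs where score is below order_cutoff."""
    runs = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score < order_cutoff:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos - 1))
            start = None
    if start is not None:
        runs.append((start, len(scores)))
    return runs


def structured_domains(scores, order_cutoff=0.5, max_gap=10, min_domain=30):
    """Merge nearby ordered runs, then drop regions shorter than min_domain.

    order_cutoff and max_gap are illustrative assumptions, not server values.
    """
    merged = []
    for start, end in ordered_runs(scores, order_cutoff):
        # Gap = number of non-ordered residues between consecutive runs.
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s + 1 >= min_domain]
```

Merging before filtering matters: two 20-residue ordered stretches separated by a short loop survive as one 45-residue domain, whereas filtering first would discard both.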
The core program that calculates the pairwise energy profile and disorder probability is written in C; the web server is written in PHP. The calculation of the energy profile is based on a single sequence, without time-consuming alignment calculations. To facilitate scripting, a simple text output is generated by default. However, the user can also request a graphical output. The plot shows the disorder tendency of each residue along the sequence. The plot is generated on the fly by the JpGraph software (JpGraph, 2005 http://www.aditus.nu/jpgraph/), without storing the graphical images on the local machine. When the structured domains prediction type is selected, the domains are highlighted on the plot by thick lines. For long sequences, the graph is shown for fragments of a user-defined fixed length, 500 by default.
| Acknowledgments |
|---|
This work has been sponsored by grants GVOP-3.1.1.-2004-05-0143/3.0, OTKA F043609, T049073, and NKFP MediChem2 1/A/005/2004. Z.D. and P.T. were supported by the Bolyai János Scholarship. P.T. would like to acknowledge the support of the International Senior Research Fellowship GR067595 from the Wellcome Trust.
Conflict of Interest: none declared.
Received on March 24, 2005; revised on May 27, 2005; accepted on June 13, 2005
| REFERENCES |
|---|
Dosztányi, Z., et al. (2005) The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol., 347, 827–839.
Dunker, A.K., et al. (2000) Intrinsic protein disorder in complete genomes. Genome Inform. Ser. Workshop Genome Inform., 11, 161–171.
Dyson, H.J. and Wright, P.E. (2005) Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol., 6, 197–208.
Fernandez, A. and Berry, R.S. (2004) Molecular dimension explored in evolution to promote proteomic complexity. Proc. Natl Acad. Sci. USA, 101, 13460–13465.
Garbuzynskiy, S.O., et al. (2004) To be folded or to be unfolded? Protein Sci., 13, 2871–2877.
Iakoucheva, L.M., et al. (2002) Intrinsic disorder in cell-signaling and cancer-associated proteins. J. Mol. Biol., 323, 573–584.
JpGraph (2005) Aditus Consulting.
Linding, R., et al. (2003a) GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res., 31, 3701–3708.
Linding, R., et al. (2003b) Protein disorder prediction: implications for structural proteomics. Structure (Camb), 11, 1453–1459.
Obradovic, Z., et al. (2003) Predicting intrinsic disorder from amino acid sequence. Proteins, 53, Suppl. 6, 566–572.
Thomas, P.D. and Dill, K.A. (1996) An iterative method for extracting energy-like quantities from protein structures. Proc. Natl Acad. Sci. USA, 93, 11628–11633.
Tompa, P. (2002) Intrinsically unstructured proteins. Trends Biochem. Sci., 27, 527–533.
Uversky, V.N., et al. (2000) Why are natively unfolded proteins unstructured under physiologic conditions? Proteins, 41, 415–427.
Vucetic, S., et al. (2003) Flavors of protein disorder. Proteins, 52, 573–584.
Ward, J.J., et al. (2004a) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 337, 635–645.
Ward, J.J., et al. (2004b) The DISOPRED server for the prediction of protein disorder. Bioinformatics, 20, 2138–2139.