

Bioinformatics Advance Access originally published online on June 28, 2007
Bioinformatics 2007 23(17):2337-2338; doi:10.1093/bioinformatics/btm330

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix

Kana Shimizu 1,*, Shuichi Hirose 2 and Tamotsu Noguchi 1

1Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064 and 2PharmaDesign, Inc., 2-19-8 Hatchobori, Chuo-ku, Tokyo 104-0032, Japan

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Protein disorder is characterized by the lack of a stable 3D structure and is considered to be involved in a number of important protein functions, such as regulatory and signalling events. We developed a web application, POODLE-S, which predicts disordered regions from amino acid sequences by using physicochemical features and a reduced amino acid set of a position-specific scoring matrix.

Availability: POODLE-S is available from http://mbs.cbrc.jp/poodle/poodle-s.html and can be used by both academic and commercial users.

Contact: poodle{at}cbrc.jp


    1 INTRODUCTION
Protein disorder is a widespread phenomenon characterized by the lack of a stable 3D structure and a high degree of flexibility in the polypeptide chain. Disorder is thought to provide essential biological functions, because a dynamic conformation allows a protein to interact with multiple targets (Dunker et al., 2002). Because the primary structure of disordered regions differs from that of folded regions (Garner et al., 1998), it has motivated the development of prediction methods based on amino acid sequence analysis (Jones and Ward, 2003; Li et al., 1999; Linding et al., 2003; Obradovic et al., 2003). We focused on two facts. First, amino acid composition has different propensities in the N-terminal, C-terminal and internal regions (Shimizu et al., 2005). Second, general physicochemical properties, rather than specific amino acids, are the key factors that contribute to protein disorder (Weathers et al., 2004). We then investigated whether and how different physicochemical properties are required to characterize disorder in different regions (Shimizu et al., 2005). Our application, POODLE-S, defines a suitable length and position for the N-terminal and C-terminal regions used in predicting disorder, and makes region-specific predictions by selecting, for each region, the physicochemical features that discriminate disorder in that region.
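Differences in amino acid composition between position ranges, like those exploited by the χ² test in Section 2, can be quantified with Pearson's χ² test of homogeneity. The sketch below compares two compositions, for example residues taken from an N-terminal window versus internal residues. It is an illustration under our own assumptions, not the authors' region-definition procedure, which the paper does not detail; the function names `composition_counts` and `chi2_homogeneity` are hypothetical.

```python
from collections import Counter

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition_counts(fragments):
    """Count the 20 standard amino acids over a list of sequence fragments.

    Non-standard letters (e.g. X) are ignored.
    """
    counts = Counter()
    for fragment in fragments:
        counts.update(fragment)
    return np.array([counts[aa] for aa in AMINO_ACIDS], dtype=float)


def chi2_homogeneity(counts_a, counts_b):
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table.

    Both count vectors must be non-empty; amino acids absent from both
    groups are dropped so that no expected count is zero.
    """
    table = np.vstack([counts_a, counts_b])
    table = table[:, table.sum(axis=0) > 0]
    row_totals = table.sum(axis=1, keepdims=True)  # shape (2, 1)
    col_totals = table.sum(axis=0, keepdims=True)  # shape (1, k)
    expected = row_totals @ col_totals / table.sum()
    stat = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return float(stat), dof
```

A p-value can then be taken from the χ² distribution, e.g. with `scipy.stats.chi2.sf(stat, dof)`; a large statistic indicates that the two position ranges differ in composition.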


    2 OUTLINE OF METHODS
We used a χ² test to define seven regions on the basis of position from the N-terminus, so that the residues grouped into each region have a similar amino acid composition. The POODLE-S application consists of seven predictors based on support vector machines (SVMs; footnote 1). One predictor is prepared for each region, and each predictor selects its own features as follows.

  1. The predictor selects, from 10 physicochemical properties (hydrophilic, hydrophobic, charged, positive, negative, aromatic, aliphatic, tiny, small and polar), the features that are specifically discriminative for its region. Amino acids that have none of the selected physicochemical properties are also selected as features.
  2. A position-specific scoring matrix (PSSM) is obtained for the target sequence with PSI-BLAST and divided into sliding windows of size $m$. Each window is a matrix $E_{i,j}$ ($i = 1, \ldots, m$; $j = 1, \ldots, 20$), where $j$ indexes the 20 amino acids.
  3. Each feature is calculated as $F_{i,c} = \sum_{j \in c} E_{i,j}$ ($i = 1, \ldots, m$), where $j \in c$ means that amino acid $j$ has characteristic $c$.
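Steps 2 and 3 can be sketched in NumPy: for every row of a window, the reduced feature for property $c$ is the sum of the PSSM scores of the amino acids that have that property. This is a minimal sketch, not the POODLE-S implementation. The column order `AA_ORDER` (the order used in PSI-BLAST's ASCII PSSM output), the property memberships in `PROPERTIES`, the centred windows and the zero-padding at the termini are all assumptions, since the paper specifies none of them; only five of the ten properties are shown.

```python
import numpy as np

# Assumed PSSM column order (as in PSI-BLAST ASCII PSSM output).
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"

# Hypothetical property -> amino acid memberships, for illustration only;
# the paper does not list the exact sets POODLE-S uses.
PROPERTIES = {
    "charged": set("DEKRH"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "aromatic": set("FWYH"),
    "aliphatic": set("ILV"),
}


def property_columns(props):
    """Map each property c to the PSSM column indices j with j in c."""
    return {name: [AA_ORDER.index(aa) for aa in sorted(members)]
            for name, members in props.items()}


def sliding_windows(pssm, m):
    """Return one m x 20 window per residue, centred on it (m assumed odd).

    Rows beyond the termini are zero-padded (an assumption).
    """
    length = pssm.shape[0]
    half = m // 2
    pad = np.zeros((half, pssm.shape[1]))
    padded = np.vstack([pad, pssm, pad])
    return [padded[k:k + m] for k in range(length)]


def reduced_features(window, cols):
    """Compute F[i, c] = sum over j in c of E[i, j] for every window row i."""
    return np.stack([window[:, idx].sum(axis=1) for idx in cols.values()],
                    axis=1)
```

For an L x 20 PSSM and m = 15, `reduced_features(sliding_windows(pssm, 15)[k], property_columns(PROPERTIES))` yields a 15 x 5 array for residue k, which could be flattened with `.ravel()` into an input vector for a region's SVM.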


    3 PERFORMANCE
We used the dataset (footnote 2) of the latest Critical Assessment of Techniques for Protein Structure Prediction (CASP7, http://predictioncenter.org/casp7/Casp7.html) to assess the performance of POODLE-S. POODLE-S was trained on high-resolution, single-chain X-ray crystal structure data (Shimizu et al., 2005) and the DisProt database (Vucetic et al., 2005). All training data were obtained before the CASP7 prediction season, so at training time POODLE-S contained no information about the CASP7 target sequences. We assessed performance with sensitivity [tp/(tp + fn)], specificity [tn/(tn + fp)], selectivity [tp/(tp + fp)] and the Matthews correlation coefficient (MCC). The MCC balances sensitivity and specificity and is calculated as follows:


$$\mathrm{MCC} = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}$$

where tp, tn, fp and fn are the numbers of true positives, true negatives, false positives and false negatives, respectively.
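The four scores can be computed from per-residue order/disorder labels as in the following sketch. It assumes that disordered residues ('D') are the positive class and that the true and predicted labels are aligned strings of 'O'/'D'; returning 0.0 when a denominator is zero is our own choice, not something the paper specifies.

```python
import math


def confusion_counts(true_labels, pred_labels, positive="D"):
    """Count (tp, tn, fp, fn) over aligned per-residue labels.

    Assumes disorder ('D') is the positive class.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def _ratio(num, den):
    """Safe division: 0.0 when the denominator is zero (our convention)."""
    return num / den if den else 0.0


def disorder_scores(tp, tn, fp, fn):
    """Return SEN, SPC, SEL and MCC as defined in the text."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SEN": _ratio(tp, tp + fn),
        "SPC": _ratio(tn, tn + fp),
        "SEL": _ratio(tp, tp + fp),
        "MCC": _ratio(tp * tn - fp * fn, denom),
    }
```

For example, `disorder_scores(*confusion_counts(true, pred))` scores one target; pooling the counts over all targets before calling `disorder_scores` gives a per-residue score for the whole dataset.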

Table 1 shows the assessment of POODLE-S on the four scores, compared with three other top-performing groups in CASP7. The predictions of ‘DISOPRED’ (Ward et al., 2004), ‘ISTZORAN’ (Li et al., 1999; Obradovic et al., 2003) and ‘fais’ were downloaded from the CASP7 website. ‘DISOPRED’ is a fully automatic server group, whereas ‘ISTZORAN’ and ‘fais’ registered as human expert groups, which may use any combination of computational and human methods. The data indicate that our method is of comparable accuracy (MCC) to the other three top groups. On average, it has a lower sensitivity (SEN), which is compensated by a higher specificity (SPC) and selectivity (SEL). We additionally compared the predictions of the different groups for the seven regions defined by our method (Table 2). POODLE-S performs better than the other groups on regions NR2 and NR3, which suggests that region-specific feature selection is an effective way of predicting protein disorder.



 
Table 1. Comparison of sensitivity (SEN), specificity (SPC), selectivity (SEL) and Matthews correlation coefficient (MCC) for POODLE-S and three successful CASP7 groups

 

    4 THE POODLE-S SERVER
The web server takes a single amino acid sequence as input. Users must also provide a valid e-mail address, to which the prediction results are sent. POODLE-S provides both text and graphical output. The text output is based on the CASP format: the data are placed between the MODEL and END records, and each line consists of a residue code, a two-state prediction code and a confidence score. The two-state order/disorder codes are ‘O’ for order and ‘D’ for disorder. The last column gives the probability that the residue lies in a disordered region, a value between 0.0 and 1.0. The graphical output, an interactive line graph, is available from a URL that is included in the e-mail and remains accessible for 2 weeks after submission. Pointing the cursor at the line graph displays the residue position (counted from the N-terminus), its amino acid code and its disorder probability (Fig. 1).
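The text output described above can be read back with a short parser such as the following. It is a sketch that assumes whitespace-separated columns and one residue per line between the MODEL and END records; the class and function names, and the grouping of consecutive ‘D’ residues into segments, are our own illustrative additions rather than part of the POODLE-S service.

```python
from dataclasses import dataclass


@dataclass
class ResiduePrediction:
    position: int         # 1-based position from the N-terminus
    residue: str          # one-letter amino acid code
    state: str            # 'O' (order) or 'D' (disorder)
    prob_disorder: float  # probability of disorder, 0.0 to 1.0


def parse_prediction(text):
    """Collect residue lines found between the MODEL and END records."""
    preds = []
    inside = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MODEL":
            inside = True
        elif fields[0] == "END":
            break
        elif inside and len(fields) >= 3:
            preds.append(ResiduePrediction(len(preds) + 1, fields[0],
                                           fields[1], float(fields[2])))
    return preds


def disordered_segments(preds, min_len=1):
    """Group consecutive 'D' residues into 1-based inclusive (start, end)."""
    segments, start = [], None
    for p in preds:
        if p.state == "D" and start is None:
            start = p.position
        elif p.state != "D" and start is not None:
            if p.position - start >= min_len:
                segments.append((start, p.position - 1))
            start = None
    if start is not None and preds[-1].position - start + 1 >= min_len:
        segments.append((start, preds[-1].position))
    return segments
```

Raising `min_len` filters out isolated single-residue ‘D’ calls, which can be convenient when only longer disordered regions are of interest.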



 
Table 2. Comparison of MCC for POODLE-S and three successful CASP7 groups across the seven regions

 

Fig. 1. An example of POODLE-S's graphical output.

 


    ACKNOWLEDGEMENTS
We would like to thank Yoichi Muraoka from Waseda University and Satoru Kanai from PharmaDesign, Inc. for helpful discussions. We also thank an anonymous reviewer for his/her helpful comments, which improved the manuscript.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Anna Tramontano

1 We used the support vector machine package LIBSVM (Chang and Lin, 2001; http://www.csie.ntu.edu.tw/~cjlin/libsvm).

2 CASP7 provided 100 valid targets during the prediction season. We evaluated results using the 89 targets whose structures are available from the Protein Data Bank.

Received on April 4, 2007; revised on May 2, 2007; accepted on June 16, 2007

    REFERENCES

    Chang CC, Lin CJ. LIBSVM: a library for support vector machines. (2001).

    Dunker AK, et al. Intrinsic disorder and protein function. Biochemistry (2002) 41:6573–6582.

    Garner E, et al. Predicting disordered regions from amino acid sequence: common themes despite differing structural characterization. Genome Inform Ser Workshop Genome Inform (1998) 9:201–213.

    Jones DT, Ward JJ. Prediction of disordered regions in proteins from position specific score matrices. Proteins (2003) 53(Suppl. 6):573–578.

    Li X, et al. Predicting protein disorder for N-, C-, and internal regions. Genome Inform Ser Workshop Genome Inform (1999) 10:30–40.

    Linding R, et al. Protein disorder prediction: implications for structural proteomics. Structure (2003) 11:1453–1459.

    Obradovic Z, et al. Predicting intrinsic disorder from amino acid sequence. Proteins (2003) 53(Suppl. 6):566–572.

    Shimizu K, et al. Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder. In: Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (2005) 262–267.

    Vucetic S, et al. DisProt: a database of protein disorder. Bioinformatics (2005) 21:137–140.

    Ward JJ, et al. The DISOPRED server for the prediction of protein disorder. Bioinformatics (2004) 20:2138–2139.

    Weathers EA, et al. Reduced amino acid alphabet is sufficient to accurately recognize intrinsically disordered protein. FEBS Lett (2004) 576:348–352.




This article has been cited by other articles:

L. J. McGuffin. Intrinsic disorder prediction from the analysis of multiple protein fold recognition models. Bioinformatics, August 15, 2008; 24(16): 1798–1804.

T. Ishida and K. Kinoshita. Prediction of disordered regions in proteins based on the meta approach. Bioinformatics, June 1, 2008; 24(11): 1344–1348.

