Bioinformatics Advance Access originally published online on February 5, 2007
Bioinformatics 2007 23(7):895-897; doi:10.1093/bioinformatics/btm020

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

NetPhosYeast: prediction of protein phosphorylation sites in yeast

Christian R. Ingrell 1, Martin L. Miller 2, Ole N. Jensen 1 and Nikolaj Blom 2,*

1University of Southern Denmark, Campusvej 55, DK-5230, Odense M, Denmark and 2Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Anker Engelunds Vej 1, DK-2800 Kgs. Lyngby, Denmark

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: We present a neural network-based method for predicting protein phosphorylation sites in yeast, an important model organism for basic research. Existing protein phosphorylation site predictors are primarily based on mammalian data and show reduced sensitivity on yeast phosphorylation sites compared to those in humans, suggesting the need for a yeast-specific phosphorylation site predictor. NetPhosYeast achieves a correlation coefficient close to 0.75 with a sensitivity of 0.84 and a specificity of 0.90, and outperforms existing predictors in the identification of phosphorylation sites in yeast.

Availability: The NetPhosYeast prediction service is available as a public web server at http://www.cbs.dtu.dk/services/NetPhosYeast/

Contact: nikob{at}cbs.dtu.dk


    1 INTRODUCTION
 
Protein phosphorylation is a post-translational modification catalyzed by protein kinases. Reversible protein phosphorylation is a universal regulatory mechanism of multiple cellular functions in eukaryotes, prokaryotes and archaea. As the number of sequenced genomes rapidly increases, the functional annotation of the gene products is lagging behind. Since computational analysis of a protein sequence is often the first step toward understanding its function, it is important to develop and improve such computational methods. Several algorithms for predicting protein phosphorylation from the amino acid sequence are available, including Scansite 2.0 (Obenauer et al., 2003), Prosite (Sigrist et al., 2002), NetPhos (Blom et al., 1999), NetPhosK (Blom et al., 2004), GPS (Xue et al., 2005), DISPHOS (Iakoucheva et al., 2004), KinasePhos (Huang et al., 2005) and PPSP (Xue et al., 2006). None of these methods is designed specifically to predict yeast phosphorylation sites; they rely primarily on annotated phosphorylation sites identified with classical low-throughput biochemical experiments and extracted from databases such as Phospho.ELM (Diella et al., 2004) and PhosphoBase (Blom et al., 1998). Advances in biological mass spectrometry have enabled the identification of hundreds of protein phosphorylation sites in a single experiment (Jensen, 2006). Recently, two large-scale phosphoproteomic studies mapped more than 900 phosphorylation sites in yeast (Ficarro et al., 2002; Gruhler et al., 2005), providing the foundation for developing a predictor for yeast protein phosphorylation.

Although many protein kinases in yeast have homologues in humans, and vice versa, many kinases are not shared between these species. An evolutionary study of protein kinases showed that 32 kinases are unique to yeast, covering unicellular functions such as osmotic and stress response, cell wall signalling, cell-cycle regulation and small molecule transport (Ball et al., 2000). Similarly, humans have protein kinases governing development, differentiation and intercellular communication (Manning et al., 2002) that are not found in yeast. We here present the first yeast-specific phosphorylation predictor with high sensitivity and specificity. We also demonstrate that existing predictors, which are based primarily on mammalian phosphorylation sites, exhibit lower performance on known phosphorylation sites in yeast proteins. The yeast-specific protein phosphorylation site predictor, NetPhosYeast, will facilitate more confident computational annotation of yeast proteins.


    2 METHODS
 
We generated a positive data set consisting of yeast serine and threonine phosphorylation sites experimentally identified by mass-spectrometry-driven phosphoproteomics in Gruhler et al. (2005) and Ficarro et al. (2002). We also included annotated phosphorylation sites from the Swiss-Prot database, excluding any site whose description field contained the modifier ‘potential’, ‘probable’, ‘by similarity’ or ‘autocatalysis’. After merging the three data sets, redundant 7-mer phosphopeptides were removed. This yielded a total of 953 phosphoserine sites and 192 phosphothreonine sites from 675 yeast proteins. The negative data set was compiled by randomly collecting non-phosphorylated serines and threonines in yeast phosphoproteins. For comparison, 1696 annotated human serine and threonine phosphorylation sites were extracted from the Swiss-Prot database, again excluding sites with the modifier ‘potential’, ‘probable’, ‘by similarity’ or ‘autocatalysis’ in the description field.
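The data set construction described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the padding character `X` at the protein termini, the 0-based site coordinates and the exact-match definition of a redundant 7-mer are all assumptions made for illustration.

```python
import random


def window_around(seq, pos, flank=3, pad="X"):
    """Return the (2*flank+1)-mer centred on 0-based position pos.

    Termini are padded with `pad` (an assumption) so that every site
    yields a window of the same length.
    """
    padded = pad * flank + seq + pad * flank
    return padded[pos:pos + 2 * flank + 1]


def unique_positive_windows(sites, proteins):
    """Collect one 7-mer per S/T phosphosite and drop exact duplicates."""
    seen, kept = set(), []
    for protein_id, pos in sites:
        seq = proteins[protein_id]
        if seq[pos] not in "ST":
            continue  # only serine and threonine sites are considered
        peptide = window_around(seq, pos)
        if peptide not in seen:
            seen.add(peptide)
            kept.append((protein_id, pos, peptide))
    return kept


def sample_negatives(positive_sites, proteins, n, seed=0):
    """Randomly pick S/T positions that are not annotated as phosphorylated."""
    positives = set(positive_sites)
    candidates = [
        (protein_id, i)
        for protein_id, seq in sorted(proteins.items())
        for i, residue in enumerate(seq)
        if residue in "ST" and (protein_id, i) not in positives
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))
```

In this sketch, two proteins that share the same 7-mer around a phosphosite contribute only one positive example, and negatives are drawn from the same proteins that carry the positive sites.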

Prior to training the artificial neural networks (ANNs), the negative and positive data sets were pooled. An n-fold cross-validation (Nielsen et al., 2003) is typically used to estimate the accuracy of a machine-learning scheme. In n-fold cross-validation, the pooled data set is partitioned into n subsets, of which one serves as the test set and the remaining ones as training sets. The ANN training is performed by shifting the test set stepwise, so that every data point has been used for both training and testing when the procedure is complete. For each test set, a number of ANN parameters (window size and number of hidden neurons) are optimized according to the Matthews correlation coefficient (MCC) and an optimal parameter space is chosen. We devised a new evaluation scheme for the n-fold cross-validation procedure, in which the traditional test and training sets are supplemented by an evaluation set. In this approach, the pooled data set is divided into five subsets by random partitioning. Each subset in turn is held out as the evaluation set while 4-fold cross-validation is performed on the remaining four subsets; instead of reporting the test-set performance, we calculate the performance on the respective evaluation set. Thus, we obtain a fair and independent performance estimate of our ANN. We suggest that this training method should be termed cross-evaluation.
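One way to read the cross-evaluation scheme is as a nested partitioning in which every subset is held out once for evaluation while the other four rotate through the test role. The sketch below generates such splits under that reading; the function name, the shuffling seed and the interleaved assignment of items to subsets are assumptions, not details taken from the paper.

```python
import random


def cross_evaluation_splits(n_items, n_subsets=5, seed=0):
    """Yield (train, test, evaluation) index lists.

    Each subset is held out once as the evaluation set. Among the
    remaining subsets, each takes one turn as the test set (used for
    parameter selection) while the others form the training set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    subsets = [indices[i::n_subsets] for i in range(n_subsets)]
    for e in range(n_subsets):
        inner = [k for k in range(n_subsets) if k != e]
        for t in inner:
            train = [i for k in inner if k != t for i in subsets[k]]
            yield train, subsets[t], subsets[e]
```

With five subsets this produces 5 × 4 = 20 train/test/evaluation combinations. Parameters such as window size and hidden-layer size would be chosen on the test set, and only the evaluation set would be used for the reported performance.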

The artificial neural network (ANN) used in this study was a standard three-layer feedforward type that has been described previously (Qian et al., 1988). As an extension of previously proposed methods for predicting phosphosites, amino acids were encoded with the BLOSUM62 scoring matrix (Nielsen et al., 2003) to achieve a more general physicochemical description than the sparse encoding scheme provides (Qian et al., 1988). In the BLOSUM62 encoding scheme, each amino acid is represented by its row of the BLOSUM62 matrix, that is, by the vector of substitution scores between that amino acid and each of the 20 amino acids, itself included. The ANN input window size and the number of neurons in the hidden layer were subsequently optimized in each cross-validation procedure.
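The difference between the sparse and the substitution-matrix encoding can be shown with a small Python sketch. To stay short it uses a three-letter excerpt (A, S, T) rather than the full 20-letter alphabet; the excerpt values are intended to match the corresponding BLOSUM62 entries, but in practice the complete matrix should be loaded from a trusted source. The function and variable names are hypothetical.

```python
def sparse_encode(window, alphabet):
    """Sparse (one-hot) encoding: one input per letter, 1.0 for the residue itself."""
    return [1.0 if residue == letter else 0.0
            for residue in window for letter in alphabet]


def matrix_encode(window, alphabet, scores):
    """Substitution-matrix encoding: each residue becomes its row of scores."""
    return [float(scores[residue][letter])
            for residue in window for letter in alphabet]


# Three-letter excerpt for illustration only (assumption: values as in BLOSUM62).
TOY_ALPHABET = "AST"
TOY_SCORES = {
    "A": {"A": 4, "S": 1, "T": 0},
    "S": {"A": 1, "S": 4, "T": 1},
    "T": {"A": 0, "S": 1, "T": 5},
}

# Both encodings give len(window) * len(alphabet) network inputs.
print(sparse_encode("ST", TOY_ALPHABET))
print(matrix_encode("ST", TOY_ALPHABET, TOY_SCORES))
```

In the sparse scheme, serine and threonine are as dissimilar as any other pair of residues, whereas in the matrix-based scheme their input vectors partly overlap, which is what gives the network a more general description of each position. The sketch does not handle padding characters at the protein termini.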


    3 RESULTS
 
In total, 20 ANNs were trained to classify validated yeast phosphorylation sites by optimizing input window size and the number of hidden units in each cross-validation set. The average output of these 20 networks constitutes the output score from NetPhosYeast. Using the independent evaluation scheme as described in the Methods section, NetPhosYeast achieves an average MCC of 0.74, a sensitivity of 0.84 and a specificity of 0.90 using a threshold of 0.5.
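The score averaging and the three performance measures quoted above can be sketched in Python as follows. The sketch assumes the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)) and treats a score greater than or equal to the threshold as a positive prediction; the paper does not spell out either choice, and the function names are hypothetical.

```python
import math


def ensemble_score(network_outputs):
    """Average the outputs of several networks for one candidate site."""
    return sum(network_outputs) / len(network_outputs)


def confusion_counts(scores, labels, threshold=0.5):
    """Count true/false positives and negatives at the given threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold  # assumption: ties count as positive
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of true phosphosites that are predicted as such."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn, fp):
    """Fraction of non-phosphorylated sites that are predicted as such."""
    return tn / (tn + fp) if tn + fp else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Each candidate site would first receive one averaged score from the 20 networks, after which the counts and metrics are computed over the whole evaluation set.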

To estimate the ability of NetPhosYeast to identify phosphosites in yeast and human proteins, we compared its performance with four existing phosphosite predictors that allow multiple sequence submissions: NetPhos, NetPhosK, KinasePhos and Scansite v2.0 (for each prediction method, the setting that gave the maximal MCC was used). On average, these methods find 82% of all annotated human phosphorylation sites in Swiss-Prot, which is comparable to NetPhosYeast (see Fig. 1). This suggests that there is considerable overlap in recognition sequence space between the kinase repertoires of the two species. Using the independent evaluation data set, NetPhosYeast identifies 84% of yeast phosphosites, whereas the aforementioned methods identify 71% on average. This indicates the existence of yeast-specific substrate motifs, which are exclusively recognized by NetPhosYeast, and demonstrates the need for a yeast-specific predictor.


 
Fig. 1. Sensitivity comparison of KinasePhos, NetPhosK, NetPhos, Scansite v2.0 and NetPhosYeast on the verified human and yeast phosphorylation sites.

 

    4 CONCLUSION
 
The method presented here predicts phosphorylation sites in yeast proteins with high specificity (0.90) and sensitivity (0.84), measured on an independent data set. Since many researchers use yeast as their preferred model organism, NetPhosYeast will aid the sequence analysis of proteins in their work. As more data become available, the next generation of phosphorylation site predictors will move towards both species and kinase specificity.


    Acknowledgements
 
This project was supported by funds from The Danish National Research Foundation and from EU-FP6: Interaction Proteome.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Thomas Lengauer

Received on November 3, 2006; revised on January 8, 2007; accepted on January 18, 2007

    References
 

Ball CA, et al. Integrating functional genomic information into the Saccharomyces genome database. Nucleic Acids Res (2000) 28:77–80.

Blom N, et al. PhosphoBase: a database of phosphorylation sites. Nucleic Acids Res (1998) 26:382–386.

Blom N, et al. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol (1999) 294:1351–1362.

Blom N, et al. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics (2004) 4:1633–1649.

Diella F, et al. Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics (2004) 5:79.

Ficarro SB, et al. Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat. Biotechnol (2002) 20:301–305.

Gruhler A, et al. Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Mol. Cell Proteomics (2005) 4:310–327.

Huang HD, et al. KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res (2005) 33:W226–W229.

Iakoucheva LM, et al. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res (2004) 32:1037–1049.

Jensen ON. Interpreting the protein language using proteomics. Nat. Rev. Mol. Cell. Biol (2006) 7:391–403.

Manning G, et al. Evolution of protein kinase signaling from yeast to man. Trends Biochem. Sci (2002) 27:514–520.

Nielsen M, et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci (2003) 12:1007–1017.

Obenauer JC, et al. Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res (2003) 31:3635–3641.

Qian N, et al. Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol (1988) 202:865–884.

Sigrist CJ, et al. PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform (2002) 3:265–274.

Xue Y, et al. GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res (2005) 33:W184–W187.

Xue Y, et al. PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory. BMC Bioinformatics (2006) 7:163.

