Skip Navigation


Bioinformatics Advance Access originally published online on June 23, 2009
Bioinformatics 2009 25(17):2200-2207; doi:10.1093/bioinformatics/btp386
This Article
Right arrow Full Text
Right arrow Full Text (Print PDF)
Right arrow Supplementary Data
Right arrow All Versions of this Article:
25/17/2200    most recent
btp386v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Magnan, C. N.
Right arrow Articles by Baldi, P.
PubMed
Right arrow PubMed Citation
Right arrow Articles by Magnan, C. N.
Right arrow Articles by Baldi, P.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

SOLpro: accurate sequence-based prediction of protein solubility

Christophe N. Magnan , Arlo Randall and Pierre Baldi *

Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, CA, USA

* To whom correspondence should be addressed.


   Abstract

Motivation: Protein insolubility is a major obstacle for many experimental studies. A sequence-based prediction method able to accurately predict the propensity of a protein to be soluble on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the solubility of insoluble proteins.

Results: Here, we first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, we extract and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM) architecture. The resulting predictor, SOLpro, is compared directly with existing methods and shows significant improvement according to standard evaluation metrics, with an overall accuracy of over 74% estimated using multiple runs of 10-fold cross-validation.

Availability: SOLpro is integrated in the SCRATCH suite of predictors and is available for download as a standalone application and as a web server at: http://scratch.proteomics.ics.uci.edu.

Contact: pfbaldi{at}ics.uci.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Burkhard Rost


Received on April 13, 2009; revised on June 9, 2009; accepted on June 17, 2009

Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer: Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.