Bioinformatics Advance Access published online on August 12, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn418
Computational Prediction of Human Proteins That Can Be Secreted into the Bloodstream
¹Department of Biochemistry and Molecular Biology and ²Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
³Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
*To whom correspondence should be addressed. Dr. Juan Cui, E-mail: juancui{at}csbl.bmb.uga.edu
Abstract
We present a novel computational method for predicting which proteins encoded by highly and abnormally expressed genes in diseased human tissues, such as cancers, can be secreted into the bloodstream, thereby suggesting candidate marker proteins for follow-up serum proteomic studies. A main challenge in tackling this problem is that our understanding of where proteins localize after they are secreted from cells is very limited and offers few useful hints about secretion into the bloodstream. To bypass this difficulty, we have taken a data mining approach: we first collected, through extensive literature searches, human proteins that previous proteomic studies have detected as secreted into the bloodstream under various pathological conditions, and then asked: "What do these secreted proteins have in common in terms of their physical and chemical properties, amino acid sequence and structural features that can be used to predict them?" We have identified a list of features relevant to protein secretion, including signal peptides, transmembrane domains, glycosylation sites, disordered regions, secondary structural content, hydrophobicity and polarity measures. Using these features, we have trained a Support Vector Machine (SVM)-based classifier to predict protein secretion into the bloodstream. On a large test set containing 98 secretory and 6,601 non-secretory human proteins, our classifier achieved 90% prediction sensitivity and 98% prediction specificity. We used several additional datasets to further assess the performance of our classifier. On a set of 122 proteins found at abnormally high abundance in human blood due to various cancers, our program predicted 62 as blood-secreted. By applying our program to genes abnormally highly expressed in gastric cancer and lung cancer tissues, as detected through microarray gene-expression studies, we predicted 13 and 31 proteins, respectively, as blood-secreted, suggesting that they could serve as potential biomarkers for these two cancers. Our study demonstrates that our method can provide highly useful information linking genomic and proteomic studies for disease biomarker discovery. Our software can be accessed at http://csbl1.bmb.uga.edu/cgi-bin/Secretion/secretion.cgi
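As a rough illustration of what the reported rates imply on such an imbalanced test set, the following sketch back-calculates approximate confusion-matrix counts from the 98/6,601 split and the 90%/98% figures. The function names `confusion_from_rates` and `precision` are hypothetical helpers, not part of the authors' software, and because the abstract gives only rounded percentages, the counts are estimates rather than the authors' reported numbers.

```python
# Hypothetical helpers for illustration only; the abstract reports rounded
# percentages, so these counts are back-calculated estimates.

def confusion_from_rates(n_pos: int, n_neg: int,
                         sensitivity: float, specificity: float):
    """Return estimated (TP, FN, TN, FP), rounded to whole proteins."""
    tp = round(sensitivity * n_pos)  # secretory proteins correctly flagged
    fn = n_pos - tp                  # secretory proteins missed
    tn = round(specificity * n_neg)  # non-secretory proteins correctly rejected
    fp = n_neg - tn                  # non-secretory proteins wrongly flagged
    return tp, fn, tn, fp


def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are truly secretory."""
    return tp / (tp + fp)


# Test set sizes and rates as stated in the abstract.
tp, fn, tn, fp = confusion_from_rates(98, 6601, 0.90, 0.98)
print(tp, fn, tn, fp)            # 88 10 6469 132
print(f"{precision(tp, fp):.2f}")  # 0.40
```

Under these assumptions, about 132 non-secretory proteins would be flagged alongside roughly 88 true secretory ones, so only about 40% of positive calls would be correct. This follows from the roughly 67:1 class imbalance (6,601/98) and is a derived estimate, not a figure reported by the authors.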
Associate Editor: Dr. Limsoon Wong
Received on April 9, 2008; revised on August 6, 2008; accepted on August 7, 2008