Bioinformatics Vol. 17 no. 8 2001
Pages 721-728
© 2001 Oxford University Press
Support vector machine approach for protein subcellular localization prediction
Institute of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing 100084, People's Republic of China
Received on December 12, 2000; revised on March 28, 2001; accepted on April 24, 2001
Motivation: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences.
Results: In this paper, a Support Vector Machine (SVM) is introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracy reaches 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. The approach outperforms existing algorithms based on amino acid composition and can complement existing methods based on sorting signals.
Availability: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
Contact: sunzhr{at}mail.tsinghua.edu.cn; huasj00{at}mails.tsinghua.edu.cn
Supplementary information: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
* To whom correspondence should be addressed.
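As an illustration of the input representation named in the Results, the following minimal sketch computes an amino acid composition vector, that is, the fraction of each of the 20 standard residues in a sequence. The function name `amino_acid_composition`, the alphabetical residue ordering, and the choice to skip non-standard letters are assumptions made for this example; they are not taken from the paper or the SubLoc server. The SVM classifier that would consume this vector is not shown.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code (alphabetical order).
# The ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the 20 residue fractions of `sequence`, ordered as AMINO_ACIDS.

    Characters outside the 20 standard letters (e.g. X, B, Z) are ignored.
    That handling is an assumption of this sketch, not a rule from the paper.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acid residues")
    # Residues absent from the sequence get a fraction of 0.0.
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("MKKLLA")
    for aa, fraction in zip(AMINO_ACIDS, vector):
        if fraction > 0:
            print(aa, round(fraction, 3))
```

Because the vector depends only on residue counts over the whole sequence, changing a few residues at the N-terminus shifts each entry only slightly, which is consistent with the robustness to N-terminal errors reported above.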