Bioinformatics Vol. 17 no. 4 2001
Pages 349-358
© 2001 Oxford University Press
Original Paper
Multi-class protein fold recognition using support vector machines and neural networks
NERSC Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
Received on August 2, 2000; revised on November 4, 2000; accepted on November 16, 2000
Motivation: Protein fold recognition is an important approach to structure discovery that does not rely on sequence similarity. We study this approach with new multi-class classification methods and examine many issues that are important for a practical recognition system.
Results: Most current discriminative methods for protein fold prediction use the one-against-others method, which suffers from the well-known false-positive problem. We investigated two new methods: the unique one-against-others method and the all-against-all method. Both improve prediction accuracy by 14–110% on a dataset containing 27 SCOP folds. We used Support Vector Machines (SVMs) and Neural Networks (NNs) as base classifiers. SVMs converge quickly and yield high accuracy. When the scores from multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues that arise with a large number of classes, including the dependence of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, our recognition systems achieve 56% fold prediction accuracy on a protein test dataset in which most proteins share less than 25% sequence identity with the proteins used in training.
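To make the multi-class schemes named above concrete, here is a minimal Python sketch that works on precomputed binary decision values (a positive value means "this fold"). The function names and toy scores are hypothetical. The `unique_one_vs_others` rule (accept a fold only when exactly one one-against-others classifier fires, otherwise leave the protein unassigned) is a simplified assumption about the unique variant, not the authors' exact procedure, and ties are broken by lowest index or first occurrence purely for determinism.

```python
from collections import Counter
from itertools import combinations


def one_vs_others_candidates(scores):
    """One-against-others: scores[k] is the decision value of the binary
    classifier "fold k vs. all other folds". Every fold whose classifier
    fires (score > 0) is a candidate; more than one candidate means at
    least one classifier produced a false positive."""
    return [k for k, s in enumerate(scores) if s > 0]


def unique_one_vs_others(scores):
    """Hypothetical simplification of the 'unique' variant: accept a fold
    only if exactly one classifier fires; otherwise return None."""
    candidates = one_vs_others_candidates(scores)
    return candidates[0] if len(candidates) == 1 else None


def all_vs_all(pairwise, n_classes):
    """All-against-all: pairwise[(i, j)] with i < j is the decision value of a
    classifier trained only on folds i and j; a positive value favours i.
    Each of the n(n-1)/2 classifiers casts one vote and the fold with the
    most votes wins (max() keeps the first maximum, so ties go to the
    lower index)."""
    votes = [0] * n_classes
    for i, j in combinations(range(n_classes), 2):
        winner = i if pairwise[(i, j)] > 0 else j
        votes[winner] += 1
    return max(range(n_classes), key=lambda k: votes[k])


def majority_vote(labels):
    """Combine fold labels predicted from different parameter datasets;
    ties go to the label that appears first in the input."""
    counts = Counter(labels)
    best = max(counts.values())
    return next(lab for lab in labels if counts[lab] == best)


if __name__ == "__main__":
    # Made-up decision values for 3 folds, for illustration only.
    print(one_vs_others_candidates([0.4, -1.2, 0.1]))  # [0, 2]: ambiguous
    print(unique_one_vs_others([0.4, -1.2, 0.1]))      # None
    print(all_vs_all({(0, 1): 0.8, (0, 2): -0.3, (1, 2): -0.5}, 3))  # 2
    print(majority_vote([2, 0, 2, 1]))                 # 2
```

With k folds, all-against-all needs k(k-1)/2 binary classifiers (351 for 27 folds), but each is trained on only two folds, and the vote always names a single fold, so the multiple-positive ambiguity of one-against-others does not arise (although vote ties can).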
Supplementary information: The protein parameter datasets used in this paper are available online (http://www.nersc.gov/~cding/protein).
Contact: chqding{at}lbl.gov; ildubchak{at}lbl.gov
* To whom correspondence should be addressed.