Bioinformatics Advance Access originally published online on May 3, 2006
Bioinformatics 2006 22(14):1717-1722; doi:10.1093/bioinformatics/btl170
Ensemble classifier for protein fold pattern recognition
1 Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200030, China
2 Gordon Life Science Institute, San Diego, CA 92130, USA
*To whom correspondence should be addressed.
Motivation: Predicting protein folding patterns is one level deeper than predicting protein structural classes, and hence is considerably more complicated and difficult. To deal with this challenging problem, an ensemble classifier was introduced. It was formed from a set of basic classifiers, each trained on a different parameter system, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, all extracted from a training dataset. The operation engine for the constituent individual classifiers was the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through weighted voting to give a final determination for classifying a query protein. The recognition task was to find the true fold among the 27 possible patterns.
Results: The overall success rate thus obtained was 62% for a testing dataset in which most of the proteins have <25% sequence identity with the proteins used in training the classifier. This rate is 6–21% higher than the corresponding rates obtained by various existing NN (neural network) and SVM (support vector machine) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as in proteomics and bioinformatics.
Availability: The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.
Contact: lifesci-sjtu{at}san.rr.com
Supplementary information: Supplementary data are available on Bioinformatics online.
Received on March 31, 2006; revised on April 26, 2006; accepted on April 27, 2006
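The weighted-voting idea described under Motivation can be illustrated with a minimal sketch. It is not the PFP-Pred implementation: a plain majority-vote k-NN stands in for the OET-KNN rule, and the feature-set names, toy vectors, fold labels and weights below are all hypothetical values chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import dist


def knn_predict(train, query, k=3):
    """Plain k-NN majority vote (a stand-in for OET-KNN, which is not reproduced here)."""
    # Keep the k training samples closest to the query in Euclidean distance.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def ensemble_predict(feature_sets, queries, weights, k=3):
    """Run one basic classifier per feature set and combine them by weighted voting."""
    scores = defaultdict(float)
    for name, train in feature_sets.items():
        predicted = knn_predict(train, queries[name], k)
        scores[predicted] += weights[name]  # each classifier adds its weight to its vote
    return max(scores, key=scores.get)


# Hypothetical toy data: the same four training proteins described in three feature spaces.
toy_train = [
    ((0.0, 0.0), "fold_A"),
    ((0.1, 0.2), "fold_A"),
    ((1.0, 1.0), "fold_B"),
    ((0.9, 1.1), "fold_B"),
]
feature_sets = {
    "hydrophobicity": toy_train,
    "polarity": toy_train,
    "secondary_structure": toy_train,
}
# The query protein, represented once per feature space (hypothetical values).
queries = {
    "hydrophobicity": (0.05, 0.1),
    "polarity": (0.95, 1.0),
    "secondary_structure": (0.1, 0.0),
}
# Hypothetical classifier weights; the paper's actual weighting scheme is not given here.
weights = {"hydrophobicity": 0.5, "polarity": 0.3, "secondary_structure": 0.4}

predicted_fold = ensemble_predict(feature_sets, queries, weights)
print(predicted_fold)
```

In this toy case two of the three basic classifiers vote for `fold_A` and one for `fold_B`, so the weighted scores are 0.9 against 0.3. In the setting described in the abstract, the vote would instead be taken over 27 fold patterns.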