Bioinformatics Vol. 18 no. 1 2002
Pages 147-159
© 2002 Oxford University Press
Classifying G-protein coupled receptors with support vector machines
1 Department of Computer Science
2 Department of Computer Engineering
3 Howard Hughes Medical Institute,
University of California, Santa Cruz, CA 95064, USA
Received on June 20, 2001; revised on August 18, 2001; accepted on August 23, 2001
Motivation: The enormous amount of protein sequence data uncovered by genome research has increased the demand for computer software that can automate the recognition of new proteins. We discuss the relative merits of various automated methods for recognizing G-Protein Coupled Receptors (GPCRs), a superfamily of cell membrane proteins. GPCRs are found in a wide range of organisms and are central to a cellular signalling network that regulates many basic physiological processes. They are the focus of a significant amount of current pharmaceutical research because they play a key role in many diseases. However, their tertiary structures remain largely unsolved. The methods described in this paper use only primary sequence information to make their predictions. We compare a simple nearest neighbor approach (BLAST), methods based on multiple alignments generated by a statistical profile Hidden Markov Model (HMM), and methods, including Support Vector Machines (SVMs), that transform protein sequences into fixed-length feature vectors.
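The sketch that follows makes the idea of a fixed-length feature vector concrete. It maps each protein sequence, whatever its length, to a 400-entry vector of dipeptide frequencies. It then labels a query with the class of its nearest training sequence under a kernel-induced distance, which is loosely in the spirit of the kernNN baseline. This is a minimal sketch under stated assumptions. The dipeptide feature map, the RBF kernel and its `gamma`, the function names and the toy labels are all hypothetical illustrations. They are not the feature vectors or kernel used in this paper.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical feature map: dipeptide (2-mer) frequencies, giving a
# fixed-length vector of 20 * 20 = 400 entries for any sequence length.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_vector(seq):
    """Map a protein sequence to normalized dipeptide frequencies."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Only standard dipeptides are counted; 'or 1' avoids dividing by zero.
    total = sum(counts[d] for d in DIPEPTIDES) or 1
    return [counts[d] / total for d in DIPEPTIDES]


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance in the kernel-induced space:
    ||phi(x) - phi(y)||^2 = K(x,x) - 2K(x,y) + K(y,y)."""
    d2 = kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(d2, 0.0))


def kern_nn(query_seq, training):
    """Return the label of the training sequence nearest to query_seq.

    training: list of (sequence, label) pairs (hypothetical format).
    """
    q = dipeptide_vector(query_seq)
    best_label, best_dist = None, float("inf")
    for seq, label in training:
        d = kernel_distance(q, dipeptide_vector(seq))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy, made-up training data.
training = [("ACACACACAC", "subfamily_A"), ("WYWYWYWYWY", "subfamily_B")]
print(kern_nn("ACACACAC", training))  # subfamily_A
```

With an RBF kernel, K(x, x) = 1. The kernel distance is therefore a monotone function of Euclidean distance, and the nearest neighbor does not change. Other kernels can reorder the neighbors.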
Results: The last of these approaches is the most computationally expensive, but our experiments show that, for those interested in annotation-quality classification, the results are worth the effort. In two-fold cross-validation experiments testing recognition of GPCR subfamilies that bind a specific ligand (such as a histamine molecule), the errors per sequence at the Minimum Error Point (MEP) were 13.7% for multi-class SVMs, 17.1% for our SVMtree method of hierarchical multi-class SVM classification, 25.5% for BLAST, 30% for profile HMMs, and 49% for Kernel Nearest Neighbor (kernNN), a nearest-neighbor classifier applied to the fixed-length feature vectors. The percentage of true positives recognized before the first false positive was 65% for both SVM methods, 13% for BLAST, 5% for profile HMMs and 4% for kernNN.
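The two figures of merit in the Results can both be computed from a ranked list of scored predictions. The sketch that follows rests on our own assumptions about the definitions, and the function names are hypothetical:

- The MEP is the score threshold that minimizes false positives plus false negatives.
- "Errors per sequence" divides that minimum by the number of true (sequence, class) pairs.
- A true positive that ties with a false positive does not count as recognized before it.

```python
from typing import List, Tuple

# Each prediction is (score, is_true_positive); higher scores = more confident.
Prediction = Tuple[float, bool]


def minimum_error_point(preds: List[Prediction]) -> Tuple[float, int]:
    """Return (threshold, errors) where errors = FP + FN is minimized.

    A prediction is accepted when its score >= threshold. The starting point,
    threshold = +inf, accepts nothing, so every positive is a false negative.
    """
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, is_tp in ranked if is_tp)
    best_t, best_err = float("inf"), n_pos
    tp = fp = 0
    for i, (score, is_tp) in enumerate(ranked):
        if is_tp:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so evaluate only after the last tie.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        err = fp + (n_pos - tp)
        if err < best_err:
            best_t, best_err = score, err
    return best_t, best_err


def errors_per_sequence_at_mep(preds: List[Prediction]) -> float:
    """MEP error count over the number of true pairs (assumes at least one)."""
    n_pos = sum(1 for _, is_tp in preds if is_tp)
    return minimum_error_point(preds)[1] / n_pos


def fraction_tp_before_first_fp(preds: List[Prediction]) -> float:
    """Fraction of true positives scoring strictly above the best false positive."""
    n_pos = sum(1 for _, is_tp in preds if is_tp)
    top_fp = max((s for s, is_tp in preds if not is_tp), default=float("-inf"))
    return sum(1 for s, is_tp in preds if is_tp and s > top_fp) / n_pos


# Toy, made-up scores.
preds = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.2, False)]
print(minimum_error_point(preds))          # (0.8, 1)
print(errors_per_sequence_at_mep(preds))   # 0.3333333333333333
print(fraction_tp_before_first_fp(preds))  # 0.6666666666666666
```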
Availability: We have set up a web server for GPCR subfamily classification based on hierarchical multi-class SVMs at http://www.soe.ucsc.edu/research/compbio/gpcr-subclass. By scanning predicted peptides found in the human genome with the SVMtree server, we have identified a large number of genes that encode GPCRs. A list of our predictions for human GPCRs is available at http://www.soe.ucsc.edu/research/compbio/gpcr_hg/class_results. We also provide suggested subfamily classifications for 18 sequences previously identified as unclassified Class A (rhodopsin-like) GPCRs in GPCRDB (Horn et al., Nucleic Acids Res., 26, 277-281, 1998), available at http://www.soe.ucsc.edu/research/compbio/gpcr/classA_unclassified/.
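Hierarchical classification of the kind SVMtree performs can be pictured as a top-down walk. At each node, a multi-class classifier picks one child, and the walk continues until it reaches a leaf subfamily. This is a minimal sketch under stated assumptions. The `linear` functions are hypothetical stand-ins for trained one-vs-rest SVM decision functions. The two-level toy tree and its labels are made up for illustration and do not reproduce the server's actual hierarchy or models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
DecisionFn = Callable[[Vector], float]


@dataclass
class Node:
    """A node in a hypothetical classification tree.

    children maps each child label to (child node, decision function); the
    decision function stands in for a trained one-vs-rest SVM.
    """
    label: str
    children: Dict[str, Tuple["Node", DecisionFn]] = field(default_factory=dict)


def linear(w: Vector, b: float) -> DecisionFn:
    """Hypothetical linear decision function f(x) = w . x + b."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(root: Node, x: Vector) -> List[str]:
    """Walk from the root to a leaf, taking the highest-scoring child each time."""
    node, path = root, [root.label]
    while node.children:
        # Multi-class step (one-vs-rest): pick the child with the largest f(x).
        best = max(node.children, key=lambda lbl: node.children[lbl][1](x))
        node = node.children[best][0]
        path.append(node.label)
    return path


# Toy two-level tree over 2-dimensional feature vectors (illustrative labels).
class_a = Node("Class A", {
    "histamine": (Node("histamine"), linear([0.0, 1.0], 0.0)),
    "other amine": (Node("other amine"), linear([0.0, -1.0], 0.0)),
})
root = Node("GPCR", {
    "Class A": (class_a, linear([1.0, 0.0], 0.0)),
    "Class B": (Node("Class B"), linear([-1.0, 0.0], 0.0)),
})

print(classify(root, [0.5, 0.8]))   # ['GPCR', 'Class A', 'histamine']
print(classify(root, [-0.3, 0.8]))  # ['GPCR', 'Class B']
```

In this sketch each decision is made locally. A wrong choice near the root therefore cannot be corrected at a lower level.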
Contact: rachelk{at}soe.ucsc.edu
* To whom correspondence should be addressed.