Bioinformatics Vol. 19 no. 17 2003
pages 2294-2301
© 2003 Oxford University Press
Efficient remote homology detection using local structure
1 School of Computing, National University of Singapore, Singapore 117543 and 2 Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
Received on February 27, 2003; revised on May 23, 2003; accepted on June 3, 2003
Motivation: The function of an unknown biological sequence can often be accurately inferred if the sequence can be mapped to its homologous family. At present, discriminative methods such as SVM-Fisher and SVM-pairwise, which combine support vector machines (SVMs) with sequence similarity, are recognized as the most accurate methods, with SVM-pairwise being the most accurate. However, these methods typically encode only sequence information into their feature vectors and ignore structural information. They are also computationally inefficient. Based on these observations, we present an alternative method for SVM-based protein classification. Our proposed method, SVM-I-sites, uses structural similarity for remote homology detection.
Results: We ran experiments on the Structural Classification of Proteins (SCOP) 1.53 data set. The results show that SVM-I-sites is more efficient than SVM-pairwise. Further, we find that SVM-I-sites outperforms sequence-based methods such as PSI-BLAST, SAM, and SVM-Fisher while achieving performance comparable to that of SVM-pairwise.
Availability: The I-sites server is accessible through the web at http://www.bioinfo.rpi.edu. Programs are available upon request for academics. Licensing agreements are available for commercial interests. The framework for encoding local structure into feature vectors is available upon request.
Contact: houyuna{at}comp.nus.edu.sg
* To whom correspondence should be addressed.
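
Illustrative sketch (not part of the published method): the abstract contrasts SVM-I-sites with SVM-pairwise, which, as commonly described, represents each sequence by a vector of similarity scores against every training sequence. The toy Python below shows how such a feature matrix might be assembled and why its cost grows with the square of the training-set size. The function names, the example sequences, and the shared k-mer score are all assumptions made for illustration; a real pipeline would use an alignment score, and SVM-I-sites itself would replace these features with ones derived from local structure.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count the overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def toy_similarity(a, b, k=3):
    """Number of k-mers shared by a and b, counted with multiplicity.

    ASSUMPTION: this is only a cheap stand-in for a pairwise
    alignment score; it is not the score used in the paper.
    """
    shared = kmer_counts(a, k) & kmer_counts(b, k)  # multiset intersection
    return sum(shared.values())


def pairwise_features(query, training_seqs, k=3):
    """One feature per training sequence: similarity(query, training_seq)."""
    return [toy_similarity(query, t, k) for t in training_seqs]


# Hypothetical toy sequences, invented for this example.
training = ["MKVLAAGIVG", "MKVLSAGIVA", "GSHMTTPLEQ"]

# n sequences give an n x n matrix, so n * n similarity computations.
features = [pairwise_features(s, training) for s in training]

for seq, row in zip(training, features):
    print(seq, row)
```

Each row of `features` could then be passed to any SVM implementation as the feature vector for the corresponding sequence. Because every new sequence must be compared against all training sequences, the cost of building these vectors is one plausible source of the inefficiency the abstract mentions.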