Bioinformatics Advance Access published online on July 12, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl376
1 Abteilung Bioinformatik, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
* To whom correspondence should be addressed.
Motivation: Remote homology detection is among the most intensively researched problems in bioinformatics. Currently, discriminative approaches, especially kernel-based methods, provide the most accurate results. However, kernel methods also have several drawbacks: in many cases prediction for new sequences is computationally expensive; kernels often lack an interpretable model for analyzing characteristic sequence features; and most approaches rely on so-called hyperparameters, which complicate the application of methods across different data sets.
Results: We introduce a feature vector representation for protein sequences based on distances between short oligomers. The corresponding feature space arises from distance histograms for any possible pair of K-mers. Our distance-based approach shows important advantages in terms of computational speed, while on common test data the prediction performance is highly competitive with state-of-the-art methods for protein remote homology detection. Furthermore, the learnt model can easily be analyzed in terms of discriminative features and, in contrast to other methods, our representation does not require any tuning of kernel hyperparameters.
Availability: Normalized kernel matrices for the experimental setup can be downloaded at www.gobics.de/thomas. Matlab code for computing the kernel matrices is available upon request.
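To make the distance-histogram idea from the Results concrete, the following is a minimal Python sketch, not the authors' Matlab implementation. It assumes that a distance is the offset between the start positions of two K-mers in the same sequence, that pairs are ordered (first K-mer, second K-mer), and that distance 0 is included; the abstract does not specify these details. The function names `oligomer_distance_features`, `linear_kernel` and `normalized_kernel` and the parameter `max_dist` are hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict


def oligomer_distance_features(seq, k=1, max_dist=None):
    """Sparse distance histograms for ordered K-mer pairs (illustrative sketch).

    Returns a dict mapping (first_kmer, second_kmer, distance) to the number
    of times second_kmer starts `distance` positions after first_kmer.
    Assumption: distance 0 (a K-mer paired with itself) is counted as well.
    """
    n = len(seq) - k + 1              # number of K-mer start positions
    if n <= 0:
        return {}
    if max_dist is None:
        max_dist = n - 1              # largest possible offset in this sequence
    kmers = [seq[i:i + k] for i in range(n)]
    counts = defaultdict(int)
    for i, first in enumerate(kmers):
        # Pair position i with every position up to max_dist to its right.
        for j in range(i, min(n, i + max_dist + 1)):
            counts[(first, kmers[j], j - i)] += 1
    return dict(counts)


def linear_kernel(f, g):
    """Dot product of two sparse feature dicts."""
    if len(f) > len(g):
        f, g = g, f                   # iterate over the smaller dict
    return sum(v * g.get(key, 0) for key, v in f.items())


def normalized_kernel(f, g):
    """Kernel value scaled so that every non-empty sequence has self-similarity 1."""
    denom = math.sqrt(linear_kernel(f, f) * linear_kernel(g, g))
    return linear_kernel(f, g) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    x = oligomer_distance_features("ABAB", k=1)
    y = oligomer_distance_features("BABA", k=1)
    print(x[("A", "B", 1)], normalized_kernel(x, y))
```

Because the feature map is explicit, a linear kernel on these vectors needs no kernel hyperparameters, and each feature names a specific K-mer pair and distance, which is what makes a learnt weight vector inspectable. A dense implementation would instead allocate one histogram bin per K-mer pair and distance; the sparse dictionary above is only a convenience for short examples.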
Received March 30, 2006
Revised June 20, 2006
Accepted July 5, 2006
Article
Remote homology detection based on oligomer distances
Thomas Lingner 1 * and Peter Meinicke 1
Thomas Lingner, E-mail: thomas{at}gobics.de
Associate Editor: Christos Ouzounis
This article has been cited by other articles:
T. Damoulas and M. A. Girolami. Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection. Bioinformatics, May 15, 2008; 24(10): 1264-1270.
A. R. Shah, C. S. Oehmen, and B.-J. Webb-Robertson. SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection. Bioinformatics, March 15, 2008; 24(6): 783-790.
S. Hochreiter, M. Heusel, and K. Obermayer. Fast model-based protein homology detection without alignment. Bioinformatics, July 15, 2007; 23(14): 1728-1736.