Bioinformatics 2008 24(13):i6-i14; doi:10.1093/bioinformatics/btn170

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

POIMs: positional oligomer importance matrices—understanding support vector machine-based signal detectors

Sören Sonnenburg 1,†,*, Alexander Zien 1,2,3,†, Petra Philips 2 and Gunnar Rätsch 2

1Fraunhofer Institute FIRST, Department IDA, Kekulèstr. 7, 12489 Berlin, 2Friedrich Miescher Laboratory, Max Planck Society, Spemannstr. 39 and 3Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany

*To whom correspondence should be addressed.


Abstract

Motivation: At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences. Frequently the most accurate classifiers are obtained by training support vector machines (SVMs) with complex sequence kernels. However, a major shortcoming of SVMs is that their learned decision rules are very hard for humans to understand and cannot easily be related to biological facts.

Results: To make SVM-based sequence classifiers more accessible and profitable, we introduce the concept of positional oligomer importance matrices (POIMs) and propose an efficient algorithm for their computation. In contrast to the raw SVM feature weighting, POIMs take into account the underlying correlation structure of k-mer features induced by overlaps of related k-mers. POIMs can be seen as a powerful generalization of sequence logos: they make it possible to capture and visualize sequence patterns that are relevant for the investigated biological phenomena.

Availability: All source code, datasets, tables and figures are available at http://www.fml.tuebingen.mpg.de/raetsch/projects/POIM.

Contact: Soeren.Sonnenburg@first.fraunhofer.de

Supplementary information: Supplementary data are available at Bioinformatics online.

†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.







