Bioinformatics Advance Access originally published online on December 7, 2004
Bioinformatics 2005 21(8):1383-1388; doi:10.1093/bioinformatics/bti200

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Support vector machines for separation of mixed plant–pathogen EST collections based on codon usage

Caroline C. Friedel 1,2,†, Katharina H. V. Jahn 1,2,†, Selina Sommer 1,2,†, Stephen Rudd 3, Hans W. Mewes 4,5 and Igor V. Tetko 4,*

1Institut fuer Informatik, Ludwig-Maximilians-Universitaet Muenchen, Oettingenstrasse 67, 80538 Muenchen, Germany
2Fakultaet fuer Informatik, Technische Universitaet Muenchen, Boltzmannstrasse 3, 85748 Garching b. Muenchen, Germany
3Bioinformatics group, Turku Centre for Biotechnology, Finland
4Institute for Bioinformatics GSF—Forschungszentrum fuer Umwelt und Gesundheit, GmbH Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
5Department of Genome-Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universitaet Muenchen, 85350 Freising, Germany

*To whom correspondence should be addressed.

Motivation: Discovery of host and pathogen genes expressed at the plant–pathogen interface often requires the construction of mixed libraries that contain sequences from both genomes. Identifying these sequences requires high-throughput, reliable classification of genome origin. With single-pass cDNA sequences, difficulties arise from the short sequence length, the lack of sufficient taxonomically relevant sequence data in public databases and ambiguous sequence homology between plant and pathogen genes.

Results: We describe a novel method that is independent of the availability of homologous genes and relies on subtle differences in codon usage between plant and fungal genes. We used support vector machines (SVMs) to identify the probable origin of sequences. SVMs were compared with several other machine learning techniques and with PF-IND, a probabilistic algorithm for expressed sequence tag (EST) classification that is also based on codon bias differences. Our software (ECLAT) achieved a classification accuracy of 93.1% on a test set of 3217 EST sequences from Hordeum vulgare and Blumeria graminis, a significant improvement over PF-IND (prediction accuracy of 81.2% on the same test set). EST sequences with at least 50 nt of coding sequence can be classified using ECLAT with high confidence. ECLAT allows training of classifiers for any host–pathogen combination for which there are sufficient classified training sequences.
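
To illustrate the kind of input a codon-usage classifier works with, the following minimal sketch turns a coding sequence into a vector of 64 relative codon frequencies. It is an illustration only, not ECLAT's implementation: the abstract does not describe the exact feature encoding or how the reading frame is determined, so the function name `codon_usage`, the fixed codon ordering and the `frame` parameter are assumptions made here.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to a vector
# with the same layout (hypothetical ordering chosen for this sketch).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_usage(coding_seq, frame=0):
    """Return 64 relative codon frequencies for one reading frame.

    Assumes the caller already knows the reading frame; codons that
    contain ambiguous bases (e.g. N) are skipped.
    """
    seq = coding_seq.upper().replace("U", "T")
    # Split into non-overlapping triplets, dropping any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Toy example: four codons, ATG GCC GCC TAA.
vector = codon_usage("ATGGCCGCCTAA")
gcc_fraction = vector[CODONS.index("GCC")]
```

Vectors of this form, one per labelled plant or fungal sequence, could then be passed to any SVM implementation for training; which kernel and parameters suit a given host–pathogen pair would have to be determined empirically.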

Availability: ECLAT is freely available on the Internet (http://mips.gsf.de/proj/est) or on request as a standalone version.

Contact: friedel{at}informatik.uni-muenchen.de


