Bioinformatics Advance Access published online on January 4, 2007

Bioinformatics, doi:10.1093/bioinformatics/btl639
All versions of this article: 23/4/414 (most recent), btl639v1

© The Author (2006). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
Received August 30, 2006
Revised November 24, 2006
Accepted December 14, 2006

Article

In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi, and protists

Yvan Saeys 1 *, Pierre Rouzé 2, and Yves Van de Peer 1

1 Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
2 Laboratoire Associé de l'INRA (France) Ghent University, Technologiepark 927, B-9052 Ghent, Belgium

* To whom correspondence should be addressed.
Yvan Saeys, E-mail: yvan.saeys{at}psb.ugent.be


Abstract

Motivation: Prediction of the coding potential for stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences stems from the structure of coding sequences, which are organised in codons, and from the biased usage of those codons. For statistical reasons, the longer the sequence, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons may be short and therefore hard to distinguish based on coding potential.
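To make the codon-bias argument concrete, the following is a minimal Python sketch of a generic codon-usage log-odds score. It is not the method of this article: the function names, the additive pseudocount, the uniform background, and the toy training sequence are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(sequences, pseudocount=1.0):
    """Estimate in-frame codon frequencies with additive smoothing.

    Hypothetical helper: assumes every sequence starts in reading frame 0.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_SET:
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def codon_log_odds(seq, coding_freq, background_freq):
    """Return (log-odds score, number of codons scored) for frame 0 of seq."""
    seq = seq.upper()
    score, n = 0.0, 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in CODON_SET:
            score += math.log(coding_freq[codon] / background_freq[codon])
            n += 1
    return score, n


# Toy data (an assumption, not taken from the article).
toy_coding = ["GCTGCTGCTGAAGAAGCT"]
coding_freq = codon_frequencies(toy_coding)
background_freq = {c: 1.0 / len(CODONS) for c in CODONS}  # uniform background

short_score, short_n = codon_log_odds("GCT" * 2, coding_freq, background_freq)
long_score, long_n = codon_log_odds("GCT" * 10, coding_freq, background_freq)
```

Because the score is a sum over codons, a fragment with only a handful of codons accumulates little evidence either way, which is one way to see why short exons are harder to classify than long ones.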

Results: Here, we present novel approaches that specifically aim at a better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that 1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, 2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths, and 3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes.
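For readers unfamiliar with the Markov-model baseline mentioned above, here is a minimal sketch of a fixed-order Markov chain log-odds classifier in Python. It is a simplified stand-in and not the specific state-of-the-art models evaluated in the article. The order, the pseudocount, the uniform fallback for unseen contexts, and the toy training sequences are assumptions.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def train_markov(sequences, order=2, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) with additive smoothing."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, pseudocount))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, nxt = seq[i - order:i], seq[i]
            if set(context + nxt) <= set(BASES):
                counts[context][nxt] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values())
        model[context] = {b: c / total for b, c in row.items()}
    return model


def log_prob(seq, model, order=2):
    """Log-probability of seq under the model, ignoring the first `order` bases.

    Unseen contexts or symbols fall back to a uniform 0.25 (an assumption).
    """
    seq = seq.upper()
    lp = 0.0
    for i in range(order, len(seq)):
        context, nxt = seq[i - order:i], seq[i]
        lp += math.log(model.get(context, {}).get(nxt, 0.25))
    return lp


def markov_log_odds(seq, coding_model, noncoding_model, order=2):
    """Positive values favour the coding model, negative the non-coding one."""
    return log_prob(seq, coding_model, order) - log_prob(seq, noncoding_model, order)


# Toy training data (assumptions for illustration only).
coding_model = train_markov(["ATGGCTGCTGCTGCT" * 3], order=2)
noncoding_model = train_markov(["ATATATATATAT" * 3], order=2)

coding_like = markov_log_odds("GCTGCTGCT", coding_model, noncoding_model)
noncoding_like = markov_log_odds("ATATATAT", coding_model, noncoding_model)
```

A length-specific variant, in the spirit of conclusion 2, would train separate models on fragments grouped by length; that grouping is not shown here.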

Supplementary data: http://bioinformatics.psb.ugent.be/


Associate Editor: Alfonso Valencia




