Bioinformatics Advance Access published online on January 24, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn013
Using native and syntenically mapped cDNA alignments to improve de novo gene finding
Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA
*To whom correspondence should be addressed. Dr. Mario Stanke, E-mail: mstanke{at}gwdg.de, mario.stanke{at}gmail.com
Abstract
Motivation: Computational annotation of protein-coding genes in genomic DNA is a widely used and essential tool for analyzing newly sequenced genomes. However, current methods remain inaccurate and perform poorly on certain types of genes. Including additional sources of evidence for the existence and structure of genes can improve the quality of gene predictions. For many eukaryotic genomes, Expressed Sequence Tags (ESTs) are available as evidence for genes. Related genomes that have been sequenced, annotated, and aligned to the target genome also provide evidence of the existence and structure of genes.
Results: We incorporate several different evidence sources into the gene finder AUGUSTUS. The sources of evidence are: gene and transcript annotations from related species, syntenically mapped to the target genome using TRANSMAP; evolutionary conservation of DNA; mRNA and ESTs of the target species; and retroposed genes. The predictions include alternative splice variants where the evidence supports them. Using only ESTs, we were able to predict at least one splice form exactly correctly in 57% of human genes. When evidence from other species and human mRNAs is also used, this number rises to 77%. Syntenic mapping is well suited to annotating genomes closely related to genomes that are already annotated or for which extensive transcript evidence is available. Native cDNA evidence is most helpful when the alignments are used as compound information rather than as independent positionwise information.
Availability: AUGUSTUS is open source and available at http://augustus.gobics.de. The gene predictions for human can be browsed and downloaded at the UCSC Genome Browser (http://genome.ucsc.edu).
Contact: mstanke{at}gwdg.de
Supplementary Information: Supplementary data are available at Bioinformatics online.
Associate Editor: Prof. Alfonso Valencia
Received on October 9, 2007; revised on December 12, 2007; accepted on January 7, 2008