Bioinformatics Advance Access originally published online on February 5, 2004
Bioinformatics 20(7) © Oxford University Press 2004; all rights reserved.
Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus


¹Department of Genetics, Development and Cell Biology and ²Department of Statistics, Iowa State University, 2112 Molecular Biology Building, Ames, Iowa 50011-3260, USA
Received on July 14, 2003; revised on December 12, 2003; accepted on December 13, 2003
Advance Access Publication February 5, 2004
Motivation: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA.
Results: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows the use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance against a standard Arabidopsis thaliana gene set and demonstrated its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants.
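The basic idea of spliced alignment, aligning a transcript to genomic DNA while allowing long genomic gaps that correspond to introns, can be illustrated with a small dynamic program. The sketch below is a simplified illustration under stated assumptions, not GeneSeqer's algorithm: the function name `spliced_align` and the toy scores `MATCH`, `MISMATCH`, `GAP` and `INTRON` are hypothetical, a fixed intron penalty stands in for the paper's splice site prediction models, and only canonical GT...AG introns are accepted. Mismatches and gaps are scored rather than forbidden, which loosely mirrors the tolerance to imperfect EST matches described above.

```python
from typing import List, Optional, Tuple

# Hypothetical toy scoring parameters chosen for illustration only; they are
# not GeneSeqer's scoring scheme or splice site models.
MATCH, MISMATCH, GAP, INTRON = 2, -1, -2, -6
NEG = float("-inf")


def spliced_align(est: str, genomic: str) -> Tuple[float, List[Tuple[int, int]]]:
    """Align the whole EST to a region of `genomic`, allowing GT...AG introns.

    Returns (score, introns); each intron is a 0-based, half-open genomic
    interval [start, end) that is skipped in the EST.
    """
    m, n = len(est), len(genomic)
    # S[i][j]: best score with est[:i] aligned and the alignment ending at
    #          genomic position j in an exon state (match, mismatch or gap).
    # I[i][j]: same, but genomic[j - 1] lies inside an open intron.
    S = [[NEG] * (n + 1) for _ in range(m + 1)]
    I = [[NEG] * (n + 1) for _ in range(m + 1)]
    s_ptr: List[List[Optional[str]]] = [[None] * (n + 1) for _ in range(m + 1)]
    i_ptr: List[List[Optional[str]]] = [[None] * (n + 1) for _ in range(m + 1)]

    S[0] = [0.0] * (n + 1)  # leading genomic sequence is free
    for i in range(1, m + 1):
        S[i][0], s_ptr[i][0] = S[i - 1][0] + GAP, "up"

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Intron state: open at a GT donor, or extend an open intron.
            if genomic[j - 1:j + 1] == "GT" and S[i][j - 1] + INTRON > I[i][j]:
                I[i][j], i_ptr[i][j] = S[i][j - 1] + INTRON, "open"
            if I[i][j - 1] > I[i][j]:
                I[i][j], i_ptr[i][j] = I[i][j - 1], "extend"
            # Exon state: match/mismatch, gap in either sequence, or close an
            # open intron whose last two bases genomic[j-2:j] are AG.
            sub = MATCH if est[i - 1] == genomic[j - 1] else MISMATCH
            options = [(S[i - 1][j - 1] + sub, "diag"),
                       (S[i - 1][j] + GAP, "up"),
                       (S[i][j - 1] + GAP, "left")]
            if j >= 2 and genomic[j - 2:j] == "AG":
                options.append((I[i][j - 1], "close"))
            S[i][j], s_ptr[i][j] = max(options, key=lambda o: o[0])

    # Trailing genomic sequence is free: end at the best column of the last row.
    j = max(range(n + 1), key=lambda col: S[m][col])
    score, i, state = S[m][j], m, "S"
    introns: List[Tuple[int, int]] = []
    intron_end = 0
    while i > 0:
        if state == "S":
            move = s_ptr[i][j]
            if move == "diag":
                i, j = i - 1, j - 1
            elif move == "up":
                i -= 1
            elif move == "left":
                j -= 1
            else:  # "close": genomic[:j] ends with the intron's AG
                intron_end, state, j = j, "I", j - 1
        else:
            if i_ptr[i][j] == "open":  # intron starts at genomic[j - 1]
                introns.append((j - 1, intron_end))
                state = "S"
            j -= 1
    return score, introns[::-1]
```

For example, `spliced_align("CCCAAATTTGGG", "TTCCCAAAGTGCGCAGTTTGGGTT")` returns a score of 18 with one intron at genomic interval [8, 16), the GTGCGCAG segment separating the two exons. The sketch uses O(mn) time and memory for an m-base EST and an n-base genomic sequence, so it is meant only for small illustrative inputs.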
Availability: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi
Supplementary information: http://www.plantgdb.org/AtGDB/prj/BXZ03B
Contact: vbrendel{at}iastate.edu
* To whom correspondence should be addressed.
Current address: BASF Plant Science NC, 26 Davis Drive, Research Triangle Park, NC 27709-3528, USA.
Current address: NewLink Genetics, 2901 S. Loop Dr, Ames, IA 50010, USA.