Bioinformatics Vol. 15 no. 11 1999
Pages 887-899
© 1999 Oxford University Press
Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences
1 Laboratoire associé de l'INRA (France),
2 Department of Plant Genetics, Flanders Interuniversity Institute of Biotechnology, Universiteit Gent, K.L. Ledeganckstraat 35, Belgium
Davuluri V. V. Ramana: On leave from Avesthagen Graine Technologies, Plant Genome Biology Laboratory, P.O. Box 5091, Cubbon Park GPO, Bangalore-560001, India. Present address: CSHL, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA.
Philippe Leroy: Present address: Station INRA d'Amélioration des Plantes - Domaine de Crouelle, 234 avenue du Brézet, 63039 Clermont-Ferrand cedex 2, France.
Pierre Rouzé
Motivation: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need to evaluate prediction programs on Arabidopsis sequences containing multiple genes.
Results: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combining prediction software.
Availability: The AraSet sequence set, the Perl programs and complementary results and notes are available at http://sphinx.rug.ac.be:8080/biocomp/napav/.
Contact: Pierre.Rouze{at}gengenp.rug.ac.be
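As an illustration of the conventional exon-level metrics mentioned in the Results, the following minimal sketch computes sensitivity and specificity under the commonly used definitions: an exon counts as correct only when both of its boundaries match the annotation exactly. The function name, the (start, end) tuple representation and the sample coordinates are assumptions made for this example; they are not taken from AraSet or from the authors' Perl programs, and the article's own formulas may differ in detail.

```python
from typing import Iterable, Tuple

# Hypothetical representation: one exon is a (start, end) pair on a single strand.
Exon = Tuple[int, int]


def exon_level_accuracy(annotated: Iterable[Exon],
                        predicted: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Sensitivity = correct exons / annotated exons.
    Specificity = correct exons / predicted exons
    (the usage customary in gene prediction evaluation).
    An exon is correct only if both boundaries match exactly.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    correct = len(annotated_set & predicted_set)
    # Guard against empty sets to avoid division by zero.
    sensitivity = correct / len(annotated_set) if annotated_set else 0.0
    specificity = correct / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    annotated = [(100, 250), (400, 520), (700, 900)]
    predicted = [(100, 250), (400, 510), (700, 900), (1200, 1300)]
    sn, sp = exon_level_accuracy(annotated, predicted)
    print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

In this made-up case two of the three annotated exons are recovered exactly, and two of the four predicted exons are correct. The exon with a shifted 3' boundary and the extra predicted exon both count as errors.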