Bioinformatics Advance Access originally published online on October 25, 2005
Bioinformatics 2005 21(24):4322-4329; doi:10.1093/bioinformatics/bti701
Large-scale prokaryotic gene prediction and comparison to genome annotation
Bioinformatics Centre, Institute of Molecular Biology and Physiology, University of Copenhagen Universitetsparken 15, 2100 Copenhagen, Denmark
*To whom correspondence should be addressed.
Motivation: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. This makes genome comparison difficult and may lead to propagation of errors when questionable assignments are transferred from one genome to another. Genome comparison on either a large or a small scale would be facilitated by a single annotation standard that makes transparent why an open reading frame (ORF) is considered to be a gene.
Results: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in GC-rich genomes. The fractional difference between annotated and predicted genes confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing from the annotation of some of the genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank.
We argue that the average performance of our standardized and fully automated method is slightly better than that of the existing annotations.
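The comparison described in the Results can be illustrated with a minimal Python sketch. It is not the EasyGene pipeline. It assumes that an annotated gene and a predicted gene count as the same gene when they share strand and stop coordinate, that the difference is normalized by the number of annotated genes, and that the 5% cutoff applies symmetrically to under-annotation. None of these details is stated in the abstract, and the names `Gene`, `compare_annotation` and `classify` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Gene(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # coordinate of the start codon (5' end of the gene)
    stop: int    # coordinate of the stop codon (3' end of the gene)


def compare_annotation(annotated: Iterable[Gene],
                       predicted: Iterable[Gene]) -> Dict[str, float]:
    """Compare two gene sets, matching genes by (strand, stop).

    Assumption: genes sharing strand and stop coordinate are the same gene;
    if their start coordinates differ, the start codon assignment disagrees.
    """
    ann = {(g.strand, g.stop): g.start for g in annotated}
    pred = {(g.strand, g.stop): g.start for g in predicted}

    shared = ann.keys() & pred.keys()
    start_mismatch = sum(1 for key in shared if ann[key] != pred[key])
    annotated_only = len(ann) - len(shared)   # annotated, not predicted
    predicted_only = len(pred) - len(shared)  # predicted, not annotated

    # Assumed normalization: divide by the number of annotated genes.
    frac = (annotated_only - predicted_only) / len(ann) if ann else 0.0
    return {
        "shared": len(shared),
        "start_mismatch": start_mismatch,
        "annotated_only": annotated_only,
        "predicted_only": predicted_only,
        "fractional_difference": frac,
    }


def classify(fractional_difference: float, threshold: float = 0.05) -> str:
    """Label a genome using the 5% cutoff (symmetric use is an assumption)."""
    if fractional_difference > threshold:
        return "over-annotated"
    if fractional_difference < -threshold:
        return "under-annotated"
    return "consistent"


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    annotated = [Gene("+", 100, 400), Gene("+", 500, 900),
                 Gene("-", 2000, 1500), Gene("+", 3000, 3090)]
    predicted = [Gene("+", 100, 400), Gene("+", 530, 900),
                 Gene("-", 2000, 1500)]
    result = compare_annotation(annotated, predicted)
    print(result, classify(result["fractional_difference"]))
```

In the toy data, one shared gene has a different start coordinate and one short annotated gene has no predicted counterpart, so the genome would be labelled over-annotated under these assumptions.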
Availability: The EasyGene 1.2 predictions and statistics can be accessed at http://www.binf.ku.dk/cgi-bin/easygene/search
Contact: pern{at}binf.ku.dk
Received on July 4, 2005; revised on September 13, 2005; accepted on September 30, 2005





