


Bioinformatics Advance Access published online on October 25, 2005

Bioinformatics, doi:10.1093/bioinformatics/bti701
All versions of this article: 21/24/4322 (most recent); bti701v1

© The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
Received July 4, 2005
Revised September 13, 2005
Accepted September 30, 2005

Article

Large scale prokaryotic gene prediction and comparison to genome annotation

Pernille Nielsen 1* and Anders Krogh 1

1 Bioinformatics Centre, Institute of Molecular Biology and Physiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark

* To whom correspondence should be addressed.
Pernille Nielsen, E-mail: pern{at}binf.ku.dk


   Abstract

Motivation: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups, which makes genome comparison difficult and may lead to propagation of errors when questionable assignments are carried over from one genome to another. Genome comparison, whether on a large or a small scale, would be facilitated by a single annotation standard that makes transparent why an open reading frame (ORF) is considered to be a gene.

Results: We scored 143 prokaryotic genomes with an updated version of the prokaryotic gene finder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes, especially GC-rich ones, up to about 60% of the genes may have been annotated with the wrong start codon. The fractional difference between annotated and predicted genes confirms that too many short genes are annotated in numerous organisms. Furthermore, genes may be missing from the annotation of some genomes. We predict that 41 of the 143 genomes are over-annotated by more than 5%, meaning that too many ORFs are annotated as genes, and that 12 of the 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank.
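The comparison described above can be illustrated with a minimal sketch. It is not the EasyGene procedure itself: the `Gene` record, the `compare` function, the matching of genes by strand and stop coordinate, and the normalisation of the difference by the number of annotated genes are all assumptions made for illustration, and the coordinates are invented toy values.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are toy values."""
    strand: str  # "+" or "-"
    start: int   # position of the start codon
    stop: int    # position of the stop codon


def compare(annotated, predicted):
    """Match genes by (strand, stop) and summarise the disagreements.

    Assumption: two genes are "the same" if they share strand and stop
    codon; a differing start position counts as a start-codon disagreement.
    """
    ann = {(g.strand, g.stop): g.start for g in annotated}
    pred = {(g.strand, g.stop): g.start for g in predicted}
    shared = ann.keys() & pred.keys()

    annotated_only = len(ann.keys() - pred.keys())   # annotated, not predicted
    predicted_only = len(pred.keys() - ann.keys())   # predicted, not annotated
    start_mismatch = sum(1 for key in shared if ann[key] != pred[key])

    return {
        "annotated_only": annotated_only,
        "predicted_only": predicted_only,
        # Fraction of shared genes whose start codons disagree.
        "start_mismatch_fraction": start_mismatch / len(shared) if shared else 0.0,
        # Assumed normalisation: difference relative to the annotated gene count.
        "fractional_difference": (
            (annotated_only - predicted_only) / len(ann) if ann else 0.0
        ),
    }


annotated = [
    Gene("+", 100, 400),
    Gene("+", 500, 800),
    Gene("-", 1200, 900),
    Gene("+", 1300, 1390),  # short ORF with no matching prediction
]
predicted = [
    Gene("+", 100, 400),
    Gene("+", 530, 800),    # same stop, different start
    Gene("-", 1200, 900),
]

result = compare(annotated, predicted)
# Using the 5% threshold quoted in the abstract on the assumed measure.
over_annotated = result["fractional_difference"] > 0.05
print(result, over_annotated)
```

Under these assumptions, a positive fractional difference points towards over-annotation and a negative one towards under-annotation. Matching on the stop codon keeps start-codon disagreements separate from genes that are missing altogether.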

We argue that the average performance of our standardised and fully automated method is slightly better than that of the existing annotations.

Availability: The EasyGene 1.2 predictions and statistics can be accessed at http://www.binf.ku.dk/cgi-bin/easygene/search.






