
Bioinformatics, Vol 15, 391-412, Copyright © 1999 by Oxford University Press


ARTICLES

Automated genome sequence analysis and annotation

MA Andrade, NP Brown, C Leroy, S Hoersch, A de Daruvar, C Reich, A Franchini, J Tamames, A Valencia, C Ouzounis and C Sander
European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.

MOTIVATION: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming.
RESULTS: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner.
AVAILABILITY: The GeneQuiz system is publicly available for analysis of protein sequences through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit
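To make the idea of deriving a functional assignment from the annotations of database hits more concrete, here is a minimal sketch in Python. It is not GeneQuiz's published algorithm: it assumes a simple word vote over the description lines of significant search hits, and the stop list `UNINFORMATIVE`, the `Hit` record, the `annotate` function and both thresholds (`max_evalue`, `clear_fraction`) are hypothetical choices made only for illustration. Hits whose descriptions carry no functional words (for example "hypothetical protein") are ignored, and the result is labelled 'clear' only when most of the remaining hits agree.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical stop list: words that carry no functional information in
# database description lines. Illustrative only, not taken from GeneQuiz.
UNINFORMATIVE = {
    "hypothetical", "putative", "probable", "possible", "predicted",
    "unknown", "protein", "homolog", "similar", "to", "of", "the", "and",
}


@dataclass
class Hit:
    description: str  # free-text annotation of the database match
    evalue: float     # expectation value reported by the search tool


def informative_words(description: str) -> set[str]:
    """Lower-case the description and drop punctuation and stop words."""
    words = {w.strip(",.;:()[]").lower() for w in description.split()}
    return {w for w in words if w and w not in UNINFORMATIVE}


def annotate(hits: list[Hit], max_evalue: float = 1e-10,
             clear_fraction: float = 0.75) -> tuple[str, str]:
    """Return (keyword, reliability) from the consensus of significant hits.

    max_evalue and clear_fraction are hypothetical thresholds, not published values.
    """
    # Keep only significant hits whose descriptions say something about function.
    word_sets = [informative_words(h.description)
                 for h in hits if h.evalue <= max_evalue]
    word_sets = [ws for ws in word_sets if ws]
    if not word_sets:
        return ("", "none")
    counts = Counter(w for ws in word_sets for w in ws)
    # Most frequent word; ties broken alphabetically so the result is deterministic.
    keyword, n = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    agreement = n / len(word_sets)
    reliability = "clear" if agreement >= clear_fraction else "ambiguous"
    return (keyword, reliability)


hits = [
    Hit("Glutamate dehydrogenase", 1e-50),
    Hit("putative glutamate dehydrogenase", 1e-30),
    Hit("hypothetical protein", 1e-20),  # no functional words: ignored
    Hit("Transketolase", 1e-3),          # not significant: ignored
]
print(annotate(hits))  # ('dehydrogenase', 'clear')
```

Dropping uninformative descriptions before voting keeps entries such as "hypothetical protein" from diluting the consensus. A fuller version would presumably weight hits by score, treat multi-word terms as units and combine several databases and search methods, as the abstract describes for GeneQuiz.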


This article has been cited by other articles:


I. V. Tetko, I. V. Rodchenkov, M. C. Walter, T. Rattei, and H.-W. Mewes
Beyond the 'best' match: machine learning annotation of protein sequences by integration of different sources of information
Bioinformatics, March 1, 2008; 24(5): 621 - 628.


T. Fuhrer, L. Chen, U. Sauer, and D. Vitkup
Computational Prediction and Experimental Verification of the Gene Encoding the NAD+/NADP+-Dependent Succinate Semialdehyde Dehydrogenase in Escherichia coli
J. Bacteriol., November 15, 2007; 189(22): 8073 - 8078.


Z. Ouyang and R. Isaacson
Identification and Characterization of a Novel ABC Iron Transport System, fit, in Escherichia coli
Infect. Immun., December 1, 2006; 74(12): 6949 - 6956.


I. Friedberg
Automated protein function prediction--the genomic challenge
Brief Bioinform, September 1, 2006; 7(3): 225 - 242.


K. Bryson, V. Loux, R. Bossy, P. Nicolas, S. Chaillou, M. van de Guchte, S. Penaud, E. Maguin, M. Hoebeke, P. Bessieres, et al.
AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system
Nucleic Acids Res., July 19, 2006; 34(12): 3533 - 3545.


L. Goldovsky, P. Janssen, D. Ahren, B. Audit, I. Cases, N. Darzentas, A. J. Enright, N. Lopez-Bigas, J. M. Peregrin-Alvarez, M. Smith, et al.
CoGenT++: an extensive and extensible data environment for computational genomics
Bioinformatics, October 1, 2005; 21(19): 3806 - 3810.


G. H. Van Domselaar, P. Stothard, S. Shrivastava, J. A. Cruz, A. Guo, X. Dong, P. Lu, D. Szafron, R. Greiner, and D. S. Wishart
BASys: a web server for automated bacterial genome annotation
Nucleic Acids Res., July 1, 2005; 33(suppl_2): W455 - W459.


F. Chalmel, A. Lardenois, J.D. Thompson, J. Muller, J.-A. Sahel, T. Leveillard, and O. Poch
GOAnno: GO annotation based on multiple alignment
Bioinformatics, May 1, 2005; 21(9): 2095 - 2096.


P. Lu, D. Szafron, R. Greiner, D. S. Wishart, A. Fyshe, B. Pearcy, B. Poulin, R. Eisner, D. Ngo, and N. Lamb
PA-GOSUB: a searchable database of model organism protein sequences with their predicted Gene Ontology molecular function and subcellular localization
Nucleic Acids Res., January 1, 2005; 33(suppl_1): D147 - D153.


D. Szafron, P. Lu, R. Greiner, D. S. Wishart, B. Poulin, R. Eisner, Z. Lu, J. Anvik, C. Macdonell, A. Fyshe, et al.
Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations
Nucleic Acids Res., July 1, 2004; 32(suppl_2): W365 - W371.


V. Kunin, J. B. Pereira-Leal, and C. A. Ouzounis
Functional Evolution of the Yeast Protein Interaction Network
Mol. Biol. Evol., July 1, 2004; 21(7): 1171 - 1176.


M. D. D'Ascenzo, A. Collmer, and G. B. Martin
PeerGAD: a peer-review-based and community-centric web application for viewing and annotating prokaryotic genome sequences
Nucleic Acids Res., June 7, 2004; 32(10): 3124 - 3135.


I. Yeh, T. Hanekamp, S. Tsoka, P. D. Karp, and R. B. Altman
Computational Analysis of Plasmodium falciparum Metabolism: Organizing Genomic Information to Facilitate Drug Discovery
Genome Res., May 1, 2004; 14(5): 917 - 924.


A. Y. Lau and D. I. Chasman
Functional classification of proteins and protein variants
PNAS, April 27, 2004; 101(17): 6576 - 6581.


S. E. Brown, A. T. Cao, E. R. Hines, R. J. Akhurst, and P. D. East
A Novel Secreted Protein Toxin from the Insect Pathogenic Bacterium Xenorhabdus nematophila
J. Biol. Chem., April 9, 2004; 279(15): 14595 - 14601.


L. J. Jensen, D. W. Ussery, and S. Brunak
Functionality of System Components: Conservation of Protein Function in Protein Feature Space
Genome Res., November 1, 2003; 13(11): 2444 - 2449.


P. K. Foreman, D. Brown, L. Dankmeyer, R. Dean, S. Diener, N. S. Dunn-Coleman, F. Goedegebuur, T. D. Houfek, G. J. England, A. S. Kelley, et al.
Transcriptional Regulation of Biomass-degrading Enzymes in the Filamentous Fungus Trichoderma reesei
J. Biol. Chem., August 22, 2003; 278(34): 31988 - 31997.


A. J. Enright, V. Kunin, and C. A. Ouzounis
Protein families and TRIBES in genome sequence space
Nucleic Acids Res., August 1, 2003; 31(15): 4632 - 4638.


F. Meyer, A. Goesmann, A. C. McHardy, D. Bartels, T. Bekel, J. Clausen, J. Kalinowski, B. Linke, O. Rupp, R. Giegerich, et al.
GenDB--an open source genome annotation system for prokaryote genomes
Nucleic Acids Res., April 15, 2003; 31(8): 2187 - 2195.


N. Ariel, A. Zvi, H. Grosfeld, O. Gat, Y. Inbar, B. Velan, S. Cohen, and A. Shafferman
Search for Potential Vaccine Candidate Open Reading Frames in the Bacillus anthracis Virulence Plasmid pXO1: In Silico and In Vitro Screening
Infect. Immun., December 1, 2002; 70(12): 6817 - 6827.


R. Nair and B. Rost
Sequence conserved for subcellular localization
Protein Sci., December 1, 2002; 11(12): 2836 - 2847.


P. Ayoubi, X. Jin, S. Leite, X. Liu, J. Martajaja, A. Abduraham, Q. Wan, W. Yan, E. Misawa, and R. A. Prade
PipeOnline 2.0: automated EST processing and functional data sorting
Nucleic Acids Res., November 1, 2002; 30(21): 4761 - 4769.


H. H. Gan, R. A. Perlow, S. Roy, J. Ko, M. Wu, J. Huang, S. Yan, A. Nicoletta, J. Vafai, D. Sun, et al.
Analysis of Protein Sequence/Structure Similarity Relationships
Biophys. J., November 1, 2002; 83(5): 2781 - 2791.


I. Rigoutsos, T. Huynh, A. Floratos, L. Parida, and D. Platt
Dictionary-driven protein annotation
Nucleic Acids Res., September 1, 2002; 30(17): 3901 - 3916.


T. Kawabata, S. Fukuchi, K. Homma, M. Ota, J. Araki, T. Ito, N. Ichiyoshi, and K. Nishikawa
GTOP: a database of protein structures predicted from genome sequences
Nucleic Acids Res., January 1, 2002; 30(1): 294 - 298.


J. C. Mellor, I. Yanai, K. H. Clodfelter, J. Mintseris, and C. DeLisi
Predictome: a database of putative functional links between proteins
Nucleic Acids Res., January 1, 2002; 30(1): 306 - 309.


L. Salwinski and D. Eisenberg
Motif-based fold assignment
Protein Sci., December 1, 2001; 10(12): 2460 - 2469.


P. J. Janssen, B. Audit, and C. A. Ouzounis
Strain-specific genes of Helicobacter pylori: distribution, function and dynamics
Nucleic Acids Res., November 1, 2001; 29(21): 4395 - 4404.


R. Overbeek, N. Larsen, G. D. Pusch, M. D'Souza, E. Selkov Jr., N. Kyrpides, M. Fonstein, N. Maltsev, and E. Selkov
WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction
Nucleic Acids Res., January 1, 2000; 28(1): 123 - 125.


