Bioinformatics, Vol 15, 391-412, Copyright © 1999 by Oxford University Press
MA Andrade, NP Brown, C Leroy, S Hoersch, A de Daruvar, C Reich, A Franchini, J Tamames, A Valencia, C Ouzounis and C Sander
MOTIVATION: Large-scale genome projects generate a rapidly increasing
number of sequences, most of them biochemically uncharacterized. Research
in bioinformatics contributes to the development of methods for the
computational characterization of these sequences. However, the
installation and application of these methods require experience and are
time consuming. RESULTS: We present here an automatic system for
preliminary functional annotation of protein sequences that has been
applied to the analysis of sets of sequences from complete genomes, both to
refine overall performance and to make new discoveries comparable to those
made by human experts. The GeneQuiz system includes a Web-based browser
that allows examination of the evidence leading to an automatic annotation
and offers additional information, views of the results, and links to
biological databases that complement the automatic analysis. System
structure and operating principles concerning the use of multiple sequence
databases, underlying sequence analysis tools, lexical analyses of database
annotations and decision criteria for functional assignments are detailed.
The system makes automatic quality assessments of results based on prior
experience with the underlying sequence analysis tools; overall error rates
in functional assignment are estimated at 2.5-5% for cases annotated with
highest reliability ('clear' cases). Sources of over-interpretation of
results are discussed with proposals for improvement. A conservative
definition for reporting 'new findings' that takes account of database
maturity is presented along with examples of possible kinds of discoveries
(new function, family and superfamily) made by the system. System
performance in relation to sequence database coverage, database dynamics
and database search methods is analysed, demonstrating the inherent
advantages of an integrated automatic approach using multiple databases and
search methods applied in an objective and repeatable manner. AVAILABILITY:
The GeneQuiz system is publicly available for analysis of protein sequences
through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit
ARTICLES
Automated genome sequence analysis and annotation
European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.