Bioinformatics Advance Access originally published online on January 12, 2005
Bioinformatics 2005 21(9):2095-2096; doi:10.1093/bioinformatics/bti252

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

GOAnno: GO annotation based on multiple alignment

F. Chalmel 1,*, A. Lardenois 1, J.D. Thompson 1, J. Muller 1, J.-A. Sahel 2, T. Léveillard 2 and O. Poch 1

1 Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP BP 163, 67404 Illkirch cedex, France
2 Laboratoire de Physiopathologie Cellulaire et Moléculaire de la Rétine, Inserm U592, Université Pierre et Marie Curie 75571 Paris, France

*To whom correspondence should be addressed.


    Abstract

Summary: GOAnno is a web tool that automatically annotates proteins according to the Gene Ontology (GO), using the evolutionary information available in hierarchized multiple alignments. The GOAnno algorithm cross-validates GO terms present in the aligned functional subfamily and propagates them to obtain highly reliable predicted GO annotations.

Availability: The web tool and a reduced version for local installation are freely available at http://igbmc.u-strasbg.fr/GOAnno/GOAnno.html

Contact: chalmel{at}igbmc.u-strasbg.fr

Supplementary information: The website supplies a detailed explanation and illustration of the algorithm at http://igbmc.u-strasbg.fr/GOAnno/GOAnnoHelp.html


    INTRODUCTION
 
Recent advances in high-throughput sequencing have led to a rapid increase in the number of sequences available in public databases. Since the development of GeneQuiz (Andrade et al., 1999) for automated protein function annotation, systematic annotation of these data has come to rely on the Gene Ontology (GO) (Gene Ontology Consortium, 2000), a hierarchical, standardized vocabulary developed by the GO Consortium (www.geneontology.org). Several tools assign GO terms from sequence similarity by selecting the best BLAST (Altschul et al., 1997) hits (Hennig et al., 2003; Khan et al., 2003; Zehetner, 2003), while another relies on a predefined subset of GO terms (Jensen et al., 2003).

GOAnno is a web tool for automated GO annotation of proteins. In contrast to the methods above, GOAnno exploits the evolutionary information available in Multiple Alignments of Complete Sequences (MACS) (Lecompte et al., 2001), organized hierarchically into functional subfamilies. Members of a subfamily are sufficiently conserved that GO terms can be filtered, enriched and propagated among them using the GOAnno algorithm. GOAnno also does not depend on predefined restrictions such as a fixed GO level or a subset of GO terms. The tool takes a query protein sequence as input and returns detailed GO annotations as an interactive HTML file.


    PROGRAM OVERVIEW
 
The GOAnno algorithm is explained and illustrated in detail in the Supplementary information. It consists of five steps:

  1. Determination of the query's functional subfamily follows the strategy used in PipeAlign (Plewniak et al., 2003), a toolkit for protein family analysis that uses a query sequence to search protein sequence databases and produces a hierarchized MACS of protein homologs clustered into potential functional subfamilies (http://igbmc.u-strasbg.fr/PipeAlign).
    The next four steps are applied independently to each of the three GO categories: cellular component, molecular function and biological process. At the end of each step, redundant GO terms, and any term that is a parent of another term in the set, are systematically removed.
  2. An Initial Protein gene Ontology (IPO) is constructed for each member of the query subfamily from two sources: the GO annotation associated with the protein in the sequence databases, when available, and the GO terms obtained from the GO Consortium conversion tables, which map InterPro, Pfam, PRINTS, ProDom, PROSITE and SMART protein motifs, Enzyme Commission numbers and Swiss-Prot keywords to GO terms.
  3. The MACS allows identification of the Proximal Proteins, i.e. proteins sharing at least 98% identity with the query. The IPOs of these proximal proteins are pooled to form the Proximal Protein gene Ontology (PPO).
  4. The quality of the query subfamily alignment is assessed using the objective scoring function norMD (Thompson et al., 2001). A norMD score above 0.3 indicates a high-quality alignment and allows GO terms to be propagated within the subfamily according to the following criteria. Briefly, the IPOs of all subfamily proteins are collected to build the corresponding GO tree. For each IPO, all paths to the root are decomposed into linear branches. A score based on the number of supporting proteins is then calculated for each node and each branch. Finally, highly specialized nodes and branches associated with rare nodes are eliminated using two cut-off values, p and f, respectively. The GO terms that pass these filters define the Mean Subfamily gene Ontology (MSO).
  5. The query's IPO, the PPO and the MSO are combined to define the Global Protein gene Ontology (GPO), which is assigned to the query.
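Step 2 can be illustrated with a minimal Python sketch. It assumes the GO Consortium conversion tables follow the external2go line layout (`<DB>:<ID> <description> > GO:<term name> ; GO:<id>`, with `!` comment lines); the functions `parse_external2go` and `build_ipo` are hypothetical names, not part of GOAnno. Cross-references of a protein (motifs, EC numbers, keywords) are looked up in the parsed tables and merged with any GO terms annotated directly on the protein.

```python
"""Hypothetical sketch of step 2: building an IPO from GO Consortium
external2go conversion tables (e.g. pfam2go, ec2go). Names are illustrative."""
from collections import defaultdict


def parse_external2go(lines):
    """Map external IDs (e.g. 'Pfam:PF00001') to sets of GO IDs.

    Assumed line layout: '<DB>:<ID> <description> > GO:<term name> ; GO:<id>'.
    Blank lines and lines starting with '!' (comments) are skipped.
    """
    mapping = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.rpartition(" > GO:")
        if not sep:
            continue  # not a mapping line
        external_id = left.split()[0]
        go_id = right.rsplit(";", 1)[-1].strip()
        if go_id.startswith("GO:"):
            mapping[external_id].add(go_id)
    return mapping


def build_ipo(direct_go, cross_refs, mapping):
    """IPO = GO terms annotated directly on the protein, plus the GO terms
    mapped from its cross-references (motifs, EC numbers, keywords)."""
    ipo = set(direct_go)
    for ref in cross_refs:
        ipo |= mapping.get(ref, set())  # unknown references contribute nothing
    return ipo
```

For example, an ec2go-style line such as `EC:1 > GO:oxidoreductase activity ; GO:0016491` maps `EC:1` to `{'GO:0016491'}`. Splitting the resulting IPO into the three GO categories is left out of this sketch.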
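Steps 3 to 5, together with the parent-term removal applied after each step, can be sketched in Python for a single GO category. This is a simplified, hypothetical illustration rather than GOAnno's implementation: the GO graph is a toy `{term: parents}` dictionary, percent identity is computed over gap-free aligned columns (one of several possible definitions), and the node and branch scoring with the cut-offs p and f is replaced by a single assumed threshold, `min_support`, on the fraction of annotated subfamily members whose IPO contains a term or one of its descendants. Only the 98% identity and norMD > 0.3 thresholds come from the text.

```python
"""Hypothetical, simplified sketch of GOAnno steps 3-5 for one GO category.
Not the published implementation: the p/f node and branch scoring is
replaced by a single support threshold (min_support, an assumption)."""


def ancestors(term, parents):
    """Return all ancestors of `term` in a GO DAG given as {term: set_of_parents}."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def remove_parents(terms, parents):
    """Drop every term that is an ancestor of another term in the set."""
    redundant = set()
    for t in terms:
        redundant |= ancestors(t, parents)
    return set(terms) - redundant


def percent_identity(a, b):
    """Identity over aligned columns where neither sequence has a gap
    (an assumed definition; other normalizations exist)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def proximal_ontology(query, alignment, ipo, parents, min_identity=98.0):
    """Step 3 (PPO): pool the IPOs of members with >= min_identity to the query."""
    terms = set()
    for name, seq in alignment.items():
        if name != query and percent_identity(alignment[query], seq) >= min_identity:
            terms |= ipo.get(name, set())
    return remove_parents(terms, parents)


def subfamily_ontology(members, ipo, parents, normd, min_support=0.5):
    """Step 4 (MSO stand-in): keep terms supported by >= min_support
    of the annotated members, but only if norMD > 0.3."""
    if normd <= 0.3:
        return set()
    annotated = [m for m in members if ipo.get(m)]
    if not annotated:
        return set()
    counts = {}
    for m in annotated:
        # Each member supports its own terms and all their ancestors.
        closure = set(ipo[m])
        for t in ipo[m]:
            closure |= ancestors(t, parents)
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    kept = {t for t, c in counts.items() if c / len(annotated) >= min_support}
    return remove_parents(kept, parents)


def global_ontology(query, alignment, ipo, parents, normd):
    """Step 5 (GPO): combine the query IPO, PPO and MSO, then drop parents."""
    ppo = proximal_ontology(query, alignment, ipo, parents)
    mso = subfamily_ontology(list(alignment), ipo, parents, normd)
    return remove_parents(ipo.get(query, set()) | ppo | mso, parents)


if __name__ == "__main__":
    # Toy GO graph (hypothetical IDs): A is the root, B a child of A,
    # C and D children of B, E a child of A.
    parents = {"GO:B": {"GO:A"}, "GO:C": {"GO:B"}, "GO:D": {"GO:B"}, "GO:E": {"GO:A"}}
    alignment = {"q": "MKV-LA", "p1": "MKV-LA", "s2": "MRVILS", "s3": "MKAILG"}
    ipo = {"q": {"GO:E"}, "p1": {"GO:C"}, "s2": {"GO:D"}, "s3": {"GO:C"}}
    print(sorted(global_ontology("q", alignment, ipo, parents, normd=0.5)))
    # prints ['GO:C', 'GO:E']
```

Counting support on ancestors before thresholding means a general term is always at least as well supported as its descendants; removing parents afterwards keeps the most specific terms that pass the threshold, in line with the parent-removal rule applied after each step.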

In a study of the mechanisms leading to retinal degeneration, GOAnno was used to analyze 1046 UniProt (Apweiler et al., 2004) proteins from microarray experiments (Chalmel,F., Poch,O., Lavedan,C., Ripp,R., Wicker,N., Dolomeyer,A., Clérin,E., Mohand-Saïd,S., Lambrou,G., Sahel,J.-A. and Léveillard,T., in preparation). Of these 1046 proteins, 698 had an IPO, comprising a total of 2285 GO terms. The GOAnno algorithm assigned a GPO to 191 additional proteins (27.4% relative to the 698 proteins with an IPO) and contributed 1520 new GO terms (66.5% relative to the initial 2285).

The GOAnno interface accepts a single protein sequence as input. Users can modify the GOAnno parameters (e.g. f and p). The program outputs a downloadable XML file and an interactive HTML page containing a detailed table describing the IPO, PPO, MSO and GPO steps, in which each GO term is linked to its AmiGO entry (http://www.godatabase.org/cgi-bin/amigo/go.cgi) and each protein accession number to the corresponding UniProt entry.

A light version of GOAnno that omits the first step is also available for local use. In this case, the subfamily and proximal proteins of the query must be determined beforehand. This version supports automatic batch processing of gene lists, which is particularly useful for interpreting high-throughput experiments such as microarray transcription profiling.

GOAnno provides an efficient way to assign a potential GO annotation to an unknown sequence and to extend an existing GO annotation. It can also be used for in-depth comparison of a protein's functions with those of its subfamily. GOAnno is designed to help biologists by automatically providing reliable protein functional information through an intuitive interface that requires no prior experience in judging the quality of predicted GO annotations.


    Acknowledgments
 
The authors thank G. Berthommier, L. Bianchetti, F. Plewniak, W. Raffelsberger and R. Ripp for stimulating discussions. This work was funded by the INSERM, the CNRS, the ULP de Strasbourg, the FNS (GENOPOLE), the SPINE (E.C. contract number QLG2-CT-2002–00988) and the RETNET (E.C. contract number MRTN-CT-2003–504003) projects.

Received on October 25, 2004; revised on December 22, 2004; accepted on December 23, 2004

    REFERENCES
 

    Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Andrade, M.A., Brown, N.P., Leroy, C., Hoersch, S., de Daruvar, A., Reich, C., Franchini, A., Tamames, J., Valencia, A., Ouzounis, C., Sander, C. (1999) Automated genome sequence analysis and annotation. Bioinformatics, 15, 391–412.

    Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2004) UniProt: the Universal Protein knowledgebase. Nucleic Acids Res., 32, D115–D119.

    The Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.

    Hennig, S., Groth, D., Lehrach, H. (2003) Automated Gene Ontology annotation for anonymous sequence data. Nucleic Acids Res., 31, 3712–3715.

    Jensen, L.J., Gupta, R., Staerfeldt, H.H., Brunak, S. (2003) Prediction of human protein function according to Gene Ontology categories. Bioinformatics, 19, 635–642.

    Khan, S., Situ, G., Decker, K., Schmidt, C.J. (2003) GoFigure: automated Gene Ontology annotation. Bioinformatics, 19, 2484–2485.

    Lecompte, O., Thompson, J.D., Plewniak, F., Thierry, J., Poch, O. (2001) Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene, 270, 17–30.

    Plewniak, F., Bianchetti, L., Brelivet, Y., Carles, A., Chalmel, F., Lecompte, O., Mochel, T., Moulinier, L., Muller, A., Muller, J., et al. (2003) PipeAlign: a new toolkit for protein family analysis. Nucleic Acids Res., 31, 3829–3832.

    Thompson, J.D., Plewniak, F., Ripp, R., Thierry, J.C., Poch, O. (2001) Towards a reliable objective function for multiple sequence alignments. J. Mol. Biol., 314, 937–951.

    Zehetner, G. (2003) OntoBlast function: from sequence similarities directly to potential functional annotations by ontology terms. Nucleic Acids Res., 31, 3799–3803.




