Skip Navigation

Bioinformatics 2007 23(13):i529-i538; doi:10.1093/bioinformatics/btm195
This Article
Right arrow Full Text Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Tao, Y.
Right arrow Articles by Lussier, Y. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Tao, Y.
Right arrow Articles by Lussier, Y. A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Information theory applied to the sparse gene ontology annotation network to predict novel gene function

Ying Tao 1,{dagger}, Lee Sam 2, Jianrong Li 2, Carol Friedman 1 and Yves A. Lussier 1,2,3,*,{ddagger}

1Department of Biomedical Informatics, Columbia University, 622 West 168th Street, VC5, New York, NY 100322Center for Biomedical Informatics, The University of Chicago, 5841 S Maryland Avenue, Chicago, IL 60637, USA 3UCCRC, The University of Chicago, 5841 S Maryland Avenue, Chicago, IL 60637, USA

*To whom correspondence should be addressed.


   Abstract

Motivation: Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been demonstrated to be highly accurate for sparsely annotated GO terms (those associated to fewer than 10 genes).

Results: We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes based on existing GO annotations. Using a 10-fold cross-validation, we demonstrate that the ITSS algorithm obtains prediction accuracies (precision 97%, recall 77%) comparable to other machine learning algorithms when compared in similar conditions over densely annotated portions of the GO datasets. This method is able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. A 10-fold cross validation demonstrated a precision of 90% at a recall of 36% for the algorithm over sparsely annotated networks of the recent GO annotations (about 1400 GO terms and 11 000 genes in Homo sapiens). To our knowledge, this article presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions than more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43–58%) can be achieved for the human GO Annotation file dated 2003.

Availability: The program is available on request. The 97 732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information is available at http://phenos.bsd.uchicago.edu/ITSS/

Contact: Lussier{at}uchicago.edu

Supplementary information: Supplementary data are available atBioinformatics online.



Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
H. Lee, G.-S. Yi, and J. C. Park
E3Miner: a text mining tool for ubiquitin-protein ligases
Nucleic Acids Res., July 1, 2008; 36(suppl_2): W416 - W422.
[Abstract] [Full Text] [PDF]



Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.