Bioinformatics Advance Access originally published online on September 11, 2007
Bioinformatics 2007 23(22):3024-3031; doi:10.1093/bioinformatics/btm440
Improved detection of overrepresentation of Gene-Ontology annotations with parent–child analysis
1Max-Planck-Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin and 2Institute of Medical Genetics, Universitätsmedizin Charité, Augustenburger Platz 1, 13353 Berlin, Germany
*To whom correspondence should be addressed.
Abstract
Motivation: High-throughput experiments such as microarray hybridizations often yield long lists of genes that share a certain characteristic, such as differential expression. Exploring the Gene Ontology (GO) annotations of such gene lists has become a widespread practice for gaining first insights into the potential biological meaning of an experiment. The standard statistical approach to measuring overrepresentation of GO terms cannot cope with the dependencies that result from the structure of GO, because it analyzes each term in isolation. In particular, because annotations are inherited from more specific descendant terms, this approach can produce certain types of false-positive results with potentially misleading biological interpretations, a phenomenon we term the inheritance problem.
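To make "analyzes each term in isolation" concrete, the sketch below tests each term separately against the whole population with a one-sided hypergeometric (Fisher's exact) test, which is the usual form of the standard approach (often called term-for-term analysis). It is a minimal illustration, not the authors' implementation; the helper names `hypergeom_upper_tail` and `term_for_term_p` and all counts are hypothetical. Term B is a child of term A, so every gene counted for B is also counted for A, and the two tests share data.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def term_for_term_p(pop_size, pop_term, study_size, study_term):
    # Each term is compared against the entire population, ignoring the DAG.
    return hypergeom_upper_tail(pop_size, pop_term, study_size, study_term)

# Hypothetical counts: 1000 population genes, 20 study genes.
# Term A annotates 100 population genes, 10 of which are in the study set.
# Its child B annotates 50 of A's genes, 5 of which are in the study set.
p_A = term_for_term_p(1000, 100, 20, 10)
p_B = term_for_term_p(1000, 50, 20, 5)
print(f"A: {p_A:.2e}  B: {p_B:.2e}")
```

In this toy case B's 5 study genes are exactly half of A's 10, matching B's share of A in the population (50 of 100), so B carries no signal beyond what A already explains. Tested in isolation, however, both terms still receive p-values well below 0.05.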
Results: We present a novel approach to the analysis of GO term overrepresentation that determines the overrepresentation of each term in the context of the annotations to the term's parents. This approach reduces the dependencies between the measurements for individual terms and thereby avoids false-positive results caused by the inheritance problem. ROC analysis using study sets with overrepresented GO terms showed a clear advantage of our approach over the standard algorithm with respect to the inheritance problem. Although there can be no gold standard for exploratory methods such as GO term overrepresentation analysis, our analysis of biological datasets suggests that the algorithm tends to identify the core GO terms that are most characteristic of the dataset being analyzed.
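One way to read "in the context of annotations to the term's parents" is to restrict both the population and the study set to genes annotated to at least one parent of the term, then apply the same one-sided hypergeometric test inside that restricted set. The sketch below assumes this union-of-parents reading; it is illustrative rather than the Ontologizer's code, and the function `parent_child_union_p`, the toy DAG `root -> A -> B`, and the gene sets are all hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def parent_child_union_p(term, parents, annotations, population, study):
    """One-sided p-value for `term`, restricted to genes annotated to any parent.

    annotations: term -> set of genes, assumed already propagated to ancestors.
    parents: term -> list of the term's direct parents (hypothetical layout).
    """
    parent_genes = set().union(*(annotations[p] for p in parents[term]))
    pop_pa = population & parent_genes    # population genes under some parent
    study_pa = study & parent_genes       # study genes under some parent
    term_genes = annotations[term]
    return hypergeom_upper_tail(len(pop_pa), len(pop_pa & term_genes),
                                len(study_pa), len(study_pa & term_genes))

# Hypothetical DAG root -> A -> B, where B's genes are a subset of A's.
population = set(range(1000))
annotations = {"root": set(range(1000)), "A": set(range(100)), "B": set(range(50))}
parents = {"A": ["root"], "B": ["A"]}
# Study set of 20 genes: 5 in B, 5 more in A only, 10 outside A.
study = set(range(5)) | set(range(50, 55)) | set(range(500, 510))

p_A = parent_child_union_p("A", parents, annotations, population, study)
p_B = parent_child_union_p("B", parents, annotations, population, study)
print(f"A: {p_A:.2e}  B: {p_B:.2f}")
```

Because A's only parent is the root, its test coincides with an isolated test against the full population and A remains significant. B, by contrast, is now judged only among the 10 study genes already annotated to A; 5 of those 10 falling into B is what A's composition predicts, so B's p-value is about 0.63 rather than a spurious hit.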
Availability: The Ontologizer can be found at the project homepage http://www.charite.de/ch/medgen/ontologizer
Contact: peter.robinson{at}charite.de and vingron{at}molgen.mpg.de
Received on May 21, 2007; revised on August 3, 2007; accepted on August 20, 2007