Bioinformatics Advance Access first published online on August 27, 2007
This version published online on August 30, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm403
Exploring the functional landscape of gene expression: directed search of large microarray compendia
1Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ, USA
2Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, USA
*To whom correspondence should be addressed. Dr. Olga G. Troyanskaya, E-mail: ogt{at}cs.princeton.edu
Abstract
Motivation: The increasing availability of gene expression microarray technology has resulted in the publication of thousands of microarray gene expression datasets investigating various biological conditions. This vast repository is still underutilized due to the lack of methods for fast, accurate exploration of the entire compendium.
Results: We have collected S. cerevisiae gene expression microarray data containing roughly 2400 experimental conditions. We analyzed the functional coverage of this collection and designed a context-sensitive search algorithm for rapid exploration of the compendium. A researcher using our system provides a small set of query genes to establish a biological search context; based on this query, we weight each dataset's relevance to the context, and within these weighted datasets we identify additional genes that are co-expressed with the query set. Our method exhibits an average increase in accuracy of 273% compared to previous mega-clustering approaches when recapitulating known biology. Further, we find that our search paradigm identifies novel biological predictions that can be verified through further experimentation. Our methodology allows biological researchers to explore the totality of existing microarray data in a manner useful for drawing conclusions and formulating hypotheses, which we believe is invaluable for the research community.
Availability: Our query-driven search engine, called SPELL, is available at http://function.princeton.edu/SPELL
Contact: ogt{at}cs.princeton.edu
Supplementary Information: Several additional data files, figures, and discussions are available at http://function.princeton.edu/SPELL/supplement
Associate Editor: Prof. David Rocke
Received on May 4, 2007; revised on August 2, 2007; accepted on August 2, 2007
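The query-driven search summarized in the Results can be illustrated with a minimal sketch. The abstract does not specify how dataset relevance or co-expression is computed, so everything below is an assumption made for illustration: the function name `rank_genes`, the use of Pearson correlation, the choice of the mean pairwise correlation among query genes (clipped at zero) as the dataset weight, and the weighted average used to score genes. It should not be read as the SPELL implementation.

```python
import numpy as np

def rank_genes(datasets, query):
    """Rank genes by weighted co-expression with a query set.

    datasets: list of 2-D arrays (genes x conditions), all with the same
              gene order along the rows (illustrative input format).
    query:    list of at least two row indices naming the query genes.
    Returns (ranked non-query gene indices, per-gene scores).
    """
    n_genes = datasets[0].shape[0]
    scores = np.zeros(n_genes)
    total_weight = 0.0
    upper = np.triu_indices(len(query), k=1)  # distinct query-gene pairs

    for data in datasets:
        corr = np.corrcoef(data)  # gene-by-gene Pearson correlation
        # Assumed relevance weight: how coherently the query genes
        # co-vary in this dataset, with negative coherence clipped to 0.
        query_corr = corr[np.ix_(query, query)]
        weight = max(float(query_corr[upper].mean()), 0.0)
        # Each gene's mean correlation to the query genes, scaled by
        # the dataset's weight.
        scores += weight * corr[:, query].mean(axis=1)
        total_weight += weight

    if total_weight > 0:
        scores /= total_weight

    query_set = set(query)
    ranked = [int(g) for g in np.argsort(-scores) if int(g) not in query_set]
    return ranked, scores
```

In this sketch, a dataset in which the query genes do not co-vary receives a weight of zero and therefore does not influence the ranking, which mirrors the idea of restricting the search to datasets relevant to the query context.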


