Bioinformatics Advance Access originally published online on May 30, 2007
Bioinformatics 2007 23(15):1995-2003; doi:10.1093/bioinformatics/btm261
Clustering microarray-derived gene lists through implicit literature relationships
1Departments of Internal Medicine and Biochemistry, The McDermott Center for Human Growth and Development, Division of Translational Research, The University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, Texas 75390, 2Arthritis & Immunology Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, Oklahoma 73104, 3Lineberger Comprehensive Cancer Center, 4Department of Genetics and 5Department of Pathology & Laboratory Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
*To whom correspondence should be addressed.
Abstract
Motivation: Microarrays rapidly generate large quantities of gene expression information, but interpreting such data within a biological context is still relatively complex and laborious. New methods that can identify functionally related genes via shared literature concepts will be useful in addressing these needs.
Results: We have developed a novel method that uses implicit literature relationships (concepts related via shared, intermediate concepts) to cluster related genes. Genes are evaluated for implicit connections within a network of biomedical objects (other genes, ontological concepts and diseases) that are connected via their co-occurrences in Medline titles and/or abstracts. On the basis of these implicit relationships, individual gene pairs are scored using a probability-based algorithm. Scores are generated for all pairwise combinations of genes, which are then clustered based on the scores. We applied this method to a test set composed of nine functional groups with known relationships. The method scored highly for all nine groups and significantly better than a benchmark co-occurrence-based method for six groups. We then applied this method to gene sets specific to two previously defined breast tumor subtypes. Analysis of the results recapitulated known biological relationships and identified novel pathway relationships unique to each tumor subtype. We demonstrate that this method provides a valuable new means of identifying and visualizing significantly related genes within gene lists via their implicit relationships in the literature.
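The pipeline described in the Results (find implicit connections, score every gene pair, then cluster on the scores) can be illustrated with a minimal sketch. The abstract does not give the probability-based scoring formula, so the observed-over-expected ratio below is an assumed stand-in, not the authors' algorithm. The toy network, the gene names (`GENE_A` to `GENE_D`), the function names and the threshold-based single-linkage grouping are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical toy network: each gene maps to the set of biomedical objects
# (concepts, diseases, ...) it co-occurs with in titles and/or abstracts.
NETWORK = {
    "GENE_A": {"apoptosis", "p53 signaling", "breast cancer"},
    "GENE_B": {"apoptosis", "p53 signaling", "cell cycle"},
    "GENE_C": {"lipid metabolism", "obesity"},
    "GENE_D": {"lipid metabolism", "obesity", "cell cycle"},
}
GENES = sorted(NETWORK)
# Number of distinct intermediate objects in the toy network.
N_OBJECTS = len(set().union(*NETWORK.values()))


def shared_intermediates(a, b, network):
    """Objects linked to both genes, i.e. the implicit connections."""
    return network[a] & network[b]


def implicit_score(a, b, network, n_objects):
    """Observed shared intermediates divided by the count expected by chance.

    Assumed null model (not from the paper): each gene's neighbors are drawn
    uniformly at random from n_objects, so the expected overlap is
    |N(a)| * |N(b)| / n_objects.
    """
    observed = len(shared_intermediates(a, b, network))
    expected = len(network[a]) * len(network[b]) / n_objects
    return observed / expected if expected else 0.0


def score_all_pairs(genes, network, n_objects):
    """Score every unordered gene pair."""
    return {
        (a, b): implicit_score(a, b, network, n_objects)
        for a, b in combinations(sorted(genes), 2)
    }


def cluster(genes, scores, threshold):
    """Group genes whose pair score reaches the threshold (single linkage)."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return sorted(sorted(group) for group in groups.values())


SCORES = score_all_pairs(GENES, NETWORK, N_OBJECTS)
CLUSTERS = cluster(GENES, SCORES, threshold=1.0)
```

In this toy data, `GENE_A` and `GENE_B` never need to co-occur directly: they are grouped because they share the intermediates "apoptosis" and "p53 signaling" more often than the assumed null model predicts. A real analysis would replace the toy dictionary with co-occurrence counts mined from Medline and would likely use a full hierarchical clustering rather than a fixed threshold.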
Contact: mark.burkart{at}utsouthwestern.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Limsoon Wong
Received on December 29, 2006; accepted on May 8, 2007
