Bioinformatics Advance Access published online on August 12, 2004
Bioinformatics, doi:10.1093/bioinformatics/bth464
Bioinformatics © Oxford University Press 2004; all rights reserved
1 Dept. of Neurology, University of Tennessee Health Science Center, Memphis, TN 38163
* To whom correspondence should be addressed. E-mail: rhomayouni{at}utmem.edu.
Motivation: A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases by term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations.

Results: We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Lastly, we demonstrate that pairwise distances derived from the vector angles of gene abstract documents can be used effectively to group genes functionally by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. These features make LSI particularly useful for analysis of novel associations discovered in genomic experiments.

Availability: The 50-gene document collection used in this study can be interactively queried at http://shad.cs.utk.edu/sgo/sgo.html.
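The approach summarized above (a rank-reduced vector space, cosine of vector angles for gene-to-gene and keyword-to-gene comparison, and pairwise distances for clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the four toy gene documents, the gene names, the raw-count term weighting, and the choice of k = 2 are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy "gene documents" (words pooled from abstracts per gene).
# These are NOT taken from the 50-gene collection used in the study.
docs = {
    "geneA": "neuron migration cortex development signaling",
    "geneB": "cortex neuron migration receptor signaling",
    "geneC": "apoptosis tumor suppressor cell cycle",
    "geneD": "tumor cell cycle apoptosis kinase",
}

def term_document_matrix(docs):
    """Rows are terms, columns are gene documents, entries are raw counts.
    Raw counts are a simplification; real systems usually apply weighting."""
    vocab = sorted({w for text in docs.values() for w in text.split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, text in enumerate(docs.values()):
        for w in text.split():
            A[row[w], j] += 1.0
    return A, vocab

def lsi(A, k):
    """Rank-k truncated SVD. Returns U_k and k-dimensional document vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    doc_vecs = (U_k.T @ A).T  # same as the columns of S_k V_k^T
    return U_k, doc_vecs

def cosine(a, B):
    """Cosine of the angle between vector a and each row of B."""
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a))

genes = list(docs)
A, vocab = term_document_matrix(docs)
U_k, doc_vecs = lsi(A, k=2)

# Gene-to-gene similarity and the corresponding pairwise distances.
gene_sim = np.array([cosine(v, doc_vecs) for v in doc_vecs])
gene_dist = 1.0 - gene_sim

# Keyword-to-gene query: project the keyword into the same k-dim space.
q = np.zeros(len(vocab))
q[vocab.index("apoptosis")] = 1.0
scores = cosine(U_k.T @ q, doc_vecs)
ranking = sorted(zip(genes, scores), key=lambda pair: -pair[1])

for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

In this toy setup the query "apoptosis" ranks geneC and geneD above geneA and geneB, and geneA and geneB sit close to each other in the reduced space. The `gene_dist` matrix is the kind of input a hierarchical clustering routine would take, for example `scipy.cluster.hierarchy.linkage` applied to its condensed form; whether the study used that linkage method is not stated here.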
Revised July 10, 2004
Accepted August 2, 2004
Gene clustering by latent semantic indexing of MEDLINE abstracts
2 Dept. of Computer Science, University of Tennessee, Knoxville, TN, 37996-3450