A probabilistic model for mining implicit chemical compound–gene relations from literature
1Bioinformatics Center, Institute for Chemical Research, Kyoto University Gokasho, Uji 611-0011, Japan
2Graduate School of Pharmaceutical Sciences, Kyoto University Sakyo-ku, Kyoto 606-8501, Japan
*To whom correspondence should be addressed.
Motivation: Chemical compounds have received increasing emphasis in molecular biology, and chemical genomics has attracted a great deal of attention in recent years. An important issue in current molecular biology is therefore to identify biologically related chemical compounds (more specifically, drugs) and genes. Co-occurrence of biological entities in the literature is a simple, comprehensive and popular technique for finding associations between these entities. Our focus is to mine implicit chemical compound and gene relations from co-occurrence in the literature.
Results: We propose a probabilistic model, called the mixture aspect model (MAM), and an algorithm for estimating its parameters to efficiently handle different types of co-occurrence datasets at once. We examined the performance of our approach not only by cross-validation using data generated from MEDLINE records but also by a test using an independent human-curated dataset of the relationships between chemical compounds and genes in the ChEBI database. In both cases, we ran experiments on three different types of co-occurrence datasets (i.e. compound–gene, gene–gene and compound–compound co-occurrences). The results showed that MAM trained on all datasets outperformed any simple model trained on other combinations of datasets, with the difference being statistically significant in all cases. In particular, we found that incorporating compound–compound co-occurrences was the most effective in improving predictive performance. Finally, we computed the likelihoods of all unknown compound–gene (more specifically, drug–gene) pairs using our approach and selected the top 20 pairs according to their likelihoods. We validated them from biological, medical and pharmaceutical viewpoints.
Contact: mami{at}kuicr.kyoto-u.ac.jp
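As an illustration of the general idea in the Results, the following is a minimal sketch of a plain aspect model fitted by EM on a single co-occurrence count matrix, with p(x, y) = sum over z of p(z) p(x|z) p(y|z). It is not the MAM itself: the abstract does not give the form of the mixture over the three dataset types, so that part is not reproduced here. The function names `fit_aspect_model` and `pair_probability` and the toy counts are hypothetical and chosen only for illustration.

```python
import random


def fit_aspect_model(counts, n_aspects, n_iter=50, seed=0):
    """Fit p(x, y) = sum_z p(z) p(x|z) p(y|z) by EM on counts[x][y].

    Assumes a rectangular matrix with at least one non-zero count.
    """
    rng = random.Random(seed)
    n_x, n_y = len(counts), len(counts[0])

    def random_distribution(n):
        weights = [rng.random() + 0.1 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    p_z = [1.0 / n_aspects] * n_aspects
    p_x_z = [random_distribution(n_x) for _ in range(n_aspects)]  # p_x_z[z][x]
    p_y_z = [random_distribution(n_y) for _ in range(n_aspects)]  # p_y_z[z][y]

    for _ in range(n_iter):
        acc_z = [0.0] * n_aspects
        acc_x = [[0.0] * n_x for _ in range(n_aspects)]
        acc_y = [[0.0] * n_y for _ in range(n_aspects)]
        for x in range(n_x):
            for y in range(n_y):
                c = counts[x][y]
                if c == 0:
                    continue
                # E-step: posterior over aspects for the observed pair (x, y).
                joint = [p_z[z] * p_x_z[z][x] * p_y_z[z][y]
                         for z in range(n_aspects)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_aspects):
                    r = c * joint[z] / total
                    acc_z[z] += r
                    acc_x[z][x] += r
                    acc_y[z][y] += r
        # M-step: re-estimate parameters from expected counts.
        grand_total = sum(acc_z)
        for z in range(n_aspects):
            if acc_z[z] > 0.0:
                p_x_z[z] = [v / acc_z[z] for v in acc_x[z]]
                p_y_z[z] = [v / acc_z[z] for v in acc_y[z]]
            p_z[z] = acc_z[z] / grand_total
    return p_z, p_x_z, p_y_z


def pair_probability(model, x, y):
    """Model probability of the pair (x, y), including unobserved pairs."""
    p_z, p_x_z, p_y_z = model
    return sum(p_z[z] * p_x_z[z][x] * p_y_z[z][y] for z in range(len(p_z)))


# Toy compound-by-gene co-occurrence counts (invented for illustration).
counts = [
    [5, 4, 0, 0],
    [3, 0, 0, 0],
    [0, 0, 6, 2],
    [0, 0, 1, 4],
]
model = fit_aspect_model(counts, n_aspects=2)

# Rank pairs with zero observed count by their model probability.
unseen = [(x, y) for x in range(4) for y in range(4) if counts[x][y] == 0]
ranked = sorted(unseen, key=lambda xy: pair_probability(model, *xy), reverse=True)
print(ranked[:3])
```

Ranking unobserved pairs by model probability mirrors, in a simplified form, the final step described in the Results, where unknown compound–gene pairs are scored and the top candidates are selected.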