Bioinformatics Advance Access originally published online on February 21, 2007
Bioinformatics 2007 23(8):1015-1022; doi:10.1093/bioinformatics/btm056
Gene symbol disambiguation using knowledge-based profiles


¹Department of Biomedical Informatics, Columbia University, 622 168th St and ²Department of Biostatistics, Columbia University, 722 168th St, New York City, New York, USA
*To whom correspondence should be addressed.
Abstract
Motivation: The ambiguity of biomedical entities, particularly of gene symbols, is a major challenge for text-mining systems in the biomedical domain. Existing knowledge sources, such as Entrez Gene and the MEDLINE database, contain information about the characteristics of a particular gene that could be used to disambiguate gene symbols.
Results: For each gene, we create a profile from different types of information automatically extracted from related MEDLINE abstracts and from readily available annotated knowledge sources. We apply the gene profiles to the disambiguation task via an information retrieval method that ranks candidate gene profiles by their similarity to the context in which the ambiguous gene symbol is mentioned. The gene whose profile has the highest similarity score is then chosen as the correct sense. We evaluated the method on three automatically generated testing sets, one each for mouse, fly and yeast. The highest precision achieved was 93.9% for mouse, 77.8% for fly and 89.5% for yeast.
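The ranking step described in the Results can be illustrated with a minimal sketch. The abstract does not state which term weighting or similarity measure is used, so this example assumes plain bag-of-words counts and cosine similarity. The gene identifiers, profile texts, and function names (`bag_of_words`, `cosine`, `disambiguate`) are hypothetical and are not taken from the authors' programs.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, profiles):
    """Rank candidate profiles by similarity to the context.

    Returns the top-ranked gene id and the full ranked list of
    (gene_id, score) pairs. Assumes `profiles` is non-empty.
    """
    ctx = bag_of_words(context)
    ranked = sorted(
        ((gene_id, cosine(ctx, bag_of_words(text)))
         for gene_id, text in profiles.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[0][0], ranked


# Hypothetical toy profiles for two genes that share one ambiguous symbol.
profiles = {
    "gene_A": "kinase phosphorylation cell cycle mitosis",
    "gene_B": "olfactory receptor neuron odor signaling",
}
context = "The symbol was linked to cell cycle arrest and reduced kinase activity"

best, ranked = disambiguate(context, profiles)
print(best, ranked)
```

In this toy setting the context shares three terms with the first profile and none with the second, so the first candidate is selected. A profile built from real MEDLINE abstracts and annotations would be far larger, and a weighted scheme such as TF-IDF may be a more reasonable choice than raw counts.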
Availability: The testing data sets and disambiguation programs are available at http://www.dbmi.columbia.edu/~hux7002/gsd2006
Contact: friedman{at}dbmi.columbia.edu
Associate Editor: Alfonso Valencia
The authors wish it to be known that, in their opinion, the last two authors should be regarded as joint First Authors.
Received on November 27, 2006; revised on January 22, 2007; accepted on February 11, 2007
