Bioinformatics Advance Access originally published online on November 15, 2005
Bioinformatics 2006 22(6):658-664; doi:10.1093/bioinformatics/bti783
Automatic assignment of biomedical categories: toward a generic approach
University Hospitals of Geneva, Medical Informatics Service, CH-1201 Geneva
*To whom correspondence should be addressed.
ABSTRACT
Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike conventional automatic text categorization systems, which rely on data-intensive models learned from large training sets, our categorizer is largely data-independent.
Methods: To evaluate the robustness of our approach, we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer combines two ranking modules, a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically motivated indexing units.
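The abstract names the two ranking modules but not how each one scores a category or how their outputs are merged. The following is a minimal sketch under stated assumptions, not the author's implementation. Each category term is indexed as a tiny document. A vector space module scores it by tf.idf cosine against the input. A pattern matcher scores stem overlap plus a bonus for a contiguous phrase match, as a rough stand-in for phrase indexing units. The two scores are fused linearly. The stemmer `crude_stem`, the phrase bonus, the linear fusion and the weight `alpha` are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def crude_stem(token):
    # Hypothetical stand-in for a real stemmer (e.g. Porter): strips a few
    # common English suffixes so that "proteins" and "protein" share a stem.
    for suffix in ("ations", "ation", "ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    # Lowercase, split on non-alphanumerics, then stem each token.
    return [crude_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def pattern_score(text, term):
    # Assumed pattern-matcher module: share of the category term's stems found
    # in the input, plus a bonus of 1.0 when the whole term occurs as a
    # contiguous phrase (a crude proxy for phrase-level indexing units).
    text_stems = tokenize(text)
    term_stems = tokenize(term)
    if not term_stems:
        return 0.0
    present = set(text_stems)
    overlap = sum(1 for s in term_stems if s in present) / len(term_stems)
    n = len(term_stems)
    contiguous = any(text_stems[i:i + n] == term_stems
                     for i in range(len(text_stems) - n + 1))
    return overlap + (1.0 if contiguous else 0.0)


def vector_scores(text, terms):
    # Assumed vector space module: each category term is indexed as a tiny
    # document, the input text is the query, score = cosine of tf.idf vectors.
    docs = [Counter(tokenize(t)) for t in terms]
    df = Counter(s for d in docs for s in d)
    idf = {s: math.log(len(docs) / df[s]) + 1.0 for s in df}

    def weigh(counts):
        # Stems absent from the category vocabulary get weight 0.
        return {s: c * idf.get(s, 0.0) for s, c in counts.items()}

    query = weigh(Counter(tokenize(text)))
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    scores = []
    for d in docs:
        dv = weigh(d)
        d_norm = math.sqrt(sum(w * w for w in dv.values()))
        dot = sum(w * query.get(s, 0.0) for s, w in dv.items())
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores


def rank_categories(text, terms, alpha=0.5):
    # Linear fusion of the two modules; alpha is a hypothetical weight.
    # pattern_score lies in [0, 2], so it is halved to match the cosine range.
    vs = vector_scores(text, terms)
    fused = [(alpha * v + (1 - alpha) * pattern_score(text, t) / 2.0, t)
             for v, t in zip(vs, terms)]
    return [t for _, t in sorted(fused, key=lambda pair: -pair[0])]


terms = ["protein kinase activity", "cell cycle", "DNA repair"]
text = "The protein kinase regulates the cell cycle after DNA damage."
print(rank_categories(text, terms))
# → ['cell cycle', 'protein kinase activity', 'DNA repair']
```

In this toy run, "cell cycle" ranks first because both modules agree on it: every stem is present and the full phrase appears contiguously. A real system would index the complete MeSH or GO vocabulary, including synonyms, rather than three hand-picked terms.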
Results and Conclusion: Results show that phrase indexing is effective for both GO and MeSH categorization, but the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to below 20% for GO, establishing a new baseline for categorizers based on retrieval methods.
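The abstract reports "precision at high ranks" without spelling out the measure. A minimal sketch, assuming precision at rank k (P@k) averaged over test documents. The function names and the toy category labels below are hypothetical and are not data from the paper.

```python
def precision_at_k(ranked, gold, k):
    # Share of the top-k proposed categories that appear in the gold set;
    # dividing by k (not by the list length) follows the usual P@k convention.
    return sum(1 for c in ranked[:k] if c in gold) / k


def mean_precision_at_k(runs, k):
    # runs: one (ranked_categories, gold_categories) pair per test document.
    return sum(precision_at_k(r, set(g), k) for r, g in runs) / len(runs)


# Toy, made-up labels for illustration only (not data from the paper).
runs = [
    (["cat_a", "cat_b", "cat_c"], {"cat_a"}),
    (["cat_d", "cat_e", "cat_f"], {"cat_e", "cat_f"}),
]
print(mean_precision_at_k(runs, 1))  # → 0.5
```

P@1 checks only whether the top-ranked category is correct, which is why it is the most demanding "high rank" figure for a categorizer.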
Contact: Patrick.Ruch{at}sim.hcuge.ch
Received on April 18, 2005; revised on November 11, 2005; accepted on November 13, 2005