Bioinformatics Advance Access published online on April 17, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp249
MeSH Up: Effective MeSH Text Classification for Improved Document Retrieval
1 European Bioinformatics Institute, Hinxton, United Kingdom
2 HMI, University of Twente, Enschede, The Netherlands
3 TNO ICT, Delft, The Netherlands
*To whom correspondence should be addressed. Mr. Dolf Trieschnigg, E-mail: trieschn{at}ewi.utwente.nl, dolf{at}trieschnigg.nl
Abstract
Motivation: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small subset of MeSH or have only been compared to a limited number of other systems.
Results: We compare the performance of 6 MeSH classification systems (MetaMap, EAGL, a language model-based approach, a vector space model-based approach, a K-Nearest Neighbor approach and MTI) in terms of reproducing and complementing manual MeSH annotations. A K-Nearest Neighbor system clearly outperforms the other published approaches and scales well to large amounts of text using the full MeSH thesaurus. Our measurements demonstrate to what extent manual MeSH annotations can be reproduced and how they can be complemented by automatic annotations. We also show that a statistically significant improvement in information retrieval (IR) can be obtained when the text of a user's query is automatically annotated with MeSH concepts, compared to using the original textual query alone.
Conclusions: The annotation of biomedical texts using controlled vocabularies such as MeSH can be automated to improve text-only IR. Furthermore, the automatic MeSH annotation system we propose is highly scalable and it generates improvements in IR comparable to those observed for manual annotations.
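The abstract does not describe how the K-Nearest Neighbor system works internally. As a generic illustration only, the sketch below shows the basic k-NN idea for MeSH assignment: find the k annotated documents most similar to a new text, then rank the MeSH terms of those neighbors by similarity-weighted votes. The bag-of-words cosine similarity, the voting scheme, the function names (`tokenize`, `cosine`, `knn_mesh`) and the toy corpus are all assumptions for illustration, not the authors' retrieval model or scoring.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical: lowercase whitespace tokenization with simple punctuation
    # stripping; a real system would add stemming and stopword removal.
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return [t for t in tokens if t]

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors (Counters).
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_mesh(query_text, annotated_docs, k=3, top_n=2):
    """Suggest MeSH terms for query_text.

    annotated_docs: list of (text, [mesh_terms]) pairs, i.e. a corpus that
    already carries (manual) MeSH annotations. Hypothetical interface.
    """
    q = Counter(tokenize(query_text))
    scored = [(cosine(q, Counter(tokenize(text))), terms)
              for text, terms in annotated_docs]
    # Keep the k most similar documents (ties keep corpus order).
    neighbors = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    # Each neighbor votes for its MeSH terms, weighted by its similarity.
    votes = Counter()
    for sim, terms in neighbors:
        for term in terms:
            votes[term] += sim
    return [term for term, score in votes.most_common(top_n) if score > 0]

# Toy annotated corpus (invented for illustration, not MEDLINE data).
corpus = [
    ("insulin resistance in type 2 diabetes patients",
     ["Diabetes Mellitus, Type 2", "Insulin Resistance"]),
    ("blood glucose control with insulin therapy",
     ["Blood Glucose", "Insulin"]),
    ("tumor suppressor gene mutations in breast cancer",
     ["Breast Neoplasms", "Genes, Tumor Suppressor"]),
    ("insulin signalling and glucose uptake in diabetes",
     ["Diabetes Mellitus, Type 2", "Insulin"]),
]
query = "insulin treatment for diabetes"
print(knn_mesh(query, corpus))  # -> ['Diabetes Mellitus, Type 2', 'Insulin']
```

Because the neighbors' existing annotations do the work, this style of classifier needs no per-concept model, which is one plausible reason a k-NN approach can cover the full MeSH thesaurus. The terms it returns could then be added to a textual query, in the spirit of the query annotation described in the abstract.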
Contact: trieschn{at}ewi.utwente.nl
Associate Editor: Dr. Limsoon Wong
Received on November 20, 2008; revised on April 2, 2009; accepted on April 7, 2009
This article has been cited by other articles:
A. Neveol, J. G. Mork, and A. R. Aronson. Comment on 'MeSH-up: effective MeSH text classification for improved document retrieval'. Bioinformatics, October 15, 2009; 25(20): 2770-2771.
