


Bioinformatics Advance Access published online on October 18, 2006

Bioinformatics, doi:10.1093/bioinformatics/btl534
All versions of this article: 22/24/3089 (most recent); btl534v1

© 2006 The Author(s)
Received July 1, 2006
Revised October 10, 2006
Accepted October 12, 2006

Article

Building an abbreviation dictionary using a term recognition approach

Naoaki Okazaki 1 * and Sophia Ananiadou 2

1 Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8651, Japan; Japan Society for the Promotion of Science (JSPS)
2 School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL; National Centre for Text Mining (NaCTeM), Manchester Interdisciplinary Biocentre, Oxford Road, Manchester, M13 9PL

* To whom correspondence should be addressed.
Naoaki Okazaki, E-mail: okazaki{at}mi.ci.i.u-tokyo.ac.jp


Abstract

Motivation: Acronyms result from a highly productive type of term variation and trigger the need for an acronym dictionary to establish associations between acronyms and their expanded forms.

Results: We propose a novel method for recognizing acronym definitions in a text collection. Assuming that a word sequence co-occurring frequently with a parenthetical expression is a potential expanded form, our method identifies acronym definitions in a manner similar to the statistical term-recognition task. Applied to the whole of MEDLINE (7,811,582 abstracts), the implemented system extracted 886,755 acronym candidates and recognized 300,954 expanded forms in reasonable time. Our method outperformed baseline systems, achieving 99% precision and 82-95% recall on our evaluation corpus, which roughly emulates the whole of MEDLINE.
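The core assumption in the Results paragraph, that an expanded form is a word sequence appearing frequently just before a parenthetical expression, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regular expression, the function names `candidate_counts` and `frequent_candidates`, the `max_words` window, and the `min_count` threshold are all hypothetical choices made for illustration, and the sketch only counts raw co-occurrences rather than applying any term-recognition scoring.

```python
import re
from collections import Counter

# Assumed pattern (not from the paper): a short parenthetical that starts
# with an uppercase letter, e.g. "(HRT)".
PAREN = re.compile(r"\(([A-Z][A-Za-z0-9-]{1,9})\)")


def candidate_counts(sentences, max_words=5):
    """Count word sequences that immediately precede each parenthetical.

    Returns a dict mapping each parenthetical string to a Counter of
    lowercased candidate word sequences (1 to max_words words long).
    """
    counts = {}
    for sentence in sentences:
        for match in PAREN.finditer(sentence):
            acronym = match.group(1)
            words = sentence[:match.start()].split()
            per_acronym = counts.setdefault(acronym, Counter())
            # Every suffix of the preceding context is a candidate.
            for n in range(1, min(max_words, len(words)) + 1):
                per_acronym[" ".join(words[-n:]).lower()] += 1
    return counts


def frequent_candidates(counts, min_count=2):
    """Keep only candidates seen at least min_count times (assumed cutoff)."""
    return {
        acronym: sorted(c for c, n in counter.items() if n >= min_count)
        for acronym, counter in counts.items()
    }
```

Note that raw counting keeps nested candidates: a short suffix such as "therapy" is seen at least as often as "hormone replacement therapy", so a usable system needs an additional scoring step to choose among them.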

Availability and Supplementary Information: The implementations and supplementary information are available at our web site: http://www.chokkan.org/research/acromine/.


Associate Editor: Golan Yona


