Bioinformatics Advance Access published online on October 25, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti733
1 Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19104
* To whom correspondence should be addressed.
Motivation: Many entity taggers and information extraction systems make use of lists of terms for entities such as people, places, genes or chemicals. These lists have traditionally been constructed manually. We show that distributional clustering methods, which group words based on the contexts in which they appear (including neighboring words and syntactic relations extracted using a shallow parser), can be used to aid in the construction of term lists.

Results: Experiments on learning lists of terms and using them as part of a gene tagger on a corpus of abstracts from the scientific literature show that our automatically generated term lists significantly boost the precision of a state-of-the-art CRF-based gene tagger to a degree that is competitive with hand-curated lists, and boost its recall to a degree that surpasses that of the hand-curated lists. Our results also show that these distributional clustering methods do not generate lists as helpful as those generated by supervised techniques, but that they can complement supervised techniques to obtain better performance.

Availability: The code used in this paper is available from http://www.cis.upenn.edu/datamining/software_dist/autoterm/.
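The abstract describes grouping words by the contexts in which they appear. The sketch below illustrates that general idea under loudly simplifying assumptions. Contexts are only the immediately neighboring words; the shallow-parser syntactic relations are omitted. Instead of clustering, candidate words are ranked by cosine similarity to the summed context vector of a few seed terms. The function names (`context_vectors`, `cosine`, `expand_term_list`) and the toy corpus are hypothetical illustrations, not the authors' released code or their exact algorithm.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=1):
    """Map each word to a Counter of position-tagged neighbor features.

    Hypothetical helper: only words within +/- `window` positions are used
    as context (no syntactic relations).
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    # Features look like "L1=the" (left neighbor) or "R1=gene".
                    side = "L" if offset < 0 else "R"
                    vectors[word][f"{side}{abs(offset)}={tokens[j]}"] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[f] for f, count in a.items() if f in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_term_list(sentences, seeds, top_k=5, window=1):
    """Rank non-seed words by similarity to the seeds' combined contexts."""
    vectors = context_vectors(sentences, window)
    centroid = Counter()
    for seed in seeds:
        centroid.update(vectors.get(seed, Counter()))
    candidates = [w for w in vectors if w not in seeds]
    ranked = sorted(candidates, key=lambda w: cosine(vectors[w], centroid), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    corpus = [
        "the BRCA1 gene is expressed".split(),
        "the TP53 gene is mutated".split(),
        "the p21 gene is expressed".split(),
        "the patient is treated".split(),
    ]
    print(expand_term_list(corpus, {"BRCA1", "TP53"}, top_k=2))
    # prints ['p21', 'patient']
```

On this toy corpus "p21" shares both neighbors ("the" and "gene") with the seeds and ranks first. "patient" ranks second only because it shares the left neighbor "the". This shows why richer contexts, such as the syntactic relations the abstract mentions, are likely to help separate true entity terms from incidental matches.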
Received April 29, 2005
Revised October 20, 2005
Accepted October 20, 2005
Automatic term list generation for entity tagging
Ted Sandler, E-mail: tsandler{at}seas.upenn.edu
