Bioinformatics Vol. 19 Suppl. 1 2003
Pages i331-i339
© 2003 Oxford University Press
Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA
Received on January 6, 2003; accepted on February 20, 2003
Motivation: The biological literature is a major repository of knowledge. Many biological databases draw much of their content from a careful curation of this literature. However, as the volume of literature increases, the burden of curation increases. Text mining may provide useful tools to assist in the curation process. To date, the lack of standards has made it impossible to determine whether text mining techniques are sufficiently mature to be useful.
Results: We report on a Challenge Evaluation task that we created for the Knowledge Discovery and Data Mining (KDD) Challenge Cup. We provided a training corpus of 862 journal articles curated in FlyBase, together with the associated lists of genes and gene products and the relevant data fields from FlyBase. For the test, we provided a corpus of 213 new (blind) articles; the 18 participating groups submitted systems that flagged articles for curation based on whether each article contained experimental evidence for gene expression products. We report on the evaluation results and describe the techniques used by the top-performing groups.
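To make the flagging task concrete, the sketch below scores a system's yes/no curation decisions against curator judgments using precision, recall, and the balanced F-measure. This metric choice is an illustrative assumption, not a statement of the official Challenge Cup scoring procedure; the function name `score_curation_flags` and the article IDs are hypothetical.

```python
from typing import Dict


def score_curation_flags(predicted: Dict[str, bool], gold: Dict[str, bool]) -> Dict[str, float]:
    """Score yes/no curation flags against curator judgments (illustrative only).

    Both arguments map a hypothetical article ID to True if the article should
    be flagged for curation. Only articles present in `gold` are scored; an
    article missing from `predicted` counts as not flagged.
    """
    tp = fp = fn = 0
    for article_id, should_flag in gold.items():
        flagged = predicted.get(article_id, False)
        if flagged and should_flag:
            tp += 1  # correctly flagged for curation
        elif flagged and not should_flag:
            fp += 1  # flagged, but curators did not select it
        elif should_flag and not flagged:
            fn += 1  # missed an article curators selected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    gold = {"a1": True, "a2": True, "a3": False, "a4": False}
    predicted = {"a1": True, "a2": False, "a3": True, "a4": False}
    print(score_curation_flags(predicted, gold))
    # -> {'precision': 0.5, 'recall': 0.5, 'f_measure': 0.5}
```

The harmonic mean penalizes a system that trades one measure away entirely: flagging every article yields perfect recall but low precision on a corpus where only some articles merit curation.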
Contact: asy{at}mitre.org
Keywords: text mining, evaluation, curation, genomics, data management