

Bioinformatics Advance Access originally published online on August 20, 2008
Bioinformatics 2008 24(18):2086-2093; doi:10.1093/bioinformatics/btn381

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

Hagit Shatkay 1,*, Fengxia Pan 1, Andrey Rzhetsky 2,3 and W. John Wilbur 4

1 The Computational Biology and Machine Learning Lab, School of Computing, Queen's University, Kingston, Ontario, Canada, 2 Department of Medicine, 3 Department of Human Genetics, Computation Institute, and Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL and 4 National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA

*To whom correspondence should be addressed.


Abstract

Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective if they can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements and intended to support specific biomedical retrieval and extraction tasks.

Results: The annotation scheme was applied to a large corpus in a controlled effort by eight annotators, with each sentence tagged independently by three of them. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The results strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice.

Contact: shatkay{at}cs.queensu.ca

Associate Editor: Thomas Lengauer


Received on May 25, 2008; revised on July 17, 2008; accepted on July 19, 2008



This article has been cited by other articles:


S. Agarwal and H. Yu
Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion
Bioinformatics, December 1, 2009; 25(23): 3174 - 3180.


