Skip Navigation


Bioinformatics Advance Access originally published online on September 25, 2009
Bioinformatics 2009 25(23):3174-3180; doi:10.1093/bioinformatics/btp548
This Article
Right arrow Full Text
Right arrow Full Text (Print PDF)
Right arrow All Versions of this Article:
25/23/3174    most recent
btp548v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Agarwal, S.
Right arrow Articles by Yu, H.
PubMed
Right arrow PubMed Citation
Right arrow Articles by Agarwal, S.
Right arrow Articles by Yu, H.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

Shashank Agarwal and Hong Yu *

University of Wisconsin, Milwaukee, Milwaukee WI 53211, USA

* To whom correspondence should be addressed.


   Abstract

Biomedical texts can be typically represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at—http://wood.ims.uwm.edu/full_text_classifier/.

Contact: hongyu{at}uwm.edu

Associate Editor: Limsoon Wong


Received on May 13, 2009; revised on September 11, 2009; accepted on September 14, 2009

Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer: Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.