Bioinformatics Advance Access originally published online on July 1, 2004
Bioinformatics 2004 20(17):3206-3213; doi:10.1093/bioinformatics/bth386
Bioinformatics vol. 20 issue 17 © Oxford University Press 2004; all rights reserved.

BioRAT: extracting biological information from full-length papers

David P. A. Corney, Bernard F. Buxton, William B. Langdon and David T. Jones*

Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK

Received on December 19, 2003; revised on June 4, 2004; accepted on June 25, 2004
Advance Access Publication July 1, 2004

Motivation: Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). The full-text papers contain more information, but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult.

In this paper, we present BioRAT, a new information extraction (IE) tool designed specifically for biomedical IE, which can locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and combines a document search ability with domain-specific IE.

Results: We show, first, that BioRAT performs as well as existing systems when applied to abstracts and, second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.
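As a reading aid for the figures above, the following minimal sketch shows how recall and precision are conventionally defined for an extraction task, and how the two can be combined into a single balanced F-measure. It assumes the standard definitions (recall = correct facts extracted / target facts; precision = correct facts extracted / all facts extracted). The function names are illustrative and are not part of BioRAT, and the F-measure values it computes are derived here from the reported percentages; they are not figures reported in the abstract.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of extracted facts that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of target facts that were extracted."""
    return true_pos / (true_pos + false_neg)


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Precision and recall as reported in the Results paragraph.
reported = {
    "abstracts": {"precision": 0.5507, "recall": 0.2031},
    "full-length papers": {"precision": 0.5125, "recall": 0.436},
}

# Derived illustration only: the abstract does not report an F-measure.
derived_f = {
    name: f_measure(v["precision"], v["recall"]) for name, v in reported.items()
}

if __name__ == "__main__":
    for name, value in derived_f.items():
        print(f"{name}: F-measure = {value:.3f}")
```

Under these definitions, the full-length papers more than double recall for a small loss of precision, which is why the combined score rises.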

Availability: The software and documentation can be found at http://bioinf.cs.ucl.ac.uk/biorat

Contact: d.corney{at}cs.ucl.ac.uk; dtj{at}cs.ucl.ac.uk

* To whom correspondence should be addressed.


This article has been cited by other articles:


K. W. Kohn, M. I. Aladjem, J. N. Weinstein, and Y. Pommier
Molecular Interaction Maps of Bioregulatory Networks: A General Rubric for Systems Biology
Mol. Biol. Cell, January 1, 2006; 17(1): 1 - 13.


