Bioinformatics Advance Access published online on December 9, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn631
Evaluating Contributions of Natural Language Parsers to Protein-Protein Interaction Extraction
1Department of Computer Science, University of Tokyo, Japan
2Institute for Creative Technologies, University of Southern California, USA
3School of Computer Science, University of Manchester, UK
4National Center for Text Mining, UK
*To whom correspondence should be addressed. Dr. Yusuke Miyao, E-mail: yusuke{at}is.s.u-tokyo.ac.jp
| Abstract |
|---|
Motivation: While text mining technologies for biomedical research have gained popularity as a way to take advantage of the explosive growth of information in text form in biomedical papers, selecting appropriate natural language processing (NLP) tools is still difficult for researchers who are not familiar with recent advances in NLP. This paper provides a comparative evaluation of several state-of-the-art natural language parsers, focusing on the task of extracting protein-protein interaction (PPI) from biomedical papers. We measure how each parser, and its output representation, contributes to accuracy improvement when the parser is used as a component in a PPI system.
Results: All the parsers attained improvements in accuracy of PPI extraction. The levels of accuracy obtained with these different parsers vary slightly, while differences in parsing speed are larger. The best accuracy in this work was obtained when we combined the Enju parser (27) and the reranking parser (6), and this accuracy is better than the state-of-the-art results on the same data.
Availability: The PPI extraction system used in this work (AkanePPI) is available online at http://www-tsujii.is.s.u-tokyo.ac.jp/downloads/downloads.cgi. The evaluated parsers are also available online from each developer's site.
Contact: yusuke{at}is.s.u-tokyo.ac.jp
Associate Editor: Dr. Jonathan Wren
Received on September 18, 2008; revised on November 9, 2008; accepted on December 3, 2008