Bioinformatics Advance Access originally published online on April 15, 2009
Bioinformatics 2009 25(12):1536-1542; doi:10.1093/bioinformatics/btp245
Bayesian inference of protein–protein interactions from biological literature


1Department of Statistics, Harvard University, Cambridge, MA 02138, 2Marshfield Clinic-Marshfield Center, MCRF-BIRC, 1000 North Oak Avenue, Marshfield, WI 54449 and 3Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Protein–protein interaction (PPI) extraction from published biological articles has attracted much attention because of the importance of protein interactions in biological processes. Despite significant progress, mining PPIs from the literature still relies heavily on time- and resource-consuming manual annotation.
Results: In this study, we developed a novel methodology based on Bayesian networks (BNs) for extracting PPI triplets (a PPI triplet consists of two protein names and the corresponding interaction word) from unstructured text. The method achieved an overall accuracy of 87% in a cross-validation test on a manually annotated dataset. By extracting PPI triplets from a large number of PubMed abstracts, we also showed that our method can complement human annotation and extract a large number of new PPIs from the literature.
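The abstract does not specify the network structure or the features used. Purely as an illustration of the general idea, the sketch below scores candidate PPI triplets with naive Bayes, the simplest Bayesian network (a single class node whose binary feature children are conditionally independent given the class). It is not the authors' model: the three features, the distance threshold, the protein and interaction-word dictionaries, and the toy training labels are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from math import log


@dataclass(frozen=True)
class PPITriplet:
    """A PPI triplet: two protein names plus the interaction word linking them."""
    protein_a: str
    protein_b: str
    interaction_word: str


def candidate_triplets(tokens, proteins, interaction_words):
    """Enumerate every (protein, protein, interaction word) candidate in one sentence.

    Each candidate gets a vector of HYPOTHETICAL binary features (0/1).
    """
    prot_idx = [i for i, t in enumerate(tokens) if t in proteins]
    word_idx = [i for i, t in enumerate(tokens) if t.lower() in interaction_words]
    out = []
    for a, b in combinations(prot_idx, 2):  # a < b because prot_idx is sorted
        for w in word_idx:
            feats = (
                int(a < w < b),                      # word lies between the proteins
                int(b - a <= 6),                     # proteins are close (made-up threshold)
                int("not" not in tokens[a:b + 1]),   # no negation inside the span
            )
            out.append((PPITriplet(tokens[a], tokens[b], tokens[w]), feats))
    return out


class NaiveBayes:
    """Bernoulli naive Bayes: class node -> conditionally independent binary features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing pseudo-count

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = log(len(rows) / len(y))
            self.log_lik[c] = []
            for j in range(len(X[0])):
                ones = sum(r[j] for r in rows)
                p1 = (ones + self.alpha) / (len(rows) + 2 * self.alpha)
                # Store (log P(x_j = 0 | c), log P(x_j = 1 | c)), indexed by feature value.
                self.log_lik[c].append((log(1 - p1), log(p1)))
        return self

    def predict(self, x):
        """Return the class with the highest log posterior (up to a constant)."""
        def score(c):
            return self.log_prior[c] + sum(self.log_lik[c][j][v] for j, v in enumerate(x))
        return max(self.classes, key=score)


# Toy, made-up training data: 1 = true interaction, 0 = not an interaction.
X_train = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (0, 1, 1),
           (0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 1, 0)]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
model = NaiveBayes().fit(X_train, y_train)

tokens = "RAD51 interacts with BRCA2 in vivo".split()
cands = candidate_triplets(tokens, {"RAD51", "BRCA2"}, {"interacts", "binds"})
for triplet, feats in cands:
    print(triplet, feats, "->", model.predict(feats))
```

Naive Bayes is chosen here only because it is the smallest Bayesian network that fits in a few lines; a richer network would add edges between features to model their dependencies.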
Availability: Programs/scripts we developed/used in the study are available at http://stat.fsu.edu/~jinfeng/datasets/Bio-SI-programs-Bayesian-chowdhary-zhang-liu.zip
Contact: jliu{at}stat.harvard.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.
Associate Editor: Jonathan Wren
Received on December 24, 2008; revised on March 30, 2009; accepted on April 5, 2009