Bioinformatics Vol. 19 no. 16 2003
pages 2046-2053
© 2003 Oxford University Press

Extraction of protein interaction information from unstructured text using a context-free grammar

Joshua M. Temkin and Mark R. Gilder*

GE Global Research, 1 Research Circle, Niskayuna, NY 12309, USA

Received on February 18, 2003; revised on April 14, 2003; accepted on April 26, 2003

Motivation: As research into disease pathology and cellular function continues to generate vast amounts of data on protein, gene and small molecule (PGSM) interactions, there is a critical need to capture these results in structured formats that allow computational analysis. Although many efforts have been made to create databases that store this information in computer-readable form, populating these sources still relies largely on manually interpreting the biological research literature and extracting interaction relationships from it. Efficiently and accurately automating the extraction of interactions from unstructured text would greatly improve the content of these databases and provide a way to keep pace with the continued growth of the published literature.

Results: In this paper, we describe a system for extracting PGSM interactions from unstructured text. Using a lexical analyzer and a context-free grammar (CFG), we demonstrate that efficient parsers can be constructed to extract these relationships from natural language with high recall and precision. Our system achieved a recall of 83.5% and a precision of 93.1% for recognizing PGSM names, and a recall of 63.9% and a precision of 70.2% for extracting interactions between these entities. In contrast to other published techniques, using a CFG significantly reduces the complexity of natural language processing by focusing on domain-specific sentence structure rather than analyzing the semantics of the language. Additionally, our approach provides a level of abstraction for adding new rules to extract other types of biological relationships beyond PGSM interactions.
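The abstract names the two stages of the system, a lexical analyzer that labels tokens and a CFG whose rules match interaction patterns, but it does not give the grammar itself. The Python sketch below illustrates the general idea only, under stated assumptions: the entity lexicon, verb list, token types (PGSM, VERB, PREP, AND, COMMA, OTHER) and the single INTERACTION rule are hypothetical toy choices, not the authors' actual rules. It also shows how set-based precision and recall, the metrics reported in the Results, can be computed over extracted (agent, verb, target) triples.

```python
import re
from itertools import product

# Hypothetical toy lexicon and verb list; NOT the authors' actual resources.
ENTITIES = ["Raf-1", "MEK1", "ERK1", "ERK2", "p53", "MDM2", "calmodulin", "calcium"]
ENTITY_LOOKUP = {name.lower(): name for name in ENTITIES}
VERBS = {"binds", "activates", "inhibits", "phosphorylates", "interacts"}
PREPS = {"with", "to"}
TOKEN_RE = re.compile(r"[A-Za-z0-9-]+|[,.;]")


def lex(sentence):
    """Lexical analysis: map each word or punctuation mark to a (type, text) token."""
    tokens = []
    for text in TOKEN_RE.findall(sentence):
        low = text.lower()
        if low in ENTITY_LOOKUP:
            tokens.append(("PGSM", ENTITY_LOOKUP[low]))  # canonical entity name
        elif low in VERBS:
            tokens.append(("VERB", low))
        elif low in PREPS:
            tokens.append(("PREP", low))
        elif low == "and":
            tokens.append(("AND", low))
        elif text == ",":
            tokens.append(("COMMA", text))
        else:
            tokens.append(("OTHER", text))
    return tokens


class Parser:
    """Recursive-descent parser for a hypothetical toy grammar (EBNF):

        INTERACTION -> ENTITIES VERB [PREP] ENTITIES
        ENTITIES    -> PGSM { SEP { SEP } PGSM }
        SEP         -> COMMA | AND
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the type of the current token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def entities(self):
        """Parse a coordinated list of entity names; return None if none starts here."""
        if self.peek() != "PGSM":
            return None
        names = [self.tokens[self.pos][1]]
        self.pos += 1
        while True:
            save = self.pos
            while self.peek() in ("COMMA", "AND"):
                self.pos += 1
            if self.pos > save and self.peek() == "PGSM":
                names.append(self.tokens[self.pos][1])
                self.pos += 1
            else:
                self.pos = save  # un-consume separators not followed by a name
                return names

    def interaction(self):
        """Try the INTERACTION rule at the current position; backtrack on failure."""
        start = self.pos
        left = self.entities()
        if left and self.peek() == "VERB":
            verb = self.tokens[self.pos][1]
            self.pos += 1
            if self.peek() == "PREP":
                self.pos += 1
            right = self.entities()
            if right:
                # One (agent, verb, target) triple per pair of coordinated names.
                return [(a, verb, b) for a, b in product(left, right)]
        self.pos = start
        return None


def extract(text):
    """Scan the token stream, applying the INTERACTION rule wherever it matches."""
    parser = Parser(lex(text))
    found = []
    while parser.pos < len(parser.tokens):
        triples = parser.interaction()
        if triples:
            found.extend(triples)
        else:
            parser.pos += 1  # skip a token that cannot start an interaction
    return found


def precision_recall(predicted, gold):
    """Set-based precision (correct / extracted) and recall (correct / expected)."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "p53 binds MDM2. MEK1 phosphorylates ERK1 and ERK2. ERK2 is activated by MEK1."
    gold = {("p53", "binds", "MDM2"), ("MEK1", "phosphorylates", "ERK1"),
            ("MEK1", "phosphorylates", "ERK2"), ("MEK1", "activates", "ERK2")}
    triples = extract(text)
    print(triples)
    print(precision_recall(triples, gold))
```

On this input the sketch returns three triples: the coordinated object "ERK1 and ERK2" expands into two interactions, while the passive sentence "ERK2 is activated by MEK1" is missed because the toy grammar has no passive rule. Against the four-triple gold set this gives a precision of 1.0 and a recall of 0.75, illustrating how missing grammar rules tend to lower recall rather than precision.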

Availability: The program and corpus are available by request from the authors.

Contact: gilder{at}research.ge.com; jtemkin1{at}comcast.net

* To whom correspondence should be addressed.
