Bioinformatics Advance Access originally published online on January 22, 2004

Bioinformatics 20(5) © Oxford University Press 2004; all rights reserved.

Extracting human protein interactions from MEDLINE using a full-sentence parser

Nikolai Daraselia*, Anton Yuryev, Sergei Egorov, Svetlana Novichkova, Alexander Nikitin and Ilya Mazo

Ariadne Genomics, Inc., 9700 Great Seneca Hwy, Rockville, MD 20850, USA

Received on July 22, 2003; revised on September 29, 2003; accepted on October 3, 2003
Advance Access Publication January 22, 2004

Motivation: The living cell is a complex machine that depends on the proper functioning of its numerous parts, including proteins. Understanding protein functions and how they modify and regulate each other is the next great challenge for life-sciences researchers. The collective knowledge about protein functions and pathways is scattered throughout numerous publications in scientific journals. Bringing the relevant information together has become a bottleneck in the research and discovery process. The volume of such information grows exponentially, which renders manual curation impractical. As a viable alternative, automated literature processing tools could be employed to extract and organize biological data into a knowledge base, making it amenable to computational analysis and data mining.

Results: We present MedScan, a completely automated natural language processing-based information extraction system. We have used MedScan to extract 2976 interactions between human proteins from MEDLINE abstracts dated after 1988. The precision of the extracted information was found to be 91%. Comparison with the existing protein interaction databases BIND and DIP revealed that 96% of the extracted information is novel. The recall rate of MedScan was found to be 21%. Additional experiments with MedScan suggest that MEDLINE is a unique source of diverse protein function information, which can be extracted in a completely automated way with reasonably high precision. Directions for further improvement of the MedScan technology are discussed.
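
For readers less familiar with the evaluation terms used above, the following is a minimal sketch of how precision, recall and the fraction of novel interactions are conventionally computed from sets of protein pairs. It is not the authors' evaluation procedure, which the abstract does not describe. The function names and the toy protein identifiers (P1 to P8) are hypothetical and chosen only for illustration; they are not data from the study.

```python
def pair(a, b):
    """Represent an undirected protein-protein interaction as an unordered pair."""
    return frozenset((a, b))


def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted interactions against a gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def novel_fraction(extracted, known):
    """Return the fraction of extracted interactions absent from a known database."""
    if not extracted:
        return 0.0
    return len(extracted - known) / len(extracted)


# Hypothetical toy data, NOT taken from the paper.
extracted = {pair("P1", "P2"), pair("P2", "P3"), pair("P3", "P4"), pair("P4", "P5")}
gold = {pair("P1", "P2"), pair("P2", "P3"), pair("P3", "P4"),
        pair("P5", "P6"), pair("P6", "P7"), pair("P7", "P8")}
known_database = {pair("P1", "P2")}

precision, recall = precision_recall(extracted, gold)
novelty = novel_fraction(extracted, known_database)
print(f"precision={precision:.2f} recall={recall:.2f} novel={novelty:.2f}")
```

In this toy case three of the four extracted pairs are correct (precision 0.75), three of the six true pairs are recovered (recall 0.50), and three of the four extracted pairs are missing from the reference database (novel fraction 0.75). High precision combined with low recall, as reported for MedScan, means that most extracted statements are correct while many statements present in the text are missed.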

Availability: MedScan is available for commercial licensing from Ariadne Genomics, Inc.

Contact: nikolai{at}ariadnegenomics.com

* To whom correspondence should be addressed.






