Bioinformatics Advance Access published online on May 11, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm235
MutationFinder: A high-performance system for extracting point mutation mentions from text
a Department of Biochemistry and Molecular Genetics, b Center for Computational Pharmacology, University of Colorado Health Sciences Center, Aurora, CO, USA
c Motorola Mobile Devices, Libertyville, IL, USA
d Department of Computer Science, e Department of Linguistics, University of Colorado at Boulder, Boulder, CO, USA
*To whom correspondence should be addressed. Mr. J. Gregory Caporaso, E-mail: gregcaporaso{at}gmail.com
Abstract
Summary: Discussion of point mutations is ubiquitous in biomedical literature, and manually compiling databases of, or literature on, mutations in specific genes or proteins is tedious. We present an open-source, rule-based system, MutationFinder, for extracting point mutation mentions from text. On blind test data, it achieves nearly perfect precision and markedly higher recall than a baseline system.
Availability: MutationFinder, along with a high-quality gold standard data set and a scoring script for mutation extraction systems, has been made publicly available. Implementations, source code, and unit tests are available in Python, Perl, and Java. MutationFinder can be used as a stand-alone script or imported by other applications.
Project URL: http://bionlp.sourceforge.net
Associate Editor: Prof. Alfonso Valencia
Received on January 29, 2007; revised on April 18, 2007; accepted on April 26, 2007
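To make "rule-based extraction of point mutation mentions" and the precision/recall evaluation concrete, here is a minimal Python sketch. It is not MutationFinder's code or rule set. It assumes a single hypothetical pattern family: a wild-type residue, a sequence position, and a mutant residue, written with either one-letter codes (e.g. "A123G") or three-letter codes (e.g. "Ala123Gly"), and normalized to one-letter (wild type, position, mutant) tuples. The names `extract_mutations` and `precision_recall` are illustrative only.

```python
import re

# Hypothetical, simplified rules; NOT MutationFinder's actual regular expressions.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(code.capitalize() for code in THREE_TO_ONE)

# One-letter form such as "A123G": residue, position digits, residue, as a whole token.
ONE_LETTER_RE = re.compile(rf"\b([{ONE_LETTER}])(\d+)([{ONE_LETTER}])\b")
# Three-letter form such as "Ala123Gly" (capitalized codes only, by assumption).
THREE_LETTER_RE = re.compile(rf"\b({THREE})(\d+)({THREE})\b")


def extract_mutations(text):
    """Return a set of normalized (wild_type, position, mutant) tuples found in text."""
    found = set()
    for wt, pos, mut in ONE_LETTER_RE.findall(text):
        found.add((wt, int(pos), mut))
    for wt, pos, mut in THREE_LETTER_RE.findall(text):
        # Normalize three-letter codes to one-letter codes so both forms compare equal.
        found.add((THREE_TO_ONE[wt.upper()], int(pos), THREE_TO_ONE[mut.upper()]))
    return found


def precision_recall(predicted, gold):
    """Precision and recall of predicted mutations against a gold-standard set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "The A123G mutation, unlike Ala45Gly, abolished activity; E6V was not tested."
    predicted = extract_mutations(text)
    print(sorted(predicted))  # [('A', 45, 'G'), ('A', 123, 'G'), ('E', 6, 'V')]
    gold = {("A", 123, "G"), ("E", 6, "V"), ("K", 7, "R")}  # made-up example annotations
    print(precision_recall(predicted, gold))
```

Normalizing both notations to the same tuple is what lets "Ala45Gly" and "A45G" count as one mention when scoring against a gold standard. A pattern this permissive will also match some strings that are not mutations, which is one reason a practical rule set needs more than a single expression.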