Bioinformatics Advance Access originally published online on May 11, 2007
Bioinformatics 2007 23(14):1862-1865; doi:10.1093/bioinformatics/btm235
MutationFinder: a high-performance system for extracting point mutation mentions from text
¹Department of Biochemistry and Molecular Genetics, ²Center for Computational Pharmacology, University of Colorado Health Sciences Center, Aurora, CO, ³Motorola Mobile Devices, Libertyville, IL, ⁴Department of Computer Science and ⁵Department of Linguistics, University of Colorado at Boulder, Boulder, CO, USA
*To whom correspondence should be addressed.
Abstract
Summary: Discussion of point mutations is ubiquitous in biomedical literature, and manually compiling databases or literature on mutations in specific genes or proteins is tedious. We present an open-source, rule-based system, MutationFinder, for extracting point mutation mentions from text. On blind test data, it achieves nearly perfect precision and a markedly improved recall over a baseline.
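The abstract does not spell out MutationFinder's rules. As a rough illustration of what a rule-based extractor of this kind does, the Python sketch below uses two regular expressions to match substitutions written as wild-type residue, sequence position, mutant residue (e.g. A64G or Ala64Gly), then normalizes every match to one-letter amino acid codes. The function name `extract_point_mutations` and both patterns are hypothetical simplifications. They are not MutationFinder's actual rule set or API.

```python
import re

# Hypothetical, simplified rules (NOT MutationFinder's real patterns).
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}
THREE = "|".join(name.capitalize() for name in THREE_TO_ONE)

# One-letter form such as "A64G" (case-sensitive, to avoid matching ordinary words).
ONE_LETTER_RE = re.compile(rf"\b([{ONE_LETTER}])(\d+)([{ONE_LETTER}])\b")
# Three-letter form such as "Ala64Gly" (case-insensitive).
THREE_LETTER_RE = re.compile(rf"\b({THREE})(\d+)({THREE})\b", re.IGNORECASE)


def extract_point_mutations(text):
    """Return sorted (wild_type, position, mutant) tuples using one-letter codes."""
    found = set()
    for m in ONE_LETTER_RE.finditer(text):
        wt, pos, mt = m.groups()
        if wt != mt:  # "A64A" is not a substitution, so skip it
            found.add((wt, int(pos), mt))
    for m in THREE_LETTER_RE.finditer(text):
        wt = THREE_TO_ONE[m.group(1).upper()]
        mt = THREE_TO_ONE[m.group(3).upper()]
        if wt != mt:
            found.add((wt, int(m.group(2)), mt))
    return sorted(found)


print(extract_point_mutations("The A64G mutation and Ala12Gly, but not A64A."))
```

Normalizing both notations to the same tuple means that "A64G" and "Ala64Gly" in one document count as a single mutation. A real system needs many more rules, for example for phrases like "alanine 64 to glycine".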
Availability: MutationFinder, a high-quality gold-standard data set, and a scoring script for mutation extraction systems have been made publicly available. Implementations, source code and unit tests are available in Python, Perl and Java. MutationFinder can be run as a stand-alone script or imported by other applications.
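The abstract reports precision and recall on blind test data and mentions a scoring script, but it does not describe how that script counts matches. The following minimal Python sketch assumes that each document's extracted mutations and gold-standard mutations are compared as sets of normalized tuples, with counts micro-averaged over all documents. The function name `score` and its input format are hypothetical. This is not the distributed scoring script.

```python
def score(predicted, gold):
    """Micro-averaged precision, recall and F-measure over documents.

    predicted, gold: dict mapping a document id to a set of normalized
    mutations, e.g. {("A", 64, "G")}. (Hypothetical input format.)
    """
    tp = fp = fn = 0
    for doc_id in set(predicted) | set(gold):
        p = predicted.get(doc_id, set())
        g = gold.get(doc_id, set())
        tp += len(p & g)   # extracted and present in the gold standard
        fp += len(p - g)   # extracted but not in the gold standard
        fn += len(g - p)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


gold = {"doc1": {("A", 64, "G"), ("W", 3, "Y")}}
predicted = {"doc1": {("A", 64, "G")}}
print(score(predicted, gold))
```

In this example the single extracted mutation is correct, but one gold-standard mutation is missed. Precision is therefore 1.0 and recall is 0.5, which is the kind of trade-off the abstract's precision and recall figures summarize.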
Project URL: http://bionlp.sourceforge.net
Contact: gregcaporaso{at}gmail.com
Associate Editor: Alfonso Valencia
Received on January 29, 2007; revised on April 18, 2007; accepted on April 26, 2007