Vol. 20 no. 2 2004, pages 216-225
Bioinformatics © Oxford University Press 2004; all rights reserved.
GAPSCORE: finding gene and protein names one word at a time
1 Department of Genetics, Stanford Medical Center, 300 Pasteur Drive, Lane L 301, Mail Code 5120, Stanford, CA 94305-5120, USA and 2 Enkata Technologies, 2121 South El Camino Real, Suite 1200, San Mateo, CA 94403-1855, USA
Received on April 24, 2003; revised on July 3, 2003; accepted on July 18, 2003
Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much of this knowledge is still recorded as written natural language text. We have therefore developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that quantifies their appearance, morphology and context.
Results: We evaluated GAPSCORE against the Yapex data set and achieved an F-score of 82.5% (83.3% recall, 81.5% precision) for partial matches and 57.6% (58.5% recall, 56.7% precision) for exact matches. Since the method is statistical, users can choose score cutoffs that adjust the performance according to their needs.
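The F-score reported above is the balanced F-measure, the harmonic mean of precision P and recall R, i.e. F = 2PR / (P + R). The sketch below recomputes it from the published figures and, to illustrate the point about score cutoffs, computes precision and recall when every word scoring at or above a cutoff is tagged as a name. The `precision_recall_at_cutoff` helper and the toy (score, label) pairs are hypothetical illustrations, not GAPSCORE's actual scoring code or data.

```python
from typing import Iterable, List, Tuple


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_at_cutoff(
    scored: Iterable[Tuple[float, bool]], cutoff: float
) -> Tuple[float, float]:
    """Precision and recall when every word with score >= cutoff is tagged.

    `scored` is a hypothetical list of (score, is_true_name) pairs.
    """
    pairs: List[Tuple[float, bool]] = list(scored)
    tp = sum(1 for s, gold in pairs if s >= cutoff and gold)       # correctly tagged
    fp = sum(1 for s, gold in pairs if s >= cutoff and not gold)   # wrongly tagged
    fn = sum(1 for s, gold in pairs if s < cutoff and gold)        # missed names
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Exact-match figures from the abstract: P = 56.7%, R = 58.5%.
print(round(f_score(0.567, 0.585), 3))  # 0.576

# Hypothetical scored words: raising the cutoff trades recall for precision.
toy = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, precision_recall_at_cutoff(toy, cutoff))
```

Recomputing the partial-match score the same way gives about 0.824 rather than the reported 82.5%; the small gap presumably reflects rounding of the published precision and recall values.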
Availability: GAPSCORE is available at http://bionlp.stanford.edu/gapscore/
Contact: russ.altman{at}stanford.edu
* To whom correspondence should be addressed.