
Vol. 20 no. 2 2004, pages 216-225
Bioinformatics © Oxford University Press 2004; all rights reserved.

GAPSCORE: finding gene and protein names one word at a time

Jeffrey T. Chang 1, Hinrich Schütze 2 and Russ B. Altman 1,*

1 Department of Genetics, Stanford Medical Center, 300 Pasteur Drive, Lane L 301, Mail Code 5120, Stanford, CA 94305-5120, USA and 2 Enkata Technologies, 2121 South El Camino Real, Suite 1200, San Mateo, CA 94403-1855, USA

Received on April 24, 2003; revised on July 3, 2003; accepted on July 18, 2003

Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text. Therefore, we have developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that quantifies their appearance, morphology and context.

Results: We evaluated GAPSCORE against the Yapex data set and achieved an F-score of 82.5% (83.3% recall, 81.5% precision) for partial matches and 57.6% (58.5% recall, 56.7% precision) for exact matches. Since the method is statistical, users can choose score cutoffs that adjust the performance according to their needs.
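
To illustrate how a score cutoff trades precision against recall, the following is a minimal sketch, not GAPSCORE's implementation. The function name `precision_recall_f`, the word scores and the gold labels are all hypothetical; the sketch assumes only that each word receives a numeric score and that words scoring at or above the cutoff are reported as gene or protein names. The F-score is computed as the harmonic mean of precision and recall.

```python
from typing import List, Tuple


def precision_recall_f(
    scored: List[Tuple[float, bool]], cutoff: float
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for a given score cutoff.

    `scored` holds (score, is_gene_name) pairs; a word is predicted to be
    a name when its score is at or above `cutoff`. (Hypothetical helper.)
    """
    tp = sum(1 for s, gold in scored if s >= cutoff and gold)
    fp = sum(1 for s, gold in scored if s >= cutoff and not gold)
    fn = sum(1 for s, gold in scored if s < cutoff and gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f_score


# Made-up scores and labels, for illustration only.
scored_words = [
    (0.95, True),
    (0.80, True),
    (0.60, False),
    (0.55, True),
    (0.30, False),
    (0.10, True),
]

for cutoff in (0.25, 0.5, 0.9):
    p, r, f = precision_recall_f(scored_words, cutoff)
    print(f"cutoff={cutoff:.2f} precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this toy data, raising the cutoff from 0.5 to 0.9 lifts precision from 0.75 to 1.00 while recall falls from 0.75 to 0.25, which is the kind of adjustment a user could make to suit their needs.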

Availability: GAPSCORE is available at http://bionlp.stanford.edu/gapscore/

Contact: russ.altman{at}stanford.edu

* To whom correspondence should be addressed.




