
Bioinformatics 2008 24(16):i126-i132; doi:10.1093/bioinformatics/btn299

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Inter-species normalization of gene mentions with GNAT

Jörg Hakenberg 1,*, Conrad Plake 2,3, Robert Leaman 1, Michael Schroeder 2 and Graciela Gonzalez 4

1Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA, 2Biotechnological Centre, Technische Universität Dresden, Tatzberg 47–51, 01307 Dresden, 3Transinsight GmbH, Tatzberg 47–51, 01307 Dresden, Germany and 4Department of Biomedical Informatics, Arizona State University, Phoenix, AZ 85004, USA

*To whom correspondence should be addressed.


Abstract

Motivation: Text mining in the biomedical domain aims to help researchers access information contained in scientific publications faster, more easily and more completely. One step towards this aim is the recognition of named entities and their subsequent normalization to database identifiers. Normalization helps to link objects of potential interest, such as genes, to detailed information not contained in a publication; it is also key for integrating different knowledge sources. From an information retrieval perspective, normalization facilitates indexing and querying. Gene mention normalization (GN) is particularly challenging because gene names are highly ambiguous: the same name may refer to orthologous or entirely different genes, may be derived from a phenotype or another biomedical term, or may resemble a common English word.

Results: We present the first publicly available system, GNAT, reported to handle inter-species GN. Our method uses extensive background knowledge on genes to resolve ambiguous names to EntrezGene identifiers. It performs comparably to single-species approaches proposed by us and others. On a benchmark set derived from BioCreative 1 and 2 data that contains genes from 13 species, GNAT achieves an F-measure of 81.4% (90.8% precision at 73.8% recall). For the single-species task, we report an F-measure of 85.4% on human genes.
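The reported inter-species F-measure can be checked against the stated precision and recall. The sketch below assumes the standard balanced F-measure (harmonic mean of precision and recall, beta = 1); the abstract does not state the weighting explicitly, and the function name `f_measure` is illustrative, not part of GNAT.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision == 0.0 and recall == 0.0:
        return 0.0  # conventional value when nothing is correctly predicted
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Inter-species benchmark figures quoted in the abstract.
inter_species_f = f_measure(precision=0.908, recall=0.738)
print(f"{inter_species_f:.1%}")  # prints 81.4%
```

Under that assumption, 2 x 0.908 x 0.738 / (0.908 + 0.738) is approximately 0.814, which agrees with the 81.4% quoted above.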

Availability: A web-frontend is available at http://cbioc.eas.asu.edu/gnat/. GNAT will also be available within the BioCreative MetaService project, see http://bcms.bioinfo.cnio.es.

Contact: joerg.hakenberg{at}asu.edu

Supplementary information: The test data set, lexica, and links to external data are available at http://cbioc.eas.asu.edu/gnat/







