Bioinformatics Advance Access originally published online on January 22, 2004

Bioinformatics 20(4) © Oxford University Press 2004; all rights reserved.

Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors

Florence Horn 1,*, Anthony L. Lau 1 and Fred E. Cohen 1,2

1 Department of Cellular and Molecular Pharmacology and 2 Department of Biochemistry and Biophysics, University of California, San Francisco, Genentech Hall, Box 2240, 600 16th Street, San Francisco, CA 94143, USA

Received on August 8, 2003; accepted on September 29, 2003
Advance Access Publication January 22, 2004

Motivation: The amount of genomic and proteomic data published daily in the scientific literature is outstripping the ability of experimental scientists to stay current. Reviews, the traditional medium for collating published observations, are also unable to keep pace. For some specific classes of information (e.g. sequences and protein structures), obligatory data deposition policies have helped. However, a great deal of other valuable information is scattered throughout the literature, which hinders coherent access. We are involved in the Molecular Class-Specific Information System (MCSIS) project, a collaborative effort to design and automate the maintenance of protein family databases. The first two databases, the GPCRDB and NucleaRDB, focus on G protein-coupled receptors (GPCRs) and nuclear hormone receptors (NRs), respectively. The main aim of the MCSIS project is to gather heterogeneous data from a variety of electronic and literature sources in order to draw new inferences about the target protein families.

Results: We present a computational method that identifies and extracts mutation data from the scientific literature. We focused on the extraction of single point mutations for the GPCR and NR superfamilies. After validation by plausibility filters, the mutation data are integrated into the corresponding MCSIS, where they are combined with structural and sequence information already stored in these databases. We extracted and validated 2736 true point mutations from 914 articles on GPCRs and 785 true point mutations from 1094 articles on NRs. The current version of our automated extraction algorithm identifies 49.3% of the GPCR point mutations with a specificity of 87.9%, and 64.5% of the NR point mutations with a specificity of 85.8%. MuteXt routinely analyzes 100 electronic articles in approximately 1 h.
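
To make the idea of extraction followed by a plausibility filter concrete, here is a minimal Python sketch. It is not the MuteXt algorithm, whose details are not given in this abstract. The regular expression, the function names `find_point_mutations` and `is_plausible`, the choice of filter (checking that the reported wild-type residue matches a reference sequence) and the toy sequence are all assumptions made for illustration.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Hypothetical pattern: matches mentions such as "D113A" or "Ser204Ala".
MUTATION_RE = re.compile(
    rf"\b(?:(?P<wt1>[{ONE}])(?P<pos1>\d+)(?P<mut1>[{ONE}])"
    rf"|(?P<wt3>{THREE})(?P<pos3>\d+)(?P<mut3>{THREE}))\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    found = []
    for m in MUTATION_RE.finditer(text):
        if m.group("wt1"):
            wt, pos, mut = m.group("wt1"), m.group("pos1"), m.group("mut1")
        else:
            wt = THREE_TO_ONE[m.group("wt3")]
            pos = m.group("pos3")
            mut = THREE_TO_ONE[m.group("mut3")]
        found.append((wt, int(pos), mut))
    return found


def is_plausible(mutation, sequence):
    """Assumed filter: the wild-type residue must occur at that position."""
    wt, pos, mut = mutation
    return wt != mut and 1 <= pos <= len(sequence) and sequence[pos - 1] == wt


# Toy reference sequence (hypothetical): D at position 113, S at position 204.
residues = ["G"] * 250
residues[112] = "D"
residues[203] = "S"
toy_sequence = "".join(residues)

text = "The D113A and Ser204Ala mutants lost binding in T47D cells."
candidates = find_point_mutations(text)
validated = [m for m in candidates if is_plausible(m, toy_sequence)]
```

In this example the pattern also picks up the cell-line name "T47D", and the sequence check discards it because the toy sequence does not have a threonine at position 47. A pattern match on its own tends to over-generate in this way, which is one reason a validation step against known sequence data is useful.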

Availability: Extracted results are available via the GPCRDB and NucleaRDB at http://www.gpcr.org/7tm/mutation/ and http://www.receptors.org/NR/mutation/, respectively. The algorithm is available upon request.

Contact: horn{at}cmpharm.ucsf.edu

* To whom correspondence should be addressed.

