
Bioinformatics, Vol 14, 600-607, Copyright © 1998 by Oxford University Press


ARTICLES

Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families

MA Andrade and A Valencia
Protein Design Group, CNB-CSIC, Cantoblanco, E-28049 Madrid, Spain.

MOTIVATION: Annotation of the biological function of different protein sequences is a time-consuming process currently performed by human experts. Genome analysis tools encounter great difficulty in performing this task. Database curators, developers of genome analysis tools and biologists in general could benefit from access to tools able to suggest functional annotations and facilitate access to functional information.

APPROACH: We present here the first prototype of a system for the automatic annotation of protein function. The system is triggered by collections of abstracts related to a given protein, and it is able to extract biological information directly from scientific literature, i.e. MEDLINE abstracts. Relevant keywords are selected by their relative accumulation in comparison with a domain-specific background distribution. Simultaneously, the most representative sentences and MEDLINE abstracts are selected and presented to the end-user. Evolutionary information is considered as a predominant characteristic in the domain of protein function. Our system consequently extracts domain-specific information from the analysis of a set of protein families.

RESULTS: The system has been tested with different protein families, of which three examples are discussed in detail here: 'ataxia-telangiectasia associated protein', 'ran GTPase' and 'carbonic anhydrase'. We found generally good correlation between the amount of information provided to the system and the quality of the annotations. Finally, the current limitations and future developments of the system are discussed.

AVAILABILITY: The current system can be considered as a prototype system. As such, it can be accessed as a server at http://columba.ebi.ac.uk:8765/andrade/abx. The system accepts text related to the protein or proteins to be evaluated (optimally, the result of a MEDLINE search by keyword) and the results are returned in the form of Web pages for keywords, sentences and abstracts.

SUPPLEMENTARY INFORMATION: Web pages containing full information on the examples mentioned in the text are available at: http://www.cnb.uam.es/~cnbprot/keywords/

CONTACT: valencia@cnb.uam.es
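
The keyword selection step described under APPROACH (relative accumulation of a word compared with a background distribution) can be illustrated with a small sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: each word is scored by a z-score of the fraction of abstracts containing it in the target family against the same fraction in a set of background families, and the function names, the tokenizer, and the `floor` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def doc_frequency(abstracts):
    """Fraction of abstracts in which each word appears at least once."""
    counts = Counter()
    for text in abstracts:
        # Assumption: a crude lowercase alphanumeric tokenizer is good enough here.
        counts.update(set(re.findall(r"[a-z0-9]+", text.lower())))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()}


def keyword_scores(target, background_families, floor=0.05):
    """Score each word of the target family against the background families.

    The score is (target frequency - background mean) / background std.
    `floor` is a hypothetical lower bound on the std, added only to avoid
    dividing by zero for words with no spread across the background.
    """
    target_freq = doc_frequency(target)
    bg_freqs = [doc_frequency(family) for family in background_families]
    scores = {}
    for word, freq in target_freq.items():
        values = [bg.get(word, 0.0) for bg in bg_freqs]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        scores[word] = (freq - mean) / max(std, floor)
    return scores


def top_keywords(target, background_families, k=5):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = keyword_scores(target, background_families)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    # Toy stand-ins for collections of MEDLINE abstracts (invented text).
    target = [
        "carbonic anhydrase catalyses the hydration of carbon dioxide",
        "the protein carbonic anhydrase binds zinc",
    ]
    background = [
        ["the protein binds GTP", "the ran protein is a GTPase"],
        ["the kinase protein phosphorylates the substrate",
         "the protein is mutated"],
    ]
    print(top_keywords(target, background, k=2))
```

In this toy setting, words that occur in every family (such as "the") score zero, while words concentrated in the target family rise to the top. A real system would need a much larger background of protein families for the mean and spread to be meaningful.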


This article has been cited by other articles:


P. Minguez, F. Al-Shahrour, D. Montaner, and J. Dopazo
Functional profiling of microarray experiments using text-mining derived bioentities
Bioinformatics, November 15, 2007; 23(22): 3098 - 3099.

F. Al-Shahrour, P. Minguez, J. Tarraga, I. Medina, E. Alloza, D. Montaner, and J. Dopazo
FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W91 - W96.

Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier
Information theory applied to the sparse gene ontology annotation network to predict novel gene function
Bioinformatics, July 1, 2007; 23(13): i529 - i538.

H. Liu, Z.-Z. Hu, M. Torii, C. Wu, and C. Friedman
Quantitative Assessment of Dictionary-based Protein Named Entity Tagging
J. Am. Med. Inform. Assoc., September 1, 2006; 13(5): 497 - 507.

B. Han, Z. Obradovic, Z.-Z. Hu, C. H. Wu, and S. Vucetic
Substring selection for biomedical document classification
Bioinformatics, September 1, 2006; 22(17): 2136 - 2142.

F. Al-Shahrour, P. Minguez, J. Tarraga, D. Montaner, E. Alloza, J. M. Vaquerizas, L. Conde, C. Blaschke, J. Vera, and J. Dopazo
BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W472 - W476.

V. B. Bajic, M. Veronika, P. S. Veladandi, A. Meka, M.-W. Heng, K. Rajaraman, H. Pan, and S. Swarup
Dragon Plant Biology Explorer. A Text-Mining Tool for Integrating Associations between Genetic and Biochemical Entities with Genome Annotation and Biochemical Terms Lists
Plant Physiology, August 1, 2005; 138(4): 1914 - 1925.

C. Diez-Tascon, O. M. Keane, T. Wilson, A. Zadissa, D. L. Hyndman, D. B. Baird, J. C. McEwan, and A. M. Crawford
Microarray analysis of selection lines from outbred populations to identify genes involved with nematode parasite resistance in sheep
Physiol Genomics, March 21, 2005; 21(1): 59 - 69.

H. Yu, G. Hripcsak, and C. Friedman
Mapping Abbreviations to Full Forms in Biomedical Articles
J. Am. Med. Inform. Assoc., May 1, 2002; 9(3): 262 - 272.


