Bioinformatics, Vol 14, 600-607, Copyright © 1998 by Oxford University Press
MA Andrade and A Valencia
MOTIVATION: Annotation of the biological function of different protein
sequences is a time-consuming process currently performed by human experts.
Genome analysis tools encounter great difficulty in performing this task.
Database curators, developers of genome analysis tools and biologists in
general could benefit from access to tools able to suggest functional
annotations and facilitate access to functional information. APPROACH: We
present here the first prototype of a system for the automatic annotation
of protein function. The system is triggered by collections of abstracts related to
a given protein, and it is able to extract biological information directly
from scientific literature, i.e. MEDLINE abstracts. Relevant keywords are
selected by their relative accumulation in comparison with a
domain-specific background distribution. Simultaneously, the most
representative sentences and MEDLINE abstracts are selected and presented
to the end-user. Evolutionary information is considered as a predominant
characteristic in the domain of protein function. Our system consequently
extracts domain-specific information from the analysis of a set of protein
families. RESULTS: The system has been tested with different protein
families, of which three examples are discussed in detail here:
'ataxia-telangiectasia associated protein', 'ran GTPase' and 'carbonic
anhydrase'. We found generally good correlation between the amount of
information provided to the system and the quality of the annotations.
Finally, the current limitations and future developments of the system are
discussed. AVAILABILITY: The current system can be considered as a
prototype system. As such, it can be accessed as a server at
http://columba.ebi.ac.uk:8765/andrade/abx. The system accepts text related
to the protein or proteins to be evaluated (optimally, the result of a
MEDLINE search by keyword) and the results are returned in the form of Web
pages for keywords, sentences and abstracts. SUPPLEMENTARY INFORMATION: Web pages
containing full information on the examples mentioned in the text are
available at: http://www.cnb.uam.es/~cnbprot/keywords/
CONTACT: valencia@cnb.uam.es
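
The keyword and sentence selection described under APPROACH can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the code assumes that "relative accumulation" can be approximated by a z-score of a word's frequency in the target family against its frequencies across a set of background protein families. The function names, the 0.01 floor on the standard deviation, and the toy abstracts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z][a-z0-9-]*", text.lower())


def word_frequencies(abstracts):
    """Fraction of the abstracts in one family that contain each word."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(set(tokenize(abstract)))  # count each word once per abstract
    return {word: c / len(abstracts) for word, c in counts.items()}


def keyword_scores(target, background):
    """Z-score of each target-family word against the background families.

    Assumed scoring: (target frequency - background mean) / background std.
    """
    target_freq = word_frequencies(target)
    background_freqs = [word_frequencies(family) for family in background]
    scores = {}
    for word, freq in target_freq.items():
        values = [f.get(word, 0.0) for f in background_freqs]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        # Assumed floor so that words with no background spread stay finite.
        scores[word] = (freq - mean) / max(std, 0.01)
    return scores


def rank_sentences(target, scores, top=3):
    """Rank sentences of the target family by the mean score of their words."""
    ranked = []
    for abstract in target:
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            words = tokenize(sentence)
            if words:
                mean_score = sum(scores.get(w, 0.0) for w in words) / len(words)
                ranked.append((mean_score, sentence))
    return sorted(ranked, reverse=True)[:top]


# Toy data invented for illustration; real input would be MEDLINE abstracts.
target = [
    "Carbonic anhydrase catalyses the hydration of carbon dioxide.",
    "The zinc ion in carbonic anhydrase binds water.",
]
background = [
    ["The kinase binds ATP.", "The kinase phosphorylates the substrate."],
    ["The protease cleaves the peptide bond.", "The protease binds the substrate."],
]

scores = keyword_scores(target, background)
for word in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(word, round(scores[word], 2))
print(rank_sentences(target, scores, top=1))
```

On this toy input, family-specific words such as "carbonic" score well above words shared with the background families, such as "the". A realistic background would need many protein families for the mean and standard deviation to be meaningful.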
Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families
Protein Design Group, CNB-CSIC, Cantoblanco, E-28049 Madrid, Spain.