Bioinformatics Advance Access originally published online on October 27, 2004
Bioinformatics 2005 21(7):1227-1236; doi:10.1093/bioinformatics/bti084
Automatic extraction of gene/protein biological functions from biomedical text
1Department of Computational Biology, Graduate School of Frontier Science, The University of Tokyo Kiban-3A1(CB01) 5-1-5, Kashiwanoha Kashiwa, Chiba 277-8561, Japan
2Central Research Laboratory, Hitachi Ltd. 1-280 Higashi-koigakubo, Kokubunji City, Tokyo 185-8601, Japan
*To whom correspondence should be addressed.
Motivation: With the rapid advancement of biomedical science and the development of high-throughput analysis methods, the extraction of various types of information from biomedical text has become critical. Since automatic functional annotations of genes are quite useful for interpreting large amounts of high-throughput data efficiently, the demand for automatic extraction of information related to gene functions from text has been increasing.
Results: We have developed a method for automatically extracting the biological process functions of genes/proteins/families based on Gene Ontology (GO) from text using a shallow parser and sentence structure analysis techniques. When the gene/protein/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO-IDs are assigned to the genes/proteins/families. The gene/protein/family names are recognized using the gene/protein/family name dictionaries developed by our group. To achieve wide recognition of the gene/protein/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54-64% with a precision of 91-94% for functions actually described in abstracts. When applied to PubMed, it extracted over 190 000 gene-GO relationships and 150 000 family-GO relationships for major eukaryotes.
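The ACTOR/OBJECT idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny dictionaries, the verb list, and the function name `extract_gene_go` are all hypothetical, and a simple "text before the verb / text after the verb" split stands in for the shallow parser and sentence structure analysis. It only shows how a GO-ID might be assigned when a dictionary-matched gene name is the doer of an action and a functional term is its receiver.

```python
import re

# Hypothetical toy dictionaries (assumptions for illustration only);
# the dictionaries described in the abstract are far larger.
GENE_NAMES = ["p53", "BRCA1"]
FUNCTION_TERMS = {            # functional term -> GO biological process ID
    "apoptosis": "GO:0006915",
    "dna repair": "GO:0006281",
    "cell cycle": "GO:0007049",
}
ACTION_VERBS = {"regulates", "induces", "promotes", "inhibits", "controls"}


def extract_gene_go(sentence):
    """Return sorted (gene, GO-ID) pairs where a gene is the ACTOR and a
    functional term is the OBJECT of an action verb.

    Simplification: everything before the verb is treated as the ACTOR
    phrase and everything after it as the OBJECT phrase.
    """
    pairs = set()
    tokens = sentence.rstrip(".").split()
    for i, token in enumerate(tokens):
        if token.lower() not in ACTION_VERBS:
            continue
        actor_phrase = " ".join(tokens[:i])
        object_phrase = " ".join(tokens[i + 1:]).lower()
        # Dictionary lookup of gene names in the ACTOR phrase.
        genes = [g for g in GENE_NAMES
                 if re.search(r"\b" + re.escape(g) + r"\b", actor_phrase)]
        # Dictionary lookup of functional terms in the OBJECT phrase.
        for term, go_id in FUNCTION_TERMS.items():
            if term in object_phrase:
                for gene in genes:
                    pairs.add((gene, go_id))
    return sorted(pairs)


print(extract_gene_go("p53 induces apoptosis."))
print(extract_gene_go("Apoptosis was observed in cells lacking p53."))
```

In this sketch the second sentence yields no pair even though a gene name and a functional term co-occur, because the gene is not the doer of an action on the function. That role constraint is the kind of filtering that favors precision over recall.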
Availability: The extracted gene functions are available at http://prime.ontology.ims.u-tokyo.ac.jp
Contact: akoike{at}hgc.jp
