Bioinformatics Advance Access published online on July 26, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti597
1 EML Research gGmbH, D-69118 Heidelberg, Germany
* To whom correspondence should be addressed.
Motivation: We have previously developed a rule-based approach for extracting information on the regulation of gene expression in yeast. The biomedical literature, however, contains information on several other equally important regulatory mechanisms, in particular phosphorylation, which our expanded rule-based system now also extracts.
Results: This paper presents new results for the extraction of relational information from biomedical text. We have improved our system, STRING-IE, to capture both new types of linguistic constructs and new types of biological information (i.e. (de-)phosphorylation). The precision remains stable, with a slight increase in recall. From almost one million PubMed abstracts related to four model organisms, we extract regulatory networks and binary phosphorylations comprising 3319 relation chunks. The accuracy is 83-90% for gene expression relations and 86-95% for (de-)phosphorylation relations. To achieve this, we made use of an organism-specific resource of gene/protein names that is considerably larger than those used in most other biology-related information extraction approaches. These names were included in the lexicon when retraining the part-of-speech (POS) tagger on the GENIA corpus. For the domain in question, an accuracy of 96.4% was attained on POS tags. It should be noted that the rules were developed for yeast and were successfully applied, with comparable accuracy, to both abstracts and full-text articles related to other organisms.
Availability: The revised GENIA corpus, the POS tagger, the extraction rules, and the full sets of extracted relations are available from http://www.bork.embl.de/Docu/STRING-IE.
Received April 15, 2005
Revised June 20, 2005
Accepted July 22, 2005
Extraction of regulatory gene/protein networks from Medline
2 European Molecular Biology Laboratory, D-69117 Heidelberg, Germany
Jasmin Šarić, E-mail: saric{at}eml-r.org
*These authors contributed equally

