Bioinformatics Vol. 19 no. 1 2003
Pages 135-143
© 2003 Oxford University Press
Protein Structures and Information Extraction from Biological Texts: The PASTA System
1 Department of Computer Science, 2 Department of Molecular Biology and Biotechnology and 3 Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TU, UK
Received on January 31, 2002; revised on July 29, 2002; accepted on August 7, 2002
Motivation: The rapid increase in the volume of protein structure literature means that useful information may be hidden or lost in the published literature, and that the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow.
Results: We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by automatically extracting information about the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and extraction capabilities of the system have been extensively evaluated against manually annotated data, and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE systems operating on biological scientific text to date.
Availability: PASTA makes its extraction results available via a browser-based front end: http://www.dcs.shef.ac.uk/nlp/pasta/. The evaluation resources (manually annotated corpora) are also available through the website: http://www.dcs.shef.ac.uk/nlp/pasta/results.html.
Contact: r.gaizauskas{at}dcs.shef.ac.uk; g.demetriou{at}dcs.shef.ac.uk; p.artymiuk{at}shef.ac.uk; p.willett{at}shef.ac.uk
* To whom correspondence should be addressed.