Bioinformatics Vol. 17 no. 5 2001
Pages 445-454
© 2001 Oxford University Press

The utility of different representations of protein sequence for predicting functional class

Ross D. King 1,*, Andreas Karwath 1, Amanda Clare 1 and Luc Dehaspe 2

1 Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, Wales, UK
2 PharmaDM, Ambachtenlaan 54, B3-3001 Leuven, Belgium

Received on October 17, 2000; revised on January 19, 2001; accepted on January 19, 2001

Motivation: Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure) using the Escherichia coli genome as a model.

Results: With every type of representation, DMP learnt prediction rules that were more accurate than default at every level of function. The most effective way to represent sequence was using phylogeny (75% accuracy and 13% coverage of unassigned ORFs at the most general level of function; 69% accuracy and 7% coverage at the most detailed). We tested different methods for combining predictions from the different types of representation. Combining predictions improved both accuracy and coverage, e.g. 40% of all unassigned ORFs could be predicted at an estimated accuracy of 60%, and 5% of unassigned ORFs could be predicted at an estimated accuracy of 86%.
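
The two measures quoted above can be illustrated with a short sketch. It assumes the common definitions: accuracy is the fraction of predictions made that are correct, and coverage is the fraction of ORFs for which any rule makes a prediction. This is not the authors' evaluation code. The function name, the class labels and the toy data are hypothetical. Accuracy can only be measured on ORFs of known function, which is why the figures for unassigned ORFs are reported as estimates.

```python
from typing import Optional, Sequence, Tuple


def accuracy_and_coverage(
    predicted: Sequence[Optional[str]],
    actual: Sequence[str],
) -> Tuple[float, float]:
    """Return (accuracy, coverage) for a set of rule predictions.

    `predicted[i]` is the class predicted for ORF i, or None when no
    rule fires. `actual[i]` is the known class of ORF i.
    Assumed definitions (not taken from the paper):
      accuracy = correct predictions / predictions made
      coverage = predictions made / all ORFs
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    # Keep only the ORFs for which a prediction was actually made.
    made = [(p, a) for p, a in zip(predicted, actual) if p is not None]

    coverage = len(made) / len(predicted) if predicted else 0.0
    accuracy = sum(p == a for p, a in made) / len(made) if made else 0.0
    return accuracy, coverage


# Hypothetical toy data: five ORFs of known class, rules fire on three.
predicted = ["metabolism", None, "transport", None, "metabolism"]
actual = ["metabolism", "transport", "metabolism", "regulation", "metabolism"]

acc, cov = accuracy_and_coverage(predicted, actual)
print(f"accuracy={acc:.2f} coverage={cov:.2f}")
```

Treating "no prediction" as None keeps the two measures separate: a rule set can abstain on most ORFs (low coverage) while remaining reliable on those it does classify (high accuracy), which is the trade-off the figures above describe.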

Availability: The rules and data are freely available. Warmr is free to academics.

Contact: rdk{at}aber.ac.uk

Supplementary information: http://www.aber.ac.uk/~dcswww/Research/bio/ProteinFunction

* To whom correspondence should be addressed.






