Bioinformatics Vol. 17 no. 5 2001
Pages 445-454
© 2001 Oxford University Press
The utility of different representations of protein sequence for predicting functional class
1 Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, Wales, UK
2 PharmaDM, Ambachtenlaan 54, B3-3001 Leuven, Belgium
Received on October 17, 2000; revised on January 19, 2001; accepted on January 19, 2001
Motivation: Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure) using the Escherichia coli genome as a model.
Results: With every type of representation, DMP learnt prediction rules that were more accurate than the default at every level of function. Phylogeny was the most effective representation (75% accuracy and 13% coverage of unassigned ORFs at the most general level of function; 69% accuracy and 7% coverage at the most detailed). We tested different methods for combining predictions from the different types of representation. Combining predictions improved both accuracy and coverage: for example, 40% of all unassigned ORFs could be predicted at an estimated accuracy of 60%, and 5% of unassigned ORFs could be predicted at an estimated accuracy of 86%.
Availability: The rules and data are freely available. Warmr is free to academics.
Contact: rdk{at}aber.ac.uk
Supplementary information: http://www.aber.ac.uk/~dcswww/Research/bio/ProteinFunction
* To whom correspondence should be addressed.
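The Results report each rule set as an accuracy and coverage pair. As a minimal sketch of how such a pair can be computed, the Python below assumes that coverage is the fraction of ORFs that receive any prediction and that accuracy is the fraction of those predictions that match the known class. The function name `accuracy_and_coverage`, the class labels, and the toy data are hypothetical and are not taken from the paper, whose own evaluation protocol may differ.

```python
from typing import Optional, Sequence, Tuple


def accuracy_and_coverage(
    predicted: Sequence[Optional[str]],
    actual: Sequence[str],
) -> Tuple[float, float]:
    """Return (accuracy, coverage) for a set of rule-based predictions.

    `predicted[i]` is None when no rule fires for ORF i (hypothetical
    convention for this sketch).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    # Keep only the ORFs for which some rule produced a prediction.
    made = [(p, a) for p, a in zip(predicted, actual) if p is not None]

    # Coverage: share of all ORFs that received a prediction.
    coverage = len(made) / len(actual) if actual else 0.0
    # Accuracy: share of the predictions made that were correct.
    accuracy = sum(p == a for p, a in made) / len(made) if made else 0.0
    return accuracy, coverage


# Toy data (illustrative only): eight ORFs, rules fire on four of them.
predicted = ["metabolism", None, "transport", None,
             "metabolism", None, None, "transport"]
actual = ["metabolism", "transport", "transport", "metabolism",
          "transport", "metabolism", "transport", "transport"]

acc, cov = accuracy_and_coverage(predicted, actual)
print(f"accuracy={acc:.2f} coverage={cov:.2f}")
```

Under this reading, keeping only higher-confidence rules would tend to raise accuracy while lowering coverage, which is the kind of trade-off the combined-prediction figures in the Results describe.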