Bioinformatics Advance Access published online on June 16, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti542
1 EMBL Outstation Hinxton, The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
* To whom correspondence should be addressed.
Summary: The CluSTr database employs a fully automatic single-linkage hierarchical clustering method based on a similarity matrix. To compute the matrix, all-against-all pairwise comparisons between protein sequences are first performed using the Smith-Waterman algorithm. The statistical significance of the resulting similarity scores is then assessed using a Monte-Carlo analysis, yielding Z-values, which are used to populate the matrix. This paper describes automated annotation experiments that quantify the predictive power, and hence the biological relevance, of the CluSTr data. The experiments utilised the UniProt data-mining framework to derive annotation predictions using combinations of InterPro and CluSTr. We show that this combination of data sources greatly increases the precision of predictions made by the data-mining framework, compared to using InterPro data alone. We conclude that the CluSTr approach to clustering proteins makes a valuable contribution to traditional protein classifications. Availability: http://www.ebi.ac.uk/clustr/.
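The pipeline summarised above (local alignment scores, Monte-Carlo Z-values, single-linkage clustering) can be illustrated with a minimal Python sketch. It is not the CluSTr implementation: the scoring scheme (fixed match/mismatch scores and a linear gap penalty), the number of shuffles, the choice to shuffle only the second sequence, and the function names `smith_waterman`, `z_value` and `single_linkage` are all assumptions made for illustration. The Z-value is taken here as the observed score minus the mean score against shuffled sequences, divided by the standard deviation of those scores.

```python
import random
import statistics


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score under a toy scoring scheme (assumed values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, rng=None):
    """Monte-Carlo Z-value: compare the observed score with scores against shuffles of b."""
    rng = rng or random.Random(0)
    observed = smith_waterman(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(smith_waterman(a, "".join(letters)))
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        return 0.0  # degenerate null distribution; treated as no evidence in this sketch
    return (observed - statistics.mean(null_scores)) / sd


def single_linkage(n, z, threshold):
    """Clusters at one cut level: connected components of pairs with Z >= threshold."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), value in z.items():
        if value >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `single_linkage` with a decreasing series of thresholds on the same Z-value matrix gives progressively coarser clusters, which is one way to read the hierarchy: under single linkage, two proteins share a cluster at a given level as soon as a chain of pairs with Z-values at or above that level connects them.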
Received April 5, 2005
Revised June 14, 2005
Accepted June 14, 2005
Article
The predictive power of the CluSTr database
Rolf Apweiler, E-mail: rolf.apweiler{at}ebi.ac.uk

