Bioinformatics Vol. 18 no. 90001 2002
Pages S14-S21
© 2002 Oxford University Press
The metric space of proteins: comparative study of clustering algorithms
1 School of Computer Science and Engineering
2 Department of Biological Chemistry, Institute of Life Sciences, Hebrew University,
Jerusalem 91904, Israel
Received on January 24, 2002; revised on April 1, 2002; accepted on April 1, 2002
Motivation: A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually start with a process that clusters all known proteins or large subsets of this space. Some work in this area is carried out automatically, while other attempts incorporate expert advice and annotation.
Results: We propose a novel technique that automatically clusters protein sequences. We consider all proteins in SWISSPROT, and carry out an all-against-all BLAST similarity test among them. With this similarity measure in hand we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. Here we compare the clusters that result from alternative merging rules, and validate the results against InterPro.
Our preliminary results show that clusters that are consistent with several merging rules, rather than with a single one, tend to agree with InterPro annotation. This supports the view that the protein space consists of families that differ markedly in their evolutionary conservation.
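The bottom-up merging described under Results can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the merging rules, so the arithmetic and geometric means used here are assumptions, and the four items and their pairwise distances are hypothetical stand-ins for BLAST-derived scores (lower means more similar).

```python
from itertools import combinations
from math import exp, log


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Requires strictly positive values.
    return exp(sum(log(v) for v in values) / len(values))


def agglomerate(items, dist, rule):
    """Bottom-up clustering: repeatedly merge the two clusters whose
    rule-averaged pairwise distance is smallest.
    Returns the merged clusters in the order they were created."""
    clusters = [frozenset([x]) for x in items]
    merges = []
    while len(clusters) > 1:
        def score(pair):
            a, b = pair
            return rule([dist[frozenset((x, y))] for x in a for y in b])

        a, b = min(combinations(clusters, 2), key=score)
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(merged)
    return merges


# Hypothetical pairwise distances between four toy "proteins".
raw = {"AB": 1.0, "AC": 2.0, "AD": 20.0, "BC": 50.0, "BD": 20.0, "CD": 100.0}
dist = {frozenset(pair): value for pair, value in raw.items()}

merges_arith = agglomerate("ABCD", dist, arithmetic_mean)
merges_geo = agglomerate("ABCD", dist, geometric_mean)

# Clusters produced by both merging rules (assumed notion of "consistent").
shared = set(merges_arith) & set(merges_geo)
```

On this toy input the two rules agree on the cluster {A, B} and on the root, but they disagree on the intermediate cluster: the arithmetic rule adds D first, while the geometric rule adds C. Only the shared clusters would count as consistent across rules in the sense used above.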
Availability: The outcome of these investigations can be viewed in an interactive Web site at http://www.protonet.cs.huji.ac.il
Supplementary information: Biological examples for comparing the performance of the different algorithms used for classification are presented in http://www.protonet.cs.huji.ac.il/examples.html
Contact: ori{at}cs.huji.ac.il
Keywords: protein families; protein classification; sequence alignment; clustering.