Vol. 20 no. 2 2004, pages 243-252
Bioinformatics © Oxford University Press 2004; all rights reserved.
Graph-based clustering for finding distant relationships in a large set of protein sequences
1 Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan and 2 NTT Software Corporation, 223-1 Yamashita-cho, Naka-ku, Yokohama, Kanagawa 231-8554, Japan
Received on May 26, 2003; revised on July 22, 2003; accepted on July 26, 2003
Motivation: Clustering of protein sequences is widely used for the functional characterization of proteins. However, it is still not easy to cluster distantly-related proteins, which have only regional similarity among their sequences. It is therefore necessary to develop an algorithm for clustering such distantly-related proteins.
Results: We have developed a time- and space-efficient clustering algorithm. It uses a graph representation in which vertices denote proteins and edges denote sequence similarities above a certain cutoff score. It repeatedly partitions the graph by removing edges that have small weights, which correspond to low sequence similarities. To find the appropriate partitions, we introduce a score that combines the normalized cut with a locally minimal cut capacity. We applied our method to all 40 703 human proteins in SWISS-PROT and TrEMBL. The resulting clusters show a 76% recall (20 529 proteins) of the 26 917 proteins classified by InterPro. The method also finds relationships not found by other clustering methods.
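As an illustration of the graph representation described above, the following minimal Python sketch builds a weighted similarity graph from pairwise scores and evaluates one candidate bipartition. It assumes the standard normalized cut definition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V); the combined score with the locally minimal cut capacity is not specified in this abstract and is not reproduced here. The function names, protein labels, scores, and cutoff are all hypothetical.

```python
from collections import defaultdict


def build_graph(scores, cutoff):
    """Build an undirected weighted graph from pairwise similarity scores.

    Only pairs scoring above `cutoff` become edges. Proteins with no
    edge above the cutoff do not appear in the graph at all.
    """
    graph = defaultdict(dict)
    for (u, v), weight in scores.items():
        if weight > cutoff:
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


def normalized_cut(graph, part_a):
    """Standard normalized cut of the bipartition (part_a, remaining vertices)."""
    part_a = set(part_a)
    part_b = set(graph) - part_a
    # Total weight of edges crossing between the two parts.
    cut = sum(w for u in part_a for v, w in graph[u].items() if v in part_b)
    # Total weight of edges incident to each part (sum of weighted degrees).
    assoc_a = sum(w for u in part_a for w in graph[u].values())
    assoc_b = sum(w for u in part_b for w in graph[u].values())
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # degenerate partition: one side has no edges
    return cut / assoc_a + cut / assoc_b


# Hypothetical similarity scores: two tight groups joined by one weak edge.
scores = {
    ("P1", "P2"): 90, ("P2", "P3"): 80, ("P1", "P3"): 85,
    ("P3", "P4"): 10,
    ("P4", "P5"): 95, ("P4", "P6"): 70, ("P5", "P6"): 88,
    ("P1", "P6"): 3,  # below the cutoff, so it never becomes an edge
}
graph = build_graph(scores, cutoff=5)

balanced = normalized_cut(graph, {"P1", "P2", "P3"})
lopsided = normalized_cut(graph, {"P1"})
print(f"Ncut of balanced split: {balanced:.4f}")
print(f"Ncut of lopsided split: {lopsided:.4f}")
```

In this toy graph, removing the single low-weight edge between the two groups gives a much smaller normalized cut than splitting off one protein, which is the behaviour that makes the measure suitable for choosing among candidate partitions.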
Availability: The complete result of our algorithm for all the human proteins in SWISS-PROT and TrEMBL, and other supplementary information are available at http://motif.ics.es.osaka-u.ac.jp/Ncut-KL/
Contact: matsuda{at}ist.osaka-u.ac.jp
* To whom correspondence should be addressed.