Bioinformatics Advance Access originally published online on January 21, 2009
Bioinformatics 2009 25(5):599-605; doi:10.1093/bioinformatics/btp047
Gclust: trans-kingdom classification of proteins using automatic individual threshold setting
Department of Life Sciences, Graduate School of Arts and Sciences, University of Tokyo, Komaba, Meguro-ku, Tokyo, 153-8902, Japan
Abstract
Motivation: Trans-kingdom protein clustering has remained difficult because of the large sequence divergence between eukaryotes and prokaryotes and the presence of transit sequences in organellar proteins. Large-scale protein clustering that includes such divergent organisms needs a heuristic that efficiently selects similar proteins by setting a proper homolog threshold for each protein. Here, a method is described that uses two similarity measures and organism count.
Results: The Gclust software constructs minimal homolog groups from all-against-all BLASTP results by single-linkage clustering. Major points include (i) estimation of the domain structure of proteins; (ii) exclusion of multi-domain proteins; (iii) explicit consideration of transit peptides; and (iv) heuristic estimation of a similarity threshold for homologs of each protein by an entropy-optimized organism count method. The resultant clusters were evaluated in light of the power law. The software was used to construct protein clusters for up to 95 organisms.
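As an illustration only, the single-linkage step named in the Results can be sketched in Python with a union-find structure. This is not the Gclust implementation. The function name `single_linkage`, the `(query, subject, evalue)` tuple layout, the protein identifiers and the single global `max_evalue` cutoff are all assumptions made for this sketch. Gclust instead estimates a threshold for each protein and also accounts for domain structure and transit peptides, none of which is modeled here.

```python
from collections import defaultdict


def single_linkage(hits, max_evalue=1e-5):
    """Group proteins linked by any chain of hits that pass the cutoff.

    hits: iterable of (query, subject, evalue) tuples, a hypothetical
    simplification of all-against-all BLASTP output.
    max_evalue: one global cutoff (an assumption; Gclust sets a
    threshold per protein).
    """
    parent = {}

    def find(x):
        # Register unseen proteins as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)
        # Single linkage: one passing hit is enough to merge two groups.
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Made-up hits for illustration, not data from the paper.
hits = [
    ("A1", "B1", 1e-40),
    ("B1", "C1", 1e-12),
    ("C1", "D9", 0.5),    # fails the cutoff, so the two groups stay apart
    ("D9", "E9", 1e-30),
]
clusters = single_linkage(hits)
```

In this toy input, `A1` and `C1` land in the same group although they share no direct hit, because both are linked through `B1`. That chaining is the defining property of single-linkage clustering, and it is also why a well-chosen threshold matters: a single spurious hit can fuse two unrelated groups.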
Availability: Software and data are available at http://gclust.c.u-tokyo.ac.jp/Gclust_Download.html.
Contact: naokisat{at}bio.c.u-tokyo.ac.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Martin Bishop
Received on December 16, 2008; revised on January 16, 2009; accepted on January 16, 2009


