Bioinformatics Advance Access published online on January 21, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp047
Gclust: trans-kingdom classification of proteins using automatic individual threshold setting
Department of Life Sciences, Graduate School of Arts and Sciences, University of Tokyo, Komaba, Meguro-ku, Tokyo, 153-8902, Japan
*To whom correspondence should be addressed. Dr. Naoki Sato, E-mail: naokisat{at}bio.c.u-tokyo.ac.jp
Abstract
Motivation: Trans-kingdom protein clustering has remained difficult because of the large sequence divergence between eukaryotes and prokaryotes and the presence of transit sequences in organellar proteins. Large-scale protein clustering that includes such divergent organisms needs a heuristic that efficiently selects similar proteins by setting a proper homolog threshold for each protein. Here, a method is described that uses two similarity measures and organism count.
Results: The Gclust software constructs minimal homolog groups from all-against-all BLASTP results by single-linkage clustering. Major points include (1) estimation of the domain structure of proteins, (2) exclusion of multi-domain proteins, (3) explicit consideration of transit peptides, and (4) heuristic estimation of a similarity threshold for the homologs of each protein by an entropy-optimized organism count method. The resultant clusters were evaluated in light of the power law. The software was used to construct protein clusters for up to 95 organisms.
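As an illustration of the single-linkage step only, the following minimal Python sketch groups proteins into connected components of a similarity graph built from pairwise hits. It is not the Gclust implementation: the function name `single_linkage_clusters`, the `(query, subject, evalue)` tuple layout, and the single fixed cutoff `max_evalue` are assumptions made for this example, whereas Gclust is described as estimating a separate threshold for each protein and as handling domains and transit peptides, none of which is modeled here.

```python
from collections import defaultdict


def single_linkage_clusters(hits, max_evalue=1e-5):
    """Group proteins by single linkage over pairwise similarity hits.

    hits: iterable of (query, subject, evalue) tuples (hypothetical layout).
    max_evalue: one fixed cutoff, a simplification assumed for this sketch.
    """
    parent = {}

    def find(x):
        # Return the representative of x's set, registering x if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        # Merge the sets that contain a and b.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, evalue in hits:
        # Register both proteins so that unlinked ones become singletons.
        find(query)
        find(subject)
        if evalue <= max_evalue:
            union(query, subject)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


example_hits = [
    ("A", "B", 1e-30),
    ("B", "C", 1e-10),
    ("C", "D", 0.5),   # too weak to link C and D
    ("D", "E", 1e-8),
]
print(single_linkage_clusters(example_hits))
# [['A', 'B', 'C'], ['D', 'E']]
```

Because single linkage joins any two proteins connected by a chain of hits, A and C end up in the same group without a direct hit between them. The same chaining effect is one reason a multi-domain protein can bridge otherwise unrelated groups, which is consistent with the exclusion of multi-domain proteins listed among the major points.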
Availability: Software and data are available at http://gclust.c.u-tokyo.ac.jp/Gclust_Download.html.
Contact: naokisat{at}bio.c.u-tokyo.ac.jp
Associate Editor: Prof. Martin Bishop
Received on December 16, 2008; revised on January 16, 2009; accepted on January 16, 2009