Bioinformatics Advance Access originally published online on May 29, 2008
Bioinformatics 2008 24(16):1765-1771; doi:10.1093/bioinformatics/btn244
Efficient functional clustering of protein sequences using the Dirichlet process
1Department of Bioengineering, UC Berkeley and 2Merck & Co., Inc., 1700 Owens St, San Francisco, CA 94158, USA
Abstract
Motivation: Automatic clustering of protein sequences is an important problem in computational biology. The recent explosion in genome sequences has given biological researchers a vast number of novel protein sequences. However, the majority of these sequences have no experimental evidence for their molecular function in the cell, and the responsibility for correctly annotating these sequences falls upon the bioinformatics community. Ideally, we would like to be able to group sequences of similar or identical molecular function in an automatic fashion, without relying on experimental evidence.
Results: In this article I present a novel probabilistic framework that models subfamilies within a known protein family. Given a multiple sequence alignment, the model uses Dirichlet mixture densities to estimate amino acid preferences within subfamily clusters, and places a Dirichlet process prior on the overall set of clusters. Based on results from several datasets, the model accurately partitions the data into functional subgroups.
Availability: The algorithm is implemented as C++ software available at bpg-research.berkeley.edu/~duncanb/dpcluster/
Contact: duncan_brown{at}merck.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Burkhard Rost
Received on March 10, 2008; revised on May 22, 2008; accepted on May 22, 2008
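To make the Results paragraph more concrete, the following is a minimal Python sketch of how a Dirichlet process prior can combine with per-column amino acid preferences when assigning one aligned sequence to a subfamily. It is an illustration under simplifying assumptions, not the author's C++ implementation: it uses a single symmetric Dirichlet prior per column instead of the Dirichlet mixture densities named in the abstract, treats columns as independent, and skips gap characters. The function names (`column_counts`, `log_predictive`, `assignment_probs`) and the parameters `concentration` and `alpha` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_counts(cluster):
    """Per-column residue counts for a list of equal-length aligned sequences."""
    return [Counter(column) for column in zip(*cluster)]

def log_predictive(seq, counts_per_column, alpha):
    """Log probability of seq given a cluster's column counts.

    Assumption: one symmetric Dirichlet(alpha) prior per column, a
    simplification of the Dirichlet mixture densities in the abstract.
    """
    total = 0.0
    for residue, counts in zip(seq, counts_per_column):
        if residue not in AMINO_ACIDS:
            continue  # skip gaps and ambiguous characters
        n = sum(counts[a] for a in AMINO_ACIDS)
        total += math.log((counts[residue] + alpha) / (n + alpha * len(AMINO_ACIDS)))
    return total

def assignment_probs(seq, clusters, concentration=1.0, alpha=0.5):
    """Probability of seq joining each existing (non-empty) cluster or a new one.

    Chinese restaurant process weights: an existing cluster is weighted by
    its size, a new cluster by the concentration parameter. The last entry
    of the returned list is the probability of opening a new cluster.
    """
    log_weights = [
        math.log(len(cluster)) + log_predictive(seq, column_counts(cluster), alpha)
        for cluster in clusters
    ]
    empty = [Counter() for _ in seq]
    log_weights.append(math.log(concentration) + log_predictive(seq, empty, alpha))
    top = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    toy_clusters = [["ACD", "ACD"], ["WYW"]]  # toy alignment, not real data
    print(assignment_probs("ACD", toy_clusters))
```

A Gibbs sampler built on this idea would repeatedly remove one sequence from its cluster, compute these probabilities, and resample its assignment. The number of subfamilies is then inferred from the data instead of being fixed in advance.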