Bioinformatics Advance Access published online on July 26, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl411
1 Department of Mathematics and Statistics, P.O. Box 68, 00014 University of Helsinki, Finland
* To whom correspondence should be addressed.
Motivation: The rapid increase in the amount of protein sequence data has created a need for automated identification of evolutionarily related subgroups from large datasets. Existing methods typically require a priori specification of the number of putative groups, which determines the resolution of the classification.

Results: We introduce a Bayesian model-based approach to the simultaneous identification of evolutionary groups and conserved parts of protein sequences. The model-based approach provides an intuitive and efficient way of determining the number of groups from the sequence data, in contrast to the ad hoc methods often used for similar purposes. Our model recognizes the areas in the sequences that are relevant for the clustering and regards other areas as noise. We have implemented the method using a fast stochastic optimization algorithm, which yields a clustering associated with the estimated maximum posterior probability. The method has been shown to have high specificity and sensitivity in simulated and real clustering tasks. With real datasets the method also highlights the residues close to the active site.

Availability: Software "kPax" and supplementary material are available at http://www.rni.helsinki.fi/~jic/softa.html.
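The idea summarized under Results, that the number of groups can be read off a model score rather than fixed in advance, can be illustrated with a toy sketch. The code below is not the kPax implementation and does not reproduce the model of the paper: it assumes a gap-free alignment, a symmetric Dirichlet prior (the value `alpha=0.5` is an arbitrary choice), a uniform prior over partitions, and a plain greedy reassignment search in place of the authors' stochastic optimization. It also omits the distinction between informative and noise columns. All function names are hypothetical.

```python
import math
import random
from collections import Counter

ALPHABET_SIZE = 20  # assumption: 20 standard amino acids, no gap symbol


def column_log_ml(counts, alpha=0.5):
    """Log marginal likelihood of one alignment column within one cluster,
    with residue frequencies integrated out under a symmetric Dirichlet prior."""
    n = sum(counts.values())
    total_alpha = ALPHABET_SIZE * alpha
    score = math.lgamma(total_alpha) - math.lgamma(total_alpha + n)
    for c in counts.values():
        score += math.lgamma(alpha + c) - math.lgamma(alpha)
    return score


def partition_log_ml(seqs, labels, alpha=0.5):
    """Score of a whole partition: sum of column scores over all clusters.
    With a uniform prior over partitions this ranks partitions like the posterior."""
    clusters = {}
    for seq, lab in zip(seqs, labels):
        clusters.setdefault(lab, []).append(seq)
    total = 0.0
    for members in clusters.values():
        for column in zip(*members):
            total += column_log_ml(Counter(column), alpha)
    return total


def greedy_cluster(seqs, alpha=0.5, seed=0):
    """Hill-climbing search: start from singletons and move one sequence at a
    time to whichever cluster (existing or new) most improves the score."""
    rng = random.Random(seed)
    labels = list(range(len(seqs)))
    best = partition_log_ml(seqs, labels, alpha)
    improved = True
    while improved:
        improved = False
        order = list(range(len(seqs)))
        rng.shuffle(order)
        for i in order:
            current = labels[i]
            # Candidate targets: every existing cluster plus one fresh cluster.
            candidates = set(labels) | {max(labels) + 1}
            for lab in candidates:
                if lab == current:
                    continue
                labels[i] = lab
                score = partition_log_ml(seqs, labels, alpha)
                if score > best + 1e-12:
                    best, current, improved = score, lab, True
            labels[i] = current  # keep the best assignment found for sequence i
    return labels, best


# Made-up example alignment: two groups of four sequences, one variant in each.
example = [
    "MKTAYIAK", "MKTAYIAK", "MKTAYIAR", "MKTAYIAK",
    "GLSDGEWQ", "GLSDGEWQ", "GLSDGEWH", "GLSDGEWQ",
]
example_labels, example_score = greedy_cluster(example)
```

Here the number of clusters is never passed in; it emerges from the search because merging dissimilar sequences lowers the marginal likelihood while merging similar ones raises it. In this toy score, columns that are identical across all sequences pull strongly toward a single cluster, which hints at why a model that treats uninformative columns as noise is useful.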
Received February 16, 2006
Revised July 23, 2006
Accepted July 24, 2006
Article
Bayesian search of functionally divergent protein subgroups and their function specific residues
Pekka Marttinen 1 *, Jukka Corander 1, Petri Törönen 2, and Liisa Holm 3
2 Institute of Biotechnology, P.O. Box 56, 00014 University of Helsinki, Finland
3 Institute of Biotechnology, P.O. Box 56, 00014 University of Helsinki, Finland; Department of Biological and Environmental Sciences, P.O. Box 56, 00014 University of Helsinki, Finland
Pekka Marttinen, E-mail: pekka.marttinen{at}helsinki.fi
Associate Editor: Dmitrij Frishman


