Bioinformatics Advance Access published online on July 26, 2006

Bioinformatics, doi:10.1093/bioinformatics/btl411
All versions of this article: 22/20/2466 (most recent); btl411v1

© The Author (2006). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
Received February 16, 2006
Revised July 23, 2006
Accepted July 24, 2006

Article

Bayesian search of functionally divergent protein subgroups and their function specific residues

Pekka Marttinen 1 *, Jukka Corander 1, Petri Törönen 2, and Liisa Holm 3

1 Department of Mathematics and Statistics, P.O. Box 68, 00014 University of Helsinki, Finland
2 Institute of Biotechnology, P.O. Box 56, 00014 University of Helsinki, Finland
3 Institute of Biotechnology, P.O. Box 56, 00014 University of Helsinki, Finland; Department of Biological and Environmental Sciences, P.O. Box 56, 00014 University of Helsinki, Finland

* To whom correspondence should be addressed.
Pekka Marttinen, E-mail: pekka.marttinen{at}helsinki.fi


   Abstract

Motivation: The rapid increase in the amount of protein sequence data has created a need for automated identification of evolutionarily related subgroups from large datasets. Existing methods typically require a priori specification of the number of putative groups, which defines the resolution of the classification solution.

Results: We introduce a Bayesian model-based approach to simultaneous identification of evolutionary groups and conserved parts of the protein sequences. The model-based approach provides an intuitive and efficient way of determining the number of groups from the sequence data, in contrast to the ad hoc methods often exploited for similar purposes. Our model recognizes the areas in the sequences that are relevant for the clustering and regards other areas as noise. We have implemented the method using a fast stochastic optimization algorithm which yields a clustering associated with the estimated maximum posterior probability. The method has been shown to have high specificity and sensitivity in simulated and real clustering tasks. With real datasets the method also highlights the residues close to the active site.

Availability: Software "kPax" and supplementary material are available at http://www.rni.helsinki.fi/~jic/softa.html.
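
The idea summarized under Results (letting the sequence data decide both the number of groups and which alignment columns matter for the grouping) can be illustrated with a small sketch. This is not the kPax implementation and makes several assumptions that are not stated in the abstract: a symmetric Dirichlet prior with a hypothetical hyperparameter alpha over the 20 amino acids, uniform priors over partitions and over column relevance, and an exhaustive search over all partitions in place of the stochastic optimization the authors describe. All function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET_SIZE = 20  # standard amino acids (assumption of this sketch)


def log_marginal(column, alpha=1.0):
    """Log marginal likelihood of residues in one alignment column for one
    group, under a symmetric Dirichlet(alpha) prior on residue frequencies."""
    n = len(column)
    total_alpha = ALPHABET_SIZE * alpha
    score = math.lgamma(total_alpha) - math.lgamma(total_alpha + n)
    for count in Counter(column).values():
        score += math.lgamma(alpha + count) - math.lgamma(alpha)
    return score


def score_partition(seqs, labels, alpha=1.0):
    """Score a grouping of aligned sequences.

    Each column is treated either as group-specific (one residue
    distribution per group) or as noise (one pooled distribution),
    whichever scores higher. Returns (log score, group-specific columns)."""
    groups = {}
    for seq, label in zip(seqs, labels):
        groups.setdefault(label, []).append(seq)
    total, informative = 0.0, []
    for j, pooled_column in enumerate(zip(*seqs)):
        pooled = log_marginal(pooled_column, alpha)
        split = sum(log_marginal([s[j] for s in members], alpha)
                    for members in groups.values())
        if split > pooled:
            informative.append(j)
        total += max(split, pooled)
    return total, informative


def partitions(n):
    """Yield every partition of n items as a list of group labels."""
    def extend(labels, max_label):
        if len(labels) == n:
            yield list(labels)
            return
        # An item may join any existing group or open exactly one new group.
        for label in range(max_label + 2):
            labels.append(label)
            yield from extend(labels, max(max_label, label))
            labels.pop()
    yield from extend([], -1)


def best_partition(seqs, alpha=1.0):
    """Exhaustive search; only feasible for a handful of sequences."""
    best = None
    for labels in partitions(len(seqs)):
        total, informative = score_partition(seqs, labels, alpha)
        if best is None or total > best[0]:
            best = (total, labels, informative)
    return best


if __name__ == "__main__":
    # Toy alignment: columns 0-2 differ between two groups, the rest do not.
    toy = ["ACDKLM", "ACDKVM", "ACDKLT", "GHEKLM", "GHEKVT", "GHEKLM"]
    score, labels, columns = best_partition(toy)
    print(labels, columns, round(score, 3))
```

The number of groups is never passed in: it falls out of comparing scores across partitions. Exhaustive enumeration grows as the Bell numbers, so any realistic dataset would need a heuristic or stochastic search instead.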


Associate Editor: Dmitrij Frishman


