
Bioinformatics 2005 21(Suppl 2):ii130-ii136; doi:10.1093/bioinformatics/bti1122

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

A fully Bayesian model to cluster gene-expression profiles

C. Vogl 1,*,†, F. Sanchez-Cabo 2,†, G. Stocker 2, S. Hubbard 3, O. Wolkenhauer 4 and Z. Trajanoski 2

1 Institute of Animal Breeding and Genetics, Veterinärmedizinische Universität Wien, 1210 Vienna, Austria
2 Institute for Genomics and Bioinformatics and Christian Doppler Laboratory for Genomics and Bioinformatics, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria
3 Faculty of Life Sciences, University of Manchester, M60 1QD Manchester, UK
4 Institute of Informatics, University of Rostock, 18051 Rostock, Germany

*To whom correspondence should be addressed.

Motivation: With cDNA or oligonucleotide chips, gene-expression levels of essentially all genes in a genome can be simultaneously monitored over a time-course or under different experimental conditions. After proper normalization of the data, genes are often classified into co-expressed classes (clusters) to identify subgroups of genes that share common regulatory elements, a common function or a common cellular origin. With most methods, e.g. k-means, the number of clusters needs to be specified in advance; results depend strongly on this choice. Even with likelihood-based methods, estimation of this number is difficult. Furthermore, missing values often cause problems and lead to the loss of data.
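
As a rough illustration of why the choice of k matters, the following minimal sketch runs a plain k-means on toy data and reports the within-cluster sum of squares for several values of k. Everything in it is an assumption made for illustration: the three expression patterns, the noise level, the evenly spaced starting centers and the helper names (`kmeans`, `within_ss`) are hypothetical and are not taken from the article.

```python
import random

def sq_dist(p, c):
    """Squared Euclidean distance between a profile and a center."""
    return sum((x - y) ** 2 for x, y in zip(p, c))

def kmeans(profiles, k, n_iter=20):
    """Plain Lloyd's k-means on equal-length expression profiles."""
    n = len(profiles)
    # Evenly spaced starting centers: a simplification so the run is reproducible.
    centers = [list(profiles[(i * n) // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in profiles]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def within_ss(profiles, labels, centers):
    """Total squared distance of every profile to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(profiles, labels))

# Toy data (hypothetical): three patterns over four time points, 20 genes each.
rng = random.Random(1)
patterns = [[0, 1, 2, 3], [3, 2, 1, 0], [0, 2, 0, 2]]
profiles = [[v + rng.gauss(0, 0.2) for v in pat]
            for pat in patterns for _ in range(20)]

results = {}
for k in (2, 3, 5):
    labels, centers = kmeans(profiles, k)
    results[k] = within_ss(profiles, labels, centers)
    print(k, len(set(labels)), round(results[k], 2))
```

With k = 2 two of the three patterns are forced into one cluster, while with k = 5 true groups are split further. The within-cluster sum of squares does not increase as k grows in this setup, so it cannot by itself indicate the right number of clusters.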

Results: We propose a fully probabilistic Bayesian model to cluster gene-expression profiles. The number of classes does not need to be specified in advance; instead it is adjusted dynamically using a Reversible Jump Markov Chain Monte Carlo sampler. Imputation of missing values is integrated into the model. With simulations, we determined the speed of convergence of the sampler as well as the accuracy of the inferred variables. Results were compared with the widely used k-means algorithm. With our method, biologically related co-expressed genes could be identified in a yeast transcriptome dataset, even when some values were missing.
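
The idea of imputing missing values inside the sampler, rather than discarding incomplete profiles, can be sketched as follows. This is a deliberately simplified stand-in and not the authors' model: it fixes the number of classes K (no reversible jump move), assumes a known noise variance `sigma2`, equal class weights and a normal prior N(0, `tau2`) on the class means. All names and the toy data are hypothetical.

```python
import math
import random

def gibbs_sweep(data, z, mu, sigma2, tau2, rng):
    """One Gibbs sweep; `data` holds None where a value is missing."""
    K, T = len(mu), len(mu[0])
    sd = math.sqrt(sigma2)
    # 1. Imputation: draw each missing value from its gene's current class.
    x = [[v if v is not None else rng.gauss(mu[z[g]][t], sd)
          for t, v in enumerate(row)] for g, row in enumerate(data)]
    # 2. Classes: sample z[g] with probability proportional to the likelihood.
    for g, row in enumerate(x):
        logp = [-sum((v - m) ** 2 for v, m in zip(row, mu[k])) / (2 * sigma2)
                for k in range(K)]
        top = max(logp)  # subtract the maximum for numerical stability
        weights = [math.exp(lp - top) for lp in logp]
        z[g] = rng.choices(range(K), weights=weights)[0]
    # 3. Means: conjugate normal update under the N(0, tau2) prior.
    for k in range(K):
        members = [row for row, lab in zip(x, z) if lab == k]
        prec = len(members) / sigma2 + 1.0 / tau2
        for t in range(T):
            total = sum(row[t] for row in members)
            mu[k][t] = rng.gauss((total / sigma2) / prec, math.sqrt(1.0 / prec))
    return x

# Toy data (hypothetical): two patterns, 15 genes each, about 10% values missing.
rng = random.Random(0)
patterns = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]]
data = [[None if rng.random() < 0.1 else v + rng.gauss(0, 0.2) for v in pat]
        for pat in patterns for _ in range(15)]

K, T = 2, 4
z = [rng.randrange(K) for _ in data]      # random initial classes
mu = [[0.0] * T for _ in range(K)]        # initial class means
for _ in range(200):
    x = gibbs_sweep(data, z, mu, sigma2=0.04, tau2=10.0, rng=rng)
print(z)
```

Because the missing entries are redrawn in every sweep, incomplete profiles still contribute their observed values to the class means, and the uncertainty about the missing entries carries through to the class assignments. A sampler in the spirit of the abstract would additionally propose adding and removing classes, which this sketch omits.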

Availability: The code is available at http://genome.tugraz.at/BayesianClustering/

Contact: claus.vogl@vu-wien.ac.at

Supplementary information: The supplementary material is available at http://genome.tugraz.at/BayesianClustering/


