Bioinformatics Advance Access published online on October 27, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti746
1 Center for Microbial Biotechnology, BioCentrum-DTU, Building 223, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
* To whom correspondence should be addressed.
Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organising maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results that depend on the initialisation of the algorithm. It is therefore difficult to assess the significance of the results. We have developed a consensus clustering algorithm in which the final result is averaged over multiple clustering runs, giving a robust and reproducible clustering capable of capturing small signal variations. The algorithm preserves valuable properties of hierarchical clustering, which is useful for visualisation and interpretation of the results.
Results: We show for the first time that one can take advantage of multiple clustering runs in DNA microarray analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset. The method is flexible, and consensus clusters can be found from different clustering algorithms. Thus, the algorithm can be used as a framework to test, in a quantitative manner, the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering methods and show that it is robust and gives low classification error rates for a realistic, simulated dataset. The algorithm is also demonstrated on real datasets, where more biologically meaningful transcriptional patterns can be found without conservative statistical or fold-change exclusion of data.
Availability: Matlab source code for the clustering algorithm ClusterLustre, and the simulated dataset for testing, are available upon request from T.G.
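The co-occurrence idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ClusterLustre code (which is in Matlab and available on request); it assumes plain K-means with random initialisation as the base clusterer and average-linkage agglomeration on the distance 1 - co-occurrence as the final hierarchical step. All function names (`kmeans_labels`, `co_occurrence`, `consensus_clusters`) and the toy data are hypothetical.

```python
import random

def kmeans_labels(points, k, rng, iters=20):
    """Plain K-means with random initial centres; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def co_occurrence(points, k, runs, seed=0):
    """Fraction of runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]

def consensus_clusters(cooc, n_clusters):
    """Average-linkage agglomeration using 1 - co-occurrence as distance
    (an assumed choice of linkage, made for illustration only)."""
    clusters = [[i] for i in range(len(cooc))]

    def dist(a, b):
        return sum(1 - cooc[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        x, y = min(pairs, key=lambda xy: dist(clusters[xy[0]], clusters[xy[1]]))
        clusters[x] += clusters.pop(y)  # merge the two closest clusters
    return [sorted(c) for c in clusters]

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
cooc = co_occurrence(points, k=2, runs=10, seed=1)
clusters = consensus_clusters(cooc, n_clusters=2)
print(clusters)
```

Because the co-occurrence matrix averages over runs, a pair of points that is only occasionally grouped together by an unlucky initialisation contributes a small value, which is the sense in which the consensus is less dependent on any single run.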
Received February 11, 2005
Revised October 13, 2005
Accepted October 25, 2005
Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm
2 Informatics and Mathematical Modelling, Building 321, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
Thomas Grotkjær, E-mail: tg{at}biocentrum.dtu.dk


