Bioinformatics Advance Access originally published online on May 24, 2005
Bioinformatics 2005 21(15):3201-3212; doi:10.1093/bioinformatics/bti517
Computational cluster validation in post-genomic data analysis
School of Chemistry, University of Manchester, Faraday Building, Sackville Street, PO Box 88, Manchester M60 1QD, UK
*To whom correspondence should be addressed.
Motivation: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have received far less attention in bioinformatics.
Results: This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.
Availability: The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/
Contact: J.Handl{at}postgrad.manchester.ac.uk
Supplementary information: Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/
Received on March 24, 2005; revised on May 24, 2005; accepted on May 24, 2005
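The abstract contrasts two ways of validating a partition: concordance with prior knowledge, and agreement with the structure actually present in the data. As a rough illustration only, the sketch below computes one measure of each kind on a toy dataset: the Rand index against a reference labelling, and the mean silhouette width from the data alone. The abstract does not name these measures, so they are assumed here as common examples. The function names, the toy points and the use of Euclidean distance are likewise assumptions for illustration, and the code is not taken from the software listed under Availability.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def rand_index(labels_a, labels_b):
    """External validation: fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_silhouette(points, labels):
    """Internal validation: mean silhouette width over all points.

    Assumes at least two clusters and that not all points coincide.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
reference = [0, 0, 0, 1, 1, 1]   # stands in for prior knowledge
good = [0, 0, 0, 1, 1, 1]        # partition matching the visible structure
bad = [0, 1, 0, 1, 0, 1]         # partition cutting across both groups

for name, labels in (("good", good), ("bad", bad)):
    print(name, rand_index(labels, reference), mean_silhouette(points, labels))
```

On this toy data both measures favour the `good` partition. On real post-genomic data the two kinds of measure need not agree, which is one reason to consider them together.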