Bioinformatics Advance Access published online on September 14, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm463
Graph based Consensus Clustering for Class Discovery from Gene Expression Data
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
To whom correspondence should be addressed. Mr. Zhiwen Yu, E-mail: yuzhiwen{at}cs.cityu.edu.hk
Abstract
Motivation: Consensus clustering, also known as cluster ensemble, is an important technique for microarray data analysis and is particularly useful for class discovery from microarray data. Compared with traditional clustering algorithms, consensus clustering approaches can integrate multiple partitions from different cluster solutions to improve the robustness, stability, scalability and parallelization of the clustering algorithms. With consensus clustering, one can discover the underlying classes of the samples in gene expression data.
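As a rough illustration of what integrating multiple partitions can look like, the sketch below builds a co-association matrix (the fraction of partitions in which two samples share a cluster) and then reads consensus classes off a thresholded graph. This is not the GCC algorithm of the paper: the function names `co_association` and `consensus_labels`, the 0.5 threshold, the connected-components step and the toy partitions are all assumptions chosen for brevity.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions that place samples i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(partitions)
    return [[c / total for c in row] for row in counts]


def consensus_labels(matrix, threshold=0.5):
    """Assumed stand-in for a graph partitioning step: link two samples
    when their co-association exceeds `threshold`, then label the
    connected components of the resulting graph."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical partitions of six samples. The label names differ
# between partitions, which the co-association matrix is insensitive to.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
print(consensus_labels(co_association(partitions)))  # [0, 0, 0, 1, 1, 1]
```

The third partition disagrees with the other two about samples 2 and 3, yet the consensus still recovers two classes, which is the kind of robustness a cluster ensemble is meant to provide.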
Results: In addition to exploring a graph based consensus clustering algorithm to estimate the underlying classes of the samples in microarray data, we also design a new validation index to determine the number of classes in microarray data. To our knowledge, this is the first application of graph based consensus clustering to class discovery from microarray data. Given a pre-specified maximum number of classes (denoted as Kmax in this paper), our algorithm can discover the true number of classes for the samples in microarray data according to a new cluster validation index called the Modified Rand Index. Experiments on gene expression data indicate that our new algorithm can (i) outperform most of the existing algorithms, (ii) identify the number of classes correctly in real cancer datasets, and (iii) discover biologically meaningful classes of samples.
Availability: Matlab source code for the graph based consensus clustering algorithm (GCC) is available upon request from Zhiwen Yu.
Contact: cshswong{at}cityu.edu.hk and yuzhiwen{at}cs.cityu.edu.hk
Associate Editor: Prof. David Rocke
Received on April 11, 2007; revised on August 15, 2007; accepted on September 8, 2007