Bioinformatics Advance Access originally published online on August 25, 2007
Bioinformatics 2007 23(21):2859-2865; doi:10.1093/bioinformatics/btm418
An improved algorithm for clustering gene expression data
¹Machine Intelligence Unit, Indian Statistical Institute, Kolkata-700108, ²Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235 and ³Department of Computer Science & Engineering, Jadavpur University, Kolkata 700032, India
*To whom correspondence should be addressed.
Abstract
Motivation: Recent advances in microarray technology allow the expression levels of a large number of genes to be monitored simultaneously over different time points. Clustering is an important tool for analyzing such microarray data, which are typically characterized by inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm is proposed that employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering.
Results: The proposed two-stage clustering algorithm is shown to be significantly superior to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), which are widely used methods for clustering gene expression data, on a variety of artificial and publicly available real-life data sets. The biological relevance of the clustering solutions is also analyzed.
Contact: anirbanbuba{at}yahoo.com
Supplementary information: The processed and normalized data sets, supplementary figures, tables and other related materials are available at http://d.1asphost.com/anirbanmukhopadhyay/simmts.html
Associate Editor: Olga Troyanskaya
Received on April 20, 2007; accepted on August 10, 2007
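
The abstract's idea of points having significant membership to multiple classes can be illustrated with ordinary Fuzzy C-Means. The sketch below is not the two-stage algorithm proposed in the article: it runs plain Fuzzy C-Means on toy two-dimensional data and flags points whose second-largest membership exceeds a cutoff. The function names, the fuzzifier m = 2, the cutoff of 0.3 and the toy data are all assumptions made for illustration, since the abstract does not give the authors' criterion.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain Fuzzy C-Means. Returns (centers, memberships), memberships[i][k] in [0, 1]."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )

        # Membership update: proportional to squared distance ** (-1 / (m - 1)).
        for i in range(n):
            d2 = [
                sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                for k in range(c)
            ]
            if min(d2) == 0.0:
                # Point coincides with a center: give it full membership there.
                hits = [1.0 if x == 0.0 else 0.0 for x in d2]
                total = sum(hits)
                u[i] = [h / total for h in hits]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                total = sum(inv)
                u[i] = [v / total for v in inv]
    return centers, u


def multi_class_points(memberships, threshold=0.3):
    """Indices of points whose second-largest membership is at least `threshold`.

    The threshold is a hypothetical cutoff, not the article's definition.
    """
    flagged = []
    for i, row in enumerate(memberships):
        top_two = sorted(row, reverse=True)[:2]
        if len(top_two) == 2 and top_two[1] >= threshold:
            flagged.append(i)
    return flagged


# Toy data: two tight groups plus one point roughly halfway between them.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (5.3, 5.3),
]
centers, u = fuzzy_c_means(points, c=2)
ambiguous = multi_class_points(u, threshold=0.3)
print("points with significant membership to more than one cluster:", ambiguous)
```

In this toy setting only the in-between point is split between the two clusters, while the remaining points belong almost entirely to one cluster. A real expression matrix would use one row per gene and one column per time point in place of the two-dimensional tuples.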