Bioinformatics Advance Access originally published online on May 27, 2004
Bioinformatics 20(11) © Oxford University Press 2004; all rights reserved.
Correcting the loss of cell-cycle synchrony in clustering analysis of microarray data using weights
Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA
Received on July 11, 2003; revised on February 23, 2004; accepted on February 27, 2004
Advance Access Publication May 27, 2004
Motivation: Because synchrony is lost over time in cell-cycle data sets, standard clustering methods (e.g. k-means), which group open reading frames (ORFs) by similarity of expression levels, are deficient unless the temporal pattern of those expression levels is taken into account.
Methods: We propose to improve the performance of the k-means method by assigning a decreasing weight at its variable level, and we evaluate the weighted k-means on a yeast cell-cycle data set. Protein complexes from a public website are used as biological benchmarks. To compare the k-means clusters with the structures of the protein complexes, we measure the agreement between these two groupings using the adjusted Rand index.
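The agreement measure named above can be sketched in a few lines. This is a minimal illustration that assumes the common Hubert-Arabie form of the adjusted Rand index and assumes each ORF carries exactly one cluster label and one complex label; the function name adjusted_rand_index is hypothetical and is not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items.

    Assumption: this is the Hubert-Arabie form; labels_a[i] and
    labels_b[i] are the two group labels of item i.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially

    # Pairs of items placed together in both partitions (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single group
    return (together_both - expected) / (maximum - expected)
```

A value of 1 means the two groupings are identical up to relabeling, and values near 0 mean the agreement is no better than chance.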
Results: Our results show that the time-decreasing weight function exp[-(1/2)(t^2/C^2)], which we assign to the variable level of k-means, generally increases the agreement between protein complexes and k-means clusters when C is near the length of two cell cycles.
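As a rough sketch of how such a weight might enter k-means, the code below reads the weight as exp[-(1/2)(t^2/C^2)] (the minus sign is assumed from "time-decreasing") and applies it to each time point inside a squared Euclidean distance. The function names, the choice of distance, the use of explicit initial centers, and the requirement that t and C share the same time unit are all assumptions made for illustration, not details confirmed by the abstract.

```python
import math


def time_weights(times, C):
    """Weight exp[-(1/2)(t^2/C^2)] for each sampling time t (same unit as C)."""
    return [math.exp(-0.5 * (t / C) ** 2) for t in times]


def weighted_sq_dist(x, center, weights):
    """Squared Euclidean distance with one weight per time point (assumed form)."""
    return sum(w * (xi - ci) ** 2 for xi, ci, w in zip(x, center, weights))


def weighted_kmeans(profiles, weights, init_centers, max_iter=100):
    """Lloyd-style k-means using the weighted distance above.

    profiles: one expression profile (list of floats) per ORF.
    init_centers: starting centers, supplied by the caller (hypothetical choice).
    """
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center under the weighted distance.
        new_labels = [
            min(range(len(centers)),
                key=lambda j: weighted_sq_dist(x, centers[j], weights))
            for x in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: the plain mean minimizes the weighted squared distance.
        for j in range(len(centers)):
            members = [x for x, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

With C set near the length of two cell cycles, early time points keep weights close to 1 while later, less synchronized time points contribute progressively less to the distance.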
Contact: Heping.Zhang{at}Yale.edu