Bioinformatics Vol. 17 no. 4 2001
Pages 309-318
© 2001 Oxford University Press
Original Paper
Validating clustering for gene expression data
1 Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, USA
2 Radiology, Box 357115, University of Washington, Seattle, WA 98195, USA
Received on August 23, 2000; revised on November 23, 2000; accepted on December 1, 2000
Motivation: Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance.
Results: We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
Availability: The software is under development.
Contact: kayee{at}cs.washington.edu
Supplementary information: http://www.cs.washington.edu/homes/kayee/cluster or http://www.cs.washington.edu/homes/ruzzo/cluster
* To whom correspondence should be addressed.
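The leave-one-condition-out procedure described under Motivation can be sketched roughly as follows. This is an illustrative sketch, not the authors' software: the abstract does not specify the exact statistic, so the within-cluster root-mean-square deviation used here, the random-relabelling baseline, the small k-means routine, the synthetic data, and all function names (`kmeans`, `held_out_spread`, `leave_one_condition_out`, `make_demo_data`) are assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Basic Lloyd's k-means with farthest-point initialisation.

    Stand-in for whichever clustering algorithm is being assessed.
    """
    # Start from the first gene, then repeatedly add the gene that is
    # farthest from all centers chosen so far (deterministic).
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def held_out_spread(values, labels):
    """RMS deviation of held-out expression values from their cluster means
    (an assumed measure of within-cluster variation)."""
    sq = 0.0
    for j in np.unique(labels):
        v = values[labels == j]
        sq += ((v - v.mean()) ** 2).sum()
    return np.sqrt(sq / len(values))


def leave_one_condition_out(data, k, cluster_fn=kmeans, n_random=20, seed=0):
    """Cluster on all but one condition, score on the one left out.

    Returns (observed, chance): spreads summed over all left-out conditions,
    for the real clusters and for randomly relabelled clusters of equal sizes.
    """
    rng = np.random.default_rng(seed)
    observed, chance = 0.0, 0.0
    for e in range(data.shape[1]):
        train = np.delete(data, e, axis=1)   # hide condition e
        labels = cluster_fn(train, k)
        observed += held_out_spread(data[:, e], labels)
        # Chance baseline: shuffle the labels, keeping cluster sizes fixed.
        chance += np.mean([
            held_out_spread(data[:, e], rng.permutation(labels))
            for _ in range(n_random)
        ])
    return float(observed), float(chance)


def make_demo_data(seed=1, genes_per_group=20, noise=0.5):
    """Synthetic matrix (genes x conditions) with three planted groups."""
    rng = np.random.default_rng(seed)
    profiles = np.array([
        [4, 4, 4, 4, 4, 4],
        [-4, -4, -4, -4, -4, -4],
        [4, -4, 4, -4, 4, -4],
    ], dtype=float)
    clean = np.repeat(profiles, genes_per_group, axis=0)
    return clean + rng.normal(scale=noise, size=clean.shape)


data = make_demo_data()
observed, chance = leave_one_condition_out(data, k=3)
print(f"observed spread: {observed:.2f}, chance spread: {chance:.2f}")
```

On data with real group structure, the observed spread should come out lower than the chance spread. Passing a different `cluster_fn` allows several algorithms to be compared on the same data under the same measure.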