Bioinformatics Advance Access published online on February 24, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl060
1 Computer Engineering and Networks Laboratory, ETH Zurich, 8092 Zurich, Switzerland
* To whom correspondence should be addressed.
Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, often referred to as biclustering, makes it possible to identify sets of genes that share compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and data sets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters or to other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines for choosing a biclustering method are currently available.
Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough that all optimal groupings can be determined exactly; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms, together with the reference model and a hierarchical clustering method, on various synthetic and real data sets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (i) biclustering in general has advantages over a conventional hierarchical clustering approach, (ii) there are considerable performance differences between the tested methods, and (iii) even the simple reference model delivers relevant patterns in all considered settings.
Availability: The data sets used, the outcomes of the biclustering algorithms, and the Bimax implementation for the reference model are available at http://www.tik.ee.ethz.ch/sop/bimax.
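
To make the idea of a binary reference model concrete, the following is a minimal sketch and not the Bimax algorithm itself. It assumes, beyond what the abstract states, that expression values are thresholded to 0/1 and that an optimal grouping is an all-ones submatrix that cannot be extended by any further gene or sample. The function names `binarize` and `maximal_biclusters` and the threshold value are hypothetical. The enumeration is brute force over sample subsets and only suitable for toy inputs; the divide-and-conquer strategy mentioned in the abstract is not reproduced here.

```python
from itertools import combinations


def binarize(expression, threshold):
    """Turn a real-valued expression matrix into 0/1 (assumed discretization)."""
    return [[1 if value >= threshold else 0 for value in row] for row in expression]


def maximal_biclusters(matrix):
    """Enumerate all-ones submatrices that cannot be extended (brute force).

    Rows stand for genes, columns for samples. The running time is
    exponential in the number of columns, so this is for illustration only.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # All genes that are 1 in every chosen sample.
            rows = [r for r in range(n_rows) if all(matrix[r][c] for c in cols)]
            if not rows:
                continue
            # All samples that are 1 in every one of those genes.
            closed = tuple(
                c for c in range(n_cols) if all(matrix[r][c] for r in rows)
            )
            # Keep the pair only if no further sample could be added;
            # the gene set is already as large as possible by construction.
            if closed == cols:
                found.append((tuple(rows), cols))
    return found


if __name__ == "__main__":
    toy = [
        [1, 1, 0],
        [1, 1, 1],
        [0, 1, 1],
    ]
    for genes, samples in maximal_biclusters(toy):
        print("genes", genes, "samples", samples)
```

Each result pairs a set of gene indices with a set of sample indices. On realistic matrices the subset enumeration becomes infeasible, which is the kind of cost a dedicated algorithm is meant to avoid.
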
Received July 27, 2005
Revised January 4, 2006
Accepted February 15, 2006
Article
A systematic comparison and evaluation of biclustering methods for gene expression data
Amela Prelić 1, Stefan Bleuler 1, Philip Zimmermann 2, Anja Wille 3, Peter Bühlmann 4, Wilhelm Gruissem 2, Lars Hennig 2, Lothar Thiele 1, and Eckart Zitzler 1 *
2 Institute for Plant Sciences and Functional Genomics Center Zurich, ETH Zurich, 8092 Zurich, Switzerland
3 Colab, ETH Zurich, 8092 Zurich, Switzerland; Seminar for Statistics, ETH Zurich, 8092 Zurich, Switzerland
4 Seminar for Statistics, ETH Zurich, 8092 Zurich, Switzerland
Eckart Zitzler, E-mail: zitzler{at}tik.ee.ethz.ch
Associate Editor: Alfonso Valencia




