Bioinformatics Vol. 18 no. 3 2002
Pages 413-422
© 2002 Oxford University Press
A mixture model-based approach to the clustering of microarray expression data
Department of Mathematics, University of Queensland, Brisbane, Queensland 4072, Australia
Received on August 30, 2001; revised on October 26, 2001; accepted on November 2, 2001
Motivation: This paper introduces the software EMMIX-GENE, which has been developed specifically for a model-based approach to the clustering of microarray expression data, in particular the clustering of tissue samples on a very large number of genes. This is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is to first select a subset of the genes relevant for the clustering of the tissue samples, by fitting mixtures of t-distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. Imposing a threshold on the likelihood ratio statistic, in conjunction with a threshold on the size of a cluster, allows a relevant set of genes to be selected. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, so mixtures of factor analyzers are used to reduce effectively the dimension of the feature space of genes.
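The gene-selection step described above can be illustrated with a minimal sketch. It is not the EMMIX-GENE implementation: for brevity it swaps in univariate normal mixtures fitted by a basic EM algorithm in place of mixtures of t-distributions, and the function names, the median-split initialization, the variance floor, and the threshold values are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def normal_logpdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def loglik_one_component(xs, var_floor=1e-3):
    """Maximized log-likelihood of a single normal fitted to xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, var_floor)
    return sum(normal_logpdf(x, mean, var) for x in xs)


def fit_two_components(xs, n_iter=200, var_floor=1e-3):
    """EM for a two-component normal mixture, started from a median split.

    Returns (log-likelihood, size of the smaller cluster under hard assignment).
    """
    n = len(xs)
    srt = sorted(xs)
    half = n // 2
    means = [sum(srt[:half]) / half, sum(srt[half:]) / (n - half)]
    grand = sum(xs) / n
    var0 = max(sum((x - grand) ** 2 for x in xs) / n, var_floor)
    variances = [var0, var0]
    weights = [0.5, 0.5]

    def log_terms(x):
        # Log of (mixing proportion * component density) for each component.
        return [math.log(weights[k]) + normal_logpdf(x, means[k], variances[k])
                for k in range(2)]

    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership.
        resp = []
        for x in xs:
            logs = log_terms(x)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update proportions, means, and (floored) variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-8)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                var_floor,
            )

    loglik = 0.0
    first = 0
    for x in xs:
        logs = log_terms(x)
        top = max(logs)
        loglik += top + math.log(sum(math.exp(v - top) for v in logs))
        if logs[0] >= logs[1]:
            first += 1
    return loglik, min(first, n - first)


def lr_statistic(xs):
    """Return (-2 log lambda for one versus two components, smaller cluster size)."""
    loglik2, smallest = fit_two_components(xs)
    return max(0.0, 2.0 * (loglik2 - loglik_one_component(xs))), smallest


def select_genes(expr, lr_threshold, min_cluster_size):
    """Keep genes passing both thresholds; expr[g] holds gene g across tissues.

    Returned indices are sorted by decreasing statistic (a choice made here).
    """
    kept = []
    for g, xs in enumerate(expr):
        stat, smallest = lr_statistic(xs)
        if stat > lr_threshold and smallest >= min_cluster_size:
            kept.append((stat, g))
    kept.sort(reverse=True)
    return [g for _, g in kept]


if __name__ == "__main__":
    # Synthetic data: one clearly bimodal gene and one near-normal gene.
    bimodal = [c + 0.05 * (i - 9.5) for c in (-3.0, 3.0) for i in range(20)]
    unimodal = [NormalDist().inv_cdf((i + 0.5) / 40) for i in range(40)]
    print(select_genes([bimodal, unimodal], lr_threshold=8.0, min_cluster_size=5))
```

The two thresholds play different roles: the statistic flags genes whose expression looks like a mixture of two groups, while the minimum cluster size guards against genes where the second component merely captures a few outlying tissues. The values 8.0 and 5 used in the demo are placeholders, not settings taken from the paper.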
Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues, which are consistent either with the external classification of the tissues or with background and biological knowledge of these sets.
Availability: EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
Contact: gjm{at}maths.uq.edu.au