Bioinformatics Advance Access originally published online on January 29, 2004
Bioinformatics 20(6) © Oxford University Press 2004; all rights reserved.
Gaussian mixture clustering and imputation of microarray data
1 Environmental and Occupational Health Sciences Institute, UMDNJ-Robert Wood Johnson Medical School and Rutgers, The State University of New Jersey, 170 Frelinghuysen Road, Piscataway, NJ 08854, USA and 2 Department of Pharmacology, UMDNJ-Robert Wood Johnson Medical School and Informatics Institute, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, NJ 08854, USA
Received on September 15, 2003; accepted on November 4, 2003
Advance Access Publication January 29, 2004
Motivation: In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require complete data, so either the genes with missing entries are excluded or the missing entries are filled with estimates before the analysis. This study compares methods of missing value estimation.
Results: Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between the clustering obtained with true values and the clustering obtained with imputed values; it examines the bias that imputation introduces into clustering. Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods compared, according to both evaluation metrics, on both time-series (correlated) and non-time-series (uncorrelated) data sets.
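The first metric can be illustrated with a minimal Python sketch. It is not the authors' Matlab code. The function name `rmse_on_missing`, the list-of-rows data layout, and the choice to average only over entries flagged as missing (with no normalization) are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def rmse_on_missing(true_values, imputed_values, missing_mask):
    """Root mean squared error between true and imputed values.

    Assumption: the error is averaged only over entries whose mask is True
    (the entries that were hidden and then imputed). All three arguments are
    lists of rows (genes) with one value per column (chip).
    """
    squared_errors = [
        (t - e) ** 2
        for t_row, e_row, m_row in zip(true_values, imputed_values, missing_mask)
        for t, e, m in zip(t_row, e_row, m_row)
        if m
    ]
    if not squared_errors:
        raise ValueError("no missing entries to evaluate")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: two genes, two chips, second column was imputed.
true_values = [[1.0, 2.0], [3.0, 4.0]]
imputed_values = [[1.0, 2.5], [3.0, 3.5]]
missing_mask = [[False, True], [False, True]]
error = rmse_on_missing(true_values, imputed_values, missing_mask)
```

In an evaluation of this kind, known entries are typically hidden on purpose so that the true values are available for comparison after imputation.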
Availability: Matlab code is available on request from the authors.
Contact: ouyang{at}fidelio.rutgers.edu
* To whom correspondence should be addressed at Informatics Institute, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, NJ 08854, USA.

