Bioinformatics Vol. 19 no. 16 2003
pages 2088-2096
© 2003 Oxford University Press
A Bayesian missing value estimation method for gene expression profile data
1 Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma 630-0192, Japan, 2 ATR Human Information Science Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan, 3 Graduate School of Medicine, Osaka University, 2-2 Yamadaoka, Suita, Osaka, Japan, 4 DNA Chip Research Institute, 134 Kobecho, Hodogayaku, Yokohama, Japan and 5 CREST, Japan Science and Technology Corporation
Received on March 10, 2003; revised on May 6, 2003; accepted on May 9, 2003
Motivation: Gene expression profile analyses have been used in numerous studies covering a broad range of areas in biology. When unreliable measurements are excluded, missing values are introduced in gene expression profiles. Although existing multivariate analysis methods have difficulty handling missing values, this problem has received little attention. There are many options for dealing with missing values, and they can lead to drastically different results. Ignoring missing values is the simplest method and is frequently applied, but it has its flaws. In this article, we propose an estimation method for missing values based on Bayesian principal component analysis (BPCA). Although simultaneously estimating a probabilistic model and latent variables within the framework of Bayesian inference is not new in principle, an actual BPCA implementation that makes it possible to estimate arbitrary missing variables is new in terms of statistical methodology.
Results: When applied to DNA microarray data from various experimental conditions, the BPCA method exhibited markedly better estimation ability than other recently proposed methods, such as singular value decomposition and K-nearest neighbors. While the estimation performance of existing methods depends on model parameters whose determination is difficult, our BPCA method is free from this difficulty. Accordingly, the BPCA method provides accurate and convenient estimation for missing values.
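As a rough illustration of how such a comparison can be set up, the sketch below hides known entries of a synthetic expression matrix, fills them in, and scores the estimates. It is not the BPCA method and not the authors' software: it uses a plain iterative low-rank SVD fill as a stand-in for an SVD-style method and a row-mean fill as a baseline. The function names (`svd_impute`, `nrmse`), the rank, the matrix size, the 5% missing rate, and the normalized root mean squared error score are all assumptions made for this example.

```python
import numpy as np


def svd_impute(X, rank=2, n_iter=100, tol=1e-6):
    """Fill NaN entries by repeatedly projecting onto a rank-`rank` SVD.

    Hypothetical helper for illustration; this is NOT the BPCA algorithm.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: replace each missing entry with its row (gene) mean.
    row_means = np.nanmean(X, axis=1, keepdims=True)
    filled = np.where(missing, row_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only missing entries are updated; observed values stay fixed.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change < tol:
            break
    return filled


def nrmse(estimate, truth, missing):
    """Root mean squared error over hidden entries, scaled by their std."""
    err = estimate[missing] - truth[missing]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[missing])


# Synthetic "expression matrix": 60 genes x 12 samples, close to rank 2.
rng = np.random.default_rng(0)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 12))
truth += 0.05 * rng.normal(size=truth.shape)

# Hide about 5% of the entries to imitate excluded measurements.
missing = rng.random(truth.shape) < 0.05
observed = np.where(missing, np.nan, truth)

# Baseline: fill each gap with the mean of that gene's observed values.
mean_filled = np.where(
    missing, np.nanmean(observed, axis=1, keepdims=True), observed
)
svd_filled = svd_impute(observed, rank=2)

mean_err = nrmse(mean_filled, truth, missing)
svd_err = nrmse(svd_filled, truth, missing)
print(f"row-mean NRMSE: {mean_err:.3f}")
print(f"SVD NRMSE:      {svd_err:.3f}")
```

Note that `rank` has to be chosen by hand here. This is the kind of model parameter whose determination the abstract describes as difficult for existing methods.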
Availability: The software is available at http://hawaii.aist-nara.ac.jp/~shige-o/tools/
Contact: ishii{at}is.aist-nara.ac.jp
* To whom correspondence should be addressed.


