Bioinformatics Advance Access originally published online on August 27, 2004
Bioinformatics 2005 21(2):187-198; doi:10.1093/bioinformatics/bth499
Bioinformatics vol. 21 issue 2 © Oxford University Press 2005; all rights reserved.
Missing value estimation for DNA microarray gene expression data: local least squares imputation
1 Department of Computer Science and Engineering, University of Minnesota Twin Cities, 200 Union Street S.E., Minneapolis, MN 55455, USA
2 Computer Science Department, Stanford University, Gates Building 2B #280, Stanford, CA 94305-9025, USA
3 The National Science Foundation, 4201 Wilson Boulevard, Arlington, VA 22230, USA
*To whom correspondence should be addressed.
Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed because many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, we propose imputation methods based on the least squares formulation to estimate missing values in gene expression data. These methods exploit local similarity structures in the data as well as the least squares optimization process.
Results: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen either as the k nearest neighbors or as the k coherent genes that have large absolute values of the Pearson correlation coefficient. A non-parametric version of LLSimpute is obtained by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data.
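The idea of representing a target gene as a linear combination of similar genes can be illustrated with a minimal sketch. This is not the LLSimpute software itself: the function name `lls_impute_gene`, the use of Euclidean distance to rank neighbors, and the restriction of candidates to genes with no missing values are all simplifying assumptions made here for illustration. The automatic k-value estimator is not shown.

```python
import numpy as np

def lls_impute_gene(data, target_idx, k):
    """Fill the NaN entries of one gene (row) by a local least squares fit.

    data: 2-D float array, genes x experiments, NaN marks a missing value.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    target = data[target_idx]
    missing = np.isnan(target)
    observed = ~missing

    # Assumption: only genes with no missing values are neighbor candidates.
    complete = ~np.isnan(data).any(axis=1)
    complete[target_idx] = False
    candidates = data[complete]

    # Assumption: rank candidates by Euclidean distance to the target,
    # measured over the experiments where the target is observed.
    dists = np.linalg.norm(candidates[:, observed] - target[observed], axis=1)
    neighbors = candidates[np.argsort(dists)[:k]]

    A = neighbors[:, observed]   # neighbor values where the target is known
    B = neighbors[:, missing]    # neighbor values where the target is missing
    w = target[observed]         # known part of the target gene

    # Least squares coefficients x such that A.T @ x approximates w.
    x, *_ = np.linalg.lstsq(A.T, w, rcond=None)

    # The same linear combination of neighbors estimates the missing entries.
    filled = target.copy()
    filled[missing] = B.T @ x
    return filled

# Toy usage: gene 0 equals 2 * gene 1 + gene 2, with its last value missing.
data = np.array([
    [2.0, 1.0, 5.0, 4.0, np.nan],
    [1.0, 0.0, 2.0, 1.0, 3.0],
    [0.0, 1.0, 1.0, 2.0, 1.0],
    [10.0, 10.0, 10.0, 10.0, 10.0],
])
filled = lls_impute_gene(data, target_idx=0, k=2)
```

To select the k coherent genes instead, the distance ranking in this sketch would be replaced by a ranking on the absolute Pearson correlation with the target gene; the least squares step would stay the same.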
Availability: The software is available at http://www.cs.umn.edu/~hskim/tools.html
Contact: hpark{at}cs.umn.edu



