Bioinformatics Advance Access originally published online on February 24, 2005
Bioinformatics 2005 21(10):2417-2423; doi:10.1093/bioinformatics/bti345
Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data
Gippsland School of Computing and Information Technology, Monash University VIC 3842, Australia
*To whom correspondence should be addressed.
Motivation: Microarray data are used in a range of application areas in biology, but they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is strong motivation to estimate them as accurately as possible before applying these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be undertaken accurately. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented, which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least squares regression and linear programming methods.
Results: The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested on three separate non-time-series (ovarian cancer) datasets and one time-series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, with missing values introduced at random over a wide range of probabilities, from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which contained 1.7% actual missing values, to test the hypothesis that CMVE performs better not only for randomly occurring missing values but also for a real distribution of missing values. The results confirmed that CMVE consistently estimated missing values more accurately and more robustly than the other methods for both time-series and non-time-series data, at the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm.
Availability: The CMVE software is available upon request from the authors.
Contact: Shoaib.Sehgal{at}infotech.monash.edu.au
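The evaluation protocol described under Results (remove known values at random with a given probability, impute them, then score the estimate with NRMS error) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the NRMS error is assumed to be the RMS of the estimation error divided by the RMS of the original matrix (the abstract does not give the formula), and a simple row-mean imputer stands in for CMVE, BPCA, LSImpute and KNN, none of which are implemented here.

```python
import numpy as np


def introduce_missing(data, prob, rng):
    """Return a copy of `data` with each entry set to NaN with probability `prob`."""
    mask = rng.random(data.shape) < prob
    corrupted = data.copy()
    corrupted[mask] = np.nan
    return corrupted, mask


def row_mean_impute(corrupted):
    """Placeholder imputer (NOT CMVE): fill each gap with its row's observed mean."""
    filled = corrupted.copy()
    observed = ~np.isnan(corrupted)
    counts = observed.sum(axis=1)
    sums = np.where(observed, corrupted, 0.0).sum(axis=1)
    # Fall back to the global observed mean if a row has no observed values.
    global_mean = sums.sum() / max(counts.sum(), 1)
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    rows, _ = np.nonzero(~observed)
    filled[~observed] = row_means[rows]
    return filled


def nrms_error(original, estimated):
    """Assumed definition: RMS(original - estimated) / RMS(original)."""
    diff = original - estimated
    return np.sqrt(np.mean(diff ** 2)) / np.sqrt(np.mean(original ** 2))


def evaluate(data, probs, seed=0):
    """Score the placeholder imputer at each missing-value probability."""
    rng = np.random.default_rng(seed)
    results = {}
    for p in probs:
        corrupted, _ = introduce_missing(data, p, rng)
        results[p] = nrms_error(data, row_mean_impute(corrupted))
    return results


if __name__ == "__main__":
    # Synthetic stand-in for a complete genes-by-samples expression matrix.
    demo = np.random.default_rng(42).normal(size=(200, 10))
    for p, err in evaluate(demo, [0.01, 0.05, 0.1, 0.2]).items():
        print(f"missing probability {p:.2f}: NRMS error {err:.4f}")
```

To compare methods in the way the abstract describes, any other imputer with the same input and output shape could be swapped in for `row_mean_impute` while the masking and scoring steps stay unchanged.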


