Bioinformatics Advance Access published online on October 10, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti708
1 Department of Mathematics, University of Oslo, PO Box 1053, Blindern, NO-0316 Oslo, Norway
* To whom correspondence should be addressed.
Motivation: Missing values are problematic for the analysis of microarray data. Imputation methods have so far been compared in simulation experiments by the similarity between imputed and true values, not by their influence on the final analysis. The focus has also been on values missing at random, although entries can be missing not at random as well.

Results: We investigate the influence of imputation on the detection of differentially expressed genes from cDNA microarray data. We apply ANOVA for microarrays and SAM, and look at the differentially expressed genes that are lost because of imputation. We show that this new measure provides useful information that the traditional root mean squared error cannot capture. We also show that the type of missingness matters: imputing 5% missing not at random has the same effect as imputing 10-30% missing at random. We propose a new imputation method (LinImp), which fits a simple linear model for each channel separately, and compare it with the widely used KNNimpute method. For 10% missing at random, KNNimpute leads to twice as many lost differentially expressed genes as LinImp.

Availability: The R package for LinImp is available from http://folk.uio.no/idasch/imp.

Supplementary information: http://folk.uio.no/idasch/imp.
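The two evaluation measures contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `rmse` and `lost_genes`, the gene labels, and the toy numbers are all hypothetical, and the sets of detected genes are assumed to come from some external test such as ANOVA or SAM.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean squared error between true and imputed entries."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


def lost_genes(detected_complete, detected_after_imputation):
    """Genes detected on the complete data but no longer detected after imputation."""
    return set(detected_complete) - set(detected_after_imputation)


# Hypothetical toy data: true values of entries that were masked, and their imputations.
true_vals = [1.0, 2.0, 3.0]
imputed_vals = [1.5, 2.5, 3.5]

# Hypothetical gene lists from a differential expression test (for example SAM),
# run once on the complete data and once on the imputed data.
complete_hits = {"g1", "g2", "g3"}
imputed_hits = {"g1", "g3", "g4"}

print(rmse(true_vals, imputed_vals))
print(sorted(lost_genes(complete_hits, imputed_hits)))
```

The first measure depends only on the numerical distance between imputed and true values, while the second depends on how the downstream test responds to the imputed data, which is why the two can rank imputation methods differently.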
Received June 9, 2005
Revised September 20, 2005
Accepted October 5, 2005
Article
The influence of missing value imputation on detection of differentially expressed genes from microarray data
2 Department of Statistical Analysis, Image Analysis and Pattern Recognition, Norwegian Computing Center, NO-0314 Oslo, Norway
3 Department of Radiation Biology, The Norwegian Radium Hospital, NO-0310 Oslo, Norway
4 Department of Statistical Analysis, Image Analysis and Pattern Recognition, Norwegian Computing Center, NO-0314 Oslo, Norway; Department of Biostatistics, University of Oslo, NO-0317 Oslo, Norway
Ida Scheel, E-mail: idasch{at}math.uio.no
