Bioinformatics Vol. 17 no. 6 2001
Pages 520-525
© 2001 Oxford University Press

Missing value estimation methods for DNA microarrays

Olga Troyanskaya 1, Michael Cantor 1, Gavin Sherlock 2, Pat Brown 3, Trevor Hastie 4, Robert Tibshirani 4, David Botstein 2 and Russ B. Altman 1,*

1 Stanford Medical Informatics
2 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
3 Department of Biochemistry, Stanford University School of Medicine, and Howard Hughes Medical Institute, Stanford, CA, USA
4 Departments of Statistics and Health Research and Policy, Stanford University, Stanford, CA, USA

Received on November 13, 2000; revised on February 22, 2001; accepted on February 26, 2001

Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.

Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1–20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
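As an illustration of two of the approaches named above, the following is a minimal sketch of row-average imputation and a weighted K-nearest-neighbors imputation. It is not the authors' KNNimpute software. Several details are assumptions made only for this example: the function names `row_average_impute` and `knn_impute`, the encoding of missing values as NaN, rows representing genes and columns representing arrays, a Euclidean-style distance computed over the columns both genes have observed, and weights equal to the inverse of that distance.

```python
import numpy as np


def row_average_impute(X):
    """Fill each missing entry (NaN) with the mean of the observed values in its row."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = row_means[rows]
    return filled


def knn_impute(X, k=10):
    """Fill each missing entry with a distance-weighted average over the k most similar rows.

    Assumptions for this sketch (not taken from the abstract):
    - similarity is a root-mean-square difference over columns observed in both rows;
    - each neighbor is weighted by the inverse of its distance.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    n_rows = X.shape[0]

    for i in np.where(missing.any(axis=1))[0]:
        observed_i = ~missing[i]
        for j in np.where(missing[i])[0]:
            candidates = []
            for r in range(n_rows):
                # A neighbor must be a different row with a value in column j.
                if r == i or missing[r, j]:
                    continue
                shared = observed_i & ~missing[r]
                if not shared.any():
                    continue
                diff = X[i, shared] - X[r, shared]
                dist = np.sqrt(np.mean(diff ** 2))
                candidates.append((dist, X[r, j]))
            if not candidates:
                continue  # no usable neighbor: leave the entry as NaN
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            dists = np.array([d for d, _ in nearest])
            values = np.array([v for _, v in nearest])
            weights = 1.0 / (dists + 1e-12)  # small constant avoids division by zero
            filled[i, j] = np.sum(weights * values) / np.sum(weights)
    return filled


if __name__ == "__main__":
    expression = np.array([
        [0.0, 0.0, 2.0],
        [1.0, 1.0, np.nan],
        [3.0, 3.0, 8.0],
    ])
    print(row_average_impute(expression))
    print(knn_impute(expression, k=2))
```

In this sketch the row-average estimate uses only the gene's own observed values, whereas the nearest-neighbor estimate borrows the missing column's value from genes with similar profiles, which is the intuition behind comparing the two.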

Availability: The software is available at http://smi-web.stanford.edu/projects/helix/pubs/impute/

Contact: russ.altman@stanford.edu

* To whom correspondence should be addressed.



