


Bioinformatics Advance Access published online on August 23, 2005

Bioinformatics, doi:10.1093/bioinformatics/bti638
All versions of this article: 21/22/4155 (most recent); bti638v1

© The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
Received September 24, 2004
Revised August 2, 2005
Accepted August 17, 2005


DNA microarray data imputation and significance analysis of differential expression

Rebecka Jörnsten 1*, Hui-Yu Wang 2, William J. Welsh 3, and Ming Ouyang 3

1 Department of Statistics, Rutgers, the State University of New Jersey, New Brunswick, NJ 08903, USA
2 Stoecker Road, Holmdel, NJ 07733, USA
3 Department of Pharmacology, Robert Wood Johnson Medical School, and Informatics Institute, University of Medicine and Dentistry of New Jersey, Piscataway, NJ 08854, USA

* To whom correspondence should be addressed.
Rebecka Jörnsten, E-mail: rebecka{at}stat.rutgers.edu


Abstract

Motivation: Significance analysis of differential expression in DNA microarray data is an important task, and much current research focuses on developing improved tests and software tools. The task is difficult not only because of the high dimensionality of the data (the number of genes) but also because of the often non-negligible fraction of missing values. There is thus a great need to impute these missing values reliably before statistical analysis. Many imputation methods have been developed for DNA microarray data, but their impact on subsequent statistical analyses has not been well studied. In this work we examine how missing values and their imputation affect significance analysis of differential expression.

Results: We develop a new imputation method, LinCmb, that outperforms widely used methods in terms of normalized root mean squared error (NRMSE). Its estimates are convex combinations of the estimates produced by existing methods. We find that LinCmb adapts to the structure of the data: if the data are heterogeneous or have few missing values, LinCmb puts more weight on local imputation methods; if the data are homogeneous or have many missing values, it puts more weight on global imputation methods. LinCmb is therefore also a useful tool for understanding the relative merits of different imputation methods. We further demonstrate that missing values affect significance analysis. The simulations use two data sets, varying amounts of missing values, several imputation methods, and three tests: the standard t-test, the regularized t-test (Baldi and Long, 2001), and ANOVA. We conclude that good imputation alleviates the impact of missing values and should be an integral part of microarray data analysis. The most competitive methods are LinCmb, GMC (Ouyang et al., 2004), and BPCA (Oba et al., 2003). Popular imputation schemes such as SVD, row mean, and KNN all exhibit high variance and poor performance. The regularized t-test is less affected by missing values than the standard t-test.
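The abstract describes LinCmb only as a convex combination of existing methods' estimates, evaluated by NRMSE. The following Python sketch illustrates that idea for two hypothetical base methods (called "local" and "global" here). Several details are assumptions, not the authors' stated procedure: the weight is fitted by least squares on entries whose true values are known (for example, observed values hidden artificially), only two methods are combined, and NRMSE is taken as the RMSE divided by the standard deviation of the true values (one common normalization).

```python
import math
from typing import List, Sequence


def nrmse(estimates: Sequence[float], truth: Sequence[float]) -> float:
    """Normalized RMSE: RMSE of the estimates divided by the standard deviation
    of the true values (assumed normalization; definitions vary between papers)."""
    n = len(truth)
    mse = sum((e - t) ** 2 for e, t in zip(estimates, truth)) / n
    mean_t = sum(truth) / n
    var_t = sum((t - mean_t) ** 2 for t in truth) / n
    if var_t == 0.0:
        raise ValueError("true values have zero variance; NRMSE is undefined")
    return math.sqrt(mse / var_t)


def convex_weight(est_a: Sequence[float], est_b: Sequence[float],
                  truth: Sequence[float]) -> float:
    """Weight w in [0, 1] minimizing sum((w*a + (1 - w)*b - t)^2).

    Setting the derivative to zero gives w = <a - b, t - b> / ||a - b||^2.
    The objective is a convex quadratic in w, so clipping that value to
    [0, 1] gives the constrained minimizer."""
    num = sum((a - b) * (t - b) for a, b, t in zip(est_a, est_b, truth))
    den = sum((a - b) ** 2 for a, b in zip(est_a, est_b))
    if den == 0.0:
        return 0.5  # the two methods agree everywhere; any weight is optimal
    return min(1.0, max(0.0, num / den))


def combine(est_a: Sequence[float], est_b: Sequence[float], w: float) -> List[float]:
    """Convex combination w*a + (1 - w)*b of two sets of imputed values."""
    return [w * a + (1.0 - w) * b for a, b in zip(est_a, est_b)]


if __name__ == "__main__":
    # Hypothetical held-out entries: true values and two methods' imputations.
    truth = [1.0, 2.0, 3.0, 4.0]
    local = [1.4, 1.8, 3.4, 4.2]
    global_ = [0.8, 2.3, 2.8, 3.7]
    w = convex_weight(local, global_, truth)
    combined = combine(local, global_, w)
    print(f"weight on local method: {w:.3f}")
    print(f"NRMSE local={nrmse(local, truth):.3f} "
          f"global={nrmse(global_, truth):.3f} "
          f"combined={nrmse(combined, truth):.3f}")
```

Because w = 0 and w = 1 are both feasible, the fitted combination never has a larger error on the fitting entries than either base method alone; whether that advantage carries over to the truly missing entries is what a hold-out evaluation is meant to check.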






