Bioinformatics Advance Access originally published online on January 31, 2007
Bioinformatics 2007 23(6):747-754; doi:10.1093/bioinformatics/btm010
An ensemble approach to microarray data-based gene prioritization after missing value imputation
1Department of Computer Science, The George Washington University, 801 22nd Street, Suite 704 and 2Department of Statistics and Biostatistics Center, The George Washington University, 2140 Pennsylvania Avenue, N.W., Washington, DC 20052, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Microarrays have been widely used to discover novel disease-related genes. Some types of microarray, such as cDNA arrays, usually contain a considerable proportion of missing values. When missing value imputation and gene prioritization are conducted sequentially, the missing values introduce uncertainty into the prioritization scores, so it is necessary to consider the distribution of those scores rather than a single set of values. We propose an ensemble approach to address this issue. A bootstrap procedure enables us to generate a resampled multivariate distribution of the prioritization scores and then to obtain the expected prioritization scores.
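As an illustration only, the bootstrap-then-average idea can be sketched in a few lines of Python. This is not the authors' MATLAB code. The abstract does not specify the imputation method, the prioritization score, or the resampling scheme, so the choices here are assumptions: row-mean imputation, a Welch t statistic as the score, and resampling of arrays with replacement within each group. All function names are hypothetical.

```python
import math
import random


def impute_row_means(matrix):
    """Assumed imputation: replace None with the mean of the observed values in that gene row."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if v is None else v for v in row])
    return imputed


def welch_t(x, y):
    """Assumed score: Welch t statistic; 0.0 when the standard error is numerically zero."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (mx - my) / se if se > 1e-12 else 0.0


def ensemble_scores(matrix, n_group1, n_boot=200, seed=0):
    """Average the per-gene score over bootstrap resamples of the arrays.

    matrix: list of gene rows; the first n_group1 columns are group 1,
    the remaining columns are group 2 (each group needs at least 2 arrays).
    Missing values are encoded as None.
    """
    rng = random.Random(seed)
    n_samples = len(matrix[0])
    g1 = list(range(n_group1))
    g2 = list(range(n_group1, n_samples))
    totals = [0.0] * len(matrix)
    for _ in range(n_boot):
        # Resample arrays (columns) with replacement within each group.
        cols = [rng.choice(g1) for _ in g1] + [rng.choice(g2) for _ in g2]
        resampled = [[row[j] for j in cols] for row in matrix]
        # Impute the resampled data, then score every gene.
        filled = impute_row_means(resampled)
        for i, row in enumerate(filled):
            totals[i] += welch_t(row[:n_group1], row[n_group1:])
    # The bootstrap mean serves as the expected prioritization score.
    return [t / n_boot for t in totals]
```

Genes would then be ranked by the absolute value of the averaged score. Because imputation is repeated inside every resample, the average reflects the variability that the missing values introduce instead of relying on a single imputed data set.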
Results: We used a published microarray two-sample data set to illustrate our approach. We focused on the following issues after missing value imputation: (i) concordance of gene prioritization and (ii) control of true and false positives. We compared our approach with the traditional non-ensemble approach to missing value imputation. We also evaluated the performance of the non-imputation approach when the theoretical test distribution was available. The results showed that the ensemble imputation approach clearly improved the concordance of gene prioritization and the control of true/false positives, especially when sample sizes were about 5–10 per group and missing rates were about 10–20%, a common situation for cDNA microarray studies.
Availability: The MATLAB code is freely available at http://home.gwu.edu/~ylai/research/Missing.
Contact: ylai{at}gwu.edu
Received on August 31, 2006; revised on December 26, 2006; accepted on January 14, 2007
This article has been cited by other articles:
R. Varshavsky, A. Gottlieb, D. Horn, and M. Linial. Unsupervised feature selection under perturbations: meeting the challenges of biological data. Bioinformatics, December 15, 2007; 23(24): 3343-3349.
