Bioinformatics Advance Access published online on May 19, 2005

Bioinformatics, doi:10.1093/bioinformatics/bti499

© The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
Received April 6, 2005
Revised April 28, 2005
Accepted May 12, 2005

Article

Prediction error estimation: a comparison of resampling methods

Annette M. Molinaro 1*, Richard Simon 2, and Ruth M. Pfeiffer 3

1 Biostatistics Branch, Division of Cancer Epidemiology and Genetics, NCI, NIH, Rockville, MD 20852; Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520
2 Biometric Research Branch, Division of Cancer Treatment and Diagnostics, NCI, NIH, Rockville, MD 20852
3 Biostatistics Branch, Division of Cancer Epidemiology and Genetics, NCI, NIH, Rockville, MD 20852

* To whom correspondence should be addressed.
Annette M. Molinaro, E-mail: annette.molinaro{at}yale.edu


Abstract

Motivation: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection, and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the ‘true’ prediction error of a prediction model in the presence of feature selection.
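As an illustration of assessing prediction error "in the presence of feature selection", the following minimal Python sketch repeats feature selection inside every cross-validation fold, so that held-out samples never influence which features are chosen. It is not the authors' R code. The function names, the mean-difference feature ranking, and the nearest-centroid classifier are assumptions chosen for brevity; they stand in for the selection rules and classifiers studied in the article.

```python
import random

def class_means(X, y, j):
    """Mean of feature j within class 0 and within class 1."""
    v0 = [x[j] for x, lab in zip(X, y) if lab == 0]
    v1 = [x[j] for x, lab in zip(X, y) if lab == 1]
    return sum(v0) / len(v0), sum(v1) / len(v1)

def select_features(X, y, k):
    """Indices of the k features with the largest absolute mean difference."""
    scores = []
    for j in range(len(X[0])):
        m0, m1 = class_means(X, y, j)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def fit_and_score(X_tr, y_tr, X_te, y_te, k):
    """Select features and fit a nearest-centroid rule on the training part
    only, then return the number of misclassified test samples."""
    feats = select_features(X_tr, y_tr, k)
    cents = [class_means(X_tr, y_tr, j) for j in feats]  # (mean0, mean1)
    wrong = 0
    for x, lab in zip(X_te, y_te):
        d0 = sum((x[j] - c[0]) ** 2 for j, c in zip(feats, cents))
        d1 = sum((x[j] - c[1]) ** 2 for j, c in zip(feats, cents))
        pred = 0 if d0 <= d1 else 1
        wrong += pred != lab
    return wrong

def resubstitution_error(X, y, k):
    """Error on the same samples used for selection and fitting."""
    return fit_and_score(X, y, X, y, k) / len(y)

def cv_error(X, y, k, n_folds, rng):
    """k-fold CV error; n_folds == len(y) gives leave-one-out CV."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(n_folds):
        test = idx[f::n_folds]          # every sample is held out exactly once
        held = set(test)
        train = [i for i in idx if i not in held]
        wrong += fit_and_score([X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in test], [y[i] for i in test], k)
    return wrong / len(y)

# Simulated data with no real signal (an assumption made for this demo only).
rng = random.Random(0)
n, p, k = 40, 500, 10
y = [i % 2 for i in range(n)]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
print("resubstitution:", resubstitution_error(X, y, k))
print("10-fold CV:    ", cv_error(X, y, k, 10, rng))
print("LOOCV:         ", cv_error(X, y, k, n, rng))
```

Because the simulated features carry no class information, an honest estimate should sit near chance level, whereas the resubstitution figure reflects features picked using the very samples it is scored on.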

Results: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV), and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor, and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of specimens available increases.
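For readers unfamiliar with the .632+ bootstrap mentioned above, the sketch below shows how the estimate can be assembled from three ingredients: the resubstitution error, the leave-one-out bootstrap error, and the no-information error rate. It follows the general form proposed by Efron and Tibshirani (1997); published variants differ slightly in how edge cases are clipped, and the function and argument names here are hypothetical. The code is not taken from the authors' simulations.

```python
from collections import Counter

def no_information_rate(y, preds):
    """Error rate expected if labels and predictions were independent:
    gamma = sum over classes l of p_l * (1 - q_l)."""
    n = len(y)
    p = Counter(y)       # observed class counts
    q = Counter(preds)   # predicted class counts (0 for unseen classes)
    return sum((p[l] / n) * (1 - q[l] / n) for l in p)

def err_632_plus(resub_err, loo_boot_err, gamma):
    """Combine resubstitution and leave-one-out bootstrap errors.

    The weight w moves from 0.632 (no apparent overfitting) towards 1
    (severe overfitting) according to the relative overfitting rate r."""
    err1 = min(loo_boot_err, gamma)              # cap at the no-information rate
    if err1 > resub_err and gamma > resub_err:
        r = (err1 - resub_err) / (gamma - resub_err)
    else:
        r = 0.0                                  # treat as no overfitting
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * resub_err + w * err1

# Illustrative inputs only; these numbers are not results from the article.
gamma = no_information_rate([0, 0, 1, 1], [0, 0, 1, 1])
print("no-information rate:", gamma)
print(".632+ estimate:", err_632_plus(0.1, 0.3, gamma))
```

When the resubstitution error is near zero while the leave-one-out bootstrap error is near the no-information rate, the weight approaches 1 and the estimate relies almost entirely on the bootstrap term.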

Supplementary Information: A complete compilation of results is available in Molinaro et al. (2005). R code for simulations and analyses is available from the authors.




