Bioinformatics Advance Access originally published online on May 19, 2005
Bioinformatics 2005 21(15):3301-3307; doi:10.1093/bioinformatics/bti499

Published by Oxford University Press 2005

Prediction error estimation: a comparison of resampling methods

Annette M. Molinaro 1,3,*, Richard Simon 2 and Ruth M. Pfeiffer 1

1 Biostatistics Branch, Division of Cancer Epidemiology and Genetics, NCI, NIH, Rockville, MD 20852, USA
2 Biometric Research Branch, Division of Cancer Treatment and Diagnostics, NCI, NIH, Rockville, MD 20852, USA
3 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA

*To whom correspondence should be addressed.

Motivation: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the ‘true’ prediction error of a prediction model in the presence of feature selection.
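Estimating prediction error "in the presence of feature selection" means the selection step has to be treated as part of the model being assessed. The sketch below is a minimal illustration, assuming feature selection is repeated inside every training fold so that held-out samples never influence which features are kept. It contrasts resubstitution (select, train and test on the same samples) with k-fold cross-validation, where LOOCV is the case in which the number of folds equals the sample size. The feature-scoring rule, the nearest-centroid classifier (a stand-in for the classifiers studied) and all function names are hypothetical choices made for brevity; this is not the study's R code.

```python
import random
import statistics


def select_features(X, y, n_keep):
    """Indices of the n_keep features with the largest |mean(class 0) - mean(class 1)|
    divided by the feature's overall standard deviation (a hypothetical scoring rule).
    Only the rows passed in are used, so calling this on a training fold keeps
    the held-out samples out of the selection step."""
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        col1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        sd = statistics.pstdev(col0 + col1) or 1e-12  # avoid dividing by zero
        diff = abs(statistics.fmean(col0) - statistics.fmean(col1))
        scores.append((diff / sd, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:n_keep]]


def fit_centroids(X, y, feats):
    """Per-class mean vectors restricted to the selected features."""
    return {
        c: [statistics.fmean(row[j] for row, lab in zip(X, y) if lab == c) for j in feats]
        for c in set(y)
    }


def predict(centroids, feats, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((row[j] - m) ** 2 for j, m in zip(feats, centroids[c])),
    )


def resubstitution_error(X, y, n_keep):
    """Select, train and evaluate on the same samples (optimistically biased)."""
    feats = select_features(X, y, n_keep)
    cents = fit_centroids(X, y, feats)
    return sum(predict(cents, feats, row) != lab for row, lab in zip(X, y)) / len(y)


def cv_error(X, y, n_folds, n_keep, seed=0):
    """k-fold CV error rate; n_folds == len(y) gives LOOCV.
    Feature selection is redone from scratch on every training fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    wrong = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        feats = select_features(X_tr, y_tr, n_keep)  # inside the fold
        cents = fit_centroids(X_tr, y_tr, feats)
        wrong += sum(predict(cents, feats, X[i]) != y[i] for i in test)
    return wrong / len(y)


if __name__ == "__main__":
    # Pure-noise data: 20 samples, 500 features, balanced labels unrelated to X.
    rng = random.Random(1)
    n, p = 20, 500
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    print("resubstitution:", resubstitution_error(X, y, n_keep=10))
    print("10-fold CV:    ", cv_error(X, y, n_folds=10, n_keep=10))
    print("LOOCV:         ", cv_error(X, y, n_folds=n, n_keep=10))
```

Because the labels in the demo are independent of X and balanced, the true error rate of any classifier is 0.5. Resubstitution will typically report a much lower value, since the features were picked to separate those very samples, whereas the cross-validated estimates should fall near 0.5.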

Results: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of available specimens increases.
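The .632+ bootstrap blends the resubstitution error with the leave-one-out bootstrap error Err(1), which is the error on observations not drawn into each bootstrap sample. It shifts weight toward Err(1) as the relative overfitting rate grows. The constant 0.632 is approximately 1 - 1/e, the limiting probability that a given observation appears in a bootstrap sample. The sketch below follows the .632+ rule as stated by Efron and Tibshirani (1997). It assumes Err(1) has already been computed, and its function names are hypothetical.

```python
from collections import Counter


def no_information_rate(y_true, y_pred):
    """gamma = sum_k p_k * (1 - q_k): p_k is the observed share of class k,
    q_k the share of predictions equal to k from the classifier fitted to all data."""
    n = len(y_true)
    p = Counter(y_true)
    q = Counter(y_pred)  # missing classes count as 0
    return sum((p[c] / n) * (1 - q[c] / n) for c in set(p) | set(q))


def err_632_plus(err_resub, err_loo_boot, gamma):
    """Combine the resubstitution error, the leave-one-out bootstrap error Err(1)
    and the no-information rate gamma into the .632+ estimate."""
    err_632 = 0.368 * err_resub + 0.632 * err_loo_boot  # plain .632 estimate
    err1_capped = min(err_loo_boot, gamma)  # Err(1) should not exceed gamma
    if err1_capped > err_resub and gamma > err_resub:
        # relative overfitting rate, between 0 and 1
        r = (err1_capped - err_resub) / (gamma - err_resub)
    else:
        r = 0.0
    return err_632 + (err1_capped - err_resub) * (0.368 * 0.632 * r) / (1 - 0.368 * r)


if __name__ == "__main__":
    # Hypothetical inputs: no resubstitution error, Err(1) = 0.3, gamma = 0.5.
    print(err_632_plus(0.0, 0.3, 0.5))
```

In a full analysis, Err(1) would come from refitting the classifier, including its feature-selection step, on many bootstrap samples and scoring only the observations left out of each one. gamma uses the classifier fitted to all samples.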

Contact: annette.molinaro{at}yale.edu

Supplementary Information: A complete compilation of results and R code for simulations and analyses are available in Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm).


Received on April 6, 2005; revised on April 28, 2005; accepted on May 12, 2005



This article has been cited by other articles:


R. Simon. The Use of Genomics in Clinical Trial Design. Clin. Cancer Res., October 1, 2008; 14(19): 5984-5993.


A.-L. Boulesteix, C. Porzelius, and M. Daumer. Microarray-based classification and clinical predictors: on combined classifiers and additional predictive value. Bioinformatics, August 1, 2008; 24(15): 1698-1706.


T. Bonome, D. A. Levine, J. Shih, M. Randonovich, C. A. Pise-Masison, F. Bogomolniy, L. Ozbun, J. Brady, J. C. Barrett, J. Boyd, et al. A Gene Signature Predicting for Survival in Suboptimally Debulked Patients with Ovarian Cancer. Cancer Res., July 1, 2008; 68(13): 5478-5486.


P. Maruvada and S. Srivastava. Joint National Cancer Institute-Food and Drug Administration Workshop on Research Strategies, Study Designs, and Statistical Approaches to Biomarker Validation for Cancer Diagnosis and Detection. Am. Assoc. Cancer Res. Educ. Book, April 12, 2008; 2008(1): 239-247.


J. S. Ng, M. A. Bearse Jr, M. E. Schneck, S. Barez, and A. J. Adams. Local Diabetic Retinopathy Prediction by Multifocal ERG Delays over 3 Years. Invest. Ophthalmol. Vis. Sci., April 1, 2008; 49(4): 1622-1628.


G. Jurman, S. Merler, A. Barla, S. Paoli, A. Galea, and C. Furlanello. Algebraic stability indicators for ranked lists in molecular profiling. Bioinformatics, January 15, 2008; 24(2): 258-264.


P. R. Bushel, A. N. Heinloth, J. Li, L. Huang, J. W. Chou, G. A. Boorman, D. E. Malarkey, C. D. Houle, S. M. Ward, R. E. Wilson, et al. Blood gene expression signatures predict exposure levels. PNAS, November 13, 2007; 104(46): 18211-18216.


Y. Saeys, I. Inza, and P. Larranaga. A review of feature selection techniques in bioinformatics. Bioinformatics, October 1, 2007; 23(19): 2507-2517.


J. Huang, A. Gusnanto, K. O'Sullivan, J. Staaf, A. Borg, and Y. Pawitan. Robust smooth segmentation approach for array CGH data analysis. Bioinformatics, September 15, 2007; 23(18): 2463-2469.


M. Schumacher, H. Binder, and T. Gerds. Assessment of survival prediction models based on microarray data. Bioinformatics, July 15, 2007; 23(14): 1768-1774.


A.-L. Boulesteix. WilcoxCV: an R package for fast variable selection in cross-validation. Bioinformatics, July 1, 2007; 23(13): 1702-1704.


K. Yanagisawa, S. Tomida, Y. Shimada, Y. Yatabe, T. Mitsudomi, and T. Takahashi. A 25-Signal Proteomic Signature and Outcome for Patients With Resected Non-Small-Cell Lung Cancer. J Natl Cancer Inst, June 6, 2007; 99(11): 858-867.


L. Tian, T. Cai, E. Goetghebeur, and L. J. Wei. Model evaluation based on the sampling distribution of estimated absolute prediction error. Biometrika, June 1, 2007; 94(2): 297-311.


I. A. Wood, P. M. Visscher, and K. L. Mengersen. Classification based upon gene expression data: bias and precision of error rates. Bioinformatics, June 1, 2007; 23(11): 1363-1370.


A. Dupuy and R. M. Simon. Critical Review of Published Microarray Studies for Cancer Outcome and Guidelines on Statistical Analysis and Reporting. J Natl Cancer Inst, January 17, 2007; 99(2): 147-157.


R. J. Kelly, D. M. Jacobsen, Y. V. Sun, J. A. Smith, and S. L. R. Kardia. KGraph: a system for visualizing and evaluating complex genetic associations. Bioinformatics, January 15, 2007; 23(2): 249-251.


K. K. Dobbin and R. M. Simon. Sample size planning for developing classifiers using high-dimensional DNA microarray data. Biostat., January 1, 2007; 8(1): 101-117.


R. Simon. Development and evaluation of therapeutically relevant predictive classifiers using gene expression profiling. J Natl Cancer Inst, September 6, 2006; 98(17): 1169-1171.


H. Pang, A. Lin, M. Holford, B. E. Enerson, B. Lu, M. P. Lawton, E. Floyd, and H. Zhao. Pathway analysis using random forests classification and regression. Bioinformatics, August 15, 2006; 22(16): 2028-2036.


S. S. Dave, K. Fu, G. W. Wright, L. T. Lam, P. Kluin, E.-J. Boerma, T. C. Greiner, D. D. Weisenburger, A. Rosenwald, G. Ott, et al. Molecular diagnosis of Burkitt's lymphoma. N. Engl. J. Med., June 8, 2006; 354(23): 2431-2442.


P. Maruvada and S. Srivastava. Joint national cancer institute-food and drug administration workshop on research strategies, study designs, and statistical approaches to biomarker validation for cancer diagnosis and detection. Cancer Epidemiol. Biomarkers Prev., June 1, 2006; 15(6): 1078-1082.


L. J. Buturovic. PCP: a program for supervised classification of gene expression profiles. Bioinformatics, January 15, 2006; 22(2): 245-247.


