
Vol. 20 no. 2 2004, pages 253-258
Bioinformatics © Oxford University Press 2004; all rights reserved.

Is cross-validation better than resubstitution for ranking genes?

Ulisses Braga-Neto 1,4, Ronaldo Hashimoto 3,4, Edward R. Dougherty 2,4,*, Danh V. Nguyen 5 and Raymond J. Carroll 5

1 Section of Clinical Cancer Genetics and 2 Department of Pathology, University of Texas M. D. Anderson Cancer Center, Houston, TX, USA, 3 Departamento de Ciencia de Computação, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil, 4 Department of Electrical Engineering and 5 Department of Statistics, Texas A&M University, College Station, TX, USA

Received on January 24, 2003; revised on April 29, 2003; accepted on August 5, 2003

Motivation: Ranking gene feature sets is a key issue for both phenotype classification, for instance, tumor classification in a DNA microarray experiment, and prediction in the context of genetic regulatory networks. Two broad methods are available to estimate the error (misclassification rate) of a classifier. Resubstitution fits a single classifier to the data, and applies this classifier in turn to each data observation. Cross-validation (in leave-one-out form) removes each observation in turn, constructs the classifier, and then computes whether this leave-one-out classifier correctly classifies the deleted observation. Resubstitution typically underestimates classifier error, severely so in many cases. Cross-validation has the advantage of producing an effectively unbiased error estimate, but the estimate is highly variable. In many applications it is not the misclassification rate per se that is of interest, but rather the construction of gene sets that have the potential to classify or predict. Hence, one needs to rank feature sets based on their performance.
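The two estimators described above can be sketched in a few lines. This is a minimal illustration only, not the authors' code: it assumes a two-class linear discriminant with a pooled covariance estimate and equal class priors, and the function names (`lda_fit`, `lda_predict`, `resubstitution_error`, `loo_error`) are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    """Fit a two-class linear discriminant (pooled covariance, equal priors).

    Assumption for illustration: labels are 0 and 1, and X has shape (n, d).
    Returns the weight vector w and offset b of the rule w.x + b > 0.
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

def lda_predict(model, X):
    """Assign class 1 where the discriminant is positive, else class 0."""
    w, b = model
    return (X @ w + b > 0).astype(int)

def resubstitution_error(X, y):
    """Fit once on all data, then score the same data."""
    model = lda_fit(X, y)
    return float(np.mean(lda_predict(model, X) != y))

def loo_error(X, y):
    """Leave-one-out: refit n times, each time scoring the held-out point."""
    n = len(y)
    idx = np.arange(n)
    mistakes = 0
    for i in range(n):
        keep = idx != i
        model = lda_fit(X[keep], y[keep])
        mistakes += int(lda_predict(model, X[i:i + 1])[0] != y[i])
    return mistakes / n

if __name__ == "__main__":
    # Synthetic data: two Gaussian classes in two dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (15, 2)), rng.normal(1.0, 1.0, (15, 2))])
    y = np.repeat([0, 1], 15)
    print("resubstitution:", resubstitution_error(X, y))
    print("leave-one-out: ", loo_error(X, y))
```

Resubstitution needs one fit, whereas leave-one-out needs one fit per observation, which is the source of the difference in computation time.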

Results: A model-based approach is used to compare the ranking performances of resubstitution and cross-validation for classification based on real-valued feature sets and for prediction in the context of probabilistic Boolean networks (PBNs). For classification, a Gaussian model is considered, along with classification via linear discriminant analysis and the 3-nearest-neighbor classification rule. Prediction is examined in the steady-state distribution of a PBN. Three metrics are proposed to compare feature-set ranking based on error estimation with ranking based on the true error, which is known owing to the model-based approach. In all cases, resubstitution is competitive with cross-validation relative to ranking accuracy. This is in addition to the enormous savings in computation time afforded by resubstitution.
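The model-based idea can be illustrated with a small simulation. The sketch below is hypothetical and much simpler than the study itself: each "feature set" is a single feature, class 0 is N(0, 1) and class 1 is N(delta, 1) with equal priors, and the classifier is a midpoint threshold (one-dimensional linear discriminant). Because the model is known, the true error of each designed classifier has a closed form. The agreement measure `top_k_overlap` is an assumption chosen for illustration; the abstract does not specify the paper's three metrics, and this is not claimed to be one of them.

```python
import math
import numpy as np

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fit_threshold(x, y):
    """1-D discriminant with equal priors: threshold at the midpoint of class means."""
    m0, m1 = x[y == 0].mean(), x[y == 1].mean()
    sign = 1.0 if m1 >= m0 else -1.0
    return sign, 0.5 * (m0 + m1)

def predict(model, x):
    sign, t = model
    return (sign * (x - t) > 0).astype(int)

def resub_error(x, y):
    """Fit once and score on the same sample."""
    return float(np.mean(predict(fit_threshold(x, y), x) != y))

def loo_error(x, y):
    """Refit with each observation held out and score that observation."""
    n = len(y)
    idx = np.arange(n)
    wrong = 0
    for i in range(n):
        keep = idx != i
        model = fit_threshold(x[keep], y[keep])
        wrong += int(predict(model, x[i:i + 1])[0] != y[i])
    return wrong / n

def true_error(model, delta):
    """Exact error of a designed threshold under the assumed Gaussian model."""
    sign, t = model
    e0 = phi(-sign * t)            # class 0 ~ N(0, 1) assigned to class 1
    e1 = phi(sign * (t - delta))   # class 1 ~ N(delta, 1) assigned to class 0
    return 0.5 * (e0 + e1)

def top_k_overlap(est, truth, k):
    """Count features in both the estimated and the true top-k (lowest error)."""
    return len(set(np.argsort(est)[:k]) & set(np.argsort(truth)[:k]))

def simulate(deltas, n_per_class=10, k=3, seed=0):
    """Return (resubstitution overlap, leave-one-out overlap) for one sample."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    resub, loo, truth = [], [], []
    for d in deltas:
        x = np.concatenate([rng.normal(0.0, 1.0, n_per_class),
                            rng.normal(d, 1.0, n_per_class)])
        resub.append(resub_error(x, y))
        loo.append(loo_error(x, y))
        truth.append(true_error(fit_threshold(x, y), d))
    return top_k_overlap(resub, truth, k), top_k_overlap(loo, truth, k)

if __name__ == "__main__":
    print(simulate(np.linspace(0.0, 2.0, 10)))
```

A single run says little; a comparison in the spirit of the study would average such agreement scores over many simulated samples.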

Contact: edward@ee.tamu.edu

* To whom correspondence should be addressed.




This article has been cited by other articles:


D. Wang, Y. Lv, Z. Guo, X. Li, Y. Li, J. Zhu, D. Yang, J. Xu, C. Wang, S. Rao, et al.
Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules
Bioinformatics, December 1, 2006; 22(23): 2883 - 2889.


C. Sima, U. Braga-Neto, and E. R. Dougherty
Superior feature-set ranking for small samples using bolstered error estimation
Bioinformatics, April 1, 2005; 21(7): 1046 - 1054.


