Bioinformatics Advance Access originally published online on August 14, 2006
Bioinformatics 2006 22(20):2507-2515; doi:10.1093/bioinformatics/btl438
The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms
1 School of Electrical & Electronic Engineering, Nanyang Technological University, Nanyang Avenue Singapore 639798, Singapore
2 Bioinformatics Research Centre, Nanyang Technological University, Nanyang Avenue Singapore 639798, Singapore
*To whom correspondence should be addressed.
Motivation: Feature selection approaches, such as filter and wrapper methods, have been applied to the gene selection problem in microarray data analysis. In wrapper methods, the classification error is usually used as the evaluation criterion for feature subsets. Because microarray data are high-dimensional and have small sample sizes, however, counting-based error estimation may not be an ideal criterion for the gene selection problem.
Results: Our study reveals that evaluating genes with counting-based error estimators, such as the resubstitution error, leave-one-out error, cross-validation error and bootstrap error, may run into a severe ties problem, i.e. two or more gene subsets score equally, which in turn makes gene selection uncertain. Our analysis finds that the ties problem is caused by the discrete nature of counting-based error estimators and can be avoided by using continuous evaluation criteria instead. Experimental results show that continuous evaluation criteria, such as the generalised ||w||^2 measure for support vector machines and the modified Relief measure for k-nearest neighbors, produce improved gene selection compared with counting-based error estimators.
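The ties problem can be illustrated with a minimal sketch. It is not the authors' code: the synthetic data, the nearest-class-mean rule, and the `separation_score` function are all assumptions chosen for illustration, and `separation_score` is only a generic continuous stand-in, not the generalised ||w||^2 measure or the modified Relief measure. With n samples, a leave-one-out error count can take at most n + 1 distinct values, so ranking many genes by it forces ties, whereas a continuous score almost never repeats.

```python
import math
import random
from collections import Counter


def loo_error_count(values, labels):
    """Leave-one-out misclassification count of a nearest-class-mean rule on one gene."""
    n = len(values)
    errors = 0
    for i in range(n):
        sums = {0: 0.0, 1: 0.0}
        counts = {0: 0, 1: 0}
        for j in range(n):
            if j != i:  # hold out sample i
                sums[labels[j]] += values[j]
                counts[labels[j]] += 1
        means = {c: sums[c] / counts[c] for c in (0, 1)}
        closer_to_0 = abs(values[i] - means[0]) <= abs(values[i] - means[1])
        predicted = 0 if closer_to_0 else 1
        errors += int(predicted != labels[i])
    return errors


def separation_score(values, labels):
    """Illustrative continuous criterion: |difference of class means| / pooled std."""
    groups = {c: [v for v, y in zip(values, labels) if y == c] for c in (0, 1)}
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    squared_dev = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    pooled_std = math.sqrt(squared_dev / (len(values) - 2))
    return abs(means[0] - means[1]) / pooled_std


# Synthetic "microarray": few samples, many genes (hypothetical data).
rng = random.Random(0)
n_samples, n_genes = 20, 200
labels = [0] * (n_samples // 2) + [1] * (n_samples // 2)
genes = []
for g in range(n_genes):
    shift = 1.0 if g < 20 else 0.0  # only the first 20 genes carry class signal
    genes.append([rng.gauss(shift * y, 1.0) for y in labels])

error_counts = [loo_error_count(values, labels) for values in genes]
scores = [separation_score(values, labels) for values in genes]

best_error = min(error_counts)
tied_at_best = error_counts.count(best_error)
print(f"distinct LOO error counts: {len(set(error_counts))} "
      f"(at most {n_samples + 1} possible)")
print(f"genes tied at the lowest LOO error ({best_error}): {tied_at_best}")
print(f"largest tie group: {max(Counter(error_counts).values())} genes")
print(f"distinct continuous scores: {len(set(scores))} of {n_genes}")
```

With 200 genes and at most 21 possible error counts, at least one error value must be shared by ten or more genes, whatever the data. Which of the tied genes is selected then depends on an arbitrary tie-breaking rule rather than on the data.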
Availability: The companion website is at http://www.ntu.edu.sg/home5/pg02776030/wrappers/. The website contains (1) the source code of all the gene selection algorithms and (2) the complete set of tables and figures of experiments.
Contact: ekzmao{at}ntu.edu.sg
Received on November 22, 2005; revised on July 31, 2006; accepted on August 9, 2006
This article has been cited by other articles:
X. Zhou and D. P. Tuck. MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Bioinformatics, May 1, 2007; 23(9): 1106-1114.