Bioinformatics Advance Access published online on August 14, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl438
1 School of Electrical & Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798; Bioinformatics Research Centre, Nanyang Technological University, Nanyang Avenue, Singapore 639798
* To whom correspondence should be addressed.
Motivation: Feature selection approaches, such as filter and wrapper methods, have been applied to the gene selection problem in microarray data analysis. In wrapper methods, the classification error is usually used as the evaluation criterion for feature subsets. Because microarray data combine high dimensionality with small sample size, however, counting-based error estimation may not be an ideal criterion for the gene selection problem.

Results: Our study reveals that evaluating genes with counting-based error estimators, such as resubstitution error, leave-one-out error, cross-validation error and bootstrap error, may encounter a severe ties problem, i.e. more than one gene subset receives the same score, which in turn results in uncertainty in gene selection. Our analysis finds that the ties problem is caused by the discrete nature of counting-based error estimators and could be avoided by using continuous evaluation criteria instead. Experimental results show that continuous evaluation criteria, such as the generalised measure for Support Vector Machines and the modified Relief's measure for k-Nearest Neighbors, produce improved gene selection compared with counting-based error estimators.

Availability: The companion website is at http://www.ntu.edu.sg/home5/pg02776030/wrappers/. The website contains (1) the source code of all the gene selection algorithms and (2) the complete set of tables and figures of the experiments.
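The ties problem described in the Results can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes synthetic Gaussian data (20 samples, 200 single-gene candidates), a 1-nearest-neighbor classifier, and a simplified Relief-style margin (nearest-miss distance minus nearest-hit distance) as a stand-in for a continuous criterion. The function names `loo_1nn_error`, `relief_like_margin` and `simulate` are hypothetical. With n samples, a leave-one-out error can take at most n + 1 distinct values, so scoring more than n + 1 candidates must produce ties, whereas a continuous score rarely does.

```python
import random

def loo_1nn_error(x, y):
    """Leave-one-out error rate of a 1-NN classifier on one feature (counting-based)."""
    n = len(x)
    errors = 0
    for i in range(n):
        # Nearest neighbour of sample i among all other samples.
        j = min((k for k in range(n) if k != i), key=lambda k: abs(x[k] - x[i]))
        errors += y[j] != y[i]
    return errors / n

def relief_like_margin(x, y):
    """Mean of (nearest-miss distance - nearest-hit distance); larger is better.
    A simplified illustrative criterion, not the paper's modified Relief measure."""
    n = len(x)
    total = 0.0
    for i in range(n):
        hit = min(abs(x[k] - x[i]) for k in range(n) if k != i and y[k] == y[i])
        miss = min(abs(x[k] - x[i]) for k in range(n) if y[k] != y[i])
        total += miss - hit
    return total / n

def simulate(n_samples=20, n_genes=200, seed=0):
    """Score synthetic single-gene candidates with both criteria (assumed data model)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]  # balanced two-class labels
    # Each gene is weakly informative: class means differ by 0.5 standard deviations.
    genes = [[rng.gauss(0.5 * label, 1.0) for label in y] for _ in range(n_genes)]
    err = [loo_1nn_error(g, y) for g in genes]
    margin = [relief_like_margin(g, y) for g in genes]
    return err, margin

err, margin = simulate()
best_err = min(err)
n_tied_best = sum(e == best_err for e in err)

print("candidate genes:", len(err))
print("distinct leave-one-out error values:", len(set(err)))
print("genes tied at the lowest error:", n_tied_best)
print("distinct margin values:", len(set(margin)))
```

Whenever several genes share the lowest error, the counting-based criterion alone cannot say which to select; the continuous margin gives a finer ordering of the same candidates.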
Received November 22, 2005
Revised July 31, 2006
Accepted August 9, 2006
Article
The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms
Xin Zhou 1 and K. Z. Mao 2 *
2 School of Electrical & Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798
K. Z. Mao, E-mail: ekzmao{at}ntu.edu.sg
Abstract
Associate Editor: Martin Bishop
This article has been cited by other articles:
X. Zhou and D. P. Tuck. MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Bioinformatics, May 1, 2007; 23(9): 1106-1114.
