Bioinformatics Advance Access published online on October 25, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti736
1 Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
* To whom correspondence should be addressed.
Motivation: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data is their inherent "high dimension, low sample size" nature. Robust and accurate gene selection methods are therefore required to identify groups of genes that are differentially expressed across samples, e.g., between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers, and improve treatment strategies. Although gene selection and cancer classification are closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide a unified procedure for simultaneous gene selection and cancer classification that achieves high accuracy in both tasks.

Results: In this paper we develop a novel type of regularization in support vector machines (SVMs) to identify important genes for cancer classification. A special nonconvex penalty, the smoothly clipped absolute deviation (SCAD) penalty, is imposed on the hinge loss function of the SVM. By systematically thresholding small estimates to zero, the new procedure eliminates redundant genes automatically and yields a compact and accurate classifier. A successive quadratic algorithm is proposed to convert the non-differentiable and nonconvex optimization problem into easily solved systems of linear equations. The method is applied to two real data sets and has produced very promising results.

Availability: MATLAB code is available upon request from the authors.

Supplementary information: http://www4.stat.ncsu.edu/hzhang/pub.html
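The SCAD penalty mentioned in the abstract has a standard closed form (Fan and Li, 2001): for t = |w| it equals λt when t ≤ λ, −(t² − 2aλt + λ²)/(2(a − 1)) when λ < t ≤ aλ, and (a + 1)λ²/2 when t > aλ, with a > 2. A common way to obtain a successive quadratic scheme is the local quadratic approximation (LQA), which around a nonzero w0 replaces p(|w|) by p(|w0|) + ½·[p′(|w0|)/|w0|]·(w² − w0²). The sketch below is a minimal illustration under stated assumptions, not the authors' MATLAB code. It assumes the usual choice a = 3.7. To keep it to a few lines, it swaps the SVM hinge loss for a scalar squared loss ½(w − z)². The function names and the `eps` cut-off used to threshold small estimates to exactly zero are hypothetical.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(|theta|) as defined by Fan and Li (2001); needs a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # lasso-like linear part near zero
    if t <= a * lam:
        # quadratic spline joining the linear part to the flat part
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2  # constant: large effects are not shrunk


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the penalty with respect to t = |theta| (right derivative at 0)."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def lqa_surrogate(w: float, w0: float, lam: float, a: float = 3.7) -> float:
    """Quadratic approximation of the SCAD penalty around a nonzero w0."""
    weight = scad_derivative(w0, lam, a) / abs(w0)
    return scad_penalty(w0, lam, a) + 0.5 * weight * (w * w - w0 * w0)


def scad_lqa_scalar(z: float, lam: float, a: float = 3.7,
                    n_iter: int = 200, eps: float = 1e-8) -> float:
    """Hypothetical toy: minimise 0.5*(w - z)**2 + p_lam(|w|) by repeated LQA steps.

    Each step minimises the quadratic surrogate in closed form,
    w_new = z / (1 + p'(|w|)/|w|). Once |w| drops below eps it is
    set to exactly zero, mimicking the thresholding of small estimates.
    """
    w = z
    for _ in range(n_iter):
        if abs(w) < eps:
            return 0.0
        weight = scad_derivative(w, lam, a) / abs(w)
        w = z / (1.0 + weight)
    return w if abs(w) >= eps else 0.0


if __name__ == "__main__":
    for z in (0.5, 3.0, 5.0):
        print(z, scad_lqa_scalar(z, lam=1.0))
```

With λ = 1, the input 0.5 is driven to exactly 0. The input 5 lies beyond aλ = 3.7, where the SCAD derivative vanishes, so it is left unshrunk. The input 3 settles near 2.588, which agrees with the closed-form SCAD rule ((a − 1)z − aλ)/(a − 2) for 2λ < |z| ≤ aλ. This combination of exact zeros for small effects and no bias for large ones is what makes the nonconvex penalty attractive for gene selection.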
Received June 12, 2005
Revised October 20, 2005
Accepted October 20, 2005
Gene selection using support vector machines with nonconvex penalty
2 Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA
3 Department of Mathematical Sciences, University of Cincinnati, OH 45221, USA
4 Department of Statistics, University of Georgia, Athens, GA 30602, USA
Hao Helen Zhang, E-mail: hzhang{at}stat.ncsu.edu
This article has been cited by other articles:
S. Keerthikumar, S. Bhadra, K. Kandasamy, R. Raju, Y.L. Ramachandra, C. Bhattacharyya, K. Imai, O. Ohara, S. Mohan, and A. Pandey. Prediction of Candidate Primary Immunodeficiency Disease Genes Using a Support Vector Machine Learning Approach. DNA Res, December 1, 2009; 16(6): 345-351.

N. Becker, W. Werft, G. Toedt, P. Lichter, and A. Benner. penalizedSVM: a R-package for feature selection SVM classification. Bioinformatics, July 1, 2009; 25(13): 1711-1712.

S. Ma and M. R. Kosorok. Identification of differential gene pathways with principal component analysis. Bioinformatics, April 1, 2009; 25(7): 882-889.

T. Hwang, H. Sicotte, Z. Tian, B. Wu, J.-P. Kocher, D. A. Wigle, V. Kumar, and R. Kuang. Robust and efficient identification of biomarkers by classifying features on graphs. Bioinformatics, September 15, 2008; 24(18): 2023-2029.

S. Ma and J. Huang. Penalized feature selection and classification in bioinformatics. Brief Bioinform, September 1, 2008; 9(5): 392-403.

D. Huang and T. W. S. Chow. Identifying the biologically relevant gene categories based on gene expression and biological data: an example on prostate cancer. Bioinformatics, June 15, 2007; 23(12): 1503-1510.

X. Zhou and D. P. Tuck. MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Bioinformatics, May 1, 2007; 23(9): 1106-1114.

S. Wang and J. Zhu. Improved centroids estimation for the nearest shrunken centroid classifier. Bioinformatics, April 15, 2007; 23(8): 972-979.


