Bioinformatics Advance Access originally published online on September 16, 2004
Bioinformatics 2005 21(5):631-643; doi:10.1093/bioinformatics/bti033
A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis
1 Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
2 Department of Mathematics, Vanderbilt University, Nashville, TN, USA
*To whom correspondence should be addressed.
Motivation: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications, we focus on multicategory diagnosis. To equip the system with the optimum combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs using 11 datasets spanning 74 diagnostic categories, 41 cancer types and 12 normal tissue types.
Results: Multicategory support vector machines (MC-SVMs) are the most effective classifiers in performing accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and other non-SVM learning algorithms. Ensemble classifiers do not generally improve performance of the best non-ensemble models. These results guided the construction of a software system GEMS (Gene Expression Model Selector) that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets.
Availability: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use.
Contact: alexander.statnikov{at}vanderbilt.edu
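The one-versus-rest scheme named in the Results can be illustrated with a minimal sketch. It shows only the decomposition of a multicategory problem into one binary problem per class, followed by picking the class whose binary model gives the highest score. The binary learner here is a simple centroid-distance scorer used as a stand-in, not an SVM and not the implementation in GEMS; the function names and the toy expression values are hypothetical.

```python
def train_binary_scorer(samples, labels):
    """Stand-in binary learner (NOT an SVM): scores a sample by how much
    closer it lies to the positive-class centroid than to the negative one."""
    n_features = len(samples[0])
    sums = {1: [0.0] * n_features, -1: [0.0] * n_features}
    counts = {1: 0, -1: 0}
    for x, y in zip(samples, labels):
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    centroids = {y: [s / counts[y] for s in sums[y]] for y in (1, -1)}

    def score(x):
        dist = {
            y: sum((a - b) ** 2 for a, b in zip(x, centroids[y]))
            for y in (1, -1)
        }
        # Larger value means "more like the positive class".
        return dist[-1] - dist[1]

    return score


def train_one_versus_rest(samples, labels):
    """Train one binary scorer per class: that class (+1) versus all others (-1)."""
    scorers = {}
    for cls in sorted(set(labels)):
        binary_labels = [1 if y == cls else -1 for y in labels]
        scorers[cls] = train_binary_scorer(samples, binary_labels)
    return scorers


def predict(scorers, x):
    """Assign the class whose binary scorer is most confident."""
    return max(scorers, key=lambda cls: scorers[cls](x))


# Toy data (hypothetical): 6 samples, 3 "genes", 3 diagnostic categories.
samples = [
    [5.0, 1.0, 1.0], [4.5, 0.8, 1.2],
    [1.0, 5.0, 1.0], [1.2, 4.6, 0.9],
    [1.0, 1.0, 5.0], [0.9, 1.1, 4.7],
]
labels = ["A", "A", "B", "B", "C", "C"]

scorers = train_one_versus_rest(samples, labels)
print(predict(scorers, [4.8, 1.0, 1.0]))
```

Swapping `train_binary_scorer` for a binary SVM trainer would give a one-versus-rest MC-SVM in the sense used above; the decomposition and the arg-max decision rule stay the same.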