Bioinformatics Advance Access published online on September 16, 2004
Bioinformatics, doi:10.1093/bioinformatics/bti033
Bioinformatics © Oxford University Press 2004; all rights reserved
1 Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
* To whom correspondence should be addressed. E-mail: alexander.statnikov{at}vanderbilt.edu.
Motivation: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications, we focus on multicategory diagnosis. To equip the system with the optimum combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods, and two cross-validation designs, using 11 datasets spanning 74 diagnostic categories, 41 cancer types and 12 normal tissue types.
Results: Multicategory Support Vector Machines (MC-SVMs) are the most effective classifiers in performing accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins, and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms such as K-Nearest Neighbors, Backpropagation and Probabilistic Neural Networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and non-SVM learning algorithms. Ensemble classifiers do not generally improve on the performance of the best non-ensemble models. These results guided the construction of a software system, GEMS (Gene Expression Model Selector), that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets.
Availability: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use.
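The evaluation design summarized in the abstract (gene selection combined with a multicategory classifier and scored by cross-validation) can be illustrated with a minimal sketch. This is not the GEMS implementation and does not reproduce any reported result. It assumes a K-Nearest Neighbors classifier (one of the baseline algorithms named above, chosen here only because it is short to write), a between-class to within-class sum-of-squares ratio as the gene selection score (an assumption, since the abstract does not name the criteria used), and synthetic data from a hypothetical `make_toy_data` helper. The point it shows is that genes are selected inside each training fold, so the held-out samples never influence selection.

```python
import random
from collections import defaultdict


def make_toy_data(n_per_class=10, n_classes=3, n_genes=50, n_informative=5, seed=0):
    """Hypothetical synthetic data: only the first n_informative genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 3.0 * c  # class-dependent shift on informative genes
            X.append(row)
            y.append(c)
    return X, y


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def select_genes(X, y, n_keep):
    """Rank genes by between-class / within-class sum of squares (assumed criterion)."""
    scores = []
    for g in range(len(X[0])):
        col = [row[g] for row in X]
        overall = sum(col) / len(col)
        between = within = 0.0
        for c in set(y):
            vals = [v for v, lab in zip(col, y) if lab == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - overall) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:n_keep]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    pairs = [(sum((a - b) ** 2 for a, b in zip(row, x)), lab)
             for row, lab in zip(train_X, train_y)]
    pairs.sort(key=lambda p: p[0])
    votes = defaultdict(int)
    for _, lab in pairs[:k]:
        votes[lab] += 1
    return max(sorted(votes), key=lambda lab: votes[lab])


def cross_validated_accuracy(X, y, n_folds=5, n_keep=5, k=3, seed=0):
    """Estimate accuracy with gene selection repeated inside every training fold."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        train_y = [y[i] for i in train_idx]
        # Select genes from training samples only, never from the held-out fold.
        genes = select_genes([X[i] for i in train_idx], train_y, n_keep)
        train_X = [[X[i][g] for g in genes] for i in train_idx]
        for i in test_idx:
            pred = knn_predict(train_X, train_y, [X[i][g] for g in genes], k)
            correct += int(pred == y[i])
    return correct / len(y)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("cross-validated accuracy:", cross_validated_accuracy(X, y))
```

Selecting genes on the full dataset before splitting would let test samples leak into model construction and tend to inflate the accuracy estimate, which is the kind of error a sound performance estimation procedure is meant to prevent.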
Revised July 29, 2004
Accepted September 10, 2004
Article
A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis
2 Department of Mathematics, Vanderbilt University, Nashville, TN, USA





