Bioinformatics Advance Access originally published online on May 12, 2005
Bioinformatics 2005 21(14):3185-3186; doi:10.1093/bioinformatics/bti495
M@CBETH: a microarray classification benchmarking tool
K. U. Leuven, ESATSCD, Kasteelpark Arenberg 10 B-3001 Leuven (Heverlee), Belgium
*To whom correspondence should be addressed.
| Abstract |
|---|
|
|
|---|
Summary: Microarray classification can be useful to support clinical management decisions for individual patients in, for example, oncology. However, comparing classifiers and selecting the best for each microarray dataset can be a tedious and non-straightforward task. The M@CBETH (a MicroArray Classification BEnchmarking Tool on a Host server) web service offers the microarray community a simple tool for making optimal two-class predictions. M@CBETH aims at finding the best prediction among different classification methods by using randomizations of the benchmarking dataset. The M@CBETH web service intends to introduce an optimal use of clinical microarray data classification.
Availability: Web service at http://www.esat.kuleuven.be/MACBETH/
Contact: Nathalie.Pochet{at}esat.kuleuven.be
| INTRODUCTION |
|---|
|
|
|---|
Microarray data make it possible to predict, for example, the therapy response, prognosis and metastatic phenotype of an individual patient. In combination with classification methods (Furey et al., 2000), microarray technology has been shown to be useful in supporting clinical management decisions for individual patients [e.g. breast cancer (van't Veer et al., 2002), acute myeloid leukemia (Valk et al., 2004) and ovarian cancer (De Smet et al., 2005)]. Finding the best classifier for each dataset can be a tedious and non-straightforward task for users not familiar with these classification techniques. In this note, a web service is presented that compares different classifiers for each microarray dataset submitted to it and selects the best one in terms of randomized independent test set performance.
Systematic benchmarking of microarray data classification revealed that either regularization or dimensionality reduction is required to obtain good independent test set performances (Pochet et al., 2004). Regularization, as is performed in support vector machines (SVMs) (Cristianini and Shawe-Taylor, 2000), already led to the Gist web service, which offers SVM classification on the web (Pavlidis et al., 2004). The service presented in this note instead allows different classification methods to be compared. By exploring different combinations of nonlinearity and dimensionality reduction, our benchmarking study showed that the optimal classifier can differ for each dataset. Also important, but often underestimated in the model building process, is the fine-tuning of all hyperparameters (e.g. regularization parameter, kernel parameter and number of principal components). Exploring all combinations to find the optimal classifier for each dataset can be complicated.
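To illustrate why exploring all hyperparameter combinations quickly becomes laborious, the following minimal Python sketch performs an exhaustive grid search. It is not the M@CBETH implementation: the function name `grid_search`, the parameter names `gamma` and `sigma2`, the grid values and the toy scoring function are all assumptions made for illustration. In practice the scoring callable would return a cross-validation performance for a model trained with the given hyperparameters.

```python
from itertools import product
from typing import Any, Callable, Dict, Sequence, Tuple


def grid_search(
    grid: Dict[str, Sequence[Any]],
    score: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every combination in `grid` and return the best one.

    `grid` maps a hyperparameter name to its candidate values; `score`
    returns a performance value (higher is better) for one combination.
    """
    names = list(grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    # product() enumerates the full Cartesian product of candidate values,
    # so the number of evaluations grows multiplicatively with each parameter.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical grid: a regularization parameter and an RBF kernel parameter.
example_grid = {"gamma": [0.1, 1.0, 10.0], "sigma2": [1.0, 10.0]}


def toy_score(params: Dict[str, Any]) -> float:
    """Stand-in for a cross-validation score; peaks at gamma=1, sigma2=10."""
    return -abs(params["gamma"] - 1.0) - abs(params["sigma2"] - 10.0)


best, value = grid_search(example_grid, toy_score)
```

Even this small grid requires six model evaluations per classifier; adding a third hyperparameter, such as the number of principal components, multiplies that count again.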
| WEBSITE |
|---|
|
|
|---|
The M@CBETH website offers two services: benchmarking and prediction. After registration and logging on to the web service, users can request benchmarking or prediction analyses. Users are notified by email about the status of their analyses running on the host server. They can also check this on the analysis results page, which gives an overview of all analyses and contains links to corresponding results pages.
Benchmarking, the main service on the M@CBETH website, involves selection and training of an optimal model based on the submitted benchmarking dataset and corresponding class labels. This model is then stored for immediate or later use on prospective data. Benchmarking results in a table showing summary statistics [leave-one-out cross-validation (LOO-CV), training set accuracy (ACC) and area under the receiver operating characteristic curve (AUC) performance, test set ACC and AUC] for all selected classification methods, highlighting the best method. Prospective data can also be submitted and evaluated immediately during the same benchmarking analysis. By using the prediction service, the M@CBETH website offers a way for later evaluation of prospective data by reusing an existing optimal prediction model (built up in a previous benchmarking analysis by the same user). For both services, if the corresponding prospective labels are submitted, the prospective accuracy is calculated. Otherwise, labels are predicted for all prospective samples. This latter application is useful for classifying new unseen patients in clinical practice.
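The two summary statistics named above, ACC and AUC, can be sketched in a few lines of Python. This is an illustrative sketch only, not the code running on the M@CBETH server; the function names are hypothetical, labels are assumed to be +1 or -1, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(labels) != len(predictions) or len(labels) == 0:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for labels in {+1, -1}.

    Computed as the fraction of (positive, negative) sample pairs in which
    the positive sample receives the higher score; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == -1]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

ACC depends on the chosen decision threshold, whereas AUC summarizes how well the continuous classifier output ranks positive samples above negative ones, which is why reporting both is informative.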
The M@CBETH web service is intended for the classification of patient samples. It assumes that the microarray data are represented by an expression matrix of high dimensionality, that is, a small number of patients and a large number of gene expression levels for each patient. Two data formats are accepted: spreadsheet-like tab-delimited text files and matrix-like MATLAB files. Datasets are not allowed to contain missing values. Class labels are restricted to +1 or -1. Several publicly available microarray datasets are provided on the website in the correct data format as examples.
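The input constraints above (tab-delimited values, no missing values, labels of +1 or -1) can be checked before submission with a short script. The sketch below is a hypothetical pre-check, not part of the M@CBETH service. It assumes a purely numeric file with one sample per row and no header line, which may differ from the exact layout the website expects; the example datasets on the website are the authoritative reference for the format.

```python
import math
from typing import List, Sequence


def parse_expression_matrix(text: str) -> List[List[float]]:
    """Parse tab-delimited numeric text into a rectangular matrix.

    Raises ValueError on empty cells, non-numeric cells, NaN values or
    rows of unequal length.
    """
    rows: List[List[float]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        values: List[float] = []
        for cell in line.split("\t"):
            cell = cell.strip()
            if cell == "":
                raise ValueError(f"missing value on line {line_no}")
            value = float(cell)  # raises ValueError for text such as 'NA'
            if math.isnan(value):
                raise ValueError(f"NaN value on line {line_no}")
            values.append(value)
        rows.append(values)
    if not rows:
        raise ValueError("no data rows found")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("rows have unequal numbers of columns")
    return rows


def validate_labels(labels: Sequence[int], n_samples: int) -> None:
    """Check that there is one label per sample and each is +1 or -1."""
    if len(labels) != n_samples:
        raise ValueError("number of labels does not match number of samples")
    for label in labels:
        if label not in (1, -1):
            raise ValueError(f"invalid class label: {label!r}")
```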
Users can select the classification methods that will be compared (default selection set to the best overall and most efficient methods from the benchmarking study), change the number of randomizations (default 20, while keeping in mind that results are more reliable when the number of randomizations is large) and switch off normalization (although performing normalization is better from a statistical viewpoint).
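The role of the randomizations can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used on the server: each randomization is assumed to be a stratified random split of the benchmarking dataset into a training and a test part, the `train_fraction` of 2/3 is a hypothetical value, and the best method is assumed to be the one with the highest mean test performance over all randomizations. The function names are likewise hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_randomizations(
    labels: Sequence[int],
    n_randomizations: int = 20,
    train_fraction: float = 2 / 3,  # assumed value, for illustration only
    seed: int = 0,
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs, one per randomization.

    Each class (+1 and -1) is shuffled and split separately so that both
    parts keep roughly the class proportions of the full dataset.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == -1]
    splits: List[Tuple[List[int], List[int]]] = []
    for _ in range(n_randomizations):
        train: List[int] = []
        test: List[int] = []
        for group in (positives, negatives):
            shuffled = group[:]
            rng.shuffle(shuffled)
            n_train = round(len(shuffled) * train_fraction)
            train.extend(shuffled[:n_train])
            test.extend(shuffled[n_train:])
        splits.append((sorted(train), sorted(test)))
    return splits


def best_method(test_scores: Dict[str, List[float]]) -> str:
    """Name of the method with the highest mean score over randomizations."""
    return max(test_scores, key=lambda m: sum(test_scores[m]) / len(test_scores[m]))
```

Averaging over more randomizations reduces the influence of any single lucky or unlucky split, which is why a larger number of randomizations gives more reliable results at the cost of longer computation.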
| ALGORITHM |
|---|
|
|
|---|
An overview of the algorithm behind this web service is presented in Figure 1. Different classification methods are considered, based on least squares SVM (Suykens et al., 2002) [with linear and radial basis function (RBF) kernels], Fisher discriminant analysis, principal component analysis (PCA) and kernel PCA (Schölkopf et al., 1998; Suykens et al., 2002) (with linear and RBF kernels). More detailed information on these methods can be found in Pochet et al. (2004).
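The linear and RBF kernels mentioned above can be written down compactly. The Python sketch below only illustrates the two kernel functions and the resulting kernel matrix, on which kernel-based methods such as least squares SVM and kernel PCA operate; it does not implement those methods. The function names are hypothetical, and the RBF parametrization exp(-||x - z||^2 / sigma2) is an assumption, since other conventions (for example a factor 2 in the denominator) are also in common use.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """Inner product of two expression profiles."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x: Vector, z: Vector, sigma2: float = 1.0) -> float:
    """RBF kernel exp(-||x - z||^2 / sigma2); sigma2 is the kernel parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / sigma2)


def kernel_matrix(
    samples: Sequence[Vector], kernel: Callable[[Vector, Vector], float]
) -> List[List[float]]:
    """Symmetric matrix of kernel values between all pairs of samples."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]
```

Because the kernel matrix has one row and one column per patient rather than per gene, it stays small for typical microarray datasets with few samples and many expression levels.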
| Acknowledgments |
|---|
This work was supported by Research Council KUL: GOA-AMBioRICS, IDO (IOTA Oncology, Genetic networks), several PhD/postdoc and fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects G.0115.01, G.0407.02, G.0413.03, G.0388.03, G.0229.03 and IWT: PhD Grants, STWW-Genprom, GBOU-McKnow, GBOU-SQUAD, GBOU-ANA; Belgian Federal Government: DWTC [IUAP V-22 (2002-2006)]; and EU: CAGE; Biopattern.
Received on April 18, 2005; revised on May 10, 2005; accepted on May 10, 2005
| REFERENCES |
|---|
|
|
|---|
Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines (and other Kernel-Based Learning Methods). Cambridge University Press, Cambridge.
De Smet, F., Pochet, N., Engelen, K., Van Gorp, T., Van Hummelen, P., Suykens, J., Marchal, K., Amant, F., Moreau, Y., Timmerman, D., De Moor, B., Vergote, I. (2005) Predicting the clinical behavior of ovarian cancer from gene expression profiles. Accepted for publication in International Journal of Gynecological Cancer.
Furey, T.S., et al. (2000) Support vector machines classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16, 906-914.
Pavlidis, P., et al. (2004) Support vector machine classification on the web. Bioinformatics, 20, 586-587.
Pochet, N., et al. (2004) Systematic benchmarking of microarray data classification: assessing the role of nonlinearity and dimensionality reduction. Bioinformatics, 20, 3185-3195.
Schölkopf, B., et al. (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10, 1299-1319.
Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J. (2002) Least Squares Support Vector Machines. World Scientific, Singapore. ISBN 981-238-151-1.
Valk, P.J.M., et al. (2004) Prognostically useful gene-expression profiles in acute myeloid leukemia. N. Engl. J. Med., 350, 1617-1628.
van't Veer, L.J., et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530-536.