

Bioinformatics Advance Access originally published online on May 12, 2005
Bioinformatics 2005 21(14):3185-3186; doi:10.1093/bioinformatics/bti495

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

M@CBETH: a microarray classification benchmarking tool

Nathalie L. M. M. Pochet*, Frizo A. L. Janssens, Frank De Smet, Kathleen Marchal, Johan A. K. Suykens and Bart L. R. De Moor

K. U. Leuven, ESAT–SCD, Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium

*To whom correspondence should be addressed.


    Abstract

Summary: Microarray classification can be useful to support clinical management decisions for individual patients in, for example, oncology. However, comparing classifiers and selecting the best for each microarray dataset can be a tedious and non-straightforward task. The M@CBETH (a MicroArray Classification BEnchmarking Tool on a Host server) web service offers the microarray community a simple tool for making optimal two-class predictions. M@CBETH aims at finding the best prediction among different classification methods by using randomizations of the benchmarking dataset. The M@CBETH web service intends to introduce an optimal use of clinical microarray data classification.

Availability: Web service at http://www.esat.kuleuven.be/MACBETH/

Contact: Nathalie.Pochet{at}esat.kuleuven.be


    INTRODUCTION
Microarray data can be used to make predictions on, for example, the therapy response, prognosis and metastatic phenotype of an individual patient. In combination with classification methods (Furey et al., 2000), microarray technology has been shown to be useful in supporting clinical management decisions for individual patients [e.g. breast cancer (van't Veer et al., 2002), acute myeloid leukemia (Valk et al., 2004) and ovarian cancer (De Smet et al., 2005)]. Finding the best classifier for each dataset can be a tedious and non-straightforward task for users not familiar with these classification techniques. This note presents a web service that compares different classifiers on each microarray dataset submitted to it and selects the best one in terms of randomized independent test set performance.

Systematic benchmarking of microarray data classification revealed that either regularization or dimensionality reduction is required to obtain good independent test set performances (Pochet et al., 2004). Regularization, as performed in support vector machines (SVMs) (Cristianini and Shawe-Taylor, 2000), already led to the Gist web service, which offers SVM classification on the web (Pavlidis et al., 2004). The web service presented in this note instead allows different classification methods to be compared. By exploring different combinations of nonlinearity and dimensionality reduction, our benchmarking study showed that the optimal classifier can differ for each dataset. Also important, but often underestimated in the model building process, is the fine-tuning of all hyperparameters (e.g. regularization parameter, kernel parameter and number of principal components). Exploring all combinations to find the optimal classifier for each dataset can be complicated.


    WEBSITE
The M@CBETH website offers two services: benchmarking and prediction. After registration and logging on to the web service, users can request benchmarking or prediction analyses. Users are notified by email about the status of their analyses running on the host server. They can also check this on the analysis results page, which gives an overview of all analyses and contains links to corresponding results pages.

Benchmarking, the main service on the M@CBETH website, involves selection and training of an optimal model based on the submitted benchmarking dataset and corresponding class labels. This model is then stored for immediate or later use on prospective data. Benchmarking results in a table showing summary statistics [leave-one-out cross-validation (LOO-CV), training set accuracy (ACC) and area under the receiver operating characteristic curve (AUC) performance, test set ACC and AUC] for all selected classification methods, highlighting the best method. Prospective data can also be submitted and evaluated immediately during the same benchmarking analysis. By using the prediction service, the M@CBETH website offers a way for later evaluation of prospective data by reusing an existing optimal prediction model (built up in a previous benchmarking analysis by the same user). For both services, if the corresponding prospective labels are submitted, the prospective accuracy is calculated. Otherwise, labels are predicted for all prospective samples. This latter application is useful for classifying new unseen patients in clinical practice.
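The summary statistics and the two prospective cases described above can be illustrated with a minimal Python sketch. It is not the code behind the web service: the function names, the use of real-valued decision scores and the threshold at zero are assumptions made for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_prospective(scores, labels=None):
    """Hypothetical helper mirroring the two prospective cases.

    Assumption: a score >= 0 is mapped to class +1, anything else to -1.
    Without labels only predictions are returned; with labels the
    prospective accuracy is added.
    """
    predicted = [1 if s >= 0 else -1 for s in scores]
    result = {"predicted": predicted}
    if labels is not None:
        result["ACC"] = accuracy(labels, predicted)
    return result
```

Calling `evaluate_prospective(scores)` without labels corresponds to classifying new unseen patients, while passing `labels` corresponds to checking the stored model against prospective samples of known class.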

The M@CBETH web service is intended for the classification of patient samples. It assumes that the microarray data are represented by an expression matrix of high dimensionality, i.e. a small number of patients and a large number of gene expression levels for each patient. Two kinds of data formats are accepted: spreadsheet-like tab-delimited text files and matrix-like MATLAB files. Datasets are not allowed to contain missing values. Class labels are restricted to ‘+1’ or ‘–1’. Several publicly available microarray datasets are present on the website in the correct data format as examples.
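A user may want to check a dataset against these constraints before submitting it. The Python sketch below shows one way to do so. The exact file layout expected by the web service is not specified here, so the sketch assumes a purely numeric tab-delimited matrix with one sample per row and no header; the function names are hypothetical.

```python
import math


def parse_expression_matrix(text):
    """Parse tab-delimited numeric text into a list of rows.

    Assumed layout (not taken from the website): one sample per line,
    one expression value per tab-separated cell, no header row.
    Raises ValueError on empty, non-numeric, NaN or infinite cells and
    on rows of unequal length.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        values = []
        for cell in line.split("\t"):
            try:
                value = float(cell.strip())
            except ValueError:
                raise ValueError(
                    f"line {line_no}: missing or non-numeric value {cell!r}"
                ) from None
            if math.isnan(value) or math.isinf(value):
                raise ValueError(f"line {line_no}: value {cell!r} is not finite")
            values.append(value)
        rows.append(values)
    if not rows:
        raise ValueError("no data rows found")
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows have different numbers of values")
    return rows


def check_labels(labels, n_samples):
    """Require exactly one label per sample, each equal to +1 or -1."""
    if len(labels) != n_samples:
        raise ValueError("number of labels does not match number of samples")
    bad = sorted(set(labels) - {1, -1})
    if bad:
        raise ValueError(f"labels must be +1 or -1, found {bad}")
    return list(labels)
```

For example, `parse_expression_matrix("1.0\t2.5\n0.4\t0.0\n")` returns two rows of two values, whereas a row with an empty cell is rejected as containing a missing value.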

Users can select the classification methods that will be compared (default selection set to the best overall and most efficient methods from the benchmarking study), change the number of randomizations (default 20, while keeping in mind that results are more reliable when the number of randomizations is large) and switch off normalization (although performing normalization is better from a statistical viewpoint).


    ALGORITHM
An overview of the algorithm behind this web service is presented in Figure 1. Different classification methods are considered, based on least squares SVM (Suykens et al., 2002) [with linear and radial basis function (RBF) kernels], Fisher discriminant analysis, principal component analysis (PCA) and kernel PCA (Schölkopf et al., 1998; Suykens et al., 2002) (with linear and RBF kernels). More detailed information on these methods can be found in Pochet et al. (2004).
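To give a feel for the simplest of these building blocks, the following Python sketch trains a least squares SVM classifier by solving its standard linear system, with either a linear or an RBF kernel. It is an illustration under stated assumptions, not the implementation used by M@CBETH: the RBF convention exp(-||u - v||^2 / sigma2), the parameter names `gamma` and `sigma2`, and the small Gaussian-elimination solver are choices made here to keep the example self-contained.

```python
import math


def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, sigma2=1.0):
    # Assumed convention: exp(-||u - v||^2 / sigma2)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sigma2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def train_lssvm(X, y, gamma=1.0, kernel=linear_kernel):
    """Return a decision function f(x); its sign is the predicted class.

    Solves  [0  y^T; y  Omega + I/gamma] [b; alpha] = [0; 1]
    with Omega[i][j] = y_i * y_j * K(x_i, x_j).
    """
    n = len(X)
    a = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            k = y[i] * y[j] * kernel(X[i], X[j])
            row.append(k + (1.0 / gamma if i == j else 0.0))
        a.append(row)
    sol = solve(a, [0.0] + [1.0] * n)
    b, alpha = sol[0], sol[1:]

    def decision(x):
        return sum(al * yi * kernel(xi, x)
                   for al, yi, xi in zip(alpha, y, X)) + b

    return decision
```

Here `gamma` plays the role of the regularization parameter and `sigma2` that of the kernel parameter, the two kinds of hyperparameter that have to be tuned for such a model.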



Fig. 1 Overview of the algorithm. The benchmarking dataset is reshuffled until the number of requested randomizations is reached. All randomizations are split (two-thirds of the samples for training, the rest as test set) in a stratified way (class labels are equally distributed over the training–test split). Iteratively, all selected classification methods (1) are applied to all randomizations. In each iteration, the hyperparameters are first selected by means of LOO-CV, then the model is trained on the training set and finally this model is applied to the test set, resulting in a test set ACC. The mean randomized test set ACC is calculated for each classification method. The best generalizing method (2), i.e. the one with the best mean test set ACC, is then used to build the optimal classifier on the complete benchmarking dataset, which is stored for application to prospective datasets.
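The procedure in Figure 1 can be sketched in Python as follows. This is a simplified illustration, not the service's code. To stay self-contained it uses two stand-in classifiers (k-nearest neighbours and nearest centroid) instead of the LS-SVM, Fisher discriminant and (kernel) PCA based methods, and it assumes that hyperparameters are re-selected by LOO-CV when the final model is built on the complete dataset. All function names are hypothetical.

```python
import random
from statistics import mean


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit_knn(X, y, k):
    """Stand-in method with one hyperparameter k (not an M@CBETH method)."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: sqdist(X[i], x))[:k]
        return 1 if sum(y[i] for i in nearest) >= 0 else -1
    return predict


def fit_centroid(X, y, _param):
    """Stand-in method without hyperparameters; needs both classes in y."""
    cents = {}
    for label in (1, -1):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*pts)]
    return lambda x: min(cents, key=lambda c: sqdist(cents[c], x))


def stratified_split(y, rng, train_fraction=2 / 3):
    """Split indices per class so both parts keep the class proportions."""
    train, test = [], []
    for label in (1, -1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def loo_cv(fit, X, y, param):
    """Leave-one-out cross-validation accuracy for one hyperparameter value."""
    hits = []
    for i in range(len(X)):
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], param)
        hits.append(1.0 if predict(X[i]) == y[i] else 0.0)
    return mean(hits)


def benchmark(X, y, methods, n_randomizations=20, seed=0):
    """Return (mean test ACC per method, best method name, final predictor)."""
    rng = random.Random(seed)
    test_acc = {name: [] for name in methods}
    for _ in range(n_randomizations):
        train, test = stratified_split(y, rng)
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte, yte = [X[i] for i in test], [y[i] for i in test]
        for name, (fit, grid) in methods.items():
            # Select the hyperparameter by LOO-CV on the training part only
            param = max(grid, key=lambda p: loo_cv(fit, Xtr, ytr, p))
            predict = fit(Xtr, ytr, param)
            acc = mean(1.0 if predict(x) == t else 0.0
                       for x, t in zip(Xte, yte))
            test_acc[name].append(acc)
    mean_acc = {name: mean(v) for name, v in test_acc.items()}
    best = max(mean_acc, key=mean_acc.get)
    fit, grid = methods[best]
    # Assumption: hyperparameters are re-tuned on the complete dataset
    final_param = max(grid, key=lambda p: loo_cv(fit, X, y, p))
    return mean_acc, best, fit(X, y, final_param)
```

A call such as `benchmark(X, y, {"knn": (fit_knn, [1, 3]), "centroid": (fit_centroid, [None])})` returns the mean randomized test set ACC per method, the name of the best method and a predictor trained on all samples, which can then be applied to prospective data.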

 


    Acknowledgments
 
This work was supported by Research Council KUL: GOA-AMBioRICS, IDO (IOTA Oncology, Genetic networks), several PhD/postdoc and fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects G.0115.01, G.0407.02, G.0413.03, G.0388.03, G.0229.03 and IWT: PhD Grants, STWW-Genprom, GBOU-McKnow, GBOU-SQUAD, GBOU-ANA; Belgian Federal Government: DWTC [IUAP V-22 (2002–2006)]; and EU: CAGE; Biopattern.

Received on April 18, 2005; revised on May 10, 2005; accepted on May 10, 2005

    REFERENCES

    Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines (and other Kernel-Based Learning Methods). Cambridge University Press, Cambridge.

    De Smet, F., Pochet, N., Engelen, K., Van Gorp, T., Van Hummelen, P., Suykens, J., Marchal, K., Amant, F., Moreau, Y., Timmerman, D., De Moor, B., Vergote, I. (2005) Predicting the clinical behavior of ovarian cancer from gene expression profiles. Accepted for publication in International Journal of Gynecological Cancer.

    Furey, T.S., et al. (2000) Support vector machines classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16, 906–914.

    Pavlidis, P., et al. (2004) Support vector machine classification on the web. Bioinformatics, 20, 586–587.

    Pochet, N., et al. (2004) Systematic benchmarking of microarray data classification: assessing the role of nonlinearity and dimensionality reduction. Bioinformatics, 20, 3185–3195.

    Schölkopf, B., et al. (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10, 1299–1319.

    Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J. (2002) Least Squares Support Vector Machines. World Scientific, Singapore. ISBN 981-238-151-1.

    Valk, P.J.M., et al. (2004) Prognostically useful gene-expression profiles in acute myeloid leukemia. N. Engl. J. Med., 350, 1617–1628.

    van't Veer, L.J., et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.




