Bioinformatics Advance Access published online on April 7, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti429
1 Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands; Department of Pathology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
* To whom correspondence should be addressed.
Motivation: Microarray gene expression data is increasingly employed to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is, as yet, no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition, no standard computational approach has emerged that enables robust outcome prediction.

Results: An important contribution of this work is the description of a principled training and validation protocol, which allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Employing this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying degree of difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology to evaluate the performance of new computational approaches and to objectively compare outcome predictions on different datasets.
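The abstract does not spell out the details of the protocol. As a rough illustration of the general idea only, the Python sketch below pairs a filter-style reporter selection with a nearest mean classifier and repeats the selection inside every cross-validation fold, so that held-out samples never influence which genes are chosen. The signal-to-noise score, the function names, the number of folds and the fixed number of reporters are all assumptions made for this example; this is not the authors' implementation.

```python
import random
from statistics import mean, pvariance


def signal_to_noise(X, y, j):
    """Assumed filter score for gene j: |mean difference| / summed class std devs."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    spread = pvariance(a) ** 0.5 + pvariance(b) ** 0.5
    return abs(mean(a) - mean(b)) / (spread + 1e-12)


def select_reporters(X, y, k):
    """Rank all genes by the filter score and return the indices of the top k."""
    scores = [(signal_to_noise(X, y, j), j) for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_nearest_mean(X, y, genes):
    """Compute one centroid per class, restricted to the selected genes."""
    return {
        c: [mean(row[j] for row, label in zip(X, y) if label == c) for j in genes]
        for c in set(y)
    }


def predict(centroids, genes, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[j] - m) ** 2 for j, m in zip(genes, centroids[c]))
    return min(centroids, key=dist)


def cross_validated_error(X, y, k_genes, n_folds=5, seed=0):
    """Estimate the error rate, redoing reporter selection within each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        # Selection and training both see the training samples only.
        genes = select_reporters(X_train, y_train, k_genes)
        centroids = fit_nearest_mean(X_train, y_train, genes)
        errors += sum(predict(centroids, genes, X[i]) != y[i] for i in fold)
    return errors / len(X)
```

Here `X` is a list of samples (each a list of expression values) and `y` holds 0/1 class labels. Running the selection on the full dataset before cross-validation would let the held-out samples leak into the choice of reporters and would tend to give an optimistically biased error estimate, which is why the selection step sits inside the loop.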
Received November 2, 2004
Revised April 1, 2005
Accepted April 2, 2005
Article
A protocol for building and evaluating predictors of disease state based on microarray data
2 Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
3 Department of Radiotherapy, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
4 Rosetta Inpharmatics LLC, A wholly owned subsidiary of Merck & Co., Inc., 401 Terry Avenue N. Seattle, Washington 98109, USA
5 Department of Pathology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
Lodewyk F. A. Wessels, E-mail: l.f.a.wessels{at}ewi.tudelft.nl