Bioinformatics Advance Access originally published online on October 5, 2004
Bioinformatics 2004 20(18):3583-3593; doi:10.1093/bioinformatics/bth447
Bioinformatics vol. 20 issue 18 © Oxford University Press 2004; all rights reserved.
BagBoosting for tumor classification with gene expression data
Seminar für Statistik, ETH Zürich, CH-8092 Switzerland
Received on June 5, 2004; revised on July 5, 2004; accepted on July 9, 2004
Advance Access Publication October 5, 2004
Motivation: Microarray experiments are expected to contribute significantly to progress in cancer treatment by enabling precise and early diagnosis. They create a need for class prediction tools that can deal with a large number of highly correlated input variables, perform feature selection and provide class probability estimates that quantify the predictive uncertainty. A very promising solution is to combine the two ensemble schemes, bagging and boosting, into a novel algorithm called BagBoosting.
Results: When bagging is used as a module in boosting, the resulting classifier consistently improves on the predictive performance and the probability estimates of both bagging and boosting on real and simulated gene expression data. This quasi-guaranteed improvement can be obtained simply by investing more computing effort. The advantageous predictive potential is also confirmed by comparing BagBoosting with several established class prediction tools for microarray data.
Availability: Software for the modified boosting algorithms, for benchmark studies and for the simulation of microarray data is available as an R package under GNU public license at http://stat.ethz.ch/~dettling/bagboost.html
Contact: dettling{at}stat.math.ethz.ch
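Illustration: The idea of using bagging as a module in boosting can be sketched in a few lines. The sketch below is not the published implementation, which is distributed as an R package. It assumes squared-error boosting on labels coded as -1/+1, regression stumps as the base learner, and a shrinkage factor; the function names (`fit_stump`, `fit_bagged_stumps`, `bagboost_fit`, `bagboost_score`) and the toy data are hypothetical and chosen only for illustration.

```python
import random

def fit_stump(X, r):
    """Least-squares regression stump: best single-feature threshold split."""
    n, p = len(X), len(X[0])
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(p):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no split possible in this sample: predict the mean
        m = sum(r) / n
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    j, t, lm, rm = stump
    return lm if x[j] <= t else rm

def fit_bagged_stumps(X, r, n_bags, rng):
    """Bagging: fit one stump per bootstrap sample of the (X, r) pairs."""
    n = len(X)
    stumps = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [r[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Average the predictions of the bagged stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

def bagboost_fit(X, y, n_rounds=20, n_bags=10, shrinkage=0.5, seed=0):
    """Boosting loop in which every base learner is a bagged ensemble.

    Assumption: squared-error loss, so each round fits the current residuals.
    """
    rng = random.Random(seed)
    scores = [0.0] * len(X)
    rounds = []
    for _ in range(n_rounds):
        residuals = [yi - fi for yi, fi in zip(y, scores)]
        stumps = fit_bagged_stumps(X, residuals, n_bags, rng)
        rounds.append(stumps)
        scores = [f + shrinkage * predict_bagged(stumps, x)
                  for f, x in zip(scores, X)]
    return {"rounds": rounds, "shrinkage": shrinkage}

def bagboost_score(model, x):
    """Real-valued score; its sign is the predicted class (-1 or +1)."""
    return model["shrinkage"] * sum(predict_bagged(s, x) for s in model["rounds"])

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.05, 1.0], [0.10, 0.3], [0.15, 0.8], [0.20, 0.1],
         [0.80, 0.9], [0.85, 0.2], [0.90, 0.7], [0.95, 0.4]]
y_toy = [-1, -1, -1, -1, 1, 1, 1, 1]
model = bagboost_fit(X_toy, y_toy)
```

Raising `n_bags` or `n_rounds` in this sketch increases the computing effort, which is the trade-off the Results paragraph refers to. Real expression data would have thousands of features and would call for a far more efficient stump search than this exhaustive loop.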