Bioinformatics Advance Access originally published online on June 6, 2007
Bioinformatics 2007 23(16):2080-2087; doi:10.1093/bioinformatics/btm305
Predicting survival from microarray data—a comparative study


¹Department of Mathematics, ²Department of Informatics, University of Oslo, ³Norwegian Computing Center and ⁴Institute of Basic Medical Sciences, Department of Biostatistics, University of Oslo and Statistics for Innovation – (sfi)², Norway
*To whom correspondence should be addressed.
Abstract
Motivation: Survival prediction from gene expression data and other high-dimensional genomic data has been the subject of much research in recent years. Such data pose the methodological problem of having many more gene expression values than individuals. In addition, the responses are censored survival times. Most of the proposed methods handle this by using Cox's proportional hazards model and obtaining parameter estimates through some dimension reduction or parameter shrinkage technique. Using three well-known microarray gene expression data sets, we compare the prediction performance of seven such methods: univariate selection, forward stepwise selection, principal components regression (PCR), supervised principal components regression, partial least squares regression (PLS), ridge regression and the lasso.
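To make the shared setup concrete, here is a minimal sketch of one of the shrinkage approaches, a ridge-penalized Cox model, fitted on simulated data with more covariates than individuals. It assumes Breslow handling of ties and the penalized objective −Σ_{i: δ_i = 1} [x_iᵀβ − log Σ_{j ∈ R(t_i)} exp(x_jᵀβ)] + (λ/2)‖β‖², minimized by Newton's method with backtracking. The function names `cox_ridge_objective` and `fit_cox_ridge`, the penalty scaling, the censoring scheme and the simulated data are all hypothetical illustration choices, not the authors' Matlab or R implementation.

```python
import numpy as np


def cox_ridge_objective(beta, X, time, event, lam):
    """Negative ridge-penalized Cox log partial likelihood, with gradient and Hessian.

    Assumes Breslow-style ties: the risk set at time[i] is every subject j with
    time[j] >= time[i]. event[i] is 1.0 for an observed death, 0.0 if censored.
    """
    eta = X @ beta                                   # linear predictors x_i^T beta
    m = eta.max()                                    # shift to avoid overflow in exp
    w = np.exp(eta - m)
    at_risk = (time[None, :] >= time[:, None]).astype(float)  # at_risk[i, j] = 1 if j in R(t_i)
    denom = at_risk @ w                              # sum over R(t_i) of exp(eta_j - m)
    loglik = np.sum(event * (eta - m - np.log(denom)))
    A = at_risk * w[None, :] / denom[:, None]        # row i: normalized weights over R(t_i)
    mu = A @ X                                       # risk-set weighted covariate means
    grad_ll = event @ (X - mu)
    second = X.T @ (X * (event @ A)[:, None])        # sum_i event_i sum_j A_ij x_j x_j^T
    info = second - (mu * event[:, None]).T @ mu     # minus the Hessian of loglik
    f = -loglik + 0.5 * lam * beta @ beta
    g = -grad_ll + lam * beta
    H = info + lam * np.eye(X.shape[1])
    return f, g, H


def fit_cox_ridge(X, time, event, lam, tol=1e-12, max_iter=100):
    """Minimize the penalized objective by Newton's method with Armijo backtracking."""
    beta = np.zeros(X.shape[1])
    f, g, H = cox_ridge_objective(beta, X, time, event, lam)
    for _ in range(max_iter):
        step = np.linalg.solve(H, g)
        decrement = g @ step                         # squared Newton decrement
        if decrement / 2 < tol:
            break
        t = 1.0
        while t > 1e-12:                             # halve until sufficient decrease
            f_new = cox_ridge_objective(beta - t * step, X, time, event, lam)[0]
            if np.isfinite(f_new) and f_new <= f - 0.25 * t * decrement:
                break
            t *= 0.5
        beta = beta - t * step
        f, g, H = cox_ridge_objective(beta, X, time, event, lam)
    return beta


# Hypothetical simulated data: more covariates (p) than individuals (n).
rng = np.random.default_rng(0)
n, p = 40, 60
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = 1.0
time = rng.exponential(1.0 / np.exp(X @ true_beta))   # hazard proportional to exp(x^T beta)
event = (rng.uniform(size=n) < 0.75).astype(float)   # crude random censoring, about 25%

beta_hat = fit_cox_ridge(X, time, event, lam=1.0)
print(np.round(beta_hat[:5], 3))
```

Because λ > 0 makes the objective strictly convex, the Newton system has a unique solution even when p > n, where the unpenalized Cox estimate may not exist. Increasing `lam` shrinks the coefficient vector toward zero.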
Results: Statistical learning from subsets should be repeated several times in order to get a fair comparison between methods. Methods using coefficient shrinkage or linear combinations of the gene expression values have much better performance than the simple variable selection methods. For our data sets, ridge regression has the overall best performance.
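Why a single split is not enough can be seen in a small sketch. It assumes a 2/3 training and 1/3 test split, Harrell's concordance index as the performance measure, and a deliberately crude single-gene predictor standing in for univariate selection. The helper names `fit_top_gene` and `repeated_split_scores`, the split ratio, the measure and the simulated data are hypothetical choices for illustration, not necessarily those of the study. The method is refitted on each random training subset and scored on the held-out individuals, and the spread of scores across splits is exactly what one split would hide.

```python
import numpy as np


def concordance_index(time, event, risk):
    """Harrell's C: share of comparable pairs ordered correctly by the risk score.

    A pair (i, j) is comparable when time[i] < time[j] and subject i had an event;
    it is concordant when risk[i] > risk[j]. Ties in risk count one half.
    """
    num, den = 0.0, 0
    for i in range(len(time)):
        if event[i] != 1:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den if den > 0 else float("nan")


def fit_top_gene(X, time, event):
    """Hypothetical, crude stand-in for univariate selection: keep the single gene
    whose training C-index is furthest from 0.5, oriented so larger means riskier."""
    best_k, best_sign, best_dev = 0, 1.0, -1.0
    for k in range(X.shape[1]):
        c = concordance_index(time, event, X[:, k])
        if abs(c - 0.5) > best_dev:
            best_k, best_sign, best_dev = k, (1.0 if c >= 0.5 else -1.0), abs(c - 0.5)
    return lambda X_new: best_sign * X_new[:, best_k]


def repeated_split_scores(X, time, event, fit, n_splits=20, train_frac=2 / 3, seed=0):
    """Fit on a random training subset, compute test-set C-index; repeat n_splits times."""
    rng = np.random.default_rng(seed)
    n = len(time)
    n_train = int(round(train_frac * n))
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, held_out = perm[:n_train], perm[n_train:]
        predict = fit(X[train], time[train], event[train])
        scores.append(concordance_index(time[held_out], event[held_out], predict(X[held_out])))
    return np.array(scores)


# Hypothetical simulated data set: only gene 0 affects the hazard.
rng = np.random.default_rng(0)
n, p = 45, 30
X = rng.standard_normal((n, p))
time = rng.exponential(1.0 / np.exp(X[:, 0]))
event = (rng.uniform(size=n) < 0.75).astype(float)

scores = repeated_split_scores(X, time, event, fit_top_gene)
print(f"mean C = {scores.mean():.3f}, range {scores.min():.3f} to {scores.max():.3f}")
```

Any fitting routine with the same signature (training data in, prediction function out) can be passed as `fit`. Reusing the same `seed` gives every method identical splits, so their scores can be compared split by split.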
Availability: Matlab and R code for the prediction methods are available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/.
Contact: hegembo{at}math.uio.no
Associate Editor: Joaquin Dopazo
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Received on October 31, 2006; revised on May 24, 2007; accepted on May 28, 2007