Bioinformatics Advance Access published online on May 15, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp322
Gradient Lasso for Cox Proportional Hazards Model
aDepartment of Biostatistics & Bioinformatics, Duke University, NC 27705, USA, bDepartment of Statistics and Information Science, Dongguk University, Gyeongju, 780-714, cDepartment of Statistics, University of Seoul, Seoul 130-743, Korea
*To whom correspondence should be addressed. Dr. Changyi Park, E-mail: park463{at}uos.ac.kr
Abstract
Motivation: There has been increasing interest in expressing a survival phenotype (e.g., time to cancer recurrence or death), or its distribution, in terms of the expression data of a subset of genes. Due to the high dimensionality of gene expression data, however, there is a serious problem of collinearity in fitting a prediction model such as Cox's proportional hazards model. To avoid the collinearity problem, several methods based on penalized Cox proportional hazards models have been proposed. However, those methods suffer from severe computational problems, such as slow or even failed convergence, because of the high-dimensional matrix inversions required for model fitting. We propose to implement the penalized Cox regression with a lasso penalty via the gradient lasso algorithm, which yields faster convergence to the global optimum than do other algorithms. Moreover, the gradient lasso algorithm is guaranteed to converge to the optimum under mild regularity conditions. Hence, our gradient lasso algorithm can be a useful tool in developing a prediction model based on high-dimensional covariates, including gene expression data.
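For orientation, the optimization problem behind a lasso-penalized Cox model can be written out as follows. This is a minimal sketch in standard textbook notation, not the paper's own: the symbols $t_i$ (observed time), $\delta_i$ (event indicator), $x_i$ (covariate vector), $R_i$ (risk set), and the bound $\lambda$ are assumptions introduced here, ties are ignored, and the exact formulation used in glcoxph may differ.

```latex
% Cox partial log-likelihood for n subjects with no tied event times,
% where R_i = { j : t_j >= t_i } is the risk set at time t_i.
\[
  \ell(\beta) \;=\; \sum_{i:\,\delta_i = 1}
    \Bigl[\, x_i^{\top}\beta \;-\; \log \sum_{j \in R_i} \exp\bigl(x_j^{\top}\beta\bigr) \Bigr]
\]

% Lasso-constrained estimate: minimize the negative partial log-likelihood
% over an L1 ball of radius lambda.
\[
  \hat{\beta} \;=\; \arg\min_{\beta} \; -\ell(\beta)
  \quad \text{subject to} \quad \sum_{k=1}^{p} \lvert \beta_k \rvert \;\le\; \lambda
\]

% Coordinate k of the gradient of the objective: a sum over events of the
% risk-set weighted mean of covariate k minus its observed value.
\[
  \frac{\partial \bigl(-\ell(\beta)\bigr)}{\partial \beta_k}
  \;=\; -\sum_{i:\,\delta_i = 1}
    \Biggl[\, x_{ik} \;-\;
      \frac{\sum_{j \in R_i} x_{jk}\, \exp\bigl(x_j^{\top}\beta\bigr)}
           {\sum_{j \in R_i} \exp\bigl(x_j^{\top}\beta\bigr)} \Biggr]
\]
```

Each gradient coordinate involves only sums over risk sets, so a method driven by the gradient alone does not have to form or invert a $p \times p$ matrix, which is the computational bottleneck described above.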
Results: Simulation studies show that the prediction model fitted by gradient lasso recovers the prognostic genes. In addition, results from diffuse large B-cell lymphoma datasets and the Norway/Stanford breast cancer dataset indicate that our method is very competitive with the popular existing methods of Park and Hastie (2007) and Goeman (2008) in computational time, prediction, and selectivity.
Availability: R package glcoxph is available at http://datamining.dongguk.ac.kr/R/glcoxph.
Contact: park463{at}uos.ac.kr
Associate Editor: Prof. John Quackenbush
Received on February 9, 2009; revised on May 7, 2009; accepted on May 12, 2009