Bioinformatics Vol. 18 no. 12 2002
Pages 1625-1632
© 2002 Oxford University Press
Partial least squares proportional hazard regression for application to DNA microarray survival data
1 Department of Statistics, Texas A&M University, College Station, TX 77843, USA
2 Department of Applied Science, University of California, Davis, CA 95616, USA
Received on October 14, 2001; revised on March 12, 2002; accepted on June 18, 2002
Motivation: Microarrays are increasingly used in cancer research. When gene transcription data from microarray experiments also contain patient survival information, it is often of interest to predict survival times from gene expression. In this paper we consider the well-known proportional hazard (PH) regression model for survival analysis. Ordinarily, the PH model is used with a few covariates and many observations (subjects). Here we consider the case in which the number of covariates, p, exceeds the number of samples, N, a setting typical of gene expression data from DNA microarrays.
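For reference, the standard PH model and the Cox partial likelihood used to fit it are written below. The symbols (h_0 for the baseline hazard, δ_i for the death indicator, R(t_i) for the risk set) are conventional notation, not taken from the paper. With p > N, maximizing L(β) over all p coefficients is ill-posed: some direction in covariate space is orthogonal to every sample, so the maximizer is not unique, and it need not be finite either. This is the difficulty the dimension reduction in the Results addresses.

```latex
% Proportional hazard (Cox) model for subject i with covariate vector x_i \in \mathbb{R}^p
h(t \mid x_i) = h_0(t)\,\exp\!\left(\beta^{\top} x_i\right)

% Partial likelihood over subjects with an observed death (\delta_i = 1);
% R(t_i) is the set of subjects still under observation at time t_i
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\beta^{\top} x_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\beta^{\top} x_j\right)}
```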
Results: Given a vector of response values (survival times) and p gene expression covariates, we examine how to predict survival probabilities when N ≪ p. To cope with the high dimensionality, we reduce the dimension by partial least squares (PLS), using the vector of survival times as the response variable. The extracted PLS gene components are then used as covariates in a PH regression to predict the survival probabilities. We demonstrate the methodology on two cDNA gene expression data sets, both containing survival data: the first contains 40 diffuse large B-cell lymphoma (DLBCL) tissue samples, and the second contains 49 tissue samples from patients with locally advanced breast cancer enrolled in a prospective study.
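The two-stage procedure above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' SAS implementation. The names `pls_components`, `cox_derivs`, `cox_fit` and `predict_survival` are hypothetical. The PLS step is the standard single-response NIPALS algorithm applied to the observed times and ignores censoring; the abstract does not say how censored times enter this step. The PH model is fit by Newton-Raphson on the Breslow partial likelihood, and survival probabilities use the Breslow baseline cumulative hazard. The simulated data are purely illustrative.

```python
# Hypothetical sketch: function names and the simulated data are illustrative
# assumptions, not the authors' SAS macros or the DLBCL/breast cancer data.
import numpy as np


def pls_components(X, y, n_comp):
    """PLS1 (NIPALS) component scores for an N x p matrix X and response y."""
    Xk = X - X.mean(axis=0)              # center each gene (column)
    yk = y - y.mean()                    # center the response
    T = np.empty((X.shape[0], n_comp))
    for k in range(n_comp):
        w = Xk.T @ yk                    # weights proportional to cov(gene, y)
        w /= np.linalg.norm(w)
        t = Xk @ w                       # k-th component score of each sample
        T[:, k] = t
        tt = t @ t
        Xk = Xk - np.outer(t, Xk.T @ t / tt)   # deflate X on t
        yk = yk - t * (yk @ t) / tt             # deflate y on t
    return T


def cox_derivs(Z, time, event, beta):
    """Log partial likelihood (Breslow ties), gradient and Hessian at beta."""
    eta = Z @ beta
    m = eta.max()
    w = np.exp(eta - m)                  # shifted for stability; shift cancels
    loglik, grad = 0.0, np.zeros_like(beta)
    hess = np.zeros((beta.size, beta.size))
    for i in np.flatnonzero(event):      # loop over observed deaths
        risk = time >= time[i]           # risk set at the i-th death time
        wr, Zr = w[risk], Z[risk]
        s0 = wr.sum()
        zbar = wr @ Zr / s0              # risk-weighted covariate mean
        loglik += eta[i] - m - np.log(s0)
        grad += Z[i] - zbar
        hess -= (Zr.T * wr) @ Zr / s0 - np.outer(zbar, zbar)
    return loglik, grad, hess


def cox_fit(Z, time, event, max_iter=50, tol=1e-10):
    """Maximize the partial likelihood by Newton-Raphson with step halving."""
    beta = np.zeros(Z.shape[1])
    loglik, grad, hess = cox_derivs(Z, time, event, beta)
    for _ in range(max_iter):
        step = np.linalg.solve(hess, grad)     # Newton update is beta - step
        new_beta = beta - step
        new_ll, new_grad, new_hess = cox_derivs(Z, time, event, new_beta)
        while new_ll < loglik and np.max(np.abs(step)) > tol:
            step /= 2                          # halve until likelihood improves
            new_beta = beta - step
            new_ll, new_grad, new_hess = cox_derivs(Z, time, event, new_beta)
        beta, loglik, grad, hess = new_beta, new_ll, new_grad, new_hess
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_survival(Z, time, event, beta, z_new, t):
    """S(t | z_new) = exp(-H0(t) * exp(z_new' beta)), Breslow baseline."""
    risk_score = np.exp(Z @ beta)
    H0 = sum(1.0 / risk_score[time >= time[i]].sum()
             for i in np.flatnonzero(event) if time[i] <= t)
    return float(np.exp(-H0 * np.exp(z_new @ beta)))


# Toy usage on simulated data (hypothetical; more genes than samples).
rng = np.random.default_rng(0)
N, p = 40, 60
X = rng.normal(size=(N, p))
hazard = np.exp(0.5 * X[:, 0] - 0.5 * X[:, 1])  # two "informative" genes
true_time = rng.exponential(1.0 / hazard)
censor_time = rng.exponential(2.0, size=N)
obs = np.minimum(true_time, censor_time)        # observed follow-up time
event = true_time <= censor_time                # True = death observed
T = pls_components(X, obs, n_comp=2)            # PLS on observed times
beta = cox_fit(T, obs, event)                   # PH regression on components
print("PH coefficients on PLS components:", beta)
print("S(median time | subject 0):",
      predict_survival(T, obs, event, beta, T[0], np.median(obs)))
```

The sketch predicts only for a training subject. Predicting for a new tissue sample would also require projecting its expression profile onto the same PLS weight directions, and the number of components (two here) would still have to be chosen.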
Availability: The methodology can be implemented using a combination of standard statistical methods, available, for example, in SAS. Sample SAS macro code implementing the methods will be available at http://stat.tamu.edu/~dnguyen/supplemental.html
Contact: dnguyen{at}stat.tamu.edu, dmrocke{at}ucdavis.edu


