Bioinformatics Advance Access originally published online on September 11, 2008
Bioinformatics 2008 24(21):2569; doi:10.1093/bioinformatics/btn485
In Response to Comment on Network-constrained regularization and variable selection for analysis of genomic data
Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Contact: hongzhe{at}mail.med.upenn.edu
We would like to thank Binder and Schumacher (2008) for their insightful comments on our paper (Li and Li, 2008). We would like to emphasize that the main motivation of our proposed method is to incorporate a priori network/pathway information (e.g. KEGG regulatory pathways) into regression analysis so that the results are more interpretable in the context of known biological pathways. The focus of our paper is not on prediction, although prediction is of course related to variable selection. For the real dataset, we were interested in identifying whether some sub-networks of the KEGG pathways might be related to survival from glioblastoma. This is why we used only the expression data for the genes on the KEGG pathways in our analysis of the real dataset.
The comments by Dr Binder and Dr Schumacher mainly focused on the prediction performance of the various regularized regression procedures that we used for the real data analysis. First, after checking our code again, we found one error in calculating the test mean-squared errors (T-MSEs). We used the standardized expression values in building the predictive models but forgot to standardize the test-set data when calculating the T-MSEs, which resulted in seemingly high T-MSEs. The corrected T-MSEs are given in Table 1. We regret and apologize for this error (see attached Erratum). The corrected numbers are, however, comparable to the results of the null model and the clinical model (0.83 and 0.82, respectively) once the variances of the T-MSEs are taken into account. Based on 1000 bootstrap samples of the testing dataset, the SDs of the T-MSEs for all these procedures ranged from 0.17 to 0.19. This does not even account for the variability in variable selection. We agree with Dr Binder and Dr Schumacher that the null model and/or the clinical model should be used as a benchmark when evaluating prediction.

Second, in our analysis of the real dataset, we used 5-fold cross-validation to choose the tuning parameters, which might not be ideal for the purpose of variable selection. The variables selected and the prediction performance also depend on the particular partition of the samples during cross-validation, especially when the sample size is small relative to the number of predictors.
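The two computational points above (standardizing the test set consistently with the training set, and gauging the variability of a T-MSE by bootstrapping the test samples) can be illustrated with a minimal Python sketch. It is not the code used in the paper: the function names are hypothetical, and rescaling the test set with the training-set means and SDs is an assumption, since the exact standardization used for the corrected values is not stated here.

```python
import numpy as np


def standardize_with_training_stats(X_train, X_test):
    """Scale both sets with the training-set column means and SDs.

    Assumption: the test set is rescaled with training statistics,
    so the fitted coefficients apply on the same scale.
    """
    mean = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # leave constant columns unscaled
    return (X_train - mean) / sd, (X_test - mean) / sd


def t_mse(y_test, y_pred):
    """Test mean-squared error between observed and predicted outcomes."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_test - y_pred) ** 2))


def bootstrap_sd_of_t_mse(y_test, y_pred, n_boot=1000, seed=0):
    """SD of the T-MSE over bootstrap resamples of the test samples.

    The fitted model is held fixed, so variability due to variable
    selection is not reflected in this SD.
    """
    rng = np.random.default_rng(seed)
    sq_err = (np.asarray(y_test, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    n = sq_err.shape[0]
    # Each row of idx is one bootstrap resample of the test-set indices.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_mse = sq_err[idx].mean(axis=1)
    return float(boot_mse.std(ddof=1))
```

Predictions passed to `t_mse` would come from a model fitted on the standardized training matrix and applied to the standardized test matrix; skipping the second output of `standardize_with_training_stats` reproduces the kind of scale mismatch described above.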
We absolutely agree with Dr Binder and Dr Schumacher that survival analysis models/methods such as the Cox regression analysis should be used for dealing with censored survival outcomes. For the accelerated failure time (AFT) model (Wei, 1992) that we used in our paper,
[...] i is the event indicator and [...]

Finally, we also agree with Dr Binder and Dr Schumacher that if the goal is to build the best predictive model for survival, one should try other methods specially developed for prediction (e.g. the boosting procedure) and use all the available gene expression data and clinical covariates. As demonstrated by Dr Binder and Dr Schumacher, better prediction can be achieved by doing this.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Olga Troyanskaya
Received on September 3, 2008; revised on September 3, 2008; accepted on September 8, 2008
| REFERENCES |
|---|
Binder H, Schumacher M. Comment on Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics (2008) in press.
Li C, Li H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics (2008) 24:1175–1182.
van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. (2002) New York: Springer-Verlag.
Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat. Med. (1992) 11:1871–1879.