Bioinformatics Advance Access originally published online on September 11, 2008
Bioinformatics 2008 24(21):2569; doi:10.1093/bioinformatics/btn485

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

In Response to Comment on ‘Network-constrained regularization and variable selection for analysis of genomic data’

Caiyan Li and Hongzhe Li *

Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA

*To whom correspondence should be addressed.


    ABSTRACT

Contact: hongzhe@mail.med.upenn.edu

We would like to thank Binder and Schumacher (2008) for their insightful comments on our paper (Li and Li, 2008). We would like to emphasize that the main motivation of our proposed method is to incorporate a priori network/pathway information (e.g. KEGG regulatory pathways) into regression analysis so that the results are more interpretable in the context of known biological pathways. The focus of our paper is not on prediction, although prediction is of course related to variable selection. For the real dataset, we are interested in identifying whether some sub-networks of the KEGG pathways might be related to survival from glioblastoma. This is why we used only the expression data for the genes on the KEGG pathways in our analysis of the real dataset.

The comments by Dr Binder and Dr Schumacher focused mainly on the prediction performance of the various regularized regression procedures that we used in the real data analysis. First, after rechecking our code, we found an error in the calculation of the test mean-squared errors (T-MSEs). We used standardized expression values to build the predictive models but did not standardize the test-set data when calculating the T-MSEs, which resulted in inflated T-MSEs. The corrected T-MSEs are given in Table 1. We apologize for this error (see attached Erratum). The corrected values are, however, comparable to those of the null model and the clinical model (0.83 and 0.82, respectively) once the variances of the T-MSEs are taken into account. Based on 1000 bootstrap samples of the test set, the SDs of the T-MSEs for all these procedures ranged from 0.17 to 0.19, and this does not even account for the variability in variable selection. We agree with Dr Binder and Dr Schumacher that the null model and/or the clinical model should be used as a benchmark when evaluating prediction. Second, in our analysis of the real dataset, we used 5-fold cross-validation to choose the tuning parameters, which might not be ideal for the purpose of variable selection. The variables selected and the prediction performance also depend on the particular partition of the samples during cross-validation, especially when the sample size is small relative to the number of predictors.
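The effect of the standardization error, and the bootstrap SD of a T-MSE, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, ordinary least squares stands in for the regularized procedures, the test set is scaled with the training-set means and SDs (the letter does not state which scaling was used), and all function names are hypothetical.

```python
import numpy as np

def standardize_with_train(X_train, X_test):
    """Scale both sets using the training-set column means and SDs."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    return (X_train - mu) / sd, (X_test - mu) / sd

def fit_least_squares(Z, y):
    """Ordinary least squares with an intercept (stand-in for a regularized fit)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, Z):
    return coef[0] + Z @ coef[1:]

def t_mse(y, y_hat):
    """Test mean-squared error."""
    return float(np.mean((y - y_hat) ** 2))

def bootstrap_sd(y, y_hat, n_boot=1000, seed=0):
    """SD of the T-MSE over bootstrap resamples of the test set.

    The fitted model is held fixed, so variability due to variable
    selection is not reflected in this number.
    """
    rng = np.random.default_rng(seed)
    sq_err = (y - y_hat) ** 2
    idx = rng.integers(0, len(y), size=(n_boot, len(y)))
    return float(sq_err[idx].mean(axis=1).std(ddof=1))

# Synthetic data (assumption): five predictors on very different scales.
rng = np.random.default_rng(1)
shift = np.array([10.0, -3.0, 2.0, 0.0, 50.0])
scale = np.array([1.0, 5.0, 0.2, 3.0, 10.0])
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.25])

def simulate(n):
    Z = rng.normal(size=(n, 5))
    X = Z * scale + shift                      # raw, unstandardized predictors
    y = Z @ beta + rng.normal(scale=0.5, size=n)
    return X, y

X_train, y_train = simulate(80)
X_test, y_test = simulate(40)

Z_train, Z_test = standardize_with_train(X_train, X_test)
coef = fit_least_squares(Z_train, y_train)

mse_correct = t_mse(y_test, predict(coef, Z_test))  # test set standardized
mse_buggy = t_mse(y_test, predict(coef, X_test))    # test set left on the raw scale
sd_correct = bootstrap_sd(y_test, predict(coef, Z_test))

print("T-MSE, standardized test set:", mse_correct)
print("T-MSE, raw test set:", mse_buggy)
print("bootstrap SD of the T-MSE:", sd_correct)
```

In this toy setting the model fitted on standardized predictors is applied to test data on a different scale, so its predictions are systematically off and the T-MSE is inflated, which is the kind of error described above.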


Table 1. Corrected Table 2 of Li and Li (2008)

 
We absolutely agree with Dr Binder and Dr Schumacher that survival analysis models/methods such as the Cox regression analysis should be used for dealing with censored survival outcomes. For the accelerated failure time (AFT) model (Wei, 1992) that we used in our paper,


$$\log T_i = X_i^{\top}\beta + \epsilon_i,$$

where $T_i$ is the age to death, $X_i$ is the vector of gene expression levels and $\epsilon_i$ is the error term, one way to estimate the regression coefficients $\beta$ is to minimize a squared-error loss weighted by the inverse probability of censoring (van der Laan and Robins, 2002), defined as


$$\sum_{i=1}^{n} \frac{\delta_i}{\hat{G}(T_i)} \left(\log T_i - X_i^{\top}\beta\right)^2,$$

where $\delta_i$ is the event indicator and $\hat{G}$ is the Kaplan–Meier estimate of the survival function for the censoring variable. However, for the particular glioblastoma dataset that we analyzed, ignoring the censored individuals should not make a big difference: the censoring times of all five censored patients in the training set were greater than all the observed event times, so the Kaplan–Meier estimate for the censoring variable equals 1 at every observed event time and all the uncensored individuals receive the same weight. In fact, for this particular dataset, the inverse-probability weighted loss is exactly the same as the least-squares loss that ignores the five censored individuals.
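The weighting argument above can be checked numerically. The following is a minimal sketch, not the authors' code: it computes the Kaplan–Meier estimate of the censoring survival function by treating censoring as the "event", then forms the inverse-probability-of-censoring weights. The function names and the toy data are hypothetical, and the sketch assumes no ties between event times and censoring times.

```python
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of P(C > t) for the censoring variable C.

    Censored observations (event == 0) play the role of 'events' here.
    Assumes no ties between event times and censoring times.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    cens_times = np.unique(time[event == 0])   # sorted distinct censoring times
    factors = []
    for c in cens_times:
        at_risk = np.sum(time >= c)
        n_cens = np.sum((time == c) & (event == 0))
        factors.append(1.0 - n_cens / at_risk)
    cum = np.cumprod(factors)

    def G(t):
        k = np.searchsorted(cens_times, t, side="right")  # censoring times <= t
        return 1.0 if k == 0 else float(cum[k - 1])

    return G

def ipcw_weights(time, event):
    """Weight = event indicator / censoring survival at the observed time."""
    G = km_censoring_survival(time, event)
    w = np.zeros(len(time))
    for i, (t, d) in enumerate(zip(time, event)):
        if d == 1:                 # censored individuals keep weight 0
            w[i] = 1.0 / G(t)
    return w

# Toy data (assumption): both censoring times exceed every event time.
time_late = np.array([2.1, 3.5, 0.8, 4.0, 1.2, 5.0, 6.2])
event_late = np.array([1, 1, 1, 1, 1, 0, 0])
w_late = ipcw_weights(time_late, event_late)

# Contrast: one censoring time falls between event times.
time_early = np.array([1.0, 2.0, 3.0, 4.0])
event_early = np.array([1, 0, 1, 1])
w_early = ipcw_weights(time_early, event_early)

print(w_late)
print(w_early)
```

In the first toy dataset every uncensored individual receives weight 1 and every censored individual weight 0, so the weighted loss reduces to the least-squares loss on the uncensored individuals. In the second, events occurring after the early censoring time are up-weighted, so the two losses would differ.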

Finally, we also agree with Dr Binder and Dr Schumacher that if the goal is to build the best predictive model for survival, one should try other methods developed specifically for prediction (e.g. the boosting procedure) and use all the available gene expression data and clinical covariates. As demonstrated by Dr Binder and Dr Schumacher, better prediction can be achieved by doing so.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Olga Troyanskaya

Received on September 3, 2008; revised on September 3, 2008; accepted on September 8, 2008

    REFERENCES

    Binder H, Schumacher M. Comment on ‘Network-constrained regularization and variable selection for analysis of genomic data’. Bioinformatics (2008) in press.

    Li C, Li H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics (2008) 24:1175–1182.

    van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. (2002) New York: Springer-Verlag.

    Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat. Med. (1992) 11:1871–1879.

