Bioinformatics Advance Access originally published online on February 10, 2005
Bioinformatics 2005 21(10):2438-2446; doi:10.1093/bioinformatics/bti312

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

Estimating cancer survival and clinical outcome based on genetic tumor progression scores

Jörg Rahnenführer 1,*, Niko Beerenwinkel 1,†, Wolfgang A. Schulz 2, Christian Hartmann 3, Andreas von Deimling 3, Bernd Wullich 4 and Thomas Lengauer 1

1 Max-Planck Institute for Informatics, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany
2 Department of Urology, Heinrich-Heine University, D-40225 Düsseldorf, Germany
3 Department of Neuropathology, Charité, Humboldt University, D-13353 Berlin, Germany
4 Department of Urology and Pediatric Urology, University of the Saarland, D-66421 Homburg/Saar, Germany

*To whom correspondence should be addressed.


    Abstract

Motivation: In cancer research, predicting the time to death or relapse is important for meaningful tumor classification and for selecting appropriate therapies. Survival prognosis is typically based on clinical and histological parameters. There is increasing interest in identifying genetic markers that better capture the status of a tumor in order to improve on existing predictions. The accumulation of genetic alterations during tumor progression can be used for the assessment of the genetic status of the tumor. For modeling dependences between the genetic events, evolutionary tree models have been applied.

Results: Mixture models of oncogenetic trees provide a probabilistic framework for the estimation of typical pathogenetic routes. From these models we derive a genetic progression score (GPS) that estimates the genetic status of a tumor. GPS is calculated for glioblastoma patients from loss of heterozygosity measurements and for prostate cancer patients from comparative genomic hybridization measurements. Cox proportional hazard models are then fitted to observed survival times of glioblastoma patients and to times until PSA relapse following radical prostatectomy of prostate cancer patients. It turns out that the genetically defined GPS is predictive even after adjustment for classical clinical markers and thus can be considered a medically relevant prognostic factor.

Availability: Mtreemix, a software package for estimating tree mixture models, is freely available for non-commercial users at http://mtreemix.bioinf.mpi-sb.mpg.de. The raw cancer datasets and R code for the analysis with Cox models are available upon request from the corresponding author.

Contact: rahnenfj@mpi-sb.mpg.de


    1 INTRODUCTION
For an appropriate treatment of cancer patients, predicting the time until death or until relapse after surgery is an important task. The prognostic factors typically considered are clinical and histological measurements such as age, sex, histopathological findings, tumor stage, tumor volume or lymph node status. Identifying genetic prognostic markers that better reflect tumor biology is therefore of major importance. For many cancer types, tumor progression can be characterized by the accumulation of complex chromosome alterations. For example, Ohgaki et al. (2004) describe typical genetic pathways to glioblastoma, and Strohmeyer et al. (2004) associate genetic aberrations in prostate carcinoma with tumor progression measured by microvessel density.

There is a vast literature on linking single genetic alterations to survival, but only a few efforts have been made to construct more complex and comprehensive markers. Jiang et al. (2000) describe the construction of evolutionary tree models for renal cell carcinoma from comparative genomic hybridization (CGH) data. Their directed tree models estimate the sequential order in which genetic events accumulate, together with conditional probabilities for observing subsequent events if the respective precursor events are known to be present. Desper et al. (1999) were the first to apply these models to oncogenesis, using CGH data. Simon et al. (2000) analyzed chromosome abnormalities in ovarian adenocarcinoma with the same approach. Previously, fixed linear pathways had been proposed; see, for example, Vogelstein et al. (1988). The directed trees are more flexible than the linear models, as different pathways can be represented simultaneously.

The tree models of Desper et al. (1999) have high explanatory power, but often only for a portion of the analyzed tumor samples. A subset of genetic events is represented by the model only if, for every event in this subset, all precursor events in the tree also belong to the subset. All other subsets are assigned likelihood zero. Von Heydebreck et al. (2004) propose including additional hidden events in the tree and modeling the observed genetic events as leaves of the tree. This method makes maximum-likelihood estimation of oncogenetic trees feasible, at the cost of reduced interpretability due to the hidden events.

Beerenwinkel et al. (2005a) introduce mixture models of the oncogenetic trees used in Desper et al. (1999). In these mixture models, one tree component is modeled to have a star-like topology, representing independence between genetic events. Such oncogenetic mixture models combine interpretability of the trees of Desper et al. (1999) with an appropriate probabilistic framework. All parameters of the mixture model are estimated with an expectation-maximization (EM)-type algorithm. Owing to the star-like component, every combination of genetic events is represented in the model.

In the present paper, we propose to use oncogenetic trees mixture models for estimating the state of tumors characterized by subsets of observed genetic events. The main methodological contributions of the paper are the introduction of a genetic progression score (GPS) and its application in cancer survival analysis. The relevance of the GPS as a prognostic marker is analyzed with Cox proportional hazards regression models (Cox, 1972; Andersen and Gill, 1982) for two different cancer datasets.

In the Methods section we formally describe oncogenetic tree models, the derivation of scores for genetic progression from binary measurements and Cox regression models. In the Results section we present evidence for the predictive power of the GPSs. The findings are consistent, although two genetically different tumor types have been analyzed, different measurements have been used and the clinical outcome was defined differently, namely by time to death and by time to biochemical relapse.


    2 METHODS
We first describe oncogenetic tree models (Desper et al., 1999) and mixtures of oncogenetic trees (Beerenwinkel et al., 2005a). Then, the count statistic, the weighted count statistic and the GPS are introduced as scores for tumor progression, and the use of Cox models in the context of survival analysis is explained.

2.1 Oncogenetic tree models
Oncogenetic trees are models that describe the order in which genetic events occur in the course of human tumor development. In oncogenetic trees, an edge between two consecutive genetic events is labeled with the conditional probability that the event associated with the child vertex occurs given that the event of the parent vertex has occurred. Let $\{1, \ldots, \ell\}$ be the set of genetic events. Then each genetic pattern can be represented as a binary vector $x = (x_0, x_1, \ldots, x_\ell)$ of length $\ell + 1$, with $x_0 = 1$ indicating that the initial null event has occurred. The measured genetic events are often chromosome aberrations such as gains or losses of parts of chromosomes, which can be read out from CGH data, for example.

Formally, an oncogenetic tree $T = (V, E, r, p)$ consists of a set of vertices $V = \{0, \ldots, \ell\}$, a set of edges $E$, a special root vertex $r \in V$ and a mapping $p: E \to [0, 1]$. The vertices correspond to the binary random variables $X_1, \ldots, X_\ell$ that indicate the occurrence of genetic events, and the root vertex $r$ represents the initial null event. $(V, E, r)$ is a connected branching rooted at $r$, and for all edges $e = (u, v) \in E$, $p(e) = \Pr(X_v = 1 \mid X_u = 1)$ is the conditional probability of event $v$ given that event $u$ has occurred.

An oncogenetic tree $T$ induces a probability distribution on the set of all possible genetic patterns. The probability that $T$ generates a sample $x$ can be calculated as follows. Let $S \subseteq V$ be the set of events specified by $x$. If there exists a subset $E' \subseteq E$ such that $S$ is the set of all vertices reachable from $r$ in the induced subtree $T' = (S, E')$, then $x$ can be generated by $T$, and

$$\Pr(x \mid T) = \prod_{e \in E'} p(e) \prod_{\substack{(u, v) \in E \\ u \in S,\ v \notin S}} \bigl(1 - p(u, v)\bigr). \tag{1}$$

If no such edge subset exists, $x$ cannot be generated by $T$, and $\Pr(x \mid T) = 0$.
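As an illustration of Eq. (1), the following Python sketch evaluates $\Pr(x \mid T)$ for a tree stored as a parent map. The representation (dictionaries `parent` and `p` keyed by child vertex, with the root labeled 0), the function name and the example tree are hypothetical choices for this sketch, not part of the Mtreemix software.

```python
from typing import Dict, Set


def tree_pattern_prob(parent: Dict[int, int], p: Dict[int, float],
                      events: Set[int], root: int = 0) -> float:
    """Probability that an oncogenetic tree generates the event set `events` (Eq. 1).

    parent[v] is the parent vertex of v; p[v] is the conditional probability
    on the edge (parent[v], v).
    """
    present = set(events) | {root}
    # Every event needs its tree precursor; otherwise x cannot be generated by T.
    for v in events:
        if parent[v] not in present:
            return 0.0
    prob = 1.0
    for v, u in parent.items():
        if u in present:
            # Edge leaving a present vertex: p(e) if the child occurred (edge in E'),
            # 1 - p(e) if it did not (edge leaving S).
            prob *= p[v] if v in present else 1.0 - p[v]
        # Edges whose parent is absent are never reached and contribute no factor.
    return prob


# Hypothetical example tree: 0 -> 1 (0.6), 1 -> 2 (0.5), 0 -> 3 (0.3)
PARENT = {1: 0, 2: 1, 3: 0}
P = {1: 0.6, 2: 0.5, 3: 0.3}
```

For the example tree, the empty pattern has probability $0.4 \cdot 0.7 = 0.28$, the pattern {2} has probability zero because its precursor 1 is missing, and the probabilities of all eight patterns sum to one.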

Desper et al. (1999) show that oncogenetic trees can be reconstructed efficiently from all pairwise probabilities of events. The tree structure is obtained as the solution of the maximum weight branching problem in the complete graph on $\ell + 1$ vertices. The weight of an edge between two events depends only on the respective marginal and joint probabilities of these two events; see Desper et al. (1999) for a more detailed description.
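For a handful of events, the reconstruction step can be sketched by exhaustive search over all parent assignments, which stands in here for an efficient maximum weight branching algorithm such as Edmonds' algorithm. The edge weight $w_{ij} = \log\bigl(\tfrac{p_i}{p_i + p_j} \cdot \tfrac{p_{ij}}{p_i p_j}\bigr)$ is the form we assume from Desper et al. (1999) and should be checked against that paper before use. Function names and the toy data are hypothetical.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple


def desper_weight(pi: float, pj: float, pij: float) -> float:
    """Assumed Desper-style weight for edge i -> j from marginal and joint frequencies."""
    if pi <= 0 or pj <= 0 or pij <= 0:
        return -math.inf  # events never (co-)observed: edge effectively forbidden
    return math.log(pi / (pi + pj) * pij / (pi * pj))


def _reaches_root(v: int, parent: Dict[int, int], n_vertices: int) -> bool:
    """True if following parent pointers from v ends at the root 0 (i.e. no cycle)."""
    for _ in range(n_vertices):
        if v == 0:
            return True
        v = parent[v]
    return v == 0


def reconstruct_tree(data: Sequence[Sequence[int]]) -> Tuple[Dict[int, int], Dict[int, float]]:
    """Maximum weight branching rooted at the null event 0, found by brute force.

    data: binary rows (x_1, ..., x_l); event k is column k - 1. Every event must occur
    at least once. Runtime grows like l**l, so this is only usable for small l.
    """
    n, l = len(data), len(data[0])
    freq = [1.0] + [sum(row[k] for row in data) / n for k in range(l)]  # freq[0]: root

    def joint(i: int, j: int) -> float:
        if i == 0:
            return freq[j]  # the null event is always present
        return sum(row[i - 1] * row[j - 1] for row in data) / n

    vertices = list(range(1, l + 1))
    choices = [[u for u in range(l + 1) if u != v] for v in vertices]
    best_total, best_parent = -math.inf, None
    for parents in itertools.product(*choices):
        parent = dict(zip(vertices, parents))
        if not all(_reaches_root(v, parent, l + 1) for v in vertices):
            continue  # contains a cycle, so not a branching rooted at 0
        total = sum(desper_weight(freq[u], freq[v], joint(u, v)) for v, u in parent.items())
        if best_parent is None or total > best_total:
            best_total, best_parent = total, parent
    # Edge label p(e) = Pr(X_v = 1 | X_u = 1), estimated from relative frequencies
    p = {v: joint(u, v) / freq[u] for v, u in best_parent.items()}
    return best_parent, p
```

On toy data in which event 2 only ever occurs together with event 1, the search places event 2 below event 1 and labels that edge with the empirical conditional frequency.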

Owing to the topology of an oncogenetic tree, many samples $x$ are assigned probability zero, which rules out, for example, a cross-validation approach. To overcome this drawback, Beerenwinkel et al. (2005a) introduce mixture models of $K$ mutagenetic trees. The first tree component of such a model has a star topology and models the spontaneous and independent occurrence of genetic events. The other components are arbitrary trees estimated from the observed data. In the following, we call such models oncogenetic trees mixture models to reflect the application scenario.

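Formally, an oncogenetic trees mixture model $M$ is a weighted sum of $K$ oncogenetic trees,

$$M = \sum_{k=1}^{K} \alpha_k T_k, \qquad \alpha_k \ge 0, \quad \sum_{k=1}^{K} \alpha_k = 1, \tag{2}$$

where $T_1$ is a star. The likelihood of a pattern $x$ in the mixture model is

$$L(x \mid M) = \sum_{k=1}^{K} \alpha_k \Pr(x \mid T_k). \tag{3}$$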
The mixture model can be learned in an EM-like fashion by iteratively estimating the responsibilities of the different tree components for the data and by inferring the structure and parameters of the tree models from the weighted data. In this model, all samples have positive probability due to the tree component with star topology. An implementation of oncogenetic trees mixture models is presented in Beerenwinkel et al. (2005b). In this framework, the treatment of missing data values is straightforward as they can be estimated in the E-step of the algorithm.
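Eq. (3) and the E-step responsibilities can be sketched as follows, again storing each tree as a parent map. The component weights and trees in the example are hypothetical; the point is that the star component gives every pattern positive likelihood, even a pattern such as {2} that the non-trivial tree cannot generate. Only the E-step is shown; the M-step (re-estimating each tree from the responsibility-weighted data) is omitted.

```python
from typing import Dict, List, Sequence, Set, Tuple

Tree = Tuple[Dict[int, int], Dict[int, float]]  # (parent of each event, edge probability)


def tree_prob(tree: Tree, events: Set[int], root: int = 0) -> float:
    """Pr(x | T) as in Eq. (1); zero if some event lacks its tree precursor."""
    parent, p = tree
    present = set(events) | {root}
    if any(parent[v] not in present for v in events):
        return 0.0
    prob = 1.0
    for v, u in parent.items():
        if u in present:  # only edges leaving present vertices contribute
            prob *= p[v] if v in present else 1.0 - p[v]
    return prob


def mixture_likelihood(alphas: Sequence[float], trees: Sequence[Tree], events: Set[int]) -> float:
    """Eq. (3): L(x | M) = sum_k alpha_k Pr(x | T_k)."""
    return sum(a * tree_prob(t, events) for a, t in zip(alphas, trees))


def responsibilities(alphas: Sequence[float], trees: Sequence[Tree], events: Set[int]) -> List[float]:
    """E-step: posterior probability that pattern x was generated by each component."""
    joint = [a * tree_prob(t, events) for a, t in zip(alphas, trees)]
    total = sum(joint)  # > 0 as long as the star has positive weight and 0 < p < 1
    return [j / total for j in joint]


# Hypothetical two-component model: star T_1 and a non-trivial tree T_2
STAR: Tree = ({1: 0, 2: 0, 3: 0}, {1: 0.5, 2: 0.5, 3: 0.5})
TREE: Tree = ({1: 0, 2: 1, 3: 0}, {1: 0.6, 2: 0.5, 3: 0.3})
ALPHAS = [0.4, 0.6]
```

Here the pattern {2} receives likelihood $0.4 \cdot 0.125 = 0.05$ entirely from the star, so its responsibility vector is (1, 0).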

2.2 Genetic progression scores
Various clinical and histological markers have been proposed and used for determining the progression status of human tumors. The main applications are prediction of survival time and selection of a suitable therapy for each patient. Here, we introduce two naive scores and one more complex but interpretable score for estimating the genetic progression of a tumor. The scores are defined for tumor samples that are represented by binary vectors indicating the occurrence of a list of genetic events.

2.2.1 Count statistic
Consider $\ell$ binary random variables $X_1, \ldots, X_\ell$ that represent a set of genetic events. A naive summary measure of genetic progression is the number of events that have occurred in a tumor sample. Simple counting is an effective measure if the following assumptions hold: all events are independent and equally relevant, and the impact of different events on progression is cumulative. Let $x = (x_1, \ldots, x_\ell)$ be the binary vector of observed genetic events of a tumor sample. The count statistic is defined as

$$C(x) = \sum_{i=1}^{\ell} x_i. \tag{4}$$

2.2.2 Weighted count statistic
In general, it is not reasonable to assume that the observed genetic events are equally important. Events that are frequently observed in a population of tumor patients can be expected to occur earlier in the accumulation process, whereas the presence of a generally less frequent event indicates more advanced progression. Let $n = (n_1, \ldots, n_\ell)$ be the vector of absolute frequencies of observed genetic events in a set of $N$ tumor samples. Then the weighted count statistic is defined as

$$C_w(x) = \sum_{i=1}^{\ell} w_i x_i, \tag{5}$$

where $w_i$, determined from the frequency $n_i$ relative to $N$, is the weight of the single event $x_i$.
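Both naive statistics take only a few lines to compute. In the sketch below, the weight $w_i = N/n_i$ (inverse relative frequency) is a hypothetical choice that merely follows the stated intuition that rarer events indicate more advanced progression; the weights of Eq. (5) should be substituted where they differ. Patterns are assumed to be complete 0/1 vectors, i.e. missing values have already been imputed.

```python
from typing import List, Sequence


def count_statistic(x: Sequence[int]) -> int:
    """Count statistic (4): number of genetic events present in the pattern."""
    return sum(x)


def inverse_frequency_weights(data: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical weights w_i = N / n_i; assumes every event occurs at least once."""
    n_samples = len(data)
    counts = [sum(row[i] for row in data) for i in range(len(data[0]))]
    return [n_samples / c for c in counts]


def weighted_count_statistic(x: Sequence[int], w: Sequence[float]) -> float:
    """Weighted count statistic (5): sum of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy data: 4 samples, 3 events (not taken from the paper's datasets)
DATA = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 1, 1]]
W = inverse_frequency_weights(DATA)  # counts (3, 2, 1) -> weights (4/3, 2, 4)
```

In the toy data, the rare third event alone yields a weighted count of 4, whereas the count statistic treats it like any other single event.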

2.2.3 Genetic progression score
The assumptions of independence and cumulative effects do not hold in general either. Oncogenetic trees can be used to integrate dependences between events into the progression score, since they model the ordered accumulation of the events. We present a score that can be calculated from the oncogenetic trees mixture model defined by Beerenwinkel et al. (2005a).

A timed oncogenetic tree can be obtained by assuming independent Poisson processes for the occurrence of events on the tree edges and for the sampling time of the tumor. Denote by $T_i$ the waiting time of event $i$, i.e. the difference between the occurrence times of its parent and of the event itself. Let $T_i$ be exponentially distributed with parameter $\lambda_i$, and let the sampling time of the tumor $T_S$ be exponentially distributed with parameter $\lambda_S$. Since the exponential distribution is memoryless, the conditional probability $p_i$ on the edge entering event $i$ can be calculated in this model as $p_i = \lambda_i / (\lambda_i + \lambda_S)$. The expected waiting time $E(T_i)$ for event $i$ is then given by

$$E(T_i) = \frac{1}{\lambda_i} = \frac{1}{\lambda_S} \cdot \frac{1 - p_i}{p_i}. \tag{6}$$
The tumor age at the time of sampling is, in general, not known. Thus the parameter $\lambda_S$ cannot be estimated from the data, and the expected waiting time $E(T_i)$ cannot be scaled to the true time scale of the oncogenetic process. We define unitless waiting times $E(T_i)$ by normalizing the mean sampling time to $E(T_S) = 1/\lambda_S = 1$.
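A short worked example of Eq. (6) under the normalization $\lambda_S = 1$, using illustrative edge probabilities rather than values estimated from the datasets:

```latex
\begin{aligned}
p_i = 0.8:&\quad \lambda_i = \frac{p_i}{1 - p_i}\,\lambda_S = \frac{0.8}{0.2} = 4, & E(T_i) &= \frac{1}{\lambda_i} = 0.25,\\
p_i = 0.5:&\quad \lambda_i = \frac{0.5}{0.5} = 1, & E(T_i) &= 1.
\end{aligned}
```

Large conditional probabilities thus translate into short expected waiting times, and an edge with $p_i = 1$, such as the edge from 10q to 10p in Fig. 1, corresponds to a waiting time of zero.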

For a single event, the expected waiting time can be calculated explicitly from the oncogenetic tree using formula (6). For an arbitrary fixed pattern of genetic events, however, no such formula is available in general. Within the timed oncogenetic tree model, the expected waiting time can be obtained by simulating the waiting process along the tree edges. For a large number $n_{\mathrm{sim}}$ of simulations (typically $n_{\mathrm{sim}} \ge 10^6$), waiting times for all conditional events on the tree edges are first drawn from exponential distributions with parameters $\lambda_i = p_i/(1 - p_i)$. Waiting times on subsequent edges are then added according to the tree topology. The expected waiting time of a pattern is finally estimated as the average of all waiting times at which the pattern is observed. We refer to this unitless waiting time as the GPS of the pattern. Thus, the GPS reflects the progression of tumor development along the oncogenetic tree model of genetic aberrations. For the sample $x = (x_1, \ldots, x_\ell)$ we define

$$\mathrm{GPS}(x) = E_M(T_x), \tag{7}$$

where $T_x$ denotes the waiting time until pattern $x$ and the expectation is taken with respect to the distribution induced by the underlying oncogenetic trees mixture model $M$.
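The simulation can be sketched in Python as follows. The sketch rests on one reading of the procedure, stated here as an assumption: in each run a mixture component is drawn according to its weight, edge waiting times are drawn with rate $p_i/(1 - p_i)$ and summed along the tree, a sampling time is drawn with $E(T_S) = 1$, and the GPS of a pattern is the mean sampling time over the runs in which exactly that pattern has occurred. Function names and the `n_sim` default are hypothetical; the text suggests $n_{\mathrm{sim}} \ge 10^6$ for real use.

```python
import math
import random
from typing import Dict, FrozenSet, Sequence, Tuple

Tree = Tuple[Dict[int, int], Dict[int, float]]  # (parent of each event, edge probability p_i)


def edge_waiting_time(p: float, rng: random.Random) -> float:
    """Exponential waiting time with rate lambda_i = p / (1 - p), i.e. lambda_S = 1."""
    if p >= 1.0:
        return 0.0  # event follows its parent immediately
    if p <= 0.0:
        return math.inf  # event never occurs
    return rng.expovariate(p / (1.0 - p))


def occurrence_times(parent: Dict[int, int], edge: Dict[int, float]) -> Dict[int, float]:
    """Add edge waiting times along each root-to-vertex path (root 0 occurs at time 0)."""
    occ = {0: 0.0}

    def time_of(v: int) -> float:
        if v not in occ:
            occ[v] = time_of(parent[v]) + edge[v]
        return occ[v]

    for v in parent:
        time_of(v)
    return occ


def simulate_gps(alphas: Sequence[float], trees: Sequence[Tree],
                 n_sim: int = 100_000, seed: int = 0) -> Dict[FrozenSet[int], float]:
    """Monte Carlo estimate of the GPS for every pattern observed in the simulation."""
    rng = random.Random(seed)
    time_sum: Dict[FrozenSet[int], float] = {}
    hits: Dict[FrozenSet[int], int] = {}
    for _ in range(n_sim):
        parent, p = rng.choices(trees, weights=alphas)[0]  # draw a mixture component
        edge = {v: edge_waiting_time(p[v], rng) for v in parent}
        occ = occurrence_times(parent, edge)
        t_s = rng.expovariate(1.0)  # sampling time with E(T_S) = 1
        pattern = frozenset(v for v in parent if occ[v] <= t_s)
        time_sum[pattern] = time_sum.get(pattern, 0.0) + t_s
        hits[pattern] = hits.get(pattern, 0) + 1
    return {x: time_sum[x] / hits[x] for x in time_sum}
```

For a single edge with $p_1 = 0.5$ (so $\lambda_1 = \lambda_S = 1$), this reading gives a GPS of 0.5 for the empty pattern and 1.5 for the pattern containing event 1, and the simulation reproduces these values up to Monte Carlo error.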

2.3 Survival analysis
The Cox proportional hazards regression model introduced by Cox (1972) is a means of estimating the relevance of predictors for a survival time of interest. In the context of tumor data, the survival time starts at the time of diagnosis, surgery or treatment. The endpoint is defined by some explicit event of interest. In cancer research, this event typically is the death of the patient caused by the disease or some kind of relapse, for example recurrence of a tumor.

2.3.1 Cox proportional hazards model
Often a medical study does not span enough time to observe the event of interest for all patients in the study. In the case of right-censored data, only a lower bound of the survival time is known. This happens when patients drop out of the study for reasons unrelated to the study, for example due to a move, an accident, or death from causes other than the disease. Another reason for right censoring is that the event of interest has not yet occurred when the data are collected. In all cases, a lower bound for the survival time is given by the time until last follow-up without the event of interest having been observed.

The Cox proportional hazards model is expressed in terms of a single survival time for each patient, with possible right censoring. In the presence of censoring, hazard models can be used to calculate the risk of death. The hazard rate $\lambda(t)$ at a time point $t$ is defined as the instantaneous rate of death during the next instant of time among survivors to time $t$. For a random variable $T$ denoting a survival time, the hazard rate is defined as

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}. \tag{8}$$
The semiparametric Cox proportional hazards model is the most commonly used model in hazard regression. Let $z = (z_1, \ldots, z_p)$ be a $p$-dimensional vector of covariates that are potential predictors for the survival time. We are interested in assessing the impact of the single covariates. In the Cox model, the conditional hazard function, given the values of the covariates, is assumed to be of the form

$$\lambda(t \mid z) = \lambda_0(t) \exp(\beta^{T} z), \tag{9}$$

where $\beta = (\beta_1, \ldots, \beta_p)^T$ is the vector of regression coefficients and $\lambda_0$ denotes a baseline hazard function. In model (9), the contributions of the predictors to the hazard are multiplicative. For an introduction to the Cox model, see Klein and Moeschberger (1997); for more details, see Andersen and Gill (1982). In the following, we call a Cox model with $p = 1$ predictor a univariate model and a Cox model with $p > 1$ predictors a multivariate model.

The output of a Cox hazard regression consists of estimates for the regression coefficients and their variances. These estimates are used in statistical hypothesis tests to identify the coefficients that are significantly different from 0. A low p-value then indicates a significant influence of the respective predictor.
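To make the estimation step concrete, here is a minimal sketch of a univariate Cox fit: Newton-Raphson maximization of the partial likelihood for a single covariate, followed by a Wald test of $\beta_1 = 0$. Ties are handled with the Breslow approximation for simplicity, whereas R's `coxph` uses Efron's approximation by default, so estimates can differ slightly on tied data. The function name and the toy data are hypothetical; in practice an established implementation would be used.

```python
import math
from typing import Sequence, Tuple


def cox_1d(times: Sequence[float], events: Sequence[int], z: Sequence[float],
           n_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit lambda(t | z) = lambda_0(t) exp(beta z) by maximizing the partial likelihood.

    Breslow handling of ties: the risk set of an event at t_i is {j : t_j >= t_i}.
    The covariate must vary within risk sets, otherwise the information is zero.
    Returns (beta_hat, standard error, two-sided Wald p-value).
    """
    def score_and_info(beta: float) -> Tuple[float, float]:
        u, info = 0.0, 0.0
        for t_i, d_i, z_i in zip(times, events, z):
            if not d_i:
                continue  # censored observations only enter the risk sets
            risk = [(zj, math.exp(beta * zj)) for tj, zj in zip(times, z) if tj >= t_i]
            s0 = sum(w for _, w in risk)
            s1 = sum(zj * w for zj, w in risk)
            s2 = sum(zj * zj * w for zj, w in risk)
            u += z_i - s1 / s0                  # observed minus risk-weighted mean covariate
            info += s2 / s0 - (s1 / s0) ** 2    # risk-weighted variance of the covariate
        return u, info

    beta = 0.0
    for _ in range(n_iter):
        u, info = score_and_info(beta)
        step = u / info  # Newton-Raphson step on the concave log partial likelihood
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_info(beta)
    se = 1.0 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))  # = 2 * (1 - Phi(|beta / se|))
    return beta, se, p_value
```

Because the partial likelihood depends on $z$ only through differences within risk sets, recoding a binary covariate as $1 - z$ flips the sign of $\hat\beta_1$ while leaving the standard error and p-value unchanged.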


    3 RESULTS
The progression scores described in the Methods section were calculated for two different cancer datasets, comprising brain tumors (glioblastomas) and prostate tumors, respectively. In both cases, Cox regression models were used to determine the prognostic value of the scores. This value is typically expressed through the ability of a score to identify patient subgroups with significantly different clinical outcomes.

3.1 Oncogenetic data
Brain tumors pose a particular challenge to molecular oncology, since different subtypes seem to follow distinct pathogenetic routes. We analyzed a glioblastoma dataset described in von Deimling et al. (2000). The glioblastoma (GBM) samples selected for this study consist of primary glioblastomas without giant-cell or gliosarcoma morphology. The survival time was defined as the time from diagnosis to death. The dataset contained 75 patients: 70 patients who died from the disease and five patients with censored survival times. The genetic events of interest were chromosome aberrations measured by loss of heterozygosity (LOH) on the p-arm or q-arm of single chromosomes. In total, measurements for 40 chromosome arms were available. We selected the events that were observed in at least 15% of the tumor samples, i.e. in at least 12 of the 75 patients. The resulting list contained 7 of the original 40 events, namely LOH on 10q, 10p, 9p, 19q, 17p, 13q and 22q.

The prostate cancer dataset contained 54 patients. Here, the endpoint defining the survival time was prostate-specific antigen (PSA) recurrence, indicating tumor relapse. PSA recurrence was observed for 20 of the 54 patients; the remaining 34 recurrence times were censored. The genetic events were chromosome aberrations measured by CGH. For the p-arm and q-arm of all chromosomes, gains and losses of chromosome parts were analyzed. We selected important aberrations based on medical relevance and a minimum relative frequency of 10%, i.e. the event had to be observed in at least 6 of the 54 patients. The final list consists of the events 3q+, 4q+, 6q+, 7q+, 8p–, 8q+, 10q–, 13q+ and Xq+, where a plus indicates a chromosomal gain and a minus a chromosomal loss. Both datasets, including the survival and PSA relapse times, respectively, as well as the censoring times, are available upon request.
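The frequency filter applied to both datasets can be written in a few lines. This sketch assumes events are stored as rows of 0/1 values with -1 for missing measurements, as in the glioblastoma data, and counts only observed events (value 1) toward the threshold; the additional selection by medical relevance used for the prostate data is not modeled. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def min_count(n_samples: int, min_frac: float) -> int:
    """Smallest number of samples that reaches the relative frequency min_frac."""
    # Small tolerance so floating-point error cannot push an exact product above an integer.
    return math.ceil(min_frac * n_samples - 1e-9)


def select_events(data: Sequence[Sequence[int]], names: Sequence[str], min_frac: float) -> List[str]:
    """Keep events observed (value 1) in at least min_count(N, min_frac) samples; -1 = missing."""
    threshold = min_count(len(data), min_frac)
    return [name for k, name in enumerate(names)
            if sum(1 for row in data if row[k] == 1) >= threshold]
```

With the sample sizes above, `min_count(75, 0.15)` gives 12 and `min_count(54, 0.10)` gives 6, matching the thresholds stated in the text.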

3.2 Estimated oncogenetic tree models
We estimate oncogenetic trees mixture models (2) consisting of a star component and non-trivial tree components in order to obtain a concise description of the genetic development of different cancer types.

3.2.1 Glioblastomas
Figure 1 shows the estimated trees mixture model for the glioblastoma dataset. A total of 56% of the tumors can be explained by the non-trivial tree component. The most frequent events, related to LOH on 9p, 10p and 10q, occur as early events in both tree components. This is in agreement with previously published studies (von Deimling et al., 2000; Ohgaki et al., 2004). In the non-trivial component, the events 9p, 10q (connected with 10p with probability 1) and 17p are early events. LOH on 19q and the path 13q followed by 22q are estimated as successors of LOH on 10q.



Fig. 1 Oncogenetic trees mixture model for the development of glioblastomas. Both mixture components are labeled with their weight in the upper left corner. Tree vertices correspond to loss of heterozygosity on chromosome arms. Edges are labeled with conditional probabilities.

 
3.2.2 Prostate cancers
Figure 2 shows the estimated trees mixture model for the prostate cancer dataset. A total of 57% of tumor samples are best explained by assuming independence of events. For this subset, the most likely and hence early events are loss on chromosome arm 8p and gain on Xq. All other events occur with lower probabilities. The remaining 43% of samples are better explained by the tree structure displayed in the second model component. For this subset, we estimate 6q+ to be an early event followed by either 10q–, or, more likely, by 4q+ and 8q+. The latter pathway continues with gains on 13q, 3q, 7q or Xq. Loss of 8p is estimated to be a late event that occurs after 7q+ in this subgroup of tumors.



Fig. 2 Oncogenetic trees mixture model for the development of prostate cancer. For a detailed description, see Figure 1.

 
3.2.3 Quality of estimated mixture models
The mixture models are estimated with a maximum-likelihood-type approach, so the stability of the estimated topology and conditional probabilities is not guaranteed in general. A naive assumption for the true underlying model structure is independence of genetic events. To establish the relevance of the mixture model, it is essential to demonstrate its superiority over this naive model. We therefore calculate the information gain of the mixture model with respect to the independence model, measured as the difference of the log-likelihoods of the observed data. Because the mixture model is more complex and includes the independence model as a special case, its log-likelihood is inherently at least as large. We thus compute a p-value for the observed gain using a permutation test. We randomly permute the indicator vector of each genetic event across all samples, creating new datasets with independent events. For these datasets, the log-likelihood gain is caused only by the increased model complexity and not by capturing the true dependence structure. We calculate this gain for 1000 permutations. The fraction of permutations with a gain larger than that for the original data is denoted as the p-value $p_{\mathrm{mix}}$ of the mixture model. For the glioblastomas we obtain $p_{\mathrm{mix}} = 0.003$; for the prostate cancers $p_{\mathrm{mix}} < 0.001$, since no permuted gain was larger than the original gain. This analysis shows that the estimated oncogenetic trees mixture models significantly improve on the naive model and thus capture at least part of the true dependence structure between the genetic events.
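The permutation scheme can be sketched generically. Fitting the mixture model is beyond a short example, so the sketch uses a stand-in gain statistic, the log-likelihood of the saturated model over all observed patterns minus that of the independence model; this is an assumption for illustration, and the mixture log-likelihood gain would be plugged in through the `gain` argument. Each event's indicator column is permuted independently across samples, which preserves the event frequencies while destroying dependences. Function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence

Data = Sequence[Sequence[int]]


def loglik_gain(data: Data) -> float:
    """Stand-in gain: saturated-pattern log-likelihood minus independence log-likelihood."""
    n = len(data)
    ll_full = sum(c * math.log(c / n) for c in Counter(map(tuple, data)).values())
    ll_indep = 0.0
    for k in range(len(data[0])):
        ones = sum(row[k] for row in data)
        for c in (ones, n - ones):
            if c > 0:
                ll_indep += c * math.log(c / n)
    return ll_full - ll_indep


def permutation_p_value(data: Data, gain: Callable[[Data], float] = loglik_gain,
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of permuted datasets whose gain is larger than the observed gain."""
    rng = random.Random(seed)
    observed = gain(data)
    columns: List[List[int]] = [list(col) for col in zip(*data)]
    larger = 0
    for _ in range(n_perm):
        for col in columns:
            rng.shuffle(col)  # permute each event independently across samples
        permuted = [list(row) for row in zip(*columns)]
        if gain(permuted) > observed:
            larger += 1
    return larger / n_perm
```

As in the text, a returned value of 0 means that no permuted gain exceeded the observed one, i.e. $p < 1/n_{\mathrm{perm}}$.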

3.3 Predictive power of genetic progression scores
The Cox proportional hazards model (9) is used for identifying genetic markers that are relevant for estimating clinical outcome. We analyzed both single genetic events and complex progression scores as prognostic markers. The univariate Cox model has the form $\lambda(t \mid z_1) = \lambda_0(t) \exp(\beta_1 z_1)$, where $z_1$ is a binary variable indicating whether the respective genetic event has occurred. The hazard ratio $\exp(\beta_1)$ thus quantifies the relative risk of death for $z_1 = 1$ with respect to the baseline condition $z_1 = 0$. Cox regression provides an estimate of the hazard ratio together with the corresponding 95% confidence interval. For testing the null hypothesis $\beta_1 = 0$ against the alternative $\beta_1 \ne 0$, we calculate p-values using Efron's approximation for tied survival times. This method is the default in Cox regression with the statistical programming package R (R Development Core Team, 2004, http://www.R-project.org).
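Given a coefficient estimate $\hat\beta_1$ and its standard error, the reported quantities follow directly: the hazard ratio $\exp(\hat\beta_1)$, a Wald 95% confidence interval $\exp(\hat\beta_1 \pm 1.96\,\widehat{\mathrm{se}})$ and a two-sided Wald p-value. This is a sketch with a hypothetical function name; R's `summary.coxph` additionally reports likelihood ratio and score tests, which can give slightly different p-values.

```python
import math
from typing import Tuple

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def hazard_ratio_summary(beta: float, se: float) -> Tuple[float, float, float, float]:
    """Return (hazard ratio, lower 95% limit, upper 95% limit, two-sided Wald p-value)."""
    hr = math.exp(beta)
    lower = math.exp(beta - Z_975 * se)
    upper = math.exp(beta + Z_975 * se)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))  # = 2 * (1 - Phi(|beta / se|))
    return hr, lower, upper, p_value
```

For example, a hypothetical estimate $\hat\beta_1 = \log 2$ corresponds to a doubled hazard for $z_1 = 1$; the interval is asymmetric around the hazard ratio because it is computed on the log scale.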

3.3.1 Glioblastomas
For the glioblastoma dataset, Table 1 lists all observed patterns of LOH measurements for the previously selected chromosome arms. The patterns are ordered by the corresponding GPS. In the table, the value 1 indicates an event (here loss of heterozygosity), the value 0 indicates no event, and the value –1 denotes a missing value. Overall, the dataset contains 35.7% events and 12.2% missing data. The missing values are estimated during the EM-type algorithm. For example, consider the eight patterns in Table 1 with GPS = 0.276. In these cases the estimated pattern is 0110000, i.e. LOH only on chromosome arms 10p and 10q. Since these two are the most frequent events (Table 2), a missing value is more likely to be estimated as 1 for them than for the other events. Note that this is a simplification, since the oncogenetic tree model also takes interactions into account in the estimation process.


Table 1 Glioblastoma data set: pattern of observed LOH measurements for selected events, frequency of pattern and GPS calculated from oncogenetic tree model

 

Table 2 Glioblastoma dataset: genetic events defined by LOH on single chromosomes, frequencies and p-values in Cox models (original and false discovery rate adjusted in univariate and original in multivariate model)

 
For the glioblastomas, survival was defined as time from surgery to death. Tables 2 and 3 summarize the results of the Cox regression analysis. Table 2 shows p-values for single genetic events relating to LOH on selected chromosome arms. Both univariate and multivariate Cox models are considered. In the multivariate model all genetic events are included as predictors. Interactions cannot be considered due to the low number of samples compared with the number of predictors. In the third column the resulting p-values of univariate Cox models are listed. The fourth column contains p-values after adjustment for multiple testing according to the false discovery rate (Benjamini and Hochberg, 1995). Controlling the false discovery rate is the least stringent criterion among commonly used adjustment methods. The false discovery rate is the expected proportion of false discoveries among the rejected hypotheses. The fifth column of Table 2 contains p-values in the multivariate model.
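The Benjamini-Hochberg adjustment used for the fourth column of Table 2 can be sketched as follows. It is intended to agree with R's `p.adjust(p, method = "BH")`, but the function itself is a hypothetical helper and the example p-values are made up.

```python
from typing import List, Sequence


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p-value
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, the p-values (0.01, 0.04, 0.03, 0.005) are adjusted to (0.02, 0.04, 0.04, 0.02): the running minimum keeps the adjusted values monotone in the original ranking.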


Table 3 Glioblastoma dataset: GPS with hazard ratios, 95% confidence intervals and p-values in univariate and bivariate Cox regression model

 
Only LOH on 13q leads to a univariate p-value below 0.05. After adjustment, no single event is significant, and LOH on 13q is associated with a false-discovery-rate-adjusted p-value of p = 0.240. In the multivariate model, no event is significant. The age of the patient at the time of surgery is known to be a critical factor for the survival time. When adjusting for age, almost exactly the same results are obtained in all Cox models (data not shown).

Table 3 shows significance results for Cox models using as predictors the progression scores defined in Section 2.2, namely the count statistic (4), the weighted count statistic (5) and the GPS (7). Table 3 lists the estimated hazard ratios, confidence intervals and p-values for univariate and bivariate models. In the latter case, age is included as a second predictor. Both the count statistic and the weighted count statistic are significantly associated with survival in the univariate model, with p-values of 0.037 and 0.039, respectively. GPS is the best discriminating measure, with a p-value of 0.015. Furthermore, in the bivariate model almost identical estimates and p-values are obtained for all progression scores, whereas age is not significant.

Figure 3 shows Kaplan–Meier survival curves for subgroups of glioblastoma patients. The average empirical GPS of all samples (GPS = 0.8) was used to split the patients into two genetically defined groups. Patients with glioblastomas generally have a poor prognosis, reflected in a short expected survival time. Still, the split induced by GPS is informative. For example, the estimated survival rate 1 year after diagnosis is 0.34 for the low genetic progression group (GPS ≤ 0.8), but only 0.19 for the group with advanced genetic progression (GPS > 0.8).
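The Kaplan–Meier curves and the split at the mean GPS can be sketched as below. The product-limit estimator multiplies, over the distinct event times up to $t$, the factors $1 - d_t/n_t$, where $d_t$ is the number of events and $n_t$ the number of patients at risk; observations censored at an event time are counted as still at risk. Helper names and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct time with at least one observed event."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]  # all observations at time t
        deaths = sum(tied)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)  # events and censorings at t leave the risk set afterwards
        i += len(tied)
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: value at the last event time <= t, or 1 before the first."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def split_by_gps(gps: Sequence[float], cutoff: Optional[float] = None) -> Tuple[List[int], List[int]]:
    """Indices with GPS <= cutoff (low progression) and > cutoff; default cutoff is the mean."""
    if cutoff is None:
        cutoff = sum(gps) / len(gps)
    low = [i for i, g in enumerate(gps) if g <= cutoff]
    high = [i for i, g in enumerate(gps) if g > cutoff]
    return low, high
```

Applying `kaplan_meier` separately to the two index sets returned by `split_by_gps` yields one curve per genetically defined group, and `survival_at(curve, 1.0)` reads off the estimated survival rate at one time unit.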



Fig. 3 Kaplan–Meier survival curves for the glioblastoma dataset; patients split into subgroups according to GPS.

 
3.3.2 Prostate cancer
A significance analysis equivalent to that for the glioblastoma tumors was performed for the prostate cancer patients. As established clinical scores, the Gleason score and the lymph node status were analyzed. The Gleason score is the most commonly used prostate cancer grading system, which is based on assessing the histological pattern of tumor growth. The lymph node status differentiates between patients with regional lymph node metastases and patients without such metastases. The endpoint for calculating the survival time was defined by time from radical prostatectomy to PSA recurrence, which indicates a relapse of the tumor.

Table 4 lists patterns of CGH measurements for the selected genetic events as well as the corresponding GPS for every pattern. The dataset contains no missing values. Tables 5 and 6 give results of the Cox regression analysis. Among the genetic markers defined by CGH alterations, only the gain on chromosome 13q is significantly associated with longer time to PSA recurrence (p = 0.042). After false discovery rate adjustment, or in the multivariate model, no alteration is significant. Concerning the clinical markers, in a univariate model the Gleason score is significant (p = 0.011), but the lymph node status is not (p = 0.34).


Table 4 Prostate cancer dataset: pattern of observed CGH measurements for selected events, frequency of pattern and GPS calculated from oncogenetic tree model

 

Table 5 Prostate cancer dataset: genetic events defined by CGH on single chromosomes, frequencies and p-values in Cox models (original and false discovery rate adjusted in univariate and original in multivariate model)

 

 
Table 6 Prostate cancer dataset: genetic progression scores with hazard ratios, 95% confidence intervals and p-values in univariate and bivariate Cox regression model

 
The count statistic and the weighted count statistic are not significantly correlated with PSA recurrence time (p = 0.088 and p = 0.082, respectively), whereas GPS reaches p = 0.040 in the univariate model (Table 6). Grouping patients into two subgroups defined by GPS > 1 and GPS ≤ 1 yields an even smaller p-value of 0.014. As for glioblastoma, the cutoff GPS = 1 is the average empirical GPS of all samples. Figure 4 shows Kaplan–Meier plots of time to PSA relapse for the two subgroups.
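The text does not state which test produced p = 0.014 for the dichotomized GPS. One standard way to compare two Kaplan–Meier curves is the log-rank test, which coincides with the score test of a Cox model with a single binary covariate when there are no tied event times. The sketch below, with hypothetical names, computes the log-rank statistic and its chi-square (1 df) p-value; it illustrates the technique and is not a reconstruction of the authors' analysis.

```python
import math


def logrank_test(times, events, group):
    """Two-sample log-rank test.

    group[i] is 1 or 0; events[i] is 1 if the endpoint was observed.
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, group))
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, _, g in data if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        # hypergeometric mean and variance of the events in group 1 at time t
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi2 = observed_minus_expected ** 2 / variance
    # upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Swapping the group labels leaves the statistic unchanged, as expected for a two-sided comparison.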



 
Fig. 4 Kaplan–Meier curves for prostate cancer dataset; patients split into subgroups according to GPS.

 
The Gleason score is widely used as a prognostic marker in prostate cancer studies. In multivariate Cox models combining each of the genetic progression scores with the Gleason score and the age of the patient at diagnosis, the Gleason score is always significant, with all p-values below 0.02 (Table 6). The p-values of the genetic progression scores increase slightly compared with the univariate models, which is partly attributable to the small sample size. For GPS we obtain p = 0.040 in the univariate model, p = 0.061 with the Gleason score included as a second predictor, and p = 0.072 with age included as a third predictor (Table 6). Age itself is not associated with survival in any of the models (data not shown).

These results underline the predictive value of the Gleason score. However, a large group of 30 of the 54 patients has the same Gleason score of 7, so these patients cannot be distinguished by it. Restricting the analysis to these patients, we again divide the samples into a group with advanced genetic progression (GPS > 1) and a group with low genetic progression (GPS ≤ 1). GPS turns out to be of additional prognostic value for the subgroup of tumors with Gleason score 7, with p = 0.052 in the univariate model and p = 0.047 in the bivariate model including age. Figure 5 shows Kaplan–Meier plots for the two subgroups defined by GPS.
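The Gleason score 7 analysis amounts to stratifying first and then applying the cohort-wide GPS cutoff. The sketch below uses a hypothetical `Patient` record whose field names are illustrative; the cutoff GPS = 1 is taken from all samples rather than recomputed within the stratum, which matches the description above.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    gleason: int
    gps: float
    age: float
    time_to_psa: float    # time from prostatectomy to PSA recurrence or censoring
    psa_recurrence: bool  # True if recurrence was observed


def gleason_stratum_split(patients, gleason=7, cutoff=1.0):
    """Restrict to one Gleason score, then split at a cohort-wide GPS cutoff."""
    stratum = [p for p in patients if p.gleason == gleason]
    advanced = [p for p in stratum if p.gps > cutoff]
    low = [p for p in stratum if p.gps <= cutoff]
    return advanced, low
```

The two resulting groups would then be compared with Kaplan–Meier curves and with a Cox model that includes age as a second covariate.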



 
Fig. 5 Kaplan–Meier curves for prostate cancer dataset; patients with Gleason score 7 split into subgroups according to GPS.

 

    4 CONCLUSIONS
 
The identification of pathogenic routes in human tumors is one of the main challenges in molecular oncology. For many tumor types, genetic events defined by chromosome alterations are known to accumulate over time in the course of the disease. Such data on genetic alterations are usually available only at a single time point for each tumor. Each tumor is associated with a genetic pattern represented by a binary vector that indicates which genetic events have occurred in the tumor and which have not yet occurred. We use mixtures of oncogenetic tree models to estimate the most likely pathways from such cross-sectional data. Timed oncogenetic trees are obtained by assuming independent Poisson processes on the tree edges. The timed models provide a means of assigning a genetic progression score to every genetic pattern.
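To make the tree model concrete: in the classical single-tree model (Desper et al., 1999), each event occurs with a fixed probability given that its parent event has occurred, and a pattern containing an event without its parent has probability zero. The sketch below computes the likelihood of one binary pattern under such a tree and under a weighted mixture of trees. It is a simplified illustration with hypothetical names, not the Mtreemix implementation (Beerenwinkel et al., 2005b), and it omits parameter estimation entirely.

```python
def tree_pattern_probability(parent, prob, pattern):
    """Probability of a binary pattern under a single oncogenetic tree.

    parent[v] is the parent event of v (0 denotes the root, always present),
    prob[v] is the probability of the edge parent[v] -> v, and pattern is the
    set of events observed in the tumor.
    """
    result = 1.0
    for v, u in parent.items():
        parent_present = u == 0 or u in pattern
        if v in pattern:
            if not parent_present:
                return 0.0  # event present without its parent: impossible in the tree
            result *= prob[v]
        elif parent_present:
            result *= 1.0 - prob[v]  # edge available but the event did not occur
        # if the parent is absent, v cannot occur and contributes a factor of 1
    return result


def mixture_pattern_probability(components, pattern):
    """Weighted sum over (weight, parent, prob) tree components."""
    return sum(w * tree_pattern_probability(par, pr, pattern) for w, par, pr in components)
```

Summed over all patterns, the single-tree probabilities add up to one; patterns that violate the tree order receive probability zero.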

We introduce the GPS of a tumor as the estimated average waiting time of its observed genetic pattern in the timed oncogenetic tree. GPS was calculated for tumor samples from two cancer types with notably different genetic backgrounds, namely glioblastoma and prostate cancer. Using Cox regression analysis we demonstrate that for both diseases GPS has prognostic value, i.e. it can be used to differentiate patient subgroups with respect to expected clinical outcome. Importantly, the method can be applied successfully to two genetically different tumor types, which underlines the potential universality of the new approach.
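The waiting-time interpretation of GPS can be approximated by simulation. The sketch below assumes that (i) the waiting time on the edge leading to event v is exponential with a hypothetical rate `rate[v]` and starts once the parent event has occurred, and (ii) each tumor is observed at an independent, exponentially distributed sampling time with rate `sampling_rate`, set to 1 here by assumption. The GPS of a pattern is then estimated as the average sampling time over all simulated tumors showing that pattern. This Monte Carlo approach is swapped in for illustration and is not necessarily how the authors computed GPS; it also uses a single tree rather than a mixture.

```python
import random
from collections import defaultdict


def simulate_gps(parent, rate, sampling_rate=1.0, n_sim=100_000, seed=0):
    """Monte Carlo estimate of E[sampling time | observed pattern].

    parent[v] is the parent event of v (0 = root, occurring at time 0);
    rate[v] is the assumed exponential rate on the edge parent[v] -> v.
    """
    rng = random.Random(seed)
    total = defaultdict(float)
    count = defaultdict(int)
    for _ in range(n_sim):
        occurred = {0: 0.0}  # root event at time 0

        def time_of(v):
            # edge waiting time starts when the parent event has occurred
            if v not in occurred:
                occurred[v] = time_of(parent[v]) + rng.expovariate(rate[v])
            return occurred[v]

        for v in parent:
            time_of(v)
        t_s = rng.expovariate(sampling_rate)  # assumed exponential sampling time
        pattern = frozenset(v for v in parent if occurred[v] <= t_s)
        total[pattern] += t_s
        count[pattern] += 1
    return {pat: total[pat] / count[pat] for pat in count}
```

For a single edge with rate λ and sampling rate 1, memorylessness gives exact values of 1/(1 + λ) for the empty pattern and 1 + 1/(1 + λ) for the pattern containing the event, so `simulate_gps({1: 0}, {1: 1.0})` should return values close to 0.5 and 1.5.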

To verify the information gained from GPS, it is of particular interest to demonstrate improved performance over established histopathological parameters. By fitting multivariate Cox regression models we show that for both cancer types GPS remains prognostic after adjustment for age. For prostate tumors, the Gleason score, which reflects the histological pattern of tumor growth, is a common grading system with high predictive value. For the largest group of tumors, those with the intermediate Gleason score of 7, GPS can further identify subgroups with different prognoses with respect to time to relapse after surgery.


    Acknowledgments
 
Financial support was provided by BMBF grant no. 01GR0453 (J.R.), by Deutsche Forschungsgemeinschaft under grant no. HO 1582/1-3 (N.B.) and by Deutsche Krebshilfe under grant no. 70-3193-Schu I (W.A.S.). This work has been performed in the context of the BioSapiens Network of Excellence (EU contract no. LHSG-CT-2003-503265).


    Footnotes
 
†Present address: Department of Mathematics, University of California, Berkeley, CA 94720-3840, USA

Received on November 28, 2004; revised on January 21, 2005; accepted on February 7, 2005

    REFERENCES
 

    Andersen, P. and Gill, R. (1982) Cox's regression model for counting processes: a large sample study. Ann. Statist., 10, 1100–1120.

    Beerenwinkel, N., et al. (2005a) Learning multiple evolutionary pathways from cross-sectional data. J. Comput. Biol., in press.

    Beerenwinkel, N., et al. (2005b) Mtreemix: a software package for learning and using mixture models of mutagenetic trees. Bioinformatics, 21, 2106–2107. Advance Access published January 18, 2005; doi:10.1093/bioinformatics/bti274.

    Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.

    Cox, D.R. (1972) Regression models and life tables (with Discussion). J. R. Statist. Soc. B, 34, 187–220.

    von Deimling, A., et al. (2000) Comprehensive allelotype and genetic analysis of 466 human nervous system tumors. J. Neuropathol. Exp. Neurol., 59, 544–558.

    Desper, R., et al. (1999) Inferring tree models for oncogenesis from comparative genome hybridization data. J. Comput. Biol., 6, 37–51.

    Freije, W.A., et al. (2004) Gene expression profiling of gliomas strongly predicts survival. Cancer Res., 64, 6503–6510.

    von Heydebreck, A., et al. (2004) Maximum likelihood estimation of oncogenetic tree models. Biostatistics, 5, 545–556.

    Jiang, F., et al. (2000) Construction of evolutionary tree models for renal cell carcinoma from comparative genomic hybridization data. Cancer Res., 60, 6503–6509.

    Klein, J.P. and Moeschberger, M.L. (1997) Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York.

    Ohgaki, H., et al. (2004) Genetic pathways to glioblastoma: a population-based study. Cancer Res., 64, 6892–6899.

    R Development Core Team (2004) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

    Simon, R., et al. (2000) Chromosome abnormalities in ovarian adenocarcinoma: III. Using breakpoint data to infer and test mathematical models for oncogenesis. Genes Chromosom. Cancer, 28, 106–120.

    Strohmeyer, D.M., et al. (2004) Genetic aberrations in prostate carcinoma detected by comparative genomic hybridization and microsatellite analysis: association with progression and angiogenesis. Prostate, 59, 43–58.

    Vogelstein, B., et al. (1988) Genetic alterations during colorectal-tumor development. N. Engl. J. Med., 319, 525–532.





