Skip Navigation


Bioinformatics Advance Access originally published online on July 28, 2007
Bioinformatics 2007 23(19):2612-2618; doi:10.1093/bioinformatics/btm382
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary data
Right arrow All Versions of this Article:
23/19/2612    most recent
btm382v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Hengl, S.
Right arrow Articles by Maiwald, T.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Hengl, S.
Right arrow Articles by Maiwald, T.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Data-based identifiability analysis of non-linear dynamical models

S. Hengl *, C. Kreutz , J. Timmer and T. Maiwald

Physics Institute, University of Freiburg, Hermann Herder Strasse 3, 79104 Freiburg i.Br., Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Mathematical modelling of biological systems is becoming a standard approach to investigate complex dynamic, non-linear interaction mechanisms in cellular processes. However, models may comprise non-identifiable parameters which cannot be unambiguously determined. Non-identifiability manifests itself in functionally related parameters, which are difficult to detect.

Results: We present the method of mean optimal transformations, a non-parametric bootstrap-based algorithm for identifiability testing, capable of identifying linear and non-linear relations of arbitrarily many parameters, regardless of model size or complexity. This is performed with use of optimal transformations, estimated using the alternating conditional expectation algorithm (ACE). An initial guess or prior knowledge concerning the underlying relation of the parameters is not required. Independent, and hence identifiable parameters are determined as well. The quality of data at disposal is included in our approach, i.e. the non-linear model is fitted to data and estimated parameter values are investigated with respect to functional relations. We exemplify our approach on a realistic dynamical model and demonstrate that the variability of estimated parameter values decreases from 81 to 1% after detection and fixation of structural non-identifiabilities.

Availability: Our algorithm is written in Matlab and R. It is available from the authors on request. An implementation of ACE, written in Matlab as well as in C, is available online at www.stefanhengl.de

Contact: hengl{at}fdm.uni-freiburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
In the fast-growing field of Systems Biology, biological processes like signal transduction pathways and metabolic networks are often modelled mathematically based on systems of differential equations. They comprise parameters such as reaction rates, which have to be determined in accordance to measured data, e.g. by fitting to time course or dose response experiments. However, even for the simplest case of a reversible reaction of two species, Formula , only the ratio of forward and backward reaction rate can be determined, if only steady state information is available. For more complex models there may even exist several groups of functionally related parameters, which may consequently, as a matter of principle, not be determined unambiguously. Parameters for which no unique solution exists are called non-identifiable.

Two conceptually different sources for non-identifiability exist: first, the model structure itself may cause parameters to be functionally related. Second, since parameter values are estimated by fitting the model structure to experimental data even a structurally identifiable model may exhibit practical non-identifiabilities, because of an insufficient amount or quality of measurements. The noisier measurements are, and the lower the sampling frequency, the less information is contained in the measurement. Moreover, the dynamical response of the model depends on the input applied in the experiment, e.g. variable and constant stimuli may cause completely different dynamics. Therefore, the type of input may be decisive for parameters estimation (Faller et al., 2003).

Identifiability, however, is a necessary prerequisite for mathematical analysis of a model. Thus, the following question arises: how can non-identifiabilities be detected? Basically two approaches are used to handle non-identifiability: first, the model structure itself is investigated with respect to non-identifiabilities. If non-identifiabilities exist, they must be removed analytically by introduction of new parameters, representing, e.g. an identifiable combination of two non-identifiable parameters. This approach is referred to as a priori identifiability analysis, since the model is examined before simulating or fitting procedures. Second, a non-identifiable model structure manifests itself in functionally related parameters. Thus, non-identifiabilities may be detected by fitting a model repeatedly to data and investigating parameter estimates. Ideally, both methods are applied successively.

Numerous methods have been presented to deal with a priori identifiability analysis of linear and non-linear models. The Laplace transform or transfer function approach which may only be applied to linear models is thoroughly discussed in, Jacquez and Greif, 1985 and Godfrey and DiStefano 1987. However, when modelling biological systems, non-linear differential equations are ubiquitous, e.g. in Michaelis Menten kinetics and cooperative phenomena. The Similarity Transformation Approach (Vajda et al., 1989), the Power Series Expansion, (Pohjanpalo, 1978), the Volterra and Generating Power Series Approaches (Lecourtier et al., 1987) as well as identifiability tests derived from differential algebra (Ljung and Glad, 1994; Saccomani et al., 2003) are also applicable to non-linear models. Unfortunately, these methods become mathematically intractable with increasing model complexity. If we want to investigate which kind of non-identifiabilities of a model occur under realistic experimental conditions, a data based-method is to be applied. However, available data-based approaches for a posteriori identifiability analysis of non-linear models, like multivariate regression, require prior knowledge concerning explicit linear or non-linear functional relations (Quinn and Keough, 2002; Seber and Wild, 1988). In our approach, this is not necessary.

With our method, we provide a solution for following yet unsolved problems:

  1. structural identifiability analysis of large and complex non-linear dynamical models within reasonable time.
  2. automatic detection of non-linear dependencies between arbitrarily many parameters.
  3. practical identifiability analysis under consideration of the quality of data at disposal.

The article is organized as follows. First, we provide a definition of identifiability and introduce briefly the alternating conditional expectation (ACE) algorithm which estimates optimal transformations. Then, the test-function is introduced and its are discussed. Subsequently, we propose the algorithm which detects non-identifiabilities with help of the before defined test-function and apply it to a biological example of a non-linear dynamical model.


    2 BACKGROUND
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
2.1 Identifiability
Identifiability of a dynamical model depends on the model equations themselves, as well as on input and observation functions, initial conditions, constraints (Audoly et al., 2001; Godfrey and DiStefano, 1987) and the often unknown true parameter values (Vajda et al., 1989). A model together with all constraints is called a constrained structure.

We follow a definition given by Godfrey and DiStefano (Godfrey and DiStefano, 1987). Let Formula denote the state variables, Formula the externally given input signals, Formula the system parameters and Formula the observation function. The initial values Formula depend in general on the parameters Formula . Finally, let Formula denote all additional constraints mathematically formulated as explicit or implicit algebraic equations. A constrained structure is then given by


Formula 1

(1)


Formula 2

(2)


Formula 3

(3)


Formula 4

(4)


Formula 5

(5)
A single parameter pi of Equations (1–5) is globally identifiable (a priori or structural), if there exists a unique solution for pi from the constraint structure. A parameter with countable or uncountable number of solutions is locally identifiable or unidentifiable.

Biological data canonically comprises observational noise, generalizing Equation (2) to


Formula

Observational noise may render even structural identifiable parameters practically non-identifiable. Parameters which can be determined with a small enough SD are termed practical identifiable.

2.2 Alternating conditional expectation (ACE)
Initially, the ACE-algorithm has been proposed by Breiman and Friedman (Breiman and Friedman, 1985) for the purpose of regression analysis and has since been applied in various fields (Buja, 1990; Timer et al., 2000; Voss and Hurths, 1997; Wang and Murphy, 2005). In the bivariate case, ACE non-parametrically estimates optimal transformations Formula and Formula which maximize the linear correlation R between Formula and Formula ,


Formula

Breiman and Friedman also provide weak conditions for convergence of the iterative algorithm. Since ACE itself is not the focus of this article, we just provide the basic knowledge needed to understand its further application.

The bivariate case can easily be extended to arbitrarily many parameters. Let Formula denote a (n x m) matrix of m parameters with n estimates for each parameter. Suppose the m parameters have an unknown linear or non-linear functional relation and let {Theta} and {Phi}j denote the true transformations that linearize the relation between the parameters,


Formula

where {varepsilon} is Gaussian noise. Then ACE estimates optimal transformations Formula such, that


Formula 6

(6)

Note that ACE intrinsically distinguishes between left-and-right-hand side terms. It regards the left-hand side parameter to be the response and all others to be predictors.

The principle of the ACE algorithm for the multivariate case is as follows: It starts with an initial estimate for the optimal transformation of the response, Formula and of the predictors, {Phi}i = 0, iisin{1.,n}/k. The transformation of the response, {Theta}, and the {Phi}i are estimated iteratively; new estimates of the {Phi}i serve as input for the estimation of {Theta}, and vice versa. For each {Phi}i it is calculated, how much variance of the response, {Theta}( pk), cannot yet be explained by the m 2 other predictors, Formula . This unexplained variance is the next estimate for the predictor {Phi}i. In other words, the best estimate for a transformation {Phi}i minimizes the squared residuals of Equation (6). It can be shown, that this estimate is given by


Formula

The {Phi}i are updated successively until the fraction of unexplained variance of the predictor fails to decrease. The last updated estimates serve as input to estimate {Theta}( pk) like follows


Formula

The iteration stops if a new estimate of {Theta}(pk) does not cause further reduction of the fraction of unexplained variance. In practice, the conditional expectation is replaced by smoothing techniques.

2.2.1 Remarks
We use a modified version of ACE as implemented by H. Voss (Voss and Kurths, 1997), where data are ranked before optimal transformations are estimated. Nevertheless, our application does not depend on a special implementation of ACE. We also tested our algorithm with ACE as included in the acepack package of R.

ACE works non-parametrically, thus we do not have to make assumptions about underlying functional relations. However, it does not provide explicit functional expressions as output, but values of the estimated optimal transformations Formula and Formula at the parameter values which served as input.


    3 APPROACH
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Our goal is to reveal, in a data-based way, non-identifiabilities of a non-linear dynamical model. Non-identifiability manifests itself in functionally related parameters, and we apply ACE to non-parametrically estimate these relations. However, to do this objectively, a test function quantifying this information is required. In order to motivate this test-function, we study the typical behaviour of the ACE algorithm by means of a simple example. Our findings can directly be applied to identifiability analysis, as will be outlined afterwards.

Let p2,p3 and p4 be three parameters uniformly distributed on the interval Formula . Suppose a fourth parameter p1 to be functionally related with p2,p3 by Formula . Real data is always corrupted with observational noise, i.e. random errors occurring during measurement, and we take this into account by adding Gaussian noise to the response p1


Formula 7

(7)

We independently draw 200 tuples (p2,p3,p4) and calculate p1 for each tuple, thus gaining a 200 x 4 matrix K, mimicking 200 parameter estimates. Only the first three columns of each row are functionally related, the fourth being independent. Then, we take K as input for ACE to estimate optimal transformations (see Fig. 1). Note that even p4 seems to have quite a smooth estimated optimal transformations significantly different from white noise. This may be mistaken for a real functional dependency. If, however, we draw a new 200 x 4 matrix the same way as outlined above, the transformations of the first three parameters will remain quite stable, while the transformation of the fourth parameter will in general look different, but still smooth. We can understand this volatility if we recall the iterative form of the ACE algorithm that estimates the optimal transformations. Since p2 and p3 suffice to explain all variance of p1 but Gaussian noise, the residuals will be Gaussian noise, smoothed by the filter employed by ACE. Noise will be distributed differently in each new sample we draw, therefore, we yield different estimates for the transformation of p4 every time.


Figure 1
View larger version (24K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. ACE-Plot of Equation (7). Linear, quadratic and sinus are well estimated by ACE. Note that even the fourth parameter p4 which is actually not related to parameter p1 has a smooth estimated transformation function which may be mistaken for a real functional dependency.

 
This is exactly, what renders possible to distinguish between parameters with functional relations and those without; we calculate the average estimated transformation by repeatedly drawing new matrices K, estimating transformations each time. We expect optimal transformations of functionally related parameters to be invariant under averaging. In contrast, parameters without functional relations yield different estimates for each new drawn matrix K.

The connection to the identifiability analysis of a constraint structure follows immediately: a non-identifiable constraint structure causes parameters to be functionally related and we may employ the above findings to detect these relations. The process of drawing new matrices K is replaced by estimating parameters through repeated fitting with different initial guesses of the parameters of the constraint structure to experimental or simulated data. This means, to yield a single 200 x m matrix K, we fit the model 200 times to data; we term this a single fitting sequence. Thus, the problem of identifiability analysis is mapped onto the problem of detecting groups of functionally related parameters. In the following, the qualitative ideas of our approach are quantified within a test-function, which is used to decide for functional dependencies.


    4 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
4.1 Construction of the test-function
A quantitative test-function based on the ACE algorithm is required to detect sets of functionally related parameters. In order to improve its robustness, all estimated optimal transformations for a given fitting sequence are ranked. Let Formula denote the value of the optimal transformation of parameter pk at its r-th estimate in the i-th fitting sequence, and let card denote the cardinal number of a given set, i.e. the number of elements contained in the set. Then, we define Formula as the function which maps each parameter estimate of a certain fitting sequence onto its rank divided by the total number N of fits conducted within one fitting sequence.


Formula

The division by N normalizes the estimated optimal transformations and maps them to the interval [0 1]. Thus, if rs and rl are the values of r which belong to the smallest and largest estimate of parameter p1 in the first fitting sequence, then Formula and Formula . Let M denotes the number of fitting sequences used to calculate the average optimal transformation. Thus, for parameters with strong functional relation, the normalized, average ranked transformation


Formula

is independent of M thus having a constant variance. Parameters without functional relation yield different estimates for each fitting sequence, therefore approximating Formula for M->{infty}, i.e. zero variance. This motivates following definition of a test function


Formula 8

(8)
where Formula denotes the empirical variance. Actually, the test-function Hk for parameter pk depends on the parameters which have been taken as response and predictors. Therefore, we specify Hk by citing the response pi1 and all predictors pil, lisin{2,...,m} used in the calculation of Hk,


Formula

Let Formula denote the set which contains the left-hand side parameter as well as all current right-hand side parameters taken as input for ACE. To test a whole group of parameters at once we take the mean


Formula

In order to reduce the computational burden which arises due to repeated fitting sequences, only one fitting sequence may be conducted and repetitions are replaced by drawing with replacement from K. The computational time decreases by a factor Formula . Note that this bootstrap approach is not necessary in terms of functionality. All results presented are valid also if we do not use the bootstrap. In the following we write NoB instead of M to denote the number of bootstrap samples drawn to calculate optimal transformations.

4.2 Properties
Formula and Hk render it possible to distinguish between three different cases, see Figure 2.

  1. Formula : the response parameter pi1 on the left-hand side has no functional relation with any other parameter Formula
  2. Formula : a given set of parameters contains not enough information, i.e. we need to add additional parameters to establish a strong functional relation.
  3. Formula : a given set of parameters contains enough information to establish a strong functional relation.
Especially the second case is of great importance. Suppose m parameters comprising a functional relation, but only m – 1 of them serve as input for ACE. Then, the m – 2 parameters on the right-hand side actually do not contain enough information to establish a linear relation between the estimates Formula . Roughly spoken, ACE will distribute the lacking information among the estimates of the transformations leading to noisy estimates. Noise is differently distributed in each new estimate based on a new bootstrap sample, therefore, yielding reduced values for Hk (see in the Supplementary Material). Note that these reduced values still significantly exceed those for parameters without any functional relation.


Figure 2
View larger version (23K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Histogram showing the distribution of Hk(p1,p2,p3,p4), Figure 2 for Equation (7). p1 dark blue (D), p2 light blue (C) , p3 yellow (B) and p4 (no functional relation) dark red (A). The distribution shows a separation of the values of Hk between dependent and independent parameters. Thresholds (dashed lines) are determined analytically. With each new bootstrap sample the variance of the normalized, average ranked optimal transformation of parameter p4 (no functional relation) shrinks while all others keep stable (inset). Threshold values are T1 = 0.01 and T2 = 0.07. The mean value of the test-functions of parameters comprising a strong functional relation exceeds T2. If there are not enough parameters to establish a strong functional relation, the mean variance lies between T1 and T2. The test-function of parameters, independent of the current response variable has values below T1. Stated thresholds are universal and apply for all possible functional relations. Simulations were conducted 10 000 times, each time 35 bootstrap samples were drawn from a 200 x 4 matrix containing four tuples of the parameters in each row.

 
The three above stated cases correspond to three regions of variance marked off by two threshold values which are determined analytically (see supplementary Material). Let N denote the number of estimates, and NoB the number of bootstrap samples. The expectation value of the test function of a parameter which is not functionally related with a given set of other parameters is


Formula 9

(9)
The term Formula denotes the overall contribution of the correlation, which derives from the smoothing filter in ACE, to the expectation value.

The expectation value of the test-function of a parameter which has a strong monotone functional relation with a given set of parameters is


Formula 10

(10)

4.2.1 Remarks

  • Both results Equations (9) and (10) have to be equal in the limes of large N and NoB = 1 which is fulfilled (c.f. Fig. 2 inset).
  • Equation (9) assumes independence of the drawn samples which is asymptotically fulfilled for infinitely many fits within one fitting sequence. Therefore, the obtained result is too small for finite N. If we replace resampling by new fitting sequences, Equation (9) holds for all N.
  • Equations (9) and (10) assume ACE to estimate optimal transformations perfectly, which again is only asymptotically true for large N and no noise. Especially, noise results in a slight shift of the test-function for functionally related parameters to smaller variances (c.f. Fig. 2 inset).

4.3 The algorithm
The test-function Hk proposed in Equation (8) interprets the stability of estimated optimal transformations as a measure for functional relations of parameters. Our algorithm exploits the properties of Hk to identify those parameters pk of a given set Formula which have a functional relation and determines the relations non-parametrically.

ACE distinguishes between right-and left-hand side terms, which we want to make use of. We propose a step up algorithm, i.e. we start with parameter pi as left-hand side term and take a second parameter Formula , as right-hand side. Then, we calculate Hj(pi,pj) for each pj and choose the parameter pk with Formula which means we take the parameter which is likely to have the strongest functional relation with pi. If Formula , parameter pi is supposed to be independent of all others. Otherwise we calculate Formula . If Formula a strong functional relation is found. Else another parameter Formula , is added and Formula , calculated again. If T2 is never exceeded, even if we added all parameters, it is supposed to have no strong functional relation.

To increase power, we rerun the loop once again after having found a strong functional relation. If the Formula value increases with an additional parameter, we keep it, else we take the relation which has already been found before.

This is conducted successively for all parameters pi taken as left-hand side; Figure 8 in the Supplementary Material shows a flowchart of the proposed algorithm.

As a crosscheck, we determine the multiple r2, i.e. the fractional amount of variance of the response y explained by the predictor variables Formula .

4.4 Example
We work through an example to illustrate what kind of input is needed and which output is provided. Assume a non-identifiable constraint structure with seven parameters functionally related like follows:


Formula 11

(11)
where p2,p4,p5 and {eta} are uniformley distributed on the interval [0 5]. The input used in our algorithm is Formula , where Formula are column vectors of length 200 which correspond to 200 fits. Note that by use of this input we do not make any assumptions concerning underlying relations. The output is given like follows:


Formula

In each row, ones indicate which parameters are functionally related. The response variable stands on the diagonal; thus, the fourth row indicates that the response p4 is strongly related to the predictors p3 and p5. The matrix S can be translated to tuples representing functional relations: (p1,p2);(p3,p4,p5);(p6)and(p7). Parameters p6 and p7 are correctly assigned to have no functional relation with any other parameters. Our algorithm was tested with more then three dozen comparable examples (see Supplementary Material). Every time the truth could be recovered. S has block diagonal form, which on the one hand results because we ordered parameters in advance. On the other hand, all parameters, when taken as right-hand side term in ACE, contribute strong enough to the left-hand side term. This is not always the case as we will see in the following section.

4.5 Sensitivity and specificity
We determined sensitivity and specificity in dependence of the threshold values for Equation (11). In order to test robustness of our algorithm, noise was added to all left-hand side terms. Figure 3 confirms that defined threshold values yield high sensitivity as well as high specificity.


Figure 3
View larger version (29K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Sensitivity and specificity for the example of Equation (11). T1 (dashed lines) is varied from 0.01 to 0.07 with T2 = 0.07; sensitivity ({triangleup}) and specificity (^) is calculated. Same is done for T2 (solid lines) with T1 = 0.01. Lines at 100% are separated for clarity. We see that analytically determined threshold values, T1 = 0.01 and T2 = 0.07, have optimal specificity and sensitivity. (Inset) power of proposed algorithm. The percentage of recovered functional relations is plotted versus the mean contribution. To yield comparable results, we tested a set of standard functions fi with Figure 3. The mean contribution M{lambda}f(p3) of the second right-hand side term to the left-hand side is defined like follows: M{lambda}f (p3) = mean({lambda}·f ( p3)/p1). Except for Figure 3, all standard functional relationships are detected with similar power. This result is of great importance, because it confirms the universality of the algorithm.

 
The inset of Figure 3 stresses the generality of the algorithm: sensitivity is largely independent of the actual functional relation. It only depends on the contribution strength of a predictor. The less a predictor pj on the right-hand side contributes to the response on the left-hand side, the noisier the estimated transformation {Phi}j( pj) gets, finally being indistinguishable from a estimated transformation of an independent parameter.

As discussed in the supplementary Material (section: Identifiability of Identifiability), especially for two parameters, there is a strong dependence of optimal transformations on the level of noise. However, this is, to a large extent, compensated by the bootstrap approach and the rank transformation of the optimal transformations, see Equation (8).


    5 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We apply our method to a non-linear dynamical model motivated by a modelling-project dealing with endocytosis. Our goal is to identify groups of functionally related parameters. Once such sets of non-identifiable, interdependent parameters are detected, either new experiments can be suggested to render the parameters identifiable, or parameters have to be fixed in order to improve identifiability. For the latter case, we suggest following guidelines for handling the two most frequent types of non-identifiable parameters: (1) structurally non-identifiable parameters may, per definition, be fixed at an arbitrary value in parameter space. The model's; dynamical properties are not changed or restricted by the fixation. (2) For practically non-identifiable parameters, the model's; dynamical properties depend slightly on the chosen parameter value. The choice which parameter to fix, depends on the experimental possibilities and available reference values from the literature. If there is no additional information, we suggest to fix the parameter within physiological constraints to that value which belongs to the best fit. (3) Iterate identifiability analysis and fixation of parameters until all free parameter are rendered identifiable. This is a necessary prerequisite for subsequent analysis techniques like sensitivity analysis. Further guidelines are provided in the Supplementary Material.

We proceed as follows : first, we fit the model to simulated data. In order to test identifiability under realistic experimental conditions, we add observational noise. Second, we apply our algorithm for identifiability analysis and interpret the result. Consider the following non-linear model derived from the biological system in Figure 4:


Formula

Following linear combinations of dynamical variables are observed: Formula Formula . Further motivation for the model equations and choice of observation functions is provided in the Supplementary Material. Parameter values were assigned to be: k1 = 0.008; k2 = 5c 10-5; k3 = 0.10; k4 = 0.25; k5 = 0.15; k6 = 0.075; k7 = 0.01; Bmax = 1000; kD = 100; EPO (t = 0) = 3000; EPOR (t = 0) = 1000. All simulations were conducted with our developed Systems Biology Multi-Experiment Fitting Matlab Toolbox PottersWheel (www.potterswheel.de). The model was fitted 500 times to simulated data, each fit started at the true parameter values, disturbed like follows: Formula The fitting results can be written in a 500 x 7 matrix which is then taken as input for identifiability analysis. Our approach yields following result:


Formula

which can easily be to tuples: (k1);(k2);(k3); Formula We see, that parameters k4,k5 and k6 are assigned to have a strong functional relation. The corresponding r 2-value is r 2 = 0.99. Figure 5 shows a scatterplot of the three non-linearly related parameters. Every point on the hyperbola represents a three tuple of the parameters, each tuple yielding the same output function descibing the data equally well. Since k4,k5 and k6 lie on a 1D manifold, one of the three parameters uniquely determines the other two. The fixation of one parameter should therefore suffice to eliminate the non-identifiability. In order to check whether the detected non-identifiability is structural or due to limited data, we set observational noise to zero and reran the fitting sequence again: the model exhibited the same non-identifiability. We can therefore fix one of the parameters without loosing flexibility of the model. We fix k4 at k4 = 0.23 and conduct another 500 fits. The results are shown in Figure 6. The errors of k5 and k6 decrease tremendously. A new identifiability analysis confirms that the model has successively been rendered identifiable.


Figure 4
View larger version (24K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. Non-linear dynamical model for the endocytosis of the erythropoietin receptor (EPO receptor). The EPO receptor is constitutively produced and internalized. EPO reversibly binds to the receptor and thereby induces accelerated endocytosis. Internalized EPO may either be recycled to the cytoplasm or undergo degradation before it is recycled. It is assumed, that the internalized dissociated receptors are degraded and that the degraded receptors do not interfere with the dynamic of the system.

 

Figure 5
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Scatterplot. Our algorithm for identifiability analysis identifies a three parameter relation between k4,k5 and k6. The parameters lie on a tilted hyperbola in the 3D parameter space. The model's; observation functions are invariant under parameter variations along the hyperbola, i.e. all points yield comparable {chi}2 values. In order to render the constrained structure identifiable, one of the three parameters has to be fixed.

 

Figure 6
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6. (A) Boxplot of parameter estimates. Five hundred fits to simulated data were conducted and {chi}2-values calculated for each fit. The boxplot is populated with the best 15% of the fits, which yielded lowest, comparable {chi}2-values. Parameter k7 is determined experimentally and thus does not appear in the boxplot. The three parameters k4,k5 and k6 are fitted with large SDs and are thus non-identifiable, all remaining parameters are fitted with small SDs and can be considered practical identifiable. However, boxplots can only provide a coarse classification, and it remains unclear which parameters are functionally related and how their relation looks like. Our approach for identifiability analysis (c.f. Fig. 5) motivates to measure or fix one of the parameters k4,k5 or k6. We fix k4 at k4 = 0.23 and fit the model another 500 times to data. (B) Boxplot of parameter estimates with fixed k4 = 0.23. Errors reduce drastically. All parameters are now practically identifiable.

 
Note that in general it is possible for three functionally related parameters to lie on a 2D manifold, i.e. a curved surface in space. This would require to fix two parameters in order to completely resolve non-identifiability. For an invertible optimal transformation {Theta}, however, it is also possible to reduce the number of non-identifiable parameters, not the non-identifiablity itself, by expressing one of the parameters through the other ones Formula .


    6 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We presented a non-parametric data-based algorithm for identifiability testing. Its major characteristics, generality and robustness, render it a valuable tool for identifiability analysis of non-linear dynamical models. The use of the bootstrap method reduces computational time tremendously.

Non-identifiabilities arise due to the structure of a model and the observation function. The proposed method is capable of identifying structural subunits causing non-identifiabilities by detecting groups of functionally related parameters. It reveals which parameters are uniquely determinable and which are not. Thus, it also provides a more realistic picture of what can be inferred from a model. In an example, the variance of estimated parameter values could be reduced from around 81 to 1%, after fixing one parameter as suggested by our approach.

The ability to apply the proposed approach to simulated data renders it directly applicable for Experimental Design, i.e. the quest for optimal experimental conditions, e.g. observables to maximize the information while minimizing experimental effort. For this purpose, we may test the identifiability of a model at all possible combinations of realistic input–output scenarios. Among the identifiable structures we then choose the input–output combination which is easiest to realize experimentally.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
The authors thank Martin Peifer for his support in the analytical calculations, as well as Verena Becker, Marcel Schilling, and Ursula Klingmüller from the German Cancer Research Center in Heidelberg for providing the model discussed. S.H., C.K. and T.M. gratefully acknowledge financial support by the grant BMBF 0313074D & FP6 EU-grant COSBICS LSHG-CT-2004-0512060.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Limsoon Wong

Received on April 4, 2007; revised on July 10, 2007; accepted on July 10, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 BACKGROUND
 3 APPROACH
 4 METHODS
 5 RESULTS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Audoly S, et al. Global identifiability of nonlinear models of biological systems. IEEE Trans. Biomed. Eng. (2001) 48:55–65.[CrossRef][Web of Science][Medline]

    Breiman L, Friedman J. Estimating optimal transformations for multiple regression and correlation. J. Am. Stat. Assoc. (1985) 80:580–598.[CrossRef][Web of Science]

    Buja A. Remarks on functional canonical variates, alternating least squares methods and ace. Ann. Stat. (1990) 18:1032–1069.

    Faller D, et al. Simulation methods for optimal experimental design in systems biology. Simulation (2003) 79:717–725.[Abstract]

    Godfrey KR, DiStefano JJ. Identifiabiliy of Model Parameters in Identifiability of Parametric Models, Chapter 1. (1987) Oxford: Pergamon Press. 1–20.

    Jacquez J, Greif P. Numerical parameter identifiability and estimability: Integrating identifiability, estimability, and optimal sampling design. Math. Biosci (1985) 77:201–227.[CrossRef][Web of Science]

    Lecourtier Y, et al. Volterra and Generating Power Series Approach to Identifiability Testing in Identifiability of Parametric Models, Chapter 5. (1987) Oxford: Pergamon Press. 50–66.

    Ljung L, Glad T. On global identifiability for arbitrary model parametrizations. Automatica (1994) 30:265–276.[CrossRef][Web of Science]

    Pohjanpalo H. System identifiability bases on the power series expansion of the solution. Math. Biosci (1978) 41:21–33.[CrossRef][Web of Science]

    Quinn G, Keough M. Experimental Design and Data Analysis for Biologists (2002) Cambridge, UK: Cambridge University Press.

    Saccomani M, et al. Parameter identifiability of nonlinear systems: the role of initial conditions. Automatica (2003) 39:619–632.[CrossRef][Web of Science]

    Seber G, Wild C. Nonlinear Regression (1988) New York: Wiley.

    Timmer J, et al. Parametric, nonparametric and parametric modelling of a chaotic time series. Phy. Lett. A (2000) 274:123–134.[CrossRef]

    Vajda S, et al. Similarity transformation approach to identifiability analysis of nonlinear compartmental models. Math. Biosci. (1989) 93:217–248.[CrossRef][Web of Science][Medline]

    Voss H, Kurths J. Reconstruction of non-linear time delay models from data by the use of optimal transformation. Phy. Lett. A (1997) 234:166–344.

    Wang D, Murphy M. Identifying nonlinear relationships in regression using the ACE algorithm. J. Appl. Stat. (2005) 32:243–258.[CrossRef][Web of Science]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingmuller, and J. Timmer
Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood
Bioinformatics, August 1, 2009; 25(15): 1923 - 1929.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
O. Kotte and M. Heinemann
A divide-and-conquer approach to analyze underdetermined biochemical models
Bioinformatics, February 15, 2009; 25(4): 519 - 525.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
B. Finkenstadt, E. A. Heron, M. Komorowski, K. Edwards, S. Tang, C. V. Harper, J. R. E. Davis, M. R. H. White, A. J. Millar, and D. A. Rand
Reconstruction of transcriptional dynamics from gene reporter data using differential equations
Bioinformatics, December 15, 2008; 24(24): 2901 - 2907.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
T. Maiwald and J. Timmer
Dynamical modeling and multi-experiment fitting with PottersWheel
Bioinformatics, September 15, 2008; 24(18): 2037 - 2043.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
H. Schmidt, M. F. Madsen, S. Dano, and G. Cedersund
Complexity reduction of biochemical rate expressions
Bioinformatics, March 15, 2008; 24(6): 848 - 854.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary data
Right arrow All Versions of this Article:
23/19/2612    most recent
btm382v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Hengl, S.
Right arrow Articles by Maiwald, T.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Hengl, S.
Right arrow Articles by Maiwald, T.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?