Skip Navigation


Bioinformatics Advance Access originally published online on October 11, 2006
Bioinformatics 2007 23(4):480-486; doi:10.1093/bioinformatics/btl522
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/4/480    most recent
btl522v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Gonzalez, O. R.
Right arrow Articles by Mendoza, E.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Gonzalez, O. R.
Right arrow Articles by Mendoza, E.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Parameter estimation using Simulated Annealing for S-system models of biochemical networks

Orland R. Gonzalez 1,*, Christoph Küper 3,4, Kirsten Jung 3, Prospero C. Naval, Jr 1 and Eduardo Mendoza 2,5

1 Department of Computer Science University of the Philippines-Diliman Munich, Germany
2 Mathematics Department University of the Philippines-Diliman Munich, Germany
3 Department Biologie I, Bereich Mikrobiologie, Ludwig-Maximilians-Universität Munich, Germany
4 Medizinische Fakultät, Physiologisches Institut, Ludwig-Maximilians-Universität Munich, Germany
5 Physics Department & Center for NanoScience Ludwig-Maximilians-University Munich, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS AND CURRENT...
 REFERENCES
 

Motivation: High-throughput technologies now allow the acquisition of biological data, such as comprehensive biochemical time-courses at unprecedented rates. These temporal profiles carry topological and kinetic information regarding the biochemical network from which they were drawn. Retrieving this information will require systematic application of both experimental and computational methods.

Results: S-systems are non-linear mathematical approximative models based on the power-law formalism. They provide a general framework for the simulation of integrated biological systems exhibiting complex dynamics, such as genetic circuits, signal transduction and metabolic networks. We describe how the heuristic optimization technique simulated annealing (SA) can be effectively used for estimating the parameters of S-systems from time-course biochemical data. We demonstrate our methods using three artificial networks designed to simulate different network topologies and behavior. We then end with an application to a real biochemical network by creating a working model for the cadBA system in Escherichia coli.

Availability: The source code written in C++ is available at http://www.engg.upd.edu.ph/~naval/bioinformcode.html. All the necessary programs including the required compiler are described in a document archived with the source code.

Contact: gonzalez{at}bio.ifi.lmu.de

Supplementary information: Supplementary material is available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS AND CURRENT...
 REFERENCES
 
Estimating values for the parameters of mathematical models of biochemical networks have classically been done through reductionistic means. Among such approaches are those that use isolation and those that utilize perturbations. In the former case, kinetic parameters are typically estimated one at a time, typically in vitro, by isolating a relevant network constituent. Substantial kinetic information typically in the form of traditional rate laws have been determined in this way, for example from enzymatic assays. In the case of the latter, kinetics are approximated based on system responses to single perturbations from a steady state (Curto et al., 1997, 1998; Voit and Radivoyevitch, 2000). Although these two approaches have been widely used, both are limited in their application especially when taken in a systemic perspective. One reason is that isolation is not always possible and even not desirable when feasible. Another is that there are constraints as to which singular perturbations are possible. It is known that components isolated and studied in vitro do not always behave in the same manner when studied in vivo.

Biochemical profiles are simultaneous measurements of biochemicals, which are currently taken as single snapshots or as a sequence of snapshots forming a time series (Gerner et al., 2002; Neves et al., 2002). These profiles are extremely valuable because they carry information regarding the network structure and dynamics of the underlying systems. In studies, such as gene function analysis (Goodacre and Harrigan, 2003), identification of disease state patterns, and analysis of drug toxicities, deriving the network parameters from these profiles is vital to the experimentalist. Increasing availability of comprehensive biochemical profiles have made alternative systems-level methods for parameter estimation possible. In such schemes, parameters are estimated in groups or even on a system-wide scope.

Complex integrated biochemical systems are conveniently modeled by non-linear dynamical systems called S-systems, which are essentially systems of coupled ordinary differential equations (ODEs) adhering to the power-law formalism (Savageau, 1969a,b, 1970; Voit and Radivoyevitch, 2000). S-systems assume the following mathematical form:


Formula 1

(1)
where n is the number of dependent variables, m is the number of independent variables, the Xj's correspond to the current states (typically as concentrations) of the different metabolites, the Formula 's are the rates of change in the concentrations, rate parameters {alpha}i, ßi, and kinetic orders gij and hij are parameters measuring the influence of Xj on Xi.

A straightforward way of estimating parameters from time-series profiles is to make model outputs resemble the experimental profiles as closely as possible through residual error minimization. Although this may sound like a trivial exercise, the task is riddled with serious difficulties. Among these are problems of computational complexity leading to large amounts of computational resources required, and the recurring lack of global identifiability that results in a pool of possible solutions of similar quality from which it is difficult to select the best (Sands and Voit, 1996; Voit, 1995). Despite the numerous challenges researchers continued developing and refining suitable approaches, ranging from classical algorithms and their derivatives such those based on gradient descent, to more elaborate evolutionary approaches (Kikuchi et al., 2001; Ueda et al., 2001; Kikuchi et al., 2003; Kimura et al., 2005).


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS AND CURRENT...
 REFERENCES
 
As the appropriate modeling framework, we chose S-systems with the general formulation of the network equations set up according to Equation (1). This general form can be reduced and constrained by using known network properties, such as topology and regulation. For example, for each kinetic parameter gij, it is left as free if Xj is known to affect the production of Xi. Otherwise, it is constrained to zero which effectively drops the corresponding term in the power-law. If the complete topology and regulation of the network being modeled were known, the corresponding symbolic S-system formulation can easily be written based on well defined rules (Voit, 2000: Chapter 3).

With the reduced symbolic formulation and the associated constraints, the next step is the determination of suitable values for the remaining parameters. We do this by minimizing the residual error of the model outputs obtained using numerical integration against the measured experimental data. For this purpose, we adapted the optimization algorithm known as simulated annealing (SA) for use with S-systems.

SA is an optimization strategy that operates in a way roughly analogous to the method by which metal and glass are created. SA differs from most other iterative improvement techniques in that candidate solutions of lower quality than the current one can be accepted during the course of the algorithm's iterations depending on a pseudo-temperature variable. Consequently, transitions out of local optima are always possible at non-zero temperatures (Kirkpatrick et al., 1983). Since most researchers agree that the inverse problem of parameter estimation from time series typically suffers from abundant local optima, such a property gives SA better prospects for convergence. Moreover, in the process of controlling the pseudo-temperature variable, an adaptive divide-and-conquer mechanism implicitly occurs. Gross features of the eventual state of the system appear at higher temperatures while finer details develop at lower temperatures (Kirkpatrick et al., 1983). This property to a degree gives SA the benefits of both global and local optimization approaches. It functions much like a global optimization technique at high-temperatures, and implicitly ‘switches’ to local optimization behavior at lower temperatures.

2.1 Optimization by SA
A rough description of our implementation of SA is outlined in Figure 1. The technique starts with the creation of an initial estimate h, which simply assigns random values to all the parameters satisfying the initial constraints (Line 1 in the pseudo code). Line 3 initializes the control variable T (analogous to temperature). This initial value should be enough to completely ‘melt’ the initial estimate. In our experiments, we empirically determined and used the range [100, 10 000]. The rest of the lines 4–18 form a loop structure wherein the initial guess is iteratively refined for better solutions.


Figure 1
View larger version (17K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 Pseudo code for SA. In out experiments, we typically used the value of 1000 for the initial value of T and decremented it gradually by multiplying the current value by 0.96. Iterations were terminated when the temperature drops below 1.0 x 10–5 or when no more hypotheses are being accepted.

 
  • Lines 6–8: creates a candidate solution h' by copying the parameters of h and then adding random values to each of the parameters as illustrated by the perturbation equation Formula , where k is a constant, E is the current error and p stands for each of the kinetic parameters. Rate constant parameters are handled in a similar way. The complete construction of this perturbation function may be found in Section 2.2.
  • Lines 9–13 : deal with the acceptance of the candidate h' based on its error and the error of the current candidate h. We computed the error as


    Formula 2

    (2)
    where Xit stands for the value of constituent Xi at time point t in the actual biochemical profile, Formula stands for the value of constituent Xi at time point t in the approximated traces obtained by numerically integrating the particular candidate, n is the number of dependent variables, and N is the number of time points. Observe that the candidate solution is automatically accepted when the change in error {Delta}E from h to h' is zero or negative. This intuitively is the case when the h' is better than h as reflected by a lower error. Otherwise, h' is probabilistically accepted based on the probability Formula . Notice that higher temperatures and lower changes in the error lead to higher probabilities for the acceptance of h'.

  • Line 15 : The pseudo-temperature T is lowered after the system has been subjected long enough to a particular temperature level. This process of cooling must be done gradually or else the regression has a higher probability of ‘freezing’ prematurely to sub-optimal values. In our experiments, we lowered T by multiplying the current value by 0.96.

2.2 The perturbation function
Essential to SA is the manner by which candidate solutions are randomly perturbed. In our work, perturbation was initially done by adding to each of the parameters of the candidate solution a random value taken from a Gaussian distribution. i.e.


Formula 3

(3)
where p is the parameter, c is a constant, Formula is a function that returns a random number from a Gaussian distribution with mean Formula and SD {sigma}. For our experiments, Formula and {sigma} were fixed to 0 and 1, respectively.

Initial experiments suggested that a constant c in Equation (3) may not suffice. Small values made the optimization process more susceptible to getting trapped in local minima and seemed to greatly constrain the search space that the algorithm was able to explore. Large values for c led to convergence failure and prevented the algorithm from working out the finer details once the error became small enough. Different constant values of c were tested but none yielded adequate results. Consequently, we considered c as a (variable) function of different optimization properties, such as the current error and temperature. Among all of the different formulations, consistently best results were achieved by a formula heuristically constructed from Equation (2). From the definition of the error in Equation (2), the contribution of a particular point to the total residual error is


Formula 4

(4)
for specific values of i and t where Formula is the approximation of Xit. By assuming that the previous estimate to Formula is perfect, a first order approximation of Formula may be written as


Formula 5

(5)
where Formula , is the value of the differential equation Formula evaluated at time t and {Delta}t is the difference between two successive time points. Substitution of Equation (5) into Equation (4) and taking the square root of both sides yields


Formula 6

(6)
.

Expanding Formula using Equation (1), then Equation (6) becomes


Formula 7

(7)

From Equation (7), suppose there was a way to reduce the error from Eit to p Eit where 0 ≤ p < 1 by modifying a particular kinetic order say gik. If the modification were to be accomplished by adding {Delta}g to the kinetic order, the relationship may be written as


Formula 8

(8)

Substraction Equations (8) from Equation (7) yields


Formula 9

(9)
from which after more algebraic manipulations results to


Formula 10

(10)
By moving to logarithmic space, the relationship between the parameter perturbation {Delta}g and the change in error can finally be described as


Formula 11

(11)
where c1 compensates for the change of logarithmic base from Xkt – 1 to e.

Equation (11) describes the relationship between the perturbation of a particular kinetic order by {Delta}g and the reduction of the error Eit by a factor of (1 – p). Note however that Eit is only a componentof E and that computing {Delta}g directly using Equation (11) may actually increase the overall error. Nevertheless, Equation (11) is still useful for the purpose of defining a relationship between the current total error, (which is an averaging of the Eit's) and appropriate perturbation magnitudes. The equation fulfills this purpose as it suggests a roughly linear relationship between {Delta}g and Formula , which is consistent with the S-system philosophy of linearization in logarithmic space. From the combination of Equations (3) and (11), we formed the perturbation function


Formula 12

(12)
where p stands for any of the kinetic parameters and l is an externally optimized constant. Note that in moving from Equation (11) to Equation (12), we introduced an externally optimized constant l to absorb some effects of replacing the (variable) denominator in Equation (11) by constants. Construction of the perturbation function using one of the degradative kinetic order h follows in a similar manner. The construction of the perturbation function for rate constants is given in Section 1 of the Supplementary material.

2.3 Top-down modeling and structure identification
The method that we propose belongs to the class of top-down approaches wherein the determination of parameter values is left to an optimization algorithm that minimizes the difference between model simulations and experimental data. Closely related to this inverse problem of finding parameter values is the task of identifying network structure, a problem that has recently been emerging in the literature (Velfingstad et al., 2004). In this type of problem, the forms and constraints of the differential equations which are dictated by network structure and topology are also optimized. This approach is intended for scenarios wherein some or most of the topological and regulatory details are unknown or vague.

S-systems are well suited for structure identification problems because of their regular structure and the fact that each parameter has a well defined biological interpretation. In this work, we adopted an iterative estimate-prune-estimate scheme that has been described by other authors (Voit and Almeida, 2004). An outline of the strategy is provided in Section 2 of the Supplementary material. The first step is the formulation of the complete symbolic S-system model. For this step, knowing the number of dependent and independent variables is sufficient. Next, whatever a priori information regarding the structure and regulation of the system being modeled that is available is used to reduce the symbolic formulation and to create parameter constraints. Optimization by SA as outlined in Section 2.1 is then performed to obtain a candidate solution.

In contrast to the problem of estimating parameter values only, obtaining a solution does not end with the optimization step. Since at least some parameters were left free, some of them may have to be removed because they do not contribute substantially to the system behavior. For example, if prior to optimization it was uncertain whether Xj positively regulates Xi, then the parameter gij was left free. If optimization then returns a candidate solution wherein gij is less than 0.01, then removing gij by setting it to zero (thus dropping it from the model) is unlikely to worsen the residual error value substantially. In our experiments, we used the pruning rule that if a particular kinetic order say gij or hij is very small, then that parameter is removed from the system (Voit and Almeida, 2004). These parameters are constrained to zero and the values of the other parameters are then re-estimated using another round of optimization. The application of the said rule may be done simultaneously on multiple applicable parameters or individually.


    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS AND CURRENT...
 REFERENCES
 
We tested our methods on three artificial networks designed to simulate different network topologies and on the cadBA system in Escherichia coli. The first artificial network is the theoretical system Voit and Almeida used to demonstrate their decoupling strategy (Voit and Almeida, 2004). The system as illustrated in Figure 2 is a generic branched pathway with four dependent constituents X1 through X4, one constant source X5, and two regulatory signals. A typical numerical S-system representation of this system is


Formula 13

(13)


Figure 2
View larger version (12K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 A generic branched network with four dependent constituents X1 through X4, one constant source X5, and two regulatory signals (adapted from Voit and Almeida, 2004).

 
Using the S-system formulation Equation (13), we created an artificial dataset with four time courses which resembled a set of biochemical profiles. We then applied our methods to the artificial dataset to estimate the parameters of the theoretical system. In this first experiment, we assumed the scenario wherein topological and regulatory details are known. Thus, we performed regression with a symbolic formulation with 18 free parameters. Initial guesses were created by randomly assigning values within the range [0, 90] to rate constant parameters and values within the range [–2, 2] to kinetic orders. As may be expected from a random assignment of values, initial guesses were far from the true solution both in terms of their residual error values and the dynamics depicted by their traces.

Using the procedure we described in the methods section, we ran our experiments on an Intel Celeron 1.5 GHz machine. Typical runs lasted for ~10 h. The bulk of the total computational resources used was primarily consumed performing numerical integration which is necessary in order to compute errors as defined in Equation (2). Table 1b is a solution that we obtained with a residual error value of 1.4 x 10–5. Its traces superimposed with the datapoints in the dataset created using Table 1a are shown in Figure 3.


Figure 3
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 Solution of the estimated model with parameter values given in Table 1b superimposed on the dataset used for optimization. The dataset was generated using the system defined in Table 1a. The filled shapes depict one of the datasets used for training. The dashed lines are the corresponding model simulations. Please refer to the text for more details.

 


View this table:
[in this window]
[in a new window]

 
Table 1 Successive estimations of the values of the parameters of the system defined in Equation (13)

 
In this contribution, we explored decoupling of the system of ODEs in order to remove the requirement for numerical integration (Voit and Almeida, 2004). Essentially, rather than work with the solutions X's of the ordinary differential equations, optimization was performed using the values of the equations Formula themselves. To this end, we approximated the experimental slopes using the popular three point method. Using the decoupling strategy just described, we again estimated the parameters of Equation (13) except this time with the added complexity of structure identification. Starting with the general S-system formulation we used no a priori information to constrain the parameters other than some general knowledge in the field (Voit and Radivoyevitch, 2000: Chapter 5). Specifically, we constrained all kinetic orders to the range [–2, 2]. Next, gii's were set to zero due to the observation that in most systems, an element's current concentration does not directly affect its own production. Finally, we restricted all hii's to positive values based on the observation that the rate of degradation of an element in a network typically depends on its current concentration. A total of 44 parameters were left for optimization. Using the same procedure for creating initial estimates as described earlier and the pruning rule in Section 2.3, we obtained the result summarized in Table 1c. As a consequence of decoupling, it was possible to estimate the parameters one equation at a time. Estimating the parameters of Formula took 2 steps, Formula took 1 step, Formula took 2 steps and Formula took 1 step. The running time for each optimization step was under 2 min.

The use of decoupling (slope approximation) leads to dramatic reduction in computational time. However, it must be noted that the technique is not applicable in all cases. If an equation Formula is affected by an element Xj for which there is no data available, then decoupling may not be applied to Formula . Moreover, transforming the inverse problem to the decoupled form introduces additional parameter estimation bias that is dependent on the slope approximation algorithm used.

In order to further gauge the efficacy and general applicability of our methods, we ran experiments which simulated the presence of noise in the data by randomly perturbing the traces by adding values in the range [–10%, +10%]. Moreover, we also applied our methods to two other artificial systems designed to simulate behavior and dynamics different from Equation (13). The first contained a converging pathway with double inhibition and the second was a larger network with eight constituents and included a synthesis reaction. The results we got for these two systems are similar in quality to the results for Equation (13) in terms of the residual error values (data not shown).

3.1 The cadBA system in E.coli
The cadBA system in E.coli is believed to be a part of the organism's mechanism for coping with acidic environments. At the center of the system is the cadBA operon which encodes lysine decarboxylase (CadA) and a lysine-cadaverine antiporter (CadB). This operon is regulated at a transcriptional level by the protein CadC which is a product of an adjacent gene cadC (Neely and Olson, 1996; Auger et al., 1989; Kuper and Jung, 2005). Moreover, it is known that cadBA is induced by acidic pH and the presence of external lysine. The acidity of the medium is decreased via the concomitant consumption of an H+ due to the decarboxylation of lysine to cadaverine. The resulting cadaverine is then used by CadB to transport more lysine into the cell (refer to Fig. 4). Using the information known regarding the system's topology and regulation, we set up the symbolic S-system formulation. Due to the fact that data for some of the components were unavailable, only a subset of the components could be considered: CadA, CadB, the cadBA transcript, external lysine, external cadaverine and pH. Exclusion of other components due to lack of appropriate data was handled as follows:


Figure 4
View larger version (26K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4 A schematic diagram of the cadBA system in E.coli. This system is believed to be part of the organism's coping mechanism for acidic environments. Double-line arrows represent mass flow. The thinner and shorter arcs represent regulation. The S-system model that we created is smaller that the network shown because of some data limitations and simplifications (see text for more details). The model has a total of 18 parameters, 16 of which were estimated. The values of the remaining two parameters are dependent on the values of the other parameters due to network constraints. Note that although the diagram shows a regulatory arc from external lysine to the activation of CadC, the regulatory actions of lysine on cadBA expression actually goes through another gene lysP.

 
  • CadC: this constitutive protein of E.coli was bypassed by redirecting the regulatory functions of both pH and external lysine on it directly to the equation for the expression of cadBA.
  • Internal lysine and cadaverine : the lack of data for the internal concentrations of lysine and cadaverine led us to couple the decarboxylation reaction with the transport mechanism in order to avoid speculating on the dynamics of the internal versions of both metabolites. Thus, the actions of both CadA and CadB were modeled as a single step.

Using the proposed method, the following model was obtained


Formula 14

(14)

As in previous experiments, the initial estimate was created by assigning random values to the parameters. The range [0, 200] was used for rate constants and the range [0, 8] for kinetic orders. Decoupling was possible because traces for all the relevant constituents were available. However, the equations for both lysine and cadaverine were still processed simultaneously due to the interdependence between some of their parameters. Each optimization step took between 1 and 2 min on an Intel Pentium IV 3.0 GHz machine. The fit of Equation (14) to one of the available experimental datasets is shown in Figure 5. Lysine and cadaverine are expressed in mM and proton concentration (H+) in µM. The proteins CadA and CadB are expressed as activities. The cadBA transcript is modeled using relative intensities in the range [0, 10]. A profile for CadB is missing in Figure 5 because CadB was not measured in the experiments. For this reason and the fact that both CadA and CadB belong to the same operon, CadB values were assumed to be equal to the values of CadA.


Figure 5
View larger version (23K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5 Fit of the cadBA model to experimental data (Kuper, 2005; Kuper and Jung, 2005). Due to lack of available data and the fact that both CadA and CadB belong to the same operon, CadB concentrations were taken to be the same as CadA. The values for [H+] (pH) during the course of the simulation were interpolated from the data.

 

    4 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS AND CURRENT...
 REFERENCES
 
We adapted SA for S-systems parameter estimation from time series data by crafting perturbation functions that relate perturbation magnitudes for the different S-system parameter types to the current optimization error. The kinetic order perturbation function Equation (12) is consistent with the power-law philosophy of linearization in logarithmic space as it describes a roughly linear relationship between perturbation magnitudes and the logarithms of the residual distances between actual points and model approximations. A graphical illustration of the relationship between the current error level E and the perturbation magnitude function for different values of l is given in Section 3 of the Supplementary material. The downward concave trend of the perturbation magnitudes when the error approaches zero has the further consequence of encouraging SA to explore more promising areas of the parameter space. This is because the lower E gets, the higher the probability that closer neighbors of the current solution gets sampled.

After demonstrating the proposed method by estimating the parameters of Equation (13) using the network's topology to reduce the symbolic formulation a priori (Table 1b), structure identification experiments were also performed by pretending that topological and regulatory information were unavailable. In these experiments, estimating the parameters took several optimization and pruning steps. Deciding when to stop pruning was trivial because the true parameter set was known. In general however, there is no specific set of rules that can be used to decide what and when to prune.

In Section 4 of the Supplementary material we discuss additional issues on pruning as well as the guidelines that we used for judging between candidate solutions. In addition, some theoretical considerations are also available, such as the use of methods for identifying transformation groups under whose actions the solutions of the differential equations remain unchanged (Voit, 1992). Nevertheless, pruning, and the process of selecting the best hypothesis from a pool of candidates, remains subjective and heavily dependent on the nature of the network being modeled and the amount of information that is available. Structure identification, though it can prove useful in limited applications, still needs a lot of refinement if it is to be applied to general systems.

One aspect that is key to the performance of SA is the annealing schedule. Essentially, it is the way by which the pseudo-temperature is controlled. This includes using an appropriately high T0 and ensuring that the temperature T does not go down too rapidly. Depending on how T0 is chosen, the importance of the quality of the initial estimate will vary. Using a low starting temperature generally restricts the search space to relatively smaller area, and using a very high T0 renders the quality of the initial estimate of little import as the optimization process can easily move away. Consequently, devising the annealing program will also have to take into consideration the amount of initial information available.

The current model for the cadBA system still needs further refinement and extension, particularly the inclusion of more constituents. For reasons stated earlier, CadC, internal lysine and internal cadaverine are currently not modeled. Moreover, another gene lysP, and its corresponding protein LysP, also affect the system by inhibiting cadBA expression in the absence of lysine. These details will have to be integrated into the model when the corresponding data becomes available.

The nature of the experimental data available for the cadBA network presented some added difficulties. For example, lysine was measured in an experiment separate from the other constituents. Although the initial conditions used were similar, it is obvious that substantial discrepancies are present. If the traces for both lysine and cadaverine were to be taken as is, at various points the amount of cadaverine that was produced would have exceeded the amount of lysine that was consumed. This is the reason for the discrepancy between the simulation of the model for lysine and the experimental profile for the metabolite as is evident in Figure 5. During optimization, constraints were added to ensure that cadaverine production did not exceed lysine degradation. A strict equality between the two was not enforced because lysine is also potentially degraded by the system substantially for other purposes.

Another challenge posed by the available data is that cadBA transcript measurements are currently expressed as relative intensities. This will introduce complications in reconciling the model with other experimental data as the transcript's absolute copy number or concentration at a given time point is unknown. Moreover, this problem also makes interpreting and verifying the parameter values harder.

Taking into consideration the earlier discussion regarding lysine, the fit of the model is reasonably close to the experimental profiles. By lowering the initial concentration of external lysine, the model also shows the expected reduction in cadBA activity up to a point where the network practically shuts down. The same trend is evident in raising the initial pH value (lowering of [H+]). A simulation of the model with pH initialized to 6.5 is shown in Section 5 of the Supplementary material. These responses, which are consequences of the values of the parameters, will have to be further verified and refined both qualitatively and quantitatively through the use of further experimental data that use different initial conditions.

It is known that the presence of external cadaverine inhibits the initiation of cadBA expression. Unfortunately, the current model does not capture this behavior. We believe that this is due to the fact that all of the experimental data we have start with zero external cadaverine. Again, further data from experiments with varying initial concentrations of cadaverine in the medium will be extremely helpful.


    5 CONCLUSIONS AND CURRENT WORK
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS AND CURRENT...
 REFERENCES
 
In this paper, we showed how SA with an appropriately constructed perturbation function can be used effectively to estimate the parameters of biochemical networks modeled as S-systems. SA, depending on the temperature program used, can behave either as a global or local optimization approach, and implicitly switches from the former to the latter as the pseudo temperature goes down. We demonstrated the efficacy and general applicability of the proposed method by applying it successfully to three artificial systems designed to simulate different network topologies and behavior, and also showed how it can be adapted to the problem of structure identification. Finally, as an application to a real-world problem we created a working model for the cadBA network in E.coli.

Although the simulation of the model for the cadBA system is reasonably close to the experimental profiles, further verification will still have to be done. It would be beneficial to obtain experimental data that expressed cadBA transcript in terms of absolute values or concentrations rather than as relative intensities. Also, experiments that use different initial conditions for external lysine and pH from the ones currently available will help gauge the quality of the current kinetic parameters. Finally, other network constituents, such as lysP and cadC will have to be integrated as data becomes available.

Various aspects of SA, such as solution sampling can further be investigated to improve convergence and to reduce computational costs. In the current implementation, candidate solutions are created by perturbing all parameters simultaneously. Variants that perturb smaller subsets of the parameter set are promising avenues to explore specially for scaling to larger networks. Also, the current implementation is memoryless in that it uses only the current solution and the error for creating the next candidate solution. Implementations that use solution histories for controlling sampling directions are other possible modifications. Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on July 6, 2006; revised on September 13, 2006; accepted on September 21, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS AND CURRENT...
 REFERENCES
 

    Auger, E.A., et al. (1989) Construction of lac fusions to the inducible arginine- and lysine decarboxylase genes of Escherichia coli K12. Mol. Microbiol, . 3, 609–620[CrossRef][ISI][Medline].

    Curto, R., et al. (1997) Validation and steady-state analysis of a power-law model of purine metabolism. Biochem. J, . 324, 761–775.

    Curto, R., et al. (1998) Mathematical models of purine metabolism in in man. Math. Biosci, . 151, 1–49[CrossRef][ISI][Medline].

    Gerner, C., et al. (2002) Concomitant determination of absolute values of cellular protein amounts, synthesis rates, and turnover rates by quantitative proteome profiling. Mol. Cell. Proteomics, 1, 528–537[Abstract/Free Full Text].

    Goodacre, R. and Harrigan, G.G. Metabolite Profiling: Its Role in Biomarker Discovery and Gene Function Analysis, (2003) , Dordrecht, The Netherlands Kluwer Academic Publishers.

    Kikuchi, S., et al. (2001) Pathway finding from given time-courses using genetic algorithm. Genome Informatics, 12, 304–305.

    Kikuchi, S., et al. (2003) Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics, 19, 643–650[Abstract/Free Full Text].

    Kimura, S., et al. (2005) Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics, 21, 1154–1163[Abstract/Free Full Text].

    Kirkpatrick, S., et al. (1983) Optimization by simulated annealing. Science, 220, 4598.

    Kuper, C. (2005) Charakterisierung der transkriptionellen Aktivierung des cadBA-Operons durch den Transmembranregulator CadC aus Escherichia coli. Dissertation, Biologischen Fakultat der Ludwig-Maximilians-Universitat zu München, .

    Kuper, C. and Jung, K. (2005) CadC-mediated activation of the cadBA promoter in Escherichia coli. J. Mol. Microbiol. Biotechnology, 10, 26–39.

    Neely, M. and Olson, E. (1996) Kinetics of expression of the Escherichia coli cad operon as a function of pH and lysine. J. Bacteriol, . 178, 5522–5528[Abstract/Free Full Text].

    Neves, A.R., et al. (2002) Is the glycolytic flux in Lactococcus lactis primarily controlled by the redox charge? J. Biol. Chem, . 277, 28088–28098[Abstract/Free Full Text].

    Sands, P.J. and Voit, E.O. (1996) Fluxed-based estimation of parameters in S-systems. Ecol. Modeling, 93, 75–88[CrossRef].

    Savageau, M.A. (1969a) Biochemical Systems Analysis, I. Some mathematical properties of the rate law for the component enzymatic reactions. J. Theor. Biol, . 25, 365–369[CrossRef][ISI][Medline].

    Savageau, M.A. (1969b) Biochemical Systems Analysis, II. The steady-state solutions for an n-pool system using a power-law approximation. J. Theor. Biol, . 25, 370–379[CrossRef][ISI][Medline].

    Savageau, M.A. (1970) Biochemical Systems Analysis, III. Dynamic solutions using a power-law approximation. J. Theor. Biol, . 26, 215–226[CrossRef][ISI][Medline].

    Ueda, T., et al. (2001) Efficient numerical optimization technique based on real-coded genetic algorithm. Genome Informatics, 12, 451–453.

    Velfingstad, S.R., et al. (2004) Priming nonlinear searches for pathway identification. Theor. Biol. Med. Model, . 1, 8[CrossRef][Medline].

    Voit, E.O. (1992) Symmetries of S-Systems. Math. Biosci, . 109, 19–37[CrossRef][ISI][Medline].

    Voit, E.O. Computational Analysis of Biochemical Systems, (2000) Cambridge University Press.

    Voit, E.O. and Almeida, J. (2004) Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics, 20, 1670–1681[Abstract/Free Full Text].

    Voit, E.O., et al. (1995) Hierarchical monte carlo modeling with S-distributions: concepts and illustrative analysis of mercury contamination in king mackerel. Environ. Int, . 21, 627–635[CrossRef].

    Voit, E.O. and Radivoyevitch, T. (2000) Biochemical systems analysis of genome-wide expression data. Bioinformatics, 16, 1023–1037[Abstract/Free Full Text].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
P.-K. Liu and F.-S. Wang
Inference of biochemical network models in S-system using multiobjective optimization approach
Bioinformatics, April 15, 2008; 24(8): 1085 - 1092.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/4/480    most recent
btl522v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Gonzalez, O. R.
Right arrow Articles by Mendoza, E.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Gonzalez, O. R.
Right arrow Articles by Mendoza, E.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?