Skip Navigation


Bioinformatics Advance Access originally published online on September 5, 2006
Bioinformatics 2006 22(22):2782-2789; doi:10.1093/bioinformatics/btl465
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
22/22/2782    most recent
btl465v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (8)
Google Scholar
Right arrow Articles by Griffith, M.
Right arrow Articles by Sanders, W. H.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Griffith, M.
Right arrow Articles by Sanders, W. H.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commerical License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Dynamic partitioning for hybrid simulation of the bistable HIV-1 transactivation network

Mark Griffith 1, Tod Courtney 1,*, Jean Peccoud 2 and William H. Sanders 1

1 Coordinated Science Laboratory, University of Illinois at Urbana-Champaign 1308 W. Main Street, Urbana, IL 61801, USA
2 Virginia Bioinformatics Institute Washington Street, MC 0477, Virginia Tech, Blacksburg, VA 24061, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY
 3 MeTHODS
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 

Motivation: The stochastic kinetics of a well-mixed chemical system, governed by the chemical Master equation, can be simulated using the exact methods of Gillespie. However, these methods do not scale well as systems become more complex and larger models are built to include reactions with widely varying rates, since the computational burden of simulation increases with the number of reaction events. Continuous models may provide an approximate solution and are computationally less costly, but they fail to capture the stochastic behavior of small populations of macromolecules.

Results: In this article we present a hybrid simulation algorithm that dynamically partitions the system into subsets of continuous and discrete reactions, approximates the continuous reactions deterministically as a system of ordinary differential equations (ODE) and uses a Monte Carlo method for generating discrete reaction events according to a time-dependent propensity. Our approach to partitioning is improved such that we dynamically partition the system of reactions, based on a threshold relative to the distribution of propensities in the discrete subset. We have implemented the hybrid algorithm in an extensible framework, utilizing two rigorous ODE solvers to approximate the continuous reactions, and use an example model to illustrate the accuracy and potential speedup of the algorithm when compared with exact stochastic simulation.

Availability: Software and benchmark models used for this publication can be made available upon request from the authors.

Contact: tod{at}crhc.uiuc.edu

Supplementary information: Complete lists of reactions and parameters of the HIV-1 Tat transactivation model, as well as additional results for other benchmark models, are available at http://www.mobius.uiuc.edu/bioinfo06/


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY
 3 MeTHODS
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 
A number of investigators have demonstrated a relationship between phenotypic variability and the noisy dynamics of populations of macromolecules controlling gene expression (Korobkova et al., 2004; Raser and O'Shea, 2004; Elowitz et al., 2002; Rosenfield et al., 2005; Pedraza and van Oudenaarden, 2005; Weinberger et al., 2005). These observations led to a renewed interest in simulation algorithms suitable to modeling molecular noise. Specifically, many algorithms and tools for the stochastic solution of well-mixed chemical systems have been developed, beginning with the work of Gillespie (1977). That work outlined two variants of the stochastic simulation algorithm, the Direct method and the First Reaction method, which permit the Monte Carlo generation of stochastic trajectories of systems composed of coupled chemical reactions. However, as physical systems become larger and interacting molecular species appear in greater numbers, these algorithms often require unacceptably large computational resources. In particular, a system (e.g. a genetic network) consisting of some species with low copy numbers (e.g. a gene) and some species with high copy numbers (e.g. proteins) could have reactions whose rates differ by several orders of magnitude. Moreover, the small populations lessen the accuracy of continuous approximations, whereas the large populations make discrete stochastic simulation impractical. To be of value to scientific research, methods are needed that accommodate the stiffness of the networks of molecular interactions that regulate many aspects of the developmental and physiological processes of the cell.

Several recent efforts have been directed toward this end. Gibson and Bruck (2000) created the Next Reaction variant of the stochastic simulation algorithm, a modification to the First Reaction method, by using special data structures and more efficient random number generation. Gillespie (2000) approximated the system as a continuous-state Markov process, and then derived the chemical Langevin equation to specify how the trajectories of the state evolve continuously with stochastic fluctuations. Gillespie (2001) presented the {tau}-leap’ method, an approximate technique for accelerating stochastic simulation, in which the occurrence of some fast reactions can be eliminated by taking time steps that are larger than a single reaction. An improved procedure for selecting the value of {tau} has also been presented (Cao et al., 2006). By applying the quasi-steady-state assumption to the Gillespie algorithm, Rao and Arkin (2003) explored an approximation technique designed to reduce the computational load by simulating a simplified stochastic model. With the slow-scale stochastic simulation algorithm, Cao et al. (2005) refined this idea by introducing a dynamic partitioning of the model into slow and fast subprocesses. The asymptotic distribution of the fast process is computed or estimated numerically to compute the propensity of the slow-scale process.

All these methods are based on a monolithic representation of the chemical system; that is, all parts of the system are modeled either discretely or continuously. However, by describing different parts of the system with an appropriate representation, one is able to leverage the advantages of both discrete and continuous solutions. Specifically, a hybrid approach would be less computationally expensive than exact discrete-event simulation and more accurate than continuous approximations, while still preserving the stochastic nature of the model where it is important. A hybrid method also leads to models in which the sources of stochastic behavior are more transparent, as the non-stochastic components are abstracted away (Kiehl et al., 2004). Takahashi et al. (2004) developed an efficient method to simulate higher-order models consisting of components driven by different algorithms and timescales. Hybrid algorithms have also been proposed (Haseltine and Rawlings, 2002; Salis and Kaznessis, 2005) that partition the system into subsets of ‘fast’ and ‘slow’ reactions, approximate the ‘fast’ reactions either deterministically as ordinary differential equations (ODE) or stochastically using the chemical Langevin equation, and simulate the ‘slow’ reactions as stochastic events with time-dependent propensities.

This article expands upon the ideas developed by those researchers, and highlights a unique partitioning approach. Instead of partitioning the system into ‘fast’ and ‘slow’ reactions based on reaction propensity alone, we note that, owing to small population sizes, it may not be valid to approximate a ‘fast’ reaction continuously. We have developed a unique partitioning algorithm that dynamically classifies reactions as discrete by default, and continuous only when it is both theoretically possible and practically beneficial. In our algorithm, we first partition reactions into subsets of continuous and discrete reactions based on which approximation is more valid for the population of reactants and products involved. We then restrict the continuous subset to those reactions whose propensity is some constant factor times larger than the propensity of the fastest reaction in the discrete subset. Thus, the reaction rates are not partitioned according to an absolute threshold, but rather a relative threshold that updates dynamically throughout the course of the simulation. The goal of this approach is to guarantee that the reactions modeled continuously are indeed significantly faster than the reactions modeled discretely, thus ensuring that the computational gain of eliminating the ‘fast’ reactions overcomes the cost associated with switching between the deterministic and stochastic regimes, as well as the overhead involved in partitioning. Results from an example model indicate that our algorithm reproduces an accurate solution and is faster than discrete stochastic simulation, so long as the reaction propensities in the discrete and continuous subsets vary by some orders of magnitude. In addition, there may be a tradeoff between the accuracy desired and the speedup provided by the algorithm, which can be controlled by two parameters controlling the approximation. The dynamic hybrid simulation algorithm introduced in this article is implemented in a generic simulation environment providing several alternative simulation algorithms. Models and simulation parameters are specified in a text file. A command line interface makes it possible to invoke this simulation engine from other applications developed in programming languages, such as MATLAB, that support system calls.


    2 THEORY
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY
 3 MeTHODS
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 
Consider a system of r coupled biochemical reactions involving M species in a well-mixed volume. The i-th reaction Ri can be written as

Formula
where {alpha}i,j and ßi,j are, respectively, the number of molecules of species Yj consumed and produced by reaction Ri, and ki is the kinetic coefficient. The r x M stoichiometric matrix is defined by Formula.

The state of the system at time t is given by X(t), an M-vector of non-negative integers representing the number of molecules of each species. For readability we will use X to mean X(t), and Xi to mean the i-th component of X. The propensity of the j-th reaction as a function of the state vector X, denoted by Formula, is the probability that the reaction will occur in the interval [t,t + dt). Although these propensities may be computed using different rate laws, our implementation expands on the work by Joshi et al. (2004) and focuses specifically on mass action kinetics, under which Gillespie has shown the propensity to be a function of the kinetic coefficient, the populations of the reactants and a combinatorial term capturing the number of reacting configurations (Gillespie, 1977).

2.1 Dynamic partitioning scheme
The key to any hybrid simulation algorithm is its approach to partitioning. In our algorithm, the system of reactions is dynamically partitioned into two subsets, C and D, representing the continuous and discrete reactions, respectively. We partition using a combination of population- and propensity-based approaches. In particular, our scheme chooses for C the largest set such that {forall}Rj isin C, the following conditions are met:

Formula 1(1)
and

Formula 2(2)
where {gamma} and {Lambda} are non-negative constants, and {lambda}max is the rate of the fastest reaction currently classified as discrete. Condition (2) is what makes our partitioning scheme unique. Unlike schemes in previous works, the reaction rate threshold in (2) is not absolute, but depends on the distribution of reaction propensities in the discrete regime. We conducted a detailed analysis of simulation traces for different classes of models that showed there must be a minimum separation between the reaction rates in the two regimes in order to overcome the cost of switching between a deterministic and a stochastic solution. A similar recommendation has been made by Haseltine and Rawlings (2002) as guidance for researchers when specifying static partitions of their model. Ideally, the distribution of reaction rates would be clustered into two distinct groups separated by a large gap, and hybrid simulation would classify the ‘fast’ reactions as continuous and the ‘slow’ reactions as discrete by choosing an appropriate threshold somewhere in the gap. For simple systems in which small populations participate in ‘slow’ reactions and large populations participate in ‘fast’ reactions, this may be accomplished by a naïve partitioning scheme based solely on reaction propensities. However, for more complex systems, this may not be possible, as there may exist reactions that must be classified as discrete (e.g. they involve species with low copy numbers) in order to preserve their stochastic behavior, but may, due to other reactant populations or kinetic constants, be ‘fast’ and in some cases actually have larger propensities than the reactions labeled continuous. This is the situation we attempt to address in our partitioning scheme, and will investigate further in Section 4.

Thus condition (1) first divides the set of reactions into a tentative partitioning based on the populations of reactant and product species. Then condition (2) revises the partitioning in an iterative fashion so that the propensity of any continuous reaction will be at least {Lambda} times larger than the propensity of the fastest discrete reaction. As a result of condition (2), therefore, the partitioning process iterates until a fixed point is reached. The following algorithm, which is a component of our general hybrid simulation algorithm outlined in Section 3.1, illustrates how the partitioning is performed.

(1) Set {lambda}max = 0, C = D = {emptyset}

(2) For each reaction Rj

  1. If Xi ≤ {gamma} * |{nu}j,i| for some reactant or product of Rj, then
    1. D = D {cup} {Rj}
    2. {lambda}max = max{{lambda}j, {lambda}max}

  2. else C = C {cup} {Rj}

(3). If C = {emptyset} or D = {emptyset} then stop. Reactions are either all discrete or all continuous.

(4) Set fxdpt = true

(5) For each Rj isin C

  1. If {lambda}j <{Lambda}*{lambda}max
    1. C = C–{Rj}
    2. D = D {cup}{Rj}
    3. {lambda}max = max{{lambda}j, {lambda}max}
    4. Set fxdpt = false

(6) If fxdpt = false goto step 4.

2.2 Continuous reactions
Given a system of reactions partitioned into subsets C and D, and under the assumption that rate laws such as mass action or Michaelis–Menten kinetics (Rao et al., 2002) apply, we approximate the subset of continuous reactions as a continuous Markov process and use the following system of ODE to model the evolution of the system as affected by only these reactions.

Formula 3(3)
The approximation is accurate when conditions (1) and (2) are satisfied by the continuous reactions, which we dynamically evaluate to determine if a repartitioning of the system is necessary. In fact, in the thermodynamic limit as the system size tends to infinity, the chemical master equation produces the same process as the solution of the ODEs, and the approximation becomes exact.

ODEs are fairly simple computationally. They can be solved using a variety of numerical methods, from the Euler method to higher-order Runge–Kutta methods, many of which are readily available in software packages that can easily be incorporated into existing simulation code.

2.3 Discrete reactions
Having considered the simulation of the continuous reactions, we now describe how the discrete reaction events are simulated. Since the propensities of the discrete reactions depend on the state changes due to the continuous reactions, one can think of the discrete reactions as being represented by a time-dependent Markov process (Gibson and Bruck, 2000). Gillespie (1992) has given a general method for exact stochastic simulation of such a process, and derived the joint probability density function P({tau}, j) for the reaction that occurs next and the time when it occurs. Conditioning P({tau}, j) gives the time-dependent probability density P({tau}) of the Direct variant of the stochastic simulation algorithm as

Formula 4(4)
where

Formula 5(5)
Note that if {lambda}tot is constant in time, Equation (4) becomes the density function of an exponential distribution with rate {lambda}tot. Given that a reaction occurs in time {tau}, the probability that reaction Rj occurs is then

Formula 6(6)

In order to sample from the time-dependent probability densities defined in Equations (4) and (6), we use a Monte Carlo technique that equates the integration of a time-dependent density function to a random number. Given two random numbers, x1 from the unity mean exponential distribution and u uniformly distributed between 0 and Formula 6, {tau} and j are chosen as follows:

Formula 7(7)

Formula 8(8)
where Ik is the indicator function

Formula 9(9)
Thus, in our implementation we determine the next reaction time by integrating the ODE system and Formula 9 forward in time until Equation (7) is satisfied. We then choose which reaction occurred according to Equation (8).

2.4 Comparison with related work
In comparing our approach to related work, we consider three aspects described in the previous subsections: the partitioning techniques, the approximation of continuous reactions and the simulation of discrete reaction events.

2.4.1 Partitioning
A partitioning scheme can be characterized by two components: (1) when partitioning is performed and (2) how it is performed. For (1), the partitioning can be performed statically, in an off-line manner (Kiehl et al., 2004), or dynamically, i.e. the system partitioning may change over the course of the simulation (Salis and Kaznessis, 2005). For (2), the system can be partitioned based on the populations of species involved in the reactions, the reaction propensities themselves (Haseltine and Rawlings, 2002), or some combination of the two (Salis and Kaznessis, 2005; Kiehl et al., 2004). In theory, if one had some a priori knowledge of the system, one could classify as continuous those reactions that occur most frequently. However, that may not be plausible for some systems, and certainly does not consider the problem that different reactions could be bottlenecks at different times during the simulation or that some of the reactions that occur most frequently involve small populations. For those reasons, we employ an on-line and dynamic partitioning scheme that uses both population- and rate-based criteria.

2.4.2 Continuous reactions
As an alternative to ODEs, which are fundamentally deterministic, others have proposed stochastic differential equations (SDEs) to model the continuous reactions. The Langevin equation, which extends the ODE formulation to include a stochastic noise term, has been adapted for application to chemical systems by Gillespie (2000). The chemical Langevin equation, written as follows:

Formula 10(10)
where W is a Formula 10-dimensional Wiener process, and can be used to simulate the stochastic dynamics of the system as a result of the continuous reactions (Haseltine and Rawlings, 2002; Salis and Kaznessis, 2005).

We choose to model the continuous reactions using ODEs instead of SDEs for simplicity. Although ODEs are a theoretically less accurate approximation, we justify this decision with the wide availability of rigorous ODE solvers and the acceptable levels of accuracy observed. Note that in the thermodynamic limit, the system evolves deterministically as the chemical Langevin equation reduces to the ODE system in (3). Obviously the limit is unattainable in physical systems, but if one is primarily interested in the stochasticity of the small populations as opposed to the large populations, the approximation is acceptable.

2.4.3 Discrete reactions
One may also describe the transitions in the time-dependent Markov process, which represents the discrete reactions, with other probability distributions. Salis and Kaznessis (2005) use the time-dependent probability density of the Next Reaction variant of the stochastic simulation algorithm, and construct a system of ‘jump equations’, one for each discrete reaction, whose solution produces the time at which the next reaction occurs. Regardless of the distribution, the approaches are mathematically equivalent. While the Direct approach is more computationally efficient, the Next Reaction approach offers the flexibility of allowing different distributions of reaction times among the individual reactions.


    3 MeTHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY
 3 MeTHODS
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 
We now describe our methods for solving the partitioned reaction system of Section 2, beginning with an overview of the hybrid simulation algorithm followed by a discussion of the implementation of the algorithm.

3.1 Algorithm
At the beginning, X is set to the initial state vector X0, the simulation time t is initialized to zero, and the stoichiometric matrix {nu} is computed. The set of reactions is partitioned according to the criteria established in Section 2.1. If none of the reactions is classified as continuous, the stochastic simulation algorithm based on Gillespie's Direct method is performed, and the ODE integration step is skipped. Otherwise, a random number is selected from the unity mean exponential distribution to prepare for the numerical integration of the ODE system. The integration is then performed until one of the following stopping conditions is met: a discrete event has been generated at time {tau} as described by Equation (7), a population change causes a violation of the large population requirement for valid continuous approximations described by Equation (1), or the end of the simulation is reached. If a discrete event has been generated, the next reaction to occur is randomly chosen, weighted by the relative reaction propensities, and the state vector is updated accordingly. This procedure continues until the entire trajectory is computed, i.e. Formula 10.

  1. Initialize the system:
    a. Formula 10, Formula 10.
    b. Compute stoichiometric matrix {nu}.

  2. Loop while Formula 10 :
    a. Partition set of reactions using the algorithm outlined in Section 2.1.
    b. If continuous reactions exist:
    (1) Select a random number from the unity mean exponential distribution.
    (2) Numerically integrate the ODE using Equation (3). Stop when:
    1. Equation (7) states a discrete reaction should occur,
    2. Equation (1) determines that repartitioning is necessary due to significant changes in populations, or
    3. Formula 10 is reached.


    c. If a discrete reaction is to occur:
    (1) For each Formula 10, compute discrete reaction propensity Formula 10 and set Formula 10.
    (2) If no continuous reactions exist, choose {tau} from the exponential distribution with mean Formula 10.
    (3) Generate u from the uniform distribution Formula 10.
    (4) Choose j such that Formula 10.
    (5) Let Formula 10, where Formula 10 is the jth row of {nu}, and set Formula 10 {tau}.

  3. End simulation and report results.

3.2 Implementation
Our hybrid simulation algorithm was implemented as part of an extensible reaction simulation framework written in ISO C++. For comparison, the framework also consists of a stochastic simulator, which implements the Direct variant of Gillespie's method (Gillespie, 1977), and a deterministic differential equation solver. These simulators have been used to compare accuracy and computation time among the various algorithms for several published models, and to compare our results with published results.

The simulation framework makes extensive use of object-oriented design principles and inheritance. Each solver is implemented as a derived class of the base simulator class, and employs its own simulation algorithm and data structures. In particular, the modular design of the hybrid simulator allows us to quickly and easily plug in different ODE solution methods and experiment with different partitioning techniques.

The framework includes two different higher order approximate solution methods for ODE systems, both of which use adaptive step size control based on the estimate of error at each internal solver step. The first is a standard fifth-order Cash–Karp Runge–Kutta algorithm (Press et al., 1992). The second is CVODE (Cohen and Hindmarsh, 1996), a solver of initial value problems for stiff and non-stiff ODE systems. CVODE is part of the SUNDIALS suite of solvers for differential and non-linear algebraic systems, developed by the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory.

All model and solution parameters, such as {Lambda} from (2), are specified in a model description file. The implementation parses the model description file, instantiates the appropriate simulator object and initializes its data structures.


    4 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY
 3 MeTHODS
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 
We present in detail an example model of biological significance to demonstrate the performance of the hybrid simulation algorithm. In order to show that the algorithm has been tested, and performs well, on a wide variety of models, we will briefly present some results for other models. In the interest of space, however, we have not included a discussion of the results from these models. Such discussions, as well as a listing of reactions and parameters for each model, can be found in the Supplementary information.

The chosen models provide good test cases because they involve species present at varying orders of magnitude and, as a result, some reactions occur much more frequently than others. Thus exact stochastic simulation is quite expensive computationally. In addition, some of the models exhibit bimodal distributions, which are often indicative of bistability. In this situation, the asymptotic behavior of the solutions of an ODE representation of the model depends on the basis of attraction of the initial condition, whereas the asymptotic distribution of the stochastic model is independent of the initial condition. This leads to significant differences between the dynamics of the two representations (Fig. 1). The different regimes of the model corresponding to the peaks of the asymptotic distribution are also likely to have different sets of fast and slow reactions, preventing the use of a static partitioning of the reaction set. We, therefore, investigate hybrid simulation with dynamic partitioning as a reasonable alternative, and consider the accuracy and speedup of the algorithm when applied to these models.


Figure 1
View larger version (21K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 A comparison of the time evolution of the mean (center line) and standard deviation (Formula 10) of GFP for discrete stochastic, hybrid, and deterministic ODE simulation. Results are based on 1000 trials.

 
All experiments in this study were performed on a Linux Fedora Core 3 system with an Athlon XP 2400 processor and 512 MB of RAM. The ODEs in the hybrid algorithm are integrated using the CVODE higher-order methods with adaptive step sizing. Similar results were obtained using Runge–Kutta methods.

4.1 HIV-1 Tat transactivation example
We consider here a previously introduced (Weinberger et al., 2005) model of the stochastic fluctuations in the HIV-1 Tat protein within the Tat transactivation positive feedback loop. These fluctuations have been experimentally and computationally shown to influence the progression of the virus to either latency or productive infection, resulting in two distinct phenotypes. This example was chosen to test our algorithm because it is part of a large and growing body of work investigating the influence of stochastic gene expression on phenotypic variability. In addition, because of the bimodal nature of the system, a deterministic model does not accurately capture the dynamics, and a stochastic description is needed. However, CPU time becomes an obstacle to a discrete simulation, and thus we believe this model fits nicely into our hybrid system framework.

We simulated two variations of the model, corresponding to different initial GFP and Tat concentrations (Dim bulk sort and Mid bulk sort), as well as different kinetic coefficients for GFP translation. In this model, the GFP degradation and translation reactions are by far the ones occurring most frequently, and thus present the greatest burden to a discrete simulation. They are the reactions that the hybrid algorithm targets for elimination by approximating them as continuous events. A list of reactions and parameters of the Tat transactivation model is available in the Supplementary information. Simulations of each model were performed using the hybrid algorithm, exact stochastic simulation and a deterministic ODE solution.

Figure 1 compares the time evolution of the first two moments of the GFP marker, as computed by the three algorithms for Formula 10 s (~1.65 weeks) of simulation time. Qualitatively, this figure shows that the hybrid algorithm manages to capture these moments very well. Furthermore, the deterministic ODE solution is unable to reconstruct even the first moment. The reason is the bimodal nature of the probability density of GFP, which can be seen in Figure 2 for the Mid sort variation of the model at time 106 s.


Figure 2
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 The distribution of GFP molecules at timeFormula 10s as computed by exact stochastic simulation (upper) and hybrid simulation (lower) for the Mid sort variation of the Tat transactivation model. Results are based on 10 000 simulations.

 
Note that the hybrid algorithm is quite accurate in reconstructing the entire distribution as computed by the exact solution method. We parameterize the hybrid algorithm with Formula 10 and Formula 10 to construct this figure, but similar results were observed for other values of the solution parameters. In fact, the insensitivity of the result to the value of {gamma} can be explained by the following. Regardless of the fraction of time the reactions involving GFP are classified as continuous or discrete (i.e. regardless of the parameter {gamma}), the continuous approximation will be very good because GFP molecules are present in such high numbers, on the order of thousands to a few million molecules. Thus, for this particular model, the ability of the hybrid algorithm to reconstruct the distribution of GFP molecules is independent of the choice of solution parameters. Furthermore, Figure 3 shows that increasing {gamma}, beyond a certain point, has a dramatic effect on the computational expense of the hybrid algorithm. The reason is that with increasing {gamma}, more of the system is being modeled discretely, and the cost of simulation grows with the number of discrete events. Note that for Formula 10, the hybrid algorithm significantly outperforms the stochastic algorithm, as the CPU time of the former is not affected by the changing solution parameter. For Formula 10 500 000, however, the elimination of reaction events in the hybrid algorithm does not outweigh the overhead of partitioning and switching between continuous and discrete solutions, and thus no speedups are observed with the hybrid algorithm. Nonetheless, as varying {gamma} has no effect on the quality of the hybrid approximation for this example, one does not have to sacrifice accuracy to obtain performance gains. This is not always the case, however, as can be seen in the results from the intracellular viral infection example (Haseltine and Rawlings, 2002; Srivastava et al., 2002) included in the online supplement. In that example, the choice of {gamma} does affect the accuracy of the approximation, because the template species is present in smaller numbers, in the order of tens of molecules.


Figure 3
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 A comparison of the CPU time per run of the hybrid algorithm versus the baseline stochastic simulation, as the solution parameter {gamma} is varied, for the Dim sort variation of the Tat transactivation model. Relative speedup is measured as the ratio of the computational run time of stochastic simulation to that of hybrid simulation.

 
4.2 Other models
Table 1 summarizes the performance of the hybrid algorithm, when compared with the stochastic simulation algorithm, on the Tat transactivation example, as well as a few other benchmark models collected from the literature. The table compares the performance of the algorithms on this set of examples, including a model of intracellular viral infection (Haseltine and Rawlings, 2002; Srivastava et al., 2002), the cycle test (Salis and Kaznessis, 2005) and a simple crystallization system (Salis and Kaznessis, 2005; Haseltine and Rawlings, 2002), with respect to the average number of discrete events and CPU time per run.


View this table:
[in this window]
[in a new window]

 
Table 1 A summary of performance results for several benchmark models comparing the stochastic and hybrid simulation algorithms

 
It is likely that a comparable implementation of a static partitioning algorithm would result in even higher performance gains since it would not require repartitioning the reactions. In order to verify that the dynamic partitioning is necessary, our simulation software can output the percentage of simulation time, on average, a reaction was treated as continuous. A static partitioning algorithm would be acceptable when all reactions are always assigned to the same class during the course of the simulation. Supplementary Table 4 in the online supplement provides some partition statistics for the virus replication model. They show that even for small values of {gamma}, for >35% of the simulation time, the conditions are not met to treat reactions (3) and (5) as continuous. Therefore, even in this favorable situation, a static partition would rely on assumptions that may not be verified. Partition statistics of other models provide similar indications that it would be difficult to justify saving the cost of dynamically repartitioning the reactions at run time.


    5 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY
 3 MeTHODS
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 
Although Gillespie has developed exact methods for simulating the stochastic kinetics of a well-mixed chemical system, these methods become quite computationally intensive for larger systems with higher number of reaction events. Continuous models, such as those described by differential equations, can approximate the exact stochastic solution and are much simpler computationally. However, they are not as useful when modeling systems with low-frequency reaction events and small populations, as they fail to preserve the stochastic nature of such discrete-event systems.

In this study, we presented a hybrid simulation algorithm that partitions the reaction system into continuous and discrete subsets, approximates the continuous reactions deterministically as an ODE system, and generates discrete reaction events by integrating a time-dependent probability density function. We described a unique approach to partitioning in which the extents of reaction are not partitioned based on an absolute threshold, but instead on a threshold relative to the distribution of discrete reaction propensities. The partitioning, which uses a combination of population- and rate-based criteria, is performed at run time and updated dynamically.

We implemented the hybrid algorithm in an extensible simulation framework using principles of object-oriented design and inheritance. Our implementation also utilizes two different rigorous ODE solvers to simulate the continuous reactions. To demonstrate the accuracy and performance of the hybrid algorithm when compared with exact stochastic simulation, an example model was given and both algorithms were used to compute various measures of interest on the model. In this example and others included in the Supplementary information, we were able to show that the hybrid algorithm is reasonably accurate, and that computational savings are possible if the distributions of partitioned reaction extents vary by some orders of magnitude. In the worst case, our hybrid algorithm reduces to Gillespie's Direct method for discrete stochastic simulation with some additional overhead. Furthermore, for one of the examples, the algorithm was parameterized in such a way as to vary the fraction of the system that is modeled continuously. Results from that model suggested there may be a tradeoff between the accuracy and speedup of the algorithm, controlled by the aggressiveness of the partitioning technique employed. The tradeoff is likely dependent on the measure of interest to be computed from the model, particularly if the measure is on small-numbered species. Currently the partitioning strategy is tuned manually, by adjusting the solution parameters {Lambda} and {gamma}, but it should be possible to automatically determine these parameters given the type of measure to be computed and topological information about the reaction network.

Even though the development of approximation methods of Gillespie's Stochastic Simulation Algorithm (SSA) has been very active in the past few years, the scope of problems that actually require these approximations has not yet been determined. Many previously published biological models with documented parameters did not appear as stiff as we anticipated. It was surprisingly difficult to find a biological model demonstrating the performance gain of our approach. The value of dynamic partitioning is best illustrated with unstable models with multimodal distributions that can randomly transition between several regimes. In this situation, the fast reactions change as the system transitions from one regime to the other regime, making it impossible to partition the reactions a priori. Similar to the Tat system, the molecular mechanisms controlling the reactivation of a latent virus are likely to benefit from a dynamic partitioning algorithm. It is also possible that this approach will find applications in epidemiology when estimating the probability of emergence of rare infectious diseases. Performance gain is not the only benefit of dynamic partitioning. The other benefit is the assurance of a predictable computation time that does not depend on the system stability. This feature is necessary when exploring the parameter space of a model such as in optimization applications. Different parameterizations of the model may have different degrees of stiffness and stability. If the basic SSA or a static partitioning approach may give acceptable results for some parameterization, it is likely that other sets of parameters may lead to unacceptably large computation times or erroneous results.


    Acknowledgments
 
The authors would like to thank Kaustubh Joshi for his preliminary work in hybrid systems research, as well as his continued encouragement and help. They also thank Leor Weinberger for his interactions, including the sharing of simulation code and models and personal discussions involving the Tat transactivation example; Jenny Applequist for her editorial assistance; and Michael McQuinn for his contribution in data collection and analysis. This work was supported in large part by a gift from Pioneer Hi-Bred International. Funding to pay the Open Access publication charges for this article was provided by the University of Illinois.

Conflict of Interest: None declared.


    FOOTNOTES
 
Associate Editor: Thomas Lengauer

Received on March 9, 2006; revised on August 4, 2006; accepted on August 26, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY
 3 MeTHODS
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 

    Cao, Y., et al. (2005) Multiscale stochastic simulation algorithm with stochastic partial equilibrium assumption for chemically reacting systems. J. Comput. Phys, . 206, 395–411[CrossRef].

    Cao, Y., et al. (2006) Efficient stepsize selection for the tau-leaping simulation method. J. Chem. Phys, . 124, doi:10.1063/1.2159468.

    Cohen, S.D. and Hindmarsh, A.C. (1996) CVODE, a stiff/nonstiff ODE solver in C. Comput. Phys, . 10, 138–143.

    Elowitz, M.B., et al. (2002) Stochastic gene expression in a single cell. Science, 297, 1183–1186[Abstract/Free Full Text].

    Gibson, M.A. and Bruck, J. (2000) Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem, . 104, 1876–1889.

    Gillespie, D.T. (1977) Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem, . 81, 2340–2361[CrossRef].

    Gillespie, D.T. Markov Processes: An Introduction for Physical Scientists, (1992) , New York Academic Press.

    Gillespie, D.T. (2000) The chemical Langevin equation. J. Chem. Phys, . 113, 297–306[CrossRef].

    Gillespie, D.T. (2001) Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys, . 115, 1716–1733[CrossRef].

    Haseltine, E.L. and Rawlings, J.B. (2002) Approximate simulation of coupled fast and slow reactions for stochastic chemical kinetics. J. Chem. Phys, . 117, 6959–6969[CrossRef].

    Joshi, K.R., Neogi, N., Sanders, W.H. (2004) In Alur, R. and Pappas, G.J. (Eds.). Dynamic partitioning of large discrete event biological systems for hybrid simulation and analysis. Proceedings of the 7th International Workshop on Hybrid Systems: Computation and Control (HSCC 2004)Philadelphia, PA, pp. 463–476.

    Kiehl, T.R., et al. (2004) Hybrid simulation of cellular behavior. Bioinformatics, 20, 316–322[Abstract/Free Full Text].

    Korobkova, E., et al. (2004) From molecular noise to behavioural variability in a single bacterium. Nature, 428, 574–578[CrossRef][Medline].

    Pedraza, J.M. and van Oudenaarden, A. (2005) Noise propagation in gene networks. Science, 307, 1965–1969[Abstract/Free Full Text].

    Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. Numerical Recipes in C: The Art of Scientific Computing, (1992) 2nd edn. , New York Cambridge University Press.

    Rao, C.V. and Arkin, A.P. (2003) Stochastic chemical kinetics and the quasi-steady-state assumption: application to the Gillespie algorithm. J. Chem. Phys, . 118, 4999–5010[CrossRef].

    Rao, C.V., et al. (2002) Control, exploitation and tolerance of intracellular noise. Nature, 420, 231–237[CrossRef][Medline].

    Raser, J.M. and O'Shea, E.K. (2004) Control of stochasticity in eukaryotic gene expression. Science, 304, 1811–1814[Abstract/Free Full Text].

    Rosenfeld, N., et al. (2005) Gene regulation at the single-cell level. Science, 307, 1962–1965[Abstract/Free Full Text].

    Salis, H. and Kaznessis, Y. (2005) Accurate hybrid stochastic simulation of a system of coupled chemical or biochemical reactions. J. Chem. Phys, . 122, 054103[CrossRef].

    Srivastava, R., et al. (2002) Stochastic vs. deterministic modeling of intracellular viral kinetics. J. Theor. Biol, . 218, 309–321[CrossRef][Web of Science][Medline].

    Takahashi, K., et al. (2004) A multi-algorithm, multi-timescale method for cell simulation. Bioinformatics, 20, 538–546[Abstract/Free Full Text].

    Weinberger, L.S., et al. (2005) Stochastic gene expression in a lentiviral positive-feedback loop: HIV-1 Tat fluctuations drive phenotypic diversity. Cell, 122, 169–182[CrossRef][Web of Science][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Brief BioinformHome page
J. Pahle
Biochemical simulations: stochastic, approximate stochastic and hybrid approaches
Brief Bioinform, January 16, 2009; (2009) bbn050v1.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
J. Peccoud, T. Courtney, and W. H. Sanders
Mobius: an integrated discrete-event modeling environment
Bioinformatics, December 15, 2007; 23(24): 3412 - 3414.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
22/22/2782    most recent
btl465v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (8)
Google Scholar
Right arrow Articles by Griffith, M.
Right arrow Articles by Sanders, W. H.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Griffith, M.
Right arrow Articles by Sanders, W. H.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?