Bioinformatics Advance Access originally published online on November 19, 2007
Bioinformatics 2008 24(2):285-286; doi:10.1093/bioinformatics/btm566
© Oxford University Press 2007.
The SBML discrete stochastic models test suite
1Department of Mathematical Sciences, University of Liverpool, Liverpool, L69 7ZL, 2School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU and 3Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN), Newcastle University, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Stochastic simulation is a very important tool for mathematical modelling. However, it is difficult to check the correctness of a stochastic simulator, since any two realizations from a single model will typically be different.
Results: We have developed a test suite of stochastic models that have been solved either analytically or using numerical methods. This allows the accuracy of stochastic simulators to be tested against known results. The test suite is already being used by a number of stochastic simulator developers.
Availability: The latest version of the test suite can be obtained from http://www.calibayes.ncl.ac.uk/Resources/dsmts/ and is licensed under GNU Lesser General Public License.
Contact: D.J.Wilkinson@ncl.ac.uk
| 1 INTRODUCTION |
|---|
In recent years, it has been increasingly recognized that mathematical modelling can help us to understand complex biological networks. As a result of this, the Systems Biology Markup Language (SBML) was developed as a standard format in which to represent the models (Hucka et al., 2003). SBML is quickly becoming the lingua franca for the development and sharing of biochemical network models.
One popular modelling technique is to use a discrete stochastic kinetic framework. However, testing the correctness of an implementation of the underlying algorithm is difficult, since a stochastic simulator will, by definition, produce a different realization on each run (given a different seed). This is especially problematic because two exact algorithms, such as Gillespie's direct method (Gillespie, 1977) and the Next Reaction Method (Gibson and Bruck, 2000), can have different implementations and use random number streams in entirely different ways.
A further complication in establishing the correctness of a simulator arises from issues of interpretation of the SBML model representation. The SBML specification contains little guidance relating to the proper procedures to be followed in encoding models intended for discrete stochastic simulation (though the latest specification does contain an example), leading to potential confusion. Indeed, SBML Level 1 was not capable of encoding discrete stochastic kinetic models in a correct, accurate and unambiguous way. Fortunately, SBML Level 2 and beyond are quite capable in this regard. See the discussion in Chapter 2 of Wilkinson (2006) for further details.
This article describes how the SBML discrete stochastic models test suite (DSMTS) can be used to test a stochastic simulator. Versions of the test suite exist for SBML Level 2, versions 1 and 3.
| 2 TESTING A STOCHASTIC SIMULATOR |
|---|
The only practical testing method is to run the simulator a large number of times and check that the distribution of outcomes is not significantly different from the true underlying distribution. This can only be tested in a probabilistic way. The test suite is a set of SBML models each with time course data for the means and SDs of the model species. Developers may use the suite to check that their simulators produce results that are consistent with the SBML standard. The test suite assumes that the simulator produces output on a regular time grid. Of course, exact stochastic simulators naturally produce output on a non-regular grid corresponding to individual reaction events [see Wilkinson (2006) for further details on stochastic simulators]. However, this step function output is easy to map onto a regular time-grid either post hoc, or during the simulation run itself.
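The mapping of step-function output onto a regular time grid can be sketched as follows. This is a minimal illustration, assuming the simulator records the time at which each new state is entered; the function name `sample_on_grid` and the trajectory values are hypothetical and are not part of the test suite.

```python
from bisect import bisect_right


def sample_on_grid(event_times, states, grid):
    """Sample a piecewise-constant trajectory on a regular time grid.

    event_times[k] is the time at which the system entered states[k].
    event_times must be sorted, with event_times[0] the start time.
    """
    sampled = []
    for t in grid:
        # Index of the last recorded event at or before time t.
        k = bisect_right(event_times, t) - 1
        if k < 0:
            raise ValueError("grid point precedes the first recorded state")
        sampled.append(states[k])
    return sampled


# Hypothetical trajectory: the species count changes at irregular event times.
event_times = [0.0, 0.4, 1.7, 2.0, 3.2]
states = [100, 101, 100, 99, 100]
grid = [0, 1, 2, 3, 4]

print(sample_on_grid(event_times, states, grid))  # [100, 101, 99, 99, 100]
```

Doing the same lookup during the simulation run avoids storing every reaction event, which matters when many thousands of runs are needed.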
In order to test the output from an exact stochastic simulator (Gillespie, 1977) for a given SBML model, n independent simulation runs of the simulator should be performed. For the statistical tests to have reasonable power to detect subtle problems, n should be set to at least 10 000. The sample means and SDs of the species amounts from the simulation runs at t = 0, 1, ..., 50 can be compared with the corresponding values in the test suite using the statistical tests described below. Figure 1 shows the means and SDs over time for an example model that includes an event. By comparing the output of many simulations to the true values, we can test the stochastic simulator. The simulated values of a particular species, X, can be tested as follows: let $X_t^{(i)}$ be the value of $X_t$ on the ith run of the simulator, where $X_t$ is the random variable representing X at time t. Put $\mu_t = E(X_t)$ and $\sigma_t^2 = \mathrm{Var}(X_t)$. Assuming that simulator runs are independent, by the Central Limit Theorem we have, approximately for large n,

$$\bar{X}_t = \frac{1}{n}\sum_{i=1}^{n} X_t^{(i)}, \qquad Z_t = \sqrt{n}\,\frac{\bar{X}_t - \mu_t}{\sigma_t} \sim N(0, 1).$$
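A test of the sample mean based on the Central Limit Theorem can be sketched as below. The mean statistic is the standardized sample mean, which should be approximately N(0, 1) for a correct simulator. The variance statistic is an assumption on our part: it is a common normal-approximation statistic and is not taken from the text above. The acceptance ranges (3 and 5), the function names and the stand-in data are likewise illustrative assumptions.

```python
import math
import random


def mean_statistic(samples, mu, sigma):
    """Z = sqrt(n) * (sample mean - mu) / sigma, approximately N(0, 1)."""
    n = len(samples)
    xbar = sum(samples) / n
    return math.sqrt(n) * (xbar - mu) / sigma


def variance_statistic(samples, mu, sigma):
    """Y = sqrt(n / 2) * (S^2 / sigma^2 - 1), with S^2 taken about the known mean.

    Assumption: relies on a normal approximation for the sample variance.
    """
    n = len(samples)
    s2 = sum((x - mu) ** 2 for x in samples) / n
    return math.sqrt(n / 2) * (s2 / sigma ** 2 - 1)


# Stand-in for simulator output at one time point: n draws with known mean and SD.
# A real test would use the species amounts from n independent simulator runs.
rng = random.Random(0)
mu_t, sigma_t, n = 60.0, 8.0, 10_000
samples = [rng.gauss(mu_t, sigma_t) for _ in range(n)]

z = mean_statistic(samples, mu_t, sigma_t)
y = variance_statistic(samples, mu_t, sigma_t)
# Assumed acceptance ranges; a wider range for Y allows for non-normal data.
print("mean test passed:", abs(z) < 3)
print("variance test passed:", abs(y) < 5)
```

In practice the statistics would be computed at each time point of the grid, so occasional values just outside the chosen range are to be expected even for a correct simulator.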
Although the test suite is designed primarily for rigorous testing of exact simulators, it should also prove useful to developers of approximate simulators (Gillespie and Petzold, 2003) or hybrid simulators (Kiehl et al., 2004; Puchalka and Kierzek, 2004). For example, one way of using the test suite to assess the performance of an approximate simulator is to plot the means and SDs as percentages of their true values.
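For an approximate simulator, the comparison can be made by expressing each estimated mean or SD as a percentage of its reference value. The sketch below is a minimal illustration; the function name `percent_of_true` and all numbers are hypothetical and do not come from the test suite.

```python
import math


def percent_of_true(simulated, reference):
    """Express each simulated value as a percentage of its reference value."""
    result = []
    for s, r in zip(simulated, reference):
        # A zero reference value has no meaningful percentage.
        result.append(100.0 * s / r if r != 0 else math.nan)
    return result


# Hypothetical sample means from an approximate simulator and their reference values.
reference_means = [100.0, 90.0, 82.0, 74.0]
simulated_means = [100.0, 90.9, 81.2, 75.1]

for t, pct in enumerate(percent_of_true(simulated_means, reference_means)):
    print(f"t = {t}: {pct:.1f}% of the reference mean")
```

The resulting percentages can be passed to any plotting library; values that drift steadily away from 100% over time indicate a systematic bias rather than sampling noise.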
The DSMTS currently uses variations of three simple models:
- The birth–death process [see Cox and Miller (1965) for details];
- The dimerization process (Wilkinson, 2006);
- The batch immigration-death process, of which the classic immigration-death process is a special case (Gillespie and Renshaw, 2005).
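To illustrate why such simple models are useful, the sketch below simulates a linear birth–death process with Gillespie's direct method and compares the sample mean with the known analytic mean, X0 exp((λ − µ)t). The parameter values, function names and number of runs are illustrative assumptions, not a specification of any model in the suite, and the analytic SD formula assumes λ ≠ µ.

```python
import math
import random


def simulate_birth_death(x0, birth, death, t_end, rng):
    """Gillespie's direct method for X -> 2X (hazard birth*X) and X -> 0 (hazard death*X).

    Returns the population size at time t_end.
    """
    t, x = 0.0, x0
    while x > 0:
        total = (birth + death) * x
        t += rng.expovariate(total)  # time to the next reaction event
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its hazard.
        if rng.random() * total < birth * x:
            x += 1
        else:
            x -= 1
    return x


def analytic_mean(x0, birth, death, t):
    return x0 * math.exp((birth - death) * t)


def analytic_sd(x0, birth, death, t):
    """SD of the linear birth-death process; assumes birth != death."""
    r = birth - death
    var = x0 * (birth + death) / r * math.exp(r * t) * (math.exp(r * t) - 1)
    return math.sqrt(var)


# Illustrative parameters (assumptions, chosen only for this example).
x0, birth, death, t_end, n = 100, 0.1, 0.11, 10.0, 1000
rng = random.Random(1)
runs = [simulate_birth_death(x0, birth, death, t_end, rng) for _ in range(n)]

sample_mean = sum(runs) / n
print("sample mean:", sample_mean)
print("analytic mean:", analytic_mean(x0, birth, death, t_end))
print("analytic SD:", analytic_sd(x0, birth, death, t_end))
```

Because the mean and SD are available in closed form, any persistent discrepancy in the sample statistics points to the simulator rather than to the reference values.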
1 h, for n = 10 000 and a reasonably fast simulator.
| 3 CONCLUSION |
|---|
The test suite is already being employed by a number of stochastic simulator developers. For example, the developers of the Systems Biology Workbench (Hucka et al., 2002; Vallabhajosyula and Sauro, 2007), the BASIS system (Gillespie et al., 2006; Kirkwood et al., 2003) and COPASI (Hoops et al., 2006) all use the test suite routinely. We have found the test suite to be invaluable when developing our own stochastic simulator, as it provides a simple and systematic means with which to test many aspects of the simulator behaviour.
| ACKNOWLEDGEMENTS |
|---|
Work on the DSMTS was partially funded by the BBSRC through grants BEP 17042, BBS/B/16550, and BBC0082001.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Thomas Lengauer
Received on September 17, 2007; revised on October 18, 2007; accepted on November 7, 2007
| REFERENCES |
|---|
Cox DR, Miller HD. The Theory of Stochastic Processes (1965) London: Methuen.
Gibson MA, Bruck J. Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A (2000) 104:1876–1889.
Gillespie CS, Renshaw E. The evolution of a batch-immigration death process subject to counts. Proc. R. Soc. A (2005) 461:1563–1581.
Gillespie CS, et al. Tools for the SBML community. Bioinformatics (2006) 22:628–629.
Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. (1977) 81:2340–2361.
Gillespie DT, Petzold LR. Improved leap-size selection for accelerated stochastic simulation. J. Chem. Phys. (2003) 119:8229–8234.
Hoops S, et al. COPASI – a COmplex PAthway SImulator. Bioinformatics (2006) 22:3067–3074.
Hucka M, et al. The ERATO systems biology workbench: enabling interaction and exchange between software tools for computational biology. Pac. Symp. Biocomput. (2002) 450–461.
Hucka M, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (2003) 19:524–531.
Kiehl TR, et al. Hybrid simulation of cellular behavior. Bioinformatics (2004) 20:316–322.
Kirkwood TBL, et al. Towards an e-biology of ageing: integrating theory and data. Nat. Rev. Mol. Cell Biol. (2003) 4:243–249.
Puchalka J, Kierzek AM. Bridging the gap between stochastic and deterministic regimes in the kinetic simulations of the biochemical reaction networks. Biophys. J. (2004) 86:1357–1372.
Vallabhajosyula RR, Sauro HM. Stochastic simulation GUI for biochemical networks. Bioinformatics (2007) 23:1859–1861.
Wilkinson DJ. Stochastic Modelling for Systems Biology (2006) Chapman & Hall/CRC Press.