Bioinformatics Advance Access originally published online on July 16, 2008
Bioinformatics 2008 24(17):1933-1934; doi:10.1093/bioinformatics/btn338

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

BioBayes: A software package for Bayesian inference in systems biology

Vladislav Vyshemirsky * and Mark Girolami

Department of Computing Science, University of Glasgow, G12 8QQ, UK

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: The mathematical modelling of biochemical systems involves several levels of uncertainty. There is often uncertainty about the values of kinetic parameters, about the overall structure of the model and about the behaviour of biochemical species that cannot be observed directly. Bayesian inference provides a consistent framework for modelling and prediction under such uncertainty. We present a software package for applying Bayesian inferential methodology to problems in systems biology.

Results: We describe BioBayes, a software package that provides a framework for Bayesian parameter estimation and evidence-based ranking of models of biochemical systems defined by ordinary differential equations. The package is extensible, allowing developers to add further modules. No other available package provides this functionality.

Availability: http://www.dcs.gla.ac.uk/BioBayes/

Contact: vvv@dcs.gla.ac.uk


    1 INTRODUCTION
Inferring the structure of biochemical systems from experimental observations is one of the important challenges in systems biology (Burbeck and Jordan, 2006). Such structures are usually described by mathematical models. One advantage of formal mathematical models is that they can predict system behaviour as well as explain the observed processes. Systems of ordinary differential equations (ODEs) are a widely used formalism for modelling biochemical systems (see, for example, de Jong, 2003; Voit, 2000). The parameters of ODE models of biochemical systems can be inferred using Bayesian methods (Golightly and Wilkinson, 2005; Rogers et al., 2006), and alternative models can be ranked on the basis of their evidence using Bayes factors (Vyshemirsky and Girolami, 2008).

The main benefit of adopting the Bayesian approach to model inference is the consistent propagation of uncertainty through all stages of the analysis, together with a formal way of including prior knowledge in the modelling process. This approach treats noisy observations as data from which full distributions of belief are learned, rather than restricting attention to the single most plausible explanation of a phenomenon. Thus, instead of basing predictions on a single best guess, the Bayesian approach takes into account all plausible outcomes, weighted by their posterior probabilities.

Implementing Bayesian inference for the probabilistic analysis of biochemical models, however, requires addressing many technical problems, such as solving initial value problems for stiff systems of differential equations (Press et al., 2002) or estimating effective proposal distributions so that Markov chain Monte Carlo (MCMC) algorithms converge satisfactorily (Gelman et al., 1995). In recent years the scientific community has also formulated a number of standards for the unified description and exchange of data and models, for example the SBML standard (Hucka et al., 2003) for models of biochemical systems. At the same time, ad hoc implementations of inference algorithms usually require fine-tuning for each particular problem.

We present an extensible software package, BioBayes, which supports standard definitions of mathematical models and provides a framework for applying Bayesian inference methods to ODE models of biochemical systems. In addition to implementations of general inference and model comparison methods, BioBayes provides an infrastructure for plugging in user-specific methods through standard interfaces, so that the tool can be tailored to users' specific requirements if needed.

Bayesian inference over ODE models can also be performed with other tools, for example WinBUGS (Lunn et al., 2004) with the WBDiff extension; however, BioBayes supports importing biological models in the SBML standard. This support for model exchange standards saves significant modelling effort when using BioBayes.


    2 METHODS
BioBayes runs on the Java virtual machine for its user interface and uses platform-specific libraries for efficient computation. Its modular architecture is based on the Eclipse Rich Client Platform, which allows straightforward integration with many existing extensions (e.g. version control of documents). This modular structure also allows users to build their own extensions to BioBayes, ranging from additional algorithms for more effective inference in specific classes of problems to new types of models and editors.

The software package allows users to organize their files in projects, import SBML models, create datasets in such projects, and define tasks for parameter inference or model comparison.

The release version of BioBayes published at the official website http://www.dcs.gla.ac.uk/BioBayes/ includes two MCMC methods for model parameter inference and one method for model comparison. We estimated the computational speed of these algorithms to be approximately the same as that of ad hoc Matlab samplers.

Metropolis–Hastings sampler
BioBayes implements the Metropolis–Hastings sampler (Hastings, 1970), an MCMC method that enables inference of model parameters from experimental data for simpler models of biochemical systems.

Users can define the desired prior distributions for model parameters and run this sampler to infer parameter posteriors from one or more experimental datasets, monitoring its progress in the results pane.
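The basic random-walk Metropolis–Hastings update can be sketched as follows. This is a minimal Python illustration, not BioBayes code; the function name `metropolis_hastings`, the isotropic Gaussian proposal and the standard-normal toy target are assumptions made only for the example.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def metropolis_hastings(
    log_post: Callable[[Sequence[float]], float],
    theta0: Sequence[float],
    n_steps: int,
    step: float = 0.5,
    seed: int = 0,
) -> Tuple[List[List[float]], float]:
    """Random-walk Metropolis-Hastings with an isotropic Gaussian proposal.

    Returns every state of the chain and the overall acceptance rate.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    lp = log_post(theta)
    samples: List[List[float]] = []
    accepted = 0
    for _ in range(n_steps):
        # Symmetric proposal, so the Hastings correction q(x|y)/q(y|x) cancels.
        proposal = [x + rng.gauss(0.0, step) for x in theta]
        lp_prop = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)); 1 - U lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            accepted += 1
        samples.append(list(theta))
    return samples, accepted / n_steps


# Usage on a toy standard-normal "posterior" (an assumed stand-in for a model).
def log_std_normal(theta: Sequence[float]) -> float:
    return -0.5 * sum(x * x for x in theta)


samples, rate = metropolis_hastings(log_std_normal, [0.0], n_steps=20000, step=1.0)
kept = samples[5000:]  # discard burn-in
mean_x = sum(s[0] for s in kept) / len(kept)
print(f"acceptance rate {rate:.2f}, posterior mean {mean_x:.3f}")
```

In a real application `log_post` would combine the log-prior with a log-likelihood obtained by solving the ODE model for the proposed parameters.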

We tune the proposal distribution of the Metropolis–Hastings sampler to improve the convergence of the Markov chains, both by scaling the proposal variance in proportion to the local acceptance ratio and by adapting the proposal covariance matrix to a local approximation of the posterior distribution, as described by Gelman et al. (1995).
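A minimal sketch of this kind of adaptive tuning follows, assuming NumPy is available. During an adaptation phase the proposal variance is rescaled in proportion to the batch acceptance rate and the proposal covariance is replaced by the empirical covariance of the draws so far. The batch size, the target acceptance rate of 0.234, the clipping value and the name `adaptive_mh` are illustrative assumptions, not BioBayes settings. Adaptation is frozen before the retained sample is collected, so the retained draws come from a fixed Metropolis–Hastings kernel.

```python
import numpy as np


def adaptive_mh(log_post, theta0, n_adapt=5000, n_sample=5000, batch=200,
                target_rate=0.234, seed=0):
    """Random-walk MH whose Gaussian proposal is tuned during an adaptation phase.

    Assumed tuning rule (illustrative, not BioBayes's exact scheme):
    after each batch, multiply the proposal scale by (acceptance rate /
    target rate) and set the proposal covariance to the empirical
    covariance of all adaptation-phase draws.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    lp = log_post(theta)
    scale = 2.38 ** 2 / d          # common starting scale for Gaussian-like targets
    cov = np.eye(d)
    history, samples = [], []
    n_acc = 0
    for i in range(n_adapt + n_sample):
        prop = rng.multivariate_normal(theta, scale * cov)
        lp_prop = log_post(prop)
        if np.log(1.0 - rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
            n_acc += 1
        if i < n_adapt:
            history.append(theta.copy())
            if (i + 1) % batch == 0:
                # Variance scaled in proportion to the local acceptance ratio,
                # clipped so a batch with no acceptances cannot collapse it.
                scale *= max(n_acc / batch, 0.05) / target_rate
                # Covariance matched to a local approximation of the posterior.
                emp = np.atleast_2d(np.cov(np.array(history).T))
                cov = emp + 1e-9 * np.eye(d)
                n_acc = 0
        else:
            # Adaptation is frozen here, so these draws use a fixed MH kernel.
            samples.append(theta.copy())
    return np.array(samples), scale


# Toy correlated Gaussian target standing in for a real posterior.
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))


def log_post(theta):
    return -0.5 * theta @ Sigma_inv @ theta


samples, final_scale = adaptive_mh(log_post, [0.0, 0.0])
print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])
```

Matching the proposal covariance to the posterior lets the sampler take large steps along the strongly correlated direction, which an isotropic proposal cannot do efficiently.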

This implementation of the Metropolis–Hastings sampler allows users to run several chains simultaneously and to monitor their convergence to the true posterior distribution by comparing the within-chain variance of the sample with the between-chain variance, as proposed by Gelman et al. (1995). The potential scale reduction statistic $\hat{R}$ is computed for this purpose for each model parameter, and its current values are displayed in the results pane of the programme. The $\hat{R}$ values approach one as the chains mix. Users can define an acceptable threshold for $\hat{R}$, and the programme considers the chains to have converged to the true posterior once all values fall below that threshold. Users can, of course, override this convergence criterion and instead use, for example, a simple limit on the number of steps during the initialization of the chains.
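The between-chain/within-chain comparison described above (the potential scale reduction factor, written $\hat{R}$ here) can be computed as follows for a single scalar parameter, following the construction in Gelman et al. (1995). The function names, the threshold of 1.1 and the synthetic chains are assumptions for illustration.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """R-hat for one scalar parameter from m >= 2 equal-length chains.

    Compares the between-chain variance B with the mean within-chain
    variance W, as in Gelman et al. (1995); burn-in must already be removed.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the marginal posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


def all_converged(rhats, threshold=1.1):
    """Hypothetical stopping rule: every parameter's R-hat below the threshold."""
    return all(r < threshold for r in rhats)


rng = random.Random(1)
# Four well-mixed chains versus four chains stuck in two different regions.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 5.0, 5.0)]
r_mixed, r_stuck = gelman_rubin(mixed), gelman_rubin(stuck)
print(f"{r_mixed:.3f} {r_stuck:.3f}", all_converged([r_mixed]), all_converged([r_stuck]))
```

Well-mixed chains give a value close to one, while chains that disagree about where the posterior mass lies inflate the between-chain term and push $\hat{R}$ well above one.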

Once the $\hat{R}$ statistic indicates that the chains have converged to the posterior distribution, the programme produces the final posterior sample, thinning it if the user requires. The marginal projections of this sample are then displayed, and the sample itself can be exported for further analysis with external tools.
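Thinning and the per-parameter marginal summaries can be illustrated with a short sketch. The helper names and the toy draws are hypothetical; in practice they would be applied to the post-convergence draws.

```python
from statistics import mean, stdev


def thin(samples, every):
    """Keep every `every`-th draw of a chain (sample thinning)."""
    return samples[::every]


def marginal_summary(samples):
    """Per-parameter (mean, SD) of draws stored as parameter vectors."""
    n_params = len(samples[0])
    return [
        (mean([s[j] for s in samples]), stdev([s[j] for s in samples]))
        for j in range(n_params)
    ]


# Ten illustrative two-parameter draws; thinning by 3 keeps draws 0, 3, 6 and 9.
draws = [[0.1 * i, 1.0] for i in range(10)]
kept = thin(draws, 3)
summary = marginal_summary(kept)
print(len(kept), summary)
```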

Population-based MCMC
A population-based MCMC sampler (Jasra et al., 2007) is also available for more complex problems on which the straightforward Metropolis–Hastings sampler fails to converge, e.g. non-linear oscillator models (see the tutorial package and examples on the official website). This sampler runs several Markov chains in parallel, each targeting one member of a tempered sequence of distributions. Moves between the chains in this sequence help the sampler overcome energy barriers and therefore sample more efficiently from multi-modal posterior distributions. The number of distributions in the sequence can be adjusted by the user.
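The tempering idea can be sketched in Python as below. This is a simplified illustration, neither the full algorithm of Jasra et al. (2007) nor the BioBayes implementation: each chain targets a power posterior $p(\theta)\,p(D \mid \theta)^{t}$ for a temperature $t$ from an assumed ladder, and the only move between chains is an exchange of states between neighbouring temperatures. The bimodal toy likelihood, the ladder and the function names are hypothetical.

```python
import math
import random


def population_mcmc(log_prior, log_lik, theta0, temps, n_iter, step=0.5, seed=0):
    """Population MCMC over power posteriors p(theta) * p(D | theta) ** t.

    `temps` is an increasing ladder from 0 to 1; the chain at t = 1
    targets the posterior. Returns the draws of every chain.
    """
    rng = random.Random(seed)
    states = [list(theta0) for _ in temps]
    lp = [log_prior(s) for s in states]
    ll = [log_lik(s) for s in states]
    draws = [[] for _ in temps]
    for _ in range(n_iter):
        # 1) Random-walk Metropolis update of every chain at its own temperature.
        for k, t in enumerate(temps):
            prop = [x + rng.gauss(0.0, step) for x in states[k]]
            lp_p = log_prior(prop)
            if lp_p == -math.inf:
                continue  # outside prior support: reject
            ll_p = log_lik(prop)
            log_alpha = (lp_p + t * ll_p) - (lp[k] + t * ll[k])
            if math.log(1.0 - rng.random()) < log_alpha:
                states[k], lp[k], ll[k] = prop, lp_p, ll_p
        # 2) Exchange move between one random pair of neighbouring temperatures.
        k = rng.randrange(len(temps) - 1)
        log_alpha = (temps[k] - temps[k + 1]) * (ll[k + 1] - ll[k])
        if math.log(1.0 - rng.random()) < log_alpha:
            states[k], states[k + 1] = states[k + 1], states[k]
            lp[k], lp[k + 1] = lp[k + 1], lp[k]
            ll[k], ll[k + 1] = ll[k + 1], ll[k]
        for k, s in enumerate(states):
            draws[k].append(list(s))
    return draws


def log_normal(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def toy_log_lik(theta):
    # Hypothetical bimodal likelihood with modes at -3 and +3 (log-sum-exp form).
    a, b = log_normal(theta[0], -3.0, 0.3), log_normal(theta[0], 3.0, 0.3)
    m = max(a, b)
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))


temps = [0.0, 0.01, 0.05, 0.2, 0.5, 1.0]
draws = population_mcmc(lambda th: log_normal(th[0], 0.0, 5.0), toy_log_lik,
                        [0.0], temps, n_iter=30000)
posterior = [d[0] for d in draws[-1][2000:]]
frac_pos = sum(x > 0 for x in posterior) / len(posterior)
print(f"fraction of t = 1 draws in the positive mode: {frac_pos:.2f}")
```

The chains at low temperature see an almost flat likelihood and cross between the modes easily; exchange moves pass those states up the ladder, so the $t = 1$ chain visits both modes even though its own small random-walk steps rarely would.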

The convergence of this sampler to the true posterior distribution is again judged using the $\hat{R}$ statistic, computed over several population-based MCMC samplers run simultaneously.

Annealing–melting integration
Annealing–melting integration can be used to compute marginal likelihoods, the quantities used for evidence-based ranking of alternative models (Vyshemirsky and Girolami, 2008). This algorithm is built on the population-based MCMC sampler described above: the samples from the tempered sequence of target distributions are used to estimate the marginal likelihoods by thermodynamic integration.

Several population-based MCMC samplers are run simultaneously, both to assess their convergence to the true posterior distribution and to compute the SD of the final estimate across this set of samplers.
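The thermodynamic-integration step can be sketched as below, assuming power posteriors $p_t(\theta) \propto p(\theta)\,p(D \mid \theta)^{t}$ for $t \in [0,1]$, for which $\log p(D) = \int_0^1 \mathrm{E}_{p_t}[\log p(D \mid \theta)]\,dt$. To keep the sketch self-contained and checkable, a hypothetical conjugate Gaussian model replaces the ODE model, exact draws from each power posterior replace population MCMC output, and the integral is approximated by the trapezoidal rule on an assumed ladder $t_i = (i/30)^5$. The spread over five independent runs mimics the SD computed across simultaneous samplers.

```python
import math
import random
from statistics import mean, stdev


def thermodynamic_log_evidence(temps, mean_loglik):
    """Trapezoidal estimate of log p(D) = integral over t in [0, 1] of E_t[log p(D | theta)]."""
    pairs = zip(temps, temps[1:], mean_loglik, mean_loglik[1:])
    return sum((t1 - t0) * (e0 + e1) / 2.0 for t0, t1, e0, e1 in pairs)


# Toy conjugate model (an assumption for checking): y_i ~ N(theta, 1), theta ~ N(0, 1).
# Its power posterior at temperature t is N(t*sum(y)/(1 + t*n), 1/(1 + t*n)),
# so exact draws stand in for the population MCMC output here.
def run_once(y, temps, n_draws, seed):
    rng = random.Random(seed)
    n, sy = len(y), sum(y)

    def log_lik(theta):
        return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * sum((v - theta) ** 2 for v in y)

    expectations = []
    for t in temps:
        prec = 1.0 + t * n
        mu, sd = t * sy / prec, 1.0 / math.sqrt(prec)
        expectations.append(mean([log_lik(rng.gauss(mu, sd)) for _ in range(n_draws)]))
    return thermodynamic_log_evidence(temps, expectations)


def exact_log_evidence(y):
    # Marginally y ~ N(0, I + 11^T): determinant 1 + n, inverse I - 11^T / (1 + n).
    n, sy, syy = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(1.0 + n)
            - 0.5 * (syy - sy ** 2 / (1.0 + n)))


y = [0.8, 1.2, 1.0, 0.5, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0]
temps = [(i / 30) ** 5 for i in range(31)]   # ladder concentrated near t = 0
runs = [run_once(y, temps, n_draws=1000, seed=s) for s in range(5)]
print(f"TI estimate {mean(runs):.3f} (SD {stdev(runs):.3f}); exact {exact_log_evidence(y):.3f}")
```

Concentrating temperatures near $t = 0$ matters because the expected log-likelihood changes fastest there; a uniform ladder with the same number of points would give a noticeably biased trapezoidal estimate.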


    3 APPLICATIONS
Consider, as an example, Bayesian inference over the parameters of a model of exponential protein decay. The concentration of a protein S undergoing decay may be described by the following differential equation:


$$\frac{dS}{dt} = -k_1 S$$

where $k_1$ is the decay rate parameter. The overall statistical model has two more parameters: the initial concentration of the decaying protein, $S|_{t=0}$, and the observation noise variance $\sigma^2$. Arbitrarily choosing the values $S|_{t=0}=1$, $k_1=0.1$ and $\sigma^2=0.1$, we generated an example dataset D. Priors were then assigned to the model parameters: a singular (point-mass) prior fixing $S|_{t=0}=1$, and a Gamma prior $\Gamma(1,2)$ for each of $k_1$ and $\sigma^2$. The Metropolis–Hastings sampler from the BioBayes package was then applied to infer the parameter posterior. The inferred posterior sample has means $\bar{k}_1=0.132$ and $\bar{\sigma}^2=0.126$ and SDs $s_{k_1}=0.02$ and $s_{\sigma^2}=0.04$, which agree well with the values used to generate dataset D.
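The example can be reproduced in outline with the following self-contained Python sketch, which is not BioBayes code. The observation times (t = 0, 1, ..., 20), the shape/scale reading of $\Gamma(1,2)$, the random seeds, the starting point and the proposal step sizes are all assumptions not stated in the text, so the resulting numbers will differ from those reported above. Fixing $S|_{t=0}=1$ plays the role of the singular prior, and the ODE is replaced by its analytic solution $S(t) = S|_{t=0}\,e^{-k_1 t}$.

```python
import math
import random
from statistics import mean, stdev

# Assumed design (not stated in the text): observations at t = 0, 1, ..., 20.
TIMES = [float(t) for t in range(21)]


def decay(t, s0, k1):
    """Analytic solution of dS/dt = -k1 * S with S(0) = s0."""
    return s0 * math.exp(-k1 * t)


def simulate(s0=1.0, k1=0.1, sigma2=0.1, seed=0):
    """Synthetic dataset: decay curve plus Gaussian noise of variance sigma2."""
    rng = random.Random(seed)
    return [decay(t, s0, k1) + rng.gauss(0.0, math.sqrt(sigma2)) for t in TIMES]


def log_gamma_prior(x, shape=1.0, scale=2.0):
    """Gamma log-density up to a constant; reading Gamma(1, 2) as shape/scale is assumed."""
    if x <= 0.0:
        return -math.inf
    return (shape - 1.0) * math.log(x) - x / scale


def log_posterior(params, data, s0=1.0):
    """S(0) is fixed at s0, which is what the singular prior amounts to."""
    k1, sigma2 = params
    lp = log_gamma_prior(k1) + log_gamma_prior(sigma2)
    if lp == -math.inf:
        return lp
    rss = sum((y - decay(t, s0, k1)) ** 2 for t, y in zip(TIMES, data))
    return lp - 0.5 * len(data) * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)


def sample_posterior(data, n_steps=20000, burn_in=5000, steps=(0.02, 0.03), seed=1):
    """Random-walk Metropolis-Hastings over (k1, sigma2); negative proposals are rejected."""
    rng = random.Random(seed)
    theta = [0.2, 0.2]
    lp = log_posterior(theta, data)
    kept = []
    for i in range(n_steps):
        prop = [x + rng.gauss(0.0, s) for x, s in zip(theta, steps)]
        lp_prop = log_posterior(prop, data)
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        if i >= burn_in:
            kept.append(list(theta))
    return kept


data = simulate()
draws = sample_posterior(data)
k1_draws = [d[0] for d in draws]
s2_draws = [d[1] for d in draws]
print(f"k1:      mean {mean(k1_draws):.3f}, SD {stdev(k1_draws):.3f}")
print(f"sigma^2: mean {mean(s2_draws):.3f}, SD {stdev(s2_draws):.3f}")
```

For models without a closed-form solution, `decay` would be replaced by a numerical ODE solve at each proposed parameter value, which is where most of the computational cost lies.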

A detailed tutorial which includes this example and examples of parameter inference and model ranking for non-linear circadian controllers is available from the official website.


    4 SUMMARY
BioBayes can import SBML descriptions of biochemical models together with experimental data to perform consistent Bayesian learning of model parameter values. It can also estimate the marginal likelihoods of alternative models for evidence-based model comparison and ranking.

The software is built using a modular architecture which enables straightforward extension by users.


    ACKNOWLEDGEMENTS
Mark Girolami is an EPSRC Advanced Research Fellow, EP/E052029/1.

Funding: This research is funded by Microsoft Research within the ‘Modelling and Predicting in Biology and Earth Sciences 2006’ programme.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Olga Troyanskaya

Received on March 6, 2008; revised on June 3, 2008; accepted on July 1, 2008

    REFERENCES

    Burbeck S, Jordan KE. An assessment of the role of computing in systems biology. IBM J. Res. Dev. (2006) 50:529–543.

    de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. Lecture Notes in Computer Science (2003) 2602:149–162.

    Gelman A, et al. Bayesian Data Analysis. (1995) USA: Chapman & Hall/CRC.

    Golightly A, Wilkinson DJ. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics (2005) 61:781–788.

    Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika (1970) 57:97–109.

    Hucka M, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (2003) 19:524–531.

    Jasra A, et al. On population-based simulation for static inference. Stat. Comput. (2007) 17:263–279.

    Lunn DJ, et al. WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Stat. Comput. (2004) 10:325–337.

    Press WH, et al. Numerical Recipes in C++: The Art of Scientific Computing. (2002) Cambridge: Cambridge University Press.

    Rogers S, et al. Bayesian model-based inference of transcription factor activity. BMC Bioinformatics (2006) 8(Suppl. 2):S2.

    Voit EO. Computational Analysis of Biochemical Systems. (2000) Cambridge: Cambridge University Press.

    Vyshemirsky V, Girolami MA. Bayesian ranking of biochemical system models. Bioinformatics (2008) 24:833–839.

