Skip Navigation


Bioinformatics Advance Access originally published online on December 5, 2007
Bioinformatics 2008 24(6):833-839; doi:10.1093/bioinformatics/btm607
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow An erratum has been published
Right arrow All Versions of this Article:
24/6/833    most recent
btm607v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Vyshemirsky, V.
Right arrow Articles by Girolami, M. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Vyshemirsky, V.
Right arrow Articles by Girolami, M. A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Bayesian ranking of biochemical system models

Vladislav Vyshemirsky * and Mark A. Girolami

Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 APPROACH
 3 METHODS AND RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: There often are many alternative models of a biochemical system. Distinguishing models and finding the most suitable ones is an important challenge in Systems Biology, as such model ranking, by experimental evidence, will help to judge the support of the working hypotheses forming each model.

Bayes factors are employed as a measure of evidential preference for one model over another. Marginal likelihood is a key component of Bayes factors, however computing the marginal likelihood is a difficult problem, as it involves integration of nonlinear functions in multidimensional space. There are a number of methods available to compute the marginal likelihood approximately. A detailed investigation of such methods is required to find ones that perform appropriately for biochemical modelling.

Results: We assess four methods for estimation of the marginal likelihoods required for computing Bayes factors. The Prior Arithmetic Mean estimator, the Posterior Harmonic Mean estimator, the Annealed Importance Sampling and the Annealing-Melting Integration methods are investigated and compared on a typical case study in Systems Biology. This allows us to understand the stability of the analysis results and make reliable judgements in uncertain context. We investigate the variance of Bayes factor estimates, and highlight the stability of the Annealed Importance Sampling and the Annealing-Melting Integration methods for the purposes of comparing nonlinear models.

Availability: Models used in this study are available in SBML format as the supplementary material to this article.

Contact: vvv{at}dcs.gla.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 APPROACH
 3 METHODS AND RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
One of the important challenges of Systems Biology is inferring the structure of biochemical systems from various experimental observations (Burbeck and Jordan, 2006). Alternative models of a biochemical system can be used to describe different hypotheses about the system structure. For example, there are two alternative hypotheses about the topology of the Epidermal Growth Factor (EGF) activated MAPK pathway in PC12 cells (Roux and Blenis, 2004). The first one, supported by Brown et al. (2004) and Schoeberl et al. (2002), suggests that there is only one path involved in activating a protein named ERK when the system is stimulated with EGF. The second one, supported by Kao et al. (2001), suggests that there are two parallel paths involved in this process. These alternative hypotheses are described using systems of ordinary differential equations (ODE). The problem is to decide which of these two alternative hypotheses is more plausible. This can be achieved by performing experiments and collecting data, and then using a statistical inferential methodology to compare the a posteriori (after observing the evidence) probabilities of the alternative models.

In a diverse range of scientific disciplines, it has been shown that Bayesian Inference provides a consistent framework for knowledge updating and hypotheses testing, for example, Archaeology (Christen and Buck, 1998), Astrophysics (Brewer et al., 2007), Forensic Science (Dawid and Mortera, 1996), Bioinformatics and Computational Biology (Werhli et al., 2006; Wilkinson, 2007). In this article, we demonstrate how Bayesian hypotheses testing can be employed to rank ODE models of a biochemical system based on the evidential support they gain from experimental data and existing knowledge.

The research presented in this article is timely, as recent developments in molecular biology allow scientists to obtain more high-quality experimental data vital for consistent hypotheses testing, while at the same time, the research interest of the scientific community is being shifted to study more and more complex systems (Wang et al., 2007). It is important therefore to perform reliable hypotheses testing through consistent model ranking.

In this article, we assess several methods to estimate the marginal likelihood, a quantity which is required for Bayesian hypotheses testing. We consider methods that perform well on nonlinear ODE models using realistic amounts of experimental data.

Mendes et al. (2003) proposed to use artificial models for evaluation and comparison of optimization algorithms in a controlled context, where the correct result of the analysis is known. Such synthetically constructed models together with simulated data serve as a ‘gold standard’ with which to make comparison. The technique of using artificial models for testing methodology has been widely adopted, see for example (Husmeier, 2003; Hu et al., 2007), and as such, this approach will be used in this article.


    2 APPROACH
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 APPROACH
 3 METHODS AND RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Traditionally hypotheses about the structure of biological systems are expressed using mathematical models, typically (though not exclusively) nonlinear ordinary differential equations (de Jong, 2003; Voit, 2000). The parameters of such models are usually chosen in such a way that the predicted behaviours approach the mean of the experimental observations (Cho et al., 2003; Hoops et al., 2006; Schoeberl et al., 2002). The Bayesian literature (Bernardo and Smith, 1994; Jaynes, 2003; Neal, 1993) argues that using probability distributions over the parameter values is a more appropriate way of expressing the uncertainty inherent from the variability of observations. Distributions over the model parameters were successfully employed to express uncertain beliefs about parameter values by, for example, Brown et al. (2004), and more explicitly by Rogers et al. (2007) and Heron et al. (2007).

Assume that alternative working hypotheses are expressed in a form of predictive parametric mathematical models. The likelihood of reproducing experimental data D, consisting of N independent identically distributed data points, with model M given a particular set of model parameters {theta}, and assuming Normal errors is defined as:


Formula 1

(1)
where xi is the condition in which Di was measured, and {varphi}(M, {theta}, xi) produces the value predicted with model M using parameters {theta} in condition xi.Formula is the normal probability density function, and {sigma}2 is the observation noise variance. Note, that a likelihood of this form is always strictly positive. This definition of the likelihood was chosen because it models the measurement noise. However, other likelihood definitions, for example Gaussian processes (Rasmussen and Williams, 2006), can be used if a more sophisticated model for the noise process is required.

In the case where a discrete set of competing hypotheses is considered, they can be ranked by the ratio of their posterior probabilities. A pair of hypotheses H1 and H2 can be represented with models M1 and M2 having parameters {theta}1 and {theta}2 correspondingly. Taking a prior distribution of beliefs in preference of each model {pi} into account, this ratio is:


Formula 2

(2)
where


Formula 3

(3)
is called the Bayes factor for Models 1 and 2. Bayes factors are used to test competing hypotheses, and update corresponding beliefs using formula (2).

The Bayes factor is a summary of the evidence provided by the data in favour of one hypothesis, represented by a model, as opposed to another. Jeffreys (1961) suggested interpreting Bayes factors in half-units on the log10 scale. Pooling two of his categories together for simplification we demonstrate his scale in Table 1. Using the logarithms of the Bayes factors is often convenient, and log B12 is sometimes called the ‘weight of evidence in favour of H1’, while log likelihoods are sometimes called ‘surprise values’, ‘log evidence’ or ‘information content’ (MacKay, 2003). Logarithmic likelihoods are measured in units of information content which depend on the base of the logarithms used.1


View this table:
[in this window]
[in a new window]

 
Table 1. Interpretation of the Bayes factor as evidence support categories according to Jeffreys(1961)

 
These categories are not a calibration of the Bayes factor, as it already provides a meaningful interpretation as probability, but rather a rough descriptive statement about standards of evidence in scientific investigation. In cases when data is uninformative due to, for example model overspecification, the Bayes factor will be close to 1 and thus the posterior odds do not deviate from the prior odds, thus highlighting that the experimental protocol has been uninformative.

Computing Bayes factors is challenging, as the marginal likelihoods for nonlinear models have to be evaluated to obtain these. The main problem is that the marginal likelihood


Formula

can be evaluated analytically only in very special cases. The majority of biological models, however, are based on nonlinear ODEs. In such cases analytical integration of the marginal likelihood is impossible. Brute force numerical integration can be applied to low-dimensional problems. This approach, however, becomes computationally intractable as its complexity grows exponentially with the dimensionality of a problem.

The reasons of complexity leave us with the only practical option of considering methods for approximate evaluation of marginal likelihoods. Many of these approximate methods are limited by very strong conditions. For example, Laplace approximations (Bernardo and Smith, 1994) are large sample approximations around the maximum a posteriori parameter estimate. Such asymptotic approximations rely on the almost normal density of the posterior distribution, which is often not satisfied for nonlinear problems.

Two more methods which can be applied in a general case are importance sampling estimators (Newton and Raftery, 1994) and thermodynamic integration (Gelman and Meng, 1998; Ogata, 1989).

Four estimators from the above classes: the Prior Arithmetic Mean estimator, the Posterior Harmonic Mean estimator, Annealed Importance Sampling and the Annealing-Melting Integration are evaluated in this article. The first two were chosen because they are popular, straightforward to implement, and relatively inexpensive computationally. It will, however, be demonstrated in this article that estimates produced with these estimators have large variation, and therefore cannot be used when comparing large nonlinear models. The latter two estimators are significantly more sophisticated, and have much higher computational complexity. The estimates produced with these methods are, however, significantly more stable than the ones produced with the importance sampling procedures.

Importance Sampling Estimators:
Importance sampling estimation consists of generating a sample from an unnormalized density {pi}*({theta}). Under quite general conditions, an estimate of the integral


Formula

is


Formula 4

(4)
where {omega}i = p({theta}(i)|M)/{pi}*({theta}(i)); the function {pi}*({theta}) is known as the importance sampling function; and {theta}(i) is sampled from {pi}*({theta}).

The simplest application of this method is to use the prior as the importance sampling function {pi}*({theta}) = p({theta}|M), in which case (4) produces the Prior Arithmetic Mean estimator (McCulloch and Rossi, 1991):


Formula 5

(5)

A well-known problem with this estimator is that the high-likelihood region can be very small. Therefore, unless m is very large, the sample drawn from the prior will contain virtually no points from the high-likelihood region, resulting in a very poor estimate of the marginal likelihood. The problem becomes aggravated in higher dimensions, as the relative size of the high posterior probability region tends to decrease as the dimension increases. Lewis and Raftery (1997) reference a study in which to reduce the standard error to an acceptable level, it was necessary to use a sample of roughly 50 million draws from the prior distribution.

An alternative application of importance sampling estimation, proposed by Newton and Raftery (1994), is to use the parameter posterior as the importance sampling function {pi}*({theta}) = p({theta}|D, M). A sample from the parameter posterior can be obtained using Markov Chain Monte Carlo sampling. Such a sample should be significantly better in covering the high-likelihood region. Substituting the parameter posterior into (4) results in the Posterior Harmonic Mean estimator, we obtain:


Formula 6

(6)

The main problem with this estimator is that sometimes its variance can become infinite (see the additional example provided in the Supplementary Material), because of the occasional occurrence of a value of {theta}(i) with a small likelihood and hence a large effect on the final result. For vague prior distributions, parameters in regions of high posterior mass will have high-likelihood values, whilst parameters in the low posterior probability regions will tend to have low-likelihood values. As the likelihood appears as a reciprocal in (6) this implies that the salient contributions to the sum come from the tails of the posterior distribution, hence the possibility of huge variances. Raftery and Newton (2007) have also demonstrated that while being asymptotically unbiased, this estimator produces biased estimates in many practical cases when finite posterior samples are used, and tends to overestimate the real value of the marginal likelihood.

Annealed Importance Sampling:
Neal (2001) proposed to combine importance sampling principles with the Simulated Annealing heuristic (Kirkpatrick et al., 1983).

The Annealed Importance Sampling estimator can be designed to draw samplesFormula approximately from a series of unnormalized distributions


Formula

which form a path in the probability density space connecting the prior (when β = 0) and the posterior (when β = 1). The important feature for this estimator is thatFormula only need to be drawn approximately; thus, when using a Markov chain for producing these, the convergence of the chains to the stationary distribution is not generally required. This is a very attractive property for integrating the likelihoods of nonlinear models, as in such cases achieving convergence of the Markov chains to their stationary distributions is a very challenging problem.

The following quantities are estimated from such samples:


Formula

The marginal likelihood is then expanded as


Formula 7

(7)
where 0 = β0 ≤ β1 ≤ ... ≤ βn–1 ≤ βn = 1.Each term in (7) is approximated using importance sampling based on samples from qβ({theta}). The logarithm of the marginal likelihood is then estimated as:


Formula 8

(8)
whereFormula , j = 1, ... , M are sampled from Tβi–1({theta}|{theta}i–1)), where Tβi–1 is a Markov transition kernel that has qβi–1({theta}) as its stationary distribution.

Neal (2001) showed that Annealed Importance Sampling produces unbiased estimates of the marginal likelihood.

Thermodynamic Integration:
The logarithm of the marginal likelihood can be represented in terms of the integral


Formula 9

(9)
This integral can be approximated by numerical integration (as in Friel and Pettitt, 2006; Lartillot and Philippe, 2006).

As with Annealed Importance Sampling, this method relies on samples from unnormalized ‘bridging’ distributions qβ ({theta}) that link the prior and the posterior. In the case of thermodynamic integration, a Markov chain Monte Carlo simulation is usually run for particular values of β, in which qβ is used as an unnormalized density in the Metropolis–Hastings ratio. The average of the logarithmic likelihood is then estimated on this sample. This computation is repeated for a series of values of β partitioned between 0 and 1, which implies running a separate chain for each value of β. There are a number of ways to select a schedule for β to estimate this integral. Lartillot and Philippe (2006) use β values equally distributed between 0 and 1, whilst Friel and Pettitt (2006) use a different schedule, selecting these values as


Formula

with N = 40 and c = 4. In the example described in this article, the latter one will be used as it produces lower variance estimates. Though, this schedule was shown to be theoretically suboptimal in general by Gelman and Meng (1998); its variance is superior to a uniform spacing.


    3 METHODS AND RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 APPROACH
 3 METHODS AND RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Here we consider four alternative models of a biochemical system, with an additional example provided in the Supplementary Material. The models are artificially constructed to demonstrate the essence of the proposed methodology and demonstrate its main points and advantages on an example with a known result.

The schematic diagrams for the models are depicted in Figure 1a–d. The entities in circles represent proteins, while arrows correspond to biochemical reactions. Enzymatic behaviour is indicated by an arrow with a circle as head. For example, see Figure 1b where S is an enzyme for activation of R. Kinetic parameters of the reactions are depicted as text beside the arrows e.g. k1, V1. These networks represent realistic networks, and they all have a structure which is very common in nature (Han et al., 2007).


Figure 1
View larger version (15K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Schematic diagrams of biochemical models used in this study. (a) Model 1: a model of a signal transduction cascade. Protein S represents the input signal. S can degrade to dS. At the same time S activates protein R from its inactive state, to an active state Rpp by binding and activation. Protein Rpp can then be deactivated. This model was used to generate the experimental data. (b) Model 2: a simplified version of a signal transduction cascade. It represents the same process as described by Model 1, but a mechanistic description of the activation process is omitted and replaced with more general functions. (c) Model 3: a biochemical model which is significantly different to the rest of the models considered in this article. This model does not describe degradation of protein S. (d) Model 4: a more complex version of Model 1. This model mechanistically describes how protein Rpp is deactivated by phosphotase PhA.

 
The proposed approach, however, is not limited to signal transduction applications. In the Supplementary Material, we additionally consider the Repressilator (Elowitz and Leibler, 2000) that includes gene regulation processes, and utilizes data measured in a biochemical laboratory.

Model 1: This model defines a common motif of signalling pathways that is a stage in a signal transduction cascade, for example this motif is repeated several times in Schoeberl et al. (2002). The input signal is represented by the concentration of protein S depicted in the top left of the diagram (Fig. 1a). This protein activates the next stage of the cascade by binding to protein R forming complex RS, and activating R into its phosphorylated form Rpp. Protein Rpp can then be deactivated. Model 1 also defines input signal degradation by converting protein S into its degraded form dS.

All the proteins used in this model will be represented as dependent variables in our ODE model. As we are interested in modelling and analysis of temporal behaviour, the independent variable is time. The dephosphorylation reaction Rpp -> R is defined using the Michaelis–Menten kinetic law, while the rest of the reactions (arrows in Fig. 1a) are defined using the Mass Action kinetic law with parameters depicted as textual remarks beside the arrows in the model diagram (e.g. k1, k4). The system of ODEs which defines this model can be found in the Supplementary Material. This model has six kinetic parameters: k1... k4, V, Km.

Model 2: The model depicted in Figure 1b was constructed as a simplified representation of the signal transduction cascade stage. It essentially represents the same system as defined with Model 1, but uses the Michaelis–Menten kinetic law to define phosphorylation of R.

The system of ODEs used in this model can be found in the Supplementary Material. The model has five kinetic parameters: k1, k2, k3, V1, V2.

Model 3: The model depicted in Figure 1c is a version of Model 2 with degradation of protein S removed. As protein S cannot degrade, the model would not be capable of decreasing the signal. Our goal for this model is to demonstrate through hypotheses testing, that this model gains significantly smaller evidential support from data than the rest of the models considered.

The system of ODEs used in this model can be found in the Supplementary Material. The model has four kinetic parameters: k1, k2, V1, V2.

Model 4: The model depicted in Figure 1d is a more complex version of Model 1. Phosphatase PhA depicted in the bottom of the diagram deactivates protein R. All the reactions are defined using the mass action kinetic law. This model was constructed to demonstrate how it would be penalized for complexity according to Occam's razor concept (Jaynes, 2003) in Bayesian hypotheses testing.

The system of ODEs used in this model can be found in the Supplementary Material. This model has seven kinetic parameters: k1 ... k7.

The initial values for all of the models can be found in SBML files provided as the Supplementary Material to this article.

Data generation: For this study, we use data generated from Model 1. To generate the data we simulated the behaviour of Model 1 with the following values for kinetic parameters:


Formula

by solving an initial value problem, and generated the time series of variable values (protein concentrations).

We decided to generate a data set for the experiments as a set of three replicates of a time series of Rpp values, measured at the following time points: t isin{2s, 5s, 10s, 20s, 40s, 60s, 100s}. We added observation noise with variance 0.01 to the simulated values at each of the time points. The data set D contains twenty one samples. The obtained values are depicted in Figure 2.


Figure 2
View larger version (6K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Data set generated from Model 1.

 
Overall statistical models: The definition of the likelihood (1) has been used as described in Section 2. Only one additional noise parameter {sigma} has been added to each of the models, as the same noise value was used when the data was generated. The same noise parameter will be substituted to the normal distributions placed at each of the time points when evaluating the likelihood (1).

More sophisticated models of experimental noise can be considered if required. For example, using immunoblotting data may require a separate noise parameter for different experiments due to different affinities of the labelling antibodies; and using microarray data may require considering a log-normal distribution for experimental noise due to the saturation properties of the microarray probes.

Any information about the parameter values used to generate data for our experiments with Model 1 has been discarded; and the prior for model parameters was defined. As none of the parameters can take a negative value, we defined the priors for all of the parameters of all the models to be distributed according to Gamma distribution {Gamma}(1, 3). The mean of this prior (µ = 3) is significantly larger than any of the parameters used for data generation. Care must be taken however to ensure that the priors are not so vague that Lindley's Paradox may become an issue. This has the effect of the Bayes factors preferring the simplest model description irrespective of the evidence to the contrary (see e.g. Denison et al., 2002, for a nice accessible discussion).

Hypotheses testing: We now compare the alternative hypotheses defined using Models 1–4 by estimating corresponding marginal likelihoods and computing the Bayes factors. The Metropolis–Hasting algorithm (Hastings, 1970) was used to produce samples from distributions required for estimation of marginal likelihoods. The estimates are based on 400 000 samples drawn from each of the required distributions. We monitored the convergence of 20 parallel Markov chains to the stationary distributions for the Posterior Harmonic Mean estimator and for the Annealing-Melting Integration method for each of the β i=1... 40, using the method proposed by Gelman et al. (1995). Samples from the prior were drawn directly when required for the Prior Arithmetic Mean estimator. For the Annealed Importance Sampling, we produce 4000 proposals according to the Metropolis–Hastings algorithm before drawingFormula . The estimates variance begins to grow if a smaller number is chosen for such ‘burn in’, and larger numbers do not show any benefit in the quality of the estimate.

The obtained estimates (averaged from 10 repetitions for each of the models and each of the proposed estimators) for the log marginal likelihood for each of the models using each of the methods proposed above are given in Table 2. The first row of Table 2 highlights the variance of the estimates obtained using the Prior Arithmetic Mean estimator. The Posterior Harmonic Mean estimator produces larger estimates than the results obtained using the Annealed Importance Sampling or the Annealing-Melting Integration which is consistent with Raftery and Newton (2007). This method in these examples, however, produces the correct relative ranking of the competing models. The variance of the estimates produced using this method may, however, become infinite, so careful monitoring of this variance is required in practice and the Supplementary Material provides an example demonstrating the instability of the estimator. The variances of the estimates produced by the Annealed Importance Sampling and the Annealing-Melting Integration are comparable to each other. The Annealed Importance Sampling is about 15 times faster than the Annealing-Melting Integration method on this particular example, as it requires samples to be drawn from qβ({theta}) only approximately, and therefore the chains do not have to converge to their target distributions in each of these intermediate steps.


View this table:
[in this window]
[in a new window]

 
Table 2. Estimates of ln p(D|Mi)

 
Assuming equal prior distribution of beliefs between the alternative hypotheses {pi}(M1) = {pi}(M2) = {pi}(M3) = {pi}(M4), we take the mean estimates obtained with Annealing-Melting Integration (as they are the most stable) to compute the Bayes factors and use them for hypotheses testing (Table 3).


View this table:
[in this window]
[in a new window]

 
Table 3. The obtained Bayes factors

 
These correspond to the following relative ranking of the four competing models:


Formula

The incorrect model (Model 3) gained the smallest evidential support and its marginal likelihood is dwarfed by the marginal likelihoods of other models. Model 1, which was used for data generation, has the maximal marginal likelihood, and therefore should be preferred over the rest of the models. Model 4, which was constructed to be an overly complex version of Model 1, has a smaller marginal likelihood value, and therefore is rated second. This demonstrates that Bayesian hypotheses testing accounts for the complexity of models, and implements Occam's razor principle.

According to the evidence support categories by Jeffreys (1961) defined in Table 1, the evidence suggests a ‘decisive’ preference of the original Model 1 over the rest of the models. Interestingly, if the true model, Model 1, is not in the set being considered, we see that Model 4 {succeeds} Model 2, though the log B42 = 2.446 is only just decisive. This suggests both models may be considered as plausible explanation of the observed data.


    4 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 APPROACH
 3 METHODS AND RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We perform Bayesian hypothesis testing for a selection of biochemical models by computing Bayes factors. An alternative approach to this is based on the maximum a posteriori point estimates of the model parameters. Such estimates can be produced using, for example, the Simulated Annealing algorithm (Hoops et al., 2006; Kirkpatrick et al., 1983). Once such point estimates are found, the alternative models are compared by the maximized value of the corresponding posterior densities. The main problem with using maximum a posteriori estimates for model comparison is their common trend to prefer more complex models over simpler ones due to the ability of more complex models to provide a better ‘fit’ to data.

The main problem with a more general non-Bayesian hypotheses testing framework using frequentist significance tests is the obscurity of meaning for the P-values (Goodman, 1999). The smaller the P-value is, the more significant the result is said to be. This value, however, is not the probability of an error. Moreover, non-Bayesian tests tend to reject null hypotheses in very large samples, whereas Bayes factors do not. An example with a number of samples n = 113 566 was discussed by Raftery (1986), where a meaningful model that explained 99.7% of the deviance was rejected by a standard {chi}2 test with a P-value of about 10–120 but was nevertheless favoured by the Bayes factor.

Akaike (1983) proposed yet another criterion for model comparison, which takes the complexity of the models into account. This criterion suggests choosing the model which minimizes AIC (Akaike information criterion): AIC= –1(log maximum likelihood) + 2(number of parameters). Akaike (1983) claimed that model comparisons based on the AIC are asymptotically equivalent to those made with Bayes factors. But this is true only in the situations when predictions of the prior are compatible to those of the likelihood, and not in the more usual situation when prior information is small in comparison to the information provided by the data. Additionally, using Akaike information criterion, or similar methods, e.g. the deviance information criterion (Spiegelhalter et al., 2002), may not always provide reliable results, as these methods are justified only for the cases when parameter posteriors are unimodal and almost multivariate normal. This is a very rare case in modelling biological systems, for example, the models considered in this study are nonlinear.

The nonlinearity of models employed to describe alternative hypotheses is due to the fact that, in biological research, mechanistic models based on the laws of chemical kinetics are widely used. In the case when experimental data is available for only a few of the model variables (e.g. chemical species), these systems of ODEs collapse into high-order differential equations, which may cause complex nonlinear likelihoods.

Nonlinearity of the models poses the biggest challenge for computing the marginal likelihoods required to calculate the Bayes factors, as simple methods of computing these can fail dramatically (see PAM in Table 2 and PHM in the Supplementary Material). In this article, we have demonstrated that such methods produce very unstable or biased results when applied to nonlinear models of biochemical systems. The reliability of such estimates will be even smaller if the parameter posterior is distributed in multiple equiprobable modes, as sampling from such distributions is very difficult.

On the other hand, the methods based on the principles of path sampling, for example the Annealing-Melting Integration discussed in this article, overcome the problem of integrating difficult distributions by linking the prior, which is usually very simple to sample from, and the posterior with a path in the probability densities space. Moving along such a path helps to overcome so-called energy barriers which separate local modes, and prevent Markov chains used for sampling from convergence to the true posterior distribution. These methods, however, are computationally more expensive, as Markov chains at each of the intermediate steps must converge to their stationary distributions, which may require millions of ‘burn-in’ samples to be discarded first. We provide a comparison of the computational time required to produce one marginal likelihood estimate for Model 4 using a computer with Intel Core 2 Duo processor, PAM = 92 s, PHM = 145 s, AIS = 2187 s, AMI = 28 703 s. The problem of convergence becomes more and more challenging as the complexity and the size of models grow. This is a potential limitation of the applicability of the Annealing-Melting Integration estimator to only small- to medium-sized problems.

The Annealed Importance Sampling method provides a very attractive feature that the chains do not have to converge to the stationary distributions to be useful in producing the estimates. We have demonstrated that the variance of the estimates produced with the Annealed Importance Sampling method is only marginally worse than the stability of Annealing-Melting Integration estimates, while promising faster estimation and better scalability to larger problems.


    5 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 APPROACH
 3 METHODS AND RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
In this article we considered four alternative hypotheses about the structure of a biological system defined with four realistic ODE models. All the models were artificially constructed to allow testing of the proposed methodology on an example with a known result.

We simulated the experimental data from one of the suggested models; selecting the original model as the most probable one at the model comparison stage, was a crucial result to demonstrate the correctness of the approach. We demonstrated how the principle of Occam's razor works in this framework, as the overly complex model was not preferred to the original one despite it being capable of reproducing the experimental data precisely enough. At the same time, the framework does not blindly select the simplest model. The control experiment using a structurally different model which was not capable of reproducing the general trends of system behaviour was also successful, as we demonstrated that this model was rated significantly lower as compared to the rest of the alternatives.

We estimated the marginal likelihood for each of the models using four different estimators: the Prior Arithmetic Mean estimator, the Posterior Harmonic Mean estimator, the Annealed Importance Sampling and the Annealing-Melting Integration method. The estimates produced with the Prior Arithmetic Mean estimator are very unstable, while the estimates of the marginal likelihoods produced with the Posterior Harmonic Mean estimator are biased to larger values, though still capable of producing a correct ranking in this study. This demonstrates that in the cases when complex nonlinear models are employed in a study, the traditional methods of marginal likelihood estimation with importance sampling procedures may fail to provide reliable results. At the same time Annealed Importance Sampling and Annealing-Melting Integration methods provide low variance estimates. We propose to use the Annealed Importance Sampling or Annealing-Melting Integration methods in the cases when hypotheses are expressed with nonlinear ODE models. The Annealed Importance Sampling method is usually faster, as it does not require the samplers to converge when sampling from distributions bridging the prior and the posterior. At the same time, the Annealing-Melting Integration produces smaller errors of the estimates. We highlight the stability of the estimates obtained using annealing-related methods, and foresee the potential of these methods to scale to more complex applications.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 APPROACH
 3 METHODS AND RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
This research is funded by Microsoft Research within the ‘Modelling and Predicting in Biology and Earth Sciences 2006’ programme.

M.A.G. is an EPSRC Advanced Research Fellow, EP/E052029/1.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Limsoon Wong

1 ‘Bit’ corresponds to log2 p, ‘nat'to ln p, ‘ban'to log10 p and ‘deciban’ to 10 · log10 p. A historical overview for these names can be found in (MacKay, 2003). Back

Received on August 28, 2007; revised on October 26, 2007; accepted on December 3, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 APPROACH
 3 METHODS AND RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Akaike H. Information measures and model selection. Bull. Int. Stat. Inst (1983) 50:277–290.

    Bernardo JM, Smith AFM. Bayesian Theory. (1994) Chichester: Wiley.

    Brewer BJ, et al. Bayesian Inference from Observations of Solar-like Oscillations. Astrophys. J (2007) 654:551–557.[CrossRef][Web of Science]

    Brown KS, et al. The statistical mechanics of complex signaling networks: nerve growth factor signaling. Phys. Biol (2004) 1:184–195.[CrossRef][Web of Science][Medline]

    Burbeck S, Jordan KE. An assessment of the role of computing in systems biology. IBM J. RES DEV (2006) 90:529–543.

    Cho K-H, et al. Mathematical modeling of the influence of RKIP on the ERK signaling pathway. Lect. Notes Comput. Sci (2003) 2602:127–141.[CrossRef]

    Christen JA, Buck CE. Sample selection in radiocarbon dating. Appl. Stat (1998) 47:543–557.

    Dawid AP, Mortera J. Coherent analysis of forensic identification evidence. J. R. Stat. Soc. [Ser B] (1996) 58:425–443.

    de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. Lect. Notes Comput. Sci (2003) 2602:149–162.

    Denison D, et al. Bayesian Methods for Nonlinear Classification and Regression. (2002) Wiley.

    Elowitz MB, Leibler S. A synthetic oscillatory network of transcriptional regulators. Nature (2000) 403:335–338.[CrossRef][Web of Science][Medline]

    Friel N, Pettitt AN. Marginal likelihood estimation via power posteriors. Technical report (2006) Department of Statistics, University of Glasgow.

    Gelman A, Meng X-L. Simulating normalising constants: from importance sampling to bridge sampling to path sampling. Stat. Sci (1998) 13:163–185.[CrossRef][Web of Science]

    Gelman A, et al. Bayesian Data Analysis. (1995) London: Chapman & Hall.

    Goodman S. Toward evidence-based medical statistics. 1 : the P value fallacy. Ann. Intern. Med (1999) 130:995–1004.[Abstract/Free Full Text]

    Han Z, et al. Signal transduction network motifs and biological memory. J. Theor. Biol (2007) 246:755–761.[CrossRef][Web of Science][Medline]

    Hastings WK. Monte Carlo sampling methods using Markov chains and thier applications. Biometrika (1970) 57:97–109.[Abstract/Free Full Text]

    Heron EA, et al. Bayesian inference for dynamic transcriptional regulation; the hes1 system as a case study. Bioinformatics (2007) 23:2596–2603.[Abstract/Free Full Text]

    Hoops S, et al. COPASI—a COmplex PAthway SImulator. Bioinformatics (2006) 22:3067–3074.[Abstract/Free Full Text]

    Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics (2003) 19:2271–2282.[Abstract/Free Full Text]

    Hu J, et al. Non-parametric quantification of protein lysate arrays. Bioinformatics (2007) 23:1986–1994.[Abstract/Free Full Text]

    Jaynes ET. Probability Theory: The Logic Of Science. (2003) Cambridge: Cambridge University Press.

    Jeffreys H. Theory of Probability. (1961) 3rd. Oxford: Clarendon Press.

    Kao S, et al. Identification of the mechanisms regulating the differential activation of the MAPK cascade by epidermal growth factor and nerve growth factor in PC12 cells. J. Biol. Chem (2001) 276:18169–18177.[Abstract/Free Full Text]

    Kirkpatrick S, et al. Optimization by simulated annealing. Science (1983) 220:671–680.[Abstract/Free Full Text]

    Lartillot N, Philippe H. Computing Bayes factors using thermodynamic integration. Syst. Biol (2006) 55:195–207.[CrossRef][Medline]

    Lewis SM, Raftery AE. Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. J. Am. Stat. Assoc (1997) 92:648–655.[CrossRef][Web of Science]

    MacKay DJC. Information Theory, Inference, and Learning Algorithms. (2003) Cambridge: Cambridge University Press.

    McCulloch RE, Rossi PE. Bayes factors for nonlinear hypotheses and likelihood distributions. Technical Report 101 (1991) Statistics Research Center, University of Chicago, Graduate School of Business.

    Mendes P, et al. Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics (2003) 19(Suppl. 2):ii122–ii129.[Abstract]

    Neal RM. Probabilistic inference using Markov Chain Monte Carlo methods. Technical Report CRG-TR-93-1 (1993) Department of Computer Science, University of Toronto.

    Neal RM. Annealed importance sampling. Statistics and Computing (2001) 11:125–139.[CrossRef][Web of Science]

    Newton MA, Raftery AE. Approximate Bayesian inference by the weighted likelihood bootstrap. JRSS Ser. B (1994) 3:3–48.

    Ogata Y. A Monte Carlo method for high dimensional integration. Num. Math (1989) 55:137–157.[CrossRef]

    Raftery AE. Choosing models for cross-classifications. Am. Sociol. Rev (1986) 51:145–146.[CrossRef][Web of Science]

    Raftery AE, Newton MA. Estimating the integrated likelihood via posterior simulation using the harmonic mean identity. Bayesian Stat (2007) 8:1–45.

    Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. (2006) Cambridge, MA, USA: MIT Press.

    Rogers S, et al. Bayesian model-based inference of transcription factor activity. BMC Bioinformatics (2007) 8(Suppl. 2):S2. doi:10.1186/1471-2105-8-S2-S2.

    Roux PP, Blenis J. ERK and p38 MAPK-activated protein kinases: a family of protein kinases with diverse biological functions. Microbiol. Mol. Biol. Rev (2004) 68:320–344.[Abstract/Free Full Text]

    Schoeberl B, et al. Computational modelling of the dynamics of the MAP kinase cascade activated by surface and internalised EGF receptors. Nat. Biotechnol (2002) 20:370–375.[CrossRef][Web of Science][Medline]

    Spiegelhalter DJ, et al. Bayesian measures of model complexity and fit (with discussion). JRSS Ser. B (2002) 64:583–639.

    Voit EO. Computational Analysis of Biochemical Systems. (2000) Cambridge: Cambridge University Press.

    Wang Y, et al. A continuum mathematical model of endothelial layer maintenance and senescence. Theor. Biol. Med. Model (2007) 4. doi:10.1186/1742-4682-4-30.

    Werhli AV, et al. Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics (2006) 22:2523–2631.[Abstract/Free Full Text]

    Wilkinson DJ. Bayesian methods in bioinformatics and computational systems biology. Briefings in bioinformatics (2007) 8:109–116.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
T. Toni and M. P. H. Stumpf
Simulation-based model selection for dynamical systems in systems and population biology
Bioinformatics, January 1, 2010; 26(1): 104 - 110.
[Abstract] [Full Text] [PDF]


Home page
J R Soc InterfaceHome page
T. Toni, D. Welch, N. Strelkowa, A. Ipsen, and M. P.H Stumpf
Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems
J R Soc Interface, February 6, 2009; 6(31): 187 - 202.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
V. Vyshemirsky and M. Girolami
BioBayes: A software package for Bayesian inference in systems biology
Bioinformatics, September 1, 2008; 24(17): 1933 - 1934.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
P. Gao, A. Honkela, M. Rattray, and N. D. Lawrence
Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities
Bioinformatics, August 15, 2008; 24(16): i70 - i75.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow An erratum has been published
Right arrow All Versions of this Article:
24/6/833    most recent
btm607v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Vyshemirsky, V.
Right arrow Articles by Girolami, M. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Vyshemirsky, V.
Right arrow Articles by Girolami, M. A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?