Skip Navigation


Bioinformatics Advance Access originally published online on September 25, 2008
Bioinformatics 2008 24(22):2592-2601; doi:10.1093/bioinformatics/btn483
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/22/2592    most recent
btn483v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Yoshida, R.
Right arrow Articles by Higuchi, T.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Yoshida, R.
Right arrow Articles by Higuchi, T.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Bayesian learning of biological pathways on genomic data assimilation

Ryo Yoshida 1,*, Masao Nagasaki 2, Rui Yamaguchi 2, Seiya Imoto 2, Satoru Miyano 2 and Tomoyuki Higuchi 1

1Institute of Statistical Mathematics, Research Organization of Information and Systems, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569 and 2Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Mathematical modeling and simulation, based on biochemical rate equations, provide us a rigorous tool for unraveling complex mechanisms of biological pathways. To proceed to simulation experiments, it is an essential first step to find effective values of model parameters, which are difficult to measure from in vivo and in vitro experiments. Furthermore, once a set of hypothetical models has been created, any statistical criterion is needed to test the ability of the constructed models and to proceed to model revision.

Results: The aim of our research is to present a new statistical technology towards data-driven construction of in silico biological pathways. The method starts with a knowledge-based modeling with hybrid functional Petri net. It then proceeds to the Bayesian learning of model parameters for which experimental data are available. This process exploits quantitative measurements of evolving biochemical reactions, e.g. gene expression data. Another important issue that we consider is statistical evaluation and comparison of the constructed hypothetical pathways. For this purpose, we have developed a new Bayesian information–theoretic measure that assesses the predictability and the biological robustness of in silico pathways.

Availability: The FORTRAN source codes are available at the URL http://daweb.ism.ac.jp/~yoshidar/GDA/

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: yoshidar{at}ism.ac.jp


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 
States of living cells are controlled by a series of biochemical reactions, e.g. binding and phosphorylation of protein molecules, gene expressions intermediated by transcription factors. In systems biology, mathematical modeling and simulation, based on biochemical rate equations, have proven to be an efficient approach for unraveling complex cellular mechanisms of the biological pathways (Chen et al., 2004; Doi et al., , 2004, 2006; Matsuno et al., 2000). Usually, a pathway model is comprised of a network of differential or difference equations that define rates of change in evolving concentrations of mRNA molecules and proteins. To proceed to simulations, it is vital to determine a set of model parameters for which experimental data are available. Furthermore, once a set of hypothetical models has been created, any criterion is needed for model revision to test the ability of the constructed models, for instance, from the standpoints of predictability, stableness and robustness of biological pathways.

The aim of our research is to present a new statistical technology towards the data-driven construction of in silico biological pathways, which are referred to as genomic data assimilation (GDA). The method starts with conversion of hypothetical cellular mechanism into a set of rate equations. The model is described based on non-linear difference equations which are represented by hybrid functional Petri net (HFPN) (Doi et al., , 2004; Koh et al., 2006; Matsuno et al., 2000; Nagasaki et al., 2006; Tasaki et al., 2006). Once hypothetical models have been created, we then proceed to the Bayesian learning of model parameters by solving a non-linear constraint optimization problem. Particularly, this article focuses on the analyses of transcription networks. In this context, the model parameters are explored based on time course measurements of gene expression profiles. Since transcriptions of mRNA molecules are necessarily mediated by the mechanism of regulatory proteins, any models should include the regulatory functions of protein-level activities, which are unobserved from gene expression measurements directly. Such a partially observed system usually yields ill-posedness in the problem of parameter estimation. To avoid it, we introduce a Bayesian regularizer which incorporates external biological knowledge and prior values of parameters into the learning algorithm. Moreover, once effective values of parameters are found for a set of models, the method proceeds to the model evaluation and comparison. In this article, a new Bayesian information theoretic measure is presented to test the ability of in silico models in terms of predictability and biological robustness, simultaneously. The lack of them indicates further aspects of the mechanism that require model revisions. We illustrate the proposed approach on the statistical learning of transcription circuits of mammalian circadian clock.

In our previous studies (Nagasaki et al., 2006; Tasaki et al., 2006), we applied a sequential Monte Carlo method known as the particle filter (PF; Kitagawa et al., 1996) to the estimation of kinetic parameters of HFPN. More recently, the use of PF algorithm has been suggested also by Quach et al. (2007) whereas they actually applied the unscented Kalman filter, rather than PF, to estimate the rate constants of ordinary differential equations. However, it has been reported by Nagasaki et al. (2006) that the ability of the PF was limited by the estimation of only few parameters. On the other hand, such limitation has been overcome in this research so that the proposed method succeeded in the estimations of much larger number of unknown parameters.

For the inferences of transcription circuits, the use of conventional graphical models has been popular, e.g. Bayesian networks (Imoto et al., 2006; Perrin et al., 2003), state space models (Beal et al., 2005; Hirose et al., 2008; Rangel et al., 2004; Yamaguchi et al., 2007; Yoshida et al., 2005). At the early stage of this area, most methods focused on the construction of causal graphs which are spanned by mRNA molecules (genes) only, do not include regulatory proteins. Furthermore, it was typical that interactions between different transcripts are described by (linear or non-linear) regression models. However, such an approach would only have a limited utility in the researches of gene regulations. Particularly, the following shortcomings are argued: the first one is induced from the biological evidence that any reactions between mRNA molecules are necessarily mediated by the mechanism of regulatory proteins. The second drawback results from the gap between actual mechanisms of biochemical reactions and statistical regression models, which may cause the lack of interpretability in subsequent secondary analysis. For examples, as long as we adopt regression model-based approaches, it would not be straightforward to convert the estimated model into a better understanding of underlying mechanism of biological pathways.

We tackle this problem in a direct way that incorporates more realistic models, i.e. in silico biological pathways, into the subsequent statistical inferences (parameter estimation and structural learning). In the case of transcription networks, the models explicitly describe regulatory mechanisms of protein-level activities. Moreover, the component models are built by imitation of biochemical rate equations. By placing such a model at the heart of causal learning, the constructed hypothetical model can provide an explicit picture of biological pathways.


    2 PATHWAY MODELING
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 
2.1 Hybrid functional Petri net
HFPN is an extensively used simulation architecture to construct an artificial network of dynamic system evolved at successive discrete times (Doi et al., 2004; Koh et al., 2006; Matsuno et al., 2000; Nagasaki et al., 2006; Tasaki et al., 2006). Basic components of HFPN consist of entity (node), arc (edge) and process (see Fig. 1). In the case of modeling transcription networks, mRNA molecules and regulatory proteins are assigned to the entities which are drawn by circles. Each entity contains a concentration level of mRNA or regulatory protein. Presence of interaction between a pair of nodes is indicated by depicting a set of edges (arc). A biochemical reaction (regulatory function) with kinetic parameters is specified in the process which is drawn as a box and attached to the edges. During the run of simulation, concentrations of biochemical reactants are successively propagated from node to node according to the regulatory functions which are assigned to the processes. For more comprehensive descriptions of HFPN, see Matsuno et al. (2000) and Doi et al. (2004).


Figure 1
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Basic components of HFPN. Petri nets represent an artificial network of biochemical reactions by using three types of symbols for entity, arc and process.

 
To bring more clarity on the usage of HFPN, let us consider a series of biochemical reactions that are comprised of binding of two cooperative transcription factors p1 and p2 to the promoter of gene r, which leads to transcription activation of the gene (Fig. 2). Denote concentration levels of the mRNA and the two transcription factors at time point n by xr, n, xp1,n and xp2,n, respectively. Here, we consider the transcription regulations modeled by


Formula 1

(1)
{nabla}t stands for the operator which represents the rate of change between a small time interval {delta}t, i.e. {nabla}txn colone(xn+1xn)/{delta}t. kd* and kc* denote the degradation rate and the constant term, respectively, for each reactant. The regulatory functions f1 and f2 describe a set of reactions involved in the synthesis of mRNA r that is cooperatively mediated by the transcription factors p1 and p2. The components of these equations are split and assigned to the processes. HFPN of this toy model is illustrated in Figure 2.


Figure 2
View larger version (12K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Illustrative example for pathway modeling based on HFPN. The transcription of mRNA r is cooperatively mediated by the two regulatory proteins p1 and p2 (left panel). The regulatory mechanism in (1) is converted into the graphical representation of HFPN as shown in right panel. Each regulatory function in (1) is assigned to the process as follows: t1:–kdp1xp1, t2:–kp2xp2, t3:f1(xp1)f2(xp2), t4:–kdr xr, t5:kcp1, t6:kcp2.

 
As a regulatory function for transcription regulation, some functional forms have been suggested so far (Matsuno et al., 2000; Nachman et al., 2004; Quach et al., 2007; Roger et al., 2007). Typically, they have been proposed based on the biological observation that synthesis of mRNA molecules responds to increasing concentrations of the corresponding transcription factors with a hill curve. A class of such regulatory functions includes Michaelis–Menten equation (Nachman et al., 2004; Quach et al., 2007; Roger et al., 2007),


Formula

where kmax is the maximum rate of transcription and km is the Michaelis–Menten constant. In general, this regulatory function is used to describe increasing in activity, however, more recently, it has been extended to express the inhibitory regulations by (Nachman et al., 2004).

Alternatively, one may wish to describe an abrupt increasing of mRNA concentration in response to excess concentrations of regulatory proteins. Such a regulatory mechanism is represented by using an algebraic function varying sigmoidally between 0 to a positive upper bound. For example, Chen et al. (2004) used the Goldbeter–Koshland function to mimic the transcription factor activities during the cell-cycle progression of budding yeast. By controlling the tuning parameters, the sigmoid can become steeper (more close to the indicator function) or slow ascent. Alternatively, as a lower level of detail, we may use the indicator function, rather than sigmoid-type function, as


Formula

where k is the rate of transcription and I(x>s) takes value one if the concentration level xn is larger than the threshold value s, otherwise zero. Furthermore, the inhibitory effect of xn on the target mRNA can be represented by f(x)=kI(xn<s).

Combining the regulatory functions for transcription, we can further model a variety of competing and cooperative regulatory relationships. For instance, in order to describe the set of reactions that a protein p1 inhibits transcription level of r coupled with activator protein p2, one may use the product of indicator functions as f1(xp1)f2(xp2) with f1(xp1)=k1 I(xp1,n<s1) and f2(xp2)=k2 I(xp2,n>s2) where ki>0 for i=1,2.

In general, HFPN never limits any classes of regulatory functions that are allowed to use. However, such a high flexibility would cause some difficulties when we conduct statistical inferences and developments of computer program. For example, in order to develop a versatile computer program which supports modeling and parameter estimation, it is necessary to specify any class of regulatory functions. Besides, the use of a model having over complexity may lead to an ill-posed problem for parameter estimation due to limited amount of data available. For these reasons, we propose here a class of regulatory functions. The class of regulatory functions is comprised of bilinear regulators, switching regulator based on the indicator functions and Michaelis–Menten equations that are summarized in Figure 3 with the graphical representations by HFPN. Such restrictions mainly come from computational aspects of the proposed estimation technique which explores the Bayes estimates of parameters by solving a highly non-linear optimization problem. By the aid of the restrictions, the problem can be easily solved as will be shown in Section 3 and Appendix 1 (see Supplementary Material).


Figure 3
View larger version (15K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Class of semi-convex regulatory functions. (a) The bilinear regulators are used for describing binding of reactants, linear degradation and translation. (b) The indicator function and its combinations are used for describing the regulatory mechanism that abrupt increasing (or decreasing) in mRNA concentrations responds to excess accumulation of regulatory proteins. (c) The Michaelis–Menten kinetics are derived by assuming quasi-steady state approximation on the mediator complex of E1 and E3, which synthesizes the product E2 with the consumption of E1 as the substrate. The proto-model of the Michaelis–Menten kinetics is originally governed by the bilinear regulatory functions defined over the four reactants.

 
2.2 Transcription circuit of Circadian clock regulation
Mammalian circadian clocks are controlled by a negative feedback loop of the transcriptional networks, consisting of self-sustained oscillations of mRNA syntheses and their translations. The recent genetic analyses, e.g. Ueda et al. (2002), have identified numerous clock-control genes, e.g. the period genes (per1, per2 and per3) and two cryptochrome genes (cry1 and cry2). The period and cryptochrome genes are known to comprise the negative limb, while clock and bmal1 genes constitute positive limb of the feedback loop in the control of circadian clock (Fujii et al., 2004).

Figure 4 shows the constructed HFPN model of circadian clock-control system that is converted from the mechanisms collected by Fujii et al. (2004). The 12 nodes (entities) are spanned by the five mRNAs (per, cry, rev/erb, clock, bmal), the translated proteins (PER, CRY, REV/ERB, CLOCK, BMAL) and the two protein complexes (PER/CRY, CLOCK/BMAL), respectively. The mathematical model of the transcription circuit includes 12 first-order difference equations as


Formula 2

(2)
in order corresponding to per, PER, cry, CRY, PER/CRY, rev/erb, REV/ERB, clock, CLOCK, bmal, BMAL, and CLOCK/BMAL, respectively.


Figure 4
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. Transcription circuit of the five mRNAs and the regulatory proteins for controlling mammalian circadian clock. The regulatory mechanisms reported by Fujii et al. (2004) are converted into the graphical representation of HFPN.

 
Once the rate constants (k*), the threshold values (s*) and the initial concentrations (x0,i, i=1, ···, 12) are given, we can conduct simulation experiments to observe successive change in the concentrations of the 12 reactants. Hereafter, we consider the problem of finding effective values of these unknown parameters by using time course gene expression data. Note that gene expression data enable us to measure transcript levels of the five mRNAs only whereas any observations for the remaining seven reactants are not available in conventional gene expression analyses. Such a partially observed system may increase difficulty in the problem of parameter estimation. This aspect will be further discussed in later.


    3 PARAMETER ESTIMATION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 
3.1 State space representation
By the conduction of time course experiments of gene expression, one can obtain a series of gene expression vectors yn=(yi,n)1≤i≤pisinRFormula over a set of observed time points nisinO. Suppose that we have a pathway model xn=f(xn–1, {theta}) which defines rates of change in the concentration levels of p entities (reactants), xnisinFormula, with the vector-valued positive function fisinRFormula. The vector of unknown model parameters {theta} consists of rate constants and threshold values. The entire entities, H={1, ···, p}, are spanned by a set of mRNA molecules M and its complementary Mc (proteins). To express a partially observed system, we treat the i-th element of yn, denoted by yi,n, as a missing value if iisinMc. Of interest is then to estimate the m unknown parameters {theta}isin{Theta}subRFormula by using the time course measurements y {equiv} {yn}nisinO.

To derive the learning procedure, we introduce the state space model as


Formula 3

(3)


Formula 4

(4)
where wi,n and vn denote the measurement and system noises independently and identically distributed, respectively. The process of generating yn and xn is assumed to follow (3) and (4) with the initial concentration x0 which is distributed according to a certain distribution x0 ~ p(x0). Note that the xns evolve at the overall time points N by following the stochastic simulation model (4) whereas the observation model (3) is defined over the subset OsubN. In usual, the observed time points in O are of order O(10). Thereby, they are sparsely distributed over the entire time points N which lie in 103 to 104, typically. Besides, the measurement equation (3) is only defined over the observed variables in M which is a subset of entire nodes H.

3.2 Learning algorithm
The task that we address here is to learn the model parameter {theta}, the initial concentration x0 and series of unobserved concentrations x {equiv} {xn}nisinN, based on time course measurements y. In the case of transcription circuit of circadian clock (2), the model parameter consists of the kinetic rate constants and the threshold values. We tackle this problem within the framework of smoothness prior approach (Kitagawa et al., 1996). The problem amounts to finding the minimizer of the optimization problem as follows:


Formula 5

(5)
The objective function J which is minimized over {theta} and x consists of the four terms; squared loss function between observations y and state variables x, regularization term to impose a smoothness on the estimated model f, and prior distributions for the initial concentration x0 and the kinetic parameters {theta}, respectively. The smoothing parameters {zeta}={{varepsilon}, {lambda}i, {eta}i, {gamma}i} determine the trade-off between these four terms. Note that the sign conditions are imposed on {theta} and x because any rate constants, threshold values and concentration levels of reactants should lie in positive regions.

In most case, the regulatory functions f are non-linear. Then, solving (5) becomes computationally hard mainly due to the inequality constraints. Besides, when we are allowed to use totally unconstrained mode of expression for modeling, it would be hard to develop a computer program possessing the general versatility. To avoid such intractability, we employ the class of semi-convex regulatory functions, which is summarized in Figure 3, so that the optimization problem becomes easy to solve. The family includes the bilinear regulatory function used for the modeling of translation, degradation and binding of different reactants, and the indicator function which describes abrupt activation or inhibition of transcription levels of mRNAs in response to excess concentration of regulatory proteins. The Michaelis–Menten kinetic function is also supported by this class. We can take full advantage of the semi-convex regulatory functions to find the solution of (5).

Here, we consider the optimization procedure as follows:

0. Set initial values for x, x0 and {theta}. Assign a set of values to the prior parameters, µ, {omega} and {zeta}.
Repeat the following procedure;
loop {
1. For each node i isin H and time point n isin N {cup} {0}, perform the constraint optimization of J with respect to only xi,n with the other estimands fixed at the current values.
2. For each i=1, ..., m, perform the partial minimization of J with respect to {theta}i with the other estimands fixed at the current values.
    } until the loop attains at convergence.
The method alternates the partial optimization of J with respect to a single quantity under full conditioning of the other quantities. If a class of regulatory functions is restricted to be the semi-convex family that we showed, the objective function J in Step 1 and 2 becomes convex or approximately convex under the full conditioning. Then, the constrained optimization problem in each step can be solved (at least on the approximation) by applying techniques of convex optimization that have been widely studied in the field of statistical science and machine learning. For comprehensive review of convex optimization, see Boyd et al. (2004) and Scholkopf et al. (2001). Algorithmic details are omitted here, but in Appendix 1 (Supplementary Material), we detail procedures, derivations and some implementational issues of the proposed technique. The FORTRAN 77 source codes which support the model building and the parameter estimation are accessible at our website (see the URL given in Supplementary Information).


    4 MODEL SELECTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 
4.1 Bayesian view of smoothness prior
Once a set of unknown parameters has been found for which the solutions fit data, we proceed to the stage of model evaluation. Suppose that, aside from the proto-model in Figure 4, we have the alternative models as shown in Figure 5 where some additional regulations are included in the proto-model. Of interest is then to compare the ability of the hypothetical models.


Figure 5
View larger version (20K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Three hypothetical pathways for transcription networks of circadian clock (Model 0, 1 and 2): (Model 0) proto-model shown in Figure 4, (Model 1) additional transcription regulation is included from PER/CRY complex to clock mRNA in Model 0 (indicated by green), (Model 2) unknown mRNA (hypo) and its translated protein (HYPO) are included in Model 1 as a set of mediators for transcription pathway from CLOCK/BMAL complex to cry. The mathematical equations for these models are shown in Supplementary Table 1.

 
In the field of statistical science, one of the most commonly used techniques for model evaluation is a class of cross-validations which are basically available for the purpose that we consider. However, we should avoid the use of cross-validations because the limited amount of data often leads to unstableness of the parameter learning with the validation dataset which is generated by removal of part of data. Therefore, we have decided to seek out the way based on the information criterion. Note that no assumption has been made so far about the probabilistic distributions of xn and yn. To proceed to tests based on information criterion, it is necessary to elucidate the probabilistic distributions underlying the parameter learning method.

Recall that the optimization algorithm of parameter learning explores kinetic parameters and unobserved concentrations of biochemical reactants under the loss function given by


Formula 6

(6)
Note that the loss function shown here is multiplied by an arbitrary positive value c. Obviously, any solutions of (6) are invariant under the choice of multiplicative constant c because the proposed estimator needs to determine only a ratio between the smoothing parameters in {zeta}. The difficulty caused by such scale-invariance nature will be further discussed.

Taking the exponential of (6) after dividing it by –2 yields the truncated Gaussian distribution (positive part) as a joint model of y, {theta} and x as,


Formula

where S1=diag{{gamma}1/c,···, {gamma}m/c}, S2=diag{{eta}1/c, ···, {eta}p/c} and S3=diag{{lambda}1/c, ···, {lambda}p/c}. Here, the density function {varphi}+(z; r, V) indicates the positive-part Gaussian distribution with mean r and covariance matrix V. The truncations of the density functions correspond to the sign conditions in the problem (5).

The solutions of the original problem (5) can be found by maximizing the posterior distribution p{zeta}/c(x, x0, {theta}|y){propto}p{zeta}/c(y, x, x0, {theta}) with respect to the kinetic parameters and the hidden state variables conditional on the observations. Consequently, the problem (5) or (6) is converted to the maximum a posteriori (MAP) estimation with the assumption of truncated Gaussian model. This probabilistic view is essential to derive the Bayesian measures for model evaluation, e.g. Bayesian information criterion (BIC). However, before we proceed, it is needed to consider the lack of uniqueness of noise scale {zeta}/c since most existing methods of model evaluation depend on scale of noises.

4.2 Bayesian measure as a criterion for evaluation of pathway models
We address the problem of evaluating models within the Bayesian framework. In the field of Bayesian statistical learning, it is common place to evaluate the performance of models based on the marginal likelihood,


Formula 7

(7)
where the integrand is defined by the product of p{zeta}(y|x), p{zeta}(x|{theta}, x0), p{zeta}(x0) and p{zeta}({theta}) in order corresponding to the observation model, the system model and the prior distributions for initial concentrations and kinetic parameters, respectively. The BIC can be derived by expanding (7) around a proper estimator, for example, the MAP estimator {theta}MAP, according to the Laplace approximation, as


Formula 8

(8)
The specified dispersion parameters and the estimator are indicated in the argument of (8).

In the integral of (8), the observation model p{zeta}(y|x) measures the discrepancy between sequences of yn and xn, which are generated by the estimated model f(·; {theta}MAP) with the inclusion of quasi-system noises vn. While the pathway models that we consider are comprised of a set of deterministic difference equations, the quasi-system noises are included during the process of simulations according to p{zeta}(x|{theta}MAP, x0). This criterion evaluates the average discrepancy between observations and state variables based on the Gaussian distribution {varphi}+(yi,n; xi,n,{varepsilon}) where the support of xns is integrated with regard to the measures {varphi}+(xn; f(xn–1; {theta}), S3). The idea is to assess the sensitivity of pathway model against the probabilistic perturbations by the inclusion of vn. {If the robustness of in silico pathway is weak, simulated concentrations by the estimated stochastic difference equations are likely to show large variations. Then, for repeated experiments of stochastic simulations, the average discrepancy between observations and state variables becomes large. This thereby yields a small value of BIC. The BIC conversely assigns a higher score to a model if it is robust to the noise contaminations. From this view point, we obtain the additional understanding of underlying principle behind the use of BIC: in silico pathways robust to cellular perturbations are considered as having superiority.

4.3 Sensitivity analysis of in silico pathways
Here, we provide the way of selecting values of the dispersion parameters {zeta} in the model evaluator (8). In order to specify a scale of noise components, we introduce the empirical Bayes (EB) estimator {theta}EB[{zeta}'] (Carlin et al., 1996), which is defined as the maximizer of the joint distribution of y and {theta}, i.e.


Formula

where the hidden state variables xns are integrated out from the joint model p{zeta}'(y, x, x0, {theta}). Hereafter, the dispersion parameters of the EB estimation are denoted by {zeta}' whereas {zeta} indicates those corresponding to the MAP estimation.

For the EB estimator, the BIC can be well defined and computed in the standard way as


Formula 9

(9)
in which the estimator computed under the {zeta}' is substituted.

In what follows, we exploit the EB estimator to find an appropriate scale of the dispersion parameters corresponding to the MAP estimation. The basic idea is that if there exists a set of dispersion parameters {zeta}' so that the corresponding EB estimator is equal to the MAP estimator, i.e. {theta}EB[{zeta}']={theta}MAP[{zeta}], one can define the formula of BIC corresponding to the MAP estimation as


Formula 10

(10)
In this formula, the estimator in (9) is replaced by {theta}MAP[{zeta}] rather than {theta}EB[{zeta}'] whereas the dispersion parameters remain to be {zeta}' but satisfy the equivalency {theta}EB[{zeta}']={theta}MAP[{zeta}].

To find the dispersion parameters {zeta}' yielding the equivalency between EB and MAP, we state the following theorem:

THEOREM 1.
If there exists a set of dispersion parameters {zeta}' satisfying the following three conditions,


Formula

then, the MAP estimator derived under the {zeta} can be achieved by the EB estimator using the {zeta}'.

PROOF
The proof is omitted here, shown in Appendix 2 (Supplementary Material). {blacksquare}

The conditions state that the limits of ratios between the dispersion parameters {zeta}' approach to the ratios between {zeta} as the observational noise dispersion of the EB estimator approaches zero, i.e. {varepsilon}'->0. Note that the dispersion parameters, {lambda}i', {eta}i' and {gamma}i', are assumed to be the functions of {varepsilon}', and to approach to zero as {varepsilon}'->0.

This statement gives an additional insight to understanding statistical machinery of the MAP estimators. It implies that the MAP estimation can be achieved by performing the EB estimation with the dispersion parameters satisfying the C1–C3. This result provides us the idea that the MAP can be approximated by the EB estimation with the sufficiently small {zeta}' if the {zeta}' satisfies the conditions C1–C3 approximately.

We use this statement in order to derive the procedure of Bayesian model evaluation. It is organized by the following steps:

  1. Set a sufficiently small value to the dispersion parameter of observational model, {varepsilon}', in (10).
  2. Compute the other dispersion parameters {lambda}i', {eta}i' and {gamma}i' according to


    Formula

    with the {zeta} which is used for computing the MAP estimator {theta}MAP.

  3. Compute Ba[{zeta}', {theta}MAP] by following (10).

Note that if the conditions C1–C3 are satisfied, their first-order expansions are given as,


Formula

These expressions lead to the approximation formulae in Step 2. In Step 3, the MAP estimator is substituted in the plug-in distributions of (10) while the dispersion parameters are given by the approximated {zeta}'.

It should be noted that all dispersion parameters, computed in Steps 1 and 2, are given by small values because we choose a fairly small value for {varepsilon}' in Step 1 that yields the small dispersions in Step 2. Consequently, during the process of model evaluation, sensitivity of in silico models is assessed based on the Gaussian likelihood with an exceedingly small variance {varepsilon}' in response to the perturbations of systems with small noise components {lambda}i', {eta}i' and {gamma}i'.

The remaining task to be addressed is the computation of the integral in the formula of (10). To this end, we exploit the conventional Monte Carlo integration. Then, Step 3 is replaced as follows:

  1. (3-1) Run the Monte Carlo simulations M times with the constructed model f{theta}MAP under the successive perturbations with the small noises,


    Formula


  2. (3-2) Approximate the BIC by replacing the integral in (10) by the average taken over the M random draws,


    Formula


From the procedure of the Monte Carlo experiments, we can see also that the proposed criterion is founded on the intuition that the noise sensitivity of pathway model is measured based on an exceedingly spiked Gaussian likelihood under the perturbations of system with small noise components.


    5 NUMERICAL RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 
As an illustration, we demonstrate the proposed technology on the pathway analysis with the time course gene expression data of Ueda et al. (2002). Ueda et al. (2002) studied transcription circuits underlying mammalian circadian clocks by the conduction of time course microarray experiments. By using Affymetrix mouse high-density oligonucleotide probe array (GeneChip), time courses of mRNA concentrations were measured during 44 h (about 2 days) at the equally spaced 12 time points T={0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44} (hour) under conditioned light/dark cycles. This transcriptome analysis has shown that in the suprachiasmatic nucleus (SCN), transcript levels of 16 clock and clock-related genes exhibit high-amplitude oscillations in accordance with circadian rhythm. Of the 12 reactants in the constructed models, temporal expressions of the four mRNAs (per2, cry1, rev/erb-{alpha}, bmal1) were observable from the dataset (expression profiles of clock were not shown in Ueda et al. (2002). For each mRNA, we extracted the 12 data points from Figure 1a of Ueda et al. (2002).

After converting the sampling time set T into O={1, 64, 128, 191, 255, 318, 382, 445, 509, 572, 636, 700} where the entire time points are defined by N={1, 2, ..., 699, 700}, we conducted the estimations of model parameters for the three hypothetical models which are respectively shown in Figures 4 and 5. Note that in the original model (Model 0), there are no regulatory mechanisms which yield oscillating concentrations of clock mRNA. However, as was reported by Ueda et al. (2002), observed transcription levels of clock mRNA exhibit oscillating patterns in liver clock tissues of mouse (see http://sirius.cdb.riken.jp/MouseLiver/MouseLiverCCG(020707).html). For Model 1, the additional transcription regulation is included from PER/CRY complex to clock in Model 0 in order to yield quasi-oscillation of clock gene. We also constructed Model 2 where the unknown mRNA (hypo) and its translated protein (HYPO) are included in Model 1 to compensate the transcription pathway from CLOCK/BMAL to cry. The mathematical equations for these models are shown in Supplementary Table 1. This experimental setup is intended for the modelers who only have models based on incomplete knowledge about a biological pathway.

For each model, the prior parameters for the rate constants, the threshold values and the initial concentrations are summarized in Supplementary Table 1. Most of these values were chosen based on those reported by our collaborators (Fujii et al., 2004; Matsuno et al., 2006). We used the same prior values for the kinetic parameters and the initial states, which are shared by the three models whereas we placed the additional prior values on the parameters associated with the newly added regulations in Models 1 and 2. Here, in order to test the sensitivity of the proposed estimation technique against the choice of prior distributions, the biased values were imputed in these prior means. For example, the binding rate constant for PER/CRY, kb2,4, and degradation rates of some mRNA molecules were shifted towards larger values. As shown in Figure 6a, the simulation paths generated with the biased prior means were largely different from the periodicity of observations. The smoothing parameters {zeta}=({varepsilon}, {lambda}i, {eta}i, {gamma}i) that we used were chosen as {varepsilon}=1, {lambda}i=0.1, {eta}i=10, {gamma}i=5 for all i. This specification offers relatively diffuse priors on {theta}.


Figure 6
View larger version (34K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6. Summary of data analysis: (a) Results of parameter learning. For Model 1 and Model 2, the simulated paths generated with the estimated kinetics and initial concentrations are shown, respectively, in the left and right panels (blue line) with the gene expression measurements (dots). The gray lines indicate the simulated paths which were generated with the specified prior means. (b) Probabilistic perturbations of per (top) and rev/erb (bottom) for Model 1. (c) Probabilistic perturbations of per (top) and rev/erb (bottom) for Model 2.

 
We repeated the procedure of parameter learning under 50 different starting values for the rate constants, the thresholds and the initial concentrations, which were drawn from the prior distributions (positive-part Gaussian). The starting values for the state variables, xn, n=1, ..., 700, were generated by the run of simulations with the initial parameters. For each model, the estimated parameters were computed by the lowest local minima which were selected during the 50 times revisions of initial parameters. For Model 1 and Model 2, results of parameter learning are summarized in Figure 6a with Supplementary Table 1. Here, detailed discussion about the estimation results for Model 0 is omitted, but it is briefly summarized in Supplementary Table 1. In Figure 6a, each panel shows the simulated time courses of 12 reactants with the 12 data points in the time course measurements. Both estimated models seem to capture the periodicity of circadian rhythm. However, it is observed that the length of cycle identified by Model 1 is slightly shorter than that of Model 2. Indeed, for Model 1, the time intervals between the first two peaks were approximately 21 h, e.g. 20 h 59 min for per mRNA, which is much shorter than 24 h (note that the data were calibrated during the controlled light/dark cycle over 2 days). In spite of a wide choice of initial parameters, Model 1 could not identify the circadian rhythm. To reduce the mismatch of Model 1, we employed the alternative pathway (Model 2) which assumes existence of the additional two hypothetical reactants and the corresponding regulatory mechanisms. By the model revision, the inconsistency in the cycle of Model 1 has been corrected greatly (see the right panel of Fig. 6a) where the time intervals between the first two peaks were ~24 h, e.g. 24 h 15 min for per mRNA.

We demonstrate here the method of PF with the parameter estimation of Model 1 for a benchmark comparison with the proposed optimization method. The PF explores the joint posterior distributions of kinetic parameters and initial concentrations by repeating the resampling of initially-generated Monte Carlo samples from prior distributions. The algorithm is briefly summarized in Supplementary Figure 1 (see also Kitagawa et al., 1996). According to Higuchi (1997), the algorithmic nature of PF is essentially the same with the genetic algorithm. Hence, the ability of the PF shown here can be used as a rule of thumb for the performance of genetic algorithm, e.g. the method proposed by Koh et al. (2006). Supplementary Figure 1 shows the simulation paths which were drawn with the estimated parameters by the PF where the numbers of Monte Carlo samples were fixed at 104 and 5 x 104, respectively. Obviously, the estimated parameters fail to capture the periodicity of circadian rhythm. As the number of unknown parameters becomes large, the PF and genetic algorithm require us to generate an exceedingly large number of Monte Carlo samples (initial values) in order to cover the overall parameter space. Usually, the number of required samples becomes large quickly with exponential order of the number of unknown parameters. For example, genetics algorithm explores values of parameters by alternating cross-over and mutation of initially generated seeds for the estimands. Thereby, it is a key for success whether or not initial set of candidate parameters lies in the region close to the optimal values. Since the ability of such approaches is limited obviously by the case where model contains much larger number of unknown parameters, we adopted the strategy based on gradient descent learning.

For the proposed model evaluators which test the noise sensitivity of the hypothetical pathways (Models 1 and 2), we set {varepsilon}'=10–5. Figure 6b and c display the generated simulation paths of per and rev/erb under 100 times run of stochastic simulations (M=100) with Model 1 and Model 2, respectively. For Model 1, most simulated paths largely deviate from the solution of MAP estimation (blue line). In particular, some of them failed to exhibit any periodicity. Such weakness of Model 1 against the probabilistic perturbations has been overcome through the inclusion of the compensation pathway on CLOCK/BMAL to cry as the perturbed simulations, as shown in Figure 6b, were very close with each other. Indeed, the computed BIC scores, which were –119.275 and –101.881 for Model 1 and Model 2, respectively, indicate the dominance of Model 2.

We also verified cyclic periodicity of the identified transcription circuits based on the persistent run of simulation experiments. For each model, the simulated paths were generated during the successive 2800 time points which consist of approximately eight cycles of circadian rhythm. It is observed from Figure 7 that the peak positions shown by Model 1 gradually precede those of Model 2 as time advances. The observed time intervals between the eight peaks (upper-side) were 20.99, 20.55, 20.55, 20.55, 20.49, 20.55, 20.55, 20.49 h for Model 1, and 24.26, 22.75, 22.51, 22.57 22.57 22.57, 22.57 h for Model 2, respectively. Note that for Model 2, the lengths of cycle decrease by about 23 h after the first 44 h (about 2 days under light/dark controlled-experiments). This periodicity is consistent to the inherent circadian rhythm of mouse (Jin et al., 1999; Vitaterna et al., 1994). In conjunction with the Bayesian sensitivity analysis that we showed, this periodicity test also ensures the advantage of Model 2.


Figure 7
View larger version (76K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 7. Results of periodicity test. In each plot, the simulated paths of Models 1 and 2 are depicted by green and red lines, respectively. The time points corresponding to diurnal cycles are indicated by vertical lines (blue).

 

    6 CONCLUDING REMARKS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 
We have developed the method of genomic data assimilation with intent to establish a link between conventional statistical learning of biological pathways and in silico pathway simulation. The proposed method is organized by the four steps: (1) knowledge-based modeling with HFPN, (2) estimation of model parameters by solving the optimization problem of smoothness prior, (3) Bayesian evaluation of hypothetical models, (4) remodeling where needed. We have addressed these tasks within the framework of non-linear state space models.

The concept of data assimilation was originally emerged in the field of earth sciences, e.g. meteorology and oceanography. The objective is to integrate information from observed data and a numerical model for geophysical simulation. Another goal of this article was to introduce the concept of data assimilation in the field of in silico systems biology.

In the context of HFPN, several techniques have been proposed for parameter learning. For example, Koh et al. (2006) proposed the genetic algorithm which estimates the rate constants of in silico signal transduction pathways under the principle of least square. However, the proposed loss function ignores the regularization terms which incorporate prior values of rate constants. Inclusion of regularization terms is essential for the parameter estimations of pathway models because limited amounts of training data often cause unstableness or ill-posed problem for the parameter estimation. Besides, it is argued that the genetic algorithm requires a great deal of computational efforts. According to the report of Koh et al. (2006), the computational time which is required to run the genetic algorithm is about 3 h on an average where the number of reactants in the model is about 10. Compared with the method of Koh et al. (2006), our approach could successfully reduce the computation time; as in the numerical experiments shown in Section 5, the algorithm converged within few minutes CPU time on average (Intel Core(TM)2 Quad processor, 2.66 Ghz, FORTRAN77).

In this article, we demonstrated the ability of the proposed method with the transcription networks of circadian clock that contain about 50 or more unknown parameters. The proposed optimization algorithm succeeded in the identification of periodicity of the observed circadian rhythm. However, some shortcomings should be argued here. One limitation is related to the observed fact that the method sometimes failed to capture the periodicity of circadian rhythm when the choice of initial parameters is fairly inappropriate. This aspect is illustrated in Supplementary Figure 2. In the numerical experiments, we conducted the optimization procedure with the biased initial parameters which generate the simulation paths largely different from the observed periodicity. In this case, the method failed to identify the observed cycle of circadian clock as indicated by the left panel of Supplementary Figure 2. However, when the initial parameters were replaced by alternative ones, the performance of the proposed method has been greatly improved as shown in the right panel of Supplementary Figure 2. The both sets of initial parameters were generated from the same prior distributions used in Section 5. In order to overcome the sensitivity to the choice of initial parameters, it is needed to adopt the computer intensive approach which explores potential local modes under widely chosen initial parameters. Indeed, by repeating the optimization 50 times with different starting values, we succeeded in finding of the parameters which appropriately capture the periodicity of circadian rhythm as shown in Section 5.

Another shortcoming of the current technology is the lack of strategy for model reconstruction. Inconsistency between model and observations indicates further aspects of the mechanism that require model revision. This task is especially important for pathway learning. In this area, it has been relatively overlooked so far. For the creation of hypothetical models, we can exploit a variety of external knowledge on interactome, e.g. protein–protein interactions reported by yeast two hybrid experiments, DNA–protein interaction database such as TRANSFAC. Systematic way of integrating external information on interactome to the data assimilation technology should be discussed as a future issue.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 
We really appreciate all of the comments and the suggestions from the three anonymous referees that improved the quality of our aricle considerably.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Limsoon Wong

Received on March 29, 2008; revised on September 8, 2008; accepted on September 9, 2008

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PATHWAY MODELING
 3 PARAMETER ESTIMATION
 4 MODEL SELECTION
 5 NUMERICAL RESULTS
 6 CONCLUDING REMARKS
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Beal M, et al. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors. Bioinformatics (2005) 21:349–356.[Abstract/Free Full Text]

    Boyd SP, et al. Convex Optimization (2004) Cambridge, UK: Cambridge University Press.

    Carlin BP, et al. Bayes and Empirical Bayes Methods for Data Analysis (1996) New York: Chapman & Hall/CRC.

    Chen KC, et al. Integrative analysis of cell cycle control in budding yeast. Mol. Biol. Cell (2004) 15:3841–3862.[Abstract/Free Full Text]

    Doi A, et al. Constructing biological pathway models with hybrid functional Petri nets. In Silico Biol. (2004) 4:271–291.[Medline]

    Doi A, et al. Simulation-based validation of the p53 transcriptional activity with hybrid functional Petri net. In Silico Biol. (2006) 6:1–13.[Medline]

    Fujii, Y. et al. (2004) A new regulatory interactions suggested by simulations for circadian genetic control mechanism in mammals, Genome Inform. available at http://www.jsbi.org/journal/GIW04/GIW04P003.pdf. (Last accessed date, September 26, 2008).

    Higuchi T. Monte Carlo filter using the genetic algorithm operators. J. Stat. Comput. Simulation (1997) 59:1–23.

    Hirose O, et al. Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models. Bioinformatics (2008) 23:932–942.

    Imoto S, et al. Computational strategy for discovering druggable gene networks from genome-wide RNA expression profiles. Pac. Symp. Biocomput. (2006) 11:559–571.

    Jin X, et al. A molecular mechanism regulating rhythmic output from the suprachiasmatic circadian clock. Cell (1999) 96:57–68.[CrossRef][Web of Science][Medline]

    Kitagawa G, et al. Smoothness Priors Analysis of Time Series (1996) New York: Springer-Verlag.

    Koh G, et al. A decompositional approach to parameter estimation in pathway modeling: a case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics (2006) 22:e271–e280.[Abstract/Free Full Text]

    Matsuno H, et al. Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. (2000) 5:341–352.

    Matsuno H, et al. A new regulatory interaction suggested by simulations for circadian genetic control mechanism in mammals. J. Bioinform. Computat. Biol. (2006) 4:139–153.[CrossRef]

    Nachman I, et al. Inferring quantitative models of regulatory networks from expression data. Bioinformatics (2004) 20:i248–i256.[Abstract]

    Nagasaki M, et al. Genomic data assimilation for estimating hybrid functional Petri net from time-course gene expression data. Genome Inform (2006) 17:46–61.

    Perrin BE, et al. Inference of gene regulatory network using dynamic Bayesian networks. Bioinformatics (2003) 19:ii386–ii349.

    Quach M, et al. Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference. Bioinformatics (2007) 23:3209–3216.[Abstract/Free Full Text]

    Rangel C, et al. Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics (2004) 20:1361–1372.[Abstract/Free Full Text]

    Rogers S, et al. Bayesian model-based inference of transcription factor activity. BMC Bioinformatics (2007) 8(Suppl_2):S2.

    Scholkopf B, et al. Learning with kernels (2001) Cambridge, MA: MIT Press.

    Tasaki S, et al. Modeling and estimation of dynamic EGFR pathway by data assimilation approach using time series proteomic data. Genome Inform. (2006) 17:226–238.[Medline]

    Ueda, et al. A transcription factor response element for gene expression during circadian night. Nature (2002) 418:534–539.[CrossRef][Web of Science][Medline]

    Vitaterna MH, et al. Mutagenesis and mapping of a mouse gene, clock, essential for circadian behavior. Science (1994) 29:719–725.[CrossRef]

    Yamaguchi R, et al. Finding module-based gene networks in time-course gene expression data with state space models. IEEE Signal Proces. Mag. (2007) 24:37–46.[CrossRef]

    Yoshida R, et al. Estimating time-dependent gene networks from time series DNA microarray data by dynamic linear model with Markov switching. Proc. IEEE Comput. Syst. Bioinform. Conf. (2005) 289–298.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/22/2592    most recent
btn483v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Yoshida, R.
Right arrow Articles by Higuchi, T.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Yoshida, R.
Right arrow Articles by Higuchi, T.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?