Bioinformatics Advance Access originally published online on August 29, 2006
Bioinformatics 2006 22(21):2681-2687; doi:10.1093/bioinformatics/btl445
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Markov Chain Monte Carlo Algorithm based metabolic flux distribution analysis on Corynebacterium glutamicum
1 Signal Processing and Complex Systems Research Group, Department of Automatic Control and Systems Engineering, University of Sheffield Sheffield, S1 3JD, UK
2 Biological and Environmental Systems Group, Department of Chemical and Process Engineering, University of Sheffield Sheffield, S1 3JD, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Metabolic flux analysis via a 13C tracer experiment has been achieved using a Monte Carlo method with the assumption of system noise as Gaussian noise. However, an unbiased flux analysis requires the estimation of fluxes and metabolites jointly without the restriction on the assumption of Gaussian noise. The flux distributions under such a framework can be freely obtained with various system noise and uncertainty models.
Results: In this paper, a stochastic generative model of the metabolic system is developed. Following this, the Markov Chain Monte Carlo (MCMC) approach is applied to flux distribution analysis. The disturbances and uncertainties in the system are simplified as truncated Gaussian multiplicative models. The performance in a real metabolic system is illustrated by the application to the central metabolism of Corynebacterium glutamicum. The flux distributions are illustrated and analyzed in order to understand the underlying flux activities in the system.
Availability: Algorithms are available upon request.
Contact: visakan{at}sheffield.ac.uk
| 1. INTRODUCTION |
|---|
|
|
|---|
To manipulate the product yield in a metabolic network, it is necessary to study the behaviour of the fluxes, i.e. reaction rates of the metabolic system. The quantification of all intracellular metabolic fluxes in a given model of the cellular metabolism, i.e. metabolic flux analysis (MFA), has long been recognized as an important approach in metabolic network analysis (Bailey, 1991; Stephanopoulos et al., 1998).
The common approach for MFA is based on system stoichiometric information, where the stoichiometric matrix represents the flux flow balances among various reaction pools (Varma and Palsson, 1994; Bonarius et al. 1997). The measured extracellular substrate uptake rates from the medium, secretion rates of the products from the cells and the assumptions on enzyme activities are posed as constraints to the system. Once the number of constraints equals the number of system unknowns, the intracellular fluxes can be uniquely determined. However, potential pitfalls of the stoichiometry-based method become obvious when there is an occurrence of parallel reactions or cyclic pathways, such as the TCA cycle in the metabolic system under consideration, preventing reliable estimation of the metabolic fluxes.
An alternative approach to overcome this drawback is to use 13C tracer experiments (Anderson, 1983), which feed a substrate with a known labelling state into the system during the stationary state. After the labelling isotopic balances are reached, the labelling states of the intracellular metabolites can be measured using NMR and/or gas chromatography mass spectrometry (GC-MS) (Max et al., 1996; Wiechert and de Graaf, 1997; Wittman and Heinzle, 2002), where the accuracy of GC-MS experiment has been improved dramatically in recent years (Wittman and Heinzle, 2002; Klapa et al., 2003). In the case where the substrates are labelled at specific atom positions, such as [1-13C]glucose, the fractional enrichment labelling state of a specific atom position (Sonntag et al., 1993) can be obtained. When the system is fed with mixtures of both uniformly labelled and non-labelled substrates, it is possible to follow the cleavage of covalent bonds in the carbon backbone of the biomolecules (Szyperski, 1995) and to derive the isotopomer labelling state of the system. The carbon labelling balances are often posed as an overdetermined system, providing additional constraints to the intracellular bidirectional system. Numerous optimization approaches have been studied to deal with the isotopomer labelling model (Schmidt et al., 1997, 1999) and fractional labelling model (Marx et al., 1996; Wiechert and de Graaf, 1997; Wiechert et al., 1997; Yang et al., 2005a) in order to derive intracellular flux quantities.
However, stochastic mechanisms are ubiquitous in biological systems. Both intrinsic noise rooted in the biochemical process of gene expression and extrinsic noise from the fluctuations in other cellular components contribute substantially to the overall variation (Elowitz et al., 2002). Though the stoichiometric structure of a metabolic network remains intact during the process, protein abundance originating from the intrinsic and the extrinsic noise (Raser and O'Shea, 2005; Elowitz et al., 2002; Blake et al., 2003) will result in the fluctuation in 13C atoms perturbed in the 13C tracer experiment. Though a stochastic method like genetic algorithm (Schmidt et al., 1997) is capable of obtaining flux estimations with Gaussian noise assumption, the elitism nature prevents these methods from providing any biologically-meaningful flux distribution information under extrinsic and intrinsic noise. Moreover, the actual distributions of the system noise can vary for different systems. Hence, an adequate and unbiased flux quantification requires a stochastic approach, which takes the noise in both system measurements and uncertainties into account explicitly. It is expected that both the flux estimates and the flux distribution information can be obtained by such an approach. In this paper, we apply the MCMC algorithm for flux distribution analysis. The MCMC algorithm jointly estimates the distributions of the fluxes and the metabolites, in addition to the single estimated values. We extend the simplified MCMC approach of Yang et al. (2005b) and develop a stochastic generative model with explicit representation of noise and uncertainties that can handle multiplicative noise. Flux samples are generated by a deliberately-constructed Markov chain and the flux distributions are illustrated by histogram analysis of these samples.
The paper is organized as follows. In section 2, the metabolic system model through 13C tracer experiments is abstracted using concise matrix equations. In section 3, the detailed rejection sampling algorithm for metabolic flux distribution analysis is presented. In order to examine the applicability of the proposed approach, the central metabolic system of the organism, Corynebacterium glutamicum, under lysine-producing conditions, is utilized in section 4 and the biological significance of the results is analyzed. The conclusions are given in section 5.
| 2. SYSTEMS AND METHODS |
|---|
|
|
|---|
Consider a simple metabolic system with m metabolites and n fluxes (including both the forward fluxes and the backward fluxes), which is viewed as a network composed of a set of metabolites M and fluxes vo, where {
} and {
} normally with m < n. When the system reaches a (quasi-)steady-state, it is often assumed that the fluxes entering a metabolic pool balance the fluxes flowing out of it. Hence, for a particular metabolic pool
,
![]() |
1 and
2 are the two sets inclusive of all fluxes entering the pool K and all fluxes out of it, respectively. ki represents the number of the flux vi going into or flowing out of the pool K. Since the extracellular fluxes are often known to a metabolic system, and there is also common knowledge about the unidirectionality of some intracellular pathways according to thermo-dynamic considerations, the stoichiometric constraints can then be formulated as,
![]() | (1) |
has all the fluxes which are already measured or known to the system with
. Here
(normally with m < n < n) represents the reactional indices for each metabolite and its relevant fluxes. Assuming that the rank of the matrix S is r, the number of free fluxes in Equation (1) is then n1r. If we use vf and vr to represent the free fluxes and the restricted fluxes in v, Sf and Sr represent the corresponding sub-matrices in S, Equation (1) can be revised as,
![]() | (2) |
![]() | (3) |
Here the set Xo, X and
are utilized to denote the whole fractional enrichment dataset, the intracellular fractional enrichment data and the extracellular fractional enrichment data of metabolites set M, respectively, with
. After the labelled substrates are fed into the system and the system reaches its carbon balance equilibrium (it is often assumed that this equilibrium is the same as the steady-state reached by the substrate without any labelling), a balance equation can be derived from the carbon mass balance for the I-th carbon atom:
![]() |
1 contains every carbon atom s connected to the carbon atom i by an associated flux vs and
2 includes all the fluxes flowing out of the carbon atom i. Ts>i is the carbon atom mapping vector (Zupke and Stephanopoulos, 1994) from carbon atom s to carbon atom i.
The above equations can be formulated in a concise format with either positional enrichment data x or fluxes v appearing linearly,
![]() | (4) |
![]() | (5) |
![]() |
![]() | (6) |
![]() | (7) |
The solution has to satisfy the non-negative constraints of the fluxes.
are the least squares estimates of the fluxes v.
However, in practice, the measurements available are always contaminated with noise, and moreover, only partial measurements of the fractional enrichment data are available. Due to the non-negative properties of the measurements, if we assume the noise model on the measurements y is a multiplicative truncated Gaussian, then the relationship between y and x will be:
|
| (8) |
and
is the multivariate truncated zero mean Gaussian noise with the covariance
.
denotes the Schur product and 1 is a vector with all ones. The matrix H is the correlation matrix between the measurements and the system fractional enrichments, denoting the available measurements from the experiments. It was also found that the matrix
in Equation (5) is normally a non-singular square matrix, except in the circumstance where the carbon atom network becomes disconnected by virtue of vanishing fluxes (Anderson, 1983), which can be easily avoided in a real metabolic system. Therefore, a deterministic solution to fractional enrichment data can be obtained once the fluxes are available:
![]() | (9) |
![]() |
![]() | (10) |
| 3. ALGORITHM |
|---|
|
|
|---|
MCMC was formally introduced into scientific computation in 1953 (Metropolois et al., 1953). A MCMC algorithm generates an ergodic homogeneous Markov chain z(n) with a stationary distribution
(z), which is only known up to a multiplicative constant (Gilks et al., 1996). Starting from any initial states, a MCMC using a delicately constructed transition kernel is expected to produce the samples asymptomatically converging in distribution to
(z), provided that the transition kernel allows the states to move freely. The most commonly used MCMC algorithms are the MetropolisHastings algorithm (Hastings, 1970; Peskun, 1973) and the Gibbs sampler (Gelman and Gelman, 1984; Gelfand and Smith, 1990), where the Gibbs sampler is just a special case of the MetropolisHastings approach. In the MetropolisHastings algorithm, the i-th sample zi is generated by the following strategy: |
|
where q(·) is the proposed transition kernel, also known as proposal distribution and U(0,1) is a random sample drawn from the uniform distribution between 0 and 1. The initial sample z0 is often randomly picked from all possible states. The requirement for a good proposal distribution is that the distribution can be easily sampled from and there exists a M, so that,
(Gilks et al., 1996). In the Gibbs sampling algorithm, the proposal distribution q(·) is the full conditional distribution of z,
. Here zm is the m-th component in the vector z and zm is a vector composed of all parameters in z except zm. Since the acceptance rate of the Gibbs sampling algorithm is 1, it is clear that the Gibbs sampling algorithm is just a special case of the single-component MetropolisHastings sampling with 100% acceptance rate.
Considering the unavoidable noise in gene expression levels, Equations (4) and (5) then need to be reformulated as,
![]() | (11) |
![]() | (12) |
·
is the truncated zero mean Gaussian noise with covariance
. Due to the non-singularity property of the matrix
, the conditional dependence of x given v can be derived from Equation (12),
![]() | (13) |
is the inverse matrix of matrix
(vo). From Equation (8), the conditional dependence of the system measurement y on x is then,
![]() | (14) |
and X is a diagonal matrix whose element
i,j in the i-th row and j-th column satisfies
Hence, by incorporating Equations (13) and (14), the full conditional distribution of the fluxes v and the latent variables x are given by,
![]() | (15) |
![]() | (16) |
Though the full conditional distributions of the two latent variables x and v are both viewed as the truncated Gaussian distributions, direct sampling from them is unattainable due to the function format
in Equation (15) and the non-invertible matrices H and
in Equation (16). Therefore, we apply the rejection sampling algorithm to obtain samples for the two full conditional distributions. Assuming that the probability distribution of an unknown quantity z is p(z) and there is an envelope function q(z) such that
, the basic Metropolis sampling algorithm to get the i-th sample zi from p(z) is given by,
DoSample zi from the density proportional to q(z);
if
accept zi; end
until zi is accepted
It is often required that the envelop is a heavy-tailed function so as to fully cover the target distribution, hence, the uniform functions
and
between allowed ranges for the fluxes
and the positional enrichment data
are adopted in the proposed MCMC algorithm with the formats
and
. The MCMC algorithm convergence diagnostic has been discussed thoroughly in Brooks and Roberts (1998). Here we monitor the convergence using the simulation output by the estimated potential scale reduction (Gilks et al., 1996), which is essentially a function of the between-sequence variances and the within-sequence variances. The complete procedure for sampling from the full conditional distributions of both x and v are outlined in Table 1.
|
| 4. APPLICATION |
|---|
|
|
|---|
In this section, the proposed MCMC approach is applied to flux distribution analysis in the central metabolism of C.glutamicum, which was investigated in Marx et al. (1996) and Wiechert et al. (1997), where partial fractional enrichment data were obtained by NMR spectroscopy. The central metabolic network is shown in Figure 1, where the reactions Gly1, Gly3, PPP2, PPP3, PPP4, CAC4 and AC are assumed to be bidirectional as in Marx et al. (1996) and Wiechert et al. (1997). All other reactions are assumed to be unidirectional. The complete list of chemical reactions, carbon atom transfer fates and the measured extracellular fluxes are available in Wiechert et al. (1997). In total there are 16 metabolites, which include 64 carbon atoms, and 17 fluxes where we assume that all anaplerotic carboxylation reactions are represented by one single bidirectional reaction step from PEP/PYR to MAL/OAA (Wiechert et al., 1997).
|
The measurements of the partial positional enrichment data (listed in Table 3) and the available extracellular fluxes used in Wiechert et al. (1997) are utilized as known quantities. The covariance
used in Equation (8) is a diagonal matrix
and the backward flux
and the exchange flux
by |
|
|
|
|
From the comparisons in Table 2, it is clear that the MCMC results match smoothly with the previous results obtained by parameter estimation methods. The error produced by MCMC is in the same range as the errors from Marx et al. (1996) and Wiechert et al. (1997). If the ratio of the mean from the MCMC algorithm to the result from previous method is viewed as a measure of the difference of the net flux estimation result, it is obvious that the net fluxes
and
experience the most substantial change when using the MCMC method for flux analysis in comparison with other optimization approaches. Glyoxylate cycle remains negligible when glucose is the sole carbon source in C.glutamicum. However, it becomes active when another carbon source like acetate is available (Wendisch et al., 2000). The big difference in both
and
before and after system uncertainties are included suggests the potential influence of external noise (including external carbon source) to glyoxylate cycle. The lysine outputs, which are expressed by the net fluxes
and
, are also changed considerably, implying the potential influence of noise at gene expression level to lysine yield. Here, the histograms of all the sixteen net fluxes are illustrated in Figures 3 and 4. It is shown that all the three fluxes,
,
and
, in the glycolysis reactions share the same distribution. In the oxidative pentose phosphate pathway, the four net fluxes,
,
,
and
, also have the same distribution though their absolute values in Table 2 are substantially different from each other. An interesting finding here is that the transformation rate from fumarate to oxaloacetate
and then to isocitrate
actually share the same distribution format as that in the glycolysis pathway, which are totally different from two other fluxes,
and
in the citric acid cycle, implying the big impact of glycolysis reactions to the citric acid cycle. There seems to be a balanced relationship between the glyoxylate cycle and the anaplerotic section, where the increase of one side leads to a decrease in the other, in order to keep the smooth operation of the TCA cycle. The two lysine-producing rates do not show any particular peaks in the histograms. Hence, the noise at the gene expression level might not be a threat to the lysine-producing yield, although it can influence the internal reaction rates to some extent.
|
|
Correspondingly, the mean results of the sampled positional enrichment data are compared to their measurements in Table 3, which shows that the means of the samples obtained by the MCMC algorithm all lie in the area around their measurements. However, further systematic studies are required to determine what the possible tendency of the underlying positional enrichment data are.
| 5. CONCLUSIONS |
|---|
|
|
|---|
Flux analysis using parameter estimation and 13C tracer experiments is able to provide information about the quantification in the internal metabolic system. However, in order to demystify the intracellular metabolic network, the crucial issue is to discover the distributions of various types of internal entities. This is especially true when the noise at gene expression level are taken into account. Although a Monte Carlo estimate calculated from multiple datasets are capable of producing a flux distribution map, such method cannot provide unbiased results unless the estimator itself is consistent. A comprehensive flux analysis requires an approach that can incorporate the system noise explicitly and obtain the flux estimates, flux distributions and accompanied metabolite distribution concurrently. In this paper, a stochastic generative model, the MCMC approach, which presents the truncated Gaussian multiplicative noise explicitly in both system and measurement models, is applied to flux distribution analysis. The dependency between the fluxes, their corresponding metabolites and metabolite measurements are represented by different conditional distributions. The flux and metabolite samples obtained from MCMC approach provide explicit distribution information of these entities through histogram analysis, which are important for metabolic flux and pathway analysis. Though only a truncated Gaussian multiplicative noise model has been considered in the current work, the MCMC approach can freely be extended to flux distribution analysis under various distributions of external and internal noise and uncertainties.
| Acknowledgments |
|---|
The authors would like to thank the EPSRC and the University of Sheffield for the financial support during this research work. Prof. Wright acknowledges the EPSRC for an Advanced Research Fellowship.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Jonathan Wren
Received on May 25, 2006; revised on August 11, 2006; accepted on August 15, 2006
| REFERENCES |
|---|
|
|
|---|
Anderson, D.H. (1983) Compartmental modeling and tracer kinetics. Lecture Notes in Biomath, , NY Springer-Verlag Vol. 50, .
Bailey, J.E. (1991) Toward a science of metabolic engineering. Science, 252, 16681675
Blake, W.J., et al. (2003) Noise in eukaryotic gene expression. Nature, 422, 633637[CrossRef][Medline].
Bonarius, H., et al. (1997) Flux analysis of underdetermined metabolic networks: the quest of the missing constraints. Trends Biotech, . 15, 308314[CrossRef][Web of Science].
Brooks, S.P. and Roberts, G.O. (1998) Assessing convergence of Markov Chain Monte Carlo algorithms. Stat. Comput, . 8, 319335[CrossRef].
Elowitz, M.B., et al. (2002) Stochastic gene expression in a single cell. Science, 297, 11831186
Forbes, N.S., et al. (2001) Using isotopomer path tracing to quantify metabolic fluxes in pathway models contraining reversible reactions. Biotechnol. Bioeng, . 74, 196211[CrossRef][Web of Science][Medline].
Gelfand, A.E. and Smith, A.F.M. (1990) Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc, . 85, 398409[CrossRef].
Gelman, S. and Gelman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Tran. Pattn. Anal. Mach. Intel, . 6, 721741.
Gilks, W.R., Richardon, S., Spiegelhalter, D.J. Markov Chain Monte Carlo in Practice, (1996) , Boca Raton, Florida Chapman & Hall/CRC.
Hastings, W.K. (1970) Monte Carlo sampling methods using Markov Chains and their applications. Biometrika, 57, 97109
Klapa, M.I., et al. (2003) Ion-trap mass spectrometry used in combination with gas chromatography for high-resolution metabolic flux determination. Biotechniques, 34, 832840[Web of Science][Medline].
Marx, A., et al. (1996) Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolite balancing. Biotechnol. Bioeng, . 49, 111129[CrossRef].
Metropolis, N., et al. (1953) Equations of state calculations by fast computing machines. J. Chem. Phys, . 21, 10871091[CrossRef].
Peskun, P.H. (1973) Optimum Monte-Carlo sampling using Markov Chains. Biometrika, 60, 607612
Raser, J.M. and O'Shea, E.K. (2005) Noise in gene expression: origins, consequences, and control. Science, 309, 20102013
Schmidt, K., et al. (1997) Modeling isotopomer distributions in metabolic networks using isotopomer mapping matrices. Biotechnol. Bioeng, . 55, 831840[CrossRef][Medline].
Schmidt, K., et al. (1999) Quantitative analysis of metabolic fluxes in Escherichia coli using two-dimensional NMR spectroscopy and complete isotopomer models. J. Biotechnol, . 71, 175189[CrossRef][Web of Science][Medline].
Sonntag, K., et al. (1993) Flux partitioning in the split pathway of lysine synthesis in Corynebacterium glutamicum: quantification by 13C and 1H NMR spectroscopy. Eur. J. Biochem, . 213, 13251331[Web of Science][Medline].
Stephanopoulos, G.N., Aristidou, A.A., Nielsen, J. Metabolic Engineering Principles and Methodologies, (1998) , San Diego, USA Academic Press.
Szyperski, T. (1995) Biosynthetically directed 13C-fractional labeling of proteinogenic amino acids. Eur. J. Biochem, . 232, 433448[Web of Science][Medline].
Varma, A. and Palsson, B.O. (1994) Metabolic flux balancing: basic concepts, scientific and practical use. Biotechnology, 12, 994998[CrossRef].
Wendisch, V.F., et al. (2000) Quantitative determination of metabolic fluxes during coutilization of two carbon sources: comparative analyses with Corynebacterium glutamicum during growth on acetate and/or glucose. J. Bacteriol, . 182, 30883096
Wiechert, W. and de Graaf, A.A. (1997) Bidirectional reactional steps in metabolic networks: I. modeling and simulation of carbon isotope labeling experiments. Biotechnol. Bioeng, . 55, 101117[CrossRef].
Wiechert, W., et al. (1997) Bidirectional reactional steps in metabolic networks: II. flux estimation and statistical analysis. Biotechnol. Bioeng, . 55, 118135[CrossRef][Medline].
Wittmann, C. and Heinzle, E. (2002) Genealogy profiling through strain improvement by using metabolic network analysis: metabolic flux genealogy of several generations of lysine-producing corynebacteria. Appl. Env. Microbiol, . 68, 58435859
Yang, J., Wongsa, S., Kadirkamanathan, V., Billings, S. A., Wright, P. C. (2005a) Differential evolution and its application in metabolic flux estimation. Lecture notes in comput. sci, Vol. 3449, 115124.
Yang, J., et al. (2005b) Metabolic flux distribution analysis by 13C-tracer experiments using the Markov chain-Monte Carlo method. Biochem. Soc. Trans, . 33, 14211422[CrossRef][Web of Science][Medline].
Zupke, C. and Stephanopoulos, G. (1994) Modeling of isotope distributions and intracellular fluxes in metabolic networks using atom mapping matrices. Biotechnol. Prog, . 10, 489498[CrossRef].
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||




















accept zi; end
represents the k-th sample of variable a in the l-th chain
-ketoglutarate; FUM: Fumarate; OAA: Oxaloacetate; Lys: Lysine; AcCoA: Acetyl-CoA; Glyox: Glyoxylate]

