Bioinformatics Advance Access originally published online on February 27, 2006
Bioinformatics 2006 22(10):1198-1206; doi:10.1093/bioinformatics/btl069
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Planning optimal measurements of isotopomer distributions for estimation of metabolic fluxes
1 Department of Computer Science P.O. Box 68 (Gustaf Hällströmin katu 2b) 00014 University of Helsinki Finland
2 NMR Laboratory, VTT Technical Research Centre of Finland P.O. Box 65, 00014 Helsinki, Finland
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Flux estimation using isotopomer information of metabolites is currently the most reliable method to obtain quantitative estimates of the activity of metabolic pathways. However, the development of isotopomer measurement techniques for intermediate metabolites is a demanding task. Careful planning of isotopomer measurements is thus needed to maximize the available flux information while minimizing the experimental effort.
Results: In this paper we study the question of finding the smallest subset of metabolites to measure that ensure the same level of isotopomer information as the measurement of every metabolite in the metabolic network. We study the computational complexity of this optimization problem in the case of the so-called positional enrichment data, give methods for obtaining exact and fast approximate solutions, and evaluate empirically the efficacy of the proposed methods by analyzing a metabolic network that models the central carbon metabolism of Saccharomyces cerevisiae.
Contact: ajrantan{at}cs.helsinki.fi
| 1 INTRODUCTION |
|---|
|
|
|---|
The goal of metabolic flux analysis is to discover the steady state conversion velocities of metabolites to each other through chemical reactions catalyzed by the enzymes of an organism. Information about reaction rates, or fluxes, represents the final outcome of cellular regulation and thus constitutes an important aspect of the phenotype of a cell in different conditions (Nielsen, 2003). Flux information can be harnessed in many different applications ranging from pathway optimization in metabolic engineering (Stephanopoulos et al., 1998) and from characterization of the physiology of an organism (Kelleher, 2001) to more efficient drug design for human diseases such as cancer (Boros et al., 2004).
The most accurate information about the fluxes can be obtained by conducting isotopomer tracer experiments where the cell is fed with a mixture of natural and 13C-labeled external substrates. The fate of the 13C atoms can be observed by measuring the resulting NMR (Szyperski et al., 1999) or mass spectrum (Christensen and Nielsen, 1999; Wittmann and Heinzle, 1999) of metabolic products and intermediates. From the measurements one obtains information about the fluxes of the alternative pathways producing a metabolite. This methodology has been successfully applied in numerous cases to explicitly solve key fluxes of specific metabolic networks in various experimental conditions of interest (Marx et al., 1996; Stephanopoulos et al., 1998; Szyperski, 1995).
A popular general framework for estimating the flux distribution of an arbitrary metabolic network is based on non-linear optimization procedure where candidate values for the fluxes are generated successively and the isotopomer distributions of metabolites corresponding to candidate fluxes are computed until the the isotopomer distributions fit the measured data well enough (Ghosh et al., 2001; Schmidt et al., 1997; Selivanov et al., 2004; Wiechert et al., 2001). In each step of such an iteration a non-linear equation system stating isotopomer distributions as the function of the fluxes has to be solved.
Recently Rousu et al. (2003) proposed a direct flux estimation method that first propagates the measured information on isotopomer distributions over the metabolic network and then augments the basic linear stoichiometric equation system with generalized isotopomer balance equations. Rantanen et al. (2005) improved the propagation of isotopomer information by introducing a partition of metabolite fragments to sets having equal isotopomer distributions in all steady states. The direct flux estimation method can be seen as a member of another flux estimation framework originating from METAFoR (metabolic flux ratio) analysis (Fisher et al., 2004; Sola et al., 2004; Szyperski, 1995; Szyperski et al., 1999).
Both flux estimation frameworks have their strengths and weaknesses. The iterative methods require often fewer measured metabolites than METAFoR methods (see Section 2) to produce an estimate of a flux distribution. On the other hand, owing to the non-linearity of the optimization task, it is hard to guarantee that the iterative method converges to the global, unique optimum. If measured data do not fully determine the flux distribution, iterative methods may only output points from a solution space, not the set of all feasible solutions. By performing an a priori identifiability analysis (Isermann and Wiechert, 2003; van Winden et al., 2001) and analysing the sensitivity of a point solution (Möllney et al., 1999) it is possible to examine the uniqueness of the solution. In the method of Rousu et al. (2003) the equation system bounding the flux distribution is linear. Thus the equation system is easily solvable in a fully determined case. Also, if the equation system is underdetermined the linear subspace containing all solutions can be explicitly stated and well-known techniques can be applied to solve as many fluxes as possible (Klamt and Schuster, 2002) as well as to obtain upper and lower bounds to the rest of the fluxes. The general sensitivity of a solution to measurement errors can be easily estimated by inspecting the condition number a linear equation system. Furthermore, as solving and manipulating a linear equation system can be done efficiently, computationally intensive but conceptually simple Monte Carlo techniques can be applied to estimate the sensitivities of individual fluxes.
Linear flux estimation techniques thus have attractive properties compared with non-linear, iterative techniques. However, the need for more measurement data should not be overlooked. Developing measurement techniques for intermediate metabolites and conducting the measurements using the technique are non-trivial, laborious and costly processes. This experimental burden could be eased by concentrating the measurements to non-redundant subsets of metabolites that alone give as much information about the fluxes as measuring every metabolite in the network would give. In this paper we formalize this problem and derive algorithms for finding such optimal subsets of metabolites to measure for a flux estimation method by Rousu et al. (2003). The methods are facilitated by the recent flow analysis method of Rantanen et al. (2005), that enables us to discover redundancies within sets of measurements of metabolites. Practically interesting variations of the problem analyzed are discussed in Section 4.5.
The structure of the paper is the following. Section 2 gives a short introduction to a linear method for flux estimation using isotopomer information. Section 3 introduces the concept of fragment equivalence. In Section 4 we motivate the problem of selecting an optimal set of metabolites to measure, define the measurement optimization problem at hand, study its computational complexity and give exact and fast approximate algorithms for solving it. Section 5 presents the results of experiments conducted with a metabolic model of central carbon metabolism of S.cerevisiae. A summary of the related work is given in Section 6, together with discussion on possible future research directions.
| 2 A LINEAR METHOD FOR METABOLIC FLUX ESTIMATION USING 13C ISOTOPIC TRACERS |
|---|
|
|
|---|
A metabolic network G = (
,
) is composed of a set
= {M1, ... , Mm} of metabolites and a set
= {
1, ... ,
n} of reactions that perform their interconversions.
For the purposes of 13C isotopic tracing, only carbon atoms are of interest. Thus, we represent a k-carbon metabolite as a set of carbon locations M = {c1, ... , ck}. Fragments of metabolites are simply subsets F = {f1, ... , fh}
M of the metabolite. A fragment F of M is denoted as M|F. By a slight abuse of notation, we denote by F and M also the corresponding physical pools of molecules that have the required molecular structure, respectively.
Isotopomersdifferent isotopic versions of the molecule M = {c1, ... , ck} are represented by binary sequences b = (b1, ... , bk)
{0, 1}k where bi = 0 denotes a 12C and bi = 1 denotes a 13C in location ci. Molecules that belong to the bisotopomer of M are denoted by M(b). Isotopomers of metabolite fragments M|F are defined in an analogous manner: a molecule belongs to the F(b)-isotopomer of M, denoted M|F (b1, ... , bh), if it has a 13C atom in all locations fj that have bj = 1, and 12C in other locations of F.
The isotopomer distribution D(M) of metabolite M gives the relative abundances 0
PM(b)
1 of each isotopomer M(b) in the pool of M such that
![]() |
Reactions are pairs
j = (
j,
j) where
is a vector of stoichiometric coefficientsdenoting how many molecules of each kind are consumed and produced in a single reaction eventand
j is a carbon mapping describing the transition of carbon atoms in
j. Bidirectional reactions are modeled as a pair of reactions. (For simplicity of presentation, we assume in this paper that the reactions have simple stoichiometries
ij
{1, 0, 1} and that the carbon mappings
j are bijections. Both of these restrictions, however, can be lifted without great difficulty.) Metabolites Mi with
ij < 0 are called reactants and with
ij > 0 are called products of
j. A pathway from metabolite fragments {F1, ... , Fp} to fragment F' is a sequence of reactions that define a bijective (composite) mapping from all carbons of {F1, ... , Fp} to all carbons of F'.
For notational convenience it will be useful to distinguish between the subpools of a metabolite produced by different reactions and pathways. Therefore, we denote by Mij, j > 0, the subpool of Mi produced (
ij > 0) or consumed (
ij < 0) by reaction
j. As metabolite molecules produced by different reactions are mixed in a metabolite pool it is not possible to directly measure isotopomer distributions of inflow subpools {Mij|
ij > 0}. However, the isotopomer distributions of inflow subpools can be (in some cases) derived from the isotopomer distributions of the reactants of reactions
j (see Section 3). All flux estimation methods that are based on tracing isotopomers, in some form or another, rely on this fact. Note further that the subpools related to the outflows of the junction are the same, i.e. Mi = Mij for all j :
j < 0.
By Mi0 we denote the subpool of Mi that is related to the external inflow (ßi < 0) or external outflow (ßi > 0) of Mi. We call the sources of external inflows external substrates. We denote by Ii = {Mij|
ij < 0} the inflow and by Oi = {Mij|
ij > 0} the outflow subpools of Mi. Subpools of fragments are defined analogously.
In flux estimation, the quantities of interest are the velocities vj
0 of the reactions
j, giving the number of reaction events of
j per time unit. If the velocities vj of reactions
j
and the sizes of metabolite pools stay constant over time, we say that the metabolic network is in a steady state. In such a state the metabolite balance
![]() | (1) |
![]() | (2) |
The isotopomer distributions of the outflow subpools Mij are always identical to the distribution of the whole metabolite pool Mi as we assume that reactions uniformly sample their reactant pools. If, however, the pathways leading to a junction metabolitea metabolite with more than one producermanipulate the carbons of the metabolite differently, then the isotopomer distributions of the inflows (
ij > 0) often have differences. Furthermore, (2) together with (1) gives linearly independent equations that constrain the fluxes, ideally so that a single (correct) flux distribution can be pinpointed. The stoichiometric linear system (1) alone can be underdetermined, hence the additional equations (2) are used.
Applying (2) suffers from two difficulties (Rousu et al., 2003):
- The (mass spectrometric and NMR) measurements do not in general come in the form of fully determined isotopomer distributions, but as a set of linear isotopomer constraints
for the metabolites Mi, where the coefficients
(3)
depend on the measurement technique and the metabolite. For example, the constraints for a metabolite given by a mass spectrometric measurement depend on the fragmentation pattern of a metabolite in the tandem mass spectrometer and how many of the produced fragments have sufficiently high frequency to exceed the detection limit. NMR technology gives more direct access to relative frequencies of some isotopomers. In general, however, some isotopomers cannot be uncovered and the sensitivity is lower than that of a mass spectrometer. Because of these practical hindrances, instead of (2), we will have to resort to a weaker form of balances
(for all metabolites Mi) that at worst case only contain the metabolite balance Equations (1) and in the best case, meet (2).
(4)
- With current technologies, not all metabolites can be measured, so not all isotopomer frequencies are available. This calls for methods that can be used to derive isotopomer frequencies or their combinations from measurements made for other metabolites in the network. In Rousu et al. (2003), a general methodology was proposed, where measurements of the form of (3) can be propagated in between two junction metabolites to obtain the constraints of the form of (4) for the junction metabolite with as many linearly independent equations as possible. The method relies on the fact that in individual reactionsand in general pathways with no junction metabolitesone can compute isotopomer constraints to products from isotopomer constraints (3) of reactants, and vice versa, by using vector space operations and the carbon maps.
| 3 FRAGMENT EQUIVALENCE |
|---|
|
|
|---|
In Rantanen et al. (2005), a flow analysis method was developed that extends the scope of propagation to make it possible to propagate information through the junction metabolites. The idea is to find fragments M|F and Mi|F' that are equivalent (denoted by M|F
Mi|F') in the sense that in all isotopomeric steady states their isotopomer distributions are the same, no matter what kind of labelings are used for the external substrates. Intuitively, the source fragment M|F and the target fragment Mi|F' are equivalent if all pathways producing F' from the fragments of external substrates go through F, in all pathways from F to F' carbons of F stay intact and all pathways have the same carbon maps between F and F'. We assume that if a metabolic reaction transfers a fragment from a reactant to a product, then the carbons of the fragment stay intact in the reaction. Now, because F is always a precursor of F', carbons of F always travel intact to F' and the fragments F are similarly oriented when reaching F' via every pathway (carbon maps are the same), the isotopomer distribution of F' is equal to the one of F in every steady state and with every labeling of external substrates. The basic example of such equivalence is that of reactant and product fragments of a single reaction, which easily follows from the assumed intactness of fragments in a single reaction (Fig. 1):
|
LEMMA 1
Letj = (
j,
j) be a reaction with a reactant M and a product Mi. If fragment M|F satisfies
j (F)
Mij, then M|F
Mij|
j (F).
If Mi is not a junction metabolite, that is, it has only one inflow, then Mij = Mi in Lemma 1, and we have M|F
Mi|
j (F). By the transitivity of the equivalence relation
, the result can be applied to an unbranched pathway i.e. to a pathway that does not contain two reactions producing the same metabolite. Here, intactness of the fragment is ensured by requiring that the image of the fragment belongs to a single metabolite in each intermediate step of the pathway (Fig. 2):
|
LEMMA 2
Let, be the metabolites and
the reactions of a pathway. Let M be a reactant of
and let
be the sole producer of
and denote by
the composite carbon mapping of the pathway
. If for some fragment F of M,
for each 1
h
r, then
.
For a source fragment M|F and a target fragment Mi|F' having several pathways in between them, it is further required that F must always be a precursor of F' and that the composite carbon mappings of the pathways are the same (Fig. 3).
|
LEMMA 3
Letbe the set of unbranched pathways connecting M and Mi with associated composite carbon mappings
and let Mik denote the subpool of Mi produced by Rk. For fragment F = (f1, ... , fh) of M, if M|F
Mik|
k(F) for 1
k
p,
for each 1
t
h and every pathway producing F' from some fragments of external substrates contains F then M|F
Mi|F'.
The equivalence relation defined with above lemmas is symmetric and transitive. As every fragment is equivalent to itself, the equivalence partitions a metabolic network to equivalence classes of fragments with equivalent isotopomer distributions. The equivalence classes of fragments can be efficiently (in polynomial time) found by constructing the fragment flow graph of a metabolic network and applying dominator tree analysis (Lengauer and Tarjan, 1979) to it (Rantanen et al., 2005). Briefly, in a dominator tree constructed from the fragment flow graph a fragment F' is a descendant of F if and only if all pathways from the fragments of external substrates to F' contain F and carbons of F always travel intact and similarly oriented to F'.
For flux estimation, these equivalences serve in several roles: first, the fragment isotopomer distributions of the subpools of a junction can differ only if the fragment is not equivalent to any fragment in its reactants. It is only those fragment isotopomers that can potentially induce linearly independent constraints to the fluxes. This gives us possibilities to remove redundant equations from the flux system to be solved. Second, we can use interchangeably any isotopomer measurement of two equivalent fragments, thus enabling us to form balance equations in junctions where adjacent metabolites are poorly or not at all measured. As by Lemma 3 the equivalence classes may extend beyond junctions, the technique improves the propagation methods described in Rousu et al. (2003).
In this paper, these equivalences are utilized in yet another way. Namely, they turn out to be powerful tools for selecting a small subset of metabolites to be measured so that the fluxes of a metabolic network can still be discovered from the measurements.
| 4 MEASUREMENT OPTIMIZATION IN THE POSITIONAL ENRICHMENT CASE |
|---|
|
|
|---|
Using the linear system consisting of Equations (1) and (4) for solving the fluxes
j is not straight forward. There are several problems. First of all, the system should be of full rank to give a point solution instead of just some linear constraints to the fluxes. Full rank means that there should be sufficiently many linearly independent equations. In the extreme case it is possible that full rank is not achievable at all (e.g. if a one-carbon metabolite has a large number of producers). The independence of equations depends on the actual values of the isotopomer abundances which in turn depend intricately on the isotopomer abundances of the external substrates and on the actual fluxes in the steady state under consideration. Hence the quality of the equation system obtained depends on the measurements performed as well as on the 13C labeling patterns used for external substrates.
On the other hand, the NMR and MS methods producing isotopomer data are not trivial either. In (tandem) mass spectrometer it is necessary to separate metabolites in the cell extract and develop for each metabolite or metabolite group a specific experimental protocol. In NMR, spectral overlap, large differences in concentration and available spectral edition techniques make fractional positional labeling of some metabolites more accessible than others. Therefore there is a need to design an optimized measurement strategy that minimizes the experimental effort. Here the equivalence concept of Section 3 can be utilized: measuring more than one representative of an equivalence class does not add new information; the distribution observed for one member of the class can also be used in association with the others to write equations (4).
In the rest of the paper we will consider the measurement optimization problem as a special case (positional enrichment) satisfying rather strong but experimentally justified conditions. Even in this case the optimization problem turns out to be computationally hard but still tractable in practice. Solutions to the problem given below can be used to guide an iterative experiment planning process towards a small set of metabolites to measure. Our computational techniques can also be generalized for less constrained situations, for example to fractional enrichments of larger fragments (Blank and Sauer, 2004; Gombert et al., 2005) or to the mass isotopomers of whole metabolites (van Winden et al., 2005). Hence the present exercise is of wider interest.
4.1 Optimization problems
In the so-called positional enrichment measurements of 13C, one observes the 13C isotopomer distribution of an individual carbon ch of a metabolite M which simply is the relative abundance of 13C in ch. Positional enrichment data can be obtained by NMR (Follstad and Stephanopoulos, 1998; Marx et al., 1996; Schmidt et al., 1999; Shen et al., 1999; Sola et al., 2004; Wiechert et al., 2001) or tandem mass spectrometry. Also GC-MS with different derivatization techniques can provide such data for many metabolites (Szyperski, 1998). Thus the availability of positional enrichment data for the carbons of some metabolites in our network is a reasonableand computationally convenientfirst approximation in the early stages of an experimental process when more detailed information about the measurable constraints to the isotopomer distributions is not yet available.
Formally, we assume that one 13C positional enrichment measurement gives for a metabolite M = {c1, ... , ck} the 13C labeling frequencies
and
for each carbon ch of M. Note that
![]() |
is the full isotopomer distribution of M marginalized to bh = 1.
With the positional enrichment data available for all metabolites M, we can infer by Lemma 1 the positional enrichment frequencies
for all subpools Mij of all junction metabolites Mi. Then the generalized isotopomer balances (4) get the form
![]() | (5) |
Note here that the corresponding equation for bh = 0 is linearly dependent on Equations (1) and (5) as
.
Based on the positional enrichment data, we can thus write |Mi| new equations, in addition to the mass balance (1), for each junction metabolite Mi. This is the strongest system of equations we can hope to obtain from positional enrichment measurements. (Note, however, that there is no guarantee that this system would allow solving all the fluxes: the system may be under-determined in some junctions and over-determined in some others.)
Now it should be clear that because of the equivalence of carbons of different metabolites, measuring all metabolites may sometimes be redundant. Some subset would already allow us to write the full set of |Mi| equations (5) for each junction Mi. This leads to the following optimization problem:
Problem 1 [Positional enrichment measurement (PEM) minimization problem] Given a metabolic network G = (
,
), find the smallest set of metabolites to measure for positional enrichment data such that, noting the equivalences of the carbons, it is possible to write a full set of |Mi| equations (5) for each junction metabolite Mi of the network G.
Figure 4 gives an example of an optimal solution to the PEM minimization problem in a small metabolic network consisting of 13 metabolites and 10 reactions. There exists two junction metabolites consisting of three and two carbons in the example (enclosed by boxes). In order to write the full set of five balance Equations (5) for these carbons, the positional enrichment data is required for the junction carbons and for all the carbons of the reactants of reactions that produce junction metabolites. In the example this data can be derived, with the help of equivalences, from five metabolites (enclosed by ovals). So instead of measuring all 13 metabolites it is sufficient to measure only five of them.
|
4.2 Solution by set covering techniques
By its combinatorial nature, the PEM minimization is a variant of the well-known minimum set cover problem:
Problem 2 [Minimum set cover (Ausiello et al., 1999)] Given a collection
of subsets of a finite set S, find the smallest subcollection
of
such that 
.
To see the connection between PEM and set cover, observe that an Equation (5) can be written as soon as we know
for all subpools Mij of Mi. This is the case when we have measured for each Mij some carbon ct of some metabolite Mr such that Mr|ct
Mij|ch. We now say that measuring Mr covers a subpool carbon Mij|ch of a junction Mi if Mij|ch
Mr|ct for some carbon ct of Mr. The metabolites Mr define in this way the collection of subsets of the set cover instance. The basic set of the instance consists of all one-carbon subpools of the junction metabolites. Then we get the following technical formulation of PEM minimization:
LEMMA 4
A set'
![]()
of metabolites is a solution of the PEM minimization if
' is a smallest set such that it covers all subpool carbons Mij|ch of all junction metabolites Mi. Furthermore, each set
![]()
![]()
covers certain subpool carbons.
The NP-hardness and the inapproximability of our problem can be shown by a polynomial-time reduction from the set cover:
THEOREM 1
PEM minimization is NP-hard.
PROOF
Let () be an instance of set cover.
We will describe a polynomial-time construction of a metabolic network G(
) that satisfies the following claim: the PEM minimization of G(
) has solution of size k + 1 if and only if (
) has a cover of size k.
Network G(
) will have only one junction metabolite, denoted MF. Metabolite MF has only one carbon cF which is produced along |S| separate paths induced by the carbon maps. Each path corresponds to an element of S. The paths go through metabolites that correspond to the sets in the collection
. The PEM minimization is then equivalent to finding the smallest number of metabolites that intersect all paths. This is equivalent to finding the smallest subcollection of
that covers entire S. (For an illustration, see Fig. 5.)
|
More technically, let S = {s1, ... , sn} and
= {C1, ... , Cm}. The metabolites of the network G(
) form m + 2 layers. Layers 1, ... , m represent sets C1, ... , Cm while layer m + 1 is an extra technical layer, and layer m + 2 is the final layer containing only the junction metabolite MF. The metabolites on layer i, i < m + 2, consist of carbons cij, 1
j
n, that correspond to the elements of sj of S such that cij corresponds to sj. The carbons are grouped into metabolites such that {cij : sj
Ci} is a metabolite representing Ci, and the remaining carbons cij, sj
Ci, each form a one-carbon metabolite. The carbons cm + 1, j of layer m + 1 form n separate one-carbon metabolites. The reaction
i, 1
i
m, of G(
) transforms the metabolites of layer i into metabolites of layer i + 1, the associated carbon map taking carbon cij to ci + 1, j for 1
j
n. Finally, reaction
, transforms the metabolite of carbon cm + 1, j to metabolite MF (which has carbon cF only).
Obviously, the inflow subpool of MF from reaction
'j and all one-carbon fragments cij, 1
i
m + 1, are equivalent (and correspond to sj). Hence to solve PEM for G(
) we need to find metabolites that cover all these equivalence classes as well as the outflow from MF (Lemma 4). Noting that the smallest such covering set of metabolites can always be selected from metabolites representing
(plus metabolite MF which must always be included), the smallest PEM solutions of G(
) are in one-to-one correspondence, the former being larger by one. Hence, G(
) satisfies the claim.
In the reduction given in the proof of Theorem 1 there is a bijective mapping between the elements of the set S and the equivalence classes. Let |
G| be the number of equivalence classes in the metabolic network G. As the minimum set cover problem is known to be hard to approximate within a factor c'log |S| for some c' > 0 [see Ausiello et al. (1999)], we get the following inapproximability result:
COROLLARY 1
The PEM minimization problem is hard to approximate within a factor c log|G| for some c > 0, unless P = NP.
On the other hand, the PEM minimization problem can be reduced to the minimum set cover problem in polynomial time, as pointed out by Lemma 4. It is known that the well-known Greedy Set Cover algorithm is a factor 1 + ln |S| approximation algorithm for Problem 2, see (Ausiello et al., 1999). Hence, we get the following result:
COROLLARY 2
The greedy set cover algorithm finds for the PEM minimization problem a solution which is within a factor of 1 + ln |G| from the optimum.
The number of equivalence classes in the metabolic network can be bounded above by, for example, the number of carbons.
4.3 Solution by integer linear programming
Most metabolic network models used in flux estimation contain only a few dozens of metabolites and reactions. Thus, methods for obtaining the optimal metabolite sets might be of interest, even if they have exponential worst-case time complexity. One versatile approach is to model the problem as a mixed integer linear program (MILP), i.e. as a minimization of some linear objective function in a polytope, with the additional constraint that certain variables in the optimal solutions are integral [see (Martin, 2001) for further details].
The objective function to minimize is the sum of costs of the metabolites that provide the maximum positional enrichment information (Problem 1). Let m1, ... , m|
| be indicator variables whether or not the metabolite Mi
is measured. Let wi be the cost of measuring the metabolite Mi. Thus, the objective function to be minimized is
![]() |
{0, 1} for each metabolite M
. The other constraints are as follows.
The requirement of Problem 1 that we have to write a balance equation for every carbon c
Mi can sometimes lead to sets of measured metabolites that are too laborious to measure in practice. In that case we can try to use MILP to find less expensive solutions by requiring for each junction Mi only ki
|Mi| balance equations. The cost of this relaxation is that we might lose some independent balances constraining the fluxes in (5).
Let xi,c be the indicator variable that the balance equation can be written for a carbon c
Mi. Then the constraints can be stated as
![]() |
The balance equation can be written for a carbon c
Mi if for all corresponding carbons cij in the inflow subpools Mij and for carbon c there is a measured metabolite M' with equivalent carbon c'. Let Ri,c be the set of reactions with the carbon c as a product. Let Ei,c,j be the set of indices p of metabolites that have a carbon equivalent to the inflow subpool carbon cij produced by the reaction
j, and let Ei,c be the set of indices p of metabolites that have a carbon equivalent to c. Now
![]() |
![]() |
Combining these constraints we obtain the following mixed integer linear program (we assume ki = 0 for each non-junction metabolite Mi):
|
|
The number of variables in the program is
![]() |
![]() |
If the number of integer variables is too high, the integer linear program can be relaxed to a linear program, i.e. the requirement of the variables being binary can be relaxed to the requirement that the variables have their values in the unit interval [0, 1]. In fact, the constraints xi,c
{0, 1} can be relaxed to xi,c
[0, 1] without affecting the solution since the cost of the solution is minimized when all variables xi,c are either 0 or 1; the values of xi,c can be replaced by
xi,c
without violating the constraints or increasing the cost of the solution. If the constraints mi
{0, 1} are relaxed, then the obtained solutions are not necessarily integral, but the solution can be transformed to (possibly suboptimal) integral solution by randomized rounding techniques. For example, the following procedure can be applied:
- Find the optimal solution for the linear program. If the values of all variables mi are integral, output the solution and halt.
- Choose one variable mi with non-integral value randomly with probability proportional to its value and replace the occurrences of the variables mi in the linear program by the constant 1 and go to step 1.
4.4 Full isotopomer data
It is possible to modify the above techniques of PEM minimization for other types of measurement data. At the other end of the spectrum is the full isotopomer data, i.e. the distributions PM(b) for full metabolites M. Similar set cover and MILP algorithms can be used in this case as well.
Note that, in the PEM case, the equivalence classes are the largest possible (because they correspond to the smallest possible, one-carbon, fragments) and hence their number is the smallest possible which leads to a small number of measurements. For the full metabolite isotopomer data the sets are smaller and hence a larger number of measurements may be necessary. However, full metabolite measurements give more data per measurement and therefore allow writing more equations (4).
4.5 Practical extensions
The relevant cases in which positional enrichment data are available only for some but not all carbons of the metabolites or more than one measurement per equivalence class is required can be handled with straightforward modifications to the techniques given above. In the first case, the unmeasurable carbons of metabolite M are disregarded when the cover of M is computed. For example, this way a situation where tandem MS produces positional enrichments only for some carbons of a metabolite can be modelled. In the latter case, we require more than one carbon to be measured from an equivalence class before it is considered as covered. By setting high costs to metabolites overlapping in the NMR spectrum one can test whether the separation of these metabolites is necessary or it is possible to derive the same isotopomer information from some other metabolites that are easier to measure. Our techniques can also be used to find out how many equations (4) for each junction can be written, given a set of metabolites whose positional enrichments are measurable.
Above we studied the problem of selecting small sets of metabolite to measure that would make us possible to write as many generalized balance equations (4) as the measurement of every metabolite would allow. However, owing to the dependencies in the basic equation system (1), some equations (4) might be redundant. By applying standard techniques of linear algebra to the equation system (1) it is possible to find subsets of fluxes whose values alone would suffice to fix the whole flux distribution (Isermann and Wiechert, 2003.) Such sets are called free fluxes. Now, if the number of metabolites to measure proposed by the above methods is still too high, we can enumerate different minimal sets of free fluxes and for each such set require that the Equations (4) are written only to junction carbons that are produced by some free flux. In the best case, i.e. with good labeling of external reactants and small measurement errors, equations bounding only free fluxes are enough to solve all the fluxes in the network.
| 5 COMPUTATIONAL EXPERIMENTS |
|---|
|
|
|---|
We tested our method for finding fragment equivalence sets with the model of central carbon metabolism of S.cerevisiae containing glycolysis, pentose phosphate pathway and citric acid cycle. Carbon mappings were provided by the ARM project (Automatic reconstruction of metabolism, http://www.metabolism.jp/index.html). The network consisted of 31 metabolites and 35 reactions of which 15 were bidirectional. Bidirectional reactions were modeled as two unidirectional reactions. The rank of the linear equation system (1) for the 50 fluxes of the model network was 30. Cofactor metabolites not accepting or donating carbon atoms were excluded from the analysis. The only carbon source of the network was glucose. There were two external products in the model. A total of 20 metabolites were produced by more than one reaction and thus formed junctions. Visualizations of the metabolic network used in the experiments are available at http://www.cs.helsinki.fi/group/sysfys/images/bioinformatics_metabolic_net.pdf.
In this demonstration of the computational methods given above we assumed that either positional enrichment or full isotopomer information was available for every metabolite in the network. We also assumed that the effort needed to measure the isotopomer information is equal for every metabolite. In the real experimental design situations these parameters can be easily set to correspond the task at hand (see Section 4.5). Two alternative forms of balance equation systems were analyzed: (1) a balance equation should be written for every carbon of a junction metabolite (there were 45 such carbons in an example model) or (2) a balance equation should be written only for (the number of producing reactions 1) of equations for each junction (this would ask for writing 21 equations in the example). The equations were not generated for carbons known to be uninformative by fragment equivalence anlaysis Rantanen et al., 2005). The minimum sets of metabolites to measure were found by Greedy Set Cover algorithm and MILP programs with guaranteed optimal solution. [In the case (2) only MILP technique is applicable.] MILP programs were solved using the publicly available MILP solver lp_solve 5.5. package (http://groups.yahoo.com/group/lp_solve/).
The results of the tests are summarized in Table 1. Solutions to the MILP programs were computed instantaneously with a PC with 2.4 GHz Pentium 4 processor, except for the fifth problem of Table 1 that took 25 s to finish. According to our tests it is enough to measure 16 metabolites in the model to be able to construct as many balance equations as the measurement of every metabolite would allow. If (the number of producing reactions 1) equations for each junction were constructed, 14 metabolites have to be measured. The actual number of metabolites to measure is sensitive to changes in the metabolic network. In a simpler model of central carbon metabolism of S.cerevisiae that had more one-directional reactions it was enough to measure only one-fourth of the metabolites.
|
The fragment equivalence analysis (see Section 3) revealed five junction metabolites whose isotopomer distributions are uninformative for flux estimation. This implies that not all the fluxes of the model can be solved utilizing only isotopomer information, regardless of the estimation method, the quality measurement data and input substrate labeling. The sets of metabolites to measure contained pentose phosphates ribose 5P, xylulose 5P and ribulose 5P that are hard to measure with current techniques. Unfortunately their information is needed to write as many generalized balance equations as the measurement of every metabolite would allow.
In our experiments Greedy Set Cover algorithm found equally good solutions as the MILP solver. Optimal sets for the positional enrichment information were found equal to the optimal sets for full isotopomer data indicating certain robustness against assumptions about the measurement data. There existed many optimal solutions consisting mostly of the same key metabolites in all the test cases.
| 6 DISCUSSION AND CONCLUSION |
|---|
|
|
|---|
To the authors best knowledge the problem of selecting an optimal set of metabolites to measure for flux estimation has not been systematically studied in the context of direct flux estimation. However, the selection of metabolites to measure is related to the studies of structural flux identifiability analysis that examines if a set of measurements suffices to determine all the fluxes of metabolic network in the context of iterative flux estimation. In this area van Winden et al. (2005) give analytical methods to study the information content of positional enrichment and the so-called cumomer data (Wiechert et al., 1997) before the actual measurements are carried out. They also present methods for checking the structural identifiability in the case of mass isotopomer or NMR multiplet data. Isermann and Wiechert (2003) give analytical and numerical methods for structural identifiability problem in the case of full isotopomer information. For the case of arbitrary measurement information they give an algorithm that checks whether an iterative flux fitting algorithm will converge to a (possibly suboptimal) point solution. They also study information content of single metabolites.
The selection of the optimal set of metabolites to measure covers only one half of the experimental design of isotopomer tracing experiments. The other half is the selection of the labeling mixture of external substrates. Together with the actual flux distribution the labeling of external substrates induce the isotopomer distributions of every metabolite in the network thus having a strong effect on the linear independence of isotopomer balance equations. The selection of optimal labeling in the context of iterative flux estimation methods is studied by Möllney et al. (1999) and by Araúzo-Bravo and Shimizu (2003).
An interesting future work is to utilize the free flux approach in our context as well as to look for methods that combine the contribution of this article with techniques that propose optimal labelings of substrates in the direct flux estimation context of Rousu et al. (2003). Moreover, by applying branch and bound algorithm and isotopomer constraint manipulation techniques introduced in Rousu et al. (2003) it is possible to extend the metabolite selection to the case where we know what kind of constraints can be measured to the isotopomer distribution of each metabolite. From an experimental point of view it would also be useful to develop an experimental design method that tries to falsify the given model of a metabolic network by proposing a cost-effective set of measurements of metabolite fragments whose isotopomer distributions should be equal if the model were correct.
| Acknowledgments |
|---|
The authors thank the reviewers of this paper for many useful comments on the manuscript, Esa Pitkänen and Paula Jouhten for fruitful discussions, and Marina Kurtén for correcting the language. This work was supported by grant 203668 from the Academy of Finland (SYSBIO program). J.R. was also supported by the EU/Marie Curie Individual Fellowship grant (HPMF-CT-2002-02110) and T.M. by APrIL II (FP6-508861).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Preliminary version of this article appeared in the proceedings of German Conference on Bioinformatics 2005. Lecture Notes in Informatics Vol. P-71 (2005), pp. 177191.
Received on November 30, 2005; revised on February 21, 2006; accepted on February 21, 2006
| REFERENCES |
|---|
|
|
|---|
Araúzo-Bravo, M.J. and Shimizu, K. (2003) An improved method for statistical analysis of metabolic flux analysis using isotopomer mapping matrices with analytical expressions. J. Biotechnol, . 105, 117133[CrossRef][Medline].
Ausiello, G., Crescenzi, P., Kann, V., Marchetti-Spaccamela, A., Protasi, M. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties, (1999) Springer.
Blank, L.M. and Sauer, U. (2004) UCA cycle activity in Saccharomyces cerevisiae is a function of the environmentally determined specific growth and glucose uptake rates. Microbiology, 150, 10851093
Boros, L., et al. (2004) Use of metabolic pathway flux information in targeted cancer drug design. Drug Discov. Today: Therap. Strat, . 1, 435443[CrossRef].
Christensen, B. and Nielsen, J. (1999) Isotopomer Analysis Using GC-MS. Metab. Eng, . 1, E8E16.
Fisher, E., et al. (2004) High-throughput metabolic flux analysis based on gas chromatography-mass spectrometry derived 13C constraints. Anal. Biochem, . 325, 308316[CrossRef][Medline].
Follstad, B.D. and Stephanopoulos, G. (1998) Effect of reversible reactions on isotope label redistributionanalysis of the pentose phosphate pathway. Eur. J. Biochem, . 252, 360371[Medline].
Ghosh, S., et al. (2005) Closing the loop between feasible flux scenario identification for construct evaluation and resolution of realized fluxes via nmr. Comput. chem. Eng, . 29, 459466[CrossRef].
Gombert, A.K., et al. (2001) Network identification and flux quantification in the central metabolism of Saccharomyces cerevisiae under different conditions of glucose repression. J. Bacteriol, . 183, 14411451
Isermann, N. and Wiechert, W. (2003) Metabolic isotopomer labeling systems. Part II: structural identifibiality analysis. Math. Biosci, . 183, 175214[Medline].
Kelleher, J.K. (2001) Flux estimation using isotopic tracers: common ground for metabolic physiology and metabolic engineering. Metab. Eng, . 3, 100110[CrossRef][ISI][Medline].
Klamt, S. and Schuster, S. (2002) Calculating as many fluxes as possible in underdetermined metabolic networks. Mol. Biol. Rep, . 29, 243248[Medline].
Lengauer, T. and Tarjan, R. (1979) A fast algorithm for finding dominators in a flowgraph. ACM Trans. Program. Lang. Syst, . 1, 121141.
Martin, A. (2001) General mixed integer programming: computational issues for branch-and-cut algorithms. In Jünger, M. and Naddef, D. (Eds.). Computational Combinatorial Optimization: Optimal and Provably Near-Optimal Solutions, Lecture Notes in Computer Science, , Berlin/Heidelberg, Germany Springer Vol. 2241, , pp. 125.
Marx, A., et al. (1996) Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolite balancing. Biotechnol. Bioeng, . 49, 111129[CrossRef].
Möllney, M., et al. (1999) Bidirectional reaction steps in metabolic networks IV: optimal design of isotopomer labeling systems. Biotechnol. Bioeng, . 66, 86103[CrossRef][ISI][Medline].
Nielsen, J. (2003) It is all about metabolic fluxes. J. Bacteriol, . 185, 70317035
Rantanen, A., Rousu, J., Pitkänen, E., Maaheimo, H., Ukkonen, E. (2005) Flow analysis of metabolite fragments for flux estimation. Proceedings of the 3rd International Workshop on Computational Methods in Systems Biology, CMSB 2005, Edinburgh , pp. 242255.
Rousu, J., Rantanen, A., Maaheimo, H., Pitkänen, E., Saarela, K., Ukkonen, E. (2003) A method for estimating metabolic fluxes from incomplete isotopomer information. Computational Methods in Systems Biology, Proceedings of the First International Workshop, CMSB 2003, Lecture Notes in Computer Science Vol. 2602, , pp. 88103.
Schmidt, K., et al. (1997) Modeling isotopomer distributions in biochemical networks using isotopomer mapping matrices. Biotechnol. Bioeng, . 55, 831840[CrossRef].
Schmidt, K., et al. (1999) Quantification of intracellular metabolic fluxes from fractional enrichment and 13C13C coupling constraints on the isotopomer distribution in labeled biomass components. Metab. Eng, . 1, 166179[CrossRef][Medline].
Selivanov, V., et al. (2004) An optimized algorithm for flux estimation from isotopomer distribution in glucose metabolites. Bioinformatics, 20, 33873397
Shen, J., et al. (1999) Determination of the rate of the glutamate/glutamine cycle in the human brain by in vivo 13C NMR. Proc. Natl Acad. Sci. USA, 96, 82358240
Sola, A., et al. (2004) Amino acid biosynthesis and metabolic flux profiling of Pichia pastoris. Eur. J. Biochem, . 271, 24622470[Medline].
Stephanopoulos, G., Aristidou, A., Nielsen, J. Metabolic Engineering: Principles and Methodologies, (1998) , San Diego, USA Academic Press.
Szyperski, T., et al. (1999) Bioreaction network topology and metabolic flux ratio analysis by biosynthetic fractional 13C labeling and two-dimensional NMR spectrometry. Metab. Eng, . 1, 189197[CrossRef][Medline].
Szyperski, T. (1995) Biosynthetically directed fractional 13C-labelling of proteinogenic amino acids. An efficient analytical tool to investigate intermediary metabolism. Eur. J. Biochem, . 232, 433448[ISI][Medline].
Szyperski, T. (1998) 13C-NMR, MS and metabolic flux balancing in biotechnology research. Q. Rev. Biophys, . 31, 41106[CrossRef][ISI][Medline].
van Winden, W.A., et al. (2001) A priori analysis of metabolic flux identifiability from 13C-labeling data. Biotechnol. Bioeng, . 74, 505516[Medline].
van Winden, W.A., et al. (2005) Metabolic-flux analysis of Saccharomyces cerevisiae CEN.PK113-7D based on mass isotopomer measurements of 13C-labeled primary metabolites. FEMS Yeast Res, . 5, 559568[CrossRef][Medline].
Wiechert, W., et al. (2001) A Universal Framework for 13C metabolic flux analysis. Metab. Eng, . 3, 265283




Mij, then M|F 
, be the metabolites and
the reactions of a pathway. Let M be a reactant of
and let
be the sole producer of
and denote by
the composite carbon mapping of the pathway
. If for some fragment F of M,
for each 1
.
be the set of unbranched pathways connecting M and Mi with associated composite carbon mappings
and let Mik denote the subpool of Mi produced by Rk. For fragment F = (f1, ... , fh) of M, if M|F
k(F) for 1
for each 1 









