Skip Navigation


Bioinformatics Advance Access originally published online on December 16, 2004
Bioinformatics 2005 21(8):1617-1625; doi:10.1093/bioinformatics/bti225
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/8/1617    most recent
bti225v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (12)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Haunschild, M. D.
Right arrow Articles by Wiechert, W.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Haunschild, M. D.
Right arrow Articles by Wiechert, W.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Investigating the dynamic behavior of biochemical networks using model families

Marc Daniel Haunschild 1, Bernd Freisleben 2, Ralf Takors 3 and Wolfgang Wiechert 1,*

1Department of Simulation, University of Siegen Paul-Bonatz-Strasse 9-11, D-57068 Siegen, Germany
2Department of Mathematics and Computer Science, University of Marburg Hans-Meerwein-Strasse, D-35032 Marburg, Germany
3Institute of Biotechnology, Research Center Juelich D-52425 Juelich, Germany

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 

Motivation: Supporting the evolutionary modeling process of dynamic biochemical networks based on sampled in vivo data requires more than just simulation. In the course of the modeling process, the modeler is typically concerned not only with a single model but also with sequences, alternatives and structural variants of models. Powerful automatic methods are then required to assist the modeler in the organization and the evaluation of alternative models. Moreover, the structure and peculiarities of the data require dedicated tool support.

Summary: To support all stages of an evolutionary modeling process, a new general formalism for the combinatorial specification of large model families is introduced. It allows for automatic navigation in the space of models and excludes biologically meaningless models on the basis of elementary flux mode analysis. An incremental usage of the measured data is supported by using splined data instead of state variables. With MMT2, a versatile tool has been developed as a computational engine intended to be built into a tool chain. Using automatic code generation, automatic differentiation for sensitivity analysis and grid computing technology, a high performance computing environment is achieved. MMT2 supplies XML model specification and several software interfaces. The performance of MMT2 is illustrated by several examples from ongoing research projects.

Availability: http://www.simtec.mb.uni-siegen.de/

Contact: wiechert{at}simtec.mb.uni-siegen.de


    1 INTRODUCTION
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
Modeling and simulation of biochemical networks is an important issue in computational biology. With the advent of the new disciplines, systems biology (Kitano, 2000) and metabolic engineering (Stephanopoulos et al., 1998), which are aiming at a holistic understanding of cellular systems from a theoretical or applied viewpoint, the quantitative understanding of regulatory networks has become a main focus of interest. The investigation of the dynamic behavior of biochemical networks gives complementary information to the structural knowledge about cellular systems obtained from genome, transcriptome and proteome data. In particular, modeling and simulation help to unravel the hierarchically organized cellular regulation networks (Lengeler, 2000) and the complex interactions between cellular components. However, the design of simulation tools strongly depends on the research aims and on the available data. Consequently, a still growing number of different modeling and simulation tools for biochemical networks are currently under development.

Many existing tools (Wiechert, 2002) are starting at a stage where the regulatory structure, reaction kinetic terms and kinetic parameters are already given. One group of these tools aims at a holistic understanding of cellular regulation and thus emphasizes on conceptional modeling and hierarchical decomposition of networks (Takahashi et al., 2003; Tränkle et al., 2000).

Another group is more concerned with the analysis of biochemical networks, e.g. applying the concepts developed in metabolic control theory (Mendes, 1997; Schilling et al., 1999; Sauro, 1993). Some of these tools are not only capable of computing and analyzing the stationary state of a system but also of simulating the dynamic behavior.

However, in practice the regulatory structure of biochemical networks and in particular the reaction kinetic details are still only partially known. In fact, even after several decades of enzyme investigation, the regulatory properties of the major biocatalysts are still incomplete (Wiechert, 2002). The main reason is that all these investigations took place under in vitro conditions and may not be representative for the in vivo situation. As an example, the regulatory importance of the phosphotransferase glucose uptake system in Escherichia coli, which cannot be investigated in vitro, has been recognized only recently.

Consequently, in vivo experiments with high throughput measurement techniques are required to obtain more information about cellular regulation in vivo. Surprisingly, only a few tools are available to support a bottom-up model development process based on experimental in vivo data (Rizzi et al., 1997; Westerhoff, 2001). Such data-driven tools are not only important for the evaluation of rapid sampling experiments in metabolic engineering, but also for transient observations in gene regulatory processes.

In this paper, the first discussion will be on how data-driven modeling is typically carried out (Sections 2–4). Without a supporting software, this is a tedious procedure of setting up models, trying out alternative modeling approaches and evaluating parameter fitted models. To support the modeler in this work, the formalism of model families will be presented (Sections 4–9). Two other important methods for data-driven modeling are to decrementally use splines instead of state variables and to perform sensitivity analysis for statistical purposes (Section 10). In Sections 11–12 it will be explained how these methods are implemented in the proposed MMT2 tool. In Sections 13 and 14, benchmarking results for sensitivity computations and model variants will be presented. Section 15 concludes the paper, and Section 16 gives an outlook on model selection.


    2 DYNAMIC SAMPLING DATA
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
Dynamic sampling is the most important method to obtain information about intracellular biochemical networks. As an example from metabolic engineering, rapid sampling devices for pulse experiments have been developed and applied to strains of Saccharomyces cerevisae, E.coli and Corynebacterium glutamicum cells (Koning and Dam, 1992; Rizzi et al., 1997; Schaefer et al., 1999). They serve to measure the dynamic behavior of the cell's metabolism at a very high sampling rate.

To this end, cells are grown in a bioreactor and stimulated by a substrate pulse (Fig. 1). A sampling device takes out cells at a frequency of 5 per second which are immediately cooled down with –50°C methanol containing quenching fluid that stops the metabolic activity. Each of these samples stores a snapshot of the cell's metabolism activity during the experiment. At the moment, over 30 chemical substances in the cell's main metabolism can be quantified, leading to over 5000 data points for a single experiment.



View larger version (28K):
[in this window]
[in a new window]
 
Fig. 1 By using a rapid sampling device, a very high sampling rate can be achieved when measuring the metabolite pool concentrations in a pulse experiment (Schaefer et al., 1999).

 
Only recently such a large amount of highly informative in vivo data on biochemical systems has become available. By using rapid sampling devices, new challenging biological questions can be tackled in the field of metabolome research. The resulting data can be very useful when trying to unravel the structure of the metabolic regulation network in vivo.


    3 DATA DRIVEN MODELING
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
An important aspect of modeling—particularly of biological systems—is that model development is an iterative process that takes place in a sequence of experiments, simulation runs, model based data evaluation and experimental design (Wiechert and Takors, 2004). In this experimental cycle, an initial plan of a system model is successively improved until finally a satisfactory reproduction of the measured data is achieved. Moreover, even if the data can be reproduced well by a model at the end of this process, it still does not mean that the model has some predictive power. For this purpose, a simulator must supply an accompanying set of tools for model analysis and statistical methods to test the validity of the model.

It is a common experience that the final outcome of this process is not the concise knowledge gathered in the experimental cycle. On the contrary, most insights into the biological system have been obtained from the decisions made in the modeling process. For example, the decision to leave out a certain effect from the model because it has very little influence on the system behavior is very important, but nevertheless it is not documented in the final model structure. On the other hand, for those effects that are recognized in the final model it does not become clear why they have been incorporated. To summarize, there is much more knowledge gathered about the biological system in the modeling process than is actually documented in its final result. In particular, an adequate software tool must address the following requirements of an evolutionary modeling process:

  1. In the course of the modeling process, the modeler is typically not only concerned with one single model but with sequences, alternatives and structural variants of models. In many situations, several model alternatives have to be tried out simultaneously which in practice can become a tedious procedure. Consequently, model variants should be explicitly supported and managed by the modeling tool. However, this is typically not a linear evolution but rather a process with many dead ends, new model setups, trials of model variants and model fusions.
  2. The experimental context given by the available concentration data of several system pools must be taken into account. In particular, it is absolutely necessary for any statistical evaluation to incorporate the knowledge about measurement uncertainties and special signal features like steps in the feed concentration.
  3. Not only must the simulation of a given model, be offered but also the tools for parameter fitting, extensive statistical evaluation, experimental design and model discrimination. These tools are absolutely necessary to support the modeling decisions made in the process.
  4. An evolutionary modeling process typically starts with a small model. In this situation, it makes sense to model only a few regulatory loops while other influences are not mechanistically described by the model. If known effectors from unmodeled parts of the system are measured, they can be incorporated into a simulation as an external data source. To this end, it is necessary to replace noisy measurement data by smoothing functions. In the case of pulse experiments, discontinuous signals must also be allowed.

From these requirements it becomes clear that simulation is only a very basic task in the modeling process while other algorithms are much more important and time-consuming. Since the computational efforts dramatically grow with the size of the models, the usage of high performance computing is mandatory (see Section 11).


    4 THE MODELING PROCESS
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
Biochemical networks are usually represented by bipartite graphs with reaction and pool nodes (Wiechert, 2002). The graph can be augmented by additional arcs for effector relations (activation, inhibition). From this general network structure, a commonly used mathematical model structure in Equation 1 is immediately derived.

(1)

Here, the vector contains the metabolites, N is the stoichiometric matrix describing the interconnecting fluxes between the metabolites. The fluxes' quantitative description is enclosed within the vector of functions depending on a vector of parameters and, of course, the current metabolites concentrations .

To support the evolutionary process of model building for biochemical networks, a formalization of the relations between different models is necessary. From an abstract viewpoint, the set of all models defined in a modeling process is given by a family of biochemical network models which are given, in general, as

(2)

Here, I is an index set which enumerates all models in the sequential modeling process. However, a linear sequence of models does not reflect the true interdependencies between various models, because there might be branches, variants, dead ends or model fusions. Thus, the index set I must be extended by an additional relation {precedes} that indicates the interdependencies between the models. Basically, the relation i {precedes} j is used if model j is obtained by an elementary network modification from model i. Here, ‘elementary’ modification means a single change of some reaction kinetics or the network topology, as defined more precisely below. A directed graph is induced by the elementary relation {precedes} representing the relations between all models (i.e. a ‘network of networks’ as shown in Fig. 4).



View larger version (15K):
[in this window]
[in a new window]
 
Fig. 4 (a) Some variants of the example network and their neighborhood relationships. In Figure 2, the step from Network 4 to 5 was not elementary. As shown here, there are intermediate networks. (b) Two other network variants.

 

    5 KINETIC VARIANTS
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
The simple example shown in Figure 2 is used to illustrate these concepts in more detail. Assume that the first model (Fig. 2a) in the modeling process is a simple branched network with three reaction steps each with a Michaelis–Menten type kinetics:

(3)



View larger version (9K):
[in this window]
[in a new window]
 
Fig. 2 Sequence of models during a modeling process. Networks 1–4 have the same topology, only the kinetic formulas vary. Network 5 shows a change in the topology introducing a new reaction step. The way from Networks 1–4 to Network 5 is not an elementary step, as shown in Figure 4. (a) Network 1, initial guess; (b) Network 2, inhibitor effect assumed from pool D; (c) Network 3; (d) Network 4 and (e) Network 5.

 
A comparison of the model with measured data might now indicate that an inhibitory effect of metabolite on the first reaction step V1 is present. To include this effect in the model, the reaction kinetic term of the first reaction step has to be changed, thus producing the second model (Fig. 2b) in the sequence. The second model is called kinetic variant of the first model because some kinetics is changed whereas the network topology is unaltered. For simplicity, the inhibition is described by a multiplicative term, although arbitrary complex kinetics are possible in MMT2.

(4)

Obviously, there is just a single parameter added to the system (an inhibition constant kI), and the Michaelis–Menten parameters v1,max, k1,m basically have the same biological meaning in the second model as in the first model. It also becomes clear that if the limkI->0 is taken, the Michaelis–Menten model turns out to be a special case of Model 1. This elementary extension relationship between the first two models will be formalized by the relation 1 {precedes} 2.

After trying out the inhibition model, the result might still not be satisfying and the question arises whether the remaining metabolite might be the inhibitor. But there is also another alternative that both and have an inhibiting effect on the first reaction step. Consequently, two more reaction kinetic terms have to be tried out giving rise to Models 3 and 4 (Fig. 2c and d) in the sequence.

Looking at the interdependencies of the four kinetic variants (Fig. 3) of reaction step V1, it becomes clear that the numbering of the models does not say anything about their logical interdependency. More precisely, Model 3 is an elementary generalization of Model 1 in the same way as Model 2 extends Model 1, whereas Model 4 is an elementary generalization of both Models 2 and 3. Model 4 generalizes Model 1 only indirectly via Model 2 or 3.



View larger version (14K):
[in this window]
[in a new window]
 
Fig. 3 Relationship of kinetic variants represented by a graph. Here, variant 1 is a special case of variants 2 and 3 having the inhibition parameter kI set to zero.

 

    6 NETWORK VARIANTS
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
Up to now, the running example only illustrates kinetic variants of a network which do not change the network topology. Network variants are the second type of modifications in a modeling process. In an elementary network modification, a new reaction step can be added to the network or an existing step can be removed.

Assume that additional measurement data become available so that the lumped pool can be subdivided into two parts, and . Consequently, reaction step V1 (Fig. 2d) is subdivided into two steps V6 and V7 (Fig. 2e). However, the change from Network 4 to Network 5 is not an elementary extension or specialization step because three reactions (V1, V6 and V7) are involved. Therefore, auxiliary models have to be introduced to explain the relation between Models 4 and 5 by a sequence of elementary modifications (Fig. 4). It will be shown below that steps I–IV do not have to be explicitly formulated by the modeler. The road from 4–5 leads through a sequence of Models 4–I–II–5 or alternatively 4–II–IV–5.

It turns out that Models I, III and IV are meaningless in practice because they have isolated or dead end nodes. Thus, no biologically meaningful stationary state except for zero fluxes can occur which does not make sense for a metabolic network. Model II in turn represents the maximum network which generalizes both networks 4 and 5. In general, any metabolic network should satisfy the condition that a stationary solution different from zero is possible (see Section 8).


    7 COMBINATORIAL MODEL GENERATION
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
Up to this point, the modeling process has been described from a retrospective viewpoint in order to show the general structure of the modeling process. When small biochemical networks with less than five reaction steps are considered, it is usually sufficient to carry out the modeling process step by step with elementary alteration operations. The observed discrepancy between model prediction and the measured data in this situation usually gives the right hints to make a certain change in the network structure or the reaction kinetics.

However, when large biochemical systems are modeled in an evolutionary process, it becomes more and more difficult to get the right ideas on how to change a given network. Usually, there will be several ideas on how to improve the current model and a combination of these ideas might make sense.

In the case of the running example, assume that the current stage of the modeling process is represented by Network 5 in Figure 2. All the single steps that led from Networks 1 to 5 are now combined to produce a model search space in the following combinatorial way:

  1. All reaction steps considered in the different models are united to the maximum Network II shown in Figure 4. In this network, the reaction steps V1, V6 and V7 can be switched on and off. If a metabolite pool becomes isolated this way, it is automatically removed from the network. For example, when the reaction steps V6 and V7 are both switched off, pool is removed resulting in Network 4.
  2. For some reaction steps, kinetic alternatives can be specified by the modeler. The alternatives must be explicitly given by kinetic formulas and the elementary generalization relations (Fig. 4) must also be specified. Parameters with the same biological meaning in all kinetic variants must be given the same name in all models. In the given example, there is only one reaction step V1 + V6, for which different reaction alternatives are considered. The four possible variants for reaction step V1 with their relations are shown in Figure 3.

The total number of models which can be obtained this way is 5 · 5 · 2 = 50. In this situation, it becomes an increasingly tedious procedure to fit one model variant after the other to a given dataset. Even a small number of possible network operations, together with all their combinations, can quickly produce a combinatorial explosion of model variants until the number of possible models becomes prohibitive for a manual modeling process.

Having the perspective of grid computing (Foster and Kesselman, 2003) in mind, this is a typical task that can be automated to run in parallel on a computer network. Using grid technology, it is possible to explore even a search space of several hundred metabolic models and to fit them to a given dataset (see Section 9).

Nevertheless, the number of tested models must be reduced. It should be clear that a large number of combinations is not really wanted or biologically meaningful. As described in Section 8, through elementary flux modes and user defined constraints, the space of meaningless models is minimized.


    8 EXCLUDING MEANINGLESS MODELS
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
To increase the efficiency of combinatorial model testing, the number of biologically meaningless models is minimized. It should be clear that fitting the parameters of such a model is useless since the modeler will reject it anyways. In this section, the two presented exclusion mechanisms enable the modeler to define a priori the models that are to be rejected. However, an automatic navigation algorithm might still use the meaningless models as auxiliary intermediate steps to produce a model variant from a given model (Fig. 4).

  1. Each model in the search space must be biologically meaningful, and, especially, those models having inoperable parts are to be rejected. This means that a stationary solution apart from zero fluxes must be possible. Whether such a condition is possible can be automatically determined by computing the elementary flux modes, as described below. Using this mechanism, Networks I, III and IV are excluded.
  2. Further constraints might be imposed by the user by specifying Boolean expressions that describe which combinations should be excluded. For example, the following constraint enforces that only one of the reaction Steps V1 and V6 must be the null kinetics. It excludes all the introduced intermediate Networks I, II, III, V and VI, which reduces the search space to 16 models.

(5)

If both (1) and (2) are considered, the exclusion mechanisms reduce the model space down to eight models. Taking into account the combinatorial explosion and the proposed reduction, it should now become clear that the exploration of rather complex model search spaces is well within the range of modern grid computing facilities.

Exclusion mechanism 1, namely the method of elementary flux modes (Schuster et al., 2000, http://bms-mudshark.brookes.ac.uk/algorithm.pdf), breaks down the complete network to an elementary sets of fluxes. These so-called elementary flux modes can be interpreted as subnetworks that can autonomously operate at steady state. Table 1 and Figure 5a show the four modes of the maximum example Network II.


View this table:
[in this window]
[in a new window]
 
Table 1 Tabular notation of the elementary flux modes

 


View larger version (7K):
[in this window]
[in a new window]
 
Fig. 5 The elementary flux modes of the running example. Figure 4, Network II shows the maximum network; (a)–(d) the four elementary sets of reactions that operate on their own. Note that metabolite A is external in this example. (a) Mode 1; (b) Mode 2; (c) Mode 3 and (d) Mode 4.

 
An easy way to find the ‘failure modes’ is described by Klamt and Gilles (2004). By computing the elementary flux modes of the maximum network, it is possible to forecast the reactions that will be inoperable when switching off a particular one. For example, if reaction #6 is switched off, all elementary modes that contain this reaction (Modes 3 and 4) are not operable anymore. In the remaining modes, reaction #7 is always switched off. This indicates that this reaction is cut off.


    9 AUTOMATIC NAVIGATION IN MODEL SPACE
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
In the previous sections, we have described how model families are built up and how auxiliary models are used to fill the gaps between similar models in the model space to achieve elementary changes between the models. As a consequence, a neighborhood relationship that allows a navigation in the model space comes into existence.

The knowledge of the elementary modification relations is very important when the model space is automatically explored by a systems analysis, parameter fitting or optimization algorithm. The relation {precedes} enables intelligent evolutionary search algorithms that can navigate in the corresponding graph, to be implemented.

When a promising candidate for explaining the experimental data has been found, it makes sense to automatically test its neighbors too. Moreover, the initial parameters taken for the neighbors of a certain model can be partly from the parameter fits already obtained for the present model. This example also shows that an algorithm for generating network variants by applying elementary alteration operations to a given network might have to generate biologically meaningless networks on the way. These networks are automatically excluded from further considerations. The user might specify additional constraints to reduce the complexity of the search space.


    10 SENSITIVITY ANALYSIS
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
During the process of modeling, several methods assisting the analysis of the model are needed. Such methods are, e.g. parameter fitting, statistical analysis, system analysis and experimental design. A basic operation for all these analysis methods is sensitivity analysis. The typically used forms of sensitivities are:

(6)

Using the approach of sensitivity equations, one has to differentiate the implicit model equations by the rules of calculus. More precisely, by exploiting the underlying model structure, the sensitivity ODEs in Equation 7 are derived in (Hurlebaus et al., 2002).

(7)
where D is the total differential.

These equations show that the differentiation only needs to be carried out for the formulas , whereas all other calculations can be done by matrix computations. It will be shown later that the required computational effort is in the same order as that for finite differences, which are known to have several numerical deficits. Note that the left hand side is now matrix valued and for every model parameter there are ODEs describing how they affect the respective metabolic pools. This means that ODEs have to be generated and solved, which can be a very large number in practice.


    11 MMT2: A COMPUTATIONAL ENGINE
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
The metabolic modeling tool MMT2 has been specifically designed to support the process of modeling, to conserve the information gathered on the way and to validate successful solutions. MMT2 is based on the experiences gathered from its predecessor MMT1 (Hurlebaus et al., 2002) and was originally developed for the evaluation of rapid sampling experiments in metabolic engineering but can also be applied to any biochemical network described by chemical reactions and reaction kinetics. Its architecture is a complete redesign based on the state-of-the art simulation technology and extending MMT1's functionality.

MMT2 has been designed as a computational engine that supports all computationally expensive steps described above. Thus, a major focus in the development of MMT2 is high performance computing. On the other hand, MMT2 is intended to be used in a tool chain for biochemical network modeling that might also include preprocessing tools like database access and graphical network representations, powerful optimization frameworks or post-processing tools like data visualization (Qeli et al., 2003) or statistical (Qeli et al., 2004a, b) analysis. Although some of these tools have already been implemented for MMT2, they are not the subject of this contribution. Likewise, methods for model discrimination and experimental design that are already supported by MMT2 are described by Wiechert (2002).

As already outlined, MMT2 is mainly targeted at model based evaluation of experimental data. It is designed for large models with more than 10 ODEs and 100 parameters and large experimental datasets (having several thousands of measured values). In this situation, high performance computing is absolutely crucial. MMT2 and its predecessor MMT1 have been extensively tested with data from rapid sampling experiments.


    12 FEATURES OF MMT2
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
The following list briefly outlines the special features of MMT2:

XML Support. To store metabolic models, MMT2 uses its own XML-format M3L (Metabolic Model Markup Language). It is a dialect of the widely used SBML-format (Hucka et al., 2003) and differs only in those parts where extended features of MMT2 (e.g. splines and model variants) come into action.
Splines. As stated in Section 3, in the beginning of a modeling process, the modeler is not certain at all how the model has to be set up. On the other hand, the modeler already has a large amount of experimental data. Instead of using this experimental data entirely for checking and validating the modeling approach it can be used to reduce the model complexity. For instance, complex mechanisms such as the ATP–ADP flow can be substituted by data and the modeler can first focus on simpler mechanisms before tackling parts of the surrounding network. The measured metabolite time course is represented by a smoothing spline, which is computed by an independent tool and stored as a piecewise polynomial in the M3L format.
Code generation. All the tools mentioned in the Introduction section use an interpreter to perform the simulation, having the advantage of easy usability and simple program installation. MMT2 takes a different way to automatic program generation, having the advantage of very fast execution of a single simulation.
Automatic differentiation. The method of sensitivity equations described in Section has been implemented in MMT2. Using ADOL-C (Griewank et al., 1996), an algorithm was written to implement Equation (7). ADOL-C takes care of computing the derivatives such as and in Equation (7). The big advantage is that the undifferentiated code need not be touched and the complicated sensitivity equations do not explicitly occur in the generated code.
Variant specification. MMT2 fully supports the formalism of model families presented in the previous sections. MMT2 does not distinguish kinetic variants from network variants. Instead, they are brought together into one description by introducing a null kinetics. In the given example, there is a combinatorially generated family of models. Figure 4 displays the structure of this model variant space. Here, the models related by one elementary generalization or specialization step are linked by an arrow. In the rows of Table 2 these model variants are listed. Fortunately, the model space of all combinatorially possible models contains only few models biologically meaningful. By using two exclusion mechanisms, the number of possible models is reduced significantly.
Grid computing. The MMT2 grid service was realized using the Java based Globus Toolkit 3.0 (GT3) (Foster and Kesselman, 2003) as a service-oriented grid middleware. The grid specific parts of the service are implemented as a Java wrapper that uses the computational methods of the MMT2 simulator, accessed through the JNI since they are implemented in C. All descriptions and communication artifacts that are needed to access the simulator as a grid service are generated from the Java wrapper using a number of tools included in the GT3 distribution. The service is currently used by a basic scheduler that offers the distributed execution of the MMT2 simulator service through a web portal that is accessible through a standard web browser.


View this table:
[in this window]
[in a new window]
 
Table 2 Tabular representation of the model space by explicitly listing each network variant

 

    13 BENCHMARK RESULTS FOR SENSITIVITY COMPUTATION
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
Several benchmark examples were used to investigate the performance of MMT2. The Models #1, #2 and #3 in Table 3 are published as SBML files on the homepage of www.sbml.org and were converted to M3L. Model #1 implements the famous Lorenz equations as a set of reactions. Model #2 is concerned with the mitogen-activated protein kinase pathways. Model #3 is a modified Goldbeter minimal mitotic oscillator. Models #4 and #5 were built in a running research project investigating rapid sampling experiments (Degenring et al., 2004) with the aid of MMT2.


View this table:
[in this window]
[in a new window]
 
Table 3 Benchmark runs of the tool MMT2 using 5 different models. Column r.s. shows the time for one single simulation run, the following columns show the consumed time for the computation of the parameter sensitivities

 
All of the benchmark examples presented in Table 3 were computed on an Athlon 1900+ processor with 0.5 GByte main memory, running Linux, Kernel 2.6. The first three columns in this Table show the model's size. Column gives the number of parameters, gives the number of ODEs and gives the number of fluxes. Column r.s. displays the time of a single simulation run of this model which serves as an indicator of the model's complexity and stiffness.

The remaining columns give the results of the benchmark runs. Column f.d. gives the time the algorithm of finite differences needed to calculate the parameter's sensitivities. In theory, this value should be · r.s., but the reference simulation includes the initialization which is only done once when computing the finite differences. Column s.e. is composed of four subcolumns each showing the required time with a specific error tolerance. In general, a smaller tolerance gives more precise results, but also more simulation steps.

The interesting effect with Models #4 and #5 is worth mentioning. Here, the computation with an error tolerance of 10–2 takes longer than the one with 10–4. This is primarily caused by the behavior of the integrator's step size controller which is hampered because of the low accuracy. This is a common experience with numerical ODE solvers. However, above a certain threshold a higher accuracy always means a longer computation time.


    14 BENCHMARK RESULTS FOR MODEL VARIANTS
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
To test the model selection process and to demonstrate the computing performance, a simple exhaustive search strategy was implemented which will be substituted by a more intelligent search algorithm in the future project work. The strategy simply visits any feasible variant of the model and does a repeated parameter fit with multiple starting points by using the subplex optimization algorithm.

The biological model originates from another research project which is concerned with the model-based evaluation of a rapid sampling experiment investigating the metabolism of E.coli. In the current stage it is focused on the shikimic acid pathway. The model consists of eight metabolites and another seven taken from data as smoothed splines. Its 10 reactions sum up 58 parameters which are fitted to 499 measured values from 6 metabolites. For a model discrimination study, the approach of model families was used.

The model family had 196 variants from which 158 were excluded. For each of the remaining 38 models, 15 jobs were created, and every job was run for a maximum of L = 8 h. In total, 570 jobs were created which were distributed on a cluster of 100 Opteron 2 GHz processors having 2 GB main memory. In total, the single simulation run was executed over 5.8 x 108 times and was finished after 51 h and 41 min which translates into 3117 simulation runs per second on average.

After all jobs were finished, the results were ready to be examined by the modeler. Each of the variants was fitted 15 times, which usually results in 15 different local minima. These minima give an indication as to whether or not a variant can be fitted to the measurement data. If the sum of squares is below a certain threshold, the modeler has to evaluate whether, in his or her opinion, the time course fits.


    15 CONCLUSIONS
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
A major part of biochemical network modeling consists of modifying a model. To ease the burden on the modeler, the new concept of model families was proposed in this paper. With this method, a multitude of similar models can be formulated in a single description by using network and kinetic variants to specify sets of similar networks. Since such a description leads to a combinatorial explosion which can include a vast number of biologically meaningless models, the two exclusion mechanisms of constraints and elementary flux modes were provided.

Based on this description, the software tool MMT2 presented in this paper can automatically test all given models and thus relieve the modeler of this tedium. At the end of this automatic procedure, a large set of data has been produced which then must be analyzed by the modeler. Future work will include developing adequate data analysis tools for pre-processing (drawing the metabolic network and exporting it to M3L), post-processing and visualization of experimental data and the produced simulation results. From the computational viewpoint, MMT2 will be enhanced by alternative optimization algorithms for parameter fitting.


    16 OUTLOOK ON MODEL SELECTION
 TOP
 Abstract
 1 INTRODUCTION
 2 DYNAMIC SAMPLING DATA
 3 DATA DRIVEN MODELING
 4 THE MODELING PROCESS
 5 KINETIC VARIANTS
 6 NETWORK VARIANTS
 7 COMBINATORIAL MODEL GENERATION
 8 EXCLUDING MEANINGLESS MODELS
 9 AUTOMATIC NAVIGATION IN...
 10 SENSITIVITY ANALYSIS
 11 MMT2: A COMPUTATIONAL...
 12 FEATURES OF MMT2
 13 BENCHMARK RESULTS FOR...
 14 BENCHMARK RESULTS FOR...
 15 CONCLUSIONS
 16 OUTLOOK ON MODEL...
 REFERENCES
 
For good reasons, concrete algorithms for model search have not been discussed in detail in this contribution. Various concepts for model selection can be found in the literature (Linhart and Zucchini, 1986, Burnham and Anderson, 1998; Wiechert and Takors, 2004). However, it should be noticed that classical model selection procedures were developed for comparably small model families, such as polynomials of different degrees. When they are applied to large families with many parameters, they generally do not produce clear results. An example is the classical Akaike model discrimination criterion (Wiechert and Takors, 2004; Burnham and Anderson, 1998):

(8)
with SSQ the (weighted) sum of squares, n the number of parameters and m the number of data. To have some realistic numbers, assume that n = 100 and m = 1000 for a biological network with rapid sampling data. Then, for a v