Skip Navigation


Bioinformatics Advance Access originally published online on May 14, 2008
Bioinformatics 2008 24(14):1611-1618; doi:10.1093/bioinformatics/btn228
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/14/1611    most recent
btn228v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Centler, F.
Right arrow Articles by Dittrich, P.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Centler, F.
Right arrow Articles by Dittrich, P.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Computing chemical organizations in biological networks

Florian Centler {dagger}, Christoph Kaleta , Pietro Speroni di Fenizio {ddagger} and Peter Dittrich *

Bio Systems Analysis Group, Jena Centre for Bioinformatics (JCB) and Department of Mathematics and Computer Science, Friedrich-Schiller-University Jena, D-07743 Jena, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Novel techniques are required to analyze computational models of intracellular processes as they increase steadily in size and complexity. The theory of chemical organizations has recently been introduced as such a technique that links the topology of biochemical reaction network models to their dynamical repertoire. The network is decomposed into algebraically closed and self-maintaining subnetworks called organizations. They form a hierarchy representing all feasible system states including all steady states.

Results: We present three algorithms to compute the hierarchy of organizations for network models provided in SBML format. Two of them compute the complete organization hierarchy, while the third one uses heuristics to obtain a subset of all organizations for large models. While the constructive approach computes the hierarchy starting from the smallest organization in a bottom-up fashion, the flux-based approach employs self-maintaining flux distributions to determine organizations. A runtime comparison on 16 different network models of natural systems showed that none of the two exhaustive algorithms is superior in all cases. Studying a ‘genome-scale’ network model with 762 species and 1193 reactions, we demonstrate how the organization hierarchy helps to uncover the model structure and allows to evaluate the model's quality, for example by detecting components and subsystems of the model whose maintenance is not explained by the model.

Availability: All data and a Java implementation that plugs into the Systems Biology Workbench is available from http://www.minet.uni-jena.de/csb/prj/ot/tools.

Contact: dittrich{at}minet.uni-jena.de

Supplementary Information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Biological knowledge of the myriad of processes taking place in living cells is typically represented in network models. Such models detail, how genes regulate each other's activity (de Jong, 2002), how signals detected at the cell surface are propagated within the cell leading to a change in gene expression (Papin et al., 2005), and how external molecules are internalized and interconverted into different metabolites for energy production (Heinrich and Schuster, 1996). As biologists acquire more and more knowledge, these network models not only increase in size, but also first steps are made towards comprehensive whole cell models (Tomita et al., 1999). With classic analysis methods reaching their limits for such networks, the theory of chemical organizations (Dittrich and Speroni di Fenizio, 2007) has been proposed as a novel analysis technique. The main aim is to identify subnetworks that do not create any species outside the subnetwork and that are able to maintain themselves over time. The method cannot only be applied to network models in which the set of species is fixed, but also to constructive dynamical systems in which novel species can appear as the system evolves in time. Signaling systems are an example for such systems as the combinatorial complexity of molecular interactions leads to a vast number of potential species (Blinov et al., 2004). The method has been applied to study a photochemical model of the Martian atmosphere (Centler and Dittrich, 2007), models of HIV infection (Matsumaru et al., 2006a), and a model of Escherichia coli (Centler et al., 2007). Furthermore, the evolution of an artificial chemistry was investigated (Matsumaru et al., 2006b) and chemical computing systems have been constructed with the help of organization theory (Matsumaru et al., 2007). In this article, we lay the algorithmic foundation for computing chemical organizations.

Another topology-based method to study metabolic networks is the analysis using elementary modes (Schuster et al., 1999). In contrast to organization theory, only steady states are considered in this approach. However, both methods are closely related and complement each other (Kaleta et al., 2006). Elementary modes are minimal pathways that can operate at steady state. A biologically feasible system state usually comprises a combination of several elementary modes. As organizations embody such feasible system states, elementary modes can be grouped according to the organizations within they are embedded. This allows for the identification of elementary modes that act in concert, possibly providing a common functionality. Furthermore, this classification eases the analysis of the usually vast number of elementary modes (Kaleta et al., 2006).


    2 THEORY OF CHEMICAL ORGANIZATIONS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Extending ideas by Fontana and Buss (1994), the theory of chemical organizations (Dittrich and Speroni di Fenizio, 2007) provides a new method to analyze complex reaction networks. A reaction network <M, R> is defined by a set of reactions R{subseteq}PM x PM(M)occurring among molecular species M, with PM(M) denoting the set of all multisets with elements from M. <M, R> implies an m x n stoichiometric matrix S=(si,j) where si,j is the number of molecules of species iisinM that is produced in reaction jisinR(i.e. right hand side minus left hand side); m=|M|, n=|R|.

At its core, chemical organization theory defines an organization O{subseteq}M as a set of molecular species that is (algebraically) closed and self-maintaining. Closure ensures that given the species of an organization, there is no reaction within the reaction network that can create a novel species not yet present in the organization. Formally, O{subseteq}M is closed, if for all reactions (A->B)isinR with AisinPM(O), then also BisinPM(O).

Self-maintenance guarantees that all molecular species that are consumed within the organization O can be recreated simultaneously from organization species at a sufficient rate for their maintenance. A set of species O{subseteq}M is called self-maintaining if a flux vector v=(v1, v2,...,vn)isinRFormula exists such that the following three conditions are fulfilled: (1) For every reaction (A->B)isinR with AisinPM(O), its corresponding flux is vA->B>0. (2) For every reaction (A->B)isinR with A{notin}PM(O), its corresponding flux is vA->B=0. (3) For every species iisinO, its concentration change is non-negative: (Sv)i≥0. In other words: if we consider only the subnetwork made up by the species of O [conditions (1) and (2)], we can find a positive flux vector, such that no species of O decays [condition (3)].

The set of organizations, together with the set inclusion {subseteq}, forms a partially ordered set that can be visualized as a hierarchical structure in a Hasse diagram (see Fig. 3 for an example). Linking to the dynamics, (Dittrich and Speroni di Fenizio, 2007) have shown that all steady states of the system are instances of organizations, given that the dynamics of the system is described by a set of ordinary differential equations.

Besides these fundamental concepts, the following definitions proved helpful. A set of species S{subseteq}M is called semi-self-maintaining, if all species sisinS that are consumed within the reaction network implied by S are also produced within the same network. This reaction network implied by S consists of all reactions (A->B)isinR with AisinPM(S). A species sisinS is consumed (produced) if there exists a reaction (A->B)isinR with AisinPM(S) for which s is contained more times in A than in B (more times in B than in A). Note that each self-maintaining set is semi-self-maintaining (Dittrich and Speroni di Fenizio, 2007). A set of species S{subseteq}M that is closed and semi-self-maintaining is called a semi-organization. A (semi-) organization O{subseteq}M is a connected (semi-) organization}, if it is empty, or there exists a species sisinO so that all species of O are connected to s. Two species si and sjisinO are connected to each other, if there exists a sequence of n species s1, ..., sn with skisinO for k=1,...,n, such that si and s1, sk and sk+1 for k=1,...,n–1, and sn and sj are directly connected. Two species so and spisinO are directly connected, if there exists a reaction (A->B)isinR with AisinPM(O), soisinA{cup}B, and spisinA{cup}B. All input species of the network (i.e.) all species that appear as products in reactions with empty reactant side) are defined as being connected to each other. Given a set of organizations O, an organization OisinO with its corresponding set of reactions R(O):= (A->B)isinR|AisinPM(O) and BisinPM(O) is an elementary organization, if there exist no subset of organizations P{subseteq}O\O such that R(O)={cup}PisinPR(P). In other words, when viewing organizations as sets of reactions, an elementary organization cannot be represented as the union of other organizations. An organization O is called a reactive organization if all species it contains are educt or product in at least one reaction of the set of reactions R(O) that occur within O. Note that all connected organizations with more than one species are also reactive organizations. The opposite is not true, as a reactive organization may contain several connected subnetworks that are disparate.


    3 ALGORITHM I: CONSTRUCTIVE APPROACH
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
To compute the organizations for a given reaction network, one could simply test all species combinations for the properties of closure and self-maintenance in a brute force fashion. However, this approach is only feasible for networks with few species (i.e. less than 30 species) as the number of sets to test equals 2n, with n being the number of network species. The first more elaborate algorithm to compute organizations determines in a first step all semi-organizations of the reaction network. This is done in a recursive manner: given an already determined semi-organization so, the semi-organizations containing so are computed in the next step. To find a semi-organization above so, the network structure is taken into account to select species that, when added to so, are likely to give rise to a larger semi-organization. In this constructive fashion, the hierarchy of semi-organizations is computed from bottom-up. As all organizations are also semi-organizations, the first step delivers all candidate species sets for organizations. Then, in the second step, all semi-organizations have to be identified that are also organizations. The property of self-maintenance is the property distinguishing both organization types. All semi-organizations, for which it can be shown that a flux vector in accordance with the self-maintenance condition exists, are also organizations. This requires to solve a linear programming problem for each semi-organization.

3.1 Step 1: computing semi-organizations
The algorithm starts with the smallest semi-organization and creates the whole hierarchy of semi-organizations from the bottom-up.

3.1.1 The main loop.
The main function computeSemiOrga-nizations() is depicted as pseudo code in Figure 1. The set SOsToCheck contains all semi-organizations that have already been found and that still have to be processed. It is initialized with the smallest semi-organization computed by closure({}). The function closure(set) computes the smallest closed set that contains the species set set. This is done by iteratively adding all species to set that, according to the reaction rules, can be produced from the species in set. Each addition might enable new reactions. The iteration stops as soon as no new novel species can be created. Taking the closure of the empty set delivers the smallest semi-organization of the network. If the network contains no input species, it is the empty set. If input species are present, the smallest semi-organization contains all input species and their closure.


Figure 1
View larger version (38K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. The constructive approach to compute semi-organizations.

 
The central function of the main loop is SOsDirectlyAbove (so) (see Fig. 1 for pseudo code). It delivers the set of all semi-organizations that are directly above semi-organization so, that means all semi-organizations that contain so and that do not contain any other semi-organization that contains so1. In the Hasse diagram, these are exactly those larger semi-organizations to which so has a connection. In each iteration of the main loop, the smallest semi-organization2 current is taken from SOsToCheck. All semi-organizations returned by SOsDirectlyAbove(current) are then added to SOsToCheck, and current is removed from the set. The iteration stops when no semi-organization is left in SOsToCheck. In order to avoid processing the same set of species twice, a hash structure can be used to keep track of already processed sets (not shown in pseudo code).

3.1.2 Functions for computing semi-organizations.
In order to find a larger semi-organization that contains semi-organization so, it is checked in function SOsDirectlyAbove(so) for each species not in so, whether this species is part of such a larger semi-organization.

The function SOsDirectlyAboveContaining(so speciesSet) computes all semi-organizations that are directly above so and contain the species in speciesSet (see Fig. 1 for pseudo code). First, the closure of the union of so and speciesSet is computed. If the computed closure is semi-self-maintaining, a semi-organization with the desired properties is found and the function returns. If not, those species in the closure are identified that are consumed but not produced. In order to become a semi-organization, these species must be produced somehow. The function producerSets(speciesSet) (see Supplementary Material for pseudo code) returns all possible species combinations that can produce all species in speciesSet. For each such combination, SOsDirectlyAboveContaining() is recursively called again. This time, the producer combination is additionally required to be present in the new semi-organizations.

3.2 Step 2: test for self-maintenance
Every semi-organization determined in the first step of the algorithm is a candidate for an organization. For a semi-organization to also be an organization, it must be shown that a flux vector exists in accordance with the self-maintenance condition. Let O be a semi-organization with m'=|O| species implying n' reactions (i.e. all reactants of n' reactions are contained in O). With SO being the stoichiometric matrix of this subnetwork, we must determine the existence of a flux vector visinRFormula with v>0 and SO · v ≥ 0. With the solution space of these inequalities forming a convex polyhedral cone in the positive orthant, originating in the point of origin, the problem is equivalent to finding a flux vector v with v>1 and SO·v ≥ 0. This is a linear programming problem that can be solved using the simplex method (Dantzig, 1963). Since only the existence of such a flux vector v is of concern, only the first phase of the simplex method needs to be performed.

3.3 Connected semi-organizations
A simple combinatorial explosion can lead to a vast number of chemical organizations. Consider the reaction network <{s1,...,sn},{}>. The network does not contain any reaction. Consequently, all 2n species combinations are closed and self-maintaining, and hence organizations. However, the organizations contain only isolated network species that do not react with each other. In order to avoid this combinatorial complexity, it is adequate to only consider connected organizations (cf. Section 2). In such organizations, all contained species form a single subnetwork in which all species are connected to each other via reactions. In order to compute connected semi-organizations, only the function SOsDirectlyAbove() needs to be modified. Now, only those species are added to an already discovered semi-organization that are directly connected to a member species of that semi-organization. The Supplementary Material contains the pseudo code. The hierarchy of connected organizations can also be used to compute the full organization hierarchy as detailed in the Supplementary Material.


    4 ALGORTIHM II: FLUX-BASED APPROACH
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
The second algorithm takes a somehow opposite strategy as Algorithm I by combining self-maintaining flux distributions until closed sets are discovered. While the first approach operates on species, the flux-based approach operates on reactions. First, the flux based approach computes elementary organizations (Step 1 and 2), which are combined in Step 3 to obtain reactive organizations, that is, organizations where each species participates at least in one reaction within the organization. Organizations containing species that do not react are determined in the final Step 4.

Starting with the condition of self-maintenance, methods from convex analysis can be employed. Given a reaction network <M, R> and its m x n stoichiometric matrix S, a flux vector visinRFormula fulfilling the self-maintenance condition must be found to show that a species set is self-maintaining. All such flux vectors lie in a convex polyhedral cone P in the n-dimensional flux space RFormula, originating in the point of origin. The cone is defined by the n+m inequalities v≥0 and S·v≥0. These constraints can be transformed into a matrix A representing the set of spanning vectors VB or extreme rays of P (Gagneur and Klamt, 2004). Each point within P can be written as a linear combination of these extreme rays. Thus, we can compute organizations by searching for combinations of extreme rays whose corresponding set of species fulfills the closure condition.

Given a flux vector v, Algorithm II relies only on the set of reactions that have positive fluxes in v and not on their specific flux values. Hence, we define vset as the set of reaction indices containing all reactions that have positive fluxes in v. Considering the set of extreme rays VB defining P, VFormula describes the set of reaction sets vFormula corresponding to the extreme rays vBisinVB. The species set that corresponds to a reaction set vset is denoted by M(vset). It contains all reactants and products of the reactions contained in vset.

Step 1: computing extreme rays—To compute the set of extreme rays VB required for Step 2, we implemented the well-known Schuster algorithm (Schuster et al., 1999) to compute elementary modes. This algorithm computes the extreme rays of a convex cone P' defined by v≥0 and S · v=0. By adding an outflux for each metabolite to the stoichiometric matrix used for the elementary mode computation, the algorithm can also compute the extreme rays of the cone P.

Step 2: computing elementary organizations—The central function in the computation of elementary organizations is organizationsAbove() (see Fig. 2 for its pseudo code). Given a self-maintenance flux vector visinP and its corresponding reaction set vset, it computes all reactive organizations O that contain M(vset) and for which there exists no other organization being a subset of O and a superset of M(vset).3 Hence, the smallest organizations containing M(vset) are computed.


Figure 2
View larger version (40K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. The central recursive function of the flux-based approach to compute organizations.

 
First, the closure of the reaction set vset, respectively M(vset), is computed. This is done by taking the species setM(vset) and iteratively adding all species to the set that can be created by reactions of the network from the species set. The reaction set vFormula contains all reactions that can take place in the generated closed set of species. If this reaction set is identical to vset, the species set M(vset) is an organization. The reaction set vFormula contains more reactions than vset if either species were added, or M(vset) is closed but vset does not contain all reactions that are possible in this set. One such reaction is taken, and all reaction sets vFormulaisinVFormula that contain this reaction are consecutively combined with the original reaction set vset and the function is called again recursively. As the initial reaction set vset and the extreme ray reaction sets vFormula correspond to flux vectors fulfilling the self-maintenance condition, also a flux vector vu fulfilling the self-maintenance property exists for the union vFormula=vset{cup}vFormula. Hence, all reaction sets that are considered in the recursive function calls are associated with self-maintaining flux vectors. To obtain all elementary organizations, the function organizationsAbove() is called for each reaction set vFormulaisinVFormula corresponding to one of the extreme rays defining P.

Step 3: computing reactive organizations—Elementary organ-izations are combined to determine all reactive organizations. This is done by taking all possible combinations of two elementary organizations and calling organizationsAbove() for the union of their reaction sets. For every newly discovered organization, this organization must again be combined with each of the elementary organizations, and organizationsAbove() must be called again for the reaction set unions.

Step 4: computing all organizations—The organizations we have obtained so far all possess a different set of reactions. Consequently, the final step consists of searching for organizations having the same set of reactions as already discovered ones, but containing different species sets. Hence, we need to determine for all discovered organizations all species sets, that can be added to the organization without changing its set of reactions. This can be done by simply inspecting the reaction list.


    5 ALGORITHM III: HEURISTIC APPROACH
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
The flux-based approach requires the computation of the extreme rays of the cone P as input. However, instead of starting with the set of all extreme rays VB, it is feasible to skip the first step of the algorithm by directly starting with a set of elementary organizations. To obtain such a set, a simple heuristic approach is used. A random walk through the reaction network delivers a set of species. After computing the closure of this species set, it is tested whether the closure also fulfills the self-maintenance condition (see Section 3.2). After having determined a sufficient large set of organizations, the associated set of elementary organizations Oel is determined. The reaction sets of the organizations in Oel are then used as input to the second step of the extreme ray algorithm to compute the complete set of organizations that can be found by combining organizations from Oel. The heuristic approach was able to correctly determine the whole set of organizations for all tested networks to which we could also apply the regular flux-based method.


    6 IMPLEMENTATION AND PERFORMANCE
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
The algorithms have been implemented in Java as a command line tool and as an application that integrates into the Systems Biology Workbench (Sauro et al., 2003). Networks can be provided in SBML format (Finney and Hucka, 2003), or in a human friendlier proprietary format. SBML files are read using a Java-based SBML parser from the JigCell project (Vass et al., 2006). The computed hierarchy of organizations is stored in XML format (OrgML) and can be visualized and interactively explored using the accompanying Java application OrgView. To solve the linear programming problem for checking for self-maintenance in the constructive approach, we use lpsolve (Berkelaar et al., 2005). In order to study the runtime behavior of the deterministic Algorithms I and II, we compute the organizations for 16 different network models of natural systems including a series of 10 networks that increase in size in an iterative fashion created using BioNetGen (Blinov et al., 2004).

The results are shown in Table 1. The complete hierarchy of connected organizations using Algorithm I could be determined for all networks. After 24 h of computing time, Algorithm I did not terminate for the Mars nightside network and networks EGFR 5–9. Algorithm II did not terminate for the Mars night- and dayside networks and networks EGFR 5–10. Generally, computing the hierarchy of connected organizations is much faster than determining the complete hierarchy of organizations. Comparing the runtime of Algorithms I and II shows that no method is superior in all cases. Algorithm II shows weak performance when the number of extreme rays is large, as in the full martian photochemistry models, which contain a large number of cycles. When the extreme rays can be computed quickly, Algorithm II tends to be faster (e.g. EGFR 4). Although there is a tendency that larger networks require longer computation time, the size of the network alone is only a limited indicator for the (expected) runtime. Adding a single reaction can change the number of organizations and the runtime dramatically (e.g. Mars, nightside versus Mars, dayside, Table 1). Additionally, larger networks not necessarily contain more organizations. For example, the largest network EGFR 10 has only 32 organizations, while EGFR 4, a subnetwork of the former, contains 12 288 organizations. However, computing these many organizations using Algorithm I is almost twice as fast as computing the mere 32 organizations of the larger network.


View this table:
[in this window]
[in a new window]

 
Table 1. Number of regular, connected and elementary organizations and extreme rays in network models, and required runtime of the constructive (Alg. I) and flux-based (Alg. II) approach

 


Figure 3
View larger version (36K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Hasse diagram of organizations of the genome-scale model of E.coli metabolism for the four scenarios as listed in Table 2. Organizations are vertically arranged according to size, with the smallest organization at the bottom (31 species in Scenarios 1, 2 and 4, and 34 species in Scenario 3) and the largest at the top (567species in Scenario 1 and 547 species in Scenarios 3 and 4). A link between two organizations indicates that the upper organization contains the lower and there is no other organization between them. For the first scenario, only the elementary organizations are shown.

 


View this table:
[in this window]
[in a new window]

 
Table 2. Organizations of the genome-scale model of E.coli metabolism in four scenarios differing in input and decay reactions

 
6.1 Performance analysis
The parameters network size, number of organizations, and required runtime are only weakly correlated. The main factor determining both the number of organizations and runtime is not the network size, but the specific network topology. For any given number of components it is possible to create a reaction network that either does not contain any semi-organization or in which any species combination forms a semi-organization. This is true even for connected semi-organizations, as detailed in the Supplementary Material. One key element of the constructive algorithm is to find larger semi-organizations by recursivly adding species to semi-organizations that already have been computed. Instead of trying to add species in all possible combinations in brute-force manner, this process is guided by the network topology. For a given species set containing species that are not yet produced from the set, such species are added that enable reactions that lead to the production of the former species, in order to create a semi-self-maintaining species set. If, for example in an extreme case, the network only consists of decay reactions for all species, the algorithm discovers in the first recursion step that no species can be added to the empty species set that would enable a reaction that creates other species. In this case, the algorithm quickly terminates, independent of network size, returning the empty set as the only semi-organization.

While the number of extreme rays usually explodes with increasing network size (Gagneur and Klamt, 2004), this is not necessarily the case for reactive connected organizations (cf. Table 1). As pathways consist of combinations of adjacent reactions, the addition of a reaction to the network increases the number of potential pathways through the system. For organizations however, the addition of a reaction can even reduce their number. This can be exemplified for the number of closed sets of a system. If the network consists of species without any reactions, every species set is a closed set. Adding a reaction reduces the number of closed sets as the set of reactants is no longer closed. Accordingly, the number of organization candidates is reduced.

6.2 Runtime complexity
Organization computation is NP-hard. This can be shown by analyzing a specific subproblem, the question whether a reaction network contains a reactive organization apart from the empty set or not. This problem is NP-complete and thus we conclude that organization computation is NP-hard. The proof is detailed in the Supplementary Material. It is based on a reduction to the 3-SAT problem. A formula F in 3-SAT can be translated into a reaction network for which the existence of a non-empty reactive organization implies the satisfiability of F. Vice versa, if there is no solution such that F evaluates to true, the network does not contain a non-empty reactive organization.


    7 APPLICATIONS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
To demonstrate the use of organization theory to analyze biochemical reaction network models, we apply the method to a genome-scale metabolic network of E.coli by Reed et al. (2003). After splitting reversible reactions into explicit forward and backward reactions, the reaction network contains 762 species and 1193 reactions. As the complete hierarchy of organizations could neither be determined by Algorithm I nor Algorithm II, the heuristic approach was used and only reactive organizations are considered. Four different scenarios which differ in input species and decay reactions as shown in Table 2 are studied. The resulting organization hierarchies are depicted in Figure 3.

In the first scenario, exactly those species are defined as input that are also input in the original network by Reed et al., (2003). A complex growth medium is modeled (see Table 2 for a list of input species). The heuristic approach computes 70 elementary organizations after 48 h of computing time, creating an organization hierarchy of 88 296 organizations. No organization contains the complete network. However, the three largest organizations contain the pseudo species Biomass, indicating states in which the cell is able to produce biomass required for its survival.

In the second scenario, a decay reaction is added for all network species. By this, all species are assumed to decay spontaneously, simulating flow conditions. To be part of an organization, a species must now be produced at a positive rate from the input species set. While in the first scenario organizations exist which only consist of closed reaction loops, for example a reversible reaction in which one species reacts to another and back, such cycles cannot form organizations in the second scenario. For this scenario, the heuristic approach only identified one organization. It contains the input species and all species that can be created from the input. It is the closure of the input species, that means it is the smallest closed species set that contains the input species.

To explain this surprising finding, we analyzed in the first scenario, how organizations are created by the set of organizations to which it has downlinks. Several mechanisms can form the larger organizations. In the simplest case, the union of the downlink organizations already represents the larger organization. In other cases, interactions between species of downlink organizations create new species that are part of the larger organization. And finally, in some cases it is necessary to add novel species to create the larger organization. The closure of the union of downlink organization species then form the larger organization. We call the smallest such sets of additional species creator sets. Among the creator sets of the first scenario are 16 non-metabolic network species representing the acyl carrier protein in several states, tRNA, and thioredoxin, as listed in Table 2. Being a metabolic network model, the model does not contain the de novo biosynthesis pathways of these species. They cannot be created from the network.

In the third scenario, we add input reactions for the three non-metabolic species acyl carrier protein, thioredoxin and tRNA–Glu. This ensures that these species are unconditionally available to the system. In this setting, the network contains two organizations. The smallest organization again is the closure of the input species. The second organization contains 547 species including pseudo species Biomass, indicating bacterial growth. Adding the currency metabolites ATP, NAD, coA, mq8 and proteins thioredoxin and ACP, and tRNA–Glu to the small organization is sufficient to create the larger organization.

In the fourth scenario, the unconditional availability of the non-metabolic species is relaxed. Instead of providing them as input, we remove their decay reactions. In this case, we obtain four organizations. The smallest organization is again the closure of the input species. The organization next in size, containing 487 species, is created by adding the currency metabolites ATP, NAD, coA, mq8 and protein thioredoxin. Adding ACP to this organization gives rise to the second largest organization containing 532 species, including pseudo species Biomass. Adding tRNA–Glu finally creates the largest organization encompassing 547 species. This is identical to the largest organization in the third scenario.

Analyzing the four different scenarios reveals the importance of the non-metabolic species. Although being essential for biomass production, their synthesis is not included in the reaction network model. The largest metabolite sets forming self-maintaining networks did not contain all species of the network model. This hints to parts of the network model that need refinement in order to become a truly genome-scale network model of E.coli metabolism.

Taking an evolutionary perspective, one could ask what is the smallest set of species that can create the whole metabolism as represented by the largest organization from the input species and their direct derivatives as represented by the smallest organization. We find that such a creator set contains the non-metabolic species and four (non-unique) more species, for example ATP, NAD, coA and mq8. For growth on a complex medium that provides 16 food species (see Table 2), these creator species are enough to create all 547 species present in the largest organization. When assuming that the metabolism has evolved from a small set of metabolite transforming enzymes to a more complex network, the creator sets can give hints on species that were crucial in the expansion process of the metabolism.

If the currency metabolites and the non-metabolic species are assumed to be unconditionally available, the network only contains one organization encompassing the 547 species of the largest organizations in Scenarios 3 and 4. The lack of a richer organizational structure indicates that the metabolism acts as an indivisible unit. The organization hierarchy of four organizations in Scenario 4 is a mere artifact of the incompleteness of the network model, as it does not contain the de novo biosynthesis pathways for all of its components.

A total of 215 species do not appear in any organization in Scenarios 3 and 4 (see Supplementary Material for a species list). Among them are the 28 substrate deadend species, which only appear as reactants but never as products in reactions. Such deadends indicate incomplete knowledge regarding the metabolism. Besides the obvious incompleteness of the model indicated by the presence of substrate and product deadend metabolites, organization theory provides a second measure of incompleteness. For species that do not appear in any organization, there exists no feasible state in which these species can be maintained, or produced at positive rates under the given growth conditions. Hence, the reaction network model does not explain the synthesis of such species.


    8 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We have presented three algorithms to compute the hierarchy of organizations for a given reaction network. Two of them, the constructive and the flux-based approach, compute the hierarchy of organizations in an exhaustive manner. However, due to the exponential nature of the problem as a network can be constructed in which any species subset is an organization, these algorithms not always finish in reasonable time. In such cases, the heuristic approach can be used to compute at least a significant subset of all organizations.

While the constructive approach computes chemical organizations in a bottom-up manner starting from the smallest organization, the flux-based approach combines extreme rays of the self-maintenance cone to find organizations.

Testing both algorithms on several network models shows that neither of them is superior in all cases. Not depending on extreme rays, the constructive approach has advantages in networks where the number of extreme rays is very large (e.g. the mars model). When the network contains many species but few extreme rays, the flux-based approach is favorable. Since the number of extreme rays usually grows exponentially with network size, this case might proof rare. However, if a network model is to be analyzed, it cannot be guaranteed beforehand, which algorithm will be more appropriate. If the constructive algorithm is chosen, it is sufficient for many applications to only compute the hierarchy of connected organizations which is usually much faster then computing all organizations.

If both the constructive and the flux-based algorithm fail to compute the organization hierarchy in reasonable time, the proposed heuristic algorithm, which is based on the flux-based approach, can be applied. On current desktop computers, the algorithms can be applied to networks having a species number in the order of 100 species for the deterministic approaches and in the order of 1000 species for the heuristic approach.

Testing the algorithms on a set of example network models demonstrates that neither the runtime nor the number of organizations is strictly correlated with the network size. Both are more dependend on the network topology which is difficult to parameterize.

A s an example application of chemical organization theory, we analyze a genome-scale metabolic model of E.coli. The analysis unveiled four important aspects of the model. First, the largest organizations in the four analyzed scenarios represent the largest subnetworks that can operate in biologically feasible states, that means steady states and states with increasing metabolite concentrations. However, these subnetworks do not encompass the whole network model. Several species did not appear in any organization. They can serve as starting points in the refinement of the metabolic model to become truly ‘genomic-scale’.

Second, three non-metabolic species were identified that proofed essential for the metabolic network: acyl carrier protein, thioredoxin and tRNA–Glu. However, their de novo biosynthesis is not accounted for in the model.

Third, only a small set of species is required in addition to the input species to create the metabolic network. In Scenario 3, only the currency metabolites ATP, NAD, Coenzyme A and Ubiquinone 8 need to be added to the smallest organization to create the largest organization in which biomass can be produced. This metabolites are central to the model and might shed light onto the evolution of the metabolism. And fourth, if the non-metabolic and currency metabolites are assumed to be available at all times, the metabolic network model only contains one large organization. The metabolism appears as a indivisible unit that operates as a whole.

Thus, chemical organization theory can on one hand be used to identify parts in the network model that need refinement, and on the other hand it helps to explain model structure and model behavior. In this way organization theory serves as a valuable tool for assessing the quality of whole scale models. By focusing on biologically feasible system states including steady states and states featuring accumulating species, it complements classical flux-based approaches, which focus on steady state flux distributions in metabolic networks.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We are grateful to Axel von Kamp and Stefan Schuster for helpful discussions on extreme rays and elementary modes. We thank two anonymous reviewers for helpful comments.

Funding: We acknowledge financial support by the BMBF (0312704A), DFG (Di852/4-1/2), and EU (12789).

Conflict of Interest: none declared.


    FOOTNOTES
 
{dagger}Present address: Department of Environmental Microbiology, UFZ - Centre for Environmental Research Leipzig-Halle, Germany. Back

{ddagger}Present address: Research Institute in Networks & Communications Engineering (RINCE), Dublin City University, Dublin 9, Ireland. Back

Associate Editor: Thomas Lengauer

1More precisely, at least those semi-organizations are delivered. Under certain circumstances, additional semi-organizations are returned that are not directly above semi-organization so but rather contain another semi-organization directly above so. Consider for example the reaction network <{a, b}, {a->a+b}>. The system contains three organizations: {a, b} above {b} above {}. Applied to the empty set, the function SOsDirectlyAbove({}) returns both {b} and {a, b} here, since function SOsDirectlyAboveContaining() first creates the closure of its argument (see Section 3.1.2). For argument {a}, the closure is already {a, b} and hence this semi-organization is returned although not being directly above the semi-organization {}. Back

2If the smallest semi-organization is not unique, the choice is random. Back

3More precisely, at least those organizations are computed. Under certain circumstances, organizations are also in the result set for which a subset Os is also an organization and contains M(vset). However, in such cases Os is also contained in the resulting set of organizations. Back

Received on January 9, 2008; revised on March 6, 2008; accepted on May 7, 2008

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THEORY OF CHEMICAL...
 3 ALGORITHM I: CONSTRUCTIVE...
 4 ALGORTIHM II: FLUX-BASED...
 5 ALGORITHM III: HEURISTIC...
 6 IMPLEMENTATION AND PERFORMANCE
 7 APPLICATIONS
 8 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Berkelaar M, et al. lp solve: Open source (mixed-integer) linear programming system, version 5.5. (2005).

    Blinov ML, et al. BioNetGen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics (2004) 20:3289–3291.[Abstract/Free Full Text]

    Blinov ML, et al. A network model of early events in epidermal growth factor receptor signaling that accounts for combinatorial complexity. Biosystems (2006) 83:136–151.[CrossRef][Web of Science][Medline]

    Centler F, Dittrich P. Chemical organizations in atmospheric photochemistries - a new method to analyze chemical reaction networks. Planet Space Sci (2007) 55:413–428.[CrossRef]

    Centler F, et al. Chemical organizations in the central sugar metabolism of Escherichia coli. Mathematical Modeling of Biological Systems—Deutsch A, et al, eds. (2007) Vol. I. Boston: Birkhäuser. 109–123.

    Dantzig GB. Linear Programming and Extensions (1963) Princeton: Princeton University Press.

    de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol (2002) 9:67–103.[CrossRef][Web of Science][Medline]

    Dittrich P, Speroni di Fenizio P. Chemical organization theory. Bull. Math. Biol (2007) 69:1199–1231.[CrossRef][Web of Science][Medline]

    Doi A. A hfpn model of bacteriophage lambda. (2005) Available at http://www.genomicobject.net/member3/GONET/lambda.html.(last accessed date February 8, 2005).

    Finney A, Hucka M. Systems biology markup language: level 2 and beyond. Biochem. Soc. Trans (2003) 31:1472–1473.[Web of Science][Medline]

    Fontana W, Buss LW. ‘The arrival of the fittest’: towards a theory of biological organization. Bull. Math. Biol (1994) 56:1–64.[Web of Science]

    Gagneur J, Klamt S. Computation of elementary modes: a unifying framework an the new binary approach. BMC Bioinformatics (2004) 5:175.[CrossRef][Medline]

    Heinrich R, Schuster S. The Regulation of Cellular Systems (1996) New York: Chapman and Hall.

    Kaleta C, et al. Analyzing molecular reaction networks: From pathways to chemical organizations. Mol. Biotechnol (2006) 34:117–124.[CrossRef][Web of Science][Medline]

    Matsumaru N, et al. Chemical organization theory applied to virus dynamics. IT - Inf. Technol (2006a) 48:154–160.[CrossRef]

    Matsumaru N, et al. On the evolution of chemical organizations. In: Proceedings of the 7th German Workshop on Artificial Life—Artmann S, Dittrich P, eds. (2006b) Berlin: Aka. 135–146.

    Matsumaru N, et al. Chemical organization theory as a theoretical base for chemical computing. Int. J. Unconv. Comput (2007) 3:285–309.

    Papin JA, et al. Reconstruction of cellular signalling networks and analysis of their properties. Nat. Rev. Mol. Cell Biol (2005) 6:99–111.[CrossRef][Web of Science][Medline]

    Reed JL, et al. An expanded genome-scale model of Escherichia coli k-12 (ijr904 gsm/gpr). Genome Biol (2003) 4:R54.[CrossRef][Medline]

    Sauro HM, et al. Next generation simulation tools: the systems biology workbench and biospice integration. OMICS (2003) 7:355–372.[CrossRef][Medline]

    Schuster S, et al. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol (1999) 17:53–60.[CrossRef][Web of Science][Medline]

    Tomita M, et al. E-cell: software environment for whole-cell simulation. Bioinformatics (1999) 15:72–84.[Abstract/Free Full Text]

    Vass MT, et al. The jigcell model builder: a spreadsheet interface for creating biochemical reaction network models. IEEE/ACM Trans. Comput. Biol. Bioinform (2006) 03:155–164.[CrossRef]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
C. Kaleta, S. Richter, and P. Dittrich
Using chemical organization theory for model checking
Bioinformatics, August 1, 2009; 25(15): 1915 - 1922.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/14/1611    most recent
btn228v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Centler, F.
Right arrow Articles by Dittrich, P.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Centler, F.
Right arrow Articles by Dittrich, P.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?