Bioinformatics Advance Access originally published online on September 5, 2008
Bioinformatics 2008 24(21):2554-2556; doi:10.1093/bioinformatics/btn471
DESHARKY: automatic design of metabolic pathways for optimal cell growth
1Instituto de Biologia Molecular y Celular de Plantas, CSIC, 2Instituto de Aplicaciones en Tecnologias de la Informacion y las Comunicaciones Avanzadas (ITACA), Universidad Politecnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain, 3Department of Chemical Engineering, Massachusetts Institute of Technology, Massachusetts Avenue 77, Cambridge MA 02139, USA, 4Laboratoire de Biochimie, Ecole Polytechnique - CNRS, Route de Saclay, 91128 Palaiseau Cedex and 5Epigenomics Project, Genopole, 523 Terrasses de l'Agora, 91034 Evry Cedex, France
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: The synthesis or remediation of organic compounds using living organisms, particularly bacteria and yeast, has been promoted because it reduces cost relative to purely chemical approaches. Computational frameworks can exploit the knowledge stored in large databases of compounds, enzymes and reactions. In addition, cell behavior can be studied by modeling the cellular context.
Results: We have implemented a Monte Carlo algorithm (DESHARKY) that finds a metabolic pathway from a target compound by exploring a database of enzymatic reactions. DESHARKY outputs a biochemical route to the host metabolism together with its impact on the cellular context, using mathematical models of the cell resources and metabolism. Furthermore, we provide the amino acid sequences of the enzymes involved in the route, taken from the organisms phylogenetically closest to the host. We provide examples of designed metabolic pathways with their genetic load characterizations. Here, we have used Escherichia coli as host organism. In addition, our bioinformatic tool can be applied to biodegradation or biosynthesis, and its performance scales with the database size.
Availability: Software, a tutorial and examples are freely available and open source at http://soft.synth-bio.org/desharky.html
Contact: alfonso.jaramillo{at}polytechnique.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
Biotechnology process development is frequently equated with the production of biologics, such as proteins and viral vaccines (Nielsen, 2001). Yet the use of biological systems for the production of small molecules goes back thousands of years and has been increasing since the discipline of metabolic engineering was defined 15 years ago (Bailey, 1991). Initially, metabolic engineering efforts were primarily focused on improving the productivity of naturally occurring metabolites within an organism, for example by overexpressing glycolytic enzymes in yeast (Schaaff et al., 1989). More recently, the field has expanded to include the introduction of new enzyme activities into a host cell in order to produce non-natural products (Martin et al., 2003; Ro et al., 2006) or to engineer the degradation of toxic compounds (Haro and de Lorenzo, 2001).
The use of automated techniques to design biological systems constitutes a breakthrough in biotechnology, and it has previously been applied to predict biodegradation pathways (Hou et al., 2003; Pazos et al., 2005). Functional approaches (Hatzimanikatis et al., 2005; Hou et al., 2003; Li et al., 2004) could reveal novel pathways, but these are ultimately limited by the availability of naturally occurring enzymes. Recent work shows how to construct biochemical pathways using atomic information (Arita, 2003, 2004), and this approach could be used to enlarge our enzyme database by adding abstract reactions corresponding to functional enzymes. This would allow the design of metabolic pathways that incorporate enzymes not found in nature but which could be engineered by directed evolution or computational design (Rothlisberger et al., 2008). In this work we go further by extending the design to biosynthesis and by predicting the cell behavior when a pathway is implemented in a given host using plasmids (Jones et al., 2000).
One of the major challenges in synthetic biology is engineering systems that are as orthogonal as possible (Sprinzak and Elowitz, 2005), and quantitative models provide useful insight here. We propose the use of two different models to quantify the readjustment of fluxes (Varma and Palsson, 1994) and the consumption of cellular resources (Bremer and Dennis, 1996) that result from the expression of heterologous pathways. We select the growth rate as the control parameter for evaluating cellular behavior. On the transcriptional side, we consider a dynamical model involving RNAs, RNA polymerases, proteins and ribosomes (Carrera J. et al., manuscript in preparation), and we compute the reduction in the growth rate due to the sequestration of RNA polymerases and ribosomes. On the metabolic side, since the cell is metabolically altered, we use Flux Balance Analysis (FBA) to predict the new growth rate. These two strategies give different predictions of the cell behavior, but they constitute two scores to be considered when implementing a designed pathway. Future approaches will use more complex models that integrate the metabolic and transcriptomic systems and that take advantage of databases of Gibbs free energies for all enzymatic reactions (Mavrovouniotis, 1991). Importantly, since the desired route may not be unique, we provide a methodology to rank different pathways according to their genetic loads.
| 2 METHODS |
|---|
2.1 Algorithm
We have developed a Monte Carlo algorithm (DESHARKY) to design metabolic pathways. The aim is to find a route connecting a given compound of interest with a metabolite of the host organism. These routes can be for biodegradation (the source compound is a reactant) or biosynthesis (the source compound is a product). For the source compound, we find the possible enzymatic reactions and select one of them with equal probability. We repeat this process for the new source compound. In addition, with a given probability we allow a backward move that removes the previous reaction, which improves convergence and avoids long pathways. This probability is a function of the number of steps already introduced: the longer the pathway, the higher the probability of going back. Here we have used a sigmoid function. We do not consider metabolic steps involving many compounds that are not specific to the host organism (here, at most one non-specific reactant and one non-specific product).
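The search described above can be illustrated with a minimal sketch. It is not the DESHARKY source code: the toy reaction table, the compound and enzyme names, and the sigmoid midpoint and steepness are all hypothetical, and the filter on non-specific compounds is omitted.

```python
import math
import random

# Hypothetical toy reaction database: compound -> list of (enzyme, next compound).
# Names are illustrative only and are not taken from KEGG.
REACTIONS = {
    "target": [("E1", "a"), ("E2", "b")],
    "a": [("E3", "c"), ("E4", "host_met")],
    "b": [("E5", "a")],
    "c": [("E6", "b")],
}
HOST = {"host_met"}  # metabolites already present in the host


def backtrack_probability(n_steps, midpoint=5.0, steepness=1.0):
    """Sigmoid in the pathway length: longer pathways backtrack more often.
    The midpoint and steepness values are assumptions."""
    return 1.0 / (1.0 + math.exp(-steepness * (n_steps - midpoint)))


def find_pathway(source, rng, max_moves=10_000):
    """Random walk from `source` until a host metabolite is reached."""
    path = []  # (enzyme, product) steps taken so far
    current = source
    for _ in range(max_moves):
        if current in HOST:
            return path
        options = REACTIONS.get(current, [])
        # Remove the last step at a dead end, or at random with sigmoid probability.
        if path and (not options or rng.random() < backtrack_probability(len(path))):
            path.pop()
            current = path[-1][1] if path else source
            continue
        if not options:
            return None  # the source compound itself has no known reaction
        enzyme, product = rng.choice(options)  # all reactions equally likely
        path.append((enzyme, product))
        current = product
    return None  # no route found within the move budget
```

Calling `find_pathway("target", random.Random(0))` returns a list of `(enzyme, product)` steps ending in a host metabolite. Because each run is an independent random walk, repeated runs with different seeds sample different routes.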
2.2 Transcription–translation model
The microbial production or degradation of chemical compounds usually requires the expression of foreign enzymes. This expression consumes cellular resources such as RNA polymerases and ribonucleotides for transcription, and ribosomes and amino acids for translation. Using previous knowledge on heterologous expression, we assume that RNA polymerases and ribosomes are the two critical pools. Using the experimental measurements of these resources in Escherichia coli (Bremer and Dennis, 1996), we have constructed a chassis model (Carrera et al., manuscript in preparation), fitting those data with exponential equations (see caption of Fig. 1). Furthermore, we have modeled the total heterologous expression of RNA (RNAh) and enzymes (Eh) by

$$\frac{d\,\mathrm{RNA}_h}{dt} = \alpha C - \delta_r\,\mathrm{RNA}_h \qquad (1)$$

$$\frac{d\,E_h}{dt} = \beta\,\mathrm{RNA}_h - \delta_e\,E_h \qquad (2)$$

where α is the average transcription rate, C the number of copies of external DNA, β the average translation rate, and δr and δe the degradation rates of the RNA and enzymes, respectively. Hence, a first-order approach is to compute the consumption of cellular resources by the heterologous system (RNAPh = α C tr and RIBh = β RNAh tp, where tr is the transcription time and tp the translation time) and then to recompute the growth rate using the phenomenological chassis model (Fig. 1). We take the minimum of the growth rates µ obtained from these two resources.
2.3 Metabolic model
We have addressed the metabolic burden with FBA (Varma and Palsson, 1994). This linear program, in which we maximize the cell growth rate (µ), can be written as

$$\mu = \max_{v}\; c^{T} v \quad \text{subject to} \quad S v = b \qquad (3)$$

By linear programming duality (Schrijver, 1998), this is equivalent to minimizing λᵀb, subject to λᵀS = cᵀ, where the components of λ, usually called shadow prices, are the contributions to the growth rate when perturbing the uptake fluxes (Δµ = λᵀΔb). Therefore, we can precompute λ, since it is a property of the host organism. Introducing a new metabolic route in the host can then be treated perturbatively, with Δb = S*j, where S* is the stoichiometry matrix for this pathway and j its flux.
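As a worked illustration of this perturbative estimate, the sketch below multiplies precomputed shadow prices by the supply change caused by a pathway of flux j. The metabolite names, shadow-price values, stoichiometric coefficients and sign convention (negative means drained from the host) are hypothetical placeholders, not values from the E. coli model.

```python
# Hypothetical shadow prices: change in growth rate per unit of extra supply
# of each host metabolite. Real values would come from the host FBA dual.
shadow_price = {"glucose6P": 0.09, "NADPH": 0.01, "ATP": 0.005}

# Hypothetical net stoichiometry S* of a heterologous pathway per unit flux.
# Assumed convention: negative = drained from the host, positive = returned.
pathway_stoich = {"glucose6P": -1.0, "NADPH": -2.0, "ATP": 1.0}


def growth_rate_change(shadow_price, pathway_stoich, flux):
    """First-order estimate: delta_mu = lambda . (S* j)."""
    delta_b = {m: s * flux for m, s in pathway_stoich.items()}
    # Metabolites without a shadow price are treated as not affecting growth.
    return sum(shadow_price.get(m, 0.0) * db for m, db in delta_b.items())


delta_mu = growth_rate_change(shadow_price, pathway_stoich, flux=1.0)
```

Since the shadow prices depend only on the host, the same dictionary can be reused to score every candidate pathway without solving a new linear program each time, which is the practical benefit of the perturbative treatment.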
2.4 Implementation
DESHARKY is implemented in C/C++, is easily compiled, and runs in UNIX environments (e.g. in Linux, or in Windows using Cygwin). Here we have taken E. coli as the cell model. We have used an extended description of E. coli metabolism involving 1039 compounds, including extracellular compounds, and 2381 biochemical reactions (Schuetz et al., 2007). We provide the KEGG (Kanehisa and Goto, 2000) databases for chemical compounds and enzymatic reactions in a cleaned-up format. There are 14 965 chemical compounds, of which 826 are present in the host; 4942 enzymes, of which 2350 have an available sequence; and 7400 enzymatic reactions from 650 organisms. We also consider a set of compounds that may be present in the medium and can be used as substrates by the cell. To extend the capabilities of the algorithm, reactions can be assumed to be reversible, and reactions that are not found in KEGG can be introduced. The input of our algorithm is the target compound. The output is the designed metabolic pathway together with the quantification of the transcription, translation and metabolic load. In addition, we provide the amino acid sequences of the enzymes involved in the pathway. These sequences are the phylogenetically closest to E. coli according to the KEGG classification of organisms.
Here we have assumed an initial growth rate of µ0=2 doublings/h, a transcription kinetics of α=0.1 RNA polymerases/s, a translation kinetics of β=0.4 ribosomes/s, a number of DNA copies for the enzymes of C=100, a transcription velocity of 1/tr=45 nt/s, a translation velocity of 1/tp=16 aa/s, and a metabolic pathway flux of j=1 mmol/gDW/h.
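To give a feel for the magnitudes involved, the sketch below applies the resource-consumption expressions of Section 2.2 (polymerases engaged = transcription rate × DNA copies × transcription time; ribosomes engaged = translation rate × heterologous RNA × translation time) to the parameter values above. Several inputs are assumptions rather than values from the text: the gene length, the mean mRNA lifetime, the reading of the rates as per DNA copy and per RNA, and the steady-state RNA level obtained by assuming first-order decay.

```python
# Values stated in the text.
ALPHA = 0.1        # transcription initiations per second (assumed per DNA copy)
BETA = 0.4         # translation initiations per second (assumed per RNA)
COPIES = 100       # copies of the external DNA (C)
RNAP_SPEED = 45.0  # nucleotides per second
RIBO_SPEED = 16.0  # amino acids per second

# Hypothetical values, not from the text.
GENE_LENGTH_NT = 900    # one enzyme gene of 300 codons
RNA_LIFETIME_S = 120.0  # mean mRNA lifetime, i.e. 1 / degradation rate


def heterologous_load(gene_length_nt=GENE_LENGTH_NT, rna_lifetime_s=RNA_LIFETIME_S):
    """Return (polymerases engaged, steady-state RNA copies, ribosomes engaged)."""
    t_transcription = gene_length_nt / RNAP_SPEED      # seconds per transcript
    t_translation = (gene_length_nt / 3) / RIBO_SPEED  # seconds per protein
    rnap_busy = ALPHA * COPIES * t_transcription
    # Assumption: RNA decays with first-order kinetics, so the steady-state
    # level is production rate times mean lifetime.
    rna_steady = ALPHA * COPIES * rna_lifetime_s
    ribosomes_busy = BETA * rna_steady * t_translation
    return rnap_busy, rna_steady, ribosomes_busy


rnap_h, rna_h, rib_h = heterologous_load()
```

With these assumed inputs a single 900-nt gene engages about 200 RNA polymerases and 9000 ribosomes. These numbers only illustrate the arithmetic; converting them into a growth-rate reduction requires the chassis model fitted to the Bremer and Dennis (1996) data, which is not reproduced here.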
| 3 RESULTS AND DISCUSSION |
|---|
We have applied DESHARKY to design several metabolic pathways, including biodegradation of toluene or phenol and bioproduction of sorbitol and glucaric acid (Table 1). For instance, the microbial production of glucaric acid is important for therapeutic purposes, including cholesterol reduction and cancer chemotherapy, and for the synthesis of new nylons and hyperbranched polyesters. In Figure 1 we show the transcription, translation and metabolic load for this pathway, and in Supplementary Figure S1 we depict the biochemical transformations and the list of genes encoding the corresponding enzymes. In addition, in the Supplementary Material we compare the biodegradation pathways we found with those obtained from UM-BBD (Hou et al., 2003), showing alternative routes.
Our tool uses a Monte Carlo heuristic to find a possible route connecting a specified target metabolite with the host metabolism, rather than selecting a pathway by enumerating all possible metabolic routes (Arita, 2003; Eppstein, 1998). DESHARKY finds a valid pathway and computes its associated genetic load in a few seconds. In addition, our software can be run in a distributed computing setting to sample most of the solution space. For illustration purposes, we show in the Supplementary Material all possible biodegradation routes for phenol. Here, we have assumed non-weighted reactions for the heuristic procedure, and we compute the genetic load a posteriori using the transcription and metabolic models. Alternatively, a global optimization could be addressed by considering the load of each reaction during the heuristic procedure (Croes et al., 2006).
| ACKNOWLEDGEMENTS |
|---|
We thank T. S. Moon for his help, and B. Canton and D. Endy for their fruitful comments on the chassis model.
Funding: Generalitat Valenciana G.R. (BFPI 2007/160 to G.R.); Spanish Ministry of Education (TIN 2006-12860); MIT-France program, Structural Funds ERDF; EU grant BioModularH2 (FP6-NEST-043340).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Trey Ideker
Received on May 28, 2008; revised on August 14, 2008; accepted on September 2, 2008
| REFERENCES |
|---|
Arita M. In silico atomic tracing by substrate-product relationships in Escherichia coli intermediary metabolism. Genome Res. (2003) 13:2455–2466.
Arita M. The metabolic world of Escherichia coli is not small. Proc. Natl Acad. Sci. USA (2004) 101:1543–1547.
Bailey JE. Toward a science of metabolic engineering. Science (1991) 252:1668–1675.
Bremer H, Dennis PP. Modulation of chemical composition and other parameters of the cell by growth rate. In: Neidhardt FC, et al., eds. Escherichia coli and Salmonella. (1996) Vol. 2, 2nd edn. Washington, DC: ASM Press. 1553–1569.
Croes D, et al. Inferring meaningful pathways in weighted metabolic networks. J. Mol. Biol. (2006) 356:222–236.
Eppstein D. Finding the k shortest paths. SIAM J. Comput. (1998) 28:652–673.
Haro M-A, de Lorenzo V. Metabolic engineering of bacteria for environmental applications: construction of Pseudomonas strains for biodegradation of 2-chlorotoluene. J. Biotechnol. (2001) 85:103–113.
Hatzimanikatis V, et al. Exploring the diversity of complex metabolic networks. Bioinformatics (2005) 21:1603–1609.
Hou BK, et al. Microbial pathway prediction: a functional group approach. J. Chem. Inf. Comput. Sci. (2003) 43:1051–1057.
Jones KL, et al. Low-copy plasmids can perform as well as or better than high-copy plasmids for metabolic engineering of bacteria. Metabolic Engineering (2000) 2:328–338.
Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. (2000) 28:27–30.
Li C, et al. Computational discovery of biochemical routes to specialty chemicals. Chem. Eng. Sci. (2004) 59:5051–5060.
Martin JJ, et al. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat. Biotech. (2003) 21:796–802.
Mavrovouniotis ML. Estimation of standard Gibbs energy changes of biotransformations. J. Biol. Chem. (1991) 266:14440–14445.
Nielsen J. Metabolic engineering. Appl. Microbiol. Biotechnol. (2001) 55:263–283.
Pazos F, et al. MetaRouter: bioinformatics for bioremediation. Nucleic Acids Res. (2005) 33:D588–D592.
Ro DK, et al. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature (2006) 440:940–943.
Rothlisberger D, et al. Kemp elimination catalysts by computational enzyme design. Nature (2008) 453:190–195.
Schaaff I, et al. Overproduction of glycolytic enzymes in yeast. Yeast (1989) 5:285–290.
Schrijver A. Theory of Linear and Integer Programming. (1998) New York: John Wiley & Sons.
Schuetz R, et al. Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Molec. Syst. Biol. (2007) 3:119.
Sprinzak D, Elowitz MB. Reconstruction of genetic circuits. Nature (2005) 438:443–448.
Varma A, Palsson BO. Metabolic flux balancing: Basic concepts, scientific and practical use. Bio/Technology (1994) 12:994–998.
