Skip Navigation


Bioinformatics Advance Access originally published online on October 28, 2004
Bioinformatics 2005 21(7):1180-1188; doi:10.1093/bioinformatics/bti099
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/7/1180    most recent
bti099v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (32)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Tsai, K.-Y.
Right arrow Articles by Wang, F.-S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Tsai, K.-Y.
Right arrow Articles by Wang, F.-S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Evolutionary optimization with data collocation for reverse engineering of biological networks

Kuan-Yao Tsai and Feng-Sheng Wang *

Department of Chemical Engineering, National Chung Cheng University Chia-yi 621-02, Taiwan

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 INTRODUCTION
 METHOD
 RESULTS
 CONCLUSION
 REFERENCES
 

Motivation: Modern experimental biology is moving away from analyses of single elements to whole-organism measurements. Such measured time-course data contain a wealth of information about the structure and dynamic of the pathway or network. The dynamic modeling of the whole systems is formulated as a reverse problem that requires a well-suited mathematical model and a very efficient computational method to identify the model structure and parameters. Numerical integration for differential equations and finding global parameter values are still two major challenges in this field of the parameter estimation of nonlinear dynamic biological systems.

Results: We compare three techniques of parameter estimation for nonlinear dynamic biological systems. In the proposed scheme, the modified collocation method is applied to convert the differential equations to the system of algebraic equations. The observed time-course data are then substituted into the algebraic system equations to decouple system interactions in order to obtain the approximate model profiles. Hybrid differential evolution (HDE) with population size of five is able to find a global solution. The method is not only suited for parameter estimation but also can be applied for structure identification. The solution obtained by HDE is then used as the starting point for a local search method to yield the refined estimates.

Availability: The algorithm, implemented by Compaq Visual Fortran Professional Edition 6.6, and the supplements are available at http://www.che.ccu.edu.tw/~bioproc/index-english.html/. IMSL Math/Library is a commercial library included in Compaq Visual Fortran Professional Edition.

Contact: chmfsw{at}ccu.edu.tw


    INTRODUCTION
 TOP
 Abstract
 INTRODUCTION
 METHOD
 RESULTS
 CONCLUSION
 REFERENCES
 
Mathematical modeling and dynamic simulation of genetic networks, metabolic networks and signal transduction cascades is a central theme in systems biology and is increasingly attracting attention in post-genomic era (Voit, 2002). The current interest is not due to particular biological breakthroughs facilitated by mathematical models but is largely based on the promise that signal- and systems-oriented approaches have. The ultimate goal of mathematical modeling is to obtain expressions that quantitatively understand every detail and principle of biological systems. Building a mathematical model comprises mainly two aspects: (1) deciding on the model structure and (2) estimating the involved parameter values. Given a model structure, parameter estimation remains the limiting step in the modeling and simulation of biological systems. There have been in-depth studies regarding parameter estimation for nonlinear dynamic systems (Kuzmic, 1996; Mendes and Kell, 1998) and for steady-state systems (Jacquez and Perry, 1990; Hernandez and Ruiz, 1998). However, there still remain difficulties due to the measurement noise and according to the increasing number of parameters to be estimated (Kimura et al., 2004).

Various in silico models have been proposed to quantitatively describe the dynamic behaviors of biological systems. A serious bottleneck of in silico modeling in the past has often been the lack of sufficient data for parameter estimation. The advent of comprehensive gene expression data and metabolic profiles provides new means of parameter estimation (Moles et al., 2003; Voit, 2002). The estimation of parameter values from time-course data is a part of regression problems. There are two major challenges in this field of the parameter estimation of nonlinear dynamic biological systems. The first challenge is to develop an efficient optimization method to find the global estimates. Several conventional methods, such as the graphical method and the gradient-based nonlinear optimization methods (Mendes and Kell, 1998) have been employed as estimation techniques to obtain the parameter values. The graphical method can be applied only to those problems that can be converted to linear regression problems. Basically, a gradient-based nonlinear optimization method does not have such a restriction, but it requires gradient information about the error function with respect to the parameter estimates and often converges to local minima. Evolutionary algorithms which include genetic algorithms, evolutionary programming, evolution strategies, genetic programming and their variants (Back et al., 1997; McKay et al., 1997; Edwards et al., 1998) can be applied to overcome such drawbacks.

The second challenge is to develop a powerful numerical integration method to solve differential equations. In conventional parameter estimation, it requires numerical integration to yield dynamic profiles for the model. Numerical integration failure is the major problem in the optimization search. In addition, numerical integration is time consuming. Almeida and Voit (2003) have applied an artificial neural network to smooth the measured profile data for obtaining slopes of the response curves. Such slopes are then employed in the reverse problem as the measured data to compare with the slopes of the model so that such an approach avoids the numerical integration of differential equations.

In this study, hybrid differential evolution (HDE) is applied as a global search to determine a satisfied solution (Chiou and Wang, 1999). Such satisfied estimates are then provided as the starting point for a gradient-based optimization method to obtain the refined solution. To avoid numerical integration, we will introduce a modified collocation method (Wang, 2000) to directly use the measured dynamic data in order to yield the approximate model profiles for evaluating the error criterion of parameter estimation problems.


    METHOD
 TOP
 Abstract
 INTRODUCTION
 METHOD
 RESULTS
 CONCLUSION
 REFERENCES
 
Estimation technique
We are interested in dynamics of biological systems at the level of pools of molecular species. Such dynamics can be described mathematically by the S-system as follows (Voit, 2000):

(1)
where X is n-dimensional components or pools, the parameter vector p consists of rate constants, {alpha}j and ßj, and kinetic orders, gij and hij. f is the set of net rate equations, which consists of the influxes and the effluxes. The m-dimensional independent variables in the S-system equations are expressed as Xn+j, j = 1,..., m. The parameter estimation is to determine model parameters so that the dynamic profiles satisfactorily fit the measured observation. Figure 1 shows the concepts of conventional technique of parameter estimation. As shown in the dash box of Figure 1, we provide some adjustable inputs to perform the experiment in order to obtain measurable output data. Such inputs are also provided for the proposed model to compute the system output data. Meanwhile, an optimization method numerically generates model parameters in order to provide for the model. Differential equations including these model parameters are numerically solved to yield dynamic profiles. Difference between the measured data and the computed dynamic profiles becomes a least squared error and is expressed in the form

(2)
where Xei(tj) is the measured data for the i-th component at t = tj; Xi(tj) is the computed concentration for the i-th component at t = tj and Xei max are the maximum measured concentration of the i-th component. Here Ns is number of the sampling data, and Nexp is number of experiments. A least squared regression sums up every observed error in (2) to yield a criterion in order to inspect how the experimental data are close to the computed profiles. If the error criterion is not satisfied, then new model parameters are generated by the optimization algorithm, and repeat the procedures until a satisfied result is obtained.



View larger version (23K):
[in this window]
[in a new window]
 
Fig. 1 A conventional scheme of parameter estimation.

 
There is no general solution guaranteed to solve the problem in nonlinear parameter estimation. Voit and Almeida (2004) have proposed a strategy for improving efficiency to avoid the need for solving the differential equations. As shown in Figure 2, an artificial neural network is applied to smooth the measured data. Such smooth data provide for the slope evaluation and approximation blocks to compute the time-course slopes for each state variable. The evaluated slopes are compared with the slopes of the model to yield the error criterion in the form

(3)
Using the slopes as the error criterion can alleviate a computational burden. As discussed in the textbook of Voit (2000), we may obtain a very small error value. This implies that the slopes of the model are nearly identical to those of experiments. However, dynamic profiles may not be really adequate to the experimental data. Satisfied estimates can be obtained if the estimation problem in (3) is resolved with another initial guesses.



View larger version (29K):
[in this window]
[in a new window]
 
Fig. 2 Parameter estimation using slope approximation technique.

 
Collocation approximation
In this study, we introduce the modified collocation method to approximate dynamic profiles so that such data can be directly applied to the parameter estimation problem (2). Figure 3 shows the computational concepts of the parameter estimation with modified collocation method. The approach is very similar to Figure 1 except the approximation procedure. The modified collocation method is applied to convert differential equations into algebraic equations. Such algebraic equations directly adopt the measured data to approximately yield dynamic profiles to evaluate the error criterion.



View larger version (24K):
[in this window]
[in a new window]
 
Fig. 3 Parameter estimation using modified collocation approximation technique.

 
In collocation methods, dynamic variables X in equation (1) are spanned by a set of shape functions as

(4)
where x(j) is an expansion coefficient of X(t) and {varphi}j(t) is a set of polynomial shape function. Conventional collocation methods on the whole-time domain directly utilize differential equations (1) to define the weighted residual criterion to yield expansion coefficients. For the whole-time domain approach, the accuracy and efficiency largely depend on the degree of shape polynomials. In addition, more computational efforts are required to solve the residual equation using the higher degree polynomials. Collocation on finite elements can be applied to overcome such drawbacks (Zienkiewicz and Morgan, 1984). By using conventional collocation on finite elements, additional equality constraints are required to include in the residual equations for enforcing continuity of the solution profiles at each element boundary. Wang (2000) has proposed the modified collocation method to leave out such a requirement.

The modified collocation method redefined that the weighted residual criterion is based on the integral expression of the dynamic system. As a result, the equality constraints are automatically satisfied for the modified collocation method so that we can derive the unified formulas for computing the expansion coefficients element by element. Here, we use the simplest shape function, linear Lagrange polynomial, to illustrate how to apply the modified collocation equation for parameter estimation problems. The modified collocation equations are therefore expressed in the form

(5)
where x(j) is a vector of expansion coefficients at the j-th collocation point and is equal to the solution X(t) at time t = tj, f[x(j), p] is a vector function of expansion coefficients at the j-th collocation point, {eta}j is the time interval between the j-th collocation point and the (j – 1)-th point and N is number of collocation points. The solution for the dynamic system (1) yields the expansion coefficients by solving the implicit algebraic equations. However, such recursively solving procedures may spend most of computational burden in parameter estimation problems.

Approximated profiles can be applied to circumvent these recursively solving procedures in order to reduce computational time. If we substitute the measured data at each collocation point into the right-hand side of equation (5), the approximates from the data are expressed as

(6)
The reverse problem is thus reformulated from one involving n differential equations to a larger system of n x N decoupled algebraic equations. The reformulation of the system of differential equations as a system of algebraic equations has the obvious advantage of making numerical integration unnecessary, but it has other very beneficial consequences as similar to Voit and Almeida (2004).

Optimization technique
We apply HDE to the estimation problem to globally determine a satisfied result, and then the estimates are used as the starting point for a gradient-based optimization method to obtain a finer solution. The HDE algorithm is a simple population-based, stochastic function method and has extended from the original algorithm of differential evolution (DE) as introduced by Storn and Price (1996, 1997). The basic operations of DE are similar to the conventional evolutional algorithms as shown in Table 1. The basic operations of HDE are also shown in Table 1 for comparison.


View this table:
[in this window]
[in a new window]
 
Table 1 Basic DE and HDE operations

 
The DE and HDE structure is a parallel direct search algorithm which utilizes Np vectors of the decision parameters p in the estimation problem as a population for each generation. The initial population is randomly selected and should attempt to cover the entire search space uniformly. The mutation of DE and HDE is the essential ingredient, compared with the other evolutionary algorithms. The mutation in DE and HDE uses the difference between two randomly chosen individuals as a search direction. Such a mutation is different from the conventional evolutionary algorithms. A mutant individual is yielded from a parent individual and the perturbed mutation. In DE, a fixed mutation factor multiplies the difference vector to yield the perturbed mutation. In HDE, a random mutation factor is applied to obtain more diversified individuals. The crossover operation in DE and HDE is employed to increase the local population diversity, which is similar to the conventional evolutionary algorithms.

In DE and HDE, the fitness of an offspring competes one to one with that of its parent. This competition, which is also different from the conventional evolutionary algorithms, gives rise to a faster convergence rate. However, this faster convergence also leads to a higher probability of obtaining a local optimum because the population diversity descends faster during the solution progress. This drawback can be overcome by using a larger population size, although much computation time is expended to evaluate the fitness function. This fact is particularly serious when using DE to solve optimal control problems (Wang and Chiou, 1997). The HDE can avoid using a larger population size and has been applied to solve bioreaction process optimization problems (Wang et al., 1998; Wang and Sheu, 2000).

When using an evolutionary algorithm to optimize a function, an acceptable trade-off between convergence and population diversity must be generally determined. Convergence implies a fast convergence even though it may be to a local optimum. On the other hand, the population diversity guarantees a high probability of obtaining the global optimum. When the population diversity is small, the candidate individuals are closely clustered. Therefore, the mutation and crossover operations can no longer generate the next better individual because a premature solution is obtained. In the HDE, an accelerated operation and a migration are embedded in the original DE algorithm, and these two operations act as a trade-off operator. The accelerated operation is used to speed up convergence. According to our experience, by using DE to solve optimization problems, the best fitness does not descend continuously from generation to generation. It usually descends toward a better fitness after several generations. Under this situation, the acceleration can be used to speed up convergence. When the best fitness in the present generation is not improved any longer by the mutation and crossover operations, a descent method is then applied to push the best individual for obtaining a better solution.

The rate of convergence can be improved by the acceleration. However, faster descent usually results in finding a local minimum. In addition, performing this operation frequently can make the candidate individuals gradually cluster around the best individual so that the population diversity decreases. Furthermore, the closely clustered individuals cannot reproduce better individuals by mutation and crossover operations. As a result, a migration operation is switched on to escape this local cluster. The new candidate individuals are regenerated on the basis of the best individual in the current generation. Correspondingly, the diversity of the candidates can be retained by using such a regeneration procedure. The migration in HDE is performed only if a measure of the population diversity fails to satisfy the desired tolerance. Chiou and Wang (1999) proposed a measure called the degree of population diversity to check whether the migration should be switched on. The degree of population diversity is between 0 and 1. A zero value implies that all of the genes are clustered around the best individual. On the other hand, the value of 1 indicates that the current candidate individuals are a completely diversified population. The desired tolerance for population diversity is accordingly assigned within this region. A zero tolerance implies that the migration in HDE is switched off. A tolerance of 1 implies that the migration is performed in every generation.


    RESULTS
 TOP
 Abstract
 INTRODUCTION
 METHOD
 RESULTS
 CONCLUSION
 REFERENCES
 
In order to examine the effectiveness of the proposed techniques, we have applied the method to determine model parameters for a cascaded pathway and a small-scale genetic network.

Cascaded pathway
Cascaded mechanisms are found in diverse areas of biochemistry and physiology, including hormonal control, gene regulation, immunology, blood clotting and visual excitation. Purely biochemical examples are the phosphorylase activation cascade in muscle and the bicyclic glutamine synthase cascade. In this example, we consider the ‘true’ cascade system shown in Figure 4. The system is described in the S-system equations as follows:

(7)
where X4 is the precursor, which is considered as the independent variable in the experiment. We first generated 10 sets of random initial conditions for dependent variables within the range [0.1, 0.6] and the precursor within [0.6, 0.9] to generate 40 ‘true’ time-course data. The ‘true’ kinetic orders and rate constants are shown in Case I of Table 2.



View larger version (6K):
[in this window]
[in a new window]
 
Fig. 4 Cascade with three steps and feedback. The dependent variables X1 through X3 are produced from precursors X4, which is considered to be independent.

 

View this table:
[in this window]
[in a new window]
 
Table 2 Parameter value of the ‘true’ system (Case I) and the estimates (Cases II to IV) yielded by various techniques

 
All computations were performed on a Pentium IV computer using Microsoft Windows 2000. The HDE algorithm is implemented in Compaq Visual Fortran. The user has to provide four setting factors for HDE. The setting factors used for all runs in the case studies are listed as follows. The crossover factor is set to 0.5. Two tolerances used in the migration are set to 0.05. The population size of 5 is used in the computation. In HDE, the mutation factor is taken as a random number in [0, 1]. The range used for the parameter estimation is {alpha}i and ßi [0.0, 20.0] and gij and hij [–4.0, 4.0].

The HDE with 50 000 generations is used to solve the estimation problem. This computation uses the total function calls of 314 022 including the migration operations of 189 and acceleration operations of 2663. The computational time required for the global search about 2.3 min on a single-CPU Pentium IV 2.4 GHz. The best objective function value, denoted as sum of squared errors for all state variables and experiments, is 1.373E–3. In order to yield more accurate estimates, the solution obtained by HDE is employed as the starting point for a gradient-based method, a subroutine BCONF in IMSL Math/Library, to solve the parameter estimation problem. The local search procedure, as shown in Figure 1, employs Runge–Kutta pairs of various orders, a subroutine IVMRK in IMSL Math/Library, to solve differential equations for obtaining time-course profiles of the system. The refined objective function value of 2.978E–5 is obtained. The estimates are listed in Case II of Table 2. The computational time required for the local search about 4.67 min on a single-CPU Pentium IV 2.4 GHz. The local search is able to obtain more accurate solution, but it spends much computational time on numerical integration. The HDE with 100 000 generations was also applied to solve the estimation problem. The computational time required for the global search was ~6.6 min to obtain the best objective function value of 8.707E–5. After that, the local search spent ~2.02 min to yield the refined objective function value of 4.103E–6.

We applied DE with population size of 5 to solve the estimation problem for comparison. In the original DE algorithm, the mutation factor is fixed and provided by a user. We tried various pairs of mutation and crossover factors for the original DE to solve the problem. However, we always obtained a premature solution using the fixed mutation factor. We then used the random mutation factor at every generation. We still obtained the premature objective function value of 1.811. This solution was unsatisfied by using the five-population size because of mutation and crossover operations in DE being vanished. Moreover, we were unable to find a converged result when the solution was applied as the starting point for the local search because of a numerical integration failure. A satisfied solution can be obtained by using DE with the population size of 30. The computational time required for the global and local search is ~15.83 min to achieve the best objective function value of 5.522E–4.

A simple genetic algorithm was also applied to solve the estimation problem. We used the binary coding for the variables with the population size of 100 to solve the estimation problem. However, we were unable to find a satisfied solution. In this work, we then applied the improved genetic algorithm (IGA), which embedded the acceleration and migration of the HDE algorithm into the GA algorithm (Chiang et al., 2002), to inspect the computation effect. The best objective function value of 5.495E–4 was obtained by IGA and local search by spending a total CPU processing time of 10.62 min. This effect indicates that the acceleration and migration operations could be indeed applied to improve the solution quality.

We applied the HDE global search using slope approximation, as shown in Figure 2, to evaluate error criterion. The computational time required for the global search was ~2 min. The best slope error criterion for state variables and experiments was 8.806E–2. The solution was then assigned as the starting point for the local search method including Runge–Kutta integration to refine the estimates. The refined objective function value of 7.041E–4 was obtained by the local search with a CPU processing time of 8.03 min. The estimates are listed in Case III of Table 2. The estimates are clearly different from the true values. For instance, gi3 is positive, which implies that X3 is activating the synthesis of X1 rather than inhibiting. This situation is similar to the result in Voit (2000). Nevertheless, the fitness value of 7.041E–4 is larger than the proposed method but acceptable.

Realistic experimental data are, in general, subjective natural variations and uncertainties. Such data require to be smoothed by a filter, such as artificial neural network, in order to yield noise-free data providing for parameter estimation. Accordingly, such an estimation quality should depend on the employed filter. In order to evaluate robustness of the proposed method, we suppose that very rough filtered data, which the previous experimental data are included with random noise, are directly applied for the modified collocation block in Figure 3 to approximate the dynamic profiles. The global HDE search with 50 000 generations is then used to solve the estimation problem for obtaining the best objective function value of 8.50E–2. The refined objective function value of 1.637E–3 is obtained by the local search technique. The estimates are listed in Case IV of Table 2. The parameters, gi2 and gi3, are identical inhibition effect to the true system.

Two test experiments were used to validate fitness for these estimates. The initial conditions and independent variable for the first test experiment are [0.5, 0.2, 0.5, 0.9] within the training ranges. However, the second test experiment is [0.72, 0.08, 0.72, 1.05], which is beyond the training ranges. Figure 5 shows the ‘true’ and computed dynamic profiles. The solid, dash and dotted curves are the dynamic profiles by using the estimates obtained from Cases II, III and IV of Table 2, respectively. Such dynamic profiles are nearly identical to the experimental data, as observed from Figure 5.



View larger version (26K):
[in this window]
[in a new window]
 
Fig. 5 Model validation using various estimates. The triangle and circle data points are the ‘true’ dynamic profiles using the initial conditions for [0.5, 0.2, 0.5, 0.9] and [0.72, 0.08, 0.72, 1.05], respectively. The solid, dash and dotted curves are the computed profiles using the estimates for Cases II, III and IV in Table 2, respectively. The dash-dot and dash-dot-dot curves are the computed profiles using the estimates for Cases III and IV in Table 3, respectively.

 
The proposed global/local search method is not only suited for parameter estimation but also applies for structure identification. A priori knowledge about the biological systems is introduced into the reverse problem in order to determine a correct solution. The pruning term introduced by Kikuchi et al. (2003) is used as the penalty into the objective function to detect a suitable skeletal structure. In addition, if it is a priori known that Xj does not directly affect Xi, then the corresponding kinetic orders are zero and this reduces the parameter space that needs to be searched by a full dimension (Almeida and Voit, 2003; Voit and Almeida, 2004). In this example, we set the kinetic orders gii = 0, which precluded a direct effect of a variable on its own production and required the kinetic orders hii to be >0, indicating that the degradation of compounds almost always depends on the concentration. Under the assumption, the search range used for the regression is {alpha}i and ßi [0.0, 20.0], gij and hij [–4.0, 4.0], i != j and hii [0.0, 4.0]. The HDE algorithm with 100 000 generations was used to solve the estimation problem to yield the best objective function value of 1.382E–2. The optimal parameter values are shown in Case I of Table 3. We delete the smaller kinetic orders, (magnitude <0.01), as shown in Case II (underlined to zero). The pruned model is then refitted by the global search to obtain the optimal estimates as given in Case II of Table 3. Following similar procedures, we obtained the optimal solution in Case III of Table 3. Such a solution is essentially identical to the ‘true’ structure. The parameter values were then applied as the starting point for the local search method to yield the optimal objective function value of 4.358E–4. The optimal estimates are given in Case IV of Table 3. Total computational time for the reverse problem required for the global and local search was ~1.03 h on a single-CPU Pentium IV 2.4 GHz. In order to validate the estimates for Case III and Case IV, both parameter values were applied to evaluate two test experiments. The computational results are shown in Figure 5 by dash-dot and dash-dot-dot curves for Cases III and IV, respectively. Both sets of model parameters can satisfactorily fit the test experiments.


View this table:
[in this window]
[in a new window]
 
Table 3 Sequential identification for parameter values and regulation structure of the cascaded pathway

 
Small-scale genetic network
The second case study is a small-scale genetic network shown in Figure 6 (Hlavacek and Savageau, 1996). This is a typical gene interaction system consisting of two genes (genes 1 and 4). The ‘true’ system is described in the S-system equations as follows:

(8)
where X1 is an mRNA produced from gene 1, X2 is an enzyme protein it produces and X3 is an inducer protein catalyzed by X2. X4 is an mRNA produced from gene 4 and X5 is a regulator protein it produces. Positive feedback from the inducer protein X3 and negative feedback from the regulator protein X5 are assumed in the mRNA production processes of genes 1 and 4. X6, X7 and X8 denote a pool of nucleic acid, amino acid and substrate, respectively, and are considered as independent variables in the system.



View larger version (11K):
[in this window]
[in a new window]
 
Fig. 6 Genetic network used in the second computational experiments.

 
Following similar procedures as discussed in the previous case study, we applied several methods to determine the kinetic parameters of the genetic network. Those results are summarized in the supplements. The optimal solution obtained by HDE using the modified collocation approximation is more accurate and less time consuming than the other methods. Moreover, the proposed method also suits for the small noisy time-course data. Here, we show the estimated results for the reverse problem only. Under the same search range as discussed in the previous case study, the HDE algorithm with 100 000 generations was used to solve the estimation problem to yield the best objective function value of 8.287E–3. The optimal parameter values are shown in Case II of Table 4. We subtracted the smaller kinetic orders, which the magnitude is <0.01, as shown in Case II with underline to zero. The pruned model was then refitted by the global search to obtain the optimal estimates given in Case III of Table 4. Using the same type of simplification again, we obtained the results in Case IV of Table 4. This structure is essentially identical to the ‘true’ system. Such parameter values were then applied as the starting point for the local search method to yield the optimal objective function value of 5.7576E–6. The optimal estimates are given in Case V of Table 4. In Table 4, we used the skeletal threshold of 0.1 to prune small parameters. When a smaller threshold is used, for example 1.0E–3, more sequential identification stages are required, six stages for this case study, to yield the identical result.


View this table:
[in this window]
[in a new window]
 
Table 4 Parameter value of the ‘true’ system (Case I) and sequential identification for parameter values and regulation structure (Cases II to V)

 
The estimated parameter values were applied to evaluate two extra test experiments in order to validate the model. Input conditions for the first test experiment are greater than the upper bounds for the training experiments. However, conditions for the second test experiment are less than the lower bounds for the training experiments. Both ‘true’ dynamic profiles are shown as the circle and triangle data points in Figure 7a and b, respectively. The parameter values for Case V in Table 4 are much close to the ‘true’ process so that the computed profiles (solid curves) are nearly identical to both test experiments shown in Figure 7. Cases II, III and IV were applied to evaluate the fitness. Because the model for Case II uses too many parameters to describe the dynamic behavior, some profiles are less fit to the test experiments, as observed from Figure 7.



View larger version (33K):
[in this window]
[in a new window]
 
Fig. 7 Model validation using various estimates. The circle and triangle data points are the ‘true’ dynamic profiles using the initial conditions for [0.84, 0.84, 0.84, 0.84, 0.84, 1.5, 1.5, 1.5] and [0.08, 0.08, 0.08, 0.08, 0.08, 0.6, 0.6, 0.6], respectively. The dash-dot-dot, dash, dotted and solid curves are the computed profiles using the estimates for Cases II, III, IV and V in Table 4 respectively.

 
Table 5 summarizes the comparison between the proposed method and two reported methods for the reverse problem. The proposed method applied the modified collocation to convert differential equations into a set of algebraic equations. This approach is not only suitable for parallel computation, but also saves a lot of computation time. The total CPU time for this parameter and structure identification from Case II to Case V was ~2.84 h on a single-CPU computer. The PEACE 1 used numerical integration to solve differential equations. Such a method using a modified genetic algorithm and running on a PC cluster with 1040 CPUs took ~10 h for one loop (Kikuchi et al., 2003). In addition, the structure is not completely identical to the true system. Kimura et al. (2004) applied a decomposed method to integrate differential equations. Such an algorithm applied a real-coded genetic algorithm with parallel computation to solve the problem (referred to as GLSDC). It required ~58.8 min on a single-CPU Pentium III 1 GHz to optimize each subproblem. However, such a method was unable to estimate the parameter values with precision.


View this table:
[in this window]
[in a new window]
 
Table 5 Comparison between the proposed method and two reported methods for the reverse problem of the second case study

 

    CONCLUSION
 TOP
 Abstract
 INTRODUCTION
 METHOD
 RESULTS
 CONCLUSION
 REFERENCES
 
Parameter estimation is a well-established technique and is routinely applied to estimate biochemical kinetics and enzyme kinetic parameters in many laboratories. However, there are still many challenges in this field for the parameter estimation of nonlinear dynamic biological systems. Numerical integration for differential equations and finding global parameter values are two major problems. In this study, we applied the modified collocation method to convert differential equations into a set of algebraic equations. Such an approximation could yield accurate model profiles and save a lot of computational time. HDE was applied as a global search to determine a satisfied solution, and then such estimates were used as the starting point for a gradient-based optimization method to obtain refined estimates. Two case studies were used to illustrate the effectiveness of the global/local search method and to compare with several methods.


    Acknowledgments
 
The financial support from the National Science Council, ROC (Grant No. NSC92–2214-E194–004) is highly appreciated.

Received on July 19, 2004; revised on September 30, 2004; accepted on October 15, 2004

    REFERENCES
 TOP
 Abstract
 INTRODUCTION
 METHOD
 RESULTS
 CONCLUSION
 REFERENCES
 

    Almeida, J.S. and Voit, E.O. (2003) Neural-network-based parameter estimation in S-system models of biological networks. Genome Informatics, 14, 114–123[Medline].

    Back, T., Fogel, D., Michalewicz, Z. Handbook of Evolutionary Computation, (1997) , New York Oxford University Press.

    Chiang, C.L., Su, C.T., Wang, F.S. (2002) Augmented Lagrangian method for evolutionary optimization of mixed-integer nonlinear constrained problems. Int. Math. J., 2, , pp. 119–154.

    Chiou, J.P. and Wang, F.S. (1999) Hybrid method of evolution algorithms for static and dynamic optimization problems with application to a fedbatch fermentation process. Comput. Chem. Eng., 23, 1277–1291[CrossRef].

    Edwards, K., Edgar, T.F., Maousiouthakis, V.I. (1998) Kinetic model reduction using genetic algorithms. Comput. Chem. Eng., 22, 239–246.

    Hernandez, A. and Ruiz, M.T. (1998) An EXCEL template for calculation of enzyme kinetic parameters by non-linear regression. Bioinformatics, 14, 227–228[Abstract/Free Full Text].

    Hlavacek, W.S. and Savageau, M.A. (1996) Rules for coupled expression of regulator and effector genes in inducible circuits. J Mol. Biol., 255, 121–139[CrossRef][Web of Science][Medline].

    Jacquez, J.A. and Perry, T. (1990) Parameter estimation: local identifiablility of parameters. Am. J. Physiol., 258, E727–E736.

    Kikuchi, S., Tominaga, D., Arita, M., Takahashi, K., Tomita, M. (2003) Dynamic modeling of genetic algorithm and S-system. Bioinformatics, 19, 643–650[Abstract/Free Full Text].

    Kimura, S., Hatakeyama, M., Konagaya, A. (2004) Inference of S-system models of genetic networks from noisy time-series data. Chem-Bio Informatics J., 4, 1–14.

    Kuzmic, P. (1996) Program DYNAFIT for analysis of enzyme kinetic data: application to HIV proteinase. Anal. Biochem., 237, 260–273[CrossRef][Web of Science][Medline].

    McKay, B., Willis, M., Barton, G. (1997) Steady-state modelling of chemical process systems using genetic programming. Comput. Chem. Eng., 21, 981–996[CrossRef].

    Mendes, P. and Kell, D.B. (1998) Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics, 14, 869–883[Abstract/Free Full Text].

    Moles, C.G., Mendes, P., Banga, J.R. (2003) Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res., 13, 2467–2474[Abstract/Free Full Text].

    Storn, R. and Price, K.V. (1996) Minimizing the real function of the ICEC'96 contest by differential evolution. IEEE Conference on Evolutionary Computation , Nagoya, Japan , pp. 842–844.

    Storn, R. and Price, K.V. (1997) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optimiz., 11, 341–359.

    Voit, E.O. Computational Analysis of Biochemical Systems, (2000) , Cambridge, UK Cambridge University Press.

    Voit, E.O. (2002) Models-of-data and models-of processes in the post-genomic era. Math. Biosci., 180, , pp. 263–274[Medline].

    Voit, E.O. and Almeida, J. (2004) Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics, 20, 1670–1681[Abstract/Free Full Text].

    Wang, F.S. (2000) A modified collocation method for solving differential-algebraic equations. Appl. Math. Comput., 116, 257–278[CrossRef].

    Wang, F.S. and Chiou, J.P. (1997) Optimal control and optimal time location problems of differential–algebraic systems by differential evolution. Ind. Eng. Chem. Res., 36, 5348–5357[CrossRef].

    Wang, F.S., Jing, C.H., Tsao, G.T. (1998) Fuzzy-decision-making problems of fuel ethanol production using a genetically engineering yeast. Ind. Eng. Chem. Res., 37, 3434–3443[CrossRef].

    Wang, F.S. and Sheu, J.W. (2000) Multiobjective parameter estimation problems of fermentation processes using a high ethanol tolerance yeast. Chem. Eng. Sci., 55, 3685–3695[CrossRef].

    Zienkiewicz, O.C. and Morgan, K. Finite Elements and Approximation, (1984) , New York Wiley.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
S. Kimura, S. Nakayama, and M. Hatakeyama
Genetic network inference as a series of discrimination tasks
Bioinformatics, April 1, 2009; 25(7): 918 - 925.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
P. Gennemark and D. Wedelin
Benchmarks for identification of ordinary differential equations from time series data
Bioinformatics, March 15, 2009; 25(6): 780 - 786.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
G. Goel, I-C. Chou, and E. O. Voit
System estimation from metabolic time-series data
Bioinformatics, November 1, 2008; 24(21): 2505 - 2511.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
P.-K. Liu and F.-S. Wang
Inference of biochemical network models in S-system using multiobjective optimization approach
Bioinformatics, April 15, 2008; 24(8): 1085 - 1092.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
Z. Zi and E. Klipp
SBML-PET: a Systems Biology Markup Language-based parameter estimation tool
Bioinformatics, November 1, 2006; 22(21): 2704 - 2705.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
D.-Y. Cho, K.-H. Cho, and B.-T. Zhang
Identification of biochemical networks by S-tree based genetic programming
Bioinformatics, July 1, 2006; 22(13): 1631 - 1640.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/7/1180    most recent
bti099v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (32)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Tsai, K.-Y.
Right arrow Articles by Wang, F.-S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Tsai, K.-Y.
Right arrow Articles by Wang, F.-S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?