Bioinformatics Advance Access originally published online on January 5, 2008
Bioinformatics 2008 24(4):553-560; doi:10.1093/bioinformatics/btm623
Structural systems identification of genetic regulatory networks
Department of Computer Science, Texas A&M University, College Station, TX 77843-3112
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Reverse engineering of genetic regulatory networks from experimental data is the first step toward the modeling of genetic networks. Linear state-space models, also known as linear dynamical models, have been applied to model genetic networks from gene expression time series data, but existing works have not taken into account available structural information. Without structural constraints, estimated models may contradict biological knowledge and estimation methods may over-fit.
Results: In this report, we extended expectation-maximization (EM) algorithms to incorporate prior network structure and to estimate genetic regulatory networks that can track and predict gene expression profiles. We applied our method to synthetic data and to SOS data and showed that our method significantly outperforms the regular EM without structural constraints.
Availability: The Matlab code is available upon request and the SOS data can be downloaded from http://www.weizmann.ac.il/mcb/UriAlon/Papers/SOSData/, courtesy of Uri Alon. Zak's data is available from his website, http://www.che.udel.edu/systems/people/zak
Contact: hxiong{at}cs.tamu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
Genetic regulatory networks, often abbreviated as genetic networks, help us untangle the intricate interactions of multiple genes under different environmental conditions. Recent developments in microarray technologies allow scientists to simultaneously measure expressions of thousands, even tens of thousands of genes, over time. The time course of gene expression data can be used to reconstruct genetic networks. In the past decade, a wide variety of models have been developed to study genetic networks. They include Boolean networks (Akutsu et al., 1999; Kauffman et al., 2003), differential equations (Chen et al., 1999; DHaeseleer et al., 1999) and dynamic Bayesian networks (Dojer et al., 2006; Friedman et al., 2000; Husmeier, 2003; Murphy and Mian, 1999).
A special subclass of dynamic Bayesian networks (DBN) is linear dynamical systems (LDS) (Beal et al., 2005; Dojer et al., 2006; Li et al., 2006; Peer et al., 2001), also known as linear state-space models. LDS assumes that observations depend on unobservable states that evolve under Markovian dynamics, i.e. future states are probabilistically determined only by the current state. The state-space approach can provide a general framework for the design of genetic networks in synthetic biology; however, to date, most efforts in estimating an LDS have tried to estimate parameters without considering structural constraints.
Wu et al. (2004) proposed to use LDS to explore large time-course data. While they used hidden, unobserved variables as the states and modeled expression profiles as the output, they did not consider noise, and they estimated states from the output using maximum likelihood factor analysis. The number of states in estimated models was a variable estimated using the Bayesian information criterion (BIC), and the state transition matrix was estimated using least square methods. Wu et al. (2004) considered the state transition matrix as the key parameter that embodies genetic interaction, and the stability or instability of this matrix was cited as supporting evidence for their method. One limitation of Wu's approach is that factors have no obvious biological counterpart.
Later, Yamaguchi and colleagues (Yamaguchi and Higuchi, 2006; Yamaguchi et al., 2007) also tried to use similar linear state-space system to model genetic networks, where a state was defined as a module of interacting genes. The parameters in the model measured quantitative relationships between modules. The dimension of the states was determined by BIC and the parameters were estimated by the expectation-maximization (EM) algorithm.
Rangel and her colleagues, on the other hand, in a series of papers (Rangel et al., 2001; Rangel et al., 2004a, b), modeled individual gene interactions using linear state-space models. Their estimation method consisted of two parts. The inner part was the EM algorithm with full connectivity assumed and therefore no structural constraints. To avoid over-modeling, they first estimated the dimension of hidden states using cross-validation. To avoid over-fitting, they augmented their objective function with terms that favor sparsity. In the outer loop, a bootstrapping method was used to estimate the confidence intervals for all the parameters. Presence or absence of connection between the genes in the network depends on whether the confidence interval includes zero or not, which added up to be the structure of the network. The final result of their method was a connectivity matrix, but no dynamical model resulted since none of the estimated models agreed with the inferred connectivity matrix. Our work seeks to estimate parameters that agree completely with a given connectivity matrix.
Another approach is to separate the task of parameter estimation and model selection and perform them separately, as Gennemark and Wedelin (2007) did for S-system models of genetic networks, where parameter estimation was done under the constraint of currently estimated structure. This is useful also when connectivity is available from the literature (Iyer et al., 2001; Lee et al., 2002; Ren et al., 2000) in which case only parameters need to be estimated. Rangel et al. (2004a) recognized imposing constraints on parameters as a possible extension of their work and suggested it in the discussion, and we will be following that lead precisely in this report.
The purpose of this report is to develop state-space representations of genetic networks with known structures. The method we propose is to combine linear dynamical modeling with structural constraints to produce biologically realistic models that can predict the dynamic behaviors of genetic networks. The motivation for imposing structural constraints is as follows: Any genetic network has a structure specifying interactions between the genes represented as connections. Not all genes are connected, and in fact, every evidence points to genetic networks being sparse. The network structures can often be determined from experiments by biologists, or they can be roughly inferred by model selection. In linear dynamical models, the structure of the genetic network is mainly reflected in the elements of the system matrices. If there are no connections between the genes or connections between the gene and the external stimuli, their corresponding elements in the system matrices should be equal to zero. Without such constraints the models cannot take the structure of the network into account. Therefore, to ensure that estimated models agree with a known structure, constraints must be imposed on parameters. A popular method for the estimation of parameters in state-space models is the EM algorithm. However, conventional unconstrained EM algorithms cannot be applied to models with constraints, so modifications must be made. Incorporating network structure into the state-space model of genetic networks will lead to imposing constraints on the parameter space. Although constrained EM algorithms have been proposed in the engineering literature (Ahn and Oh, 2003; Welling and Weber, 2001), the proposed constrained EM algorithms require iterative numerical solutions to equations derived in the M-step of the EM algorithm. This iterative solution inside an already iterative method would make computation time intolerably long. While generalized EM algorithms like the one Wu et al. (1996) used, avoid iterative numerical solutions, they have slower convergence speed. Therefore, in this report, we present a new type of constrained EM algorithm that admits analytical and decoupled solution and thus preserves EM's speed while not resorting to numerical solutions or generalized EM. In the DBN community, this problem of structural constraint is called the known structure and partial observability for the learning of Bayesian networks (Murphy and Mian, 2002). Since structures of some genetic networks can be gleaned from the literature or discovered through ChIP-on-chip experiments (Lemmens et al., 2006), they should be taken into account whenever available (Zak et al., 2003). Application to synthetic data and real world SOS data show that our method significantly outperforms conventional EM and that structural constraints are important in the reverse engineering of genetic networks.
| 2 METHODS |
|---|
|
|
|---|
2.1 Linear dynamical systems
We have adopted the linear state-space model as the underlying model for genetic networks, in particular the linear time-invariant (LTI) model. LTI is a linear state-space model where parameters do not change over time (Chen, 1999). A linear state-space model of a dynamical system can be written as
|
| (1) |
The linear state-space model represented in Equation (1) is quite general and can represent more than simple exponential growth and decay, for it can represent higher order dynamics, which we will look at next.
2.2 Higher order dynamics
If we stick with one gene for one state, then the system represented in Equation (1) only will have first order dynamics associated with all the genes, which is exponential decay or growth, but since oscillation is widely observed in biology at least second order should be considered in models of genetic networks. We will give a simple derivation of how to add second order dynamics for the individual nodes of the networks using the principle of continuous to discrete conversion. This is similar to dAlché-Buc's method (dAlché-Buc et al., 2005). Of course third or higher order dynamics can be similarly modeled, but care must be taken to avoid over-fitting.
Suppose we have a second order linear differential equation describing the dynamics of a node:
|
| (2) |
1 and
2 parameters. Let
|
| (3) |
|
| (4) |
t = 1, then we can represent the derivatives as
|
| (5) |
|
| (6) |
The ones and zeros in Equation (6) are fixed except in 1 –
1 where the whole term is variable. An interesting observation is that all interactions and inputs are in the second order term x2.
We will apply this conversion to just one gene in the SOS network, lexA. But first we need to derive the constrained EM algorithm.
2.3 Expectation maximization (EM)
EM is a well known method in systems identification (Dempster et al., 1977; Ghahramani and Hinton, 1996; Gibson and Ninness, 2005; Shumway and Stoffer, 1982). EM is a Maximum-Likelihood (ML) estimator of unobserved states and unknown parameters and it operates in an iterative fashion. Each iteration consists of two steps: the E-step and the M-step. In the E-step, states are estimated using Kalman smoother with previous estimates of parameters as model parameters. In the M-step, parameters are estimated using the estimated states obtained in the E-step, and parameters are calculated as to maximize the likelihood. We will focus on our modification to the M-step where network structure constraints are taken into account. We will follow the notations in Gibson and Ninness (2005), while Kailath et al. (2000) is a good source on Kalman filter. Rewriting Equation (1) as
|
| (7) |
|
| (8) |
|
| (9) |
|
|
n is the number of time-points in the nth time series, |
|
The conventional EM would have the M-step as
|
| (10) |
2.4 Structural systems identification
Given a network whose structure is known, for example if a pair of genes is represented by two states and they have no interaction, then the corresponding entry in matrix A should be zero. Similarly, if an input has no influence on a gene that is represented as a state, then the corresponding entry in B should be zero. The same goes for entries in C that describe how measurements depend on the states. Matrix D is usually all zeros because genes do not impact other genes expression level instantaneously.
Take for example the SOS network in Figure 4. The sole input is the gene recA, so input vector ut is a scalar, and matrix B is a vector whose entries are all zeros except for the first element. Among the rest of the genes, only lexA interacts with other genes so matrix A only has the first column and the diagonal entries as non-zero for a first order system. The matrix A's diagonal entries are non-zero because, in general, genes impact their own expression levels. Since we measure all the genes expression levels directly, we set C to be an identity matrix. D is a zero matrix as explained above. So putting all this together, we have parameters A, B, C and D initially determined as follows:
|
|
|
|
|
|
|
|
to be diagonal since we have no reason to believe that noise in each state or measurement is correlated. This also is a standard assumption unless there is specific evidence that contravenes the assumption. Making
diagonal also makes the analytic form of M-step possible.
Incorporating structural constraints results in an M-step that is more complicated than the M-step in the conventional EM algorithms:
|
| (11) |
|
| (12) |
so that if an entry of
is constrained, the corresponding entry of M is 0, and 1 otherwise. The notation ^ represents element-wise product, also known as the Hadamard product. Equations (11) and (12) are quite general, so the formulas admit non-zero constrained values.
Equation (12) looks more complicated, but the calculation is actually explicit as long as we have
new from Equation (11). Equation (11) can be solved row-wise by the following procedure:
Explicit solution for
can be obtained by:
- for each row of
,
i, suppose ri = {indices of constrained elements of
i };
- delete all elements of
i and
, the ith row of
, whose indices are in ri;
- delete all rows and columns of
whose indices are in ri;
- solve
where the
notation denotes respective vectors and matrices after deletion in step 2 and 3.
Therefore the procedure above and Equation (12) plus the first two equations in Equation (10) constitute the modified M-step.
2.5 Data source
First, the synthetic data was generated by a system that had four states and four outputs and the parameters were as follows:
|
|
Second, to validate our method with real-world data, we chose the SOS DNA repair network of the Escherichia coli with eight essential genes. The SOS network is a highly conserved system and is a well studied network (Camas et al., 2006; Friedman et al., 2005). It consists of 30 genes, the master regulator being the lexA gene. Gene lexA inhibits all the rest of the SOS network's gene under normal condition, and when DNA damage is sensed, the normally suppressed genes become active. A diagram of SOS network with eight essential genes is shown in Figure 4.
The experimental data for the SOS system can be downloaded from Uri Alon's homepage. Ronen et al. (2002) used green fluorescent proteins (GFP) to track eight genes of the SOS network as they react to different irradiation levels, 5Jm–2 and 20Jm–2; each level has two samples and each sample has 50 evenly spaced time points. They monitored eight genes: uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and polB. They performed extensive data preprocessing on the raw data using hybrid Gaussian median filter and polynomial fit for smoothing. They also assumed that the rate of accumulation of GFP was proportional to transcript production. We shall make the same assumption.
To test our method on more complex models, we decided to use a biologically inspired artificial system by Zak et al. (2003). They used stochastic simulation to simulate gene expressions and protein interactions. The data had 550 time points, but very few time-course data currently available are that long. So we sampled one time point out of every five consecutive time points to obtain a time series of 110 time points. A complete diagram of the artificial system is shown in Figure 5, and the parameters are constrained as follows:
|
|
|
|
| 3 RESULTS |
|---|
|
|
|---|
To examine the consistency of the constrained EM approach and to test its biological applicability, we applied our new method to two sets of synthetic data and the SOS DNA repair network data. We first examined the distributions of standardized residuals (or errors) of the Kalman filter for the synthetic data and found that they largely resemble Gaussian distributions (these results are presented in Supplementary Materials). For the synthetic data sets, we compared the predictive power and estimation precision of our constrained EM and the unconstrained EM through prediction errors and confidence intervals. For the SOS data, two replications were not enough for bootstrapping so no confidence interval could be derived.
To compare the predictive power of identified models from conventional EM and models from our constrained EM, we used cross-validation. For our synthetic data, we generated a sample of 200 time points, of which the first 100 were used for parameter estimation, and the rest for prediction. Of Zak's data, we selected 110 time points, out of which the first 80 time points were used for estimation and the remaining 30 for validation. The error in prediction is defined to be the difference between the measured gene expression levels and the predicted gene expression levels by the estimated model. Using constrained EM, the error in prediction for four gene expression data in the artificial network from t = 101 to t = 200 were calculated (Fig. 1). In Supplementary Figure 3, for comparison, we plotted the error in prediction by conventional EM. From the plots, we can see that conventional EM starts off with small errors, and sometimes it produces smaller errors initially than our method. But very quickly it strays into wrong directions with larger and larger errors, which makes models estimated from conventional EM having little predictive power. This is even more striking with Zak's data. The constrained EM algorithm yields models with prediction error at worst around 100%, with the output of the worst error plotted in the top panel of Figure 3, while unconstrained EM generates models whose prediction error grows without bound as seen in the bottom panels of Figure 3. In fact, we were forced to cut the plot into three panels so that later values would not obscure earlier ones. This demonstrates that conventional EM, which does not take into account the structure, tends to over-fit.
|
To further test our method we considered the variation in the estimated values. Since many parameter values could fit data equally well, confidence intervals are usually preferred over a single estimation. Using bootstrapping, we were able to estimate 95% confidence intervals for all free parameters and the eigenvalues of the estimated system [same as the eigenvalues of matrix A in Equation (1)]. The idea is that tighter intervals are better than wide intervals and that the estimation methods need to get eigenvalues roughly right, since they are invariant to similarity transformation and they determine important dynamical properties. We found that for almost all free parameters, our constrained EM produced much tighter bounds than those produced by unconstrained EM (see Supplementary Material Table 1 for selected parameters), and for all eigenvalues, our method uniformly produced better bounds (presented in Supplementary Table 2). As we can see from Supplementary Table 2, estimated eigenvalues for our simulated model all include the true eigenvalues 0.8, but our method has much tighter bounds. For Zak's model, unconstrained EM resulted in one eigenvalue having wide interval while constrained EM all have tight bounds. That wide interval, much of it outside the unit circle, could be a sign of a misestimated eigenvalue, since it could account for the unbounded prediction error we observed. A misestimated eigenvalue is a serious concern because the trajectory of an LTI system largely depends on its eigenvalues.
To examine the biological applicability of our method and to evaluate its performance with real world data, we applied our constrained EM to the SOS DNA repair network data. Since there are only 50 time points available, we used the last 10 time points for validation purpose. The error in prediction of the seven-gene expression data in the SOS DNA repair network by the constrained and the conventional EM are plotted in Figure 2 and in Figure 4 of Supplementary Materials. Two features of the plots can be observed. First, the errors oscillate within a certain range rather than blow up in one direction. This suggests that with fewer time points conventional EM performs better than with more data, a classic sign of over-fitting. Second, the constrained EM's estimated model has much smaller deviation from measurement than conventional EM's estimated model, once again demonstrating constrained EM's merits. The figures show that the constrained EM better approximates the true model of the SOS DNA repair network and generate models with superior predictive power.
|
|
|
|
| 4 DISCUSSION |
|---|
|
|
|---|
It is increasingly recognized that dynamics of genetic networks can impact phenotypes, and the study of dynamics can provide new insights into diseases and potential treatments for the diseases (Strohman, 2002). But before we can analyze genetic networks and propose possible treatments, we need a quantitative model that can predict, with reasonable accuracy, the dynamical behaviors of the genetic network. In this report, we proposed a new method that can learn such a model, a linear state-space system, and tested on both synthetic and real experimental data.
Researchers so far have used the linear state-space model primarily in two ways. Either they are used in black-box dynamical modeling or in inferring a genetic network's structure. Wu et al.'s (2004) is representative of the first approach, where the internals of the model is not important but only the dynamical behaviors is, where the number of states is a parameter depending on the data, and where the states have no biological interpretation. This black-box approach is perfectly valid when we have little information regarding the mechanistic details. However, sometimes we have structural information for some genetic networks, either through existing knowledge in the literature or ChIP-on-chip experiments, in which case parameter estimation should take the known structure into account to get a better model. A better model is in the sense that the estimated model in the end does not contradict known biological facts represented by the structure and possess better predictive power. Another use of linear state-space modeling is to infer a genetic network's structure. Using the linear model for network inference is especially appealing because the structure and the parameters have a simple relationship: there is a straightforward mapping between parameters and edges in the network. A naive approach would be to estimate a model with all parameters free to be estimated and to consider those parameters whose values are below a threshold as really being zeros and thereby signifying no interaction. We have seen that conventional EM tends to over-fit and produce a model that has limited predictive power. One approach to alleviate over-fitting is to enforce sparsity on the parameters (dAlché-Buc et al., 2005). Another approach is to separate parameter estimation and structural inference, to incorporate structural constraints into parameter estimation, as in Gennemark and Wedlin (2007). Our method can be seen as the parameter estimation part of the overall system identification, which also infers structure.
In order to incorporate structural constraints into an identification of genetic networks in this report, we presented a framework where certain parameters are fixed while others remain variable. Imposing constraints on parameters lead to a set of non-linear equations to be solved in the M-step. In order to have fast convergence, we intentionally avoided generalized EM (which is slower than our method) or using iterative solutions to the set of non-linear equations in the M-step (which can also be very slow). Instead, with only mild assumptions about noise, we obtained a closed-form, decoupled, explicit solution to the equations arising from the maximization of likelihood in the M-step.
To evaluate the performance of our new method, we applied it to two synthetic data sets and a real world SOS DNA repair network data set. From the results, we can see positive features of incorporating structural constraints in general and constrained EM in particular. First, by incorporating known connectivity between genes, we have better biological realism along with a biological interpretation for the identified model. Second, a state-space model with structural constraints exploits the sparse nature of genetic networks to reduce the number of parameters that need to be estimated. We know that reverse engineering of genetic networks is a grossly underdetermined task, and that we have too few data for a fully connected network. However, the problem can be alleviated if we can effectively incorporate more information by imposing structural constraints. Increasingly, connectivity information is becoming available for more genetic networks (Iyer et al., 2001; Lee et al., 2002; Ren et al., 2000), and it becomes necessary for us to use this information. Third, by using analytic, explicit solution to the maximization of likelihood, we have fast computation, so that our method can be easily incorporated into larger structural inference algorithms. The last and the most important feature of our work is that by incorporating structural constraints into an identified model we identify models that are not prone to over-fitting, that have better predicative power than unconstrained models, and, therefore, are closer to the true model. Cross-validation using both the synthetic data and the SOS data demonstrated this point. With better predictive power, the identified model can then be used for the analysis of dynamical properties or for the design of control strategies for the genetic network under study. There is considerable work that remains. On the modeling front, our method does not take into account the signs of parameters, which represent whether regulation is activation or inhibition. This is crucial information that should be incorporated as another set of constraints. A problem of a more theoretical nature is that we have as yet no principled way to determine the order of each gene, as too high an order can result in over-fitting and too low an order leaves too much unmodeled dynamics. Finally, as LDS is only a linear approximation of gene regulation systems, care must be exercised in extrapolating results presented here to real world expression data. Much more work on diagnostics and model validation remains as well.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Olga Troyanskaya
Received on May 9, 2007; revised on November 14, 2007; accepted on December 14, 2007
| REFERENCES |
|---|
|
|
|---|
Ahn JH, Oh JH. A constrained EM algorithm for principal component analysis. Neural Comput (2003) 15:57–65.[CrossRef][Web of Science][Medline]
Akutsu T, et al. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac. Symp. Biocomput (1999) 4:17–28.
Beal MJ, et al. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors. Bioinformatics (2005) 21:349–356.
Camas FM, et al. Autogenous and nonautogenous control of response in a genetic network. Proc. Natl Acad. Sci. USA (2006) 103:12718–12723.
Chen C-T. Linear System Theory and Design. (1999) New York: Oxford University Press.
Chen T, et al. Modeling gene expression with differential equations. Pac. Symp. Biocomput (1999) 4:29–40.[Medline]
dAlché-Buc F, et al. A Dynamic model of gene regulatory networks based on inertia principle. In: Bioinformatics Using Computational Intelligence Paradigms.—Seiffert U, Jain LC, Schweizer P, eds. (2005) Berlin: Springer. 93–118.
DHaeseleer P, et al. Linear modeling of mRNA expression levels during CNS development and injury. Pac. Symp. Biocomput (1999) 4:41–52.
Dempster AP, et al. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. [Ser B] (Methodological) (1977) 39:1–38.
Dojer N, et al. Applying dynamic Bayesian networks to perturbed gene expression data. BMC bioinformatics (2006) 7:249.[CrossRef][Medline]
Friedman N, et al. Using Bayesian networks to analyze expression data. J. Comput. Biol (2000) 7:601–620.[CrossRef][Web of Science][Medline]
Friedman N, et al. Precise temporal modulation in the response of the SOS DNA repair network in individual bacteria. PLoS Biol (2005) 3:e238.[CrossRef][Medline]
Gennemark P, Wedelin D. Efficient algorithms for ordinary differential equation model identification of biological systems. IET Systems Biol (2007) 1:120–129.[CrossRef]
Ghahramani Z, Hinton GE. Parameter Estimation for Linear Dynamical Systems. (1996) Canada: University of Toronto.
Gibson S, Ninness B. Robust maximum-likelihoode estimation of multivariable dynamic systems. Automatica (2005) 41:1667–1682.[CrossRef][Web of Science]
Husmeier D. Reverse engineering of genetic networks with Bayesian networks. Biochem. Soc. Trans (2003) 31:1516–1518.[Web of Science][Medline]
Iyer VR, et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature (2001) 409:533–538.[CrossRef][Medline]
Kailath T, et al. Linear Estimation. (2000) New Jersey: Prentice Hall, Upper Saddle River.
Kauffman S, et al. Random Boolean network models and the yeast transcriptional network. Proc. Natl Acad. Sci. USA (2003) 100:14796–14799.
Lee TI, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science (2002) 298:799–804.
Lemmens K, et al. Inferring transcriptional modules from ChIP-chip, motif and microarray data. Genome Biol (2006) 7:R37.[CrossRef][Medline]
Li Z, et al. Using a state-space model with hidden variables to infer transcription factor activities. Bioinformatics (2006) 22:747–754.
Murphy K, Mian S. Modelling Gene Expression Data using Dynamic Bayesian Networks. (1999) Berkeley, California: University of California.
Murphy K, Mian S. Dynamic Bayesian Networks: Representation, Inference and Learning. (2002) Computer Science Division. UC Berkeley.
Peer D, et al. Inferring subnetworks from perturbed expression profiles. Bioinformatics (2001) 17(Suppl. 1):S215–S224.[Abstract]
Rangel C, et al. Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics (2004a) 20:1361–1372.
Rangel C, et al. Modeling genetic regulatory networks using gene expression profiling and state space models. In: Applications of Probabilistic Modelling in Medical Informatics and Bioinformatics.—Husmeier D, Robert S, Dybowski R, eds. (2004b) Springer-Verlag. 269–293.
Rangel C, et al. Modeling biological responses using gene expression profiling and linear dynamical systems. (2001) Proceedings of the 2nd International Conference on Systems Biology. Madison, WI: Omipress. 248–256.
Ren B, et al. Genome-wide location and function of DNA binding proteins. Science (2000) 290:2306–2309.
Ronen M, et al. Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. Proc. Natl Acad. Sci. USA (2002) 99:10555–10560.
Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the EM algorithm. J. Time Series Anal (1982) 3:253–264.[CrossRef]
Strohman R. Maneuvering in the complex path from genotype to phenotype. Science (2002) 296:701–703.
Welling M, Weber M. A constrained EM algorithm for independent component analysis. Neural Comput (2001) 13:677–689.[CrossRef][Web of Science][Medline]
Wu FX, et al. Modeling gene expression from microarray expression data with state-space equations. Pac. Symp. Biocomput (2004) 9:581–592.
Wu LSY, et al. An algorithm for estimating parameters of state-space models. Stat. Probability Lett (1996) 28:99–106.[CrossRef]
Yamaguchi R, Higuchi T. State-space aproach with the maximum likelihood principle to identify the system generating time-course gene expression data of yeast. Intl J. Data Mining and Bioinformatics (2006) 1:77–87.[CrossRef]
Yamaguchi R, et al. Finding module-based gene networks with state-space models. IEEE Signal Process. Mag (2007) 24:37–46.[CrossRef]
Zak DE, et al. Importance of input perturbations and stochastic gene expression in the reverse engineering of genetic regulatory networks: insights from an identifiability analysis of an in silico network. Genome Res (2003) 13:2396–2405.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||















