Skip Navigation


Bioinformatics Advance Access originally published online on April 26, 2007
Bioinformatics 2007 23(13):1623-1630; doi:10.1093/bioinformatics/btm151
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/13/1623    most recent
btm151v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Fujita, A.
Right arrow Articles by Ferreira, C.E.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Fujita, A.
Right arrow Articles by Ferreira, C.E.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Time-varying modeling of gene expression regulatory networks using the wavelet dynamic vector autoregressive method

A. Fujita 1,2, J.R. Sato 1, H.M. Garay-Malpartida 2, P.A. Morettin 1, M.C. Sogayar 2 and C.E. Ferreira 1,*

1Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010 – São Paulo, 05508-090, SP, Brazil and 2Chemistry Institute, University of São Paulo, Cell and Molecular Biology Laboratory, Av. Lineu Prestes, 748 – São Paulo, 05508-900, SP, Brazil

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: A variety of biological cellular processes are achieved through a variety of extracellular regulators, signal transduction, protein–protein interactions and differential gene expression. Understanding of the mechanisms underlying these processes requires detailed molecular description of the protein and gene networks involved. To better understand these molecular networks, we propose a statistical method to estimate time-varying gene regulatory networks from time series microarray data. One well known problem when inferring connectivity in gene regulatory networks is the fact that the relationships found constitute correlations that do not allow inferring causation, for which, a priori biological knowledge is required. Moreover, it is also necessary to know the time period at which this causation occurs. Here, we present the Dynamic Vector Autoregressive model as a solution to these problems.

Results: We have applied the Dynamic Vector Autoregressive model to estimate time-varying gene regulatory networks based on gene expression profiles obtained from microarray experiments. The network is determined entirely based on gene expression profiles data, without any prior biological knowledge. Through construction of three gene regulatory networks (of p53, NF-{kappa}B and c-myc) for HeLa cells, we were able to predict the connectivity, Granger-causality and dynamics of the information flow in these networks.

Contact: cef{at}ime.usp.br

Supplementary information: Additional figures may be found at http://mariwork.iq.usp.br/dvar/


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
In order to understand cell functioning as a whole, it is necessary to uncover, at the molecular level, which genes are expressed at each time point and how their products interact. These interactions between gene products are called gene regulatory networks. Due to the high number of genes involved in these networks, which involve activating or suppressing feedback loops, it is difficult to analyze and understand their dynamics in vitro or in vivo. As a consequence, mathematical and statistical approaches for modeling and simulation of these networks have become a field of intensive research.

In general, these mathematical methods are based on data arising from high-throughput technologies, such as gene expression analysis using DNA microarrays, which allows to simultaneous analysis of up to thousands of genes over different periods of time or under different pharmacological conditions or genetic backgrounds.

Several gene regulatory networks were proposed during the last few years, most of which using Bayesian networks (Dojer et al., 2006; Friedman et al., 2000; Friedman, 2004; Imoto et al., 2002; Tamada et al., 2003), Structural Equation Models (Xiong et al., 2004), Probabilistic Boolean Networks (Akutsu et al., 2000; Pal et al., 2005; Shmulevich et al., 2002), Graphical Gaussian Models (Schäfer and Strimmer, 2005), Fuzzy controls (Woolf and Wang, 2000) and Differential Equations (Mestl et al., 1995).

Although these methods allowed adequate modeling of several regulatory networks and inferring association of genes, they do not yield causal relationships from gene expression data. In general, causality can be assessed by biological information, however, sometimes, no a priori biological information is available to provide this information. Moreover, the greatest interest relies on investigating novel relationships rather than already known associations.

A common approach for inferring relationships between signals is the Granger causality, introduced by Granger (1969), in which one gene X(t) is said to be causal for another gene Y(t) if the prediction of Y(t) using all relevant information available at time t–1 apart from X(t) can be improved by adding the available information about X(t).

In addition, to the best of our knowledge, there are only a few models described to date which do not assume stationarity, implying that the topology of the network does not change with time (Rao et al., 2005; Yoshida et al., 2005a). However, it is well known that cellular gene expression networks are not static, i.e. the topology of the network changes throughout the cell cycle. In order to model and understand these changes, time series expression data provide more detailed information than single time point expression profile.

Here, we describe the classical vector autoregressive (VAR) method to search for relationships of Granger causality between genes by analyzing gene expression levels without prior biological knowledge (Mukhopadhyay and Chatterjee, 2007). In addition, we present an improved VAR method, called dynamic vector autoregressive model (DVAR) to infer the dynamics of the networks’ topology, i.e. changes in this Granger-causal connectivity with time. In addition, we show the results obtained when we applied DVAR to three different networks generated from gene expression data extracted from microarrays experiments carried out throughout the HeLa cells cycle.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Granger causality (Granger, 1969) was first introduced in econometric research to analyze the relationships and influences among macroeconomic time series (e.g. income, exchange rate, etc). It is based on the assumption that a cause cannot ever occur after its effect. Therefore, temporal relationships as delay or precedence may contain information about causality. Formally, a time series xt is said to Granger-cause a time series yt, if the prediction error of yt at time t given all past information (xt–1, xt–2, ... yt–1, yt–2, ...) is less than considering only the information about past values of yt (yt–1, yt–2, ...). Hence, past values of xt allow forecasting yt. Note that as this relationship may not be reciprocal, the direction of information flow is embedded in Granger-causality identification. Nevertheless, Granger-causality does not imply ‘effective causality’. The former is based solely on prediction and numerical information, while the latter is deeply related to influences of one entity over the other. On the other side, due to its simplicity, Granger causality may be useful to infer or suggest effective causality.

Granger causality is usually identified by estimating vector autoregressive models (VAR). A VAR of p-order and k-dimensional time series is given by


Formula 1

(1)
where yt = (y1t, ... , ykt)' is a (K x 1) random vector, the Ai are (K x K) autoregressive coefficient matrices, vt = (v1t, ... , vkt)' is a (K x 1) vector of intercept and ut = (u1t, ... , ukt)' is a k-dimensional error vector of random variables with zero mean and covariance matrix {sum}.

Due to its simplicity, the VAR model allows a simple way of identifying Granger causality, but considering only linear relationships. A necessary and sufficient condition for yp being not Granger-causal for yjt is if and only if ajli = 0 (i = 1, 2, ... , p). Thus, Granger-non-causality may be identified by looking at the autoregressive matrices of VAR models. The direction of causality can be interpreted as the direction of information flow and is not necessarily reciprocal.

This approach can be applied to the analysis of cell cycle time series gene expression data to identify functional connectivity between gene products by simply testing the significance of the components of Ai. However, a limitation of VAR models is the requirement for the stationarity condition to be valid. In practice, this means that the influences between the time series must be the same throughout time. Hence, classical VAR models cannot be used to identify the dynamics of the network, i.e. it is not possible to analyze the variation of connectivity evolving in time.

Upon analyzing a similar problem in neuroscience studies, Sato et al. (2006) suggest the application of the DVAR (dynamic VAR) to overcome this limitation. The DVAR model is defined by


Formula 2

(2)
where ut is a vector of random variables with zero mean and covariance matrix {sum}(t) given by


Formula 3

(3)
and v(t) and Ai(t) (i = 1, 2, ... , p) are a trend vector and coefficient matrices given by


Formula 4

(4)


Formula 5

(5)

Using this time-variant structure for the intercept (the intercept is related to the time-varying mean gene expression signal), autoregressive coefficients and covariance matrices, it is possible to model the regulatory network in a dynamic fashion to analyze the information flow along the cell cycle.

As proposed by Sato et al. (2006), one may consider the wavelet expansion of functions in order to estimate the time-variant functions in v(t), Ai(t) and {sum}(t). The main idea is that any function f(t) with Formula can be expanded as


Formula 6

(6)

Hence, a function f(t) can be represented by a linear combination of wavelet functions {psi}j,t (t), where index j and k are related to scale and time-location, respectively. Therefore, considering the wavelet expansion, the autoregressive coefficient functions almi(t) may be written as


Formula 7

(7)

In practice, we use an approximation, by truncating the wavelet expansion, i.e.


Formula 8

(8)
where the time series extension T and Formula Formula are the wavelet coefficients for the i-th autoregressive coefficient function almi(t). Basically, estimation of the wavelet dynamic autoregressive parameters is accomplished by estimating each of the wavelet coefficients Formula for all the autoregressive functions in the matrices Ai(t) i = 1, 2, ... , p the trend functions in v(t) and the covariance functions in {sum}(t).

A point to be highlighted is the determination of the maximum resolution scale parameter J, which controls the truncation of wavelet expansion. The larger the value of J, the noisier will be the curves estimates. An inherent problem in non-parametric estimation of curves is that the product between bias and variance of estimates is a constant (uncertainty principle). In other words, bias reduction imply in variance increase and vice-versa. An objective criterion to select the best J could be obtained by cross-validation. On the other hand, one may choose the degree of smoothness based on expected changes, according to a priori biological information or desired detail level. In our analysis, we set J as equal to four because we have a priori knowledge that the time-varying connectivity varies along the four different cell cycle phases (S, G2, M, G1).

Sato et al. (2006) proposed an iterative generalized least square estimation composed by a loop of two stages. The first stage consists of estimating the coefficients of wavelets expansions in Ai(t) and v(t) using a generalized least squares estimation. In the second stage, the squared residuals obtained in the previous step are used to estimate the wavelets expansion functions in covariance matrix {sum}(t). See Sato et al. (2006) for further details.

Finally, the selection of the wavelet family is a point to be highlighted. Asymptotically, it will not make any difference in convergence or estimation. However, better estimates in finite sample can be achieved if the wavelet family is adequately chosen. For most biological applications, in which continuous variation is expected, we suggest the use of D4 or D8 wavelets (Daubechies, 1988).

2.1 Algorithms
Consider the following matrices:


Formula 9

(9)


Formula 10

(10)


Formula 11

(11)
the row-Kronecker product is defined by


Formula 12

(12)
and also


Formula



Formula 13

(13)
where 1Tp column vector of (Tp) ones and IK is the identity matrix with k rows.

Let the vector Z = vec(Yt), where the vec operator concatenates all columns of a matrix. The DVAR model can be written as


Formula 14

(14)

A vector of parameters ß contains the wavelet coefficients Formula for all connectivity functions. Let the covariance matrix of {varepsilon} be denoted by {Gamma}. The generalized least square estimator for the parameters of the model is given by


Formula 15

(15)

An estimator for {Gamma} can be obtained considering a wavelet smoothing of the squared/cross residuals. Formula .

The hypothesis of connectivity significance or any linear combination of parameters in ß can be achieved simply by applying the well-known Wald test, considering an adequate contrast matrix. Details about the Wald test for linear contrasts can be found in Graybill (1976).

2.2 Implementation
We implemented our program using R, a statistical computing environment. Computation was conducted under a Pentium IV CPU 3.06 GHz, 2.5 GB of RAM.

2.3 Application of the DVAR method to actual data
We applied this method to HeLa cell cycle gene expression data collected by Whitfield et al. (2002). These data contain three complete cell cycles, i.e. 48 time points distributed at intervals of 1 h, where the HeLa cell cycle is ~16 h. Therefore, we have used all of the 48 points and assumed them as triplicates at each time point, i.e. each time point is represented by three points. This data may be downloaded at: http://genome-www.stanford.edu/Human-CellCycle/HeLa/.

In our analysis, we modeled three networks, one of which is composed by the p53, gadd45a and p21 genes; the second one by NF-{kappa}B, ikk{alpha}, nemo and nik and the third one by c-myc, reck, src and timp-2, since these networks play important roles in gene regulation and progression of diverse types of tumors.


    3 RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Here, we propose a new statistical gene network estimation method based on the time-varying DVAR model.

Figures 1, 2 and 3 illustrate the networks estimated using the proposed method (DVAR) and the general VAR method, which we believe is the most appropriate and classical method to model networks based on time series data.


Figure 1
View larger version (11K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Application of DVAR and VAR models to a network composed by the p53, gadd45a and p21 genes. DVAR: significant connectivities (P < 0.05). The connectivity functions are shown in each arrow.

 

Figure 2
View larger version (12K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Application of DVAR and VAR models to a network composed by the nf-{kappa}b, ikk{alpha}, nemo and nik genes. DVAR: significant connectivities (P < 0.05). The connectivity functions are shown in each arrow. VAR: dashed arrows are significant connectivites (P < 0.1) and full arrows are significant connectivities (P < 0.05).

 

Figure 3
View larger version (10K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Application of DVAR and VAR models to a network composed by the reck, c-myc, src and timp-2 genes. DVAR: significant connectivities (P < 0.05). The connectivity functions are shown in each arrow. VAR: dashed arrows are significant connectivites (P < 0.1) and full arrows are significant connectivities (P < 0.05).

 
As a validation approach, we analyzed three gene modules, which are regulated by three transcription factors, namely: p53, nuclear factor-{kappa}B (NF-{kappa}B) and c-Myc in the context of cell cycle progression of transformed HeLa cells.

The results obtained by applying our approach to a set of three genes (p53, p21 and gadd45a) are illustrated in Figure 1. In Figure 2, the network composed by NF-{kappa}B, IKK{alpha}, NEMO and NIK is shown. In Figure 3, we present the set of genes comprised by c-myc, reck, src and timp-2. Since we are interested in analyzing actual connections, we considered three or four as an adequate number of genes for the size of our data set to insure that the power of the test is not being lost. The above mentioned seven genes, it begins to display singularity numeric computational problems. Therefore, if one is interested in constructing large networks, a simple solution is to make a pairwise comparison and control the false positives rate applying FDR (Benjamini and Hochberg, 1995). It is important to notice that, in this approach, the causalities inferred are not partial. We have applied the order one DVAR due to the low number of time points in our time series data. It is important to point out that as the number of genes to be analyzed and/or the order of the DVAR increases, the power of the test decreases, since the number of parameters to be estimated increases.

The p53 is a tumor suppressor protein that plays a critical role in cell cycle and apoptosis control. The p53 binds to the enhancer/promoter elements of downstream target genes and regulates their transcription, by triggering cellular programs that account for most of its tumor-suppressor functions. Two of these target genes are p21 and gadd45a, which are transcriptionally activated by p53 at the G1-S and G2-M cell cycle transitions, respectively, inducing growth inhibition and/or apoptosis. However, a recent study demonstrates a novel finding, namely, that GADD45A is able to generate a positive p38 kinase-dependent feedback signal that is essential for maintaining p53 protein stability and transcriptional activity (Jin et al., 2002). Indeed, it was observed that GADD45A may be able to work as an upstream effector of p53 protein, since p53 induction is diminished in gadd45a knock-out cells (Hildesheim et al. 2002). Our in silico gene expression analysis (Fig. 1) is suggesting that, in HeLa cells, p53 is also transcriptionaly regulated in a GADD45A-dependent fashion, increasing the connectivity at the G1 phase of the cell cycle. Consistently, the self-regulating connectivity of GADD45A at the G2 phase could be the result of a p53-independent activation, such as that observed in quercetin-treated HeLa cells (Yoshida et al., 2005b). This regulation could be involving other well known GADD45 regulators such as BRCA1 (Jin et al., 2000), oct-1 (Takahashi et al., 2001) and NF-YA (Jin et al., 2001).

Several explanations could be offered to the absence of a Granger-causal link from GADD45A to p53, namely: (1) the time-series length is not large enough; (2) the Granger-cause could be happening in order different from one; (3) in HeLa cells this regulation could not be happening; (4) the regulation is not at the transcriptional level. On the other hand, the DVAR prediction of a positive connection between p21 and p53 at the G1-S phase of the cell cycle, is in agreement with experimental evidence of increased p53-dependent p21 gene expression at this point, inducing cell cycle arrest and restricting aberrant cell growth in response to DNA damage, oncogene activation, hypoxia and loss of normal cell contacts (el-Deiry et al., 1993; Lohrum and Vousden, 1999). However, at the same time, a negative feedback appears to be present in HeLa cells, since a negative self-regulation of p21 is almost evident at the G1 cell cycle phase, in the apparently independent and contradictory p53-mediated regulation. Since HeLa is a transformed cell line, this could be one of the signaling pathways activated towards tumorigenesis and/or tumor progression. Figure 4 shows the gene expression profile of the actual data composed by p21, p53 and gadd45a and the fitted values by DVAR. The fitted values for the other genes could be accessed in the Supplementary material.


Figure 4
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. Gene expression profile plot of actual data and fitted values by DVAR to the p21, p53 and gadd45a genes.

 
The NF-{kappa}B, a stress-regulated transcription factor belonging to the Rel family, plays a pivotal role in the control of inflammatory and innate immune responses.

Once activated, mostly through phosphorylation and subsequent degradation of its inhibitor, namely, the I{kappa}B protein, NF-{kappa}B translocates to the nucleus, where it modulates the expression of a variety of genes, including those encoding cytokines, growth factors, acute phase response proteins, cell adhesion molecules and several cell apoptosis regulators. More recently, NF-{kappa}B activation has been related to multiple aspects of tumorigenesis, including the control of cell proliferation and migration, cell cycle progression and apoptosis.

Whereas only limited information is available regarding the direct involvement of NF-{kappa}B in cell-cycle regulation, it was also found that the levels of NF-{kappa}B activation are linked to signaling that controls cell-cycle progression in HeLa cells. However, there are controversial observations of the functional consequence of this regulation. Some authors suggest that inhibition of NF-{kappa}B causes impairment of cell cycle progression and delayed G1/S transition (Kaltschmidt et al., 1999). Others hold that overexpression of NF-{kappa}B prevents G1/S cell cycle transition in HeLa cells (Bash et al., 1997). Our in silico gene expression analysis (Fig. 2) indicates that this augmented activation of NF-{kappa}B in the G1/S transition correlates with increased activity of IKK{alpha}, a natural repressor of I{kappa}B-dependent inhibition of NF-{kappa}B. As predicted by DVAR, IKK{alpha} activity is positively connected with NEMO, simultaneously affecting the NEMO-mediated negative regulation of I{kappa}B, apparently in order to activate NF-{kappa}B, thus promoting cell cycle progression at the G1/S transition. These proliferative mechanisms are increased by an NF-{kappa}B-mediated negative regulation of I{kappa}B at the G0/G1 transition, which are enforced by itself I{kappa}B-negative regulation at G1/S transition phase. However, DVAR also indicates an I{kappa}B negative regulation of NF-{kappa}B at this point, which could be operating as a negative feedback towards impairment of NF-{kappa}B transcriptional activity at the G1/S transition. Therefore, it is clear that the balance between NF-{kappa}B-activated pro- and/or anti-proliferative stimuli could be determining the cell fate towards cell cycle progression/arrest or triggering programmed cell death. Taken together, as reported by experimental data, our in silico results indicate that the NF-{kappa}B transcription factor participates (directly or indirectly) of the control of a complex pattern of HeLa cell cycle regulators in a bidirectional fashion.

The c-myc oncogene acts as a pluripotent modulator of transcription during normal cell proliferation. Deregulated c-myc activity in cancer leads to excessive activation of its downstream pathways, and may also stimulate changes in gene expression and cellular signaling that are not observed under non-pathological conditions (Wade and Wahl, 2006). The c-myc plays a central role in the G1/S transition, inducing several genes which are involved in cell cycle control (Matsumura et al., 2003). The tissue inhibitors of metalloproteinases (TIMPs) and RECK maintain the balance between matrix destruction and formation. TIMPs are capable of inhibiting proliferation of endothelial cells in response to angiogenic stimuli. It is likely that TIMPs and RECK may play a significant role, controlling the invasive phenotype of malignant tumors. At present, the relationship between c-myc and inhibitors of metalloproteinases during the cell cycle progression is poorly understood. Down-regulation of the reck mRNA was observed in NIH-3T3 cells transformed by c-myc, and src oncogenes, suggesting that the reck gene is a common negative target for oncogenic signals (Takahashi et al., 1998). Indeed, in several types of cancer, an inverse correlation between the level of reck expression and the degree of malignancy of the tumors (such as invasion, metastasis, recurrence, and low patient survival rates) are observed. However, the DVAR model predicts (Fig. 3) The c-myc-mediated activation of RECK at the M/G1 transition in the HeLa cell cycle, in the immediately previous phase of self-regulation of c-myc. This is in agreement with hepatoma gene expression analysis, in which reck expression is higher in the tumor when compared to the surrounding tissues (Noda et al., 2003). Additionally, data obtained at our laboratory, using Real Time RT-PCR gene expression analysis of astrocytomas from around 50 patients, reveal a positive correlation between c-myc and reck expression (Fig. 5). Indeed, since apparently there is no feedback control of c-myc expression by reck, it was not possible to establish a direct relationship between c-myc and reck expression in HeLa cells cycle control. On the other hand, DVAR also identifies a positive RECK-mediated regulation of TIMP-2 and, simultaneously, a negative regulation of SRC kinase at the G2/M transition, in contrast to experimental data obtained in primary culture cells (Oh et al., 2006). Considering that the src oncogene is involved in tumor progression and TIMP-2 is a cell proliferation inhibitor, reck could be a target for c-myc-regulated apoptosis induction c-myc expression, as referred in the literature (Matsumura et al., 2003; Wade and Wahl, 2006). These DVAR revealed in silico evidence to support the hypothesis that the site, the timing, and the level of reck expression is likely to be under the tight control by developmental programs as well as various environmental conditions.


Figure 5
View larger version (5K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Plot representing the gene expression data of reck and c-myc derived from ~50 human astrocytomas. The line represents the regression curve between these genes (P-value < 0.001).

 
In general, at the molecular level, causation occurs at a defined time interval, and, more precisely, at specific phases of the cell cycle. It is interesting that the power of the test using DVAR is greater than that using VAR. This is due to the narrow interval of time in which the causation occurs, which may be observed in Figure 1, between gadd45a -> p53; in Figure 2, between reck -> src, c-myc -> reck, and in the feedback to reck and in Figure 3, between ikk{alpha} -> nemo, nemo -> i{kappa}b and nf-{kappa}b -> i{kappa}b, indicating that these connectivities are time-varying.

We have also simulated a large regulatory network composed by 1000 genes, from which we inferred a pairwise network by applying DVAR. The computational time was of ~8 h in a Pentium IV 3.06 Ghz, showing that this approach is also computationally feasible.

The main advantage of the wavelet-based DVAR, compared with other connectivity models is that it avoids stationarity as reported by (Mukhopadhyay and Chatterjee, 2007), and linearity assumptions.

It is well known that different cell cycle phases or conditions involve different circuitries, and it is widely accepted that the cell exhibits dynamic alterations in its connectivity. Hence, adoption of probably unwarranted stationarity assumptions may lead to spurious results. Furthermore, the DVAR approach does not require model prespecification, unlike the structural equation modeling. Therefore, the DVAR modeling is unbiased when compared to models which require a priori connectivity knowledge, rendering it possible to infer new connections, and, therefore, being useful to design experiments to find new targets for a specific gene. Differently from Graphical Gaussian Models, which operate with partial correlations, i.e. no directionality at the edges, here it is possible to infer causality based on the Granger causality concept. Another advantage when compared to classical models, such as the Boolean network, is the fact that discretization of gene expression to boolean variables is not necessary for DVAR. Therefore, there is no loss of information when compared to Boolean models. In addition, it is naturally applied to cycles, an advantage relative to networks based on DAGs (directed acyclic graphs), since it is well known that genetic regulatory networks maintain their control and balance through a number of positive/negative feedbacks (cycles). DVAR is always identifiable when the number of samples is larger than the number of estimated parameters because this method uses the lags in the regressions, differently from SEM, which uses simultaneous data.

Classical dynamic models are based on local fitting using a moving window. However, detection of dynamic changes by this approach may have poorer time-resolution and be less flexible than that achieved by wavelet-based methods.

In our data sets, the time grid was equally spaced, therefore, we applied the Daubechies Extreme Phase 16 wavelet which, due to orthogonality properties, reduces data multicolinearity and, consequently, reduces the variance of the estimators. Unfortunately, this is not the general case. When time grids are non equally spaced, one can deal with this problem using warped wavelets or explicit functions as Haar, Morlet, Shannon or Mexican hat, which do not require equispaced time data.

In summary, modeling gene regulatory networks is an essential step for understanding of cellular function and, consequently, to find new treatments for several kinds of diseases. The complexity generated by gene expression, variation with time along the cell cycle, registered by techniques such as DNA microarrays, are obstacles to classical regulatory network models. The DVAR technique presented here allows the analysis of different network structures for each time point, being useful to detect which gene is going to be activated/repressed. Differently from the VAR model, which is applied to infer linear Granger causalities, DVAR can infer non-linear time-dependent Granger causalities, thus increasing the power of the test and revealing more reliable causalities. As we showed earlier, by applying this technique to an actual microarray data set and obtaining plausible results, we demonstrate that this technique could be useful to uncover the pathways which are operating and when they are activated/repressed and, also, to compare the networks from different experimental conditions, such as reaction to drugs and antigens or comparisons between normal versus disease cells and tissues.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
This research was supported by FAPESP, CNPq, FINEP, CAPES and PRP-USP.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: John Quackenbush

Received on January 31, 2007; revised on March 27, 2007; accepted on April 15, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Akutsu T, et al. Algorithms for identifying Boolean networks and related biological networks based on matrix multiplication and fingerprint function. J. Comput. Biol, ( (2000) ) 7, : 331–343.[CrossRef][ISI][Medline].

    Bash J, et al. c-Rel arrests the proliferation of HeLa cells and affects critical regulators of the G1/S-phase transition. Mol. Cell. Biol, ( (1997) ) 17, : 6526–6536.[Abstract].

    Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R.Statist. Soc. B, ( (1995) ) 57, : 289–300..

    Daubechies I. Ortonormal bases of compactly supported wavelets. Commun. Pure Appl. Math, ( (1988) ) 41, : 909–996.[CrossRef].

    Dojer N, et al. Applying dynamic Bayesian networks to perturbed gene expression data. BMC Bioinformatics, ( (2006) ) 7, : 249.[CrossRef][Medline].

    el-Deiny, et al. WAF1, a potential mediator of p53 tumor suppression. Cell, ( (1993) ) 75, : 817–825.[CrossRef][ISI][Medline].

    Friedman N, et al. Using Bayesian networks to analyze expression data. J. comput. Biol, ( (2000) ) 7, : 601–620.[CrossRef][ISI][Medline].

    Friedman N. Inferring cellular networks using probabilistic graphical models. Science, ( (2004) ) 303, : 799–805.[Abstract/Free Full Text].

    Granger CWJ. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, ( (1969) ) 37, : 424–438.[CrossRef].

    Graybill FA. Theory and Application of the Linear Model, ( (1976) ) North Scituate. Massachusetts, Duxbury Press..

    Hildesheim J, et al. Gadd45a protects against UV irradiation-induced skin tumors, and promotes apoptosis and stress signaling via MAPK and p53. Cancer Res, ( (2002) ) 62, : 7305–7315.[Abstract/Free Full Text].

    Imoto S, et al. Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. Pac. Symp. Biocomput, ( (2002) ) 7, : 175–186..

    Jin S, et al. BRCA1 activation of the GADD45 promoter. Oncogene, ( (2000) ) 19, : 4050–4057.[CrossRef][ISI][Medline].

    Jin S, et al. Transcription factors Oct-1 and NF-YA regulate the p53-independent induction of the GADD45 following DNA damage. Oncogene, ( (2001) ) 20, : 2683–2690.[CrossRef][ISI][Medline].

    Jin S, et al. GADD45-induced cell cycle G2-M arrest associates with altered subcellular distribution of cyclin B1 and is independent of p38 kinase activity. Oncogene, ( (2002) ) 21, : 8696–704.[CrossRef][ISI][Medline].

    Jin S, et al. Gadd45a contributes to p53 stabilization in response to DNA damage. Oncogene, ( (2003) ) 22, : 8536–8540.[CrossRef][ISI][Medline].

    Kaltschmidt B, et al. Repression of NF-kappaB impairs HeLa cell proliferation by functional interference with cell cycle checkpoint regulators. Oncogene, ( (1999) ) 18, : 3213–3225.[CrossRef][ISI][Medline].

    Lohrum MA, Vousden KH. Regulation and activation of p53 and its family members. Cell Death Differ, ( (1999) ) 6, : 1162–1168.[CrossRef][ISI][Medline].

    Matsumura I, et al. E2F1 and c-Myc in cell growth and death. Cell Cycle, ( (2003) ) 2, : 333–338.[Medline].

    Mestl T, et al. A mathematical framework for describing and analyzing gene regulatory networks. J. Theor. Biol, ( (1995) ) 176, : 291–300.[CrossRef][ISI][Medline].

    Mukhopadhyay ND, Chatterjee S. Causality and pathway search in microarray time series experiment. Bioinformatics, ( (2007) ) 23, : 442–449.[Abstract/Free Full Text].

    Noda M, et al. RECK: a novel suppressor of malignancy linking oncogenic signaling to extracellular matrix remodeling. Cancer Metastasis Rev, ( (2003) ) 22, : 167–175.[CrossRef][ISI][Medline].

    Oh J, et al. TIMP-2 upregulates RECK expression via dephosphorylation of paxillin tyrosine residues 31 and 118. Oncogene, ( (2006) ) 25, : 4230–4234.[CrossRef][ISI][Medline].

    Pal R, et al. Intervention in context-sensitive probabilistic Boolean networks. Bioinformatics, ( (2005) ) 21, : 1211–1218.[Abstract/Free Full Text].

    Rao A, et al. Inferring time-varying network topologies from gene expression data. ( (2005) ) Proceedings of IEEE GENSIP's 05..

    Sato JR, et al. A method to produce evolving functional connectivity maps during the course of an fMRI experiment using wavelet-based time-varying Granger causality. NeuroImage, ( (2006) ) 31, : 187–96.[CrossRef][ISI][Medline].

    Schäfer J, Strimmer K. An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics, ( (2005) ) 21, : 754–764.[Abstract/Free Full Text].

    Shmulevich I, et al. Gene perturbation and intervention in probabilistic Boolean networks. Bioinformatics, ( (2002) ) 18, : 1319–1331.[Abstract/Free Full Text].

    Tamada Y, et al. Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection. Bioinformatics, ( (2003) ) 19, : ii227–ii236.[Abstract].

    Takahashi S, et al. Regulation of matrix metalloproteinase-9 and inhibition of tumor invasion by the membrane-anchored glycoprotein RECK. Proc. Natl Acad. Sci. USA, ( (1998) ) 95, : 13221–13226.[Abstract/Free Full Text].

    Takahashi C, et al. Involvement of the Oct-1 regulatory element of the gadd45 promoter in the p53-independent response to ultraviolet irradiation. Cancer Res, ( (2001) ) 61, : 1187–1195.[Abstract/Free Full Text].

    Wade M, Wahl GM. c-Myc, genome instability, and tumorigenesis: the devil is in the details. Curr. Top Microbiol. Immunol, ( (2006) ) 302, : 169–203.[ISI][Medline].

    Whitfield ML, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell, ( (2002) ) 13, : 1977–2000.[Abstract/Free Full Text].

    Woolf PJ, Wang Y. A fuzzy logic approach to analyzing gene expression data. Physiol. Genomics, ( (2000) ) 3, : 9–15.[Abstract/Free Full Text].

    Xiong M, et al. Identification of genetic networks. Genetics, ( (2004) ) 166, : 1037–1052.[Abstract/Free Full Text].

    Yoshida R, et al. Estimating time-dependent gene networks from time series DNA microarray data by dynamic linear model with Markov switching. ( (2005a) ) Proceedings of IEEE 4th Computational Systems Bioinformatics. 289–298..

    Yoshida T, et al. Quercetin induces gadd45a expression through a p53-independent pathway. Oncol. Rep, ( (2005b) ) 14, : 1299–303.[ISI][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
R. Nagarajan and M. Upreti
Comment on causality and pathway search in microarray time series experiment
Bioinformatics, April 1, 2008; 24(7): 1029 - 1032.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/13/1623    most recent
btm151v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (3)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Fujita, A.
Right arrow Articles by Ferreira, C.E.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Fujita, A.
Right arrow Articles by Ferreira, C.E.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?