Bioinformatics Advance Access originally published online on February 26, 2008
Bioinformatics 2008 24(7):1029-1032; doi:10.1093/bioinformatics/btm586
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment on causality and pathway search in microarray time series experiment
1Department of Biostatistics and 2Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Contact: nagarajanradhakrish{at}uams.edu
Granger causality (GC) (Granger, 1969) is ideally suited to infer causal relationships from bivariate time series such as first-order vector-autoregressive processes VAR (1) (Hamilton, 1994). Recent studies have used GC to infer causal relationships and network structure from temporal gene expression profiles (Fujita et al., 2007; Mukhopadhyay and Chatterjee, 2007). Mukhopadhyay and Chatterjee (2007) proposed a method to infer causality from temporal gene expression data. Their approach determine the P-values from F-test (Hamilton, 1994) across each pair of genes (gi, gj) addressing the null hypotheses gi fails to Granger cause gj and gj fails to Granger cause gi. Subsequently, they chose the desired uni-directional causal relationship as the direction with the lowest P-value. Mukhopadhyay and Chatterjee (2007) also caution the audience that their approach does not guarantee true causality but may prove to be a strong starting point in uncovering ... functional, metabolic relationship and biological causation models for temporal gene expressions ... . They demonstrate their approach on stationary and non-stationary networks modeled as VAR processes as well as microarray gene expression profiles.
In this brief letter, we investigate the usefulness of such uni-directional/acyclic approximations within the context of GC and VAR. Subsequently, biological significance of the various parameters in a VAR (1) model that can give rise to acyclic approximations is elucidated. In this respect, synthetic two-gene networks modeled as VAR (1) with and without autoregulatory feedback are investigated. The choice of VAR (1) is especially encouraged by (i) its close connection with the formulation of GC (Granger, 1969; Hamilton, 1994); (ii) recent simulation studies presented by the authors Mukhopadhyay and Chatterjee (2007) and (iii) its prevalence in modeling genetic networks (Fujita et al., 2007; Opgen-Rhein and Strimmer, 2007). We show that such uni-directional/acyclic approximations reflected by discrepancy in P-values as proposed by (Mukhopadhyay and Chatterjee, 2007) may provide biologically meaningful results under the assumptions of
- equal transcriptional noise variance, reflected by the variance of the noise terms in VAR (1).
- equal strength of autoregulatory feedback, reflected by the magnitude of the diagonal elements in VAR (1).
Synthetic two-gene network modeled as VAR (1)
|
| (1) |
- (
t,
t): represents zero-mean uncorrelated transcriptional noise.
- (
11,
22): represents autoregulatory feedback (diagonal elements).
- (
12,
21): represents transcriptional coupling strength (off-diagonal elements).
12 = 0,
21
0: represents uni-directional relationship where yt does not cause xt, (ii)
12
0,
21 = 0: represents uni-directional relationship where xt does not cause yt and (iii)
12 =
21 = 0: represents the case where xt and yt are independent. It is important to note that uni-directional/acyclic approximations are ideally governed by the off-diagonal elements of VAR (1).
Granger Causality: In order to determine whether yt Granger causes xt i.e. yt
xt, we determine the mean-squared error of the forecast (s0) obtained by regression of xt with its own past i.e. xt–1, ... , xt–
and the mean-squared error of the forecast (s1) obtained by regression of xt with its own past i.e. xt–1, ... , xt–
as well as the past of yt i.e. yt–1, ... , yt–
. The parameter (
) represents optimal autoregressive length, (
= 1) for VAR (1). Following (Hamilton, 1994 Ch. 11.2), process yt fails to Granger cause xt if s0 = s1. From a statistical standpoint (F-test) (Hamilton, 1994), the null hypothesis yt fails to Granger cause xt is rejected if the ratio of the mean-squared errors is greater than those determined from F-distribution at a given significance level
. A detailed discussion can be found elsewhere (Hamilton, 1994). In the present study, we investigate the ratio of the mean-squared errors
= (s0/s1) for yt
xt and xt
yt. It is important to note that considerable discrepancy in
gives rise to considerable discrepancy in P-values.
Model 1 Synthetic two gene network with no autoregulatory feedback and unequal transcriptional coupling strength
|
| (2) |
11,
22 = 0), the off-diagonal elements are chosen such that (
12,
21 > 0), and the transcriptional noise variance is chosen such that
Covariance Stationary: The resulting model is covariance stationary provided the roots of the reverse characteristic polynomial (Hamilton, 1994)
|
| (3) |
To determine whether (yt
xt) we follow the following steps
|
|
|
| (4) |
yt) we have |
|
|
| (5) |
(yt
xt) >>
(xt
yt). Substituting from expressions (4) and (5) we get
|
| (6) |
|
| (7) |
Case (i) Consider the bi-directional two-gene network (2) with unequal transcriptional coupling strengths (
12 = 0.38,
21 = 0.20) and equal transcriptional noise variance (
= 
= 1) and length N = 100 satisfying constraint (10). For these parameters, P-values from F-test (Hamilton, 1994) for (yt
xt) were considerably smaller than those of (xt
yt). The distribution of the P-values across these two cases for 200 independent realizations is shown in Figure 1a. The proportion of successful rejections of the null at significance level (
= 0.01) across the 200 independent realizations in favor of (yt
xt) was 81% whereas those in favor of (xt
yt) was 19%. The discrepancy in the P-values is an outcome of discrepancy in transcriptional coupling strength with significant contribution by gene y in forecasting the expression of gene x, hence may be meaningful.
|
Case (ii) Consider the bi-directional two-gene network (2) with unequal transcriptional coupling strengths (
12 = 0.38,
21 = 0.20) (same as in Case (i)) and unequal transcriptional noise variance (
= 2
) and length N = 100. Unlike Case (i), Case (ii) fails to satisfy constraint (7). For these parameters, P-values from F-test (Hamilton, 1994) for (xt
yt) were considerably smaller than those of (yt
xt). The distribution of the P-values across for 200 independent realizations is shown in Figure 1b. The proportion of successful rejections of the null hypothesis (
= 0.01) across the 200 independent realizations for (xt
yt) and (yt
xt) were 79 and 11%, respectively. Unlike Case (i), the discrepancy in the P-values do not reflect the discrepancy in the transcriptional coupling strengths. It might not be possible to determine which VAR parameters contribute to the discrepancy in the P-values in the case of experimental settings. This in turn can lead to spurious uni-directional causal representation between the given pair of genes. In the above case studies, data was sampled purposely across transient regions in order to accommodate limited artifacts observed commonly in experimental settings.
Model 2 Synthetic two gene network with autoregulatory feedback and equal transcriptional coupling strength
|
| (8) |
12,
21 > 0) represent autoregulatory feedback. The transcriptional noise variance is chosen such that
Covariance Stationary: The resulting model is covariance stationary provided the roots of the reverse characteristic polynomial (Hamilton, 1994)
|
| (9) |
In order to determine whether (yt
xt) we follow the steps below.
From (8) we have xt =
11xt–1 + βyt–1 +
t
Substituting for yt–1 from (8) results in
|
|
|
|
|
| (10) |
|
| (11) |
t,
t) are uncorrelated noise by definition. Therefore, the expression (11) reduces to |
|
|
| (12) |
|
| (13) |
yt) we have
|
| (14) |
(yt
xt) >>
(xt
yt)
i.e. From (13) and (14) we have
|
| (15) |
|
| (16) |
xt) is considerably lesser than those corresponding to (xt
yt).
It is important to note that constraint (16) is more complex than constraint (7). It has significant contributions from the autoregulatory feedback terms (
12,
21 > 0) in addition to the transcriptional noise variances
. It is equally important to note that (15) has no contributions from the transcriptional coupling strength (β > 0). Uni-directional/acyclic approximations revealed by discrepancy in P-values from F-test can change significantly depending on the contribution of these parameters, hence may provide only limited insight into the transcriptional circuitry. In the case of microarray gene expression the gene expression must be normalized (Speed, 2003) across the arrays prior to the modeling. Normalization is an important preliminary step and useful in minimizing bias in gene expression due to non-biological factors.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
R.N. was partially supported by 5R03LM008853-02 and NIH Grant # P20 RR-16460 from the IDeA Networks of Biomedical Research Excellence (INBRE) Program of the National Center for Research Resources. The authors would like to thank the reviewers for useful critiques.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on August 30, 2007; revised on November 5, 2007; accepted on November 20, 2007
| REFERENCES |
|---|
|
|
|---|
Fujita A, et al. Time-varying modeling of gene expression regulatory networks using the wavelet dynamic vector autoregressive method. Bioinformatics (2007) 23:1623–1630.
Granger CWJ. Investigating causal relations by econometric models and cross-spectral methods. Econometrica (1969) 37:424–438.[CrossRef]
Hamilton JD. Time Series Analysis (1994) Princeton, NJ: Princeton University press.
Mukhopadhyay ND, Chatterjee S. Causality and pathway search in microarray time series experiment. Bioinformatics (2007) 23:442–449.
Opgen-Rhein R, Strimmer K. Learning causal networks from systems biology time course data: an effective model selection procedure for the vector autoregressive process. BMC Bioinformatics (2007) 8(Suppl. 2):S3.
Speed T. Statistical Analysis of Gene Expression Microarray Data (2003) USA: Chapman & Hall/CRC.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||







