Skip Navigation


Bioinformatics Advance Access originally published online on February 26, 2008
Bioinformatics 2008 24(7):1029-1032; doi:10.1093/bioinformatics/btm586
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
24/7/1029    most recent
btm586v4
btm586v3
btm586v2
btm586v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (1)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Nagarajan, R.
Right arrow Articles by Upreti, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Nagarajan, R.
Right arrow Articles by Upreti, M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Comment on causality and pathway search in microarray time series experiment

Radhakrishnan Nagarajan 1,* and Meenakshi Upreti 2

1Department of Biostatistics and 2Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 ACKNOWLEDGEMENTS
 REFERENCES
 

Contact: nagarajanradhakrish{at}uams.edu

Granger causality (GC) (Granger, 1969) is ideally suited to infer causal relationships from bivariate time series such as first-order vector-autoregressive processes VAR (1) (Hamilton, 1994). Recent studies have used GC to infer causal relationships and network structure from temporal gene expression profiles (Fujita et al., 2007; Mukhopadhyay and Chatterjee, 2007). Mukhopadhyay and Chatterjee (2007) proposed a method to infer causality from temporal gene expression data. Their approach determine the P-values from F-test (Hamilton, 1994) across each pair of genes (gi, gj) addressing the null hypotheses ‘gi fails to Granger cause gj’ and ‘gj fails to Granger cause gi’. Subsequently, they chose the desired uni-directional causal relationship as the direction with the lowest P-value. Mukhopadhyay and Chatterjee (2007) also caution the audience that their approach does not guarantee true causality but may prove to be a strong starting point in uncovering’ ... functional, metabolic relationship and biological causation models for temporal gene expressions ... ’. They demonstrate their approach on stationary and non-stationary networks modeled as VAR processes as well as microarray gene expression profiles.

In this brief letter, we investigate the usefulness of such uni-directional/acyclic approximations within the context of GC and VAR. Subsequently, biological significance of the various parameters in a VAR (1) model that can give rise to acyclic approximations is elucidated. In this respect, synthetic two-gene networks modeled as VAR (1) with and without autoregulatory feedback are investigated. The choice of VAR (1) is especially encouraged by (i) its close connection with the formulation of GC (Granger, 1969; Hamilton, 1994); (ii) recent simulation studies presented by the authors Mukhopadhyay and Chatterjee (2007) and (iii) its prevalence in modeling genetic networks (Fujita et al., 2007; Opgen-Rhein and Strimmer, 2007). We show that such uni-directional/acyclic approximations reflected by discrepancy in P-values as proposed by (Mukhopadhyay and Chatterjee, 2007) may provide biologically meaningful results under the assumptions of

  1. equal transcriptional noise variance, reflected by the variance of the noise terms in VAR (1).
and
  1. equal strength of autoregulatory feedback, reflected by the magnitude of the diagonal elements in VAR (1).
The results presented encourage cautious interpretation of the results of GC (Granger, 1969; Hamilton, 1994) and its variants (Mukhopadhyay and Chatterjee, 2007) in acyclic representation of transcriptional networks inferred from their expression profiles.

Synthetic two-gene network modeled as VAR (1)


Formula 1

(1)
In (1), the expression of the genes (x, y) at time t, (xt, yt) is given as a linear combination of their expression at time (t – 1), (xt–1, yt–1). The terms

  1. (isint, {eta}t): represents zero-mean uncorrelated transcriptional noise.
  2. ({alpha}11, {alpha}22): represents autoregulatory feedback (diagonal elements).
  3. ({alpha}12, {alpha}21): represents transcriptional coupling strength (off-diagonal elements).
Special cases (i) {alpha}12 = 0, {alpha}21!=0: represents uni-directional relationship where yt does not cause xt, (ii) {alpha}12!=0, {alpha}21 = 0: represents uni-directional relationship where xt does not cause yt and (iii) {alpha}12 = {alpha}21 = 0: represents the case where xt and yt are independent. It is important to note that uni-directional/acyclic approximations are ideally governed by the off-diagonal elements of VAR (1).

Granger Causality: In order to determine whether yt Granger causes xt i.e. yt->xt, we determine the mean-squared error of the forecast (s0) obtained by regression of xt with its own past i.e. xt–1, ... , xt{tau} and the mean-squared error of the forecast (s1) obtained by regression of xt with its own past i.e. xt–1, ... , xt{tau} as well as the past of yt i.e. yt–1, ... , yt{tau}. The parameter ({tau}) represents optimal autoregressive length, ({tau} = 1) for VAR (1). Following (Hamilton, 1994 Ch. 11.2), process yt fails to Granger cause xt if s0 = s1. From a statistical standpoint (F-test) (Hamilton, 1994), the null hypothesis yt fails to Granger cause xt is rejected if the ratio of the mean-squared errors is greater than those determined from F-distribution at a given significance level {alpha}. A detailed discussion can be found elsewhere (Hamilton, 1994). In the present study, we investigate the ratio of the mean-squared errors {gamma} = (s0/s1) for yt->xt and xt->yt. It is important to note that considerable discrepancy in {gamma} gives rise to considerable discrepancy in P-values.

Model 1 Synthetic two gene network with no autoregulatory feedback and unequal transcriptional coupling strength


Formula 2

(2)
In (2), the diagonal elements are set to zero i.e. ({alpha}11, {alpha}22 = 0), the off-diagonal elements are chosen such that ({alpha}12, {alpha}21 > 0), and the transcriptional noise variance is chosen such that Formula . It is important to note that for these choice of parameters (2) represents a bi-directional causal relationship.

Covariance Stationary: The resulting model is covariance stationary provided the roots of the reverse characteristic polynomial (Hamilton, 1994)


Formula 3

(3)
do not lie inside the unit circle. i.e. |zi| > 1, i = 1, 2.

To determine whether (yt->xt) we follow the following steps


Formula

Therefore,


Formula 4

(4)
Similarly, for (xt->yt) we have


Formula

Therefore,


Formula 5

(5)
A significant discrepancy in the P-values can arise when {gamma}(yt->xt) >> {gamma}(xt->yt). Substituting from expressions (4) and (5) we get


Formula 6

(6)
i.e.


Formula 7

(7)
Remark 1 It is important to note that expression (7) is dependent on the transcriptional coupling strength as well as the transcriptional noise variance. Thus significant discrepancy in P-values can be due discrepancies in the former and/or latter. Therefore, any inference on uni-directional causal approximation by direct comparison of P-values from the observed expression data can result in spurious conclusions. Two cases are discussed below to illustrate this crucial point. The reversal in the direction of uni-directional causal relationship for equal and unequal transcriptional noise variance for the same network structure is elucidated.

Case (i) Consider the bi-directional two-gene network (2) with unequal transcriptional coupling strengths ({alpha}12 = 0.38, {alpha}21 = 0.20) and equal transcriptional noise variance ({sigma}isin = {sigma}{eta} = 1) and length N = 100 satisfying constraint (10). For these parameters, P-values from F-test (Hamilton, 1994) for (yt->xt) were considerably smaller than those of (xt->yt). The distribution of the P-values across these two cases for 200 independent realizations is shown in Figure 1a. The proportion of successful rejections of the null at significance level ({alpha} = 0.01) across the 200 independent realizations in favor of (yt->xt) was 81% whereas those in favor of (xt->yt) was 19%. The discrepancy in the P-values is an outcome of discrepancy in transcriptional coupling strength with significant contribution by gene y in forecasting the expression of gene x, hence may be meaningful.


Figure 1
View larger version (15K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Distribution of the P-values from F-test for cases (i) and (ii), with equal (a) and unequal (b) transcriptional noise variance. The distribution of the P-values corresponding to yt->xt is considerably lesser than those of xt->yt in the case of equal noise variance (a). However, these results are completely reversed in the case of unequal noise variance (b). It is important to note that the transcriptional coupling strengths, hence the network structure is unchanged across these two cases.

 
Case (ii) Consider the bi-directional two-gene network (2) with unequal transcriptional coupling strengths ({alpha}12 = 0.38, {alpha}21 = 0.20) (same as in Case (i)) and unequal transcriptional noise variance ({sigma}isin = 2{sigma}{eta}) and length N = 100. Unlike Case (i), Case (ii) fails to satisfy constraint (7). For these parameters, P-values from F-test (Hamilton, 1994) for (xt->yt) were considerably smaller than those of (yt->xt). The distribution of the P-values across for 200 independent realizations is shown in Figure 1b. The proportion of successful rejections of the null hypothesis ({alpha} = 0.01) across the 200 independent realizations for (xt->yt) and (yt->xt) were 79 and 11%, respectively. Unlike Case (i), the discrepancy in the P-values do not reflect the discrepancy in the transcriptional coupling strengths.

It might not be possible to determine which VAR parameters contribute to the discrepancy in the P-values in the case of experimental settings. This in turn can lead to spurious uni-directional causal representation between the given pair of genes. In the above case studies, data was sampled purposely across transient regions in order to accommodate limited artifacts observed commonly in experimental settings.

Model 2 Synthetic two gene network with autoregulatory feedback and equal transcriptional coupling strength


Formula 8

(8)
The diagonal elements ({alpha}12, {alpha}21 > 0) represent autoregulatory feedback. The transcriptional noise variance is chosen such that Formula and Formula . In (8), it is important to note that the off-diagonal terms corresponding to transcriptional coupling between the genes are kept constant (β > 0). Thus each gene regulates the expression of the other transcriptionally.

Covariance Stationary: The resulting model is covariance stationary provided the roots of the reverse characteristic polynomial (Hamilton, 1994)


Formula 9

(9)
do not lie inside the unit circle. i.e. |zi| > 1, i = 1, 2.

In order to determine whether (yt->xt) we follow the steps below.

From (8) we have xt = {alpha}11xt–1 + βyt–1 + isint

Substituting for yt–1 from (8) results in


Formula

Substituting for yt–2 from (8) results in


Formula

Substituting repeatedly for y terms by x terms (from (8)) in the above expression results in the series


Formula 10

(10)
Therefore,


Formula 11

(11)
It is important to recall that (isint, {eta}t) are uncorrelated noise by definition. Therefore, the expression (11) reduces to


Formula

The above is a geometric progression, hence can be summed to


Formula 12

(12)
Therefore,


Formula 13

(13)
Similarly, to determine whether (xt->yt) we have


Formula 14

(14)
A significant discrepancy in the P-values can arise when

{gamma}(yt->xt) >> {gamma}(xt->yt)

i.e. From (13) and (14) we have


Formula 15

(15)
Simplifying (15) results in


Formula 16

(16)
The above constraint represents the case where the P-values corresponding to (yt->xt) is considerably lesser than those corresponding to (xt->yt).

It is important to note that constraint (16) is more complex than constraint (7). It has significant contributions from the autoregulatory feedback terms ({alpha}12, {alpha}21 > 0) in addition to the transcriptional noise variancesFormula . It is equally important to note that (15) has no contributions from the transcriptional coupling strength (β > 0). Uni-directional/acyclic approximations revealed by discrepancy in P-values from F-test can change significantly depending on the contribution of these parameters, hence may provide only limited insight into the transcriptional circuitry. In the case of microarray gene expression the gene expression must be normalized (Speed, 2003) across the arrays prior to the modeling. Normalization is an important preliminary step and useful in minimizing bias in gene expression due to non-biological factors.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 ACKNOWLEDGEMENTS
 REFERENCES
 
R.N. was partially supported by 5R03LM008853-02 and NIH Grant # P20 RR-16460 from the IDeA Networks of Biomedical Research Excellence (INBRE) Program of the National Center for Research Resources. The authors would like to thank the reviewers for useful critiques.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on August 30, 2007; revised on November 5, 2007; accepted on November 20, 2007

    REFERENCES
 TOP
 ABSTRACT
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Fujita A, et al. Time-varying modeling of gene expression regulatory networks using the wavelet dynamic vector autoregressive method. Bioinformatics (2007) 23:1623–1630.[Abstract/Free Full Text]

    Granger CWJ. Investigating causal relations by econometric models and cross-spectral methods. Econometrica (1969) 37:424–438.[CrossRef]

    Hamilton JD. Time Series Analysis (1994) Princeton, NJ: Princeton University press.

    Mukhopadhyay ND, Chatterjee S. Causality and pathway search in microarray time series experiment. Bioinformatics (2007) 23:442–449.[Abstract/Free Full Text]

    Opgen-Rhein R, Strimmer K. Learning causal networks from systems biology time course data: an effective model selection procedure for the vector autoregressive process. BMC Bioinformatics (2007) 8(Suppl. 2):S3.

    Speed T. Statistical Analysis of Gene Expression Microarray Data (2003) USA: Chapman & Hall/CRC.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
24/7/1029    most recent
btm586v4
btm586v3
btm586v2
btm586v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (1)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Nagarajan, R.
Right arrow Articles by Upreti, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Nagarajan, R.
Right arrow Articles by Upreti, M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?