Skip Navigation


Bioinformatics Advance Access originally published online on January 31, 2007
Bioinformatics 2007 23(7):866-874; doi:10.1093/bioinformatics/btm021
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary data
Right arrow All Versions of this Article:
23/7/866    most recent
btm021v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Martin, S.
Right arrow Articles by Faulon, J.-L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Martin, S.
Right arrow Articles by Faulon, J.-L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Boolean dynamics of genetic regulatory networks inferred from microarray time series data

Shawn Martin 1, Zhaoduo Zhang 2, Anthony Martino 3 and Jean-Loup Faulon 4,*

1Sandia National Laboratories, Computational Biology Department PO Box 5800, Albuquerque, NM, 87185-1316, USA, 2Sandia National Laboratories, Biosystems Research, PO Box 969, Livermore, CA 94551-9291, USA 3Sandia National Laboratories, Biomolecular Analysis and Imaging, PO Box 5800, Albuquerque, NM 87185-0895, USA and 4Sandia National Laboratories, Computational Biosciences Department PO Box 5800, Albuquerque, NM 87185-1413, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Methods available for the inference of genetic regulatory networks strive to produce a single network, usually by optimizing some quantity to fit the experimental observations. In this article we investigate the possibility that multiple networks can be inferred, all resulting in similar dynamics. This idea is motivated by theoretical work which suggests that biological networks are robust and adaptable to change, and that the overall behavior of a genetic regulatory network might be captured in terms of dynamical basins of attraction.

Results: We have developed and implemented a method for inferring genetic regulatory networks for time series microarray data. Our method first clusters and discretizes the gene expression data using k-means and support vector regression. We then enumerate Boolean activation–inhibition networks to match the discretized data. Finally, the dynamics of the Boolean networks are examined. We have tested our method on two immunology microarray datasets: an IL-2-stimulated T cell response dataset and a LPS-stimulated macrophage response dataset. In both cases, we discovered that many networks matched the data, and that most of these networks had similar dynamics.

Contact: jfaulon{at}sandia.gov

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Modeling and inferring genetic regulatory networks are important problems in systems biology. Accordingly, there are numerous computational approaches to these problems, including partial differential equations, ordinary differential equations, Bayesian networks and Boolean networks (de Jong, 2002; Smolen et al., 2000). Less common approaches include Petri nets (Goss and Peccoud, 1998) and matrix decomposition methods (Alter and Golub, 2005; Liao et al., 2003).

Of these different approaches, one of the simplest methods is the use of Boolean networks to infer genetic regulatory systems from time series gene expression data. Modeling genetic regulatory networks was first proposed in Kauffman (1969, 1993). Later Boolean networks were proposed to infer genetic regulatory systems from time series gene expression data (Akutsu et al., 2000; Liang et al., 1998). More recently, probabilistic Boolean networks have been proposed to infer genetic networks (Shmulevich et al., 2002a,b). Also related to probabilistic Boolean networks are dynamic Bayesian networks (Friedman et al., 2000; Lahdesmaki et al., 2006; Murphy and Mian, 1999).

Once a network has been inferred, the next step is to consider its dynamical properties. Huang (1999) suggests that Boolean network dynamics can be used to understand cellular states such as proliferation, differentiation and apoptosis. By analyzing the attractor basins of a Boolean network, it may be possible to determine its cellular functions under different initial conditions. In this article, we propose a method for the investigation of these claims using time series gene expression data.

Although the probabilistic Boolean and dynamic Bayesian methods are both well founded theoretically, they are also computationally complex (Ching et al., 2005; Zou and Conzen, 2005). Furthermore, both methods produce a single most probable network. This network is in reality one of many possible representations. This fact makes the interpretation of the network dynamics difficult. In the best case, Monte Carlo approaches can be used with probabilistic Boolean networks to approximate dynamics (Shmulevich et al., 2003). There are also theoretical results (Brun et al., 2005).

In our investigation of network dynamics inferred from time series gene expression data, we do not employ either probabilistic Boolean networks or dynamic Bayesian networks. Instead, we consider the use of dynamics in combination with the simpler qualitative Boolean methods suggested in Akutsu et al. (2000). Unlike Akutsu et al. (2000), we do not attempt to obtain only a single network to match the data. We consider all possible networks that match the data and group them by attractor basin. We find that while a great many networks can match a given dataset, a very large percentage of these networks fall into the same attractor basins, suggesting that many of the networks are equally valid. Further, we find that many of these networks have locally consistent substructures. These results agree with the general intuition that biological systems are modular, robust and can often function adequately despite extreme change (Wagner, 2005).

In terms of Bayesian and dynamic Bayesian networks, our approach is most similar to the work in Friedman and Koller (2003) and Segal et al. (2005). In Segal et al. (2005), data points are grouped as nodes in a Bayesian network in order to obtain networks of modules and in Friedman and Koller (2003) a Bayesian approach is used to discover multiple Bayesian networks matching a given dataset. The work in Segal et al. (2005) is also applied to microarray data. These methods differ from our approach in that they do not take into account dynamics and attractors, and that they model networks as directed acyclic graphs, thus prohibiting feedback loops. Nevertheless, if the two methods were combined with dynamic Bayesian networks (Murphy and Mian, 1999; Zou and Conzen, 2005), the result would be very similar to the approach we have taken.

In the following sections, we describe our method and the results of the method applied to two time series gene expression datasets. In Section 2, we describe the datasets and normalization procedures, our method for clustering and discretizing the datasets and our inference algorithms. In Section 3, we apply our method to an interleukin (IL-2)-stimulated T cell immune response dataset and a LPS-stimulated macrophage response dataset. In the Section 4, we summarize the advantages and disadvantages of using dynamics to select Boolean models inferred from time series gene expression data.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
2.1 Datasets and normalization
We used two gene expression time series datasets to benchmark our approach. The first dataset is a time series gene expression dataset taken from an IL-2-stimulated immune response experiment performed at Sandia National Laboratories using arrays hybridized by the Stanford PAN Biotechnology Facility. The experiment was performed using a murine T cell line called CTLL-2. Mouse CTLL-2 cells were cultured without IL-2 (IL-2 starvation). Cells were then collected (0 h, no IL-2 stimulation) immediately before IL-2 was added (IL-2 stimulation). Cells were further collected at 11 time points: 15, 30 mins, 1, 2, 4, 6, 8, 10, 12, 16 and 24 h, all after IL-2 stimulation. Three replicates were done for each time point.

Affymetrix GeneChip Mouse Genome 430 2.0 Arrays were used for the gene expression experiments. This array provides complete coverage of the transcribed mouse genome, with 45 000 probe sets to analyze the expression level of over 39 000 transcripts from over 34 000 well-characterized mouse genes. Target hybridization was processed following the manufacturer's recommendation using the instrument operated by Affymetrix GeneChip Operating Software (GCOS) version 1.3 and Microarray Suite version 5.1 (MAS 5.1). The fluorescent intensity of each probe was quantified using MAS 5.1 and GCOS 1.3. This software makes a detection call (present [P], marginal [M] or absent [A]) for each gene or probe set. This call is based on the consistency of the performance of the individual probe pairs, the hybridization above background and the signal-to-noise ratio. Two-way comparisons of the microarray data were also performed using GCOS 1.3. Specifically, changes in gene expression between the control cells (time point 0 h, no IL-2 stimulation) and IL-2 stimulated cells were evaluated at each time point. These comparisons provided data including the signal log ratio (fold change presented in logarithmic form) and the ‘change call’ (increased [I], decreased [D], marginally increased [MI], marginally decreased [MD] or no change [NC]) for each gene being interrogated.

To identify genes that exhibited differences in expression between the control cells and IL-2-stimulated cells, the datasets were trimmed using the following inclusion criteria. For a probe set to be included in this trimmed dataset, it had to display in all the three replicates: (1) a change call other than no change (NC), (2) the same trend of change call (I, increase, D, decrease), (3) a present call (P) and/or signal intensity ≥100 and (4) at least a 1.5-fold difference in expression between the two compared conditions. The trimmed dataset was log normalized and mean subtracted.

The second dataset was a LPS-stimulated macrophage response dataset downloaded from the cell signaling gateway at http://www.signaling-gateway.org. In this dataset, RAW264.7 murine macrophage cell lines were stimulated by LPS. The response was measured using microarrays. The measurements were made at six time points (1, 2, 4, 8, 16, and 32 hs) with six replicates from each time point. Normalization consisted of first removing all genes without names as well as fiducials. Replicates were treated as new experiments, resulting in additional virtual genes. We removed virtual genes with missing data and screened out genes with expression <0.05 SD over time. After clustering the resulting dataset (as described in the following section), the virtual genes were mapped back to the original genes by a voting method. In this method, each replicate ‘votes’ for the cluster to which it belongs. The original gene then belongs to the cluster with the most votes. Ties are broken by weighting votes according to the distance of the replicates to their cluster centers.

2.2 Clustering and discretization
After normalization, we clustered the time series microarray data. We clustered the data because many co-regulated genes are indistinguishable when they are discretized. Clustering the microarray data also simplified the task of network inference by reducing the problem size. Finally, the clustering resulted in networks of gene groups, instead of actual genes. The gene groups, which we call meta-genes, made the biological analysis and interpretation of the inferred networks more tractable.

There are a variety of algorithms available for clustering microarray data, including k-means, hierarchical clustering (Eisen et al., 1998), self-organizing maps (SOMs) (Tamayo et al., 1999), and biclustering (Getz et al., 2000). We chose k-means because it was the simplest method that provided a partition of our data into groups. We also considered the use of hierarchical clustering and SOMs (as discussed further in the online supplement), but avoided them because they are both heavily oriented towards discovering relations between clusters for visualization, unnecessary for our purposes. We did not use biclustering because it partitions both rows and columns of a matrix and is therefore inappropriate (without modification) for time series data (Zhang et al., 2005). A comparison of k-means with hierarchical clustering and SOMs using the IL-2 dataset can be found in the online supplement.

To decide how many clusters should be produced (the value of k), we developed a measure of internal consistency. Our measure is defined for a given partition of the dataset using the singular value decomposition (SVD) (Trefethen and Bau, 1997). To define internal consistency, suppose we are given k and we compute a k-means partition of our m x n dataset X, where m is the number of time points and n is the number of genes. For the jth cluster (j = 1, ... , k) we have a matrix Xj of microarray measurements, where the rows are time points and the columns are genes, so that Xj is a m x gj matrix, where gj is the number of genes in the jth cluster. Using the SVD, we decompose Formula , where Uj and Vj are orthogonal matrices and Sj is a diagonal matrix whose entries describe the importance of the columns of Uj and Vj. The matrix Formula contains the projections of the columns (time courses) of Xj onto the basis Uj. The entries of Sj (singular values) give the relative importance of the columns of Uj. If the first entry of Sj is much larger than the second entry then we know that most of the information in the columns of Xj is captured by a single dimension. We thus define the internal consistency of the jth cluster to be the ratio of the first and second singular values in Sj. This is a measure of the correlation between all of the time courses in the jth cluster. The internal consistency also provides a measure of how well a single dimension can describe all the time courses. For the problem of network inference, we want each of our clusters to have a high internal consistency. We can decide how many clusters we should use by comparing the average internal consistency of different partitions of the dataset (different values of k) and choosing the clustering with the best average internal consistency.

Given an appropriate set of meta-genes, we next discretized the meta-gene expression levels. Such a discretization is necessary because we use a Boolean network inference algorithm. Our discretization is accomplished in two steps. First, support vector regression (SVR) (Smola and Scholkpof, 1998) is used to provide a continuous, smooth representation of the 10 genes closest to the cluster center in a given group. This type of regression is performed by solving a quadratic programming problem and has two parameters: an {varepsilon} width and a kernel function. The {varepsilon} width is used to encapsulate the curves in a given meta-gene group within an {varepsilon}-tube, and the kernel function is used to fit different types of curves (e.g. linear or non-linear). The end result of SVR encapsulates the time courses in a meta-gene group within an {varepsilon}-tube centered around a smooth curve, where the curve is a linear combination of kernel functions. For this work, we used Gaussian kernels with a width {sigma} = 1, and we chose {varepsilon} to be one and a half times the average standard deviation of the values at each time point.

The second step in our discretization consists of thresholding the curve obtained by the SVR. Assuming that the meta-gene group is well represented by the SVR curve, we can produce a discrete version of the meta-gene by thresholding the curve against its average value: a higher than average meta-gene expression is given a value of 1 (up-regulated), while a lower than average meta-gene expression value is given a value of 0 (down-regulated).

2.3 Boolean inference and network dynamics
After the data has been clustered and discretized, we then infer and analyze the regulatory network. Our algorithms for network inference are similar to the algorithms presented in Akutsu et al. (2000). We incorporate potential errors (mismatches), we limit the number of possible inputs to a Boolean function and we restrict our output to activation or inhibition Boolean functions. Our algorithms are different from the algorithms in Akutsu et al. (2000) in that we do not place any restrictions on the amount of data necessary to perform an inference. Instead of requiring enough data to infer a unique network, we consider all possible networks matching the data. Pseudo-code for our algorithms (denoted using ALL_CAPS in this section), as well as additional explanation, can be found in the online supplement.

Our algorithms count, sample, enumerate, identify attractors and simulate the dynamics of all the possible networks matching a given set of discretized expression profiles. Networks are counted, sampled and enumerated at the node (meta-gene) level. For every node v, we determine all the possible sets of nodes that might control the expression profile of v. Expression profiles are given over time and the algorithms can accept several time courses corresponding to different initial conditions. These initial conditions can be different stimuli, or various knockout experiments.

The algorithms take as input a set of n nodes V = {v1, v2,...,vn} corresponding to the meta-genes and a set of discretized expression profiles. For any given profile in the set, the expression of every node is specified over the time course, although different profiles are not required to have the same time course length. To avoid constructing redundant networks, we require that different nodes should have different profiles, but this requirement is not necessary to run the algorithms.

The basic step used by the algorithms is INFER_FUNCTION, a routine that determines if a set of nodes v1, v2,...,vq with q ≤ n can explain the expression profile of a given node vi. INFER_FUNCTION returns the activation–inhibition Boolean function by which v1, v2,...,vq control the expression of vi. An activation–inhibition function is a Boolean function of the form v(t) = (v1(t) OR v2(t) OR ...) AND NOT (vj(t) OR vj+1(t) OR ...), where v1(t), v2(t), ... are activators and vj(t), vj+1(t), ... are inhibitors. As an example, suppose we are given a time series with six time points for three genes v1, v2 and v3, as shown in Table 1, where we write 1 when the gene is up-regulated and 0 when down-regulated. The Boolean function for v3 in terms of v1 and v2 as returned by INFER_FUNCTION is given by v3(t + 1) = v1(t) AND NOT v2(t). In other words, INFER_FUNCTION finds that v3 is activated by v1 and inhibited by v2.


View this table:
[in this window]
[in a new window]

 
Table 1. An example of discrete time courses with three genes and six time points. Zero (light grey) denotes down-regulation and 1 (dark grey) denotes up-regulation. In this example, we see that v3(t + 1) = v1(t) AND NOT v2(t). We say that v1 activates v3 and v2 inhibits v3

 
Using INFER_FUNCTION, we infer Boolean networks by processing each node in sequence. This is done using the INFER_NETWORKS routine, which returns possible connections within a network and the associated Boolean functions for each node vi. To count the number of possible networks matching a given set of expression profiles we use COUNT_NETWORKS. COUNT_ NETWORKS runs INFER_NETWORKS and computes the product of the number of possible inputs for each node. We have also coded SAMPLE_NETWORKS, which first runs INFER_NETWORKS and then for each node selects at random one of its possible inputs. Finally, to enumerate all networks, we use ENUMERATE_NETWORKS. ENUMERATE_NETWORKS first runs INFER_NETWORKS and then lists and prints all possible inputs for each node.

Using INFER_NETWORKS, SAMPLE_NETWORKS and ENUMERATE_NETWORKS, we can infer networks using sampling and enumeration. In addition, we can explore the dynamics of the inferred networks. The dynamics of these networks are used to (1) verify that the expression profiles given as input can indeed be reproduced by these inferred networks, (2) explore the dynamics beyond the times series that were provided as input and (3) predict expression profiles under different initial conditions. Of particular interest is computing the steady state or equilibrium dynamics of the networks (Huang, 1999). These steady states are called attractors (Kauffman, 1969). An attractor is a cyclic pattern of expression that all networks will eventually exhibit (due to the finite nature of Boolean networks).

We use two additional routines to locate attractors. First we use RUN_NETWORK, which takes as input an inferred network along with initial conditions and returns the resulting expression of the genes up to some time T. Then we use ATTRACTOR to find attractors. ATTRACTOR takes as input expression profiles given up to time T, and identifies the time step t1 that an attractor is found.

To finish this section, we briefly remark on the computational complexity of our inference algorithms. Recall that n is the number of nodes in the networks and q is the maximum number of inputs to a node. While q is in theory unbounded and can be equal to n, we restrict q to be no greater than 5. As justification for this choice, we note that gene regulatory networks follow a power law with exponent greater than 2 (Basso et al., 2005), so that q should not be greater than 5 for <100 nodes. In addition, our experiences inferring parsimonious networks (minimum number of edges) indicates that q never exceeds 3.

Let P be the number of expression profiles and T be the number of time points. We assume that P and T are constant independent of n. INFER_FUNCTION runs in O(PT) time steps. INFER_NETWORKS runs in O(nPTn + nPTn2 + nPTn3+ ·+nPTnq) = O(nq+1) steps. COUNT_NETWORKS and SAMPLE_NETWORKS both run in O(n) steps, while ENUMERATE_NETWORKS runs in O(nI) where I is the number of possible input vectors (i.e. the number of solutions). Note that this number can be exponential in n, and thus ENUMERATE_NETWORKS can run for an exponential time and can output an exponential number of solutions. Both RUN_NETWORK and ATTRACTOR run in O(nT) steps. Although some of these routines are computationally complex (most notably INFER_ NETWORKS), the run time of the algorithms can be controlled by using a smaller number k of meta-genes; using a smaller number q of inputs for the Boolean functions; using SAMPLE_NETWORKS instead of ENUMERATE_NETWORKS; and/or using a shorter maximum time T when simulating the networks.


    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
3.1 IL-2-stimulated T cell immune response
The T cell immune response dataset consisted of mouse microarrays with 45 119 probes per array taken at 12 time points. Normalization reduced the datasets to 5085 probes. After normalization, we performed clustering and discretization. We first computed the appropriate value of k to use in k-means. In Figure 1, we show the value of the mean internal consistency versus k for k = 2,...,40. The internal consistency increases until k is ~25 then flattens out. We chose a small local peak at k = 23. Using k-means with k = 23, we obtained the partition of the full dataset shown in Figure 2.


Figure 1
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Average internal consistency as a function for k for the IL-2 immune response dataset. Based on this curve we selected k = 23.

 

Figure 2
View larger version (61K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Profiles of the 23 meta-genes used in our analysis.

 
Using the 23 clusters obtained by k-means, we computed the SVR representations of the meta-genes. The SVR representations were computed using the 10 time courses closest to the cluster centers and were discretized by comparison with the average expression value of the representations. An example of the SVR representation for cluster 6 (Bcl3) is shown in Figure 3.


Figure 3
View larger version (27K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Smoothing and discretization of expression profiles in cluster 6 (Bcl3) using SVR. The solid black curve was obtained using SVR, the dotted curves show the {varepsilon}-tube, the blue line gives the mean value of the SVR and the red curve gives the final discretization.

 
The discretization resulted in 23 discrete profiles, which were further reduced to 12 unique profiles. These 12 profiles are shown in Table 2. We also performed a comparison of our final discretization with discretizations that we would have obtained using alternate clustering algorithms (hierarchical clustering and SOMs). This comparison, which can be found in the online supplement, revealed that for k = 23 there was >80% agreement between discretization regardless of clustering algorithm.


View this table:
[in this window]
[in a new window]

 
Table 2. Twelve unique discretized profiles from the 23 meta-genes in Figure 2 (dark grey is up-regulated and light grey is down-regulated). Each profile is labeled with a representative gene in the cluster preceeded by E, I or L. E stands for early genes up-regulated after 1 h, I stands for intermediate genes up-regulated after 2 hs and L stands for late genes up-regulated after 8 hs. Further, the time series is divided into two groups: IL-2 starved (before 1 h) and IL-2 stimulated (after 1 h)

 
We used the discrete profiles shown in Table 2 as input to the network inference algorithms. These profiles grouped naturally into two distinct sets. The first set consisted of measurements made prior to 1 h and represented the IL-2 starvation state. The second set consisted of measurements made after 1 h and represented the IL-2-stimulated state. These two groups (IL-2 starved and IL-2 stimulated) were separated and used as input (simultaneously) to INFER_NETWORKS and ENUMERATE_NETWORKS. The inference algorithms discovered a total of 161 558 networks.

The dynamics of the 161 558 networks were analyzed using the RUN_NETWORK and ATTRACTOR routines. Attractors were determined for the two initial conditions corresponding to IL-2 starved and IL-2 stimulated. It was found that 160 657 (99.4%) of these networks had a single fixed point steady-state dynamic for the IL-2-stimulated initial conditions. In the case of IL-2 stimulation gene expression should fluctuate as IL-2 stimulated T cells proliferate (Nelson and Willerford, 1998). We therefore discarded these 160 657 networks, leaving 901 (0.6%) networks to be interpreted. These 901 networks had the same steady-state dynamic, that dynamic consisting of three time points, shown in Table 3.


View this table:
[in this window]
[in a new window]

 
Table 3. Steady-state dynamics for 0.6% of the inferred networks (dark grey up-regulated, light grey down-regulated). For the IL-2-stimulated condition, the steady-state dynamic follows a three-step cycle (t1, t2, t3)

 
An example illustrating our model correctly describing the steady-state fluctuation of genes due to IL-2 stimulation is given by the cyclin-dependent kinase inhibitor p27 (AFFX ID 1419497_at). This inhibitor has been shown experimentally to fluctuate during proliferation of endothelial cells (Huang and Ingber, 2000). Consistent with these results, we found p27 to fluctuate at steady state with cluster E-Stat5b (Table 3).

The 901 networks were analyzed for similarities, yielding the consensus network shown in Figure 4. The viable networks inferred from the IL-2 time series data and depicted in Figure 4 reveal that (in general) early genes (E) activate other early genes and late genes (L); intermediate genes (I) activate late genes and inhibit early genes; early genes inhibited by intermediate genes can be up-regulated when the intermediate genes are down-regulated; late genes activate other late genes and inhibit early and intermediate genes; and early and intermediate genes inhibited by late genes can be up-regulated when late genes are down-regulated.


Figure 4
View larger version (17K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. Activation and inhibition relationships between the 12 meta-genes in Table 2. Solid arrows indicate relationships occurring in all of the 901 networks, while the numbers associated with the dashed arrows indicate the fraction of networks having that relationship.

 
3.2 LPS-stimulated macrophage response
The LPS-stimulated macrophage dataset consisted of 15 142 genes measured over six time steps. Using replicates we obtained 90 852 virtual genes. Removing virtual genes with missing values and <0.05 SD in expression resulted in 60 831 virtual genes, corresponding to 14 779 actual genes, or 93.3% of the original 15 142 genes. Using the 60 831 virtual genes, we obtained 23 meta-genes (again by the internal consistency measure). These 23 meta-genes were smoothed and discretized to obtain 15 unique discrete expression profiles given in Table 4. The virtual genes were mapped back to the actual genes before they were assigned to the 15 profiles as described in Section 2.1.


View this table:
[in this window]
[in a new window]

 
Table 4. The original times series (left) and 16 attractors (right) of the 15 discrete expression profiles (labeled with representative genes) for 100 000 networks inferred from the LPS dataset (dark gret—up-regulated, light grey—down-regulated)

 
We verified the contents of the discretized clusters against current knowledge of toll-like receptor signaling networks (see for example, the toll-like receptor Kegg maps at http://www.genome.jp). In the RANTES cluster, for example, we found cytokines such as I-TAC, MIPS-1ß, IP-10 and IL-1B. All of these are induced by NF-{kappa}B and should therefore be co-expressed. Another example is the TNF{alpha} cluster, including genes such as A20 and IKB{alpha}. These genes are transcribed by NF-{kappa}B, but (unlike the previous cytokines) are negative regulators that shutdown NF-{kappa}B activity. As a final example, we found that genes such as P38, JNK, IKK{varepsilon} and NIK cluster together. These kinases regulate phosphorylation and ultimately activity in the transcription factors NF-{kappa}B and AP-1.

Using COUNT_NETWORK we identified a total of 311 039 826 possible networks matching the 15 expression profiles. From these possible networks, we sampled 100 000 networks and computed their steady-state dynamics using the RUN_NETWORK and ATTRACTOR routines. These 100 000 networks produced only 16 steady-state dynamics, all of which were fixed points. These attractors are shown in Table 4.

In addition to the fact that there were only 16 attractors from 100 000 networks, it is also interesting to note that most of these attractors were very similar. In particular, the 16 attractors discovered were all fixed points, meaning that the genes in the sampled networks did not fluctuate at steady state. This may indicate cell death, which would be consistent with previous knowledge that LPS triggers an innate immunity response through TLR4 (Beutler, 2004) eventually leading to apoptosis in macrophage cells (Xaus et al., 2000).

To corroborate these findings, we searched the microarray dataset for probes corresponding to activators of apoptosis (GO ID = 43 065) and probes corresponding to inhibitors of apoptosis (GO ID = 43 066). Out of 4188 annotated probes, we found 25 positive regulators and 14 negative regulators. For the 15 discrete expression profiles, the average number of up-regulated genes was 48 with 5% SD. This average percentage can be contrasted with the average percentages for the 16 fixed-point attractors. For the attractors, an average of 58% apoptosis activators was up-regulated, while only 42% of apoptosis inhibitors were up-regulated. In certain cases, these contrasts were even more pronounced for particular steady states (64 and 36% for attractors 7 and 15). The full set of percentages can be found in Table 5.


View this table:
[in this window]
[in a new window]

 
Table 5. Attractor distribution among 100 000 sampled networks

 
To further quantify these results, we inferred a set of 100 000 networks using random expression profiles for the 15 meta-genes in Table 4. These 100 000 networks produced 77 583 steady-state dynamics, 10 114 of which were fixed points. Using the random networks, we computed for each of our 16 microarray attractors the fraction of random networks (with steady-state dynamics) having more or less than the percentage of apoptosis activators or inhibitors found using the microarray data. These values are recorded in the fifth column of Table 5 and are empirical probabilities that a given pair of percentages could be obtained at random. For instance, the probability that attractor 16 would have >68% apoptosis activators and <43% apoptosis inhibitors is 0.02. If we further restrict ourselves to fixed-point steady states, we obtain the fractions recorded in column 6 of Table 5. Thus the probability that attractor 16 would occur by random chance with a fixed-point steady state having >68% apoptosis activators and <43% apoptosis inhibitors is 0.003. In general, the numbers in column 6 of Table 5 indicate that our results are significant.


    4 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
There are typically many genetic regulatory networks that will match a given time series dataset. Despite this fact, current algorithms available for the inference of regulatory networks produce only a single network. Depending on the method this network might be chosen to be the most probable (Bayesian) or have the lowest dimensional hidden representation (matrix decomposition). In this article we consider all possible networks matching a given time series dataset and group the networks according to dynamics. This approach is appealing due to the fact that we are considering networks which may be missed using other approaches and that biological systems are thought to be robust to variation (Wagner, 2005) so that different networks with similar dynamics may very well be biologically equivalent.

We first considered an IL-2-stimulated immune response dataset. Using this dataset we discovered that dynamics could be used to eliminate 99.4% of 161 558 possible matching networks. We then produced a composite network using the remaining 0.6% of possible networks. This network confirmed known biological results, namely the identification of early-, intermediate- and late-responding genes to IL-2 stimulation.

Next, in the case of a LPS-stimulated macrophage response dataset we reduced 100 000 sampled networks to 16 fixed point dynamics. These dynamics identified up- and down-regulated apoptosis activators and inhibitors which again agreed with known results for the TLR4 apoptosis pathway.

However, our direct use of dynamics raises additional questions that have not been considered in previous algorithms. These questions have been investigated abstractly in recent work, and should be taken into account in a practical setting such as ours. First, there is the issue of attractor scaling with network size. It was originally thought (Kauffman, 1993) that the number of attractors scaled with the square root of the number of nodes in a network. Recent studies (Bilke and Sjunnesson, 2001) and theoretical work (Samuelsson and Troein, 2003) have shown this to be untrue. In fact, the number of attractors scales superpolynomially with network size. Second, there is the issue of computational artifact. In particular, Boolean networks are typically modeled by simultaneous (synchronous) update of all nodes at each time step. Such networks reduce both theoretical and practical complications. However, it has been discovered that many attractors in synchronous Boolean networks disappear when using asynchronous updates (Bagley and Glass, 1996). To further complicate these issues, stable attractors may be immune to both of these problems (Klemm and Bornholdt, 2005).

Our approach deals with the first issue (attractor scaling) by limiting the number of nodes by using clusters and limiting the network type by using activiation–inhibition functions only. We also limit the number of attractors due to fixed initial conditions. The second issue (attractor artifact) is much more difficult to accommodate, since it implies that attractors found by our method may be artifacts of computation with no biological relevance. To address this issue, we compared the results of our method with experimentally confirmed and/or suspected behavior. A possible computational solution for future study would be the use of some asynchronous update and/or stability criterion, as suggested by the work in Klemm and Bornholdt (2005).

We have shown that the use of dynamics can be an interesting approach for analyzing different networks matching expression profiles for a time series gene expression dataset. Dynamics can be useful when trying to understand the overall behavior of a system and the consequences of this behavior on possible pathways. Dynamics can be particularly useful for isolating networks of interest that relate to a particular behavior under investigation.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
This work was funded by Sandia Laboratory Directed Research and Development. Sandia is a multi-program laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Golan Yona

Received on September 14, 2006; revised on December 29, 2006; accepted on January 19, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Akutsu T, et al. Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics (2000) 16:727–734.[Abstract/Free Full Text]

    Alter O, Golub GH. Reconstructing the pathways of a cellular system from genome-scale signals by using matrix and tensor computations. Proc. Natl. Acad. Sci. USA (2005) 102:17559–17564.[Abstract/Free Full Text]

    Bagley RJ, Glass L. Counting and classifying attractors in high dimensional dynamical systems. J. Theor. Biol (1996) 183:269–284.[CrossRef][Web of Science][Medline]

    Basso K, et al. Reverse engineering of regulatory networks in human B cells. Nat. Genet (2005) 37:382–390.[CrossRef][Web of Science][Medline]

    Beutler B. Inferences, questions and possibilities in Toll-like receptor signalling. Nature (2004) 430:257–263.[CrossRef][Medline]

    Bilke S, Sjunnesson F. Stability of the Kauffman model. Phys. Rev. E (2001) 65:016129-1–016129-5.

    Brun M, et al. Steady-state probabilities for attractors in probabilistic Boolean networks. Signal Processing (2005) 85:1993–2013.[CrossRef][Web of Science]

    Ching W.-K, et al. On construction of stochastic genetic networks based on gene expression sequences. Inter. J. Neur. Sys (2005) 15:297–310.[CrossRef]

    de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol (2002) 9:67–103.[CrossRef][Web of Science][Medline]

    Eisen MB, et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA (1998) 95:14863–14868.[Abstract/Free Full Text]

    Friedman N, Koller D. Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks. Mach. Learn (2003) 50:95–126.[CrossRef]

    Friedman N, et al. Using Bayesian networks to analyze expression data. J. Comput. Biol (2000) 7:601–620.[CrossRef][Web of Science][Medline]

    Getz G, et al. Coupled two-way clustering analysis of gene microarray data. Proc. Natl. Acad. Sci. USA (2000) 97:12079–12084.[Abstract/Free Full Text]

    Goss PJ, Peccoud J. Quantitative modeling of stochastic systems in molecular biology by using stochastic Petri nets. Proc. Natl. Acad. Sci. USA (1998) 95:6750–6755.[Abstract/Free Full Text]

    Huang S. Gene expression profiling, genetic networks, and cellular states: an integrating concept for tumorigenesis and drug discovery. J. Mol. Med (1999) 77:469–480.[CrossRef][Web of Science][Medline]

    Huang S, Ingber DE. Shape-dependent control of cell growth, differentiation, and apoptosis: switching between attractors in cell regulatory networks. Exp. Cell. Res (2000) 261:91–103.[CrossRef][Web of Science][Medline]

    Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol (1969) 22:437–467.[CrossRef][Web of Science][Medline]

    Kauffman SA. The Origins of Order: Self-Organization and Selection in Evolution. (1993) New York: Oxford University Press.

    Klemm K, Bornholdt S. Stable and unstable attractors in Boolean networks. Phys. Rev. E (2005) 72:055101.[CrossRef]

    Lahdesmaki H, et al. Relationships between probabalistic Boolean networks and dynamic Bayesian networks as models of gene regulatory networks. Signal Processing (2006) 86:814–834.[CrossRef][Web of Science][Medline]

    Liang S, et al. REVEAL: a general reverse engineering algorithm for inference of genetic network architectures. In: Pacific Symposium on Biocomputing (PSB’98). (1998) Singapore: World Scientific Publishing.

    Liao JC, et al. Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl. Acad. Sci. USA (2003) 100:15522–15527.[Abstract/Free Full Text]

    Murphy K, Mian S. Modelling Gene Expression Data Using Dynamic Bayesian Networks. (1999) Berkeley, CA: University of California.

    Nelson BH, Willerford DM. Biology of the interleukin-2 receptor. Adv. Immunol (1998) 70:1–81.[Web of Science][Medline]

    Samuelsson B, Troein C. Superpolynomial growth in the number of attractors in Kauffman networks. Phys. Rev. Lett (2003) 90:098701-1–098701-4.

    Segal E, et al. Learning module networks. J. Mach. Learn. Res (2005) 6:557–588.[Web of Science]

    Shmulevich I, et al. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics (2002a) 18:261–274.[Abstract/Free Full Text]

    Shmulevich I, et al. From Boolean to probabalistic Boolean networks as models of genetic regulatory networks. Proc. IEEE (2002b) 90:1778–1792.[CrossRef]

    Shmulevich I, et al. Steady-state analysis of genetic regulatory networks modeled by probabilistic Boolean networks. Comp. Funct. Genomics (2003) 4:601–608.[CrossRef]

    Smola AJ, Scholkpof. A Tutorial on Support Vector Regression. (1998) London, UK: Holloway College, University of London.

    Smolen P, et al. Mathematical modeling of gene networks. Neuron (2000) 26:567–580.[CrossRef][Web of Science][Medline]

    Tamayo P, et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA (1999) 96:2907–2912.[Abstract/Free Full Text]

    Trefethen L, Bau D. Numerical Linear Algebra. (1997) Philadelpha, PA: SIAM.

    Wagner A. Robustness and Evolvability in Living Systems. (2005) Princeton, NJ: Princeton University Press.

    Xaus J, et al. LPS induces apoptosis in macrophages mostly through the autocrine production of TNF-alpha. Blood (2000) 95:3823–3831.[Abstract/Free Full Text]

    Zhang Y, et al. A time-series biclustering algorithm for revealing co-regulated genes. In: Proc. Int. Conf. Inf. Tech. Coding and Comp. (ITCC). IEEE Computer Society. (2005).

    Zou M, Conzen SD. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics (2005) 21:71–79.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Microbiol. Mol. Biol. Rev.Home page
S. A. F. T. van Hijum, M. H. Medema, and O. P. Kuipers
Mechanisms and Evolution of Control Logic in Prokaryotic Transcriptional Regulation
Microbiol. Mol. Biol. Rev., September 1, 2009; 73(3): 481 - 509.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
D. Nam, S. H. Yoon, and J. F. Kim
Ensemble learning of genetic networks from time-series expression data
Bioinformatics, December 1, 2007; 23(23): 3225 - 3231.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary data
Right arrow All Versions of this Article:
23/7/866    most recent
btm021v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Martin, S.
Right arrow Articles by Faulon, J.-L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Martin, S.
Right arrow Articles by Faulon, J.-L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?