Skip Navigation


Bioinformatics Advance Access originally published online on July 14, 2006
Bioinformatics 2006 22(18):2276-2282; doi:10.1093/bioinformatics/btl380
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/18/2276    most recent
btl380v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (9)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Chang, Y.-H.
Right arrow Articles by Chen, B.-S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Chang, Y.-H.
Right arrow Articles by Chen, B.-S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Identification of transcription factor cooperativity via stochastic system model

Yu-Hsiang Chang {dagger}, Yu-Chao Wang {dagger} and Bor-Sen Chen *

Laboratory of Control and Systems Biology, Department of Electrical Engineering National Tsing Hua University, Hsinchu 300, Taiwan

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 

Motivation: Transcription factor binding sites are known to co-occur in the same gene owing to cooperativity of the transcription factors (TFs) that bind to them. Genome-wide location data can help us understand how an individual TF regulates its target gene. Nevertheless, how TFs cooperate to regulate their target genes still needs further study. In this study, genome-wide location data and expression profiles are integrated to reveal how TFs cooperate to regulate their target genes from the stochastic system perspective.

Results: Based on a stochastic dynamic model, a new measurement of TF cooperativity is developed according to the regulatory abilities of cooperative TF pairs and the number of their occurrences. Our method is employed to the yeast cell cycle and reveals successfully many cooperative TF pairs confirmed by previous experiments, e.g. Swi4-Swi6 in G1/S phase and Ndd1-Fkh2 in G2/M phase. Other TF pairs with potential cooperativity mentioned in our results can provide new directions for future experiments. Finally, a cooperative TF network of cell cycle is constructed from significant cooperative TF pairs.

Contact: bschen{at}ee.nthu.edu.tw

Supplementary information: http://www.ee.nthu.edu.tw/~bschen/cooperativity/


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
Precise transcriptional control is one of the major reasons for different gene expression and regulation. Owing to advances in DNA microarray technology and genome sequencing, measuring gene expression levels on a genomic scale has become possible. Measuring gene expression time profiles can help us understand mechanisms of transcriptional regulation, including functional significance of cell cycle regulation and response of environmental changes (Spellman et al., 1998; Gasch et al., 2000). However, time profiles alone are not enough to identify precisely the whole transcriptional regulatory network. More precise method also needs the binding information of transcription factors (TFs) to promoters in DNA. Genes are always regulated by a number of upstream regulatory genes through binding of TFs to specific sites in the DNA promoter region. To construct a gene regulatory network, it is important to understand the binding relationship between target genes and TFs. In recent studies, the genome-wide location (ChIP-chip) analysis is employed to obtain the binding information of TFs to promoters in DNA (Iyer et al., 2001; Simon et al., 2001; Lee et al., 2002; Harbison et al., 2004; Chang et al., 2005; Lin et al., 2005). However, these studies did not shed light on the interactions or cooperativity between TFs.

With advances in experimental approaches and abundant data sources, functional genomics has begun to investigate the more complex, cooperative TF interactions to regulate properly gene expression. In order to find cooperative TF interactions, a statistical technique was employed to identify significant homotypic or heterotypic TF binding site clusters (Wagner, 1999). Moreover, the correlation of genome-wide expression profiles was employed to uncover functional motif combinations in the promoters (Pilpel et al., 2001). Furthermore, genome-wide location data (ChIP-chip) and gene expression profiles were integrated to assess TF cooperativity rigorously (Banerjee and Zhang, 2003; Kato et al., 2004; Tsai et al., 2005). However, these studies only use expression correlation score in the view of statistics to determine TF cooperativity. We are not aware of the previous attempts to construct an overall transcriptional regulatory system to uncover TF cooperativity from the dynamic system perspective. In this study, we assess TF cooperativity not only in dynamic system perspective, but also in statistics. We could get more insight into the regulatory mechanism by constructing its dynamic system model and could also make more precise prediction of TF cooperativity.

A dynamic model is often employed to describe a complex and kinetic system in many fields. Systems biology and computational biology methods have recently been widely employed to describe the biological functions from the dynamic system point of view (Hasty et al., 2002; Davidson et al., 2003; Hood, 2003; Tegner et al., 2003; Chen et al., 2004). In this article, we exploit genome-wide location data (Harbison et al., 2004) and gene expression profiles (Spellman et al., 1998) to construct a dynamic regulatory model for the genes of interest. The dynamic regulatory model is developed to describe how the upstream regulatory genes control a target gene to produce the output mRNA expression through its regulatory network. In order to reduce computational complexity, we estimate TF cooperativities one gene by one gene. In the dynamic regulatory model, there are two groups of input regulations affecting expression profiles of the target gene. The first group is due to individual TF regulation, and the other is due to TF cooperativity regulation. After estimating the regulatory abilities of these two groups, we can get all TF abilities for regulating the transcriptional process of the target gene. It is more possible to have TF cooperativity if binding sites for specific pairs of regulators co-occur more frequently within the same promoter regions of target genes (Harbison et al., 2004). Therefore, according to the estimated cooperative TF abilities and the number of TF pairs co-binding to the target genes, a statistical measurement method is specified to assess TF cooperativity rigorously. The statistical measurement method is proposed to reduce the possible estimation error owing to few data points. The new measurement of TF cooperativity is the multiplicative cooperative p-value, which can be obtained by multiplying all individual p-values of different target genes to estimate efficiently the cooperativity of any cooperative TF pairs on a genomic scale. Further, we use the shuffling method to avoid overfitting to confirm the reliability of the proposed method.

Our results show that many cooperative TF pairs that were previously characterized through experiments indeed have a high cooperativity in our analysis, thus validating the method we proposed as a TF cooperativity predictor and providing valuable information for further analysis. A cooperative network is also constructed with significant cooperative TF pairs for the yeast cell cycle. Finally, the results reveal several novel possible cooperative TF pairs not found in previous studies, thus providing new directions for future experiments.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
2.1 Selecting and processing experimental data
We used the genome-wide yeast microarray hybridization data of Spellman et al. (1998) as our mRNA expression profiles. They proposed many experimental methods for resetting the yeast cell cycle to measure mRNA expression profiles for the whole genome comprehensively. Here, we used the 768 cell cycle-related genes selected by Simon et al. (2001) from ‘{alpha} factor’ experimental cell cycle data as the target genes. ‘{alpha} factor’ is one experimental technique to synchronize cell cycle. The genome-wide location data are taken from Harbison et al. (2004), in which the genomic occupancy of 203 DNA-binding TFs is determined in yeast. We selected the significant binding set using a binding p-value p < 0.0015 as the inputs in the dynamic gene transcriptional regulatory model.

In genome-wide expression profiles of Spellman et al. (1998), there are many missing data. We used the cubic spline interpolation method (Faires and Burden, 1998), which employs piecewise third-order polynomials to fit data points, for compensating these missing values in the profiles. Furthermore, in order to avoid the overfitting estimation of the profiles, we also use the cubic spline method to interpolate the data points when there are more than six TFs binding to the target gene, which implies that the number of parameters to be estimated is 23. Only 5% of the target genes are bound by more than six TFs (Supplementary Material). In order to fit the gene dynamic model in the linear scale, the microarray data are returned from the log2 scale to the linear scale.

2.2 Dynamic model of gene regulatory networks
First, we consider a gene regulatory network as a system block with several regulatory genes as inputs and a target gene as output. Owing to random noise and fluctuation at the molecular level, the transcriptional behavior of the gene regulatory network is described by a stochastic discrete dynamic equation, and the general form of transcriptional regulation for a target gene is written as follows:

Formula 1(1)
where y[t] represents mRNA expression level of the target gene at time point t, and the parameter a indicates the effect of the present state value y[t] to the next state value y[t + 1]. xi[t] i isin {1, 2, ... , N} represent the regulation functions of N TFs binding to the target gene and bi indicates the regulatory ability of the i-th TF. G[t] is the possible regulatory function of cooperative TFs. k represents the basal level from other factors, and {varepsilon}[t] denotes a stochastic noise owing to model uncertainty and fluctuation of mRNA microarray data in the target gene.

For system identification of gene regulatory network, it is more practical to consider the biochemical reaction between the TF regulation functions xi[t] at the motif binding sites and their relevant mRNA expression profiles yi[t] of the upstream regulatory gene (Goldbeter and Koshland, 1981; Mestl et al., 1995). For this purpose, we describe the binding regulation function xi[t] of the TF as the following sigmoid function

Formula 2(2)
where r denotes the transition rate of the sigmoid function and Mi denotes the mean of mRNA expression level of the regulatory gene i. yi[t] and xi[t] represent the mRNA expression profiles of the i-th regulatory gene and the binding regulation function of the corresponding TF, respectively. The sigmoid function can also be considered as a method for normalizing the expression profiles of regulatory genes between 0 and 1, which has been successfully employed to describe the binding of regulatory genes (Chen et al., 2004).

The regulatory function G[t] is combined by all possible TF cooperativities of the target gene. We describe the regulatory function G[t] in Equation (1) as the following:

Formula 3(3)
where xi·j[t] {equiv} fi·j(yi[tyj[t]) is a sigmoid function of product yi[t] · yj[t] to denote the binding function of cooperative TFs i and j, and ci·j denotes the regulatory ability (or kinetic activity) of the cooperative TFs i and j. N denotes the number of TFs binding to the target gene.

Substituting Equation (3) into Equation (1), we get the following dynamic transcriptional regulatory equation:

Formula 4(4)
In Figure 1, there are examples of constructing the discrete dynamic models for target genes. After describing the general stochastic dynamic model of transcriptional regulation, we could identify the models of gene expression from the available microarray data.


Figure 1
View larger version (24K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 Examples of dynamic transcriptional regulatory models of target genes CLB5 and HO. The candidates of regulatory TFs of target gene were obtained from the genome-wide location data of Harbison et al. (2004) when p-value was chosen as p < 0.0015. In the examples, the regulatory TFs of CLB5 are Mbp1 and Swi6, and the regulatory TFs of HO are Mbp1, Swi6 and Swi4. The regulations of individual TF are represented by solid lines and the regulations of possible TF cooperativities are represented by dotted lines.

 
2.3 Identifying gene regulatory networks
After constructing the discrete stochastic dynamic model, we use the method of maximum likelihood to estimate the parameters. The details are shown in Supplementary Material.

After identifying the parameters, the transcriptional regulatory network of target genes could be expressed as the following dynamic equation:

Formula 5(5)
where Formula 5, Formula 5, Formula 5, Formula 5 are the estimated parameters of a, bi, ci·j, k, respectively. Therefore, the TF cooperativity at each target gene could be evaluated by Formula 5, the regulatory ability of the cooperative TF pair, in Equation (5). However, Formula 5 in (5) is only from one target gene point of view. A statistical method should be developed to measure the cooperativity of TFs to combine all Formula 5 of the TF i and j for all target genes (i.e. on a genomic scale). It would be discussed in the following section.

2.4 Measurement of TF cooperativity
In order to measure the possible TF cooperativity, we define a new measurement, the multiplicative cooperative p-value PC, according to the statistics of regulatory abilities Formula 5 of all target genes in the dynamic model of Equation (5) and the number of TF pairs appearing in all target genes. For any TF pair, if the binding sites of specific TF pairs co-occur more frequently in the target genes, then it is more likely to have cooperativity between these two TFs (Harbison et al., 2004). In addition, the magnitude of the regulatory ability Formula 5 of TF cooperation in our dynamic model can be also useful to help us determine the significant TF cooperativity because the magnitude represents its importance in the transcriptional regulation of the target gene.

To define the multiplicative cooperative p-value PC on a genomic scale, we first calculate the individual p-value for each regulatory ability of a cooperative TF pair. The p-value is defined as (PB)m when a cooperative TF pair binds to the m-th target gene. For the m-th target gene bound by the cooperative TF pair i and j, we rewrote its regulatory ability Formula 5 as Formula 5. After constructing 1000 random permutations of xi·j[t] (Supplementary Material) in Equation (4) for the m-th target gene bound by TFs pair i and j, we used these random permutations to estimate 1000 different Formula 5 rewritten as Formula 5 and then computed a probability density Formula 5 of all absolute values Formula 5 according to their magnitudes by normalization. The probability density distribution Formula 5 of cooperativity is shown in Figure 2. Then, using Formula 5, we calculated the individual p-value (PB)m of TFs i and j at the m-th target gene by

Formula 6(6)
where Formula 6 denotes the regulatory ability of the cooperative TF pair i and j at the m-th target gene without random permutation.


Figure 2
View larger version (32K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 Probability density function Formula 6 and normalized histogram for Formula 6. The pdf Formula 6 of all Formula 6 is obtained according to the number of Formula 6 occurrence estimated by random permuting Formula 6 for the m-th target gene bound by TF pair i and j. The individual p-value Formula 6 of the m-th target gene is the sum of heights of corresponding bars in practice.

 
When calculating individual p-value for each regulatory ability (PB)m, we use 1000 random permutations to infer the probability density function Formula 6. Theoretically, the continuous probability density function is obtained when the number of shuffling is infinite, which is not applicable in practice. Thus, we use the discrete normalized histogram to approximate the probability density function instead. In Figure 2, (PB)m should be the area under the smooth curve (pdf), but as we can calculate only the approximation of the pdf, (PB)m would be in fact the numbers of Formula 6, which are greater than Formula 6 normalized by the number of shuffling, i.e. the sum of heights of corresponding bars. In reality, the minimal value of (PB)m is equal to the inverse number of shuffling, i.e. 0.001 in the case of 1000 shuffling. Thus, taking a greater number of shuffling would make quite a different (PB)m value. However, choosing the number of shuffling would not affect the final significant cooperative TF pairs, as long as the number is large enough. So in order to reduce the computation complexity, we choose the number of shuffling to be 1000.

A TF pair may co-occur to bind many target genes, so we could obtain many individual p-values (PB)m, m isin {1, 2, ... , L}, where L is the number of the target genes bound by the possible cooperative TF pair i and j. In Figure 1, for example, TFs Mbp1 and Swi6 both bind to HO and CLB5. Therefore, the gene HO has its cMbpSwi6 and the corresponding (PB)HO, and so does CLB5 with its cMbpSwi6 and the corresponding (PB)CLB5. After computing all the individual p-value (PB)m for the TF pair i and j for all target genes, i.e. (PB)m for m isin {1, 2, ... , L}, we define the multiplicative cooperative p-value PC for the TF pair i and j by multiplying all of its (PB)m as follows:

Formula 7(7)
where L is the number of target genes bound by the possible cooperative TF pair i and j, and (PB)m denotes the individual p-value of the cooperative TF pair i and j binding to the m-th target gene.

In Equations (6) and (7), the more likely the TF cooperativity is, the smaller PC of the TF pair is. Therefore, there are two situations that make PC smaller. One is the larger number L of the TF pairs co-occurring in all target genes, i.e. with a larger number of individual p-values for the TF pair. This situation is the same with the result that the TF pair has a greater possibility of cooperation if binding sites of specific TF pairs co-occur more frequently in target genes (Harbison et al., 2004). Another situation is that the regulatory ability Formula 7 of the TF pair i and j is so significant at some target genes that the individual p-value (PB)m of the TF pair is small at the target genes. In this way, the multiplicative cooperative p-value PC also becomes small. Therefore, it is reasonable to use the multiplicative cooperative p-value PC to evaluate the TF cooperativity.

2.5 Determination of significant cooperative TF pairs
After calculation of the multiplicative cooperative p-value PC, we have a list of cooperative TF pairs. The list has Formula 7 pairs with their PC values, respectively, where N represents the total number of TFs. Sorted by the PC values (the smaller PC means the more likely cooperative TF pair), it can be presented as a list in which we can find the most possible cooperative TF pairs to the least possible ones. However, which TF pairs are considered as significant cooperative TF pairs should be determined.

Because the measurement of TF cooperativity depends on the individual p-values PB and the number of the target genes bound by the possible cooperative TF pair L, we determine the threshold of significant cooperative pairs by PB and L as well. The idea is that we determine the threshold of PB and L, respectively, and then calculate the significance threshold as

Formula 8(8)
PC,Threshold is defined as PB,Threshold to the power of LThreshold, where PB,Threshold and LThreshold denote the threshold of PB and L, respectively. How to determine PB,Threshold and LThreshold is as follows. We first construct PB distribution and L distribution according to all PB values and all L values. Then at the significance level 0.05 of PB distribution and L distribution, we can determine the significance thresholds of PB and L, respectively. Because a smaller PB means that the PB is more significant, so PB,Threshold is determined as the fifth percentile of the PB distribution. For LThreshold, a larger L means that the L is more significant, so LThreshold is determined as the 95th percentile of the L distribution. By PB,Threshold, LThreshold and (8), we can calculate the significance threshold PC,Threshold and determine the significant cooperative TF pairs, which PC are smaller than PC,Threshold. Once the significance threshold PC,Threshold is determined, whether or not PC is significant at that level is binary.


    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
3.1 Analysis of TF cooperativity
We integrated the expression profiles of the yeast cell cycle taken from Spellman et al. (1998) and genome-wide location data of Harbison et al. (2004) to identify the cooperativity of all possible TF interaction pairs among 203 TFs. After constructing and identifying parameters of gene transcriptional regulatory networks of yeast, we calculated the multiplicative cooperative p-value PC of each possible TF interaction pair. Sorted by PC values, all possible TF cooperative pairs are ranked according to their cooperative possibilities. We then calculate the significance threshold PC,Threshold and determine the significant cooperative TF pairs. In this study, the significance threshold PC,Threshold =10–21 and we identified 55 significant cooperative TF pairs, which are shown in Table 1. According to these results, the cooperative network of significant cooperative TF pairs is given in Figure 3. Of the results listed, 72.73% are confirmed by evidence in many studies. Most of these TF cooperative pairs confirmed are related to the cell cycle of yeast. It may be because our source data are the yeast cell cycle expression profiles so that most of these TF cooperative pairs found in the list are related to cell cycle. Moreover, the TF cooperative pairs that have not yet been proved but are found by the method we proposed can provide a direction for future experiments. The details of the TF cooperativities we found are discussed below.


Figure 3
View larger version (24K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 The significant cooperative TF network of cell cycle target genes. The cooperative TF pairs confirmed by literature evidences are shown in solid lines, and those still to be confirmed are expressed in dotted lines.

 


View this table:
[in this window]
[in a new window]

 
Table 1 Significant cooperative TF pairs of cell cycle target genes

 
3.2 Cell cycle
3.2.1 G1/S Phase (Swi4-Swi6, Mbp1-Swi6, Mbp1-Swi4, Stb1-Swi6)
The G1 to S phase transition of the eukaryotic cell cycle is a critical point for the coordination of cell cycle progression with cellular growth. SBF and MBF are sequence-specific TFs that activate gene expression to mediate the G1/S transition of the cell cycle in yeast (Iyer et al., 2001). SBF is a protein complex of Swi4 and Swi6, and MBF is a protein complex of Mbp1 and Swi6 (Koch et al., 1993). The related Swi4 and Mbp1 proteins are the DNA-binding components of the respective factors, and Swi6 may have a regulatory function (Dirick et al., 1992; Primig et al., 1992). The cooperativity between Swi4 and Swi6 is included in our results, and so is the cooperativity between Mbp1 and Swi6. In addition, Mbp1 and Swi4 sharing 50% identity in their DNA binding domains are found with the probable cooperativity in our results (Koch et al., 1993).

In the late G1 phase, Stb1 is an important regulator controlling the timing of start transcription that is revealed in the absence of the G1 regulator Cln3 and binds to Swi6 in vivo (Ho et al., 1999). The interaction of Stb1 and Swi6 confirmed by Ho et al. (1999) and Costanzo et al. (2003) is also found in our results.

3.2.2 G2/M Phase (Ndd1-Fkh1, Ndd1-Fkh2, Fkh1-Fkh2, Mcm1-Fkh2, Ndd1-Mcm1)
In the G2/M phase, the important regulator is SFF, which is a larger transcription factor containing Ndd1, Fkh1 and Fkh2 (Koranda et al., 2000; Kumar et al., 2000; Pic et al., 2000; Futcher, 2002). Our results strongly indicate the cooperativity between Ndd1 and Fkh2, which are components of SFF. Another cooperativity between Ndd1 and Fkh1, which are also components of SFF, is found in our results, too. In addition, Fkh1 and Fkh2 are found with cooperativity in our results. Fkh1 and Fkh2 share 72% identical DNA binding domain, and the double mutant of Fkh1 and Fkh2 displays obvious morphological change (Kumar et al., 2000).

SFF is thought to regulate a program of mitotic transcription in conjunction with the transcription factor Mcm1. Moreover, Fkh2, a component of SFF, assembles into a ternary complex with Mcm1 on both the SWI5 and CLB2 cell cycle genes (Koranda et al., 2000; Kumar et al., 2000). The fact that Fkh2 shows cooperativity with Mcm1 is also listed in our result. In addition to Ndd1 and Fkh2, Mcm1 and Fkh2, another cooperative TF pair in the G2/M phase, Ndd1 and Mcm1, is found in our results. For the above cooperative TF pairs found in the G2/M phase, it is not surprising to find experimentally that Mcm1, Fkh2 and Ndd1 also form a complex to regulate the CLB2 gene and other genes (Kumar et al., 2000; Zhu et al., 2000).

3.2.3 M/G1 Phase (Ace2-Swi5)
In the M/G1 phase, we found that the TF pair, Ace2 and Swi5, is cooperative. Ace2 and Swi5 is a pair of TFs of yeast that regulates the expression of many cell cycle-specific genes (Doolin et al., 2001). In recent studies, Ace2 and Swi5 cooperate to induce the expressions of a subset of genes, but the antagonistic interaction between Ace2 and Swi5 has been found (Doolin et al., 2001). With 82% identical DNA binding domains, Ace2 and Swi5 bind to the same DNA sequence (McBride et al., 1999), and it is possible that proteins compete for access to these promoters, but only one activates transcription (Doolin et al., 2001). Therefore, one partner of Swi5 and Ace2 sometimes can have a stronger contribution towards regulation, and the finding of antagonistic interaction of Ace2 and Swi5 is not surprising.

3.3 Mating (Ste12–Dig1)
The TF Ste12 is responsible for activating genes in response to MAP kinase cascades controlling mating and filamentous growth. Two inhibitors Dig1 and Dig2 regulate Ste12 negatively (Olson et al., 2000). It was found that Dig1 and Dig2 do not function through redundant mechanisms, but rather inhibit pheromone-responsive transcription through interactions with separate regions of Ste12 (Olson et al., 2000). In our results, we indeed found that Ste12 cooperates with Dig1. However, in the location data of Harbison et al., Dig2 is not listed in the 203 TF. That is why another protein Dig2 was not found to show cooperativity with Ste12 in our results. If the genome-wide location data are comprehensive for more TFs, we believe that our method can predict more cooperative TF pairs, like Dig2 and Ste12.

3.4 Comparison with results of other methods
Pilpel et al. (2001) uncover functional motif combinations in the promoters of Saccharomyces cerevsiae using microarray data. They uncover not only cell cycle-related motifs, but also sporulation and various stress responses. Focusing on cell cycle, they identified only 10 motif pairs. Comparing our results with the cell cycle results from Pilpel et al. (2001), there are only three TF pairs in common (Supplementary Material). This may be because they only use microarray data but not use ChIP-chip data to infer combinatorial motifs.

Banerjee and Zhang (2003) integrated genome-wide location data from Lee et al. (2002) and gene expression data from Cho et al. (1998) to infer cooperativity among transcription factors by expression correlation. Comparing the TF cooperativities we found with Banerjee and Zhang's results show that many cooperative TF pairs confirmed by literature evidences are found in both results even though their study based on different dataset (Supplementary Material). Furthermore, our results indicate more cooperative TF pairs confirmed by literature evidences but not found by Banerjee and Zhang's methods. However, there are still four pairs, namely Arg80–Arg81, Ace2–Hsf1, Hsf1–Skn7 and Hir1–Hir2, found in Banerjee and Zhang's results but not in our results. The details can be seen in Supplementary Material.

Tsai et al. (2005) use statistical methods (ANOVA) to identify synergistic pairs of yeast cell cycle TFs. They combined ChIP-chip data from Harbison et al. (2004) and microarray data from Spellman et al. (1998) as we did. Comparing our results with Tsai et al.'s results (confident synergistic TF pairs as stated in their study), we find that many cooperative TF pairs confirmed by literature evidences are found in both results. Once more, our method finds more real cooperative TF pairs than Tsai et al.'s method. The overlap of our results, Tsai et al.'s results and cooperative TF pairs supported by literature evidences is shown in Figure 4. In addition, there are four pairs, Fkh2-Swi6, Fkh1-Mbp1, Dat1-Hap1 and Hap1-Rap1, which are still not confirmed by literature evidences, are also found by both results. These TF pairs provide directions for future experiments. From Figure 4, we also find that there are three pairs, Hir1–Hir2, Hir1–Hir3 and Hir2–Hir3, found in Tsai et al.'s results but not in ours. These three TF pairs could be considered as false negative TF pairs of our method. After checking these three cooperative TF pairs in detail, we find that Hir1 and Hir2 co-occurred to bind only six target genes of the 768 cell cycle genes from the location data of Harbison et al. (2004) at p-value p < 0.0015. For this cooperative TF pair, the number L in Equation (7) is only 6, and that is why its PC is not small enough to be considered significant. The situations of Hir1–Hir3 and Hir2–Hir3 are the same, thus these three pairs are considered as false negatives in our results. Finally, we calculated the confirmed rates, which are the rates of both results compared with literature evidences, of the TF pairs listed in our findings and Tsai et al.'s results (confident synergistic TF pairs as stated in their study). The confirmed rate of our method, 52.73%, is higher than that of Tsai et al.'s, 28.89%.


Figure 4
View larger version (49K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4 Overlap of our results, Tsai et al.'s (2005) results, and cooperative TF pairs supported by literature evidences. There are total 45 and 55 cooperative pairs in Tsai et al.'s results (confident synergistic TF pairs as stated in their study) and in our results, respectively. By comparing both results, we can easily find that the confirmed rate of our method is higher than that obtained by Tsai et al.'s method.

 

    4 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
In this study, we successfully identified cooperative TF pairs by integrating genome-wide location data and expression profiles. Our method is based on transcriptional regulatory mechanisms of TFs and their cooperativities at binding sites from dynamic system perspective. Unlike others' studies using statistical correlation (Pilpel et al., 2001; Banerjee and Zhang, 2003; Tsai et al., 2005), our method provides the view of dynamic regulatory systems to mimic the transcriptional procedure. The measure of TF cooperativity of TFs is developed by a multiplicative cooperative p-value according to the statistics of estimated regulatory ability of cooperative genes in the dynamic regulatory model. We developed a method to determine significant cooperative TF pairs among all possible TF cooperativities as well. Our results showed that many cooperative TF pairs identified by our method are confirmed by literature evidences. We also indicated several possible TF pairs with cooperativity that are listed in our results but not confirmed yet. These provide new directions for future experiments. Moreover, it was shown that the confirmed rate of our method is higher than other methods using statistical correlation (Pilpel et al., 2001; Banerjee and Zhang, 2003; Tsai et al., 2005).

From the cooperative network of significant TF cooperative pairs shown in Figure 3, we can find that the interaction links among several TFs (Swi4, Swi6, Fkh1, Fkh2, Ndd1 and Mbp1) are more compact than others (compactness means that there are more interaction links with the TFs). All of these TFs have important roles in the yeast cell cycle, so it is likely that there exist TF cooperativities among these TFs. However, some of these TFs may co-bind to many target genes because of their importance for the yeast cell cycle but nevertheless they show no cooperativity among them. In our analysis, this situation will lead to a false positive (the false positive pairs are listed in the Supplementary Material). It may be another possible reason why the interaction links among these important TFs are more compact. We also list the TF cooperative pairs that are confirmed by literature evidences but not shown up in our results, i.e., the false negatives. The false negative rate is 27.27% (Supplementary Material).

In order to confirm the reliability of the proposed method, the shuffling method was used to test the overfitting. In the expression data shuffling case, only 60% of TF cooperativities are found, which may be due to the correct ChIP-chip data and the uncertainty reduction by p-value detection method. In the location data shuffling, no cooperativities of TFs are found due to the same significance threshold PC,Threshold as the original result without shuffling, i.e. 10–21. When comparing the results, we found that there is no overfitting by the proposed method (Supplementary Material).

With the microarray data of the yeast cell cycle of Spellman et al. (1998) and the ChIP-chip data of Harbison et al. (2004), it is shown that our method can successfully predict the cooperative TF pairs of the yeast cell cycle. If the gene expression profiles and genome-wide location data of other process for yeast or other species, such as the heat shock stress, are available, our method could be employed to identify the cooperative TF pairs of different stress conditions or other species (Supplementary Material). Furthermore, our method can be extended to identify the cooperativities among more than two TFs by adding extra terms generated by multiplications of their expression profiles through the sigmoid function. Moreover, we can integrate protein-protein interaction data to determine interactions not only among TFs but also among TFs and other proteins in our model (Supplementary Material). That is, by modifying the method we proposed, we can construct an integrated cellular network of transcription regulation and protein–protein interaction.


    Acknowledgments
 
The authors thank Professor Wen-Ping Hsieh for her valuable discussions and comments.

Conflict of Interest: none declared.


    FOOTNOTES
 
{dagger}The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors. Back

Associate Editor: Martin Bishop

Received on May 8, 2006; revised on June 30, 2006; accepted on July 6, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 

    Banerjee, N. and Zhang, M.Q. (2003) Identifying cooperativity among transcription factors controlling the cell cycle in yeast. Nucleic Acids Res, . 31, 7024–7031[Abstract/Free Full Text].

    Chang, W.C., et al. (2005) Quantitative inference of dynamic regulatory pathways via microarray data. BMC Bioinformatics, 6, 44[CrossRef][Medline].

    Chen, H.C., et al. (2004) Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle. Bioinformatics, 20, 1914–1927[Abstract/Free Full Text].

    Cho, R.J., et al. (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell, 2, 65–73[CrossRef][Web of Science][Medline].

    Costanzo, M., et al. (2003) G1 transcription factors are differentially regulated in Saccharomyces cerevisiae by the Swi6-binding protein Stb1. Mol. Cell Biol, . 23, 5064–5077[Abstract/Free Full Text].

    Davidson, E.H., et al. (2003) Regulatory gene networks and the properties of the developmental process. Proc. Natl. Acad. Sci. USA, 100, 1475–1480[Abstract/Free Full Text].

    Dirick, L., et al. (1992) A central role for SWI6 in modulating cell cycle start-specific transcription in yeast. Nature, 357, 508–13[CrossRef][Medline].

    Doolin, M.T., et al. (2001) Overlapping and distinct roles of the duplicated yeast transcription factors Ace2p and Swi5p. Mol. Microbiol, . 40, 422–432[CrossRef][Web of Science][Medline].

    Faires, J.D. and Burden, R. Numerical Methods, (1998) 2nd edn Brooks/Cole Publishing Company, Pacific Grove, CA, pp. 87–91.

    Futcher, B. (2002) Transcriptional regulatory networks and the yeast cell cycle. Curr. Opin. Cell Biol, . 14, 676–683[CrossRef][Web of Science][Medline].

    Gasch, A.P., et al. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 11, 4241–4257[Abstract/Free Full Text].

    Goldbeter, A. and Koshland, D.E., Jr. (1981) An amplified sensitivity arising from covalent modification in biological systems. Proc. Natl. Acad. Sci. USA, 78, 6840–6844[Abstract/Free Full Text].

    Harbison, C.T., et al. (2004) Transcriptional regulatory code of a eukaryotic genome. Nature, 431, 99–104[CrossRef][Medline].

    Hasty, J., et al. (2002) Engineered gene circuits. Nature, 420, 224–230[CrossRef][Medline].

    Ho, Y., et al. (1999) Regulation of transcription at the Saccharomyces cerevisiae start transition by Stb1, a Swi6-binding protein. Mol. Cell Biol, . 19, 5267–5278[Abstract/Free Full Text].

    Hood, L. (2003) Systems biology: integrating technology, biology, and computation. Mech. Ageing. Dev, . 124, 9–16[CrossRef][Web of Science][Medline].

    Iyer, V.R., et al. (2001) Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature, 409, 533–538[CrossRef][Medline].

    Kato, M., et al. (2004) Identifying combinatorial regulation of transcription factors and binding motifs. Genome Biol, . 5, R56[CrossRef][Medline].

    Koch, C., et al. (1993) A role for the transcription factors Mbp1 and Swi4 in progression from G1 to S phase. Science, 261, 1551–1557[Abstract/Free Full Text].

    Koranda, M., et al. (2000) Forkhead-like transcription factors recruit Ndd1 to the chromatin of G2/M-specific promoters. Nature, 406, 94–98[CrossRef][Medline].

    Kumar, R., et al. (2000) Forkhead transcription factors, Fkh1p and Fkh2p, collaborate with Mcm1p to control transcription required for M-phase. Curr. Biol, . 10, 896–906[CrossRef][Web of Science][Medline].

    Lee, T.I., et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298, 799–804[Abstract/Free Full Text].

    Lin, L.H., et al. (2005) Dynamic modeling of cis-regulatory circuits and gene expression prediction via cross-gene identification. BMC Bioinformatics, 6, 258[CrossRef][Medline].

    Manke, T., et al. (2003) Correlating protein–DNA and protein–protein interaction networks. J. Mol. Biol, . 333, 75–85[CrossRef][Web of Science][Medline].

    McBride, H.J., et al. (1999) Distinct regions of the Swi5 and Ace2 transcription factors are required for specific gene activation. J. Biol. Chem, . 274, 21029–21036[Abstract/Free Full Text].

    Mestl, T., et al. (1995) A mathematical framework for describing and analyzing gene regulatory networks. J. Theor. Biol, . 176, 291–300[CrossRef][Web of Science][Medline].

    Olson, K.A., et al. (2000) Two regulators of Ste12p inhibit pheromone-responsive transcription by separate mechanisms. Mol. Cell Biol, . 20, 4199–4209[Abstract/Free Full Text].

    Pic, A., et al. (2000) The forkhead protein Fkh2 is a component of the yeast cell cycle transcription factor SFF. EMBO J, . 19, 3750–3761[CrossRef][Web of Science][Medline].

    Pilpel, Y., et al. (2001) Identifying regulatory networks by combinatorial analysis of promoter elements. Nat. Genet, . 29, 153–159[CrossRef][Web of Science][Medline].

    Primig, M., et al. (1992) Anatomy of a transcription factor important for the start of the cell cycle in Saccharomyces cerevisiae. Nature, 358, 593–597[CrossRef][Medline].

    Simon, I., et al. (2001) Serial regulation of transcriptional regulators in the yeast cell cycle. Cell, 106, 697–708[CrossRef][Web of Science][Medline].

    Spellman, P.T., et al. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cererisiae by microarray hybridization. Mol. Biol. Cell, 9, 3273–3297[Abstract/Free Full Text].

    Tegner, J., et al. (2003) Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proc. Natl. Acad. Sci. USA, 100, 5944–5949[Abstract/Free Full Text].

    Tsai, H.K., et al. (2005) Statistical methods for identifying yeast cell cycle transcription factors. Proc. Natl. Acad. Sci. USA, 102, 13532–13537[Abstract/Free Full Text].

    Wagner, A. (1999) Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics, 15, 776–784[Abstract/Free Full Text].

    Zhu, G., et al. (2000) Two yeast forkhead genes regulate the cell cycle and pseudohyphal growth. Nature, 406, 90–94[CrossRef][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
Y. Wang, X.-S. Zhang, and Y. Xia
Predicting eukaryotic transcriptional cooperativity by Bayesian network integration of genome-wide data
Nucleic Acids Res., October 1, 2009; 37(18): 5943 - 5958.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
T.A. Knijnenburg, L.F.A. Wessels, and M.J.T. Reinders
Combinatorial influence of environmental parameters on transcription factor activity
Bioinformatics, July 1, 2008; 24(13): i172 - i181.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
D. Datta and H. Zhao
Statistical methods to infer cooperative binding among transcription factors in Saccharomyces cerevisiae
Bioinformatics, February 15, 2008; 24(4): 545 - 552.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/18/2276    most recent
btl380v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (9)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Chang, Y.-H.
Right arrow Articles by Chen, B.-S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Chang, Y.-H.
Right arrow Articles by Chen, B.-S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?