Bioinformatics Advance Access originally published online on October 28, 2004
Bioinformatics 2005 21(7):1062-1068; doi:10.1093/bioinformatics/bti094
Extracting relations between promoter sequences and their strengths from microarray data
1Graduate School of Information Sciences, Nara Institute of Science and Technology 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
2Computational Biology Research Center, The National Institute of Advanced Industrial Science and Technology Aomi Frontier Building 2-43 Aomi, 17F, Koto-ku, Tokyo 135-0064, Japan
3Department of Computational Biology, Faculty of Frontier Science, The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: The relations between promoter sequences and their strengths were extensively studied in the 1980s. Although these studies uncovered strong sequence-strength correlations, the cost of their elaborate experimental methods has been too high for them to be applied to a large number of promoters. In contrast, the recent increase in microarray data allows us to compare the expression of thousands of genes with their DNA sequences.
Results: We studied the relations between the promoter sequences and their strengths using the Escherichia coli microarray data. We modeled those relations using a simple weight matrix, which was optimized with a novel support vector regression method. It was observed that several non-consensus bases in the −35 and −10 regions of promoter sequences act positively on the promoter strength and that certain consensus bases have a minor effect on the strength. We analyzed outliers for which the observed gene expressions deviate from the promoter strength predictions, and identified several genes with enhanced expressions due to multiple promoters and genes under strong regulation by transcription factors. Our method is applicable to other procaryotes for which both the promoter sequences and the microarray data are available.
Contact: hisano-k{at}is.aist-nara.ac.jp
| 1 INTRODUCTION |
|---|
The relations between the promoter sequences and their strengths were extensively studied in the 1980s (Mulligan et al., 1984; Mulligan et al., 1985; Mulligan and McClure, 1986; Kobayashi et al., 1990; Szoke et al., 1987; Ayers et al., 1989; O'Neill, 1989; Stefano and Gralla, 1982; Youderian et al., 1982; Gardella et al., 1989; Burr et al., 2000; Strohl, 1992; Kumar et al., 1993; Straney et al., 1994). Several studies used Escherichia coli promoters corresponding to the σ70 subunit of RNA polymerase. In this case, a promoter sequence comprises two separately conserved regions, called the −35 region and the −10 region. Their consensus sequences are TTGACA and TATAAT, respectively, and they are separated by approximately 17 base pairs (bp) (Siebenlist et al., 1980; Strohl, 1992; Hawley and McClure, 1983; Harley and Reynolds, 1987; Lisser and Margalit, 2000; Mulligan et al., 1985). These studies uncovered strong sequence-strength correlations: promoters closer to the consensus sequence have stronger activities; promoters with a spacer of length 17 are more active than promoters with other spacer lengths; and matching the consensus sequence in the −10 region is much more important for promoter activity than matching it in the −35 region. However, since most of their methods were based on base-wise mutations of a specific promoter, and experimental conditions varied between experiments, it was difficult to analyze their results statistically. Moreover, the cost of their elaborate methods was too high for them to be applied to various organisms.
In contrast, the recent increase in microarray data provides us with opportunities to compare the strengths of hundreds of promoters under the same experimental conditions for various organisms. In this paper, we propose a statistical learning method to extract the promoter sequence-strength relations from the microarray data and apply our method to E.coli σ70 promoters.
| 2 SYSTEMS AND METHODS |
|---|
2.1 The model
Weight matrix models of protein-DNA binding are useful in recognizing the protein-binding sites in a given DNA fragment (Stormo, 2000). These models assign partial energies to each base at each position in a putative binding site and define the protein-DNA binding energy as the sum of them. Subsequently, the sites with high binding energies are considered as candidates for the protein-binding sites.
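We apply the weight matrix method to recognize the promoter sequences, which are known as the binding sites of the σ70 subunit of RNA polymerase. The σ70-promoter binding energy E is defined by
$$E = \sum_{k=1}^{12} \sum_{i \in \{A,C,G,T\}} \varepsilon_{\mathrm{base}}(k,i)\, n_{\mathrm{base}}(k,i) + \sum_{l=15}^{19} \varepsilon_{\mathrm{spacer}}(l)\, n_{\mathrm{spacer}}(l) \tag{1}$$
$\varepsilon_{\mathrm{base}}(k,i)$ represents the partial energy between base i and the DNA binding region of σ70 associated with the k-th promoter site. The values of k = 1 to 6 and k = 7 to 12 correspond to the −35 and −10 regions, respectively. We also consider the spacer length contribution $\varepsilon_{\mathrm{spacer}}(l)$ to the σ70-promoter binding energy in order to compare the relative importance of the spacer length variations to the base binding energies. The $\varepsilon_{\mathrm{spacer}}(l)$ are interpreted as the stress energies of the σ70-promoter complex for different spacer lengths. For each promoter site with sequences $s_1 s_2 \ldots s_6$ and $s_7 \ldots s_{12}$ for the −35 and −10 regions, respectively, and spacer length n, we associate $n_{\mathrm{base}}(k,i)$ and $n_{\mathrm{spacer}}(l)$ defined by
$$n_{\mathrm{base}}(k,i) = \delta_{i,s_k}, \qquad n_{\mathrm{spacer}}(l) = \delta_{l,n}$$
$\delta_{i,j}$ is the Kronecker delta symbol. We limit the range of spacer lengths between 15 and 19 bases, as most of the spacers fall within this range. In the following equations, we denote the right-hand side of Equation (1) as a dot product of the 53-dimensional vectors:
$$E = W \cdot N$$
$$W = \big(\varepsilon_{\mathrm{base}}(1,A), \ldots, \varepsilon_{\mathrm{base}}(12,T),\ \varepsilon_{\mathrm{spacer}}(15), \ldots, \varepsilon_{\mathrm{spacer}}(19)\big)$$
$$N = \big(n_{\mathrm{base}}(1,A), \ldots, n_{\mathrm{base}}(12,T),\ n_{\mathrm{spacer}}(15), \ldots, n_{\mathrm{spacer}}(19)\big)$$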
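As an illustration of the 53-dimensional representation, the following minimal Python sketch builds the indicator vector N for one promoter site and evaluates E = W · N. The function names `encode_site` and `binding_energy`, and the ordering of the components (12 positions times the bases A, C, G, T, followed by the five spacer lengths), are assumptions made for this example; the text does not specify a component ordering.

```python
import numpy as np

BASES = "ACGT"            # assumed ordering of bases within each position
MIN_SPACER, MAX_SPACER = 15, 19
N_DIM = 12 * 4 + 5        # 48 base indicators + 5 spacer indicators = 53


def encode_site(minus35, minus10, spacer_len):
    """Return the 53-dimensional indicator vector N for one promoter site."""
    if len(minus35) != 6 or len(minus10) != 6:
        raise ValueError("both hexamers must contain exactly 6 bases")
    if not MIN_SPACER <= spacer_len <= MAX_SPACER:
        raise ValueError("spacer length must lie between 15 and 19")
    n = np.zeros(N_DIM)
    # Positions k = 1..6 are the -35 hexamer, k = 7..12 the -10 hexamer.
    for k, base in enumerate(minus35 + minus10):
        n[4 * k + BASES.index(base)] = 1.0
    # Exactly one spacer indicator is set.
    n[48 + (spacer_len - MIN_SPACER)] = 1.0
    return n


def binding_energy(w, n):
    """Binding energy E = W . N for a weight vector w and feature vector n."""
    return float(np.dot(w, n))
```

Each feature vector has exactly 13 non-zero entries (12 bases and one spacer length), so E is simply the sum of the 13 selected weights.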
We impose the following constraints on the weight vector W in order to fix the base energy for each position k and the overall energy scale of W, which are not related to the sequence specificity of W,
![]() | (2) |
The method employed to obtain the best estimate of W from the given protein-binding sites is described in Heumann et al. (1994). According to their method, the weight vector W = W0 for promoter recognition is, in our normalization convention, given by
![]() | (3) |
![]() |
The solid line in Figure 1 represents the energy distribution in the feature space with respect to W0, which is obtained by generating random feature vectors N and plotting the histogram of energy E = W0 · N. Due to our normalization convention, most of the feature vectors are located around zero energy (Sengupta et al., 2002). As the energy E increases from zero, the number of feature vectors with energy E rapidly decreases. We also generated 10 000 random sequences of 60 bases and found the maximal binding energy available among all possible binding sites for each sequence. The corresponding energy distribution is shown in the broken line with crosses in Figure 1. The energy distribution has moved toward the higher energy direction as compared to the energy distribution in the feature space.
Experimentally identified promoters were obtained from the RegulonDB database (Salgado et al., 2004). Among the 656 promoters associated with the σ70 transcription factor of RNA polymerase, we obtained 114 promoters by excluding those annotated in the EcoCyc database (Karp et al., 2004) as one of multiple promoters or as regulated by any transcription factor. We identified the exact positions of the −35 and −10 regions as the site with the highest energy with respect to W0. The corresponding energy histogram is shown by the dotted line with white boxes in Figure 1. The energies of true promoters are higher than those of random sequences. The mean energy is E = 1.79. Only 3.8% of the random sequences have an energy higher than 1.79.
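Locating the highest-energy site in a sequence can be sketched as an exhaustive scan over start positions and spacer lengths. The code below is a minimal illustration, not the authors' implementation: the function names, the component ordering of the weight vector, and the toy weights returned by `toy_consensus_weights` are all assumptions introduced for the example (the actual W0 is given by Equation (3)).

```python
import numpy as np

BASES = "ACGT"                  # assumed ordering of bases within each position
SPACER_LENGTHS = range(15, 20)  # spacer lengths 15..19


def site_energy(w, minus35, minus10, spacer_len):
    """Sum the weights selected by the two hexamers and the spacer length."""
    energy = 0.0
    for k, base in enumerate(minus35 + minus10):
        energy += w[4 * k + BASES.index(base)]
    return energy + w[48 + (spacer_len - 15)]


def best_site(w, seq):
    """Return (energy, start, spacer_len) of the highest-energy site in seq,
    or None if seq is too short to contain any site."""
    best = None
    for spacer_len in SPACER_LENGTHS:
        site_len = 6 + spacer_len + 6
        for start in range(len(seq) - site_len + 1):
            minus35 = seq[start:start + 6]
            m10_start = start + 6 + spacer_len
            minus10 = seq[m10_start:m10_start + 6]
            energy = site_energy(w, minus35, minus10, spacer_len)
            if best is None or energy > best[0]:
                best = (energy, start, spacer_len)
    return best


def toy_consensus_weights():
    """Hypothetical weights: +1 for each consensus base, +0.5 for a 17 bp spacer."""
    w = np.zeros(53)
    for k, base in enumerate("TTGACA" + "TATAAT"):
        w[4 * k + BASES.index(base)] = 1.0
    w[48 + (17 - 15)] = 0.5
    return w
```

Applying `best_site` to many random 60-base sequences and collecting the returned energies would give a histogram of the kind described for the random-sequence curve in Figure 1.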
We now describe the model for the promoter sequence-strength relations. We formulate it as the linear relation of promoter strength z to the binding energy E = W · N,
![]() | (4) |
The rough ideas that lead to Equation (4) are as follows: it is reasonable to assume that the promoter sequences affect their strengths via the binding energies of σ70-promoter interactions. This binding energy is a part of the activation energy of the transcription reaction. We assume that the other part of the activation energy is irrelevant to the variety of promoter strengths. According to simple chemical kinetics, the logarithm of the chemical reaction rate is linearly dependent on the activation energy. If the primary mRNA degradation processes are insensitive to the types of mRNA sequence, then the abundance of mRNA is proportional to the production rate. Therefore, the logarithm of fluorescent intensities is linearly related to the σ70-promoter binding energies.
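The argument above can be made explicit with a short derivation. This is only a sketch under the stated assumptions, written in Arrhenius form. The symbols $r$ (transcription rate), $A$ (prefactor), $\beta$ (inverse energy scale), $E_0$ (sequence-independent part of the activation energy), $d$ (mRNA degradation rate), $c$ (proportionality constant between mRNA abundance and fluorescent intensity) and $I$ (fluorescent intensity) are introduced here for illustration and do not appear in the original text. We also assume the sign convention that a higher binding energy E lowers the activation barrier.

$$
\begin{aligned}
r &= A \exp\!\big[-\beta\,(E_0 - E)\big], \\
[\mathrm{mRNA}] &= r / d, \\
\log I &= \log\!\big(c\,[\mathrm{mRNA}]\big) = \beta E + \big(\log A - \beta E_0 + \log c - \log d\big).
\end{aligned}
$$

Under these assumptions every term in the parentheses is independent of the promoter sequence, so $\log I$ is linear in $E = W \cdot N$, which is the kind of linear relation expressed by Equation (4).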
Of course, transcription is a notoriously complicated cellular process, including multiple steps of conformational transitions of large proteins. However, complicated models display worse performance than simple models when the data quality is low. The above arguments are made only to derive a model that is sufficiently simple to be applied to the statistical method, yet capable of expressing the promoter sequence-strength relations.
2.2 Microarray data
We obtained the E.coli microarray data from the KEGG database (Kanehisa et al., 2004; Mori et al., 2000) containing results of 48 gene depression experiments. For each experiment, the expression levels of all open reading frames of the E.coli genome are measured.
We treated control and target data of each gene depression experiment separately, as we were concerned with the absolute values of the gene expressions. Therefore, the number of gene expression profiles amounted to 96. For each profile, we subtracted the background intensities from the signal intensities. We shifted these values so that their median vanished. Next, we took the logarithm of the data, neglecting the negative-valued data. The histograms of the resulting data acquired similar bell-shaped forms, although their center positions varied irregularly among profiles. We standardized each profile in order to set the mean value and standard deviation to zero and unity, respectively. After these normalizations on each profile, we collected the values associated with each gene from the 96 profiles. We calculated their median as the representative value of intensity.
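The normalization steps described above can be sketched in a few lines of NumPy. This is an illustrative reading of the procedure rather than the authors' code: the function names are hypothetical, missing or discarded values are represented as NaN, and the natural logarithm is assumed (the choice of base does not matter after standardization, because it only rescales the values).

```python
import numpy as np


def normalize_profile(signal, background):
    """Normalize one expression profile.

    Steps: subtract background, shift so the median is zero, take the
    logarithm of the positive values only, then standardize to zero mean
    and unit standard deviation. Discarded entries are returned as NaN.
    """
    x = np.asarray(signal, dtype=float) - np.asarray(background, dtype=float)
    x = x - np.median(x)
    logged = np.full(x.shape, np.nan)
    positive = x > 0
    logged[positive] = np.log(x[positive])
    mean = np.nanmean(logged)
    std = np.nanstd(logged)
    return (logged - mean) / std


def representative_intensity(profiles):
    """Median over profiles (rows) for each gene (column), ignoring NaN."""
    return np.nanmedian(np.asarray(profiles, dtype=float), axis=0)
```

In this sketch `representative_intensity` would be applied to the stacked output of `normalize_profile` for all 96 profiles, giving one value per gene.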
Figure 2 shows the scatter plot of median and absolute deviation of the normalized fluorescent intensities. The figure shows that genes with higher intensities tend to be expressed stably. The absolute deviation of the entire gene intensities is 0.87; therefore, variations of most genes with positive intensities are smaller than the width of the gene intensities.
It may be suspected that these normalized fluorescent intensities do not faithfully represent mRNA concentrations in the cell, due to the sequence dependence of mRNA-cDNA hybridization efficiencies. To assess this issue, one of the authors (T. O.) examined a microarray experiment in which mRNA transcripts are competitively hybridized with DNA fragments obtained by cutting the E.coli genome using restriction enzymes. Since the concentrations of these genomic DNA fragments are constant across all genes in the genome, the relative intensities of mRNA transcripts to those of DNA fragments should faithfully represent the gene expression strengths.
Figure 3 shows the scatter plot of the relative intensities of mRNA to those of the genomic DNA fragments, and the normalized absolute intensities described previously. The figure shows a good correlation between the data with the correlation coefficient 0.7. This indicates that the absolute values of microarray expression have strong correlation with the mRNA concentrations in the cell. Unfortunately, the method of competitive hybridization with genomic DNA fragments has certain limitations; the estimates of the strengths of the weakly expressed genes are inaccurate, because of the hybridization of reverse cDNA strands with mRNA transcripts. The figure shows only the 697 strongly expressed genes that have non-vanishing relative intensities. Hereafter, we identify the normalized intensities with the absolute values of gene expression strengths.
We now relate these expression strengths to the promoter strengths. For each promoter sequence, the median of expression strengths of genes transcribed only by the promoter is considered as the promoter strength. The white boxes in Figure 2 show the 114 datasets used as promoter strengths. We used these data to obtain the optimal weight vector W.
There are a number of factors that may alter the mRNA levels from the promoter strengths. Apart from unidentified transcription factors and overlapping transcription units, the mRNA attenuation, the sequence-specific mRNA instability and the operon length may modify the mRNA concentrations. Unfortunately, the annotations on these factors are still very limited in the database and we simply assume that the majority of mRNA levels in our dataset are not affected by these factors.
2.3 Support vector regression
In this section, we describe our regression method to train the weight vector W, which represents the promoter sequence-strength correlations. We use a novel kind of support vector regression, which yields a better correlation between the promoter strengths and W · N than more popular support vector regression (SVR) methods (Schölkopf and Smola, 2002). It is noted that similar regression problems were considered to search for novel sequence motifs in eukaryotic genomes in Bussemaker et al. (2001) and Conlon et al. (2003). It is also noted that the relatively large number of optimization parameters prevents us from using the simplest least-squares regression, which has no mechanism to avoid overfitting to the training data.
We express W as a sum of W0 and the residual W1. It is W1 that is actually optimized by our support vector machine algorithm, which is defined by
![]() | (5) |
![]() |
![]() |
![]() |
zc, zi
zc, respectively. z1i is defined by
![]() |
is a positive real parameter that determines the relative scale of the promoter strength to W0 and b0 is defined by the median of the set {(z W0 · N)i}. The reason for the separation of W into W0 and W1 is as follows: since our regression method uses only the promoter sequences existing in the E.coli genome, it occurs that certain promoter positions do not have sufficient base variations to determine the corresponding components of the weight vector W. In such a case, ordinary support vector methods that do not include W0 tend to eliminate these components and to emphasize the components of less-conserved positions. The addition of W0 reduces the risk of neglecting the most-conserved positions such that the weight vector components that are not well-determined are set to the corresponding components of W0. Since W0 is essentially the logarithm of base frequencies, this can be considered as the inclusion of prior knowledge obtained in the 1980s that the more conserved bases act positively on the promoter strengths.
We next describe our regression method, which is based on the support vector classification (SVC) algorithm (Schölkopf and Smola, 2002). In ordinary support vector classification problems, one divides the training data into two classes and seeks the optimal linear function f(N) = W · N + b such that f(Ni) ≥ 1 for any Ni in one class and f(Ni) ≤ −1 for the other. Similarly, we divide the promoters into strong and weak promoters according to the threshold value zc, and we seek the linear function that satisfies f(Ni) ≥ zi for strong promoters and f(Ni) ≤ zi for weak ones. This problem can be solved with the same algorithms as used to solve ordinary SVC problems.
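The idea of one-sided constraints around a threshold can be illustrated with a small soft-margin sketch. This is not Equation (5) and not the authors' solver: it omits the split of W into W0 and W1, replaces the quadratic programming routine with plain subgradient descent, and assumes the constraint directions f(Ni) ≥ zi for strong promoters and f(Ni) ≤ zi for weak ones. The function names and the parameters `lr` and `n_iter` are hypothetical.

```python
import numpy as np


def one_sided_objective(w, b, X, z, y, C):
    """0.5*|w|^2 + C * sum of one-sided slacks max(0, y_i * (z_i - f(N_i)))."""
    slack = np.maximum(0.0, y * (z - (X @ w + b)))
    return 0.5 * float(w @ w) + C * float(slack.sum())


def fit_one_sided(X, z, z_c, C=1.0, lr=0.01, n_iter=2000):
    """Fit f(N) = w.N + b so that f >= z for strong promoters (z >= z_c)
    and f <= z for weak promoters (z < z_c), penalizing violations linearly."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    y = np.where(z >= z_c, 1.0, -1.0)   # +1 strong, -1 weak
    w = np.zeros(X.shape[1])
    b = 0.0
    best = (one_sided_objective(w, b, X, z, y, C), w.copy(), b)
    for t in range(1, n_iter + 1):
        violated = y * (z - (X @ w + b)) > 0
        # Subgradient of the objective with respect to w and b.
        grad_w = w - C * (y[violated, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        step = lr / np.sqrt(t)           # diminishing step size
        w = w - step * grad_w
        b = b - step * grad_b
        obj = one_sided_objective(w, b, X, z, y, C)
        if obj < best[0]:                # keep the best iterate seen so far
            best = (obj, w.copy(), b)
    return best[1], best[2]
```

A quadratic programming solver, as used in the text, would give a more accurate optimum; subgradient descent is used here only to keep the sketch self-contained.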
We solved Equation (5) using the pr_loqo routine (http://www.kernel-machines.org/), which implements the dual-primal interior point method of quadratic programming and provides extremely accurate optimization results. For each value of the scale parameter, we performed a 10-fold cross-validation and determined the parameter C. The value of the scale parameter was then chosen such that the result yielded the best correlation between the strength z and the energy W · N. Our regression method showed a higher correlation (correlation coefficient 0.63) than the standard SVR method (Schölkopf and Smola, 2002) (correlation coefficient 0.58).
Because of the rather large number of optimization parameters relative to the available dataset, we were unable to set aside a large test set separate from the training dataset. However, for the optimal scale parameter and C, the standard deviation of the trained weight vectors W across the validation folds was 10 times smaller than the scale of the components of W. We are therefore confident that W is not overfitted to any particular training set.
| 3 RESULTS |
|---|
Figure 4 shows the weight vector W0 (top), which is defined in Equation (3), and is essentially the logarithm of base and spacer length frequencies. This figure shows that the conventional consensus sequence (TTGACA, TATAAT, 17 bp) is the collection of the most likely bases at each position and spacer length. Figure 4 also shows the weight vector W (bottom), which is obtained using our support vector regression method, and represents the relations between the promoter sequences and their strengths.
At most positions, the bases cytosine and guanine have inhibitory effects on the promoter activity, except for the positive guanine contributions at the positions k = 3 and 4. These positions are also the only positions at which thymine acts negatively on the strength. It may be noted that the non-consensus bases A at k = 1, 2, 9 and T at k = 6 have positive contributions to the strength comparable to most of the consensus bases, although the consensus promoter is among the strongest promoters. It may also be noted that no significant contributions from the consensus bases C at k = 5 and A at k = 6 are found. The figure also shows large differences in effect among the kinds of bases that are inhibitory to the promoter strength. For example, cytosine at the position k = 9 has a much more adverse effect than guanine at the same position, despite the similar observed frequencies of these bases. The mean contributions of the −35 and −10 regions and the spacer length to the binding energy with respect to the obtained W are given by
![]() | (6) |
![]() | (7) |
Although the −10 region and the spacer length are more important for discriminating σ70-DNA binding sites from random sequences, the base sequences in the −35 region affect the variety of genomic promoter strengths in magnitude comparable to the −10 region. Figure 5 shows the scatter plot of the promoter strength z against the binding energies W0 · N (top) and W · N (bottom). One can observe a better correlation in the z versus W · N plot than in the z versus W0 · N plot. The corresponding correlation coefficients are 0.63 and 0.40, respectively. We also numbered three outliers in the z versus W · N plot, and listed the corresponding promoter-gene pairs and promoter sequences in Table 1.
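The comparison of the two scatter plots rests on the correlation coefficient between z and the predicted energy, and the outliers are points far from the fitted trend. The following sketch shows one way to compute both. It assumes the Pearson correlation coefficient and flags outliers by a residual threshold of `n_sd` standard deviations around a least-squares line; the text does not state which correlation coefficient or outlier criterion was used, so both choices, and the function names, are assumptions.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def flag_outliers(z, energy, n_sd=2.0):
    """Indices whose residual from the least-squares line z ~ a*energy + b
    exceeds n_sd residual standard deviations (assumed criterion)."""
    z = np.asarray(z, dtype=float)
    energy = np.asarray(energy, dtype=float)
    a, b = np.polyfit(energy, z, 1)
    residual = z - (a * energy + b)
    return np.flatnonzero(np.abs(residual) > n_sd * residual.std())
```

With real data, `pearson_r(z, N_matrix @ W)` and `pearson_r(z, N_matrix @ W0)` would give the two coefficients being compared, where `N_matrix` is a hypothetical array holding one feature vector per row.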
The metYp2 promoter (numbered 1 in Fig. 5) is associated with the transcription unit including the gene rbfA. It is known (Nakamura and Mizusawa, 1985) that there is a relatively efficient ρ-independent terminator upstream of rbfA, which is consistent with the low expression of rbfA despite the almost complete coincidence of metYp2 with the consensus promoter. For the promoter-gene pair (fimBp1, fimB) (numbered 2), it is known (Schwan et al., 1994) that there is an uncharacterized protein-binding site upstream of fimBp1, which may explain the low expression level of fimB. For the pair (hscB, hscA) (numbered 3), another transcription unit including hscA is known (Seaton and Vickery, 1994). Although the corresponding σ factor is not annotated in the EcoCyc database, its activity may have a dominant effect on the hscA expression, making it higher than the predicted strength of the hscB promoter.
We now investigate the effects of multiple promoters and transcription factors on gene expression, which are among the various factors that can shift the gene expression strengths away from the predicted promoter strengths. In Figure 6 we plotted the strengths of the genes transcribed by multiple transcription units and the genes under transcriptional regulation, together with the data from the bottom of Figure 5. The crosses in Figure 6 show the expression strengths of multiply transcribed genes paired with the maximal predicted strength W · N among their promoters. We plotted only the stably expressed genes (sample size over 80 among the 96 microarray profiles and absolute deviation under 1.0) with no known regulatory sites. One can observe that the expressions of genes pepD, nlpD and ompA (numbered 4, 5 and 6, respectively) are clearly enhanced by the multiple promoter effect.
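The stability filter used for Figure 6 (more than 80 valid samples out of 96 and an absolute deviation under 1.0) can be sketched as a boolean mask over genes. The function name is hypothetical, missing values are assumed to be stored as NaN, and "absolute deviation" is interpreted here as the mean absolute deviation from the per-gene median, which is an assumption since the text does not define it.

```python
import numpy as np


def stable_gene_mask(profiles, min_samples=80, max_abs_dev=1.0):
    """Select stably expressed genes.

    profiles: array of shape (n_profiles, n_genes) holding normalized
    intensities, with NaN marking missing values.
    Returns a boolean array with True for genes that have more than
    min_samples valid values and an absolute deviation below max_abs_dev.
    """
    profiles = np.asarray(profiles, dtype=float)
    n_valid = np.sum(~np.isnan(profiles), axis=0)
    median = np.nanmedian(profiles, axis=0)
    # Assumed definition: mean absolute deviation from the per-gene median.
    abs_dev = np.nanmean(np.abs(profiles - median), axis=0)
    return (n_valid > min_samples) & (abs_dev < max_abs_dev)
```

The thresholds are exposed as parameters so that the defaults match the values quoted in the text while smaller test inputs remain usable.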
The white and black boxes in Figure 6 show the regulated genes with a single promoter for which all the regulatory sites consist of activators and inhibitors, respectively. As in the case of multiple promoters, we plotted only the genes with stable expressions. We numbered several outliers for which the expression strengths are significantly different from the promoter strengths.
The cpdB promoter is known to be regulated by the cyclic AMP-cyclic AMP receptor protein complex (cAMP-CRP) (Liu and Beacham, 1990). Although only positive regulation by cAMP-CRP is described in Liu and Beacham (1990), the low expression level of cpdB may indicate an inhibitory effect of cAMP-CRP on this site, since cAMP-CRP is known as a dual regulator (Kolb et al., 1999). There are constitutive activators MarA, Rob and SoxS for the promoter inaAp (Martin et al., 1999). Currently, no facts are available to explain the low basal expression level of inaA. For the promoter yihEp, there exists an activator CpxR (Danese and Silhavy, 1997; Pogliano et al., 1997). The expression level of rdoA, which is higher than the predicted strength, is consistent with the existence of the CpxR binding site. Only activators are known for the promoter sodBp, which contradicts the high expression level of sodB relative to the predicted strength. However, Dubrac and Touati (2000) describe that the mRNA transcripts of sodB undergo post-transcriptional regulation by the regulator protein Fur, which enhances the expression of sodB 7-fold by protecting sodB mRNA from degradation. The same reference also describes that the effect of the activators is much smaller than that of Fur. Our result is consistent with these facts.
As can be seen from the figure, even when only activators (inhibitors) are annotated to the promoters, their expressions do not show higher (lower) mRNA levels than the predicted promoter strengths. This may occur if the activities of the annotated regulons are so weak that their effects are buried in the noise of microarray data, or if there are still unidentified factors which have the activities opposite to the annotated ones. These discrepancies from the expectations might indicate that experiments frequently fail to find out all the components participating in the complicated regulatory activities.
| 4 CONCLUSION |
|---|
In this paper, we analyzed the relations of promoter sequences to their strengths. We presented a method to extract those relations from microarray data, using a novel kind of regression method. This analysis has been possible due to the availability of abundant microarray data.
It was observed that several non-consensus bases act positively on the promoter strength and that certain consensus bases have a minor effect on the strength. It was also found that certain bases with similar observed frequencies have large differences in the strength of inhibitory activity.
We calculated the individual contributions of the −35 region, the −10 region and the spacer length to the promoter strength, and showed that the base sequences in the −35 region affect the variety of genomic promoter strengths in magnitude comparable to the −10 region, although the −10 region and the spacer length are more important for discriminating σ70-DNA binding sites from random sequences.
Our model describes only the simplest promoters whose associated mRNA levels are not modified from the basal promoter strengths. However, once we have optimized the weight vector W using the non-regulated promoters, we can use it to detect promoters under strong regulation by analyzing outliers in the z versus W · N plot (Fig. 6), for which the observed mRNA levels are significantly different from the predicted promoter strengths. We identified several genes with expressions enhanced by multiple promoters, and genes under strong regulation by transcription factors. This analysis of outliers will be a promising approach for the discovery of genes with strongly modified expressions.
Our method uses only the promoter sequences existing in the genome. Since these promoters have highly biased base frequencies, certain promoter positions do not have sufficient base variations to determine certain components of the weight vector W accurately. Although we reduced the influence of this problem by the normalization convention (Equation 2) to reduce the number of parameters and by the introduction of the prior weight vector W0 (Equation 3), it will be useful to perform microarray experiments for the E.coli strains mutated to have several promoters rarely observed in wild-type strains.
Our method is applicable to other organisms if a collection of promoter sequences and microarray data are available. The present study also implies the rich information contained in the absolute fluorescent intensities of microarray experiments.
| Acknowledgments |
|---|
We are grateful to Prof. Ogasawara for providing helpful comments on the absolute values of fluorescent intensities.
Received on August 5, 2004; revised on October 10, 2004; accepted on October 10, 2004
| REFERENCES |
|---|
Ayers, D.G., Auble, D.T., deHaseth, P.L. (1989) Promoter recognition by Escherichia coli RNA polymerase. Role of the spacer DNA in functional complex formation. J. Mol. Biol., 207, 749-756.
Burr, T., Mitchell, J., Kolb, A., Minchin, S., Busby, S. (2000) DNA sequence elements located immediately upstream of the −10 hexamer in Escherichia coli promoters: a systematic study. Nucleic Acids Res., 28, 1864-1870.
Bussemaker, H.J., Li, H., Siggia, E.D. (2001) Regulatory element detection using correlation with expression. Nat. Genet., 27, 167-171.
Conlon, E.M., Liu, X.S., Lieb, J.D., Liu, J.S. (2003) Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl Acad. Sci. USA, 18, 3339-3344.
Danese, P.N. and Silhavy, T.J. (1997) The sigma(E) and the Cpx signal transduction systems control the synthesis of periplasmic protein-folding enzymes in Escherichia coli. Genes Dev., 11, 1183-1193.
Dubrac, S. and Touati, D. (2000) Fur positive regulation of iron superoxide dismutase in Escherichia coli: functional analysis of the sodB promoter. J. Bacteriol., 182, 3802-3808.
Gardella, T., Moyle, H., Susskind, M.M. (1989) A mutant Escherichia coli sigma 70 subunit of RNA polymerase with altered promoter specificity. J. Mol. Biol., 206, 579-590.
Hawley, D.K. and McClure, W.R. (1983) Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Res., 11, 2237-2255.
Harley, C.B. and Reynolds, R.P. (1987) Analysis of E.coli promoter sequences. Nucleic Acids Res., 15, 2343-2361.
Heumann, J.M., Lapedes, A.S., Stormo, G.D. (1994) Neural networks for determining protein specificity and multiple alignment of binding sites. Proc. Int. Conf. Intell. Syst. Mol. Biol., 2, 188-194.
Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M. (2004) The KEGG resources for deciphering the genome. Nucleic Acids Res., 32, D277-D280.
Karp, P.D., Arnaud, M., Collado-Vides, J., Ingraham, J., Paulsen, I.T., Saier, M.H., Jr. (2004) The E.coli EcoCyc database: no longer just a metabolic pathway database. ASM News, 70, 25-30.
Kobayashi, M., Nagata, K., Ishihama, A. (1990) Promoter selectivity of Escherichia coli RNA polymerase: effect of base substitutions in the promoter −35 region on promoter strength. Nucleic Acids Res., 18, 7367-7372.
Kolb, A., Busby, S., Buc, H., Garges, S., Adhya, S. (1999) Transcriptional regulation by cAMP and its receptor protein. Annu. Rev. Biochem., 62, 749-795.
Kumar, A., Malloch, R.A., Fujita, N., Smillie, D.A., Ishihama, A., Hayward, R.S. (1993) The minus 35-recognition region of Escherichia coli sigma 70 is inessential for initiation of transcription at an "extended minus 10" promoter. J. Mol. Biol., 232, 406-418.
Lisser, S. and Margalit, H. (2000) Compilation of E.coli mRNA promoter sequences. Nucleic Acids Res., 21, 1507-1516.
Liu, J. and Beacham, I.R. (1990) Transcription and regulation of the cpdB gene in Escherichia coli K12 and Salmonella typhimurium LT2: evidence for modulation of constitutive promoters by cyclic AMP-CRP complex. Mol. Gen. Genet., 222, 161-165.
Martin, R.G., Gillette, W.K., Rhee, S., Rosner, J.L. (1999) Structural requirements for marbox function in transcriptional activation of mar/sox/rob regulon promoters in Escherichia coli: sequence, orientation and spatial relationship to the core promoter. Mol. Microbiol., 34, 431-441.
Mori, H., Isono, K., Horiuchi, T., Miki, T. (2000) Functional genomics of Escherichia coli in Japan. Res. Microbiol., 151, 121-128.
Mulligan, M.E. and McClure, W.R. (1986) Analysis of the occurrence of promoter-sites in DNA. Nucleic Acids Res., 14, 109-126.
Mulligan, M.E., Hawley, D.K., Entriken, R., McClure, W.R. (1984) Escherichia coli promoter sequences predict in vitro RNA polymerase selectivity. Nucleic Acids Res., 12, 789-800.
Mulligan, M.E., Brosius, J., McClure, W.R. (1985) Characterization in vitro of the effect of spacer length on the activity of Escherichia coli RNA polymerase at the TAC promoter. J. Biol. Chem., 260, 3529-3538.
Nakamura, Y. and Mizusawa, S. (1985) In vivo evidence that the nusA and infB genes of E.coli are part of the same multi-gene operon which encodes at least four proteins. EMBO J., 4, 527-532.
O'Neill, M.C. (1989) Consensus methods for finding and ranking DNA binding sites. Application to Escherichia coli promoters. J. Mol. Biol., 207, 301-310.
Pogliano, J., Lynch, A.S., Belin, D., Lin, E.C., Beckwith, J. (1997) Regulation of Escherichia coli cell envelope proteins involved in protein folding and degradation by the Cpx two-component system. Genes Dev., 11, 1169-1182.
Salgado, H., Gama-Castro, S., Martinez-Antonio, A., Diaz-Peredo, E., Sanchez-Solano, F., Peralta-Gil, M., Garcia-Alonso, D., Jimenez-Jacinto, V., Santos-Zavaleta, A., Bonavides-Martinez, C., Collado-Vides, J. (2004) RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucleic Acids Res., 32, 303-306.
Schölkopf, B. and Smola, A.J. (2002) Learning with Kernels. MIT Press, Cambridge, MA.
Schwan, W.R., Seifert, H.S., Duncan, J.L. (1994) Analysis of the fimB promoter region involved in type 1 pilus phase variation in Escherichia coli. Mol. Gen. Genet., 242, 623-630.
Seaton, B.L. and Vickery, L.E. (1994) A gene encoding a DnaK/hsp70 homolog in Escherichia coli. Proc. Natl Acad. Sci. USA, 91, 2066-2070.
Sengupta, A.M., Djordjevic, M., Shraiman, B.I. (2002) Specificity and robustness in transcription control networks. Proc. Natl Acad. Sci. USA, 99, 2072-2077.
Siebenlist, U., Simpson, R.B., Gilbert, W. (1980) E.coli RNA polymerase interacts homologously with two different promoters. Cell, 20, 269-281.
Stefano, J.E. and Gralla, J.D. (1982) Mutation-induced changes in RNA polymerase-lac ps promoter interactions. J. Biol. Chem., 257, 13924-13929.
Stormo, G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16-23.
Straney, R., Krah, R., Menzel, R. (1994) Mutations in the −10 TATAAT sequence of the gyrA promoter affect both promoter strength and sensitivity to DNA supercoiling. J. Bacteriol., 176, 5999-6006.
Strohl, W.R. (1992) Compilation and analysis of DNA sequences associated with apparent streptomycete promoters. Nucleic Acids Res., 20, 961-974.
Szoke, P.A., Allen, T.L., deHaseth, P.L. (1987) Promoter recognition by Escherichia coli RNA polymerase: effects of base substitutions in the −10 and −35 regions. Biochemistry, 26, 6188-6194.
Youderian, P., Bouvier, S., Susskind, M.M. (1982) Sequence determinants of promoter activity. Cell, 30, 843-853.