Bioinformatics Advance Access originally published online on November 21, 2006
Bioinformatics 2007 23(2):191-197; doi:10.1093/bioinformatics/btl591
A framework for gene expression analysis
Australian Centre for Plant Functional Genomics, Hartley Grove, PMB 1 Waite Campus, The University of Adelaide Glen Osmond 5064, Australia
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Global gene expression measurements as obtained, for example, in microarray experiments can provide important clues to the underlying transcriptional control mechanisms and network structure of a biological cell. In the absence of a detailed understanding of this gene regulation, current attempts at classification of expression data rely on clustering and pattern recognition techniques employing ad-hoc similarity criteria. To improve this situation, a better understanding of the expected relationships between expression profiles of genes associated by biological function is required.
Results: It is shown that perturbation expansions familiar from biological systems theory make precise predictions for the types of relationships to be expected for expression profiles of biologically associated genes, even if the underlying biological factors responsible for this association are not known. Classification criteria are derived, most of which are not usually employed in clustering algorithms. The approach is illustrated by using the AtGenExpress Arabidopsis thaliana developmental expression map.
Contact: andreas.schreiber{at}adelaide.edu.au
Supplementary information: Supplementary material is available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
Microarray experiments have produced a large body of expression data over recent years, data which has been utilized in applications ranging from medical diagnosis (Campbell and Ghazal, 2004) to studies on the evolution of transcriptomes (Khaitovich et al., 2004). By using these data, which provide global snapshots of the transcriptional activity of cells or tissues, systems biologists hope to elucidate interactions between individual genes and delineate whole gene networks (Monk, 2003).
An armada of techniques from the statistical, pattern recognition and machine learning fields has been assembled to aid in the task of classifying genes from expression data, and numerous software packages implementing these tools are now freely available. Methods include, for example, various clustering techniques (Jain and Dubes, 1988; Eisen et al., 1998; Tavazoie et al., 1999), self-organizing maps (Tamayo et al., 1999) and pattern searches using support vector machines (Brown et al., 2000). Nevertheless, a completely satisfactory method of analysis has so far been elusive, as exemplified by the fact that the final classification usually depends on the method chosen for the task. Faced with conflicting results, it is difficult to decide which classification most appropriately reflects the underlying biology. Indeed, there are even doubts as to whether this situation can be improved (Quackenbush, 2001), hardly a satisfactory state of affairs.
These ambiguities are closely related to the choice of criterion to use for gene classification. In so-called unsupervised techniques, such as hierarchical clustering, one typically decides on a distance measure or perhaps a pairwise correlation function as a yardstick. These criteria are certainly reasonable: for example, if high positive pairwise correlation of a group of genes is maintained under an increasingly diverse number of conditions (e.g. in different tissues, under a variety of environmental stresses and conditions, etc.), one's confidence that these genes indeed have some functional relation will increase. On the other hand, a transcriptional inhibitor to the above set of genes would not be identified by this criterion because this would tend to have expression that is anti-correlated with their transcription profiles (of course, for this simple example a generalization of the correlation criterion to one selecting both strong positive and negative correlations would be straightforward). Similarly, the standard Euclidean distance between profiles would not associate the set of genes with an inhibitor because the relative distance could be arbitrarily large, with many unrelated genes positioned much closer. With increasing and more realistic complexity of the transcriptional control (e.g. several inhibitors acting independently, etc.) the choice of appropriate criteria becomes more and more complicated and so it is common practice not to stray too far from the simple choices made above. As will be seen below, in general both of the criteria discussed will fail as soon as the transcription is regulated by more than one (independent) transcription factor.
Supervised techniques, such as those based on support vector machines, are a popular alternative to clustering as described above. Here one effectively defines the criterion so that it yields a correct classification of a previously identified cluster, hoping that the same criterion will be valid more generally. This sort of approach relies on the appropriateness of the training data in the same way that unsupervised clustering relies on the correct choice of criterion and therefore equivalent problems arise. These methods too would benefit from a better understanding of what to look for. Indeed, the patterns derived in the next Section may be used as guidance for these supervised techniques.
The ad hoc nature of the choice of criterion can be remedied through the development of a detailed quantitative understanding of gene regulation, which would enable one to make predictions for the types of patterns and correlations that should be exhibited in transcriptomic data. The development of mathematical models of gene regulatory circuits and systems is a difficult endeavour with a long history, with much of the initial work concentrating on enzyme-kinetic descriptions of the regulation and interaction of a very small number of genes in simple model systems such as the λ-phage (Ptashne, 1992). For reviews of these early efforts see, for example, Bower and Bolouri (2001) and Hasty et al. (2001). Analogous studies of the much more complicated regulation of genes in higher organisms form an important part of the field of computational biology and have become a very active area of research. Current areas of interest include studies of the connectivity and topology of regulatory networks (Schlitt and Brazma, 2005), their description in terms of convenient modular building blocks (Bolouri and Davidson, 2002), as well as efforts at reverse engineering of these networks (Kaern et al., 2003; Chua et al., 2004; Guido et al., 2006). As more data on diverse components (transcriptomic, metabolomic, proteomic) of the intra-cellular networks become available these approaches will become increasingly important, particularly in comprehensive studies of various model organisms. For practical purposes, however, the general application of this approach to the study of gene expression is stymied by the lack of knowledge of virtually all relevant quantities such as enzymatic rate-constants or abundances of transcription and other regulatory factors.
Here we show that, while detailed models are inarguably desirable, our present knowledge of gene regulation already makes it possible to derive criteria that may be used in guiding the pattern searching techniques mentioned above. Fundamental to this approach is that, by and large, microarray experiments provide comparisons between control and perturbed systems. Typical examples include the comparison of gene expression in cancerous and normal cells, studies of the similarities and differences of the transcriptomes of different species and inquiries into the differential expression of genes in a variety of tissues of the same organism. In this context, the use of the terms perturbed and perturbation is not meant to be taken in a dynamical sense (although it could be, if one is studying the expression profiles in, e.g. a time-series after imposing an external stress on an organism). We merely mean that these perturbed biological systems have undergone changes when compared to the control state. The point is that it is often only the changes in the systems that are of immediate interest: this can be utilized because analyses of changes in complex non-linear systems are generally more tractable than an analysis of the complex system itself (Khalil, 1996). In the present case we shall show that, subject to some very general assumptions about the underlying transcriptional control mechanisms, quantitative predictions can be made for relations between gene expression profiles if the perturbations are sufficiently small (this is made more precise in the following). These relations can be used to guide the search for suitable clustering criteria.
Indeed, the analysis of perturbations has already been exploited extensively within the context of enzymatic processes, giving rise to biological systems theory and the concept of so-called S-systems, developed particularly by Savageau, Voit and co-workers (Savageau, 1969a, b, 1998; Savageau et al., 1987; Voit, 2000). As transcriptional regulation is at least partially enzymatic (see next section), the S-system approach has also found application to the analysis of gene expression data (Voit and Radivoyevitch, 2000). The key to this approach is the observation that enzymatic reaction rates may frequently be approximated very well, and over a wide range of enzyme concentrations, by power-law behaviour. This suggests that Taylor expansions in logarithms of concentrations, truncated at the linear term, may serve as a useful tool. We argue here that this approach leads to predictions that do not require knowledge of the above-mentioned input parameters and gives rise to characteristic patterns in microarray data, irrespective of the details of the underlying biology. These patterns may serve as a definition for the unknown criteria required by clustering algorithms and may give guidance as to which criterion is appropriate for a particular set of genes. Furthermore, it is likely that analogous techniques may also be useful in analysing other large-scale data sets such as those resulting from Q-PCR, proteomic or metabolomic experiments.
| 2 THE PERTURBATION EXPANSION |
|---|
We shall assume that the logarithm Xg of the abundance of mRNA resulting from the transcriptional activity of a particular gene (and, if appropriate, a particular splice-form) g may be expressed as a function of the logarithms of concentrations Zi (1 ≤ i ≤ k) of transcription factors, enhancers, inhibitors, enzymes involved in splicing control, iRNA concentrations, ribonucleases, etc. through

$$X_g = f_g(Z_1, \ldots, Z_k) \tag{1}$$

The change in expression may therefore be written as

$$\Delta X_g = f_g(Z + \Delta Z) - f_g(Z) \tag{2}$$

where Z = (Z1, …, Zk) denotes the values of the factors at the control conditions and ΔZ their change in the perturbed system. If ΔZ is sufficiently small, one may cut off a Taylor expansion of Equation (2) at the linear term, i.e.

$$\Delta X_g \approx \sum_{i=1}^{k} \frac{\partial f_g}{\partial Z_i}\,\Delta Z_i \tag{3}$$
This approximation fails if fg is non-analytic at the control conditions or if ΔZ is larger than the convergence radius of the expansion in Equation (3). Genes that fall into these two classes fall outside the scope of the analysis carried out here. It is likely, for example, that differential regulation via methylation, at least within an individual cell, should be grouped into this category because it is essentially discrete: i.e. in this case either the response behaves like a step function θ of the controlling factor, and hence fg is non-analytic at the control conditions, or, probably more correctly, ΔZ is so large that ΔXg effectively looks like a step function (i.e. ΔZ is either larger than the convergence radius of the expansion or, at least, so large that a small number of terms in the expansion in Equation (3) do not suffice). Notwithstanding these continuity and analyticity restrictions, it is possible that even if they are not met at the level of an individual cell they are effectively met, in an average sense, for a large number of cells. Alternatively, if they are not met for a particular control state, one may wish to choose another.
On the other hand, no assumption is made about whether or not ΔZ itself is discrete. This is a situation which would occur, for example, if the copy number of a particular transcription factor is very low; in this case ΔZ would only change (stochastically) in finite steps. Also, there is no assumption being made that in any particular system ΔZ can actually be reduced arbitrarily close to zero. For this reason it is immaterial whether or not the control system is a biologically realizable system, as long as fg is smooth around the control state. Indeed, below we shall use the average expression across tissues as a control, which is in itself not a real biological state.
Whether or not ΔZ is small enough so that the contribution from the higher order terms in the expansion can be neglected is an issue that can only be determined experimentally. As argued in the introduction, observed power-law behaviour in enzymatic reaction rates provides some evidence that it may often not be a bad approximation, as long as one measures X and Z on a logarithmic scale. Furthermore, as discussed below, there should be telltale signs when higher order terms in Equation (3) become significant. In short, we shall assume in the following that Equation (3) provides a numerically reasonable approximation and shall examine its consequences.
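The role of the logarithmic scale can be made concrete with a short worked derivation. It is only an illustration under an assumed idealization: a pure power-law dependence of the mRNA abundance x_g on the factor concentrations z_i, with a constant α and exponents g_i that are hypothetical symbols introduced here, not quantities defined in the text.

```latex
\begin{align}
  x_g &= \alpha \prod_{i=1}^{k} z_i^{\,g_i}
      && \text{(assumed power-law form)} \\
  X_g \equiv \log x_g &= \log\alpha + \sum_{i=1}^{k} g_i\, Z_i ,
      && Z_i \equiv \log z_i \\
  \Delta X_g &= \sum_{i=1}^{k} g_i\, \Delta Z_i ,
      && g_i = \frac{\partial f_g}{\partial Z_i}
\end{align}
```

In this idealized case the linear truncation is exact for any size of ΔZi, because all higher derivatives of fg with respect to the logarithms vanish. Deviations from power-law behaviour are what generate the higher order terms.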
The utility of Equation (3) becomes apparent when considering its implication for microarray data collected for more than one (say, M) perturbed systems. This is the case, for example, for a developmental time series, for a comparison of a set of tissues, or for a series of experiments where the organism in question has been exposed to an assortment of environmental stresses. The key point is that the partial derivatives, while unknown, are evaluated at the control conditions and, therefore, are independent of the perturbation. They merely characterize the ability of the gene to respond to a change in the Zi's. Information about the perturbation enters only through the ΔZi's. Hence, in a scatterplot of gene expression levels under a variety of conditions, the vector describing the expression level of gene g (relative to the control) is given by

$$\Delta\mathbf{X}_g = \sum_{i=1}^{k} \frac{\partial f_g}{\partial Z_i}\,\Delta\mathbf{Z}_i \tag{4}$$

where the j-th component of ΔXg corresponds to the change in Xg, relative to the control, in the perturbed system j (1 ≤ j ≤ M) and the j-th component of ΔZi corresponds to the analogous change in the i-th factor Zi (1 ≤ i ≤ k).
Equation (4) defines suitable criteria with which to decompose a microarray dataset. Consider the example of a set of genes that are expressed differentially in a number of perturbed systems because they are being controlled by, say, a single differentially expressed transcription factor. This would correspond to k = 1 in Equation (4). In this case, the direction of the relative expression vector ΔXg is determined solely by the change in the concentrations of the controlling factor and is independent of the gene g. The magnitude of the expression vector, on the other hand, is proportional to each individual gene's capacity to respond to these changes in concentration (i.e. ∂fg/∂Z1). For this reason, in a scatterplot these genes would be part of a feature consisting of a line passing through the origin and pointing in the direction of ΔZ1. Because the pairwise correlation between two genes is related to the cosine of the angle between their expression vectors, the present analysis therefore provides an understanding of why pairwise correlations are empirically found to be a preferred clustering criterion when used, say, with hierarchical clustering (Eisen et al., 1998).
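The k = 1 case can be illustrated with a small synthetic example. This is a minimal sketch, not an analysis from the paper: the number of conditions, the factor changes `delta_Z1` and the response coefficients `response` (standing in for the unknown ∂fg/∂Z1) are all made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 6  # number of perturbed systems (hypothetical)

# Change in the single controlling factor across the M perturbed systems.
delta_Z1 = rng.normal(size=M)

# Each gene's response coefficient, standing in for the unknown df_g/dZ_1.
response = np.array([2.0, 0.5, -1.0, 3.0, -0.25])

# Equation (4) with k = 1: every expression vector is a multiple of delta_Z1.
delta_X = np.outer(response, delta_Z1)  # shape (n_genes, M)

# Cosine of the angle between each expression vector and delta_Z1.
cosines = delta_X @ delta_Z1 / (
    np.linalg.norm(delta_X, axis=1) * np.linalg.norm(delta_Z1)
)

# Pearson correlation between all pairs of genes (rows are genes).
pearson = np.corrcoef(delta_X)

print(cosines)
print(pearson.round(3))
```

All genes lie on one line through the origin, so every cosine and every pairwise correlation has magnitude one. Genes with a negative response coefficient (an inhibited gene, in the language of the introduction) sit on the same line but point in the opposite direction, which is why a criterion selecting only strong positive correlation would miss them.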
The existence of these pairwise correlations (or, for that matter, the higher-order correlations discussed below) in no way implies that they will continue to be maintained if further perturbed systems are added to the analysis (e.g. additional tissues to a set of tissues). The only information that has been gained through the existence of a linear feature as discussed above is that the differential regulation within the subset of conditions that have been examined is effectively achieved through the variation of just one factor. As data probing the response of the transcriptome to further conditions is added, k = 1 features in the smaller subspace may simply turn out to be slices through higher dimensional features in the larger subspace. From an operational point of view this makes the perturbative framework discussed here feasible in practice because the absolute (rather than differential) regulation of individual genes in general is known to involve a very large number of different factors; the complete mapping out of these would require extensive datasets indeed. Note also that this implies that the connectivity of individual nodes in gene regulatory networks (which could be inferred from a classification into the k-dimensional features of the perturbative framework) should always be understood as pertaining only to the subset of conditions probed.
Each k > 1 in Equation (4) leads to new criteria, hence pairwise correlations will not be a sufficient criterion for all genes. Consider, for example, k = 2. Genes whose differential expression is controlled by two independent factors have expression vectors, ΔXg, composed of linear combinations of the two independent vectors ΔZ1 and ΔZ2. These genes may therefore be searched for by algorithms that are able to detect, in the scatterplots mentioned above, features consisting of 2D planes (spanned by ΔZ1 and ΔZ2, which are themselves undetermined) passing through the origin. The expression vectors of two genes contained in a set such as this can have an arbitrary Pearson correlation coefficient and would not be clustered together by any clustering based on pairwise correlations alone. For k > 2 the appropriate features become analogous higher-dimensional generalizations of these planes, i.e. planar k-dimensional subspaces.
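The k = 2 statement can be checked on synthetic data in the same way. The sketch below assumes two randomly drawn factor vectors and random response coefficients (all hypothetical) and uses the singular values of the expression matrix to count the number of independent directions.

```python
import numpy as np

rng = np.random.default_rng(1)

M = 5          # number of perturbed systems (hypothetical)
n_genes = 40   # genes controlled by the same two factors

delta_Z = rng.normal(size=(2, M))        # two independent factor vectors
coeffs = rng.normal(size=(n_genes, 2))   # stand-ins for df_g/dZ_i

# Equation (4) with k = 2: each row is a linear combination of the two
# factor vectors, so all rows lie in one plane through the origin.
delta_X = coeffs @ delta_Z               # shape (n_genes, M)

# The number of non-negligible singular values gives the dimension k.
singular_values = np.linalg.svd(delta_X, compute_uv=False)
rank = int(np.sum(singular_values > 1e-10 * singular_values[0]))

# Pairwise Pearson correlations between distinct genes in the plane.
pearson = np.corrcoef(delta_X)
off_diag = pearson[np.triu_indices(n_genes, k=1)]

print("dimension of feature:", rank)
print("correlation range:", off_diag.min(), off_diag.max())
```

The rank confirms that the genes share a planar feature, while the pairwise correlations are spread over a wide range between -1 and 1. A clustering based on pairwise correlation alone would therefore split this set, even though all of its members are controlled by the same two factors.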
This kind of systematic decomposition of a microarray dataset has other appealing attributes:
- It allows for the classification of genes involved in a number of disparate pathways or processes as expression vectors may be simultaneously categorized into more than one feature such as a line, plane, etc.
- The geometric representation described above clearly delineates the kind of information extractable from a dataset: if the data has been collected for M experimental conditions (plus control) then one is limited to the detection of an arbitrary number of processes involving at most k = M − 1 independent controlling factors.
- There should be obvious signals if the truncation of the perturbative expansion in Equation (3) begins to fail. For example, for k = 1 the next order (i.e. quadratic) term in Equation (3) would give rise to curvature of the corresponding linear feature and would at the same time provide some support for the validity of the ideas described here.
- In general the underlying cause of the change of protein concentration ΔZi (this being, for example, a transcription factor) will be quite complicated; however, a particularly simple scenario would occur if this change is itself a result of perturbatively altered transcriptional activity of the gene coding for this factor, i.e. ΔZi ∝ ΔXi. In this case the expression vector for the controlling gene (i.e. the transcription factor) will itself be part of the feature, providing justification for searching the features predicted here for relevant transcription factors etc.
| 3 RESULTS |
|---|
With the perturbative framework outlined above providing a sound theoretical basis for expression pattern analyses, its practical applicability, in particular the validity of the truncation of the perturbative expansion in Equation (3), remains to be explored. Also, even if the change from the control to the perturbed system is small enough for this truncation to be valid, it could well be that underlying noise of either biological or technical origin is sufficient to mask the predicted features. For this reason we have searched publicly available microarray datasets for evidence of these patterns. While higher-dimensional features (k > 2) must be searched for with appropriate algorithms, lines (k = 1) and 2D planes (k = 2) can simply be identified by visual inspection of suitable scatterplots. Evidence for the existence of these is described below.
3.1 Application to Arabidopsis thaliana dataset
Here we show results obtained from a developmental time and tissue series for Arabidopsis thaliana (Schmid et al., 2005). Using data from a tissue series such as this has the disadvantage that one does not have as much control over the size of the ΔZi's as one would have, say, in a stress series (e.g. a series of different imposed stresses, a time series after the imposition of a single stress, a series of measurements of the response to graduated abiotic stress or perhaps a combination of these). On the other hand, the Arabidopsis developmental series provides a large amount of high quality data (of large dimensionality), and is therefore suitable for our present purpose. Moreover, some of the tissues contained in this dataset are very closely related, so one would hope that for these at the very least one is within the perturbative regime where the linear terms in the expansion in Equation (4) provide a reasonably accurate numerical approximation. Details concerning our use of this dataset, including a cut to eliminate genes that are turned off in the examined tissues, may be found in the online Supplementary information.
Figure 1 shows an example of a scatterplot of expression (relative to control) in Arabidopsis root tissue against cotyledon, where we have used the average expression over all tissues as control.
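The construction of such a scatterplot from an expression matrix can be sketched as follows. The numbers, the tissue labels and the use of log2 values are assumptions made purely for illustration; the actual preprocessing is described in the Supplementary information and is not reproduced here.

```python
import numpy as np

# Hypothetical log2 expression matrix: rows are genes, columns are tissues.
log_expr = np.array([
    [8.0, 9.0, 10.0, 9.0],
    [5.0, 5.5, 4.5, 5.0],
    [7.0, 3.0, 7.0, 7.0],
])
tissues = ["root", "cotyledon", "leaf", "flower"]  # illustrative labels

# Control state: per-gene average log-expression over all tissues.
control = log_expr.mean(axis=1, keepdims=True)

# Relative expression (Delta X_g) of every gene in every tissue.
delta_X = log_expr - control

# Coordinates for a root-versus-cotyledon scatterplot, one point per gene.
x = delta_X[:, tissues.index("root")]
y = delta_X[:, tissues.index("cotyledon")]

print(np.column_stack([x, y]))
```

Because the control is an average rather than a measured sample, each gene's relative expression values sum to zero across the tissues. This is consistent with the earlier remark that the control need not be a biologically realizable state.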
Several linear protrusions emanating from the origin are clearly visible, consistent with the expectations for the k = 1 features discussed above. In Figure 2 we show another example, this time using expression in mature pollen and stamens. Here the linear feature shows distinct curvature, as would be expected from higher order terms in the expansion in Equation (3). While this curvature could in general also be due to a saturation effect in one of the two tissues, there is evidence that for the data shown in Figure 2 (see the online Supplementary information) this is not the case. Hence the existence of this curvature lends support to the perturbative interpretation being advocated here. While it is not our purpose to perform a systematic study of the Arabidopsis dataset, examinations of large numbers of similar scatterplots, both within the Arabidopsis dataset and in those for other species for which data is publicly available, indicate that this sort of linear feature is quite common, with non-zero curvature being observable in some. Two further plots, using available developmental tissue series for barley (Druka et al., 2006) and mice (Su et al., 2004), respectively, are included as Supplementary Figures S1 and S2.
Planar structures (k = 2 features), as predicted by Equation (4), may be observed in the Arabidopsis dataset by examining three rather than two tissues simultaneously. At times, in particular if the vast majority of probed genes can be classified into one of these structures, this may be for a trivial reason. In particular, irrespective of the mechanism underlying the transcriptional control, a plot involving two very similar tissues against one quite disparate tissue may give rise to this sort of genome-wide pattern. While many examples such as this can be found in the dataset, there is also some evidence for planar correlations whose origin appears to be more subtle. An illustrative example is shown in Figures 3 and 4, where the expression in three stages of floral tissue is shown, the control being the average expression over all the 13 floral tissues contained in the dataset. For convenience we plot only those genes that show strong differential expression (|ΔXg| > 3) in the three tissues. A number of 2D planes identified by visual inspection have been colored to guide the eye in Figure 3 and can also be detected as linear features in the polar plot of the same data in Figure 4. Alternatively, in the online Web supplement, we provide an animated version of the 3D plot (see Supplementary Figure S3; in addition, a similar planar feature observable in the barley dataset of Druka et al., 2006, is shown in Supplementary Figure S4).
While the mere existence of the features shown in Figures 1-4 lends support to the applicability of the perturbative framework discussed above, one would, in addition, expect to see meaningful correlation with biological function of the genes contained therein. This appears to be the case. For example, not surprisingly, the identified linear feature in Figure 2 contains an abundance of genes that are known to be expressed predominantly in pollen tissue. In Figures 3 and 4 (and Supplementary Figure S3) on the other hand, the annotation of the genes contained in the planar feature highlighted in green indicates association with a heat-shock response. Because genes involved in the heat-shock response within Arabidopsis thaliana have recently been studied in some detail (Busch et al., 2005), a comparison with the genes in this planar feature is possible.
3.2 Heat-shock response in Arabidopsis thaliana leaves
Busch et al. (2005) studied the heat-shock response within Arabidopsis thaliana leaf tissue by identifying genes regulated by the known heat-shock transcription factors HsfA1a and HsfA1b. They found, using the ATH1 microarray, a set of 11 genes that were differentially expressed by a factor >2 between wild-type and double-knockout mutants under control conditions. Furthermore, they identified a total of 112 genes as being controlled by HsfA1a and HsfA1b under heat-shock conditions. Therefore, a total of 125 genes implicated in some way in the heat-shock response in leaf tissue (i.e. the two genes coding for HsfA1a and HsfA1b, the 11 differentially expressed in mutants and the 112 differentially expressed under heat-stress) have been identified in Busch et al. (2005).
While data for each of these 125 genes is available in the developmental tissue series of Schmid et al. because the ATH1 chip was used in both studies, most do not survive the stringent cuts imposed here: only 14 of these genes (6 out of 11 of the ones identified in the knock-outs and 8 out of 112 identified under heat-stress) are not turned off in any of the three tissues (for details, see the online material) and simultaneously satisfy |ΔXg| > 3. These 14 remaining genes are highlighted in Figure 4 and, as can be seen, 11 of them (all 6 identified in the knock-outs as well as 5 of the 8 identified in the heat-stress experiments) are associated with the planar feature coloured green in Figures 3 and 4 (and Supplementary Figure S3). This remarkable concordance of gene-sets suggests that the regulation of the heat-stress related genes identified by Busch et al. is amenable to the type of perturbative analysis discussed here. This conclusion is given added strength by the results shown in Figure 5 (see also the animated version Supplementary Figure S5), where the planar feature is shown alongside all 125 genes identified by Busch et al. (2005) (i.e. regardless of whether they survive the above cuts). Virtually all appear to lie on the same plane (while the complete dataset does not; see Figures 3 and 4 and Supplementary Figure S3).
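The planar features above were identified by visual inspection. One way such an assessment might be made quantitative is sketched below, with the caveat that this is not the procedure used for the figures: a plane through the origin is fitted to a set of three-tissue expression vectors by singular value decomposition, and each gene's perpendicular distance from that plane is reported. The function name `fit_plane_through_origin` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_plane_through_origin(points):
    """Return the unit normal of the least-squares plane through the origin
    and each point's perpendicular distance from that plane."""
    points = np.asarray(points, dtype=float)
    # The right singular vector with the smallest singular value is the
    # direction of least spread about the origin, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points, full_matrices=False)
    normal = vt[-1]
    distances = np.abs(points @ normal)
    return normal, distances

rng = np.random.default_rng(2)

# Synthetic gene set lying on a plane spanned by two hypothetical vectors,
# with a small amount of added noise.
basis = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
on_plane = rng.normal(size=(30, 2)) @ basis
noisy = on_plane + 0.01 * rng.normal(size=on_plane.shape)

normal, distances = fit_plane_through_origin(noisy)
print("fitted normal:", normal)
print("largest distance from plane:", distances.max())
```

The fit is deliberately not mean-centred, because the features predicted by Equation (4) pass through the origin. Applied to real data, the distances would have to be compared against an estimate of the noise level before any statement about statistical significance could be made.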
| 4 DISCUSSION |
|---|
The linear and planar features visible in the Arabidopsis thaliana dataset discussed in the previous section provide support that the patterns expected from the perturbation expansion derived in Section 2 are observable in practice. The detection of higher-dimensional patterns, on the other hand, is only possible with suitable algorithms, which go beyond the primitive exploratory pattern recognition attempted here. Equation (4) is, of course, simply a linear decomposition and an abundance of powerful techniques employing potentially useful linear decompositions are well established in the literature. The most well known among these are the singular value decomposition (Alter et al., 2000; Holter et al., 2000) and, more generally, factor analysis (Harman, 1967). By construction, the resulting principal components or factors are orthogonal, making these methods unsuitable for implementing the decomposition in Equation (4). In independent component analysis, which is increasingly used in the analysis of microarray data (Hyvärinen, 1999; Roberts and Everson, 2001; Liebermeister, 2002; Lee and Batzoglou, 2003; Kreil and MacKay, 2003), the requirement of orthogonality of underlying factors is replaced by one of statistical independence (Hyvärinen, 1999), which may be more suitable in the present context. However, of most direct interest are related matrix decompositions such as sparse matrix factorization (Dueck et al., 2005) and, in particular, network component analysis (Liao et al., 2003), these techniques actually being directly based on the underlying biological assumption that gene transcription is regulated by a relatively small number of biological factors. Whatever the method chosen to implement the classification, it will require careful attention to the statistical significance of the results. This becomes an important problem particularly if the number of genes contained within a pattern becomes small and/or if noise becomes an issue.
The present work, by making use of reasonable biological assumptions about the nature of transcriptional regulatory mechanisms, elucidates the connection between the above matrix methods and the perturbative expansions which also underpin biological systems theory (Savageau, 1969a, b, 1998; Savageau et al., 1987; Voit, 2000). In addition, the fact that the Arabidopsis dataset appears to show at least some of the features predicted by this approach provides experimental justification for the use of these matrix methods and elucidates the choice of variables for which they are appropriate. The assumptions and restrictions discussed in the derivation of Equation (4) shed light on the circumstances under which these analysis methods are appropriate and the implications if they should fail. The ability to do this highlights the advantages of searches for patterns that one expects to be present and that have a biological interpretation, as opposed to classifications based on ad hoc statistical criteria, which may or may not be relevant to the task at hand.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on December 14, 2005; revised on October 18, 2006; accepted on November 17, 2006
| REFERENCES |
|---|
Alter, O., et al. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl Acad. Sci. USA, 97, 10101–10106.
Attwood, J.T., et al. (2002) DNA methylation and regulation of gene transcription. Cell. Mol. Life Sci., 59, 241–257.
Bolouri, H. and Davidson, E.H. (2002) Modeling transcriptional regulatory networks. Bioessays, 24, 1118–1129.
Bower, J.M. and Bolouri, H. (2001) Computational Modeling of Genetic and Biochemical Networks. MIT Press, Cambridge, MA.
Brown, M.P.S., et al. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA, 97, 262–267.
Busch, W., et al. (2005) Identification of novel heat-shock factor-dependent genes and biochemical pathways in Arabidopsis thaliana. Plant J., 41, 1–14.
Campbell, C. and Ghazal, P. (2004) Molecular signatures for diagnosis of infection: application of microarray technology. J. Appl. Microbiol., 96, 18–23.
Chua, G., et al. (2004) Transcriptional networks: reverse-engineering gene regulation on a global scale. Curr. Opin. Microbiol., 7, 638–646.
Dueck, D., et al. (2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics, 21 (Suppl. 1), i144–i151.
Druka, A., et al. (2006) An atlas of gene expression from seed to seed through barley development. Funct. Integr. Genomics, 6, 202–211.
Eisen, M.B., et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863–14868.
Geiman, T.M. and Robertson, K.D. (2002) Chromatin remodeling, histone modifications, and DNA methylation – how does it all fit together? J. Cell. Biochem., 87, 117–125.
Guido, N.J., et al. (2006) A bottom-up approach to gene regulation. Nature, 439, 856–860.
Harman, H.H. (1967) Modern Factor Analysis, 2nd edn. University of Chicago Press, Chicago.
Hasty, J., et al. (2001) Computational studies of gene regulatory networks: In Numero Molecular Biology. Nat. Rev. Genet., 2, 268–279.
Holter, N.S., et al. (2000) Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc. Natl Acad. Sci. USA, 97, 8409–8414.
Hyvärinen, A. (1999) Survey on independent component analysis. Neural Comp. Surv., 2, 94–128.
Jain, A.K. and Dubes, R.C. (1988) Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, NJ.
Kaern, M., Blake, W.J. and Collins, J.J. (2003) The engineering of gene regulatory networks. Ann. Rev. Biomed. Eng., 5, 179–206.
Khaitovich, P., et al. (2004) A neutral model of transcriptome evolution. PLoS Biol., 2, 0682–0689.
Khalil, H.K. (1996) Nonlinear Systems. Prentice Hall, Upper Saddle River, NJ.
Kreil, D.P. and MacKay, D.J.C. (2003) Reproducibility assessment of independent component analysis of expression ratios from DNA microarrays. Comp. Funct. Genom., 4, 300–317.
Lee, S.-I. and Batzoglou, S. (2003) Application of independent component analysis to microarrays. Genome Biol., 4, R76.
Liao, J.C., et al. (2003) Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl Acad. Sci. USA, 100, 15522–15527.
Liebermeister, W. (2002) Linear modes of gene expression determined by independent component analysis. Bioinformatics, 18, 51–60.
Monk, N.A. (2003) Unravelling nature's networks. Biochem. Soc. Trans., 31, 1457–1461.
Ptashne, M. (1992) A Genetic Switch. Cell Press & Blackwell Scientific Publications, Oxford, UK.
Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet., 2, 418–427.
Roberts, S. and Everson, R. (2001) Independent Component Analysis: Principles and Practice. Cambridge University Press, Cambridge.
Savageau, M.A. (1969a) Biochemical systems analysis I: some mathematical properties of the rate law for the component enzymatic reactions. J. Theor. Biol., 25, 365–369.
Savageau, M.A. (1969b) Biochemical systems analysis II: the steady-state solutions for an n-pool system using a power-law approximation. J. Theor. Biol., 25, 370–379.
Savageau, M.A., et al. (1987) Recasting nonlinear differential equations as S-systems: a canonical nonlinear form. Math. Biosci., 87, 83–115.
Savageau, M.A. (1998) Rules for the evolution of gene circuitry. Pac. Symp. Biocomput., 3, 54–65.
Schlitt, T. and Brazma, A. (2005) Modelling gene networks at different organizational levels. FEBS Lett., 579, 1859–1866.
Schmid, M., et al. (2005) A gene expression map of Arabidopsis thaliana development. Nat. Genet., 37, 501–506.
Su, A.I., et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.
Tamayo, P., et al. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl Acad. Sci. USA, 96, 2907–2912.
Tavazoie, S., et al. (1999) Systematic determination of genetic network architecture. Nat. Genet., 22, 281–285.
Van Driel, R., et al. (2003) The eukaryotic genome: a system regulated at different hierarchical levels. J. Cell Sci., 116, 4067–4075.
Voit, E.O. (2000) Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists. Cambridge University Press, Cambridge, UK.
Voit, E.O. and Radivoyevitch, T. (2000) Biochemical systems analysis of genome-wide expression data. Bioinformatics, 16, 1023–1037.