Bioinformatics Advance Access originally published online on November 19, 2007
Bioinformatics 2008 24(2):184-191; doi:10.1093/bioinformatics/btm568
Clustering of change patterns using Fourier coefficients
1Department of Statistics, Duksung Women's University and 2Bioinformatics and Biostatistics Laboratory, Seoul National University, Seoul, S. Korea
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate.
Results: This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.
Availability: The R program is available upon request.
Contact: jaehee{at}duksung.ac.kr
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
One of the most common approaches in genome science is the analysis of gene expression patterns or change patterns. If the expressions of genes are measured at various time points during the course of an experimental study, each gene can be characterized by its pattern of change. Grouping genes that share similar expression profiles into clusters is usually the first step in understanding the huge amount of DNA microarray data associated with complicated biological networks. However, most research on gene clustering has been performed with the observed expression data, while ignoring the change patterns. The motivation of this research is to derive an efficient and robust statistical method for an area where little research has been done yet the needs from a biological standpoint are numerous.
The researcher is frequently interested in studying gene expression changes over time and evaluating trend differences between the various experimental groups. For example, in a study of the effect of a corn oil or fish oil diet on colon cancer, comparing and grouping genes using only the observed values is not enough; change values and change patterns also need to be investigated. The researcher is then interested in detecting biologically meaningful gene expression trends and in spotting differences between the various experimental groups.
Due to the differences in the initial levels of background noise in the experiment, difference values or derivatives need to be used as a measure of change. Also, a basic premise is that the genes sharing similar change profiles may be functionally related or co-regulated. As such, microarray derivative data provide further insight into gene–gene interactions, gene functions and pathways. Derivative functions also provide statistical convenience in that: (1) functions that differ by a constant have the same derivatives; (2) difference values give information about their changes as well as about their original functions. Nevertheless, few of the previous methods took derivative functions into account.
We propose to use Fourier coefficients in clustering expression patterns and change patterns. Fourier coefficients have several advantages over other methods: (1) the dimension of a data set can be reduced to several Fourier coefficients; (2) the estimated Fourier coefficients give information about the underlying function and enable automatic estimation of the change or pattern function; (3) the Fourier coefficient estimation does not depend strongly on the covariance structure; (4) as the sample Fourier coefficients asymptotically follow a multivariate normal distribution, a Gaussian mixture model that incorporates the underlying probability distributions can be effective.
There has been considerable research about discovering patterns using clustering and testing. Serban and Wasserman (2005) proposed clustering after transformation and smoothing as a technique for non-parametrically estimating and clustering a large number of curves. To discover change patterns in gene expressions, Ernst et al. (2005) clustered short time series gene expression data by selecting a set of potential expression profiles. Li and Wong (2002) proposed an effective discretization and gene selection method using the concept of emerging patterns. Park et al. (2003) proposed a method to test the statistical significance of time-dependent gene expression data and to identify genes with significant change based on an ANOVA model. Lai et al. (2004) proposed a method for selecting genes that have differential gene–gene co-expression patterns with the idea of correlation difference.
Model-based clustering is a clustering approach that takes the underlying probability distribution into account. Yeung et al. (2001) showed the performance of model-based clustering on several simulated and real gene expression data sets. Murtagh and Raftery (1984) successfully applied model-based hierarchical clustering in character recognition problems using a multivariate normal model. Fraley and Raftery (2002) suggested model-based hierarchical agglomerative clustering based on computing an approximate maximum for the classification likelihood.
Smoothing away noise-induced wiggles with Fourier series has been studied by some researchers. Zhang et al. (2003) used the first harmonic of the discrete Fourier transform to translate the multi-dimensional time series microarray expression data into a two-dimensional scatter plot. Murthy and Hua (2004) proposed an improved Fourier method that considers the irregular or monotonic component of cell-cycle expression. Kim et al. (2006) suggested a two-step procedure for clustering periodic patterns of gene expression profiles. They used least squares non-linear curve fitting based on a Fourier series approximation with frequency and amplitude of order one. Though they considered the periodicity and a mixture model-based likelihood for the estimated parameters, change patterns of the gene expression were not taken into account.
There has been much research on clustering microarray data, mostly on grouping common expression patterns. However, there are many cases in gene study in which grouping change patterns is of interest. In this research, we propose a new method for clustering change patterns with derivative Fourier coefficients. The proposed method consists of four main steps. The first and second steps consist of representing a gene profile with sample Fourier coefficients, and then the calculation of derivatives from the Fourier coefficients. The third step is to cluster the derivative Fourier coefficients using model-based clustering. In the final step, genes with the same change pattern are clustered and the underlying change pattern is automatically estimated using the Fourier representation.
We demonstrated the usefulness of the Fourier analysis and model-based clustering by applying the method to simulated data. We also applied our model to real gene expression data, which yielded biologically interpretable gene clusters.
| 2 THE MODEL |
|---|
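Consider the data $Y_{iu}$, the $u$th observation on the $i$th curve, of the form

$$Y_{iu} = f_i(t_{iu}) + \epsilon_{iu}, \qquad (1)$$

where $E(\epsilon_{iu}) = 0$ and $\mathrm{Var}(\epsilon_{iu}) = \sigma^2$. In the microarray experiment, $Y_{iu}$ is the log gene expression of gene $i$ at time $t_{iu}$.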
We assume that the curve fi belongs to a class of smooth functions F as defined below:
|
| (2) |
|
| (3) |
|
| (4) |
Here $J \le m$ is a smoothing parameter to be chosen based on the data.
The sample Fourier estimate can be estimated as
|
| (5) |
[0,1].
With regard to changes, the difference data
|
| (6) |
|
| (7) |
|
|
This setup can be extended to the cases where the design or time points are not the same for all curves. We want to classify the same patterns with differences or derivatives that give information about the underlying change pattern.
| 3 TRIGONOMETRIC FOURIER SERIES ESTIMATION OF DERIVATIVES |
|---|
The function represented with a Fourier series with the cosine bases is given as
|
| (8) |
We can estimate fi with J terms of Fourier coefficients as
|
| (9) |
|
| (10) |
|
| (11) |
The model in (7) can be expressed as
|
| (12) |
ij is a Fourier coefficient of the derivative function, we call
ij the derivative Fourier coefficient and
With the independent
ijs, var(
ij) = 2
2, and
|
|
The parameter J controls the amount of smoothing and should be determined based on the data. Even though the optimal choice for J varies from function to function, we choose to use a single smoothing parameter that operates reasonably well for all of the curves. There has been some research on optimal choices for J. For example, to find a global smoothing parameter, Serban and Wasserman (2005) calculated J as the minimizer of the total regret. Eubank and Hart (1992) also suggested choosing the smoothing parameter J that minimizes the risk or mean-squared error.
With a large number of gene curves and various functional shapes, a universal rule for an optimal choice for J does not exist. Therefore, instead, we capitalize on the convergence property of Fourier transforms. Since the Fourier estimator converges to the true function, usually the first few Fourier coefficients contribute to the estimation of the whole function. In practice, we can select a smaller J for linear or smooth functions and a larger J for wigglier functions.
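The role of J can be illustrated with a minimal Python sketch. It assumes an orthonormal cosine basis on [0, 1] (phi_0 = 1, phi_j(t) = sqrt(2) cos(j pi t)), equally spaced midpoint design points, and term-by-term differentiation of the truncated series. The function names, the grid and the inclusion of the j = 0 term are illustrative assumptions, not the authors' R program or necessarily their exact estimator.

```python
import math


def cosine_basis(j, t):
    """Assumed orthonormal cosine basis on [0, 1]."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(j * math.pi * t)


def sample_fourier_coefficients(y, J):
    """Averages (1/m) * sum_u y_u * phi_j(t_u) for j = 0..J.

    Assumes midpoint design points t_u = (u + 0.5) / m.
    """
    m = len(y)
    t = [(u + 0.5) / m for u in range(m)]
    return [
        sum(yu * cosine_basis(j, tu) for yu, tu in zip(y, t)) / m
        for j in range(J + 1)
    ]


def fitted_curve(theta, t):
    """Truncated cosine series evaluated at t."""
    return sum(th * cosine_basis(j, t) for j, th in enumerate(theta))


def fitted_derivative(theta, t):
    """Term-by-term derivative of the truncated cosine series at t."""
    return sum(
        -th * j * math.pi * math.sqrt(2.0) * math.sin(j * math.pi * t)
        for j, th in enumerate(theta)
    )


if __name__ == "__main__":
    m = 18
    profile = [math.sqrt(2.0) * math.cos(math.pi * (u + 0.5) / m) for u in range(m)]
    shifted = [value + 3.0 for value in profile]
    # Only the j = 0 coefficient changes when a constant is added.
    print(sample_fourier_coefficients(profile, 3))
    print(sample_fourier_coefficients(shifted, 3))
```

On this grid, two profiles that differ by a constant share all coefficients with j >= 1, so they have the same fitted derivative. This matches the motivation for clustering on change patterns rather than on observed levels.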
| 4 CLUSTERING CURVES OF THE SAME CHANGE |
|---|
The similarity of cluster derivatives
After clustering with the estimated Fourier coefficients for the original function and with the derivative Fourier coefficients for the derivative function, we can estimate the function of each gene with these estimated Fourier coefficients using (4). The change pattern can also be estimated with derivative Fourier coefficients using (11). This automatic estimation is another capability of Fourier representation. These estimated periodic functions show the functional shape and periodicity.
4.1 Mixture model of derivative Fourier coefficients
Clustering using a mixture model assumes that each group of the data is generated by an underlying probability distribution. Suppose that data X1, ... , Xn are multivariate observations.
In a Gaussian mixture model, each group $k$ is modeled by the multivariate normal distribution with parameters $\mu_k$ (mean vector) and $\Sigma_k$ (covariance matrix):

$$f_k(x \mid \mu_k, \Sigma_k) = \frac{\exp\left\{-\frac{1}{2}(x - \mu_k)^{T}\Sigma_k^{-1}(x - \mu_k)\right\}}{\sqrt{\det(2\pi\Sigma_k)}} \qquad (13)$$

Geometric features (shape, volume, orientation) of each group $k$ are determined by the covariance matrix $\Sigma_k$. Banfield and Raftery (1993) proposed a general framework for exploiting the representation of the covariance matrix in terms of its eigenvalue decomposition. Each elliptical model is implemented in Mclust (Fraley and Raftery, 1999).
We consider model-based clustering with the estimated Fourier coefficients of change. The sample Fourier coefficient in (5) is a weighted average of random variables with variance $O(m^{-1})$. Freedman and Lane (1980) showed that the empirical distribution of Fourier coefficients is normal. By the Central Limit Theorem for independent and identically distributed samples, the sample Fourier coefficient is asymptotically normally distributed. As the number of design points $m$ grows with $J$ fixed, the vector of sample Fourier coefficients has an asymptotically $J$-dimensional multivariate normal distribution. Therefore the vector of derivative Fourier coefficients, as a linear function of the sample Fourier coefficients, also has an asymptotically multivariate normal distribution. With this asymptotic property, we can use the Gaussian mixture model for clustering.
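As a rough illustration of how a fitted Gaussian mixture assigns a coefficient vector to a group, the Python sketch below computes posterior membership probabilities for given mixture parameters. It is a simplification under stated assumptions: the covariance matrices are taken to be diagonal (Mclust supports more general eigenvalue-parameterized covariances), the parameters are supplied rather than estimated, and all function names are hypothetical.

```python
import math


def log_normal_diag(x, mean, var):
    """Log density of a multivariate normal with diagonal covariance (assumed)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


def posterior_memberships(x, weights, means, variances):
    """Posterior probability that x was generated by each mixture component."""
    logs = [
        math.log(w) + log_normal_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def assign_cluster(x, weights, means, variances):
    """Index of the component with the largest posterior probability."""
    post = posterior_memberships(x, weights, means, variances)
    return max(range(len(post)), key=post.__getitem__)


if __name__ == "__main__":
    # Hypothetical two-group mixture over two derivative Fourier coefficients.
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [4.0, 4.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    print(posterior_memberships([2.0, 2.0], weights, means, variances))
    print(assign_cluster([0.1, -0.2], weights, means, variances))
```

In a full analysis the weights, means and covariances would be estimated, for example by EM started from a hierarchical agglomerative partition, which this sketch does not attempt.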
Model-based hierarchical agglomerative clustering is an approach to compute an approximate maximum of the classification likelihood,
|
|
4.2 Cluster validity
A major challenge in cluster analysis is the estimation of the optimal number of clusters. To identify the partition of clusters for which a measure of quality is optimal, Rousseeuw (1987) proposed the silhouette method as a cluster validity technique.
The silhouette width for the $i$th sample in the $j$th cluster is defined as:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

where $a(i)$ is the average distance between the $i$th sample and the other samples in cluster $j$, and $b(i)$ is the minimum, over the remaining clusters, of the average distance between the $i$th sample and the samples in that cluster. A point is regarded as well clustered if s(i) is large. The overall average silhouette value can be used as an effective validity index for any partition. Kaufman and Rousseeuw (1990) proposed choosing the optimal number of clusters as the value maximizing the average s(i) over the data set. We can consider the overall average silhouette in selecting the number of Fourier coefficients and the optimal number of clusters. A silhouette is generally known to work best with roughly spherical clusters. If the clustering algorithm does not result in this shape of cluster, the overall average silhouette width tends to become very low. Azuaje (2002) studied the assessment of expression cluster validity with 18 measures and remarked that there is no universal validity paradigm to predict consistent results across different clustering techniques. Evaluation of biologically relevant results may support the cluster validity.
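The silhouette width can be computed directly from pairwise distances. The Python sketch below follows the standard definition of Rousseeuw (1987) with Euclidean distance; the function names and the convention of assigning width 0 to a sample that is alone in its cluster are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every sample."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = [k for k in clusters[lab] if k != i]
        if not own or len(clusters) < 2:
            widths.append(0.0)  # assumed convention for singletons
            continue
        # a: average distance to the other members of the sample's own cluster
        a = sum(euclidean(points[i], points[k]) for k in own) / len(own)
        # b: smallest average distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[k]) for k in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_silhouette(points, labels):
    """Overall average silhouette width of a partition."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)


if __name__ == "__main__":
    pts = [[0.0], [1.0], [10.0], [11.0]]
    print(silhouette_widths(pts, [0, 0, 1, 1]))
    print(average_silhouette(pts, [0, 0, 1, 1]))
```

Applied to vectors of derivative Fourier coefficients, the average width can be compared across candidate values of J and numbers of clusters.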
| 5 RESULTS |
|---|
5.1 Simulated data set
Since real expression data sets are generally noisy and their clusters may not be fully reflective of the class information, we first evaluate the performance of our method with simulated data, where the classes are known.
We simulate data according to the regression model
|
|
|
|
The simulated data consist of 2000 curves originating from nine different functions, 1200 curves from f1 and 100 curves from each of f2, ..., f9, to reflect typical gene expression data. There are only five different change patterns: f1, f2, f4, f6 and f8. We assume that the noise is normally distributed, with the noise level drawn from Unif(0.4, 0.7) for low noise and from Unif(1.0, 1.2) for high noise. We consider m = 5, 10, 20, 30 and 50 repeated design points.
Pollard (1982) showed that under weak conditions, as the sample size tends to infinity, the set of cluster centers as the minimizer of the distance of samples in each cluster converges almost surely to the population cluster centers and converges in distribution to the multivariate normal distribution. Since the means with K-means clustering can satisfy this property, we consider K-means clustering in the comparison study.
Let T be a clustering map defined as
|
|
Regarding the estimation error, the clustering estimation error rate
(K) is defined as
|
|
This error rate is then the fraction of all pairs that are incorrectly put in separate clusters when K clusters are used, as described in Serban and Wasserman (2005). Table 1 shows the clustering estimation error rates for the model-based and K-means methods with Fourier coefficients and also with difference data, for various numbers of Fourier terms and numbers of repeated design points. The clustering estimation error is smaller for the model-based method with the Fourier coefficients than for K-means with the Fourier coefficients. The clustering estimation error is also smaller for the model-based method with Fourier coefficients than for the model-based method with the difference data. With Fourier coefficients, the clustering estimation error becomes smaller as m becomes larger. The number of clusters is determined to be 5 in accordance with the Bayesian Information Criterion (BIC). Once J exceeds 5, the clustering estimation error rate does not change appreciably. Therefore, we suggest using J around 5 for dimension reduction and for biological interpretation. Optimal J values are highlighted in Table 1. Supplementary Table S1 shows results similar to those in Table 1 for the high-noise data. Figure 1 shows the functions grouped in each cluster with J = 2 for the low-noise simulated data. It shows the true functions that have the same derivatives.
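A pairwise error rate of this kind can be sketched in Python as follows. Because the exact formula is not reproduced in the text, the sketch makes an assumption: it reports both the fraction of pairs on which the true and estimated partitions disagree about co-membership and the narrower fraction of pairs that truly belong together but were separated. The function name is hypothetical.

```python
from itertools import combinations


def pairwise_error_rates(true_labels, est_labels):
    """Return (disagreement rate, wrongly separated rate) over all pairs.

    Both rates are invariant to how the clusters are numbered.
    Requires at least two samples.
    """
    n = len(true_labels)
    if n != len(est_labels) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    total_pairs = n * (n - 1) // 2
    disagree = 0
    separated = 0
    for r, s in combinations(range(n), 2):
        same_true = true_labels[r] == true_labels[s]
        same_est = est_labels[r] == est_labels[s]
        if same_true != same_est:
            disagree += 1
        if same_true and not same_est:
            separated += 1
    return disagree / total_pairs, separated / total_pairs


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(pairwise_error_rates(truth, [1, 1, 0, 0]))  # relabelled but identical
    print(pairwise_error_rates(truth, [0, 0, 0, 1]))
```

Working on pairs avoids having to match estimated cluster labels to true class labels, which is convenient when comparing model-based and K-means partitions.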
5.2 Yeast cell cycle microarray expression data example
5.2.1 Yeast cell cycle data
We also applied our method to yeast cell cycle data. The cell cycle is important in understanding cell replication, malignancy and reproductive diseases that are associated with genomic instability and abnormal cell division. Biologists have been studying the cell cycle in the budding yeast Saccharomyces cerevisiae, a free-living, single-celled eukaryote that is nevertheless a highly complex organism.
Spellman et al. (1998) created a comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle. They used DNA microarrays and samples from yeast cultures synchronized by three independent methods: α-factor arrest, elutriation and the arrest of a cdc15 temperature-sensitive mutant.
We applied our method of clustering to yeast cell cycle data downloaded from http://genome-www.stanford.edu/cellcycle/. We used the yeast alpha-factor data collected at 18 time points over 120 min, covering two full cell cycles. After removing genes with missing values, 4489 genes remained.
5.2.2 Choice of Fourier coefficients and clusters
To determine the J value and the number of clusters, we considered several J values and Bayesian Information Criterion (BIC) with the assumption that each cluster covariance has the same elliptical volume and shape. Since we found that the optimal J value varied for each function, we surmised that a true optimal J value may not exist. As such, we experimented with the model-based clustering using various numbers of clusters and J values.
Table 2 shows the median and average silhouette values, based on the Euclidean distance between samples, for model-based and K-means clustering with various J values and 5 clusters. Although J = 1 yields higher overall silhouette widths using both K-means and model-based clustering, we think a number larger than 1 is appropriate to extract enough information about the underlying change patterns. Judging from the highest overall silhouette value, the model-based method with 4 Fourier coefficients and 5 clusters was considered most appropriate. With K-means, the silhouette value with J = 1 is the largest. The silhouette value of K-means with J = 3 is larger than that of the model-based clustering with J = 4. It should therefore be noted that silhouette values based on Euclidean distance should not be the only criterion for comparing the two clustering models. Rather, as in the gene ontology analysis that follows, biological interpretation should be used to validate the clustering. The two methods also differ in what they measure: the model-based method, which incorporates density, connects the probability-neighboring data, while the K-means method measures intra-cluster homogeneity as cluster compactness.
Using the model-based method with J = 4, the 5 clusters contain 3032, 401, 164, 400 and 492 genes, respectively. Figure 2 shows the means and the 5th and 95th percentiles of the Fourier-estimated gene scores in the 5 clusters with sample derivative Fourier coefficients. The graph in the bottom right-hand corner of Figure 2 shows the estimated change patterns of the 5 clusters together. Supplementary Figure S1 shows the means of the 4 derivative Fourier coefficients as a cluster profile and gives the variation between clusters. Supplementary Figure S2 shows a chi-square plot of each cluster for multivariate normality with a dimension of 4. If the four derivative Fourier coefficients follow a multivariate normal distribution, the points would scatter around the line with a slope of 1. Even though the coefficients satisfy asymptotic multivariate normality, this assumption can also be checked with chi-square plots. Except for cluster 4, the clusters appear to have a slightly heavier tail than a normal distribution.
Supplementary Figure S3 and S4 show plots of
Owing to noise and the high dimensionality of data, careful consideration of statistical and biological validity is needed when analyzing the real microarray data.
5.2.3 Gene ontology analysis
In order to evaluate the result of the clustering analysis, we obtained Gene Ontology (GO) information for the clustered genes, covering biological processes, molecular functions and cellular components. The GO database provides a useful tool to annotate and analyze the functions of a large number of genes. We searched for statistically overrepresented GO annotations using GOstat, which evaluates the statistical significance of overrepresented functional and molecular mechanisms (Beissbarth and Speed, 2004). GOstat allows us to identify which annotations are typical for the group of genes. It derives the statistical significance of the difference between expected and observed functional categories based on Fisher's exact test.
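The over-representation test can be illustrated with a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution. The Python sketch below is a stand-alone illustration with hypothetical argument names; it does not reproduce GOstat itself, and it leaves out any multiple-testing correction that a full GO analysis would apply.

```python
from math import comb


def overrepresentation_pvalue(n_total, n_annotated, n_cluster, n_overlap):
    """One-sided Fisher's exact test p-value for over-representation.

    n_total     -- number of genes in the background set
    n_annotated -- background genes carrying the GO term
    n_cluster   -- number of genes in the cluster
    n_overlap   -- cluster genes carrying the GO term
    Returns P(X >= n_overlap) under the hypergeometric distribution.
    """
    upper = min(n_annotated, n_cluster)
    denom = comb(n_total, n_cluster)
    numer = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_cluster - k)
        for k in range(n_overlap, upper + 1)
    )
    return numer / denom


if __name__ == "__main__":
    # Toy numbers only: 10 genes, 4 annotated, a cluster of 3 with 2 annotated.
    print(overrepresentation_pvalue(10, 4, 3, 2))
```

A small p-value indicates that the term occurs in the cluster more often than expected if cluster membership were unrelated to the annotation.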
In order to compare our method with other clustering methods, we also applied K-means clustering (MacQueen, 1967) to the yeast cell cycle data. Table 3 shows some results of the overrepresented biological processes from the proposed method and the K-means clustering method for various values of k from 5 to 15.
In Table 3, the first column shows the cluster number of the proposed method. The second column summarizes the list of the selected overrepresented biological processes that had their children GO terms in the same cluster. For example, we first selected a total of 81 GO terms in cluster 1 by using GOstat and then selected 6 GO terms that had as many children nodes as possible in cluster 1. K-means clustering results were obtained in the same way. We compared the list of the overrepresented GO terms from the proposed method (second column) with that from the K-means clustering method. The black dots in Table 3 represent the GO terms that were selected by both methods. In summary, there are some GO terms that can only be detected by the proposed method, such as GO:0000209, GO:0000079, GO:0009086 and GO:0005978. In particular, all GO terms in cluster 5 of our proposed method are closely related to biosynthesis. The three GO terms in cluster 5, GO:0009086, GO:0006537 and GO:0005978, are rarely overrepresented by the K-means clustering method. Our proposed method not only found the GO terms that were not identified by the K-means method but also grouped them in the same cluster.
Furthermore, the genes in cluster 5 are closely related to the glucose metabolic pathway. For example, GLC3 (GO:0005978) encodes 1,4-glucan-6(1,4-glucano)-transferase, involved in glycogen accumulation. Glycogen in turn serves as a major storage carbohydrate (glucose) (Rowen et al., 1992). Free glucose is oxidized to pyruvate. The other genes from GO:0006537, GO:0006526 and GO:0009086 are related to the synthesis of amino acids. In the citric acid cycle, 15 ATPs and 3 CO2 are produced from one pyruvate molecule. IDP1 (GO:0006526) catalyzes the oxidation of isocitrate to alpha-ketoglutarate (Haselbeck and McAlister-Henn, 1993). GLT1 (GO:0006537) synthesizes glutamate from glutamine and alpha-ketoglutarate (Valenzuela et al., 1998). ARG1, ARG3 and ARG4 in GO:0006526 are involved in the synthesis of arginine from glutamate (Crabeel et al., 1988; Jauniaux et al., 1978). Oxaloacetate, an intermediate in the citric acid cycle, is the entry point for the metabolism of the underlying carbon structure of the amino acids aspartate and asparagine. MET2 (GO:0009086) is involved in the synthesis of methionine from aspartate (Masselot and De Robichon-Szulmajster, 1975). It catalyzes the conversion of homoserine to O-acetyl homoserine using one molecule of acetyl coenzyme A (acetyl-CoA) (Thomas and Surdin-Kerjan, 1997). These findings illustrate that our proposed methodology can identify genes that are biologically interpretable.
| 6 CONCLUDING REMARKS |
|---|
The method proposed in this study provides an efficient tool for clustering curves of the same change pattern by Fourier estimation.
Because Fourier coefficients can give information on both the original underlying functions and their derivatives, we used the sample Fourier coefficients of derivatives to summarize the change patterns. We demonstrated the effectiveness of our approach using model-based clustering of change patterns. Although we assumed that the residuals within each curve over time were independent and had constant variance, due to the large number of repetitions, we found that it is not necessary to assume independence between curves.
There are several areas that deserve further research. Determining the number of Fourier coefficients and selecting the number of clusters are topics that many researchers are actively pursuing. Also, a study is needed on how to handle the instability of Fourier estimation caused by outliers when only a small number of repeated time points is available. Another topic of future research is to develop validity measures incorporating the probability framework for clusters. When further information about correlations is available, a time series analysis approach would also be an area worthy of consideration.
| ACKNOWLEDGEMENTS |
|---|
We are grateful to Dr Carroll, Dr Hart and Dr Vannucci for their motivation and advice. We also thank the referees for their constructive comments. This work took place during J.K.'s visit to the Bioinformatics Training Program at the Department of Statistics at Texas A&M University and her research was supported by the Korea Research Foundation (R04-2004-000-10138-0). The work of H.K. was supported by the National Research Laboratory Program of Korea Science and Engineering Foundation (M10500000126).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on March 21, 2007; revised on November 7, 2007; accepted on November 8, 2007
| REFERENCES |
|---|
Azuaje F. A cluster validity framework for genome expression data. Bioinformatics (2002) 18:319–320.
Banfield JD, Raftery AE. Model-based Gaussian and non-Gaussian clustering. Biometrics (1993) 49:803–821.
Beissbarth T, Speed TP. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics (2004) 20:1464–1465.
Beran R, Dumbgen L. Modulation of estimators and confidence sets. Ann. Stat. (1998) 26:1826–1856.
Crabeel M, et al. Arginine repression of the Saccharomyces cerevisiae ARG1 gene. Comparison of the ARG1 and ARG3 control regions. Curr. Genet. (1988) 3:113–124.
Ernst J, et al. Clustering short time series gene expression data. Bioinformatics (2005) 21:159–168.
Eubank R, Hart JD. Testing goodness-of-fit via order selection criteria. Ann. Stat. (1992) 20:1412–1425.
Fraley C, Raftery AE. MCLUST: software for model-based cluster analysis. J. Classif. (1999) 16:297–306.
Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. (2002) 97:611–631.
Freedman D, Lane D. The empirical distribution of Fourier coefficients. Ann. Stat. (1980) 8:1244–1251.
Haselbeck RJ, McAlister-Henn L. Function and expression of yeast mitochondrial NAD- and NADP-specific isocitrate dehydrogenases. J. Biol. Chem. (1993) 268:12116–12122.
Jauniaux JC, et al. Arginine metabolism in Saccharomyces cerevisiae: subcellular localization of the enzymes. J. Bacteriol. (1978) 133:1096–1107.
Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis (1990) New York: Wiley.
Kim B, et al. Clustering periodic patterns of gene expression based on Fourier approximations. Curr. Genomics (2006) 7:197–203.
Lai Y, et al. A statistical method for identifying differential gene-gene co-expression patterns. Bioinformatics (2004) 20:3146–3155.
Li J, Wong L. Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics (2002) 18:725–734.
Masselot M, De Robichon-Szulmajster H. Methionine biosynthesis in Saccharomyces cerevisiae. I. Genetical analysis of auxotrophic mutants. Mol. Gen. Genet. (1975) 139:121–132.
MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability (1967) 1. Berkeley: University of California Press. 281–297.
Murtagh F, Raftery AE. Fitting straight lines to point patterns. Pattern Recognit. (1984) 17:479–483.
Murthy K.RK, Hua LJ. Improved Fourier transform method for unsupervised cell-cycle regulated gene prediction. Proc. IEEE Comput. Syst. Bioinform. Conf. (2004) 194–203.
Park T, et al. Statistical tests for identifying differentially expressed gene in time-course microarray experiments. Bioinformatics (2003) 19:694–703.
Pollard D. A central limit theorem for K-means clustering. Ann. Stat. (1982) 10:919–926.
Rousseeuw PJ. Silhouettes: graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. (1987) 20:53–65.
Rowen DW, et al. GLC3 and GHA1 of Saccharomyces cerevisiae are allelic and encode the glycogen branching enzyme. Mol. Cell Biol. (1992) 12:22–29.
Serban N, Wasserman L. CATS: clustering after transformation and smoothing. J. Am. Stat. Assoc. (2005) 100:990–999.
Spellman PT, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell (1998) 9:3273–3297.
Thomas D, Surdin-Kerjan Y. Metabolism of sulfur amino acids in Saccharomyces cerevisiae. Microbiol. Mol. Biol. Rev. (1997) 61:503–532.
Valenzuela L, et al. Regulation of expression of GLT1, the gene encoding glutamate synthase in Saccharomyces cerevisiae. J. Bacteriol. (1998) 180:3533–3540.
Yeung KY, et al. Model based clustering and data transformations for gene expression data. Bioinformatics (2001) 17:977–998.
Zhang L, et al. Fourier harmonic approach for visualizing temporal patterns of gene expression data. Proc. IEEE Comput. Syst. Bioinform. Conf. (2003) 2:137–147.