Bioinformatics Advance Access originally published online on October 27, 2005
Bioinformatics 2006 22(1):68-76; doi:10.1093/bioinformatics/bti742
Classification using functional data analysis for temporal gene expression data
1Wake Forest University School of Medicine, Public Health Sciences, Section on Biostatistics Medical Center Blvd., MRI-3, Winston-Salem, NC 27157, USA
2Department of Statistics, University of California, One Shields Avenue Davis, CA 95616, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Temporal gene expression profiles provide an important characterization of gene function, as biological systems are predominantly developmental and dynamic. We propose a method of classifying collections of temporal gene expression curves in which individual expression profiles are modeled as independent realizations of a stochastic process. The method uses a recently developed functional logistic regression tool based on functional principal components, aimed at classifying gene expression curves into known gene groups. The number of eigenfunctions in the classifier can be chosen by leave-one-out cross-validation with the aim of minimizing the classification error.
Results: We demonstrate that this methodology provides low-error-rate classification for both yeast cell-cycle gene expression profiles and Dictyostelium cell-type specific gene expression patterns. It also works well in simulations. We compare our functional principal components approach with a B-spline implementation of functional discriminant analysis for the yeast cell-cycle data and simulations. This indicates comparative advantages of our approach which uses fewer eigenfunctions/base functions. The proposed methodology is promising for the analysis of temporal gene expression data and beyond.
Availability: MATLAB programs are available upon request.
Contact: ileng{at}wfubmc.edu
Supplementary information: Supplementary materials are available on the journal's website.
| 1 INTRODUCTION |
|---|
Since cDNA and oligonucleotide microarray techniques were developed to monitor the expression of many genes in parallel (Schena et al., 1995, 1996), this high-capacity system has been applied routinely for identifying and analyzing genes involved in various biological processes in different organisms (Spellman et al., 1998; Cho et al., 1998, 2001; Eisen et al., 1998; Wen et al., 1998; Golub et al., 1999; Iyer et al., 1999; White et al., 1999; Hill et al., 2000; Laub et al., 2000; Iranfar et al., 2001; Breyne and Zabeau, 2001). Recently, microarray experiments have been widely used to collect large-scale temporal data to monitor gene expression underlying development or other dynamic processes in many organisms. For example, a precise regulation of gene activity probably controls the molecular processes of DNA replication, chromosome segregation and mitosis during the cell cycle, which makes the study of cell-cycle dependent genome-wide expression an attractive system for genetic analysis. The first genome-wide expression analyses of cell-cycle regulating genes were performed in budding yeast by Spellman et al. (1998). More recently, several other genome-wide expression studies of cell-cycle regulated genes have been completed in bacteria (Laub et al., 2000), fission yeast (Rustici et al., 2004; Peng et al., 2005), plants (Breyne and Zabeau, 2001) and humans (Cho et al., 2001).
Another widely studied organism is the amoeba, Dictyostelium discoideum, which provides opportunities for studying fundamental cellular processes, including aspects of development such as cell-type determination. Recent work on Dictyostelium includes a review by Mohanty and Firtel (1999) on mechanisms controlling spatial patterning and cell-type proportioning, and a study of gene expression patterns with microarrays by Shaulsky and Loomis (2002). Cell-type specific gene expression patterns were studied in Dictyostelium by Iranfar et al. (2001). Other types of gene expression data were generated in large-scale temporal gene expression studies in the mapping of development of the mouse central nervous system (Wen et al., 1998), physiological response of human fibroblasts to serum (Iyer et al., 1999), and development of Caenorhabditis elegans (Hill et al., 2000) and Drosophila (White et al., 1999; Arbeitman et al., 2002). Information gleaned from the analysis of temporal gene expression profiles will provide an added dimension to insights into the characterization of gene function.
For these large-scale data, classifying genes into different functional groups is a first step towards more sophisticated knowledge of different biological pathways and/or functions. Many classification analyses have been performed for such temporal gene expression profiles. The methods used include hierarchical clustering (Spellman et al., 1998; Eisen et al., 1998; Wen et al., 1998; Iyer et al., 1999; Gasch et al., 2000; Qin et al., 2003), k-means clustering (Tavazoie et al., 1999; Wu et al., 2003), principal component analysis (PCA) and singular value decomposition (SVD) (Alter et al., 2000, 2003; Raychaudhuri et al., 2000; Li et al., 2002; Holter et al., 2000), self-organizing maps or their variants (Tamayo et al., 1999; Nikkila et al., 2002; Resson et al., 2003), correlation analysis (Kruglyak and Tang, 2001) and independent component analysis (ICA) (Liebermeister, 2002; Lee and Batzoglou, 2003), as well as simulated annealing (Lukashin and Fuchs, 2001) and support vector machines (SVM) (Brown et al., 2000).
These statistical and computational methods belong to the general framework of multivariate analysis, i.e. data are treated as vectors of discrete samples and permutation of components will not affect the analysis results, hence the timing of the biological processes is irrelevant in these analyses. A more efficient way to look at such data is to incorporate the information that is inherent in time order and smoothness of processes over time. The tools for such an approach are provided by the recently developed methodology of functional data analysis (FDA; Ramsay and Silverman, 2005), especially discrimination and classification methods based on FDA (Hall et al., 2001; James and Hastie, 2001; Müller, 2005), dynamic time warping (Aach and Church, 2001; Liu and Müller, 2003) and periodicity analysis (Zhao et al., 2004). Recent non-parametric applications for the analysis of temporal gene expression data include work by Klevecz and Murray (2001), Luan and Li (2003) and Bar-Joseph et al. (2003). In the latter two papers, B-spline approaches to cluster genes based on mixed effects and mixture models were emphasized.
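The permutation argument can be checked numerically. The following is a minimal sketch on hypothetical random-walk curves (not the data analyzed in this paper): PCA eigenvalues and pairwise Euclidean distances are unchanged when the time points are scrambled, whereas a simple roughness measure, introduced here only for illustration, depends on the time order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 "genes" observed at 18 time points (random-walk curves).
curves = np.cumsum(rng.normal(size=(30, 18)), axis=1)

# Scramble the time order in the same way for every gene.
perm = rng.permutation(18)
scrambled = curves[:, perm]

def pca_eigenvalues(data):
    """Eigenvalues of the sample covariance matrix, sorted in decreasing order."""
    cov = np.cov(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def pairwise_distances(data):
    """Euclidean distances between all pairs of rows."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def roughness(data):
    """Sum of squared first differences along time; sensitive to time order."""
    return float((np.diff(data, axis=1) ** 2).sum())

same_spectrum = np.allclose(pca_eigenvalues(curves), pca_eigenvalues(scrambled))
same_distances = np.allclose(pairwise_distances(curves), pairwise_distances(scrambled))
print(same_spectrum, same_distances, roughness(curves), roughness(scrambled))
```

Methods built only on the covariance spectrum or on pairwise distances therefore cannot use the smoothness of the profiles over time.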
We view the observed gene expression profiles as independent realizations of a smooth stochastic process. The covariance function of the process is then also smooth and can be expanded into smooth orthogonal eigenfunctions (functional principal components), leading to the Karhunen-Loève representation of the observed sample paths as a sum of a smooth mean trend and an expansion of the random part in terms of these eigenfunctions. A truncated version of the random part of this representation serves as a statistical approximation of the random process (Rice and Silverman, 1991). In this paper, we consider functional discrimination through logistic regression based on functional principal components. We demonstrate the usefulness of this approach in a simulation study and for the analysis of yeast cell-cycle temporal data, as well as for data on the differential expression patterns of Dictyostelium cell-type specific genes. Although our methods do not require a regular time design, these two datasets happen to have equally spaced time points. For more details on FPCA methods for irregular and/or sparse data, see Yao et al. (2005).
An alternative way to model random curves is provided by B-splines, which have been previously used for clustering problems. Rice and Wu (2001) proposed a non-parametric mixed effects model based on B-splines (see also Shi et al., 1996), and Luan and Li (2003) emphasized cluster analysis derived from these approaches. We compared our approach based on functional principal components with a B-spline implementation of functional discriminant analysis for the yeast cell-cycle data and in a simulation study. This indicates comparative advantages of our approach, which unlike the B-spline based model does not rely on Gaussian assumptions.
| 2 MODELS AND METHODS |
|---|
2.1 Functional Principal Component Analysis (FPCA)
We model the sample curves as independent realizations of a square integrable stochastic process $X(t)$ on $[0, T]$, with mean $E\{X(t)\} = \mu(t)$ and covariance function $\operatorname{cov}\{X(s), X(t)\} = G(s, t)$ (Rice and Silverman, 1991; Capra and Müller, 1997). By Mercer's Theorem, $G(s, t)$ has an orthogonal expansion in $L^2([0, T])$:
$$G(s, t) = \sum_{m=1}^{\infty} \lambda_m \phi_m(s) \phi_m(t), \tag{1}$$
where $\phi_m$ and $\lambda_m$ are the eigenfunctions and eigenvalues, ordered by size, $\lambda_1 \ge \lambda_2 \ge \ldots$.
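A random curve from the population then has the following Karhunen-Loève representation:
$$X(t) = \mu(t) + \sum_{m=1}^{\infty} \xi_m \phi_m(t), \tag{2}$$
where
$$\xi_m = \int_0^T \{X(t) - \mu(t)\}\, \phi_m(t)\, dt \tag{3}$$
are uncorrelated random variables with $E(\xi_m) = 0$, $\operatorname{var}(\xi_m) = \lambda_m$ and $\sum_m \lambda_m < \infty$. The eigenfunctions $\phi_m$ are referred to as functional principal components (FPCs) with FPC scores $\xi_m$.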
The deviation of each sample function from the mean is thus represented as a sum of orthogonal curves with uncorrelated random coefficients. We shall suppose that the mean curve and the FPCs are smooth and that the random part can be sufficiently well approximated by the first M FPCs, for an $M < \infty$; we discuss methods for choosing M data-adaptively in Supplement (S3).
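That the random coefficients are uncorrelated follows from the orthonormality of the eigenfunctions. A short derivation, writing $\xi_m = \int_0^T \{X(t) - \mu(t)\}\phi_m(t)\,dt$ for the m-th coefficient, $\delta_{mk}$ for the Kronecker delta, and assuming that expectation and integration may be interchanged:
$$
\begin{aligned}
E(\xi_m \xi_k) &= \int_0^T\!\!\int_0^T E\big[\{X(s) - \mu(s)\}\{X(t) - \mu(t)\}\big]\, \phi_m(s)\, \phi_k(t)\, ds\, dt \\
&= \int_0^T\!\!\int_0^T G(s, t)\, \phi_m(s)\, \phi_k(t)\, ds\, dt \\
&= \lambda_m \int_0^T \phi_m(t)\, \phi_k(t)\, dt = \lambda_m \delta_{mk}.
\end{aligned}
$$
The third step uses the eigen-equation $\int_0^T G(s, t)\phi_m(s)\,ds = \lambda_m \phi_m(t)$, so that distinct coefficients have zero covariance and the m-th coefficient has variance $\lambda_m$.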
For a sample of n random curves observed on a closed interval $[0, T]$, let $X_i = (X_i(t_{i1}), X_i(t_{i2}), \ldots, X_i(t_{in_i}))^T$ be the vector of observations for the random curve $X_i(\cdot)$ at time points $t_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, n_i$. An estimate $\hat{\mu}(t)$ of the mean function $\mu(t)$ can be obtained by any linear scatterplot smoother (see Supplement (S1); compare Fan and Gijbels, 1996).
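Forming a dense grid $s_k$ of $[0, T]$, e.g. $s_k = \frac{k - 1}{S - 1}\, T$, $k = 1, \ldots, S$, for a suitably large $S$, the estimation of the covariance function $G(s, t)$ proceeds via the empirical covariances
$$G_i(s_k, s_l) = \{X_i(s_k) - \hat{\mu}(s_k)\}\{X_i(s_l) - \hat{\mu}(s_l)\}, \tag{4}$$
for $k \ne l$. For the case of irregular time grids, a pre-smoothing step may be included. The empirical covariances are then smoothed, using a 2D scatterplot smoother on the dense grid of points $(s_k, s_l)$, $k, l = 1, \ldots, S$ (see Supplement (S1)). A spectral analysis is performed on the resulting $S \times S$ matrix $\hat{G}$, yielding the first M eigenvectors/eigenvalues for $\hat{G}$. The m-th eigenvector is $\hat{\phi}_m = (\hat{\phi}_m(s_1), \ldots, \hat{\phi}_m(s_S))^T$ with the corresponding eigenvalue $\hat{\lambda}_m$ for $m = 1, \ldots, M$. The FPC scores $\hat{\xi}_{im}$ for the i-th gene and the m-th FPC are obtained numerically by
$$\hat{\xi}_{im} = \sum_{j=2}^{n_i} \{X_i(t_{ij}) - \hat{\mu}(t_{ij})\}\, \hat{\phi}_m(t_{ij})\, (t_{ij} - t_{i,j-1}). \tag{5}$$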
Individual temporal gene expression profiles can then be predicted, using their FPC scores, by
$$\hat{X}_i(t) = \hat{\mu}(t) + \sum_{m=1}^{M} \hat{\xi}_{im}\, \hat{\phi}_m(t). \tag{6}$$
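The estimation steps of this section can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the MATLAB implementation referred to in the abstract: all curves share one observation grid, the mean and covariance are not smoothed, the diagonal of the covariance is kept, and the function name `fpca` and the synthetic two-component data are hypothetical.

```python
import numpy as np

def fpca(curves, grid, n_components):
    """FPCA for curves on a common grid (rows = curves, columns = time points)."""
    mean = curves.mean(axis=0)                      # cross-sectional mean, no smoothing
    centered = curves - mean
    cov = centered.T @ centered / curves.shape[0]   # raw covariance on the grid

    # Trapezoidal quadrature weights, so that eigenvectors approximate
    # eigenfunctions that are orthonormal in L2([0, T]).
    w = np.empty_like(grid, dtype=float)
    w[1:-1] = (grid[2:] - grid[:-2]) / 2.0
    w[0] = (grid[1] - grid[0]) / 2.0
    w[-1] = (grid[-1] - grid[-2]) / 2.0
    sqrt_w = np.sqrt(w)

    # Symmetric eigenproblem for W^{1/2} G W^{1/2}.
    evals, evecs = np.linalg.eigh(sqrt_w[:, None] * cov * sqrt_w[None, :])
    order = np.argsort(evals)[::-1][:n_components]
    eigenvalues = evals[order]
    eigenfunctions = (evecs[:, order] / sqrt_w[:, None]).T   # shape (M, S)

    # Scores by numerical integration; fitted curves from the truncated expansion.
    scores = (centered * w) @ eigenfunctions.T
    fitted = mean + scores @ eigenfunctions
    return mean, eigenvalues, eigenfunctions, scores, fitted

# Synthetic example: 90 curves at 18 time points spanned by two smooth components.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 119.0, 18)
T = grid[-1]
phi1 = np.sqrt(2.0 / T) * np.sin(2 * np.pi * grid / T)
phi2 = np.sqrt(2.0 / T) * np.cos(2 * np.pi * grid / T)
true_scores = rng.normal(size=(90, 2)) * np.array([3.0, 1.0])
curves = true_scores @ np.vstack([phi1, phi2])

mean, eigenvalues, eigenfunctions, scores, fitted = fpca(curves, grid, n_components=2)
gram = (eigenfunctions * w_check) @ eigenfunctions.T if False else None  # placeholder, unused
max_abs_error = float(np.abs(fitted - curves).max())
print(eigenvalues, max_abs_error)
```

Because the synthetic curves lie in a two-dimensional space, two components reproduce them up to rounding error; with real, noisy profiles the truncated expansion only approximates each curve.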
2.2 Functional Logistic Regression
Generalized linear models are extensions of classical linear models with the following three components (McCullagh and Nelder, 1989): a random component, where the responses $Y$ follow a distribution from an exponential family, with means $E(Y) = \mu$; linear predictors, $\eta = \sum_p x_p \beta_p$, where $x_p$ is the p-th predictor variable; and a monotone link function, $g(\mu) = \eta$. When Y is binomial, this is a binomial regression model. A special case is logistic regression, where the link function $g(\cdot)$ is the logit function, i.e. $\operatorname{logit}(x) = \log\{x/(1 - x)\}$, so that $g^{-1}(x) = e^x/(1 + e^x)$.
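As a small numerical illustration of the logit link and its inverse, the sketch below uses only the Python standard library; the function names `logit` and `inv_logit` are chosen here for illustration.

```python
import math

def logit(p):
    """Link function g: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: maps a linear predictor to a probability (numerically stable form)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Round trip: applying the inverse link to the link returns the probability.
for p in (0.1, 0.4, 0.5, 0.9):
    eta = logit(p)
    print(p, round(eta, 4), round(inv_logit(eta), 4))
```

The two branches of `inv_logit` are algebraically identical; the split only avoids overflow for large negative or positive predictors.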
In the framework of the classification problem, the response Y denotes membership in one of two groups, say G0 and G1, coded as a binary random variable, where Y = 1 if the observation comes from G1 and Y = 0 if it comes from G0 (Efron, 1975; Press and Wilson, 1978). The predictor function $X(t)$, $t \in [0, T]$, is from now on assumed to be a centered random curve, i.e. $\mu(t) \equiv 0$. For an i.i.d. sample $X_i(t)$, $i = 1, \ldots, n$, the linear predictors are defined by $\eta_i = \alpha + \int_0^T \beta(t) X_i(t)\, dt$, leading to the functional generalized linear model (James, 2002; Müller and Stadtmüller, 2005):
$$Y_i = g^{-1}\!\left(\alpha + \int_0^T \beta(t) X_i(t)\, dt\right) + e_i, \quad i = 1, \ldots, n, \tag{7}$$
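where $\alpha$ is a constant and $\beta(\cdot)$ is the parameter function. The errors $e_i$ are assumed to be independent, with $E(e_i) = 0$ and $\operatorname{var}(e_i) < C < \infty$. The M-truncated model (see S2) becomes
$$Y_i = g^{-1}\!\left(\alpha + \sum_{m=1}^{M} \beta_m \xi_{im}\right) + e_i, \quad i = 1, \ldots, n, \tag{8}$$
where the unknown parameter vector $(\alpha, \beta_1, \beta_2, \ldots, \beta_M)$ can be estimated by solving the estimating or score equation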
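$$\sum_{i=1}^{n} (Y_i - \mu_i)\, \frac{\partial \mu_i / \partial \eta_i}{\sigma^2(\mu_i)}\, z_i = 0, \tag{9}$$
where $\mu_i = g^{-1}(\eta_i)$, $\eta_i = \alpha + \sum_{m=1}^{M} \beta_m \xi_{im}$, $z_i = (1, \xi_{i1}, \ldots, \xi_{iM})^T$ and $\sigma^2(\mu_i) = \operatorname{var}(Y_i)$.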
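For functional binomial regression, as in classical binomial regression for discriminant analysis, set $\pi_i = P(Y_i = 1)$ and prior probabilities $p_1$ and $p_0$ for the groups G1 and G0, respectively. We estimate $\pi_i$ by $\hat{\pi}_i = g^{-1}(\hat{\alpha} + \sum_{m=1}^{M} \hat{\beta}_m \hat{\xi}_{im})$. Then we classify the i-th observation into G1 if $\hat{\pi}_i$ exceeds a threshold determined by the prior probabilities, otherwise into G0.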
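The estimation and classification steps can be sketched as follows. This is an illustrative Newton-Raphson solver for the logistic case, in which the score equation reduces to $\sum_i (Y_i - \mu_i) z_i = 0$ with $z_i = (1, \xi_{i1}, \ldots, \xi_{iM})^T$. The function names, the tiny ridge term, the synthetic scores and the cut-off of 0.5 (reasonable for equal priors) are assumptions made here, not part of the method as published.

```python
import numpy as np

def fit_logistic(scores, y, n_iter=25, ridge=1e-6):
    """Newton-Raphson for logistic regression on FPC scores.

    scores : (n, M) array of FPC scores; y : (n,) array of 0/1 labels.
    The small ridge term is an added numerical safeguard, so the solved
    equation is sum_i (y_i - mu_i) z_i - ridge * beta = 0.
    """
    z = np.column_stack([np.ones(len(y)), scores])   # rows (1, xi_i1, ..., xi_iM)
    beta = np.zeros(z.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(z @ beta)))
        grad = z.T @ (y - mu) - ridge * beta
        hess = (z * (mu * (1.0 - mu))[:, None]).T @ z + ridge * np.eye(z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_proba(beta, scores):
    """Estimated probabilities of membership in group G1."""
    z = np.column_stack([np.ones(scores.shape[0]), scores])
    return 1.0 / (1.0 + np.exp(-(z @ beta)))

# Synthetic scores for two overlapping groups (hypothetical values).
rng = np.random.default_rng(2)
n = 200
y = (np.arange(n) % 2).astype(float)
shift = np.where(y[:, None] == 1, 1.0, -1.0) * np.array([0.6, 0.5])
scores = rng.normal(size=(n, 2)) + shift

beta = fit_logistic(scores, y)
prob = predict_proba(beta, scores)
threshold = 0.5                      # assumed cut-off for equal priors
predicted = (prob > threshold).astype(float)
error_rate = float(np.mean(predicted != y))
print(beta, error_rate)
```

With unequal group sizes, the cut-off would be adjusted according to the prior probabilities rather than fixed at 0.5.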
| 3 RESULTS |
|---|
3.1 Application to temporal gene expression data for yeast cell cycle
3.1.1 Functional discriminant analysis
Temporal gene expression data (α-factor synchronized) for the yeast cell cycle were obtained by Spellman et al. (1998). There are 6178 genes in total, and each gene expression time-course consists of 18 data points, measured every 7 min between 0 and 119 min. Of 90 genes, which were identified by traditional methods and have data available, 44 are known to be related to G1 phase regulation and 46 to non-G1 phase regulation (i.e. S, S/G2, G2/M and M/G1 phases) of the yeast cell cycle; these serve as the training set. The expression profiles for these 90 genes are depicted in Figure 1 and are differentiated into phase-specific groups in Figure 2. The estimated covariance surface for these 90 genes in Figure 3 illustrates the pattern of time-dependence of gene expression and provides the basis for constructing the eigenfunctions by spectral decomposition. The diagonal elements of the empirical covariances in (4) were not used when constructing this surface estimate, as these elements may reflect additional measurement errors.
The number M of FPCs is chosen by minimizing the leave-one-gene-out cross-validation classification error rate. For each gene in the training set, the FPC scores are estimated from the data of the other 89 genes. Then a functional logistic regression model is fitted for these 89 genes, and group membership for the left-out gene is predicted; this procedure is iterated over all 90 genes, providing the cross-validated predictions.
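The leave-one-out procedure can be sketched as below. The sketch makes several simplifying assumptions that are labeled here rather than taken from the paper: principal directions are computed by an SVD of unsmoothed curves on a common equally spaced grid, the logistic fit uses a ridge term of 0.1 for numerical stability, the cut-off is 0.5, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

def fpc_basis(train, n_components):
    """Mean and leading principal directions of curves on a common grid."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def fit_logistic(x, y, n_iter=50, ridge=0.1):
    """Ridge-stabilised Newton-Raphson for logistic regression (intercept included)."""
    z = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(z.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(z @ beta)
        grad = z.T @ (y - mu) - ridge * beta
        hess = (z * (mu * (1.0 - mu))[:, None]).T @ z + ridge * np.eye(z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def loo_error(curves, y, n_components, threshold=0.5):
    """Leave-one-curve-out misclassification rate for a given number of components."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        # Basis and classifier are both re-estimated without curve i.
        mean, basis = fpc_basis(curves[keep], n_components)
        beta = fit_logistic((curves[keep] - mean) @ basis.T, y[keep])
        score_i = (curves[i] - mean) @ basis.T
        prob_i = sigmoid(beta[0] + score_i @ beta[1:])
        wrong += int(bool(prob_i > threshold) != bool(y[i]))
    return wrong / n

# Synthetic two-group curves at 18 equally spaced time points.
rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 18)
y = np.repeat([0.0, 1.0], 30)
signal = np.where(y[:, None] == 1, 0.7, -0.7) * np.sin(2 * np.pi * grid)
curves = signal + rng.normal(scale=1.0, size=(60, 18))

errors = {m: loo_error(curves, y, m) for m in range(1, 6)}
best_m = min(errors, key=errors.get)   # smallest M attaining the minimum error
print(errors, best_m)
```

Re-estimating the basis inside each fold matters: computing the components once on all curves would let the left-out curve influence its own prediction.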
The solid line in Figure 4 displays the cross-validation classification error rate as a function of M, the number of FPCs. This plot indicates that when the first five FPCs are used, the overall cross-validation classification error rate is at a minimum 10.00%, with the misclassification rate for G1 genes estimated at 11.36% and for non-G1 genes at 8.70%.
The first five FPCs for the gene expression curves in the training set are depicted in Figure 5. Plotting the FPC scores for the second FPC versus the first FPC reveals interesting patterns for genes of different phases (Figure 6, left panel). We find that although both G1 and S genes tend to have positive second FPC scores, all the S genes have positive first FPC scores, whereas most G1 genes are on the negative side. However, most S/G2, G2/M and M/G1 genes have negative second scores; S/G2 and G2/M genes also tend to have positive first FPC scores. The discrimination between S and G1 genes and also between G1/S and non-G1/S genes based on the first two FPC scores is seen to be relatively clear-cut. Similar plots can be produced for each pair of the first five FPC scores. The pairwise scatterplots also highlight the order of the phases. This feature appears to be most evident in the scatterplot of the third versus second FPC scores (Figure 6, right panel). The clockwise order of the genes is G1 → S → S/G2 → G2/M → M/G1 → G1.
A closer look at the misclassified genes showed that there were five genes in the G1 group that were misclassified into the non-G1 group. The left panel of Figure 7 displays four of these five genes, overlaid with the trajectories of G1 genes and S genes. It appears that the trajectories of these genes are in fact close to those of the S genes. The right panel in Figure 7 displays the fifth misclassified gene in the G1 group, overlaid with trajectories of G1 genes and M/G1 genes. This gene's trajectory is seen to be close to the M/G1 trajectories. It appears likely that these five genes are actually non-G1 genes, but somehow were erroneously identified as G1 genes using traditional methods. There are four misclassified genes in the non-G1 group (data not shown). Upon close inspection, we find that the trajectories of two of these four genes are closer to those of the G1 group than to those of the non-G1 group. The trajectory of one gene cannot be clearly associated with either group and the fourth gene lies on the boundary between the two groups.
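The reported cross-validation error rates follow directly from these counts, with 5 of the 44 G1 genes and 4 of the 46 non-G1 genes misclassified:
$$\frac{5}{44} \approx 11.36\%, \qquad \frac{4}{46} \approx 8.70\%, \qquad \frac{5 + 4}{44 + 46} = \frac{9}{90} = 10.00\%.$$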
3.1.2 Comparison with B-spline based method
Rice and Wu (2001) and Shi et al. (1996) proposed a mixed effects model for unequally sampled noisy curves. Let Xi = (Xi(ti1), Xi(ti2), ..., Xi (tini))T be the vector of observations for the ith curve for i = 1, 2, ..., n, where Xij = Xi(tij) is the observed function value at time tij, j = 1, ..., ni. Note that the setup is the same as described above.
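The approximating model of Rice and Wu is
$$X_i(t_{ij}) = \sum_{k} \beta_k \bar{B}_k(t_{ij}) + \sum_{l} \gamma_{il} B_l(t_{ij}),$$
where $\bar{B}_k(\cdot)$ and $B_l(\cdot)$ are possibly different B-spline bases on $[0, T]$. The $\gamma_{il}$ are random effects, with $E(\gamma_{il}) = 0$ and covariance matrix $\operatorname{cov}(\gamma_i) = \Gamma$ for $\gamma_i = (\gamma_{i1}, \gamma_{i2}, \ldots)^T$. The corresponding estimate of an individual trajectory is the smooth curve
$$\hat{X}_i(t) = \sum_{k} \hat{\beta}_k \bar{B}_k(t) + \sum_{l} \hat{\gamma}_{il} B_l(t).$$
Classification is then based on the estimated random effects $\hat{\gamma}_i$, and the dashed line in Figure 4 shows the overall misclassification error rates. Using B-splines, the misclassification error rate attains its lowest value of 11.1% for 11 bases, whereas the minimum error rate using FPCA is 10% and is achieved with only 5 FPCs, as described above. Thus FPCA is seen to be advantageous in this example by employing fewer components than the B-spline approach, while simultaneously yielding a slightly lower misclassification error rate.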
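To make the B-spline representation concrete, the sketch below evaluates a clamped cubic B-spline basis with the Cox-de Boor recursion and fits one curve by ordinary least squares. This per-curve fit is a simpler stand-in for the mixed effects estimation of Rice and Wu, and the knot placement, the number of basis functions (8) and the synthetic curve are arbitrary choices made here for illustration.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x by the Cox-de Boor recursion.

    knots must be non-decreasing, with boundary knots repeated degree + 1 times.
    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open knot intervals [t_j, t_{j+1}).
    b = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    # Close the last non-empty interval on the right so that x = t[-1] is covered.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    b[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        nb = np.zeros((len(x), len(t) - d - 1))
        for j in range(len(t) - d - 1):
            left_den = t[j + d] - t[j]
            right_den = t[j + d + 1] - t[j + 1]
            if left_den > 0:
                nb[:, j] += (x - t[j]) / left_den * b[:, j]
            if right_den > 0:
                nb[:, j] += (t[j + d + 1] - x) / right_den * b[:, j + 1]
        b = nb
    return b

degree = 3
T = 119.0
interior = np.linspace(0.0, T, 6)[1:-1]           # 4 interior knots (arbitrary choice)
knots = np.concatenate([[0.0] * (degree + 1), interior, [T] * (degree + 1)])
grid = np.linspace(0.0, T, 18)
B = bspline_basis(grid, knots, degree)            # 18 x 8 design matrix

rng = np.random.default_rng(4)
curve = np.sin(2 * np.pi * grid / T) + rng.normal(scale=0.1, size=grid.size)
coef, *_ = np.linalg.lstsq(B, curve, rcond=None)  # basis coefficients for this curve
smooth = B @ coef
print(B.shape, np.round(coef, 3))
```

In a classifier of this type, the fitted coefficients of each curve play the role that the FPC scores play in the functional principal components approach.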
3.2 Application to expression patterns of cell-type specific genes in Dictyostelium
Iranfar et al. (2001) studied expression patterns of cell-type specific gene fragments in Dictyostelium discoideum. Such studies are of particular interest for Dictyostelium, since only prestalk and prespore cells are differentiated during development. DNA microarrays carrying 690 targets were used to determine expression profiles during development. Fitting a biologically based kinetic equation to extract the times of transcription onset and cessation, the authors recognized 35 cell-type specific genes, including 17 newly identified ones, which were confirmed by Northern blots. We used these 35 genes, with 14 prestalk genes and 21 prespore genes, as our training set to explore other potential cell-type specific genes. Figure 8 shows the relative intensity of the signals for prestalk and prespore genes. A considerable number of prestalk genes peaked between 8 and 10 h of development and then decreased significantly, whereas most prespore genes were not expressed until 10 h of development and continued to be expressed thereafter.
We used these genes as the training set for functional discriminant analysis. Cross-validation error rates indicated that using the first three FPCs yields the lowest overall misclassification rate of 22.86%, with 28.57% for prestalk genes and 19.05% for prespore genes. Misclassified prestalk genes are highlighted in the left panel of Figure 9. Besides emcA and emcB, which were screened out by Iranfar et al. (2001) owing to their prespore-like profiles, we found that ostB and mitA might not be correctly classified either. These two genes show no cell-type specific features yet were grouped into prestalk genes. For another gene, rasD, mentioned by Iranfar et al. (2001), the estimated probability of classification into the prespore group, 0.4376, only slightly exceeds the threshold of 0.4, which is the prior probability according to the training set. The right panel of Figure 9 highlights the four misclassified prespore-specific genes. Gene cbpB shows an early peak at 8 h, and follows the pattern of prestalk genes. Genes sodA and thfA did not show obvious cell-specific features in their expression patterns. Gene cprF did not start to express until 20 h, which may have contributed to its misclassification.
We then used the model fitted to the training set to classify the rest of the genes, and chose ranges of estimated probabilities for a gene to be classified into the prestalk group of [0.95, 1], [0.85, 0.95) and [0.75, 0.85), in order to identify subgroups of prestalk genes. Analogous probability ranges of [0, 0.05], (0.05, 0.15] and (0.15, 0.25] were used to identify subgroups of prespore genes. The results are shown in Figure 10. With these three classification probability ranges, 40 prestalk-specific and 36 prespore-specific genes were identified, displaying reasonably homogeneous patterns within each identified subgroup. The genes in the upper left panel, in particular, show very typical cell-specific patterns.
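The probability ranges can be applied mechanically, as in this small sketch. The function name `assign_subgroup`, the label strings and the example probabilities are hypothetical; only the interval endpoints come from the text.

```python
def assign_subgroup(prob):
    """Map an estimated prestalk probability to a subgroup label, or None if outside all ranges."""
    if 0.95 <= prob <= 1.0:
        return "prestalk, [0.95, 1]"
    if 0.85 <= prob < 0.95:
        return "prestalk, [0.85, 0.95)"
    if 0.75 <= prob < 0.85:
        return "prestalk, [0.75, 0.85)"
    if 0.0 <= prob <= 0.05:
        return "prespore, [0, 0.05]"
    if 0.05 < prob <= 0.15:
        return "prespore, (0.05, 0.15]"
    if 0.15 < prob <= 0.25:
        return "prespore, (0.15, 0.25]"
    return None

# Hypothetical estimated probabilities for five unlabeled genes.
for p in (0.97, 0.80, 0.50, 0.12, 0.01):
    print(p, assign_subgroup(p))
```

Genes with intermediate probabilities, here anything strictly between 0.25 and 0.75, are left unassigned.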
3.3 A simulation study
3.3.1 Functional discriminant analysis
A data-based simulation study was performed based on the first five estimated FPCs from the yeast cell-cycle data, where we assume that these correspond to the real underlying FPCs. Then five random coefficients $\xi_m$, $m = 1, \ldots, 5$, were generated for each subject from normal distributions with means 0.6, 0.5, 0.4, 0.3 and 0.2 in one group and -0.6, -0.5, -0.4, -0.3 and -0.2 in the other, with variances
, 0.8850, 0.1957, 0.1266 and 0.1079 for both groups. These variances correspond to the estimated eigenvalues from the yeast cell-cycle data. The priors for the two groups were chosen equal, i.e.
$\pi_1 = \pi_2 = 1/2$, so that the generated samples have overall mean 0. For all subjects, 18 equally spaced data points were taken, just as is the case for the yeast cell-cycle data. We generated 100 training and test datasets. Each dataset was composed of 200 samples, where the first 100 samples formed the training set and the remaining 100 samples the test set. For each of the 100 simulated datasets, classification error rates were calculated for the test data based on FPCA and B-spline methods, respectively. The simulation classification error rates based on FPCA and B-splines are compared in Table 1 (Monte Carlo standard errors are in parentheses). The average classification error rate over the 100 datasets indicates that all five FPCs should be used for optimal classification with either method. Except for the case with one eigenfunction/base function, the overall classification error rates based on FPCA are always slightly lower than those observed for B-splines.
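The data-generating step of such a simulation can be sketched as follows. Several ingredients are placeholders and should be read as assumptions: the eigenfunctions are replaced by an orthonormal Fourier basis because the estimated yeast FPCs are not reproduced here, `FIRST_VARIANCE` stands in for the first eigenvalue, the assignment of the positive means to the group labeled 1 is arbitrary, and group labels are drawn at random with probability 1/2.

```python
import numpy as np

rng = np.random.default_rng(5)

grid = np.linspace(0.0, 119.0, 18)   # 18 equally spaced time points
T = grid[-1]

def placeholder_fpc(m, t):
    """Orthonormal Fourier function standing in for the m-th estimated FPC."""
    k = (m + 1) // 2
    trig = np.sin if m % 2 == 1 else np.cos
    return np.sqrt(2.0 / T) * trig(2 * np.pi * k * t / T)

phi = np.vstack([placeholder_fpc(m, grid) for m in range(1, 6)])   # shape (5, 18)

means = np.array([0.6, 0.5, 0.4, 0.3, 0.2])
FIRST_VARIANCE = 1.0   # placeholder value, not taken from the paper
variances = np.array([FIRST_VARIANCE, 0.8850, 0.1957, 0.1266, 0.1079])

def simulate(n):
    """Draw n curves from two equally likely groups with opposite score means."""
    labels = rng.integers(0, 2, size=n)
    signs = np.where(labels == 1, 1.0, -1.0)[:, None]
    scores = signs * means + rng.normal(size=(n, 5)) * np.sqrt(variances)
    return scores @ phi, labels

# One simulated dataset: first 100 curves for training, remaining 100 for testing.
curves, labels = simulate(200)
train_x, test_x = curves[:100], curves[100:]
train_y, test_y = labels[:100], labels[100:]
print(train_x.shape, test_x.shape, labels.mean())
```

Repeating `simulate(200)` 100 times and averaging the test error rates would mirror the structure of the study described above.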
| 4 DISCUSSION AND CONCLUSIONS |
|---|
Owing to the dynamic nature of biological systems, temporal gene-expression data play a critical role in exploring the regulation of gene expression, in particular, in highlighting genes that are time critical for the regulation of certain biological processes such as the cell cycle for different organisms (Spellman et al., 1998; Laub et al., 2000; Breyne and Zabeau, 2001; Cho et al., 2001), the central nervous system development (Wen et al., 1998), Drosophila development (White et al., 1999; Arbeitman et al., 2002) and Dictyostelium cell differentiation (Iranfar et al., 2001). With rapidly accumulating amounts of temporal microarray gene expression data, developing adequate models to analyze such data is urgent. In this paper, we propose a functional discriminant analysis method, using a functional version of logistic regression and functional principal components for the temporal gene expression data.
Temporal gene expression data provide valuable functional information about temporal patterns of gene expression and also interactions between genes. For example, a typical yeast mitotic cell cycle is commonly broken down into the four standard phases: G1, S, G2, and M. When the daughter cell breaks away from the mother cell, it is typically smaller than the mother cell. During the G1 phase, the daughter cell will grow until it is of a large enough size to enter the cell cycle. The G1 phase of the cell cycle is important for determining the fate of the cell. Statistically identifying genes that regulate the G1 phase will be helpful for studies of genetic cell-cycle regulation. Since most biological processes are in fact continuous, temporal gene expression data can be viewed as discretized samples from smooth random gene expression trajectories over time, naturally leading to a functional data analysis approach. The proposed method provides classification with a low error rate (10%) for the yeast cell-cycle gene expression data and also performs well in simulations. Differentiating cell-cycle regulated genes from non-cell-cycle regulated genes is another important goal for cell-cycle studies. Extensions of the functional methods proposed here will be of interest in approaching this problem.
In comparisons with the B-spline approach, both the yeast cell-cycle data analysis and the simulations demonstrate overall lower error rates with fewer eigenfunctions/base functions when using the FPCA method. The proposed FPCA methods allow the identification of genes that were probably misclassified by traditional biological classification methods. The phase order displayed by scatterplots of pairwise FPC scores suggests that FPCA has potential for time ordination analysis of temporal gene expression.
The NIH has designated Dictyostelium discoideum as a model organism for the functional analysis of sequenced genes. Applying our methods, we screened out previously misclassified cell-type specific genes. Furthermore, we identified 76 genes falling into subgroups that show cell-type specific features of gene expression. Extending the proposed algorithm to functional cluster analysis is feasible and useful in the common situation where group membership is unknown, as is often the case in biological applications.
| Acknowledgments |
|---|
The comments of three reviewers led to numerous improvements and are gratefully acknowledged. Research supported in part by NSF grants DMS03-54448 and DMS05-05537.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: John Quackenbush
Received on September 1, 2005; revised on October 17, 2005; accepted on October 21, 2005
| REFERENCES |
|---|
Aach, J. and Church, G.M. (2001) Aligning gene expression time series with time warping algorithms. Bioinformatics, 17, 495-508.
Alter, O., et al. (2000) Singular value decomposition for genome-wide expression data processing and modelling. Proc. Natl Acad. Sci. USA, 97, 10101-10106.
Alter, O., et al. (2003) Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. Proc. Natl Acad. Sci. USA, 100, 3351-3356.
Arbeitman, M.N., et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science, 297, 2270-2275.
Bar-Joseph, Z., et al. (2003) Continuous representation of time-series gene expression data. J. Comput. Biol., 10, 341-356.
Breyne, P. and Zabeau, M. (2001) Genome-wide expression analysis of plant cell cycle modulated genes. Curr. Opin. Plant Biol., 4, 136-142.
Brown, M.P.S., et al. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA, 97, 262-267.
Capra, W.B. and Müller, H.G. (1997) An accelerated-time model for response curves. J. Am. Statist. Ass., 92, 72-83.
Cho, R.J., et al. (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell, 2, 65-73.
Cho, R.J., et al. (2001) Transcriptional regulation and function during the human cell cycle. Nat. Genet., 27, 48-54.
Efron, B. (1975) The efficiency of logistic regression compared to normal discriminant analysis. J. Am. Statist. Ass., 70, 892-898.
Eisen, M.B., et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863-14868.
Fan, J. and Gijbels, I. (1996) Local Polynomial Modelling and its Applications. Chapman & Hall, New York.
Gasch, A.P., et al. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 11, 4241-4257.
Golub, T.R., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
Hall, P., et al. (2001) A functional data-analytic approach to signal discrimination. Technometrics, 43, 1-9.
Hill, A.A., et al. (2000) Genomic analysis of gene expression in C. elegans. Science, 290, 809-812.
Holter, N.S., et al. (2000) Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc. Natl Acad. Sci. USA, 97, 8409-8414.
Iranfar, N., et al. (2001) Expression patterns of cell-type specific genes in Dictyostelium. Mol. Biol. Cell, 12, 2590-2600.
Iyer, V.R., et al. (1999) The transcriptional program in the response of human fibroblasts to serum. Science, 283, 83-87.
James, G.M. and Hastie, T.J. (2001) Functional linear discriminant analysis for irregular sampled curves. J. R. Statist. Soc. B, 63, 533-550.
James, G.M. (2002) Generalized linear models with functional predictors. J. R. Statist. Soc. B, 64, 411-432.
Klevecz, R.R. and Murray, D.B. (2001) Genome wide oscillations in expression: wavelet analysis of time series data from yeast expression arrays uncovers the dynamic architecture of phenotype. Mol. Biol. Reports, 28, 73-82.
Kruglyak, S. and Tang, H. (2001) A new estimator of significance of correlation in time series data. J. Comput. Biol., 8, 463-470.
Laub, M.T., et al. (2000) Global analysis of the genetic network controlling a bacterial cell cycle. Science, 290, 2144-2148.
Lee, S.I. and Batzoglou, S. (2003) Application of independent component analysis to microarrays. Genome Biol., 4, Art. R76.
Li, K.C., et al. (2002) A simple statistical model for depicting the cdc15-synchronized yeast cell-cycle regulated gene expression data. Statistica Sinica, 12, 141-158.
Liebermeister, W. (2002) Linear modes of gene expression determined by independent component analysis. Bioinformatics, 18, 51-60.
Liu, X.L. and Müller, H.G. (2003) Modes and clustering for time-warped gene expression profile data. Bioinformatics, 19, 1937-1944.
Luan, Y.H. and Li, H.Z. (2003) Clustering of temporal gene expression data using a mixed-effects model with B-splines. Bioinformatics, 19, 474-482.
Lukashin, A.V. and Fuchs, R. (2001) Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters. Bioinformatics, 17, 405-414.
McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. Chapman & Hall, London.
Mohanty, S. and Firtel, R.A. (1999) Control of spatial patterning and cell-type proportioning in Dictyostelium. Semin. Cell Dev. Biol., 10, 597-607.
Müller, H.G. (2005) Functional modelling and classification of longitudinal data. Scand. J. Stat., 32, 223-240.
Müller, H.G. and Stadtmüller, U. (2005) Generalized functional linear models. Annals Stat., 33, 774-805.
Nikkila, J., et al. (2002) Analysis and visualization of gene expression data using self-organizing maps. Neural Networks, 15, 953-966.
Press, S.J. and Wilson, S. (1978) Choosing between logistic regression and discriminant analysis. J. Am. Statist. Ass., 73, 699-705.
Peng, X., et al. (2005) Identification of cell cycle-regulated genes in fission yeast. Mol. Biol. Cell, 16, 1026-1042.
Qin, J., et al. (2003) Kernel hierarchical gene clustering from microarray expression data. Bioinformatics, 19, 2097-2104.
Ramsay, J.O. and Silverman, B.W. (2005) Functional Data Analysis. Springer, New York.
Raychaudhuri, S., et al. (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac. Symp. Biocomput., 2000, 455-466.
Resson, H., et al. (2003) Clustering gene expression data using adaptive double self-organizing map. Physiol. Genomics, 14, 35-46.
Rice, J.A. and Silverman, B.W. (1991) Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Statist. Soc. B, 53, 233-243.
Rice, J.A. and Wu, C.O. (2001) Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics, 57, 253-259.
Rustici, G., et al. (2004) Periodic gene expression program of the fission yeast cell cycle. Nat. Genet., 36, 809-817.
Schena, M., et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467-470.
Schena, M., et al. (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc. Natl Acad. Sci. USA, 93, 10614-10619.
Shaulsky, G. and Loomis, W.F. (2002) Gene expression patterns in Dictyostelium using microarrays. Protist, 153, 93-98.
Shi, M.G., et al. (1996) An analysis of paediatric CD4 counts for acquired immune deficiency syndrome using flexible random curves. J. R. Statist. Soc. C, 45, 151-163.
Spellman, P.T., et al. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9, 3273-3297.
Tamayo, P., et al. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl Acad. Sci. USA, 96, 2907-2912.
Tavazoie, S., et al. (1999) Systematic determination of genetic network architecture. Nat. Genet., 22, 281-285.
Wen, X., et al. (1998) Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl Acad. Sci. USA, 95, 334-339.
White, K.P., et al. (1999) Microarray analysis of Drosophila development during metamorphosis. Science, 286, 2179-2184.
Wu, F.X., et al. (2003) A genetic K-means clustering algorithm applied to gene expression data. Lecture Notes in Artificial Intelligence, 2671, 520-526.
Yao, F., et al. (2003) Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics, 59, 676-685.
Yao, F., et al. (2005) Functional data analysis for sparse longitudinal data. J. Am. Statist. Ass., 100, 577-590.
Zhao, X., et al. (2004) The functional data analysis view of longitudinal data. Statistica Sinica, 14, 789-808.