Bioinformatics Advance Access originally published online on January 19, 2007
Bioinformatics 2007 23(7):842-849; doi:10.1093/bioinformatics/btl667
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
A hidden Markov model-based approach for identifying timing differences in gene expression under different experimental factors
1Bioinformatics Center, Kyoto University, Gokasho Uji, 611-0011, Japan and 2Pharmaceutical Research Laboratories, Pharmaceutical Division, Kirin Brewery Co. Ltd, 3 Miyahara, Takasaki, Gunma 370-1295, Japan
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Time series experiments of cDNA microarrays have been commonly used in various biological studies and conducted under a lot of experimental factors. A popular approach of time series microarray analysis is to compare one gene with another in their expression profiles, and clustering expression sequences is a typical example. On the other hand, a practically important issue in gene expression is to identify the general timing difference that is caused by experimental factors. This type of difference can be extracted by comparing a set of time series expression profiles under a factor with those under another factor, and so it would be difficult to tackle this issue by using only a current approach for time series microarray analysis.
Results: We have developed a systematic method to capture the timing difference in gene expression under different experimental factors, based on hidden Markov models. Our model outputs a real-valued vector at each state and has a unique state transition diagram. The parameters of our model are trained from a given set of pairwise (generally multiplewise) expression sequences. We evaluated our model using synthetic as well as real microarray datasets. The results of our experiment indicate that our method worked favourably to identify the timing ordering under different experimental factors, such as that gene expression under heat shock tended to start earlier than that under oxidative stress.
Contact: t-yoneya{at}kirin.co.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
Experiments by cDNA microarray, which have been widely used for comprehensive gene expression profiling, can be classified into two types: static and time series. A static experiment is used to measure a snapshot of the expression profile in an experimental condition, while a time series experiment is used to measure a continuous change under an experimental condition. Analyzing time series microarray data is important to understanding the dynamic mechanism of biological phenomena, and, in this paper, we focus on the data from time series microarray experiments. In fact, time series experiments have been used to analyze a variety of biological phenomena, including environmental stresses (Gasch et al., 2000; Tirosh et al., 2006), immune responses (Guillemin et al., 2002), developmental studies (Arbeitman et al., 2002), etc.
A major purpose of conducting time series microarray analysis is to check the genes that are expressed in some expected manner under an experimental condition. Computational approaches for assisting this purpose often attempt to check the similarity/difference in time series expression between genes, and clustering time series sequences is a typical example. A lot of techniques including clustering expression sequences have been used for time series analysis (Bar-Joseph, 2004; Filkov et al., 2002). They contain dynamic time warping (Aach and Church, 2001), singular value decomposition (Alter et al., 2000), ANOVA and related approaches (Park et al., 2003; Storey et al., 2005), hidden Markov models (Costa et al., 2005; Schliep et al., 2003, 2004), kernel-based approaches (Borgwardt et al., 2006), clustering with predefined expression patterns (Ernst et al., 2005; Ernst and Bar-Joseph, 2006), etc. We emphasize that the attention of all these methods is concentrated on genes.
In contrast, our focus is not on genes but on factors in microarray experiments, such as experimental conditions. That is, the purpose of this paper is to find the timing difference in the effects of experimental factors on genes. This purpose is obviously important, since we often have to evaluate the timing difference by experimental factors, to understand their exact effects on genes (Chen et al., 2003). In particular, we address the issue of finding the timing difference made by overall genes rather than by specific ones. In order to examine the effect on overall genes, we'll not check the behavior of each gene, but use a set of time series expression profiles obtained under the same experimental factor. We describe our data usage more concretely by using a sample dataset. Table 1 shows a sample of time series expression sequences under two conditions. We can modify this dataset into a set of pairwise sequences, which is shown in Figure 1. By doing so, time series sequences between two experimental conditions can easily be compared. That is, we can see that a gene under Condition A is always expressed earlier that under Condition B. Thus, the next issue is to build a method to capture this type of rules found in a set of pairwise (generally triplewise or more) time series sequences.
|
|
For this issue, we present a systematic approach based on a hidden Markov model (HMM) (Durbin et al., 1998; Rabiner et al., 1986). Our model has two specific characteristics: First, a real-valued vector is generated at each state of our model. This is because the output/input of our model at each time point is a pair (or more generally multiple) of expression values. Second, the state transition diagram of our model has two types of states: the first state for relatively average expression values and the second for expression values that are different from the average. The parameters of the first state are fixed and not trained. This setting is useful to capture time points with expression values that are very different from the average, because they will be generated at the second states while others are generated at the first states. By using these two characteristics, our approach can identify an expression pattern, like that found in Figure 1. An important feature of our model is that the given pairwise (generally multiplewise) sequences can vary in length, because of the nature of hidden Markov models.
We have conducted a variety of experiments, using both synthetic and real datasets, to evaluate the effectiveness of our approach of finding the timing difference in two or more experimental conditions. The results obtained by synthetic datasets showed that our method could capture an embedded timing difference in a set of pairwise time series sequences. We then checked the difference in time series gene expressions of four different strains under a certain stress, using real microarray datasets. From the comparison of all six combinations of four different strains, our method could find a clear time series ordering in gene expression of four strains. Finally, we compared the effects of four different stresses on gene expression of a certain strain and found a clear timing difference in gene expression caused by two stresses. These results indicate that our method can identify timing differences in gene expression that are caused by different experimental factors.
| 2 METHOD |
|---|
|
|
|---|
2.1 Notations
Let Y be a set of real-valued sequences, and let all sequences in Y have the same length. Let N(Y) be the length of a sequence in Y, and K be the number of sequences in Y. In practice, Y is a set of time series microarray expression sequences, and K is the number of conducted time series microarray experiments. We call Y an example in this paper. Let Y be a set of Ys, and |Y| be the number of Ys in Y. In practice, |Y| is the number of genes for which time series microarray experiments are conducted under K conditions. Figure 1 shows a simple example of Y with | Y| = 3, K = 2, and N(Y) is 5 for all Y
Y. We note that K must be kept the same in Y, but N(Y) not. That is, our model can deal with time series microarray datasets with different time points. In fact, in synthetic datasets of our experiments, we will deal with the case that N(Y) takes a value from 6 to 10. Let N be maxY
Y N(Y) In Y, let yt be the K real-valued expression values at time t, which we call a vector in this paper. Each square in Figure 1 is a vector. Let yt,k be the k-th (expression) value of yt. For example, in the first example of Y in Figure 1, y1, 1 = 1.2 and y2,2 = 1.5. For simplicity, we sometimes just write y for yt. Let q be a state of our model, and let Q = (q1, ..., q|Q|) be a state transition of a given example in our proposed model, which will be described in detail later.
2.2 Definition of the proposed model
The proposed model is a special case of the so-called hidden Markov model (HMM). Our model has two types of probabilities: state transition probability ai, j for a transition from state i to j and continuous value generation probability bi (y) to generate real-valued vector y at state i, which satisfy that
j ai,j = 1 and
.
Let µi and vi be real-valued vectors of size K, and µi,k and vi,k be the k-th values of vectors µi and vi, respectively. Assuming that yt,k (k = 1,K) are independent of each other, the probability bi (yt) is defined as a normal (Gaussian) distribution, which has µi as the average and vi as the variance, as follows:
|
|
Figure 2 shows the state transition diagram of the proposed model. A state q of this model can be classified into two types, which we write F and G, called a feature state and a control state, respectively. The parameters µi and vi at a control state are fixed and not trained, whereas the µj and vj at a feature state are trained. Let M be the number of feature states in a given model.
|
2.3 Likelihood computation
Given an example Y and a state transition Q, we can compute probability P(Y, Q) as follows:
|
|
The log-likelihood of an entire dataset Y is then given as follows:
|
| (1) |
2.4 Parameter estimation
A standard way to estimate the parameters of a probabilistic model is the maximum likelihood, i.e. to maximize the log-likelihood given in Equation (1). A popular approach of the maximum likelihood is a so-called EM (expectation-maximization) algorithm, by which it is guaranteed that we can find a local optimum. To estimate the parameters of our model, we use the EM algorithm, which iterates the following E- and M-steps alternately until some stopping condition is satisfied.
E-step: For each Y
Y, we compute two auxiliary probabilities, which we call forward and backward probabilities. The forward probability
Y(j,t) is the probability that all expression values at time points 1 to t are already generated and the current state is j. Similarly, the backward probability ßY(i,t) is the probability that all expression values at time point t to the last time point of Y are already generated and the current state is i. The forward probabilities are computed recursively by increasing t from zero to the last time point, according to the following equation.
|
|
|
|
M-step: Using the forward and backward probabilities computed in the E-step, we can update the three parameters in our model: ai,j, µi and vi, as follows:
|
|
Initial values of the above iteration in our experiments are set up as follows: we first computed the variance of all expression values in a given dataset and then assigned it as initial values for vi, k at both control states and feature states. On the other hand, as initial values for µi,k, we assigned zero for control states and some fixed large value, which is larger than zero, for each feature state.
2.5 Time and space complexities
Our state transition diagram in Figure 2 shows that the number of outgoing edges at a node is three at maximum. So, the number (space complexity) of ai, j can be almost linear in the number of states. Thus, in each iteration of the EM algorithm, the most time-consuming part is updating µ (or v) in the M-step. When we update each µi,k (i = 1, ..., M,k = 1, ..., K) , we have to sum up
Y(i,t) ßY(i,t) yt,k over all Y
Y and t (= 1, ..., N(Y)). The maximum of t is N, and so the time complexity of our method is O(M · K · |Y| · N). Our model generates a vector at each state, and so the above complexity is larger than that of a usual HMM by K, i.e. the size of a vector.
On the other hand, the space complexity of our method is kept the same as that by a standard HMM for real-valued sequences that have been used in some applications including speech recognition. That is, at first, a is almost linear in the number of states, and all µ, v,
and ß stay at quadratic complexity (We note that we do not have to store
and ß for each Y.). Thus, the space complexity of our method is max {O(M · K), O(M · N)}.
2.6 Why the model works
Our model has two unique characteristics: First, our model generates a real-valued vector at each state, while a usual HMM generates only one real value (or symbol) at a state. Second, the probability b is trained at feature states only. It is not trained at control states where the parameters of b are fixed at some reasonable value like an average over all expression values in the given data.
Due to these two characteristics, our model works as follows: if the expression values at a time point are all rather normal (or average), they should be generated at a control state; otherwise they can be generated at a feature state. More concretely, if one or more expression values in the vector of a time point are very different from the average, this vector can be generated at a feature state. That is, our model attempts to capture a time point with unusual (or unique) values, and these values will appear at feature states.
Figure 3 shows a simple example of the state transition diagram of our model with only two feature states. When we train this model by the data in Figure 1, the shaded areas (in Figure 1) with expression values of zero will be assigned to control states. On the other hand, the white squares in Figure 1 will be assigned to the two feature states, because the values in these squares are much higher than zero. More concretely, the case of a high value at Condition A and zero at Condition B will be assigned to F1, and the reverse case will be assigned to F2. This meets our purpose of finding the difference between a given set of pairwise (generally multiplewise) sequences. Finally, we note that by checking the parameter values of b at feature states of the trained model, we can easily see the difference between a given set of time series sequences.
|
| 3 TIME SERIES DATA |
|---|
|
|
|---|
3.1 Synthetic data
We first evaluated our model using two types of synthetic datasets, which we call SD1 and SD2. Each dataset had 100 examples, and each example had a pair of real-valued sequences, assuming that two time series expression values are measured under two different experimental conditions, which we name Condition A and Condition B for further explanations. Thus, Y| = 100 and K = 2. The length of a sequence ranged from six to ten and was randomly chosen according to the uniform distribution. Of course, the length of two sequences in a pair was kept the same.
A time point of each sequence randomly takes a value from –1 to +1, except some time points that randomly take higher values ranging from +1 to +4. We note that this high value simulates that a gene is highly expressed at this time point. In SD1, only one randomly chosen time point of a sequence takes a high value, and this high value appears in the first half for Condition A and in the last half for Condition B. Figure 4a shows a schematic example of a dataset of SD1. On the other hand, randomly chosen three continuous time points take high values, and they start in the first half in Condition A and in the last half of Condition B. This is also shown in Figure 4b as a schematic example.
|
3.2 Real microarray data
We used GSE3406 [NCBI GEO] (Tirosh et al., 2006) of the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) (Edgar et al., 2002). This dataset included time series expression values of four different yeast strains measured under four different stresses. Table 2 shows a summary of this dataset. From GSE3406 [NCBI GEO] , we generated datasets with K = 2, each of which was a set of pairwise sequences under two different factors, i.e. stresses or stains, keeping other factors the same. The purpose of this experiment is to find the difference in gene expression under different stresses or strains. We then focused on genes, which are categorized into response to stress in the Gene Ontology (Gene Ontology Consortium., 2006), and selected all of them from the SGD database (Christie et al., 2004). Table 3 shows the list of 56 genes we used. For each pair of stresses or strains, we generated a dataset by using only genes of which at least one expression value is higher than 1.0 in GSE3406 [NCBI GEO] . This means that |Y|, i.e. the size of a dataset, varies.
|
|
| 4 RESULTS |
|---|
|
|
|---|
4.1 Synthetic data
We used M = 2 in this experiment, i.e. the state transition diagram in Figure 3. Table 4a and b show the parameter values obtained by applying our method to the two synthetic datasets, SD1 and SD2, respectively. We note that, in this table, a µ value is bolded if it is higher than that at the other feature state under the same condition. For example, the µ value at state F1 for Condition A in SD1 is bolded, because it was 2.83, which is higher than 0.01, i.e. the µ value at state F2 for Condition A, by more than 1.0. This indicates that this state captured a time point with a high value for Condition A (and a low value for Condition B). From the table, we can see that the µ of F1 was high for Condition A and around zero for Condition B, and became almost zero for Condition A and high for Condition B. This indicates that our model captured the embedded pattern in SD1, i.e. that a high value appears first in Condition A and then in Condition B.
|
These results and observations were basically true of SD2. However, interestingly, the µ of state F1 for Condition A was 3.83 and that of state F2 for Condition B was 3.47, indicating that these values were higher than those in SD1. We think that this is from the following reason. At first, the model used in this experiment has only two feature states, and high values in SD2 appear first in Condition A and then in Condition B, indicating that one of the two feature states can be given to each condition. More concretely, one of the three high continuous values in a sequence of SD2 was assigned to one feature state of our model. Thus, the highest value of the three high values was assigned to a feature state, since the µ of a control state was fixed at around zero (i.e. a very low value). Finally, the µ of SD2 must be higher than that of SD1.
From these results on synthetic data, we can say that our method worked favorably in finding expression values that are different between given two sets of sequences.
4.2 Real microarray data: difference between yeast strains under a stress
Out of the four stresses, we focused on the 37°C heat shock. We first show the result obtained by M = 2, corresponding to the transition diagram in Figure 3.
A same experiment is conducted twice for Saccharomyces paradoxus in GSE3406 [NCBI GEO] , meaning that we can have two datasets, which we call Sp and Sp2, for a set of sequences of S.paradoxus. We first examined the difference in gene expression between Sp and Sp2, which are obtained for the same strain under the same stress, to check the variability/stability of microarray expression values. Table 5 shows the trained parameter values at feature states in this combination. From the table, we can see that the µ of Sp was almost the same as that of Sp2 at F1, whereas they were not always the same at F2. In fact, the µ of Sp was larger than that of Sp2 by around 0.5, indicating that a difference in gene expression of 0.5 can happen even when we compare two datasets obtained by the same experiment. Thus we can say that when we compare two datasets obtained under different conditions, we have to focus on a larger difference, say 1.0 or more. We think that 1.0 would be a natural threshold to judge that a difference happens in gene expression between two different experimental factors (Speed et al., 2003).
|
We then focused on four strains under the 37°C heat shock. Table 6 shows the trained parameters at feature states in all possible six combinations of the four strains. We note that a µ value is bolded if it is higher than that at the other feature state under the same condition by more than around 1.0. In this table, (a) shows the comparison between Saccharomyces cerevisiae (Sc) and S.paradoxus (Sp). The µ for Sp was significantly high at F1 but decreased to a low value at F2, while that for Sc was first low at F1 and then increased to a significantly high value at F2. This indicates that gene expression tended to start earlier in Sp than in Sc under the heat shock stress. We can see similar results in (b) and (e). The results in (c) and (d) were not so clear, but we would be able to say that gene expression in Sk seemed to tend to start earlier than in Sc. This is true of (d). On the other hand, the µ values in (e) imply that there are not such significant timing differences in gene expression between Sp and Sk. We can summarize these results as follows:
- Gene expression tended to start earlier in Sp than in Sc.
- Gene expression tended to start much earlier in Sm than in Sc.
- Gene expression tended to start a little earlier in Sk than in Sc.
- Gene expression tended to start a little earlier in Sm than in Sp.
- No significant timing difference in gene expression was found between Sp and Sk.
- Gene expression tended to start earlier in Sm than in Sk.
Sp
Sk
Sc. We note that this ordering agrees with all the above six observations without any contradictions. We further checked a set of genes that clearly agrees with each tendency of (a) to (f) except (e), in which no significant difference was found. Table 7 shows the lists of these genes. When we selected each of these genes, we first checked the highest value of each of the two given sequences of a gene, and this gene was selected if the following two conditions were satisfied: (1) The two highest values are both larger than 1.0. (2) The two time points that provide these highest values are rightly ordered, like that the time point of Sp is earlier than that of Sc.
|
|
|
We then conducted experiments for M = 3 using the same setting as done for M = 2. The trained parameter values, which are in the supplementary information, show that the results were almost similar to the case of M = 2. Thus, we skip the detailed explanation for M = 3 due to space limitation.
4.3 Real microarray data: difference in gene expressions of S.cerevisiae under different stresses
We then focused on S.cerevisiae to check the difference between four types of stresses, i.e. 37°C heat shock (heat shock), H2 O2 (Oxidative stress), a transfer of medium from glucose to glycerol (Glycerol) and 0.02% MMS (DNA damage). Table 8 shows the trained parameter values at features states of all six combinations of the above four different stresses. As in the case of different strains, this table shows the timing difference in gene expression between two different stresses. From (a), we can see that at F1, the µ was high for Heat shock and low for Oxidative stress, while they were reversed at F2. This result clearly indicates that gene expression under Heat shock tended to start earlier than that under Oxidative stress. A similar type of conversion in the timing of gene expression were slightly shown in (b) and (e), but they were not necessarily significant. Similarly, such a clear conversion was not found in (c), (d) and (f). Figure 6 shows a summary picture computed from the parameter values in Table 8. This picture was drawn in the same manner as done in drawing Figure 5. From the figure, we can see that gene expression under Heat shock tended to start earlier than that under other stresses, Oxidative stress in particular. In other words, gene expression tended to start earlier under Heat shock, later under Oxidative stress and normal under DNA damage and Glycerol. Another finding from this figure is that gene expression under Glycerol is always higher than that under the other three stresses. Overall, this result indicates that our method is useful in understanding the time series ordering in gene expression under different stresses/species.
|
|
We then performed an experiment for the case of M = 3 to check the difference in gene expression of different stresses. The result of M = 3, which are in the supplementary information, was almost the same as that of M = 2, and so we skip the detailed explanation due to space limitation.
| 5 CONCLUSION AND DISCUSSION |
|---|
|
|
|---|
We have developed a systematic approach to capture the differences in microarray time series sequences obtained by different factors, based on the learning of hidden Markov models. We emphasize that our model has the following two unique features. First, a real-valued vector, which corresponds to a set of expression values at a time point, is generated at each state. Second, our model has an unique state transition diagram, which is designed to identify a time point with distinctive expression values from an average value. Based on these two characteristics, our method allows to capture the differences in a given set of time series sequences.
We have conducted a series of experiments to evaluate the effectiveness of our method using synthetic datasets as well as real microarray datasets. In the synthetic datasets, one apparent timing difference was embedded in gene expression as a pattern. The parameters of the trained probabilistic model showed that our method clearly captured the embedded pattern in gene expression. The experiments using real microarray datasets showed that our method could identify the timing differences in gene expression, which are caused by external experimental factors. The typical two results are as follows: (1) Under the heat shock stress, gene expression tended to start in the ordering of in S.mikatae, S.paradoxus, S.kudriavzevii and S.cerevisiae. (2) Gene expression of S.cerevisiae tended to start earlier under heat shock stress than under oxidative stress. We stress that these findings are very useful for biologists who are conducting time series microarray experiments to compare the experimental conditions (or species) like environmental stresses.
In our experiments, the result of M = 3 was almost the same as that of M = 2. This is probably because the number of time points in our datasets is only six. If we use a larger number of time points, a larger M may be more useful to analyze the data. This would be possible future work. Another possible direction in future experiments is to use a larger number of sequences as one example. That is, by increasing K, a larger number of experimental conditions can be combined, and we can see the relations by two or more conditions at once. However, by increasing K, the number of timing differences between triplewise or more sequences will also increase, and they will not be captured by a small number of feature states easily. This problem was avoided by focusing on only a pair of sequences in this paper. Another plausible future direction is to deal with periodic patterns in time series gene expression, which already have often been found in time series microarray data. It would be an interesting research theme to design another state transition diagram that may be more useful in capturing the periodic timing difference in gene expression of different experimental factors.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
The authors would like to thank Ichigaku Takigawa, Raymond Wan, Shanfeng Zhu and Motoki Shiga of Kyoto University, and Takayuki Onuma, Reina Matsumoto, Satoshi Yoshida and Osamu Kobayashi of Kirin Brewery for fruitful discussions and valuable comments.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Satoru Miyano
Received on October 23, 2006; revised on December 1, 2006; accepted on December 28, 2006
| REFERENCES |
|---|
|
|
|---|
Aach J, Church GM. Aligning gene expression time series with time warping algorithm. Bioinformatics (2001) 17:495–508.
Alter O, et al. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA (2000) 97:10101–10106.
Arbeitman MN, et al. Gene expression during the life cycle of Drosophila melanogaster. Science (2002) 297:2270–2275.
Bar-Joseph Z. Analyzing time series gene expression data. Bioinformatics (2004) 20:2493–2503.
Borgwardt K, et al. Class prediction from time series gene expression profiles using dynamic systems kernels. PSB (2006) 11:547–558.
Chen D, et al. Global transcriptional responses of fission yeast to environmental stress. Mol. Biol. Cell (2003) 14:214–229.
Christie KR, et al. Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucl. Acids Res (2004) 32(Database issue):D311–D314.
Costa IG, et al. The Graphical Query Language: a tool for analysis of gene expression time-courses. Bioinformatics (2005) 21:2544–2545.
Durbin R, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. (1998) Cambridge, UK: Cambridge University Press.
Edgar R, et al. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucl. Acids Res (2002) Jan;30:207–210.
Ernst J, et al. Clustering short time series gene expression data. Bioinformatics (2005) 21:i159–i168.[Abstract]
Ernst J, Bar-Joseph Z. STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics (2006) 7:191.[CrossRef][Medline]
Filkov V, et al. Analysis techniques for microarray time-series data. J. Comput. Biol (2002) 9:317–330.[CrossRef][Web of Science][Medline]
Gasch AP, et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell (2000) 11:4241–4257.
Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. Nucl. Acids Res (2006) 34(Database issue):D322–D326.
Guillemin K, et al. Cag pathogenicity island-specific responses of gastric epithelial cells to Helicobacter pylori infection. Proc. Natl. Acad. Sci. USA (2002) 99:15136–15141.
Park T, et al. Statistical tests for identifying differentially expressed genes in time-course microarray experiments. Bioinformatics (2003) 19:694–703.
Rabiner LR, Juang B. An introduction to hidden Markov models. ASSP Magazine of the IEEE (1986) 3:4–16.
Schliep A, et al. Using hidden Markov models to analyze gene expression time course data. Bioinformatics (2003) 19:i255–i263.[Abstract]
Schliep A, et al. Robust inference of groups in gene expression time-courses using mixtures of HMMs. Bioinformatics (2004) 20:i283–i289.[Abstract]
Speed T. Statistical Analysis of Gene Expression Microarray Data. (2003) CRC Press.
Storey JD, et al. Significance analysis of time-course microarray experiments. Proc. Natl. Acad. Sci. USA (2005) 102:12837–12842.
Tirosh I, et al. A genetic signature of interspecies variations in gene expression. Nat. Genet (2006) 38:830–834.[CrossRef][Web of Science][Medline]
This article has been cited by other articles:
![]() |
T.-h. Lin, N. Kaminski, and Z. Bar-Joseph Alignment and classification of time series gene expression in clinical studies Bioinformatics, July 1, 2008; 24(13): i147 - i155. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||









