Bioinformatics Advance Access originally published online on October 28, 2004
Bioinformatics 2005 21(7):1069-1077; doi:10.1093/bioinformatics/bti095
Clustering of gene expression data using a local shape-based similarity measure
1Max-Planck Institute for Terrestrial Microbiology, Department of Organismic Interactions, Karl-von-Frisch-Straße, 35043 Marburg, Germany
2Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: Microarray technology enables the study of gene expression on a large scale. The application of methods for data analysis then allows for grouping genes that show a similar expression profile and that are thus likely to be co-regulated. A relationship among genes at the biological level often presents itself by locally similar and potentially time-shifted patterns in their expression profiles.
Results: Here, we propose a new method (CLARITY; Clustering with Local shApe-based similaRITY) for the analysis of microarray time course experiments that uses a local shape-based similarity measure based on Spearman rank correlation. This measure does not require a normalization of the expression data and is comparably robust towards noise. It is also able to detect similar and even time-shifted sub-profiles. To this end, we implemented an approach motivated by the BLAST algorithm for sequence alignment.
We used CLARITY to cluster the time series of gene expression data during the mitotic cell cycle of the yeast Saccharomyces cerevisiae. The obtained clusters were related to the MIPS functional classification to assess their biological significance. We found that several clusters were significantly enriched with genes that share similar or related functions.
Contact: kaemper{at}staff.uni-marburg.de; eyke{at}mathematik.uni-marburg.de
Availability: Upon request from the authors.
| INTRODUCTION |
|---|
DNA microarray technology allows for studying the expression of thousands of genes in parallel (Shalon et al., 1996; Brown and Botstein, 1999; Duggan et al., 1999; Lipshutz et al., 1999; Hegde et al., 2000). In particular, the data obtained from microarray analyses provide us with the opportunity to analyze the relationship between genes in response to a particular experimental condition. Thus, it becomes possible to extract biologically meaningful knowledge, e.g. to assign functionality to genes whose function is yet unknown and to assemble them into genetic networks (Gerstein and Jansen, 2000; Slonim, 2002).
If the expression of genes is measured at various time points during the course of an experimental study, each gene can be characterized by means of an expression profile in the form of a time series (sequence). Grouping genes that share similar expression profiles into clusters is usually the first step towards understanding the huge amount of data produced by a DNA array experiment. The clustering of expression profiles is done in two steps. First, the similarity or distance between pairs of profiles is determined, using standard measures such as the Euclidean distance or the Pearson correlation. In the second step, a clustering method is employed in order to group the expression profiles on the basis of their pairwise similarities.
A clustering structure thus obtained can support the understanding of the transcription-control mechanism of the organism under study. For an in-depth analysis that gives more insight into the functional and biochemical role of the genes in the clusters, this transcriptional information can be combined with additional information on protein–protein interactions, sub-cellular localizations or functional classifications.
Most methods that have been applied to time series microarray data so far derive similarity degrees between genes by comparing complete expression profiles, typically after these profiles have been normalized. Moreover, these methods often filter the data by removing profiles containing only insignificant changes (2-fold changes are usually considered significant). This approach has several drawbacks.
First, with regard to filtering the data, it deserves mentioning that genes encoding regulatory factors, such as transcriptional regulators that are the acting switches in a transcriptional network, may be expressed only at very low levels. Moreover, small changes of these levels may produce significant changes in the expression of the genes regulated by the corresponding factors. By considering only significant changes of expression values, such key players in a regulatory network might be missed.
Second, while normalization is clearly reasonable when comparing profiles whose expression values, in terms of absolute numbers, differ by orders of magnitude, it may also lead to undesirable side effects. For example, normalization is extremely sensitive toward outliers and noisy data.
Third, a relationship between genes often presents itself by locally rather than globally similar patterns in their expression profiles. That is, the complete profiles share similar sub-profiles, which might furthermore be time-shifted, as, e.g. in the case of transcriptional regulators and the genes controlled by them. Apart from that, it is conceivable that not all genes respond to all conditions (or time points) in an experiment. In all these cases, the use of a global similarity measure, i.e. the computation of a similarity degree between the complete expression profiles, will not reveal the relationship between two genes, simply because this degree might be rather small.
In this paper, we propose a new method for time course experiments that addresses all of the aforementioned problems. In particular, we employ a shape-based similarity measure based on the Spearman rank correlation. This measure circumvents the problem of normalization and is more robust towards noise in data. Moreover, our approach does not require profiles to be globally similar. Rather, it also detects similar and even time-shifted sub-sequences of expression values. To this end, we implemented an approach motivated by the BLAST algorithm for sequence alignment (Altschul et al., 1990).
The remainder of the paper is organized as follows: In Section 2, we give a brief overview of similarity (distance) measures and existing algorithms for discovering relationships between gene expression profiles. In Section 3, we describe our algorithm CLARITY in detail. In Section 4, we discuss the results that we have obtained for the time series data of gene expression during the mitotic cell cycle of the yeast Saccharomyces cerevisiae, focusing on the biological significance of clustered genes. The paper concludes with a brief summary in Section 5.
| RELATED WORK |
|---|
Various methods for extracting meaningful information from gene expression data have already been proposed in literature. This section gives a brief overview of related work on similarity measures. For a more comprehensive and general overview on microarray time series data analysis we refer the interested reader to the review of Möller-Levet et al. (2003).
The most popular and probably simplest measures that have been used in the context of gene expression data are the Pearson correlation, a statistical measure of (linear) dependence between random variables, and the Euclidean distance (Gerstein and Jansen, 2000; Slonim, 2002). Apart from that, more sophisticated measures such as mutual information have been employed (Herwig et al., 1999). A method for approximating an ideal similarity measure by training a neural network has been suggested by Sawa and Ohno-Machado (2003). Spellman et al. (1998) introduced a method particularly suitable for cyclic data such as cell cycle time series. Roughly speaking, they compare the Fourier transforms of the time series rather than the original series themselves.
Qian et al. (2001) addressed the problem of identifying local similarities in gene expression data by means of a method that is based on the Smith–Waterman algorithm for (local) sequence alignment. However, they first normalize the data by converting each expression value to its z-score (i.e. subtracting the mean and dividing by the standard deviation of a profile). A global normalization of this type is critical since it is not in agreement with the goal to discover local similarities. In fact, one should realize that normalizing a complete profile means that two sub-profiles cannot be compared independently of all other expression values. All of the aforementioned indices are numerical measures in the sense that they depend on the concrete numbers in the expression profiles.
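The dependence of a globally normalized sub-profile on values outside of it can be illustrated with a small sketch. The data are invented, the helper name `z_scores` is hypothetical, and the use of the population standard deviation is an assumption made here, not a detail taken from Qian et al. (2001).

```python
from statistics import mean, pstdev

def z_scores(profile):
    """Globally normalize a profile: subtract its mean, divide by its
    (population) standard deviation. Assumes a non-constant profile."""
    mu = mean(profile)
    sigma = pstdev(profile)
    return [(value - mu) / sigma for value in profile]

# Two invented profiles that agree on the first five time points
# and differ only in the last measurement.
profile_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
profile_b = [1.0, 2.0, 3.0, 4.0, 5.0, 60.0]

z_a = z_scores(profile_a)
z_b = z_scores(profile_b)

# The raw sub-profiles over the first three time points are identical ...
same_raw_prefix = profile_a[:3] == profile_b[:3]
# ... but their normalized versions differ, because the single distant
# value changes the mean and standard deviation of the whole profile.
same_normalized_prefix = z_a[:3] == z_b[:3]

print(same_raw_prefix, same_normalized_prefix)
```

A rank-based comparison restricted to the sub-profile itself would not be affected by the last value, which is the motivation for computing the similarity locally.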
As opposed to this, Wen et al. (1998) suggest a shape-based similarity measure that compares two profiles on the basis of qualitative changes of expression values. Thus, two sequences are considered similar if they increase and decrease more or less simultaneously. However, this measure is still a global one in the sense that all time points are taken into consideration. As mentioned in the previous section, exclusively looking for global similarity is often too demanding and carries a risk of missing interesting (local) relationships. Filkov et al. (2002) proposed a kind of edge detection method for periodic datasets with short sequences. This method looks for local regions in pairs of expression profiles where major changes in expression occur (edges). Roughly speaking, the profiles are regarded as similar if they have similar edges. Kwon et al. (1999) suggested an event-based edge detection method. An event in a specific time interval is considered as the directional change of the gene expression curve at that instant. This method converts the raw data to a string of events: R for changes greater than a certain (upper) threshold value, F for changes less than a (lower) threshold and C for insignificant changes. The event strings are then aligned using a modified version of the Needleman–Wunsch algorithm for global sequence alignment.
By converting a time series into a sequence of events such as increase or decrease, the methods in the last paragraph tend to oversimplify the original data. This makes them robust toward noise and outliers, but also loses a lot of information contained in the original time series. Our method, detailed below, can be seen as a good compromise between the numerical and event-based approaches described in this section.
| METHOD |
|---|
Recall that we are looking for local relationships (similarities) between expression profiles and, furthermore, that we seek to incorporate the possibility of time-shifts. Thus, we consider two gene expression profiles X and Y, represented by sequences (x1 ... xn) and (y1 ... yn), as similar if there are similar subsequences X[i,j] and Y[k,l], where X[i,j] =def (xi, xi+1 ... xj) for 1 ≤ i ≤ j ≤ n. In analogy to sequence analysis, one can consider a tuple (X[i,j],Y[k,l]) as a (local) alignment (of length j − i + 1 = l − k + 1).
Exact similarity computation
The simplest approach for discovering local, time-shifted relationships between two profiles is of course to enumerate all possible alignments in a systematic way. Thus, the similarity SIM(X,Y) between two profiles X and Y of length n is computed as
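$$\mathrm{SIM}(X,Y) = \max_{k_{\min} \le k \le n} \mathrm{SIM}_k(X,Y) \tag{1}$$

$$\mathrm{SIM}_k(X,Y) = \max_{1 \le i,\, j \le n-k+1} S\bigl(X[i,\, i+k-1],\; Y[j,\, j+k-1]\bigr) \tag{2}$$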
As can be seen, SIMk(X,Y) corresponds to the similarity of the best alignment of length k. Of course, deriving the similarity between two profiles from very short local alignments is questionable (and usually not statistically significant). Therefore, the parameter kmin specifies a lower bound to the length of an alignment.
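The exhaustive enumeration of alignments can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, indices are 0-based, the stand-in measure `spearman_distinct` assumes that no value occurs twice in a window, raw scores are compared across window lengths (the paper later replaces them by P-values), and preferring the longer window on equal scores is a choice made here.

```python
def spearman_distinct(a, b):
    """Spearman rank correlation for windows without repeated values."""
    n = len(a)
    rank_a = {v: r for r, v in enumerate(sorted(a), start=1)}
    rank_b = {v: r for r, v in enumerate(sorted(b), start=1)}
    d2 = sum((rank_a[p] - rank_b[q]) ** 2 for p, q in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def sim_k(x, y, k, measure, allow_shift=True):
    """Best score over all pairs of length-k windows: (score, i, j)."""
    n = len(x)
    best = None
    for i in range(n - k + 1):
        # Without time-shifts both windows start at the same position.
        starts = range(n - k + 1) if allow_shift else [i]
        for j in starts:
            score = measure(x[i:i + k], y[j:j + k])
            if best is None or score > best[0]:
                best = (score, i, j)
    return best

def sim(x, y, k_min, measure, allow_shift=True):
    """Best score over all window lengths k_min..n: (score, k, i, j)."""
    best = None
    for k in range(k_min, len(x) + 1):
        score, i, j = sim_k(x, y, k, measure, allow_shift)
        # '>=' keeps the longer window when scores are equal.
        if best is None or score >= best[0]:
            best = (score, k, i, j)
    return best

# Invented example: y repeats the first seven values of x one step later.
x = [1, 5, 2, 8, 3, 9, 4, 7]
y = [0, 1, 5, 2, 8, 3, 9, 4]
shifted = sim(x, y, k_min=6, measure=spearman_distinct)
unshifted = sim(x, y, k_min=6, measure=spearman_distinct, allow_shift=False)
print(shifted, unshifted)
```

With time-shifts allowed, the sketch recovers the length-7 alignment that starts at position 0 in `x` and position 1 in `y`; without shifts no window pair reaches a perfect score.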
Approximate similarity computation
A straightforward implementation of Equation (1) leads to a sliding window algorithm [i.e. SIMk(X,Y) is computed by sliding two windows of size k over X and Y] whose time complexity is O(n³). Note that the complexity is reduced to O(n²) if no time-shifts are allowed [and, hence, i = j in Equation (2)]. Both cases are completely acceptable for small n. For longer expression profiles, however, the exact computation of SIM(X,Y) might become too expensive. In that case, we suggest the use of an approximate algorithm that is inspired by the well-known BLAST method for sequence alignment. The idea of this approach is to find an initial hit in the form of a short optimal alignment. Then, in a second step, this alignment is extended in both directions. More precisely, our heuristic approach works as follows:
- Hit: SIMk(X,Y) is computed for k = kmin. Suppose that this similarity degree is obtained for the best match (X[ax, bx], Y[ay, by]), i.e. S(X[ax, bx], Y[ay, by]) = SIMkmin(X,Y). If the best match is not unique, the second step is carried out for all other candidates as well.
- Extend: The similarity degrees S(X[ax − u, bx + v], Y[ay − u, by + v]) are derived for all u, v ∈ {0, 1, ..., d} and the best match (X[ax − u*, bx + v*], Y[ay − u*, by + v*]) is determined. In the case of ties, longer matches are preferred to shorter ones. If there are still several optimal matches, one of them is chosen at random.
- Iterate: The optimal local alignment is updated by setting ax ← ax − u*, bx ← bx + v*, ay ← ay − u*, by ← by + v*, and the second step is repeated. This process is iterated until the optimal alignment does not change (u* = v* = 0).
The parameter d in the second step is a pre-specified constant that determines the size of the neighborhood to be searched and, hence, the complexity of this step, which is obviously O(d²). Note that the myopic strategy obtained for d = 1 carries a high risk of getting caught in local maxima. On the other hand, experience has shown that large values for d are usually not necessary for this type of look-ahead search. Most often, sufficiently good approximations or even exact results are already obtained for d = 2.
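The hit, extend and iterate steps can be sketched in a few lines. The code below is an illustration under several assumptions rather than the CLARITY implementation: names such as `hit_and_extend` are hypothetical, indices are 0-based with inclusive window ends, only the first best hit is followed (the text above follows all tied hits), remaining ties are broken deterministically instead of at random, and raw scores of the stand-in measure are compared across lengths.

```python
def spearman_distinct(a, b):
    """Spearman rank correlation for windows without repeated values."""
    n = len(a)
    rank_a = {v: r for r, v in enumerate(sorted(a), start=1)}
    rank_b = {v: r for r, v in enumerate(sorted(b), start=1)}
    d2 = sum((rank_a[p] - rank_b[q]) ** 2 for p, q in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def hit_and_extend(x, y, k_min, measure, d=2):
    """Return (score, (ax, bx), (ay, by)) for a heuristic local alignment."""
    n = len(x)
    # Hit: best pair of windows of the minimal length k_min.
    best = None
    for ax in range(n - k_min + 1):
        for ay in range(n - k_min + 1):
            s = measure(x[ax:ax + k_min], y[ay:ay + k_min])
            if best is None or s > best[0]:
                best = (s, ax, ay)
    score, ax, ay = best
    bx, by = ax + k_min - 1, ay + k_min - 1
    while True:
        # Extend: try growing both windows by u to the left, v to the right.
        cand = (score, 0, 0, 0)  # (score, added length, u*, v*)
        for u in range(d + 1):
            for v in range(d + 1):
                if (u, v) == (0, 0):
                    continue
                if min(ax, ay) - u < 0 or max(bx, by) + v > n - 1:
                    continue  # extension would leave one of the profiles
                s = measure(x[ax - u:bx + v + 1], y[ay - u:by + v + 1])
                # Higher score wins; on equal score the longer match wins.
                if (s, u + v) > (cand[0], cand[1]):
                    cand = (s, u + v, u, v)
        new_score, _, u, v = cand
        if u == 0 and v == 0:
            return score, (ax, bx), (ay, by)
        # Iterate: adopt the extended alignment and repeat.
        ax, ay, bx, by, score = ax - u, ay - u, bx + v, by + v, new_score

x = [1, 5, 2, 8, 3, 9, 4, 7]
y = [0, 1, 5, 2, 8, 3, 9, 4]
result = hit_and_extend(x, y, k_min=4, measure=spearman_distinct, d=2)
print(result)
```

On this invented pair of profiles the initial hit of length 4 is extended to the right in two rounds until the end of `y` is reached, which coincides with the alignment an exhaustive search would find.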
Basic similarity measure
So far, the algorithm outlined above is completely independent of the basic similarity measure S. As noted before, measures commonly used in gene expression analysis include the Euclidean distance and the Pearson correlation. Such numerical measures are easy to compute but suffer from some disadvantages. Particularly, they are quite sensitive toward outliers and measurement errors, a point of critical importance in connection with gene expression data. Moreover, in the context of expression analysis, we prefer a concept of similarity that is based on the qualitative behavior or, say, the shape of the profiles to one that is very sensitive to the precise values of fold changes.
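Our similarity measure S is therefore defined by the Spearman rank correlation (SRC). The SRC between two profiles X and Y is given by

$$\mathrm{SRC}(X,Y) = 1 - \frac{6 \sum_{i=1}^{n} \bigl(\mathrm{rank}(x_i) - \mathrm{rank}(y_i)\bigr)^2}{n\,(n^2 - 1)}$$

where rank(xi) = k if xi is the k-th smallest value in X, i.e. |{j | xj < xi}| = k − 1. Actually, we used an extended version of the SRC which takes the possibility of ties, i.e. xi = xj for i ≠ j, into account. The SRC satisfies −1 ≤ SRC(X,Y) ≤ 1 for all X,Y.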
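One common way to handle ties is to assign tied values the average of the ranks they occupy and then to compute the Pearson correlation of the two rank vectors. Whether this is the extended version used by the authors is not stated, so the following is a sketch of that standard variant only; the function names and the convention of returning 0.0 for a constant profile are assumptions made here.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Advance 'end' over the block of values equal to the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; 0.0 if either input is constant (assumption)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def spearman(x, y):
    """Spearman rank correlation with average ranks for ties."""
    return pearson(average_ranks(x), average_ranks(y))

tied_ranks = average_ranks([5, 1, 5, 3])
print(tied_ranks)
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
```

For profiles without ties this variant agrees with the closed formula based on squared rank differences.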
To exemplify the aforementioned difference between the Pearson correlation and SRC, Figure 1 shows two profiles (of length 7), which are highly correlated according to the latter but almost uncorrelated according to the former. This is mainly caused by the comparatively large value of the third fold change in one of the sequences. As opposed to this, the SRC correctly reflects the fact that both profiles have a rather similar shape. In fact, even though SRC ignores some information, it seems that it retains just the relevant information, making it much more robust than the Pearson correlation. In this connection, it should also be noted that SRC retains more information than a frequently used qualitative measure that compares the sign of the first-order differences (i.e. the ups and downs):
[Equation (3): formula image not preserved in this copy]
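The different behavior of the Pearson correlation and the SRC in the presence of a single large value can be reproduced on invented data. The two profiles below are not those of Figure 1; they merely mimic the situation described there, with one exceptionally large third value. The helper names are hypothetical and the simple rank function assumes distinct values.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

def ranks(values):
    """1-based ranks; assumes that all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(a, b):
    """Spearman rank correlation as Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Invented profiles of length 7; y has one outlying third value.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 50, 4, 5, 6, 7]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

The Pearson correlation of these profiles is close to zero, whereas the SRC is 9/14 (about 0.64), because the outlier changes only one rank by a bounded amount.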
The overall similarity of two profiles X and Y, as defined by Equation (1), is the maximum of similarity degrees for sequences of different lengths k. In order to guarantee the comparability of the similarities SIMk(X,Y), kmin ≤ k ≤ n, these similarities have been defined by their corresponding P-value rather than by the SRC directly. Thus, if S* denotes the optimal SRC that has been found for sequences of length k, then SIMk is given by the probability of obtaining a correlation of at most S* under the null hypothesis of completely unrelated profiles (of length n). Note that the statistical distribution of this measure is an extreme value distribution which depends on the parameters k and n. As there is apparently no simple analytical expression for this distribution, we derived approximations from simulated data. Figure 3 shows the result of such a simulation for profiles with 17 time points.
In order to decide whether or not two profiles X and Y are significantly similar, we also need the P-value of the overall similarity SIM(X,Y). Again, we derived approximations of these P-values from simulated distributions.
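A simulation of this kind can be sketched as a small Monte Carlo experiment. The details of the authors' simulation are not given, so everything below is an assumption: unrelated profiles are modelled as independent random permutations (sufficient for a rank-based measure on tie-free data), the number of trials is arbitrary, and the function names are hypothetical.

```python
import random
from bisect import bisect_right

def spearman_distinct(a, b):
    """Spearman rank correlation for windows without repeated values."""
    n = len(a)
    rank_a = {v: r for r, v in enumerate(sorted(a), start=1)}
    rank_b = {v: r for r, v in enumerate(sorted(b), start=1)}
    d2 = sum((rank_a[p] - rank_b[q]) ** 2 for p, q in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def null_best_scores(n, k, trials=200, seed=0):
    """Sorted best window scores of length k for unrelated profile pairs."""
    rng = random.Random(seed)
    starts = range(n - k + 1)
    scores = []
    for _ in range(trials):
        # Two independent random permutations stand in for unrelated profiles.
        x = rng.sample(range(n), n)
        y = rng.sample(range(n), n)
        scores.append(max(spearman_distinct(x[i:i + k], y[j:j + k])
                          for i in starts for j in starts))
    scores.sort()
    return scores

def sim_k_from_null(null_scores, s_star):
    """Empirical probability of a best score of at most s_star."""
    return bisect_right(null_scores, s_star) / len(null_scores)

null_17_15 = null_best_scores(n=17, k=15)
print(sim_k_from_null(null_17_15, 0.5), sim_k_from_null(null_17_15, 0.8))
```

Because the best score over all window pairs is recorded in each trial, the simulated distribution is that of a maximum, which is why it depends on both k and n.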
| EXPERIMENTAL RESULTS |
|---|
Data and clustering
We used data from a mitotic cell cycle time course experiment in the yeast S.cerevisiae that included 6331 open reading frames and was measured over 17 time points by Cho et al. (1998). The yeast cell cultures were synchronized using the so-called Cdc28 arrest and sampled at uniform intervals covering nearly two complete cell cycles. The experiments were done using Affymetrix oligonucleotide arrays. The dataset is scaled to account for the experimental differences between the arrays used. As a first step, some data points that appeared to be aberrant were eliminated. The dataset was then converted to ratio style measurements by dividing each measurement by the average value of the measurements for that gene as described in Spellman et al. (1998). A total of 6145 genes were taken for further analysis.
We first applied CLARITY with kmin = 15 in order to calculate the pairwise similarity matrix between the expression profiles of the individual genes, i.e. we allow for time-shifts of maximally two time points as longer shifts are hard to explain from a biological point of view. Additionally, it is difficult to obtain significant degrees of similarity if much shorter sub-profiles are used as can be seen from the simulations described above. Figure 5 shows an example of a time-shifted relationship among genes that was described by Yu et al. (2003) and that was also successfully identified by our approach.
Clusters of genes were derived from the similarity matrix thus obtained using CLUTO, a program package for graph-based clustering (Karypis, 2002, http://www.cs.umn.edu/~cluto). CLUTO first constructs a graph where each gene is represented by a node, and edges between nodes are labeled with corresponding similarity degrees. Dense regions in this graph thus correspond to sets of genes that are highly co-expressed and thus form good candidates for clusters. For reasons of computational efficiency, and in order to avoid a bias of the results due to insignificant relationships between genes, we simplified the graph in a preprocessing step: The edge between two nodes is deleted whenever the corresponding similarity degree falls below a similarity threshold. In order to derive a clustering structure, CLUTO partitions the graph thus obtained by means of optimal (minimal) cuts. This is repeated in a recursive manner until a pre-specified number of clusters has been constructed.
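The edge-pruning preprocessing step can be illustrated independently of CLUTO. The sketch below does not use or imitate the CLUTO interface; the function name, the adjacency-dictionary representation and the tiny similarity matrix are invented for illustration.

```python
def build_pruned_graph(similarity, threshold):
    """Keep only edges whose similarity degree reaches the threshold.

    'similarity' is a symmetric matrix (list of lists); the result maps
    each node to a dictionary {neighbor: similarity}.
    """
    n = len(similarity)
    graph = {node: {} for node in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            s = similarity[i][j]
            if s >= threshold:  # edges below the threshold are deleted
                graph[i][j] = s
                graph[j][i] = s
    return graph

# Invented similarity degrees for three genes.
similarity = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
graph = build_pruned_graph(similarity, threshold=0.7)
print(graph)
```

With a threshold of 0.7 only the edge between the first two genes survives, so the third gene becomes an isolated node before any partitioning takes place.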
As can be seen, two critical parameters have to be specified for our clustering approach, namely the number of clusters and the similarity threshold. Fortunately, we found that in our case the clustering results are quite robust toward variations of the above parameters. More specifically, computations with various similarity thresholds showed that thresholds above 0.7 yield almost identical clustering structures. Likewise, we found that the clustering structure did not change appreciably if the number of clusters was raised above 25. We therefore decided to use this number to obtain a maximal number of independent sets of genes. Additionally, the generated clusters appear to be quite homogeneous and show a high internal similarity. See Figure 6 for two examples.
Functional evaluation
The MIPS functional categories (Mewes et al., 2002) provide comprehensive information about the known functions of the genes in the yeast genome. In several cases it has been shown that genes with similar functions can be co-expressed (Eisen et al., 1998). To elucidate the biological significance of the clusters generated by our procedure, the clusters produced from CLUTO were mapped to the 400 different MIPS functional categories, and one or several predominant categories were assigned to each cluster. Moreover, in order to prove that the occurrence of a predominant category is statistically significant, we derived corresponding P-values for each cluster. The probability of observing at least m ORFs from a functional category within a cluster of size n is given by
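$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{f}{i}\,\binom{g-f}{n-i}}{\binom{g}{n}} \tag{4}$$

where f is the total number of ORFs in the functional category and g is the total number of ORFs in the genome.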
Clusters with P-values less than 10⁻⁴ and average similarity among cluster members smaller than 0.5 were not reported. We found several clusters to be significantly enriched with genes of a similar function². The results are summarized in Table 1.
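The probability of observing at least m ORFs of a category in a cluster of size n is the upper tail of a hypergeometric distribution and can be computed exactly with integer arithmetic. In the sketch below the function name and the parameter names `f` (category size) and `g` (genome size) are chosen for illustration, and the numbers in the example are invented. The tail is summed directly rather than as one minus the lower tail, which avoids loss of precision for very small P-values.

```python
from math import comb

def enrichment_p_value(m, n, f, g):
    """P(at least m of the n cluster members belong to the category).

    f: number of ORFs in the functional category
    g: total number of ORFs
    """
    total = comb(g, n)
    # Sum the upper tail i = m .. min(n, f) with exact integers.
    upper = sum(comb(f, i) * comb(g - f, n - i)
                for i in range(m, min(n, f) + 1))
    return upper / total

# Invented toy numbers: 10 ORFs, 4 in the category, cluster of size 3.
p_toy = enrichment_p_value(m=2, n=3, f=4, g=10)
print(p_toy)
```

In the toy example, 40 of the 120 possible clusters of size 3 contain at least two category members, so the probability is 1/3.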
In general, the clusters can be divided into those that follow the periodicity of the cell cycle [clusters 3, 4, 8; see Fig. 6(a) for an example] and those that are not cell cycle related [clusters 0, 5, 6, 15, 16 and 21; see Fig. 6(b) for an example]. Among the clusters with non-periodically expressed genes, the most significant functional grouping occurs in cluster 0. This cluster consists of 43 genes, 33 of them being associated with ribosome biogenesis (P-value 5.0 × 10⁻³⁴).
Cluster 16 also contains a significant number of protein synthesis related genes (46 out of 168, P-value 10⁻¹⁸), including 29 ribosomal genes (P-value 1.2 × 10⁻²⁰). In addition, this cluster contains 14 genes related to amino acid metabolism (P-value 1.5 × 10⁻⁴), indicating that genes within this cluster might play a role in protein synthesis and related functions.
Cluster 21 has a significant enrichment of genes that can be related to energy (P-value 4.5 × 10⁻¹²) in the broader sense, including genes related to mitochondrial protein synthesis (P-value 4.9 × 10⁻¹¹), mitochondrial organization (P-value 5.8 × 10⁻¹⁵), and energy and carbohydrate metabolism (P-value 2.5 × 10⁻⁷). Six among the seven genes for the nuclear encoded proteins of the cytochrome C oxidase protein complex IV (Cox4, Cox5a, Cox7, Cox8, Cox12 and Cox13) are present within this cluster, and, similarly, several genes encoding proteins of the mitochondrial protein synthesis turnover complex (Mrpl10, Mrpl17, Mrpl24, Mrpl28, Mrpl3, Ydr116C and Ypr100W). These findings support the functional relationship of these genes; the proteins encoded should be co-expressed in stoichiometric amounts required for the assembly of the respective protein complexes.
One of the advantages of the CLARITY algorithm is that time-shifted relations can be discovered. The time-shifted relations constitute up to 55.4% of the total number of relationships within individual clusters (Table 2).
To elucidate whether the inclusion of time-shifted relations can aid the discovery of biological implications, we analyzed cluster 15, comprising the highest portion of time-shifted correlations, in more detail. We compared this cluster with the clusters from Tavazoie et al. (1999) that have been generated using Euclidean distance and k-means clustering. From 77 genes in cluster 15, 38 were found in clusters 4, 5 and 8 of Tavazoie et al. The remaining 39 genes have not been assigned to any cluster. Among such genes is Tps3, encoding the regulatory component of the trehalose-6-phosphate synthase/phosphatase complex consisting of Tps1p, Tps2p and Tps3p. Although Tps1, encoding the trehalose-6-phosphate synthase, is present in cluster 8 of Tavazoie et al., the time-shifted relation with Tps3 has not been detected by these authors [see Fig. 7(a)].
Similarly, Put1 (Proline oxidase) and Put2 (Delta-1-pyrroline-5-carboxylate dehydrogenase) are found to be in cluster 15, but were found to be in different clusters by Tavazoie et al. Put2p in conjunction with Put1p converts proline to glutamate in the mitochondrion. In addition, cluster 15 harbors several other genes involved in glutamate metabolism that show local similarities with Put1 and Put2: Put4, a high affinity proline permease, Agp1, the principal transporter of asparagine and glutamine, Dip5, an amino acid permease for the transport of alanine, glycine, serine, asparagine and glutamine, and the glutamate decarboxylase Gad1 [see Fig. 7(b)].
As expected, among the clusters with a periodic profile, many of the genes encode proteins with functions in cell-cycle dependent processes, like DNA processing, DNA synthesis and DNA replication. The periodic clusters are defined by the timing of the maximum expression of the genes within the cluster. For example, cluster 8 can be considered as G1 specific as it harbors 95 out of 300 reported genes that peak during the G1 phase of the cell cycle, while cluster 3 includes 33 out of 197 genes regulated in the M phase (Spellman et al., 1998).
Let us finally note that not all clusters show a significant enrichment for any of the functional categories. It can be assumed that the genes in these clusters participate in several of the processes defined by the functional categories.
| CONCLUSION |
|---|
We presented a new approach to solve the problem of finding locally similar regions in gene expression profiles. These local regions can be time-shifted to allow, e.g. for the detection of transcription control relationships. Our measure of similarity is based on Spearman rank correlation and can be seen as a good compromise between numerical measures (such as Pearson correlation or Euclidean distance) and simple qualitative measures (such as measures that consider only ups and downs of a time series) that ignore much of the relevant information. Simulations were performed to assess the statistical significance of the obtained degrees of similarity.
The actual comparison of the profiles is then performed with a heuristic sliding window approach. The latter has the advantage that it does not impose restrictions on properties of the similarity measure as do methods that rely on dynamic programming (Qian et al., 2001). For example, Spearman rank correlation could not be used with the approach of Qian et al. (2001), as the former does not allow one to calculate the similarity of two profiles directly from the similarities of their sub-profiles.
We applied our method to a dataset of gene expression profiles from the yeast S.cerevisiae that was measured to study the mitotic cell cycle (Cho et al., 1998). The similarities among the profiles were then used to assign co-expressed genes to clusters. Some of the obtained clusters contain mainly cell-cycle related genes that show, as expected, a periodic behavior. Other clusters contain genes that have a non-periodic expression profile and thus are not directly related to the cell cycle. This result is expected, as our approach considers similar profiles, which show a similar shape, and a cyclic behavior is just one of the many possible shapes of a profile.
The obtained clusters were then compared against an existing functional classification of the genes of S.cerevisiae. Some of the clusters showed a significant enrichment of genes of a particular functional category, reflecting the fact that genes with a similar function are often co-regulated and thus co-expressed.
| Acknowledgments |
|---|
We would like to thank S. Tornow, I. Tetko and W. Mewes for stimulating discussions and helpful advice.
| Footnotes |
|---|
1This P-value obviously ignores the problem of multiple-hypothesis testing and should therefore be interpreted with caution.
2The parameters for edge pruning and vertex pruning of the CLUTO program were set to 0.2 to increase the internal similarity of the generated clusters.
Received on July 2, 2004; revised on September 22, 2004; accepted on October 11, 2004
| REFERENCES |
|---|
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
Brown, P. and Botstein, D. (1999) Exploring the new world of the genome with DNA microarrays. Nat. Genet. Suppl., 21, 33–37.
Cho, R.J., Campbell, M.J., Winzeler, E.A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T.G., Gabrielian, A.E., Landsman, A.E., Lockhart, D.J., Davis, R.W. (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell, 2, 65–73.
Duggan, D.J., Bittner, M., Chen, Y., Meltzer, P., Trent, J.M. (1999) Expression profiling using cDNA microarrays. Nat. Genet. Suppl., 21, 10–14.
Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci., USA, 95, 14863–14868.
Filkov, V., Skiena, S., Zhi, J. (2002) Analysis techniques for microarray time-series data. J. Comput. Biol., 9, 317–330.
Gerstein, M. and Jansen, R. (2000) The current excitement in bioinformatics-analysis of whole-genome expression data: how does it relate to protein structure and function? Curr. Opin. Struct. Biol., 10, 574–584.
Hegde, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Hughes, J.E., Snesrud, E., Lee, N., Quackenbush, J. (2000) A concise guide to cDNA microarray analysis. Biotechniques, 2, 548–550.
Herwig, R., Poustka, A.J., Muller, C., Bull, C., Lehrach, H., O'Brien, J. (1999) Large-scale clustering of cDNA-fingerprinting data. Genome Res., 9, 1093–1105.
Karypis, G. (2002) CLUTO: a clustering toolkit, Technical Report 02-017. Dept. of Computer Science, University of Minnesota.
Kwon, A.T., Hoos, H.H., Ng, R. (1999) Inference of transcriptional regulation relationships from gene expression data. Bioinformatics, 19, 905–912.
Lipshutz, R.J., Fodor, S.P.A., Gingeras, T.R., Lockhart, D.J. (1999) High density synthetic oligonucleotide arrays. Nat. Genet. Suppl., 21, 20–24.
Mewes, H.W., Frishman, D., Güldener, U., Mannhaupt, G., Mayer, K., Mokrejs, M., Morgenstern, B., Münsterkoetter, M., Rudd, S., Weil, B. (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res., 30, 31–34.
Möller-Levet, C., Cho, K.H., Yin, H., Wolkenhauer, O. (2003) Clustering of gene expression time-series data. Technical report. University of Rostock, Department of Computer Science.
Qian, J., Dolled-Filhart, M., Lin, J., Yu, H., Gerstein, M. (2001) Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. J. Mol. Biol., 314, 1053–1066.
Sawa, T. and Ohno-Machado, L. (2003) A neural network-based similarity index for clustering DNA microarray data. Comput. Biol. Med., 33, 1–15.
Shalon, D., Smith, S., Brown, P. (1996) A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res., 6, 639–645.
Slonim, D.K. (2002) From patterns to pathways: gene expression data analysis comes of age. Nat. Genet. Suppl., 32, 502–508.
Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M., Brown, P., Botstein, D., Futcher, B. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9, 3273–3297.
Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J., Church, G.M. (1999) Systematic determination of genetic network architecture. Nat. Genet., 22, 281–285.
Wen, X., Fuhrman, S., Michaels, G.S., Carr, D.B., Smith, S., Barker, J.L., Somogyi, R. (1998) Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl Acad. Sci., USA, 95, 334–339.
Yu, H., Luscombe, M.N., Qian, J., Gerstein, M. (2003) Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends Genet., 19, 422–427.