Fusing time series expression data through hybrid aggregation and hierarchical merge
1Innovation Center – East Flanders, House of Economy, Seminariestraat 2, B-9000 Ghent, Belgium and 2Computer Systems and Technologies Department, Technical University of Sofia – branch Plovdiv, Tsanko Dyustabanov 25, 4000 Plovdiv, Bulgaria
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: A novel integration approach targeting the combination of multi-experiment time series expression data is proposed. A recursive hybrid aggregation algorithm is first employed to extract a set of genes that are potentially of interest for the biological phenomenon under study. Next, a hierarchical merge procedure, developed specifically for this purpose, fuses the multiple-experiment expression profiles of the selected genes. It employs dynamic time warping alignment techniques to account for the potential phase shift between the different experiments. We subsequently demonstrate that the resulting gene expression profiles consistently reflect the behavior of the original expression profiles in the different experiments.
Contact: vboeva{at}tu-plovdiv.bg
Supplementary information: Supplementary data are available at http://www.tu-plovdiv.bg/Container/bi/DataIntegration/
| 1 INTRODUCTION |
|---|
It is common practice nowadays to monitor a particular biological phenomenon in multiple high-throughput experiments. For instance, microarray profiling studies of several different highly synchronized cell cultures in fission yeast (Schizosaccharomyces pombe) have been widely employed in recent years for the identification of periodically regulated genes during the cell cycle (Oliva et al., 2005; Peng et al., 2005; Rustici et al., 2004). Each microarray experiment measures the expression levels of a set of genes at a number of different time points. The different experiments do not necessarily cover exactly the same phases of the phenomenon, nor do they necessarily have the same time range or use the same time sampling interval. The question is how to combine these experimental data in order to derive consistent and relevant information about the biological phenomenon under study. A traditional approach to this problem is to evaluate the gene expression quantitatively in terms of criteria such as up-regulation, periodicity, etc. and subsequently select a set of genes for each experiment. The genes that overlap between the experiments then form the final set of significant genes that can potentially be associated with the phenomenon in question. However, the validity and feasibility of such studies are very often compromised by the relatively poor agreement between the different experiments due to the presence of noise, stress response reactions and other artifacts.
There is no doubt that a multi-experimental setup has the advantage of providing, for a given biological process, diverse evidence about the role and function of genes and, consequently, may lead to relevant insights into the underlying gene interaction mechanisms of this process. The ability to reliably combine data from different microarray studies is, therefore, of crucial importance for the microarray data mining results. Different microarray combination techniques and meta-analysis studies (Choi et al., 2003, 2007; Conlon et al., 2006; Gilks et al., 2005; Hermans and Tsiporkova, 2007; Zaykin et al., 2007) appear regularly in the bioinformatics literature, although few of them attempt to fuse expression profiles produced in different experiments. For instance, a PubMed query with the key words "fusing microarray experiments" produces a single result (Gilks et al., 2005). The latter presents a data fusion method based on multivariate regression, which aims at producing a fused data matrix representing a canonical time-course experiment at a number of equally spaced times. The fusion was performed over nine expression datasets, consisting of a set of genes already selected as potentially cell cycle regulated in a previous study on cell cycle control in S.pombe (Rustici et al., 2004). There is some inconsistency in this methodology, since it relies on the initial selection of a common set of genes for all nine experiments. However, as already stated above, the interpretation of the experimental results and the generation of concrete hypotheses may be hindered by the failure of the different experiments to agree consistently on a common set of significant genes.
We propose here a somewhat different approach to this problem. Initially, P-values for regulation are calculated for each gene in each experiment. These are subsequently aggregated in a recursive fashion employing a set of different aggregation operators. The convergence of the recursive aggregation process results in assigning to each gene an overall P-value, which can be interpreted as the consensus P-value supported by all the different experiments. These consensus P-values are further used to select a subset of genes that are potentially of interest for the biological phenomenon under study, e.g. either by using a predefined P-value threshold or by retaining a certain percentage of the genes with the lowest P-values. The multiple-experiment expression profiles of the selected genes are then fused together via a hierarchical merge procedure. This employs dynamic time warping (DTW) alignment techniques in order to account for the potential phase shift between the different experiments. Subsequently, we demonstrate that the resulting gene expression profiles consistently reflect the behavior of the original expression profiles in the different experiments.
Our data fusion approach bears some resemblance to the microarray merging procedure proposed in Hermans and Tsiporkova (2007). That method pastes together expression profiles from two different plant cell synchronization experiments and results in an expression curve that spans more than one cell cycle. The optimal pasting overlap is determined via a DTW alignment. Finally, the different expression time series are merged by aggregating the corresponding expression values lying within the overlap area. Thus, a sort of partial data fusion is performed on the original expression profiles.
| 2 METHODS |
|---|
2.1 P-value calculation
For each expression time series dataset, P-values for regulation have been calculated as described by de Lichtenberg et al. (2004). Namely, the P-value for regulation of a particular gene results from comparing the variance of its expression profile with a randomly generated variance distribution, constructed from artificial profiles obtained by selecting at each time point the log-ratio value of a randomly chosen gene. The P-value for regulation is calculated as the fraction of artificial profiles with a variance equal to or greater than that of the real expression profile.
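The permutation test described above can be sketched as follows. This is only an illustrative sketch, not the implementation used in the study: the function name `regulation_p_values`, the genes-by-time-points layout of `expr`, the number of artificial profiles `n_random` and the fixed seed are all assumptions introduced here.

```python
import numpy as np

def regulation_p_values(expr, n_random=1000, seed=0):
    """Permutation P-values for regulation, one per gene.

    expr: array of shape (genes, time_points) holding log-ratios
          (this layout is an assumption of the sketch).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_times = expr.shape
    real_var = expr.var(axis=1)

    # Artificial profiles: at each time point, take the log-ratio
    # value of a randomly chosen gene.
    random_rows = rng.integers(0, n_genes, size=(n_random, n_times))
    artificial = expr[random_rows, np.arange(n_times)]
    artificial_var = artificial.var(axis=1)

    # Fraction of artificial profiles whose variance is equal to or
    # greater than the variance of the real profile.
    return (artificial_var[None, :] >= real_var[:, None]).mean(axis=1)
```

With a finite number of artificial profiles the smallest attainable non-zero P-value is 1/`n_random`, so the resolution of the test depends on that (assumed) parameter.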
2.2 Hybrid aggregation algorithm
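We discuss herein a recursive aggregation algorithm aiming at reducing a given data matrix into a single vector. Assume that some phenomenon (e.g. cell cycle) is studied by evaluating quantitatively m different variables (e.g. genes) in several different experimental conditions. Thus, a vector $x_j$ of values, one per experimental condition, is obtained for each variable $j$ ($j=1,\ldots,m$), and these column vectors together form the data matrix $X=[x_1,\ldots,x_m]$.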
In Tsiporkova and Boeva (2004), we suggested a hybrid aggregation process employing a set of k different aggregation operators. The values of matrix X can initially be combined by aggregating in parallel the values of each vector xj (j=1,...,m) with these k aggregation operators. Consequently, a new matrix Y=[y1,...,ym] of m column vectors, each of k values (one per aggregation operator), is generated. This new matrix can be aggregated again by applying in parallel the k aggregation operators on each vector yj (j=1,...,m), thus generating a new matrix. In this way, each interaction step is modeled via k parallel aggregations applied over the results of the previous step (see footnote 1).
Thus, the final result is obtained after passing through a few layers of aggregation. At the first layer, we have the list of initial values that are to be combined. Using a vector of aggregation operators, new values are obtained, and the next step is to combine these new values again using the given aggregation operators. This process is repeated until, for every variable, the difference between the maximum and minimum values in the corresponding column of the currently calculated matrix is small enough to stop further aggregation. We have shown (Tsiporkova and Boeva, 2004, 2006, 2007a) that any recursive aggregation process that follows the algorithm described herein and is defined via a set of continuous and strict compensatory aggregation operators is convergent. For instance, any weighted mean operator with non-zero weights is continuous and strict compensatory. Analogously, all power means are continuous and strict compensatory. The arithmetic, geometric and harmonic means are examples of power means (see footnote 2).
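A minimal sketch of the recursive aggregation described above is given below, assuming the three power means named in the text as the k=3 operators. The function names, the tolerance `eps`, the iteration cap and the choice to return the column mean of the converged matrix are assumptions of this sketch. The geometric and harmonic means as written also require strictly positive inputs, so zero values would have to be handled (e.g. clipped) beforehand.

```python
import numpy as np

def arithmetic_mean(v):
    return v.mean(axis=0)

def geometric_mean(v):
    # Requires strictly positive values.
    return np.exp(np.log(v).mean(axis=0))

def harmonic_mean(v):
    # Requires strictly positive values.
    return v.shape[0] / (1.0 / v).sum(axis=0)

def hybrid_aggregate(X,
                     operators=(arithmetic_mean, geometric_mean, harmonic_mean),
                     eps=1e-9, max_iter=1000):
    """Recursively aggregate the columns of X until they converge.

    X: array of shape (rows, m); column j holds the values of variable j.
    Returns (z, steps): one consensus value per column and the number
    of aggregation layers that were needed.
    """
    Y = np.asarray(X, dtype=float)
    for step in range(max_iter):
        # One layer: apply all k operators in parallel to every column.
        Y = np.vstack([op(Y) for op in operators])
        # Stop when max and min agree to within eps in every column.
        if np.max(Y.max(axis=0) - Y.min(axis=0)) < eps:
            # The entries of each column now agree to within eps, so
            # their mean is used as the consensus value (an assumption).
            return Y.mean(axis=0), step + 1
    raise RuntimeError("aggregation did not converge within max_iter")
```

A column whose values are all equal converges after the first layer, whereas a column with widely spread values needs a few more layers before the three means agree.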
2.3 DTW algorithm
The DTW alignment algorithm aims at aligning two sequences of feature vectors by warping the time axis iteratively until an optimal match (according to a suitable metric) between the two sequences is found. It was developed originally for speech recognition applications (Sakoe and Chiba, 1978). Due to its flexibility, DTW has been widely used in many scientific disciplines, including several computational biology studies (Aach and Church, 2001; Criel and Tsiporkova, 2006; Hermans and Tsiporkova, 2007). A detailed explanation of the DTW algorithm can be found in Hermans and Tsiporkova (2007), Sakoe and Chiba (1978) and Sankoff and Kruskal (1983). Therefore, the description below is restricted to the relevant steps of the algorithm.
Consider two matrices $A=[a_1,\ldots,a_n]$ and $B=[b_1,\ldots,b_m]$ with $a_i$ ($i=1,\ldots,n$) and $b_j$ ($j=1,\ldots,m$) column vectors of the same dimension. The two vector sequences $[a_1,\ldots,a_n]$ and $[b_1,\ldots,b_m]$ can be aligned against each other by arranging them on the sides of a grid, e.g. one on the top and the other on the left-hand side. Then a distance measure, comparing the corresponding elements of the two sequences, can be placed inside each cell. To find the best match or alignment between these two sequences, one needs to find a path through the grid $P=(1,1),\ldots,(i_s,j_s),\ldots,(n,m)$, with $1 \le i_s \le n$ and $1 \le j_s \le m$, which minimizes the total distance between A and B. Thus, the procedure for finding the best alignment between A and B involves considering all possible routes through the grid and computing for each one the overall distance, which is defined as the sum of the distances between the individual elements on the warping path. Consequently, the final DTW distance between A and B is the minimum overall distance over all possible warping paths:

$$\mathrm{DTW}(A,B)=\min_{P}\sum_{(i_s,j_s)\in P} d(a_{i_s},b_{j_s}),$$

where $d(a_{i_s},b_{j_s})$ denotes the distance placed in grid cell $(i_s,j_s)$.
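In practice the minimum over all warping paths is not found by enumeration but by dynamic programming. The sketch below illustrates this under stated assumptions: the Euclidean distance between column vectors is used as the cell distance, the allowed moves are the three standard ones (diagonal, down, right) without any slope or window constraint, and the function name `dtw` is hypothetical. The variant used in the study may differ in these choices.

```python
import numpy as np

def dtw(A, B):
    """DTW distance and optimal path between two column-vector sequences.

    A: array of shape (d, n), B: array of shape (d, m).
    Returns (distance, path) with path a list of 0-based (i, j) pairs.
    """
    n, m = A.shape[1], B.shape[1]
    # Grid of local distances: Euclidean distance between columns.
    cost = np.linalg.norm(A[:, :, None] - B[:, None, :], axis=0)

    # acc[i, j] = minimal overall distance of a path ending in cell (i, j);
    # row 0 and column 0 are padding.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])

    # Backtrack from the last cell to the first to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n, m], path
```

The returned path always starts at the first pair of columns and ends at the last pair, matching the path definition $P=(1,1),\ldots,(n,m)$ up to the 0-based indexing used in the code.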
| 3 DATA INTEGRATION PROCEDURE |
|---|
Assume that a particular biological phenomenon is monitored in high-throughput experiments under n different conditions, for instance a cell-cycle study performed for several different synchronized cultures. Each experiment measures the expression levels of m genes at a number of different time points. Thus, a set of n different data matrices A1,...,An will be produced, one per experiment. These matrices will not necessarily cover exactly the same phases of the studied phenomenon, nor will they necessarily have the same dimensions or use the same time sampling interval. The question is how to combine these data matrices in order to derive consistent and relevant information about the phenomenon under study. A traditional approach to this problem is to evaluate the gene performance quantitatively in terms of criteria such as up-regulation, periodicity, etc., i.e. each expression matrix is converted into a vector (or vectors) of P-values. Next, applying a predefined threshold, a set of genes is selected for each experiment and these sets are combined in some way (union, intersection, etc.) to form a final gene set that can potentially be associated with the phenomenon in question. We suggest a somewhat different approach here.
3.1 Feature selection phase
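Assume that P-values are estimated in some way for each gene in each experiment. Then the P-values from all the experiments are joined together to form a single matrix, consisting of as many rows as experiments and as many columns as genes. Thus:

$$P=\begin{bmatrix} p_{11} & \cdots & p_{1m}\\ \vdots & \ddots & \vdots\\ p_{n1} & \cdots & p_{nm}\end{bmatrix},$$

where $p_{ij}$ is the P-value of gene $j$ ($j=1,\ldots,m$) in experiment $i$ ($i=1,\ldots,n$). Applying the hybrid aggregation algorithm of Section 2.2 to this matrix reduces it to a single vector $p=[p_1,\ldots,p_m]$.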
The hybrid aggregation procedure is schematically illustrated in Figure 1. The values of the resulting vector p can be interpreted as the consensus P-values supported by all the n experiments. These can be further used to select a subset of genes that are potentially of interest for the studied biological phenomenon. Assume that a set of $s$ ($1 \le s \le m$) genes has been selected either by using a predefined P-value threshold or by retaining a certain percentage of the genes with the lowest P-values. Subsequently, the time expression profiles of these s genes can be extracted from the original n data matrices and thus n new matrices B1,...,Bn are constructed. Each gene of interest is represented by multiple expression profiles, one for each experiment, shedding light on the gene's function from different experimental perspectives.
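The two selection rules and the extraction of the matrices B1,...,Bn can be sketched as below. The function names, the genes-as-rows layout of the expression matrices and the rounding used to turn a percentage into a gene count are assumptions made for illustration only.

```python
import numpy as np

def select_genes(p, threshold=None, fraction=None):
    """Indices of selected genes given consensus P-values p.

    Exactly one rule must be given: a P-value threshold, or the
    fraction of genes with the lowest P-values to retain.
    """
    p = np.asarray(p)
    if (threshold is None) == (fraction is None):
        raise ValueError("give exactly one of threshold or fraction")
    if threshold is not None:
        return np.flatnonzero(p < threshold)
    s = max(1, int(round(fraction * p.size)))
    return np.sort(np.argsort(p, kind="stable")[:s])

def extract_profiles(matrices, selected):
    """Build B1,...,Bn by keeping only the selected gene rows.

    matrices: list of arrays of shape (genes, time_points); the number
    of time points may differ between experiments.
    """
    return [A[selected, :] for A in matrices]
```

Both rules return the same kind of index array, so the rest of the pipeline does not depend on which rule was used.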
3.2 Data fusion phase
It is clear that a multi-experimental setup has the advantage of providing diverse evidence about the gene function and behavior in the same biological process and, consequently, may lead to robust insights into the underlying interaction mechanisms of this process. However, the interpretation of the experimental results may be hindered by the failure of the different experiments to agree consistently on a common set of significant genes. The generation of concrete hypotheses about the gene role in the studied biological phenomenon might be further facilitated if a unique expression profile, valid over all experiments, is associated with each selected gene. Here, we propose a hierarchical merge procedure that combines the multiple gene profiles pairwise into a unique expression profile. It bears some resemblance to the progressive multiple sequence alignment techniques using guide trees (Thompson et al., 1994). The algorithm performs exactly n–1 iterations, as depicted in Figure 2, and each iteration consists of the following three distinct steps:
- Pairwise alignment. The newly constructed expression matrices B1,...,Bn (see Section 3.1) are aligned pairwise against each other using the DTW alignment algorithm described in Section 2.3 (see footnote 3). Subsequently, the optimal DTW alignment path and the corresponding DTW distance are obtained for each matrix pair.
- Optimal order. Assume that the expression matrices are numbered from 1 to n. The goal is to re-order them in such a way that the subsequent merge is performed on matrix pairs that are relatively close to each other in terms of DTW distance. The matrix pair with the lowest DTW distance is identified first. Let us denote it, for simplicity, by (i1,i2). Then, from the remaining (n–2) matrices, the one with the lowest DTW distance to either i1 or i2 is found. This matrix is appended to the left (if closer to i1) or to the right (if closer to i2) of (i1,i2). Thus, a new list of three matrices is created, denoted (i1,i2,i3) after relabeling. The same procedure is then repeated for the two ends of the new list, i1 and i3, and so on until no matrices are left in the original list.
- Pairwise merge. At the previous step, the expression matrices have been ordered as follows i1,...,in. However, for simplicity, these will be referred to as 1,...,n. Each adjacent matrix pair, i.e. (1,2),(2,3),...,(n–1,n) is merged, including the timing information, into a single matrix by just taking the mean of each aligned value pair as specified by the optimal DTW alignment path.
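The optimal order and pairwise merge steps listed above can be sketched as follows. The sketch assumes that the pairwise DTW distances are already available as a symmetric matrix `D` and that each alignment path is a list of 0-based index pairs. The function names are hypothetical, and emitting one merged column per path step is an assumption about how repeated indices on the path are handled.

```python
import numpy as np

def optimal_order(D):
    """Greedy chain ordering from a symmetric distance matrix D (n x n)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Ignore the diagonal when looking for the closest pair.
    masked = D + np.diag(np.full(n, np.inf))
    i1, i2 = np.unravel_index(np.argmin(masked), masked.shape)
    order = [int(i1), int(i2)]
    remaining = set(range(n)) - set(order)
    while remaining:
        left, right = order[0], order[-1]
        # Matrix closest to either end of the current list.
        best = min(remaining, key=lambda r: min(D[r, left], D[r, right]))
        if D[best, left] <= D[best, right]:
            order.insert(0, best)   # append left
        else:
            order.append(best)      # append right
        remaining.remove(best)
    return order

def merge_pair(A, tA, B, tB, path):
    """Merge two aligned matrices and their time stamps.

    A, B: arrays of shape (genes, time_points); tA, tB: time stamps.
    Each aligned pair (i, j) on the path yields one merged column,
    the mean of column i of A and column j of B, with the mean time.
    """
    ia = [i for i, _ in path]
    ib = [j for _, j in path]
    merged = (A[:, ia] + B[:, ib]) / 2.0
    times = (np.asarray(tA, dtype=float)[ia]
             + np.asarray(tB, dtype=float)[ib]) / 2.0
    return merged, times
```

Averaging the time stamps along with the values is what makes the fused time points non-equidistant even when every input experiment is sampled at regular intervals.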
| 4 RESULTS AND DISCUSSION |
|---|
4.1 Data
In this work, we combine hybrid aggregation with hierarchical alignment and merge algorithms for the integration of time series expression data coming from different experiments. The proposed algorithms are evaluated and demonstrated on gene expression time series data from a study examining the global cell cycle control of gene expression in the fission yeast S.pombe (Rustici et al., 2004). The study includes eight independent time-course experiments synchronized, respectively, by (1) elutriation (three independent biological repeats), (2) cdc25 block-release (two independent biological repeats, one of which was run as two dye-swapped technical replicates, and one experiment in a sep1 mutant background) and (3) a combination of both methods (elutriation and cdc25 block-release as well as elutriation and cdc10 block-release). Thus, the following nine different expression test sets are available: (1) elu1, (2) elu2, (3) elu3, (4) cdc25-1, (5) cdc25-2.1, (6) cdc25-2.2, (7) cdc25-sep1, (8) elu-cdc25-br, (9) elu-cdc10-br. Measurements for each experiment were taken at synchronization time 0 and then every 15 min for up to 315 min. The elutriation datasets and the cdc25 block-release datasets consist of 18–20 time points covering two full cell cycles, whereas the combined elutriation/block-release datasets contain 21–22 time points but cover only one cycle.
4.2 Feature selection through hybrid aggregation
The normalized data from the nine experiments (see Section 4.1) have been downloaded from the website of the Sanger Institute (http://www.sanger.ac.uk/PostGenomics/S_pombe/). Subsequently, the empty rows (genes with no expression measurements) have been filtered out from the expression matrices and any other missing expression entries have been imputed with the DTWimpute algorithm (Tsiporkova and Boeva, 2007b). In this way, nine complete expression matrices have been created, one for each of the original test sets listed in Section 4.1. Then P-values for regulation have been calculated by performing permutation tests as described in Section 2.1, i.e. a P-value vector has been associated with each expression matrix. Subsequently, a P-value matrix has been created for the set of genes occurring in all nine experiments. This matrix is of dimension 3970 genes times nine experiments.
For the P-value matrix constructed in this way, the number of genes with P-value lower than 0.05 has been recorded for each experiment separately. These numbers can be found in the left column of Table 1. For most of the experiments they lie in the range of 230–260 genes. The three elutriation datasets deviate considerably from this range, in particular elu2, for which more than 450 genes with P-value lower than 0.05 have been identified. In a subsequent step, the number of genes with P-value lower than 0.05 in multiple experiments has been determined. Groups of experiments have been formed starting from Experiment 1 and adding, in an incremental fashion, the rest of the experiments. In this way, eight groups of experiments have been considered, as presented in the right column of Table 1. It is remarkable that the number of genes with P-value below 0.05 in both of the first two experiments is already less than 100. Finally, only 21 of the total 3970 genes have P-values lower than 0.05 in all nine experiments. The only goal of this exercise is to demonstrate how poor the agreement between the different experiments can be and to justify the application of alternative combination methods.
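The two kinds of counts described above (per experiment, and per incrementally growing group of experiments) can be computed as in the sketch below. The function name and the experiments-as-rows layout of the P-value matrix are assumptions. The toy numbers in any usage are illustrative and are not the values of Table 1.

```python
import numpy as np

def significant_counts(P, alpha=0.05):
    """Counts of genes with P-value below alpha.

    P: array of shape (experiments, genes).
    Returns (per_experiment, per_group), where per_group[g] counts the
    genes that are below alpha in all of the first g + 2 experiments.
    """
    sig = np.asarray(P) < alpha
    per_experiment = sig.sum(axis=1)
    # Running logical AND over experiments: a gene stays True only
    # while it is significant in every experiment added so far.
    in_all_so_far = np.logical_and.accumulate(sig, axis=0)
    per_group = in_all_so_far[1:].sum(axis=1)
    return per_experiment, per_group
```

Because the running AND can only switch genes off, the group counts are non-increasing, which is why requiring agreement over many experiments leaves so few genes.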
As an alternative, the feature selection procedure, as defined in Section 3.1, has been applied to the P-value matrix. For the purpose of the hybrid aggregation procedure, three different aggregation operators have been selected: the arithmetic, geometric and harmonic means. Each of these aggregation operators exhibits certain shortcomings when used individually. For instance, the arithmetic mean is strongly influenced by the presence of extremely low or extremely high values. This may lead in some cases to an averaged overall P-value that does not adequately reflect the different individual P-values. In the case of the geometric mean, the occurrence of a very low P-value (e.g. 0 or close to 0) in a single experiment for a particular gene is sufficient to produce a low overall P-value for this gene, no matter what the P-values for the rest of the experiments are. The harmonic mean behaves even more extremely when single entries with very low values are present. These artifacts are reflected in Figure 3a, which reports the evolution of the number of genes with P-values lower than 0.05 for each of the three individual aggregation operators as the hybrid aggregation algorithm progresses. The initial number of significant genes for threshold 0.05 identified by each of the three operators is very different. The arithmetic mean identifies merely 58 out of 3970 genes (1.5%), while the harmonic mean selects more than 500 significant genes (12.6%). The geometric mean is more moderate, producing 173 significant genes (4.4%). The hybrid aggregation process has converged after a couple of iterations (see x-axis in Fig. 3a), resulting in a total of 218 significant genes (5.5%), i.e. genes with P-values lower than 0.05. These may be consulted on our webpage: http://www.tu-plovdiv.bg/Container/bi/DataIntegration/.
4.3 Data fusion through hierarchical pairwise alignment and merge
The expression profiles of the 218 significantly regulated genes (threshold 0.05), identified at the feature selection step above, have been extracted from the complete expression matrices for each of the nine experiments. The number of equidistant (every 15 min) time points at which these genes were monitored varies considerably between the experiments: 22 (0–315 min) for elu-cdc10-br; 21 (0–300 min) for elu-cdc25-br; 20 (0–285 min) for elu1, elu2, elu3 and cdc25-sep1; 19 (0–270 min) for cdc25-1; 18 (0–255 min) for cdc25-2.1 and cdc25-2.2. Moreover, elu-cdc10-br and elu-cdc25-br cover only one cell cycle, while the rest of the experiments stretch over two cell cycles.
The nine different expression matrices, each consisting of 218 genes and a varying number of time points (ranging from 18 to 22), have been fused in eight steps by applying the procedure described in Section 3.2. In this way, we have arrived at a single expression matrix containing the fused profiles of the 218 genes at 25 fused time points. These range from 0 to 272 min and are not everywhere equidistant: 0, 0.06, 0.12, 4.2, 15.7, 30.5, 45.7, 60.7, 75.7, 90.7, 105.7, 120.7, 126.2, 141.2, 156.1, 171.1, 186.1, 201.1, 201.6, 216.6, 217.1, 232.1, 247.1, 261.6, 272.
Recall that each iteration of the data fusion procedure consists of three distinctive steps: pairwise alignment, optimal order and pairwise merge. The DTW distance matrix created at the pairwise alignment step during the very first iteration of the fusion algorithm is given in Table 2. These distances have been used at the next step in order to determine the optimal sequence in which the different expression matrices would subsequently be pairwise merged. Table 2 is arranged according to the determined optimal order between the experiments. The latter seems quite logical since the three different groups of synchronization experiments, elutriation, cdc25 block-release and the combined elutriation/block-release, are clearly clustered together and the lowest DTW distance is assigned to the pair of the two dye-swapped technical replicates cdc25-2.1 and cdc25-2.2.
After the completion of the first data fusion iteration, eight new expression matrices have been created: elu-cdc25-br & elu-cdc10-br, elu-cdc10-br & cdc25-2.1, cdc25-2.1 & cdc25-2.2, cdc25-2.2 & cdc25-1, cdc25-1 & cdc25-sep1, cdc25-sep1 & elu2, elu2 & elu1 and elu1 & elu3. These have been subjected again to pairwise DTW alignment, optimal (re-)ordering and merge. At each iteration step the expression profiles have gradually been getting closer, in terms of DTW distance, to each other. The latter is illustrated in Fig. 3b, which depicts the evolution of the mean DTW distance, from about 8.5 at the first iteration to almost 3 at the last one, during the data fusion process.
Thus, in the course of eight iterations, the nine originally available expression matrices have been fused smoothly into a single expression matrix of 25 time points. Figure 4 depicts, for six different genes, the fused expression profiles against the background of the nine original expression profiles for each gene. These genes have been selected since they have some known cell cycle involvement: cig2 B-type cyclin and mik1 kinase inhibitor are both involved in cell cycle control, klp8 is a kinesin microtubule motor playing a role in chromosome cohesion and segregation, cdc22 is involved in S-phase regulation, hht1 is a histone and TF2-7 is a TF2-type transposon. The first five plots, Figure 4a–4e, are examples of clear periodic behavior expressed over all the experiments and, consequently, the fused profiles are periodic with two distinct, well-formed cycles. Two separate cycles can also be identified for the fused expression profile in Figure 4f. However, these are less pronounced than the rest in Figure 4, since the underlying original expression profiles exhibit some peak shift and a difference in peak amplitude.
Figure 5 depicts the fused and the original profiles of another six genes. These genes are characterized by the presence of rather extreme outliers among the original expression profiles, or by expression profiles that vary considerably in terms of form and amplitude. Consequently, the resulting fused profiles do not exhibit a clear cell cycle activity. For instance, several of the original profiles for the gene in Figure 5a are clearly periodic. However, the periodic behavior is not supported by the rest of the experiments and thus only one distinct peak is preserved in the fused profile. Figure 5b–5d present fused profiles that are rather consistent with a stress response reaction, with a clear up-regulation at the beginning of the synchronization. The profile in Figure 5e is quite flat due to inconsistent behavior throughout the different experiments, while the one in Figure 5f results in fluctuating noise since the original expression profiles disagree on the up- and down-regulation trends. The fused expression profiles can thus be a good indication of inconsistent and noisy behavior and facilitate the detection of false positives (non-cell cycle regulated genes).
Some of the fused expression profiles have a very distinctive shape and can be used as a sort of gene signature for a particular activity, e.g. cyclic behavior, stress response, noise, etc. For instance, Figure 6 depicts the fused expression profiles of three gene clusters: hht1, TF2-7 and cdc22. The cluster lists can be consulted in Table 3. These have been obtained by aligning, using the DTW algorithm, the fused profile of each one of the three genes against the rest of the fused expression profiles. The genes with the best matching profiles have been selected as associated with the gene in question. It can be seen that the gene cluster of histone hht1 contains exclusively histones. Analogously, the cluster of the retrotransposable element TF2-7 is populated with other retrotransposable elements: TF2-5, TF2-8, TF2-2, TF2-1, TF2-6, TF2-10. The fused profile of cdc22, a protein likely required for initiation of DNA replication, is matched by the fused profiles of other proteins linked to DNA replication, such as cdt2, mid2, cdc18 and mik2.
| 5 CONCLUSION |
|---|
We have presented a novel method for fusing expression profiles coming from different microarray experiments. It is a two-step procedure consisting of a feature selection phase and a fusion phase. The feature selection phase reaches consensus between the different experiments about the regulation status of each gene. The fusion phase combines the multiple expression profiles of a gene, coming from different experiments, into a single overall profile. The performance of the proposed method has been evaluated on gene expression time series data from a study examining the global cell cycle control of gene expression in the fission yeast S.pombe. It has been demonstrated that the shape of the fused expression profiles consistently reflects the overall gene behavior in the different experiments. Thus, for a given gene, a consistent cyclic behavior supported by all the different experiments will result in a fused gene profile with a distinctive periodic shape. Analogously, gene expression profiles exhibiting a considerable peak and amplitude difference across several experiments will generate a rather noisy overall profile, possibly indicating the presence of noise, stress response or other artifacts. In general, it has been observed that the fused expression profiles can be used as a sort of gene signature for a particular activity, e.g. cyclic behavior, stress response, noise, etc. Thus, the proposed data fusion method can be considered as a normalization and smoothing procedure aiming at noise reduction and signal amplification.
| APPENDIX A |
|---|
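- Hybrid aggregation. Consider a matrix $X=[x_1,\ldots,x_m]$. The values of each column vector $x_j$ ($j=1,\ldots,m$) of $X$ are combined in parallel employing a set of $k$ different aggregation operators $A_1,\ldots,A_k$. Consequently, a new matrix $Y^{(0)}$ is generated with column vectors $y_j^{(0)}$ ($j=1,\ldots,m$):
$$y_j^{(0)}=\left[A_1(x_j),\ldots,A_k(x_j)\right]^{T}.$$
This new matrix is aggregated again, generating yet another matrix $Y^{(1)}$. In this fashion, each interaction step is modeled via $k$ parallel aggregations applied over the results of the previous step, i.e. at step $q$ ($q=1,2,\ldots$) a matrix $Y^{(q)}$ is obtained with column vectors $y_j^{(q)}$ ($j=1,\ldots,m$):
$$y_j^{(q)}=\left[A_1\big(y_j^{(q-1)}\big),\ldots,A_k\big(y_j^{(q-1)}\big)\right]^{T}.$$
- Convergence. Formally, convergence implies that at some step $q$ of the aggregation process it will hold that
$$\max_{i} y_{ij}^{(q)}-\min_{i} y_{ij}^{(q)}<\varepsilon$$
for each $j=1,\ldots,m$ and a very small real number $\varepsilon$. Subsequently, the vector $z=[z_1,\ldots,z_m]$ can be composed by taking $z_j$ as the value on which the entries of $y_j^{(q)}$ agree to within $\varepsilon$.
- Strict compensatory aggregation operators. An aggregation operator $A$ is referred to as strict compensatory if
$$\min(x_1,\ldots,x_n)<A(x_1,\ldots,x_n)<\max(x_1,\ldots,x_n)$$
for any $(x_1,\ldots,x_n)$ with at least two different values ($n\ge 2$) (Fodor and Roubens, 1994).
- Power means. These are aggregation operators defined by
$$M_p(x_1,\ldots,x_n)=\left(\frac{1}{n}\sum_{i=1}^{n}x_i^{p}\right)^{1/p}$$
for $p\in\mathbb{R}$ and $x_1,\ldots,x_n\ge 0$, with $M_0$ understood as the limit $p\to 0$. It can be easily verified that $M_1$, $M_0$ and $M_{-1}$ are the arithmetic $M=\frac{1}{n}\sum_{i=1}^{n}x_i$, the geometric $G=\left(\prod_{i=1}^{n}x_i\right)^{1/n}$ and the harmonic $H=n\big/\sum_{i=1}^{n}\frac{1}{x_i}$ means, respectively.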
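As a small numerical check of the power mean definition, the sketch below evaluates $M_p$ for $p=1$, $p=0$ and $p=-1$. The function name `power_mean` is hypothetical, the case $p=0$ is implemented through its limit (the geometric mean), and strictly positive inputs are assumed for $p\le 0$.

```python
import numpy as np

def power_mean(x, p):
    """Power mean M_p of the values in x.

    For p == 0 the limit p -> 0 (the geometric mean) is returned.
    Strictly positive values are assumed when p <= 0.
    """
    x = np.asarray(x, dtype=float)
    if p == 0:
        return float(np.exp(np.log(x).mean()))
    return float(np.mean(x ** p) ** (1.0 / p))
```

For the values 1, 4 and 4, this gives an arithmetic mean of 3, a geometric mean of 16^(1/3) and a harmonic mean of 2, all lying strictly between the minimum and the maximum of the inputs, as required of a strict compensatory operator.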
| FOOTNOTES |
|---|
1. Appendix A contains a formal description of the aggregation algorithm.
2. For further details see Appendix A.
3. The expression values need to be standardized (subtracting the mean and dividing by the SD) before performing the DTW alignment.
| REFERENCES |
|---|
Aach J, Church GM. Aligning gene expression time series with time warping algorithms. Bioinformatics (2001) 17:495–508.
Choi JK, et al. Combining multiple microarray studies and modeling interstudy variation. Bioinformatics (2003) 19(Suppl. 1):i84–i90.
Choi H, et al. A latent variable approach for meta-analysis of gene expression data from multiple microarray experiments. BMC Bioinformatics (2007) 8:364.
Conlon EM, et al. Bayesian models for pooling microarray studies with multiple sources of replications. BMC Bioinformatics (2006) 7:247.
Criel J, Tsiporkova E. Gene Time Expression Warper: a tool for alignment, template matching and visualization of gene expression time series. Bioinformatics (2006) 22:251–252.
De Lichtenberg U, et al. Comparison of computational methods for the identification of cell cycle-regulated genes. Bioinformatics (2004) 21:1164–1171.
Fodor JC, Roubens M. Fuzzy Preference Modelling and Multicriteria Decision Support. (1994) Dordrecht: Kluwer Academic Publishers.
Gilks WR, et al. Fusing microarray experiments with multivariate regression. Bioinformatics (2005) 21:ii137–ii143.
Hermans F, Tsiporkova E. Merging microarray cell synchronization experiments through curve alignment. Bioinformatics (2007) 23:e64–e70.
Oliva A, et al. The cell cycle-regulated genes of Schizosaccharomyces pombe. PLoS Biol. (2005) 3:1239–1260.
Peng X, et al. Identification of cell cycle-regulated genes in fission yeast. Mol. Biol. Cell (2005) 16:1026–1042.
Rustici G, et al. Periodic gene expression program of the fission yeast cell cycle. Nat. Genet. (2004) 36:809–817.
Sakoe H, Chiba S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. (1978) ASSP-26:43–49.
Sankoff D, Kruskal J. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. (1983) Reading, MA: Addison-Wesley.
Thompson JD, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
Tsiporkova E, Boeva V. Nonparametric recursive aggregation process, Kybernetika. J. Czech Soc. Cybern. Inf. Sci. (2004) 40:51–70.
Tsiporkova E, Boeva V. Multi-step ranking of alternatives in a multi-criteria and multi-expert decision making environment. Inf. Sci. (2006) 176:2673–2697.
Tsiporkova E, Boeva V. Modelling and simulation of the genetic phenomena of additivity and dominance via gene networks of parallel aggregation processes. In: Hochreiter S, Wagner R, eds. Bioinformatics Research and Development. Lecture Notes in Bioinformatics (2007a) 4414. Springer-Verlag. 199–211.
Tsiporkova E, Boeva V. Two-pass imputation algorithm for missing value estimation in gene expression time series. J. Bioinform. Comput. Biol. (2007b) 5:1005–1022.
Zaykin DV, et al. Combining p-values in large-scale genomics experiments. Pharm. Stat. (2007) 6:217–226.