Bioinformatics Advance Access originally published online on November 17, 2005
Bioinformatics 2006 22(2):251-252; doi:10.1093/bioinformatics/bti787
Gene Time Expression Warper: a tool for alignment, template matching and visualization of gene expression time series
¹Devgen N.V., Science IT, Technologiepark 30, B-9052 Ghent, Belgium
²Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
*To whom correspondence should be addressed.
ABSTRACT
Summary: An application tool for alignment, template matching and visualization of gene expression time series is presented. The core algorithm is based on dynamic time warping techniques used in the speech recognition field. These techniques allow for non-linear (elastic) alignment of temporal sequences of feature vectors and consequently enable detection of similar shapes with different phases.
Availability: The Java program, examples and a tutorial are available at http://www.psb.ugent.be/cbd/papers/gentxwarper/
Contact: eltsi{at}psb.ugent.be
1 INTRODUCTION
Detecting patterns in gene expression time-series data is a challenging knowledge discovery task. Biological processes can unfold at different rates in response to different experimental conditions, or in different organisms and individuals, so their time progression varies. Classical distance metrics such as the Euclidean distance, or variations thereof, fail to capture this temporal variation. They are very sensitive to small distortions of the time axis and consequently produce poor similarity measures between time series. Dynamic time warping (DTW) is a much more robust distance measure for time series, because it allows similar shapes to match even if they are out of phase along the time axis (Fig. 1).
The DTW alignment algorithm was originally developed for speech recognition (Sakoe and Chiba, 1978). It aligns two sequences of feature vectors by warping the time axis, using dynamic programming, until it finds the optimal match between the two sequences according to a suitable metric. Because of its flexibility, DTW is widely used in many scientific disciplines and business applications. In a pilot study, Aach and Church (2001) investigated the stability of the DTW algorithm on Saccharomyces cerevisiae cell cycle expression data. They focused mainly on aligning the expression profiles of the class ptg50 (990 genes) between two different time series. For this purpose they used four command-line C++ programs, which implement the classical and interpolated DTW algorithms and generate PostScript files that visualize the alignment.
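To make the contrast with the Euclidean distance concrete, here is a minimal Python sketch of the classical DTW recurrence for two one-dimensional profiles. It illustrates the idea under stated assumptions and is not GenTχWarper's Java implementation. The local cost is the absolute difference, and the step pattern is the unweighted one, whereas Sakoe and Chiba's symmetric algorithm weights diagonal steps differently. The profiles are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classical DTW distance between two 1-D expression profiles.

    Unweighted recurrence (an assumption; Sakoe and Chiba's symmetric form
    weights diagonal steps differently):
        D[i][j] = |a[i] - b[j]| + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    """
    n, m = len(a), len(b)
    # D[i][j] holds the cheapest cost of aligning a[:i] with b[:j];
    # row 0 and column 0 are padding so the recurrence needs no special cases.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a advances, b[j-1] is repeated
                                 D[i][j - 1],      # b advances, a[i-1] is repeated
                                 D[i - 1][j - 1])  # both advance one time point
    return D[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance; only defined for equal lengths."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of time points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical profiles: the same peak, reached one time point earlier in b.
a = [0, 0, 1, 3, 1, 0, 0]
b = [0, 1, 3, 1, 0, 0, 0]
print(euclidean_distance(a, b))  # sqrt(10), about 3.162
print(dtw_distance(a, b))        # 0.0: warping absorbs the phase shift
```

The warping path may repeat time points of either series. Under DTW the one-step phase shift therefore costs nothing, while the point-by-point Euclidean distance penalizes it.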
We present GenTχWarper, a gene time expression warping tool. It is a Java-based program with a powerful graphical user interface that enables alignment, template matching and visualization of time-series data in an easy and flexible way. The original symmetric DTW algorithm (Sakoe and Chiba, 1978) has been extended with new features. For instance, the user can define an anchor point in the alignment and can perform partial alignments by sliding the time series against each other along the time axis. The user can also apply some typical microarray data transformations and choose among several distance metrics to compute between the feature vectors at each time point. To our knowledge, GenTχWarper is the first user-friendly DTW tool available to the biological community.
2 METHODS AND IMPLEMENTATION
GenTχWarper operates in two main modes: aligning datasets and template matching. As the name suggests, the aligning datasets mode finds the best time alignment between two sets of gene expression time series. It is useful for comparative studies of the temporal behaviour of a set of genes under different experimental conditions (e.g. cell cycle expression data generated with different synchronization techniques) or in different organisms (yeast, plants, human, etc.). The two profile sets are supplied in two separate files. The aligned result can be saved to a file and then analysed further with other microarray analysis tools. The template matching mode (Fig. 2) mines gene expression time series for the profiles that best fit a template expression profile. It therefore helps identify a cluster of genes whose expression profiles are related to the profile of a gene supplied as a template, possibly with a non-linear time shift. An additional feature computes a pairwise gene DTW distance matrix for a complete microarray dataset. The template matching mode can be employed in studies requiring gene-centric approaches. For instance, Zhu et al. (2002) demonstrated that transcription-factor-centric clustering can successfully identify transcription factor binding sites, even when it is limited to linear time delays.
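The template matching mode and the pairwise distance matrix can be sketched as follows. The sketch assumes that template matching ranks genes by their DTW distance to the template profile. The function names `match_template` and `pairwise_dtw_matrix` and the gene profiles are hypothetical. The DTW recurrence is the classical unweighted one, which is not necessarily the variant GenTχWarper uses.

```python
import math


def dtw_distance(a, b):
    """Classical (unweighted) DTW distance between two 1-D profiles."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def match_template(template, profiles, top_k=3):
    """Rank genes by the DTW distance of their profile to a template profile.

    profiles: dict mapping gene name -> list of expression values.
    Returns the top_k (gene, distance) pairs, smallest distance first.
    """
    scored = [(gene, dtw_distance(template, p)) for gene, p in profiles.items()]
    scored.sort(key=lambda pair: pair[1])  # stable: ties keep input order
    return scored[:top_k]


def pairwise_dtw_matrix(profiles):
    """Symmetric gene-by-gene DTW distance matrix and the gene order used."""
    genes = list(profiles)
    M = [[0.0] * len(genes) for _ in genes]
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            d = dtw_distance(profiles[genes[i]], profiles[genes[j]])
            M[i][j] = M[j][i] = d  # this DTW variant is symmetric
    return genes, M


# Hypothetical dataset; gene names and values are made up for illustration.
profiles = {
    "geneA": [0, 0, 1, 3, 1, 0, 0],
    "geneB": [0, 1, 3, 1, 0, 0, 0],  # same shape as geneA, earlier peak
    "geneC": [3, 2, 1, 0, 0, 0, 0],
    "geneD": [0, 0, 0, 0, 1, 2, 3],
}
print(match_template(profiles["geneA"], profiles, top_k=2))
# [('geneA', 0.0), ('geneB', 0.0)]
genes, M = pairwise_dtw_matrix(profiles)
```

With n genes the matrix requires n(n-1)/2 DTW computations, and each one is quadratic in the number of time points. This cost is one reason a warping window matters for complete microarray datasets.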
In both modes, several parameters tune the behaviour of the core alignment algorithm: data adjustment, metric, warping window, offset and anchor point. The data adjustment option applies z- and log2-transformations to the input expression profiles before alignment. Both are essential for comparing gene expression time series between experiments and between species. The metric parameter offers a choice between four distance measures: Manhattan, Euclidean, Chebychev and Pearson correlation. The warping window constraint reduces the search space and therefore speeds up the processing of large datasets. An overly narrow window, however, may reduce the accuracy of the final alignment.
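The following Python sketch shows how these parameters could plug into DTW. Each time point is treated as a feature vector, for example the expression values of a set of genes at that time point. The metric is the local distance between two such vectors, and the warping window is implemented as a Sakoe-Chiba band |i - j| <= w. Several details are assumptions rather than documented GenTχWarper behaviour. The z-transform uses the population standard deviation. A Pearson correlation r is turned into the distance 1 - r. The band is widened when the series lengths differ.

```python
import math


def z_transform(profile):
    """Scale a profile to mean 0 and standard deviation 1 (population SD assumed)."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:
        return [0.0] * n  # flat profile: nothing to rescale
    return [(x - mean) / sd for x in profile]


def log2_transform(profile):
    """log2 of strictly positive values such as expression ratios."""
    return [math.log2(x) for x in profile]


# Local distances between the feature vectors u and v of two time points.
def manhattan(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def chebychev(u, v):
    return max(abs(x - y) for x, y in zip(u, v))


def pearson_distance(u, v):
    """1 - Pearson correlation (an assumed conversion of correlation to distance)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    if su == 0 or sv == 0:
        return 1.0  # correlation undefined; treated as uncorrelated
    return 1.0 - cov / (su * sv)


METRICS = {"manhattan": manhattan, "euclidean": euclidean,
           "chebychev": chebychev, "pearson": pearson_distance}


def dtw(seq_a, seq_b, metric, window=None):
    """DTW over sequences of feature vectors with an optional Sakoe-Chiba band.

    Only cells with |i - j| <= window are filled; the band is widened to at
    least |len(seq_a) - len(seq_b)| so that the end cell stays reachable.
    """
    n, m = len(seq_a), len(seq_b)
    if window is not None:
        window = max(window, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = (1, m) if window is None else (max(1, i - window), min(m, i + window))
        for j in range(lo, hi + 1):
            cost = metric(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Hypothetical two-gene series: each time point is a vector [gene1, gene2].
series_a = [[0, 1], [1, 2], [3, 4], [1, 2], [0, 1]]
series_b = [[0, 1], [0, 1], [1, 2], [3, 4], [1, 2]]  # series_a delayed by one point
for name, metric in METRICS.items():
    print(name, dtw(series_a, series_b, metric, window=1))
```

In this hypothetical two-gene example every feature vector has the form [x, x + 1], so the Pearson-based distance is zero everywhere, up to rounding. The other three metrics still register the level difference at the last time point. This shows how the choice of metric changes what counts as similar.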
The offset function slides the time series against each other along the time axis, which has several applications. (1) Many biological processes, such as the cell cycle, are conserved between species. Given a gene with a known function in one species, one may therefore try to identify a set of genes with a potentially similar function in another species. However, the durations of the different cell cycle phases may vary considerably between species. One way to correct for this is to apply an offset, possibly in combination with an anchor point (see below), that positions the corresponding cell cycle phases of interest against each other. (2) An offset is also essential when the biological process under study displays a phase shift due to the design of the experiment. For instance, cell cycle progression is usually studied via genome-wide expression profiling of synchronized cell suspension cultures, and different synchronization methods usually make the cells resume cycling synchronously from different phases of the cell cycle. (3) The offset parameter can also be useful for performing what Aach and Church (2001) called causality searches. Specifying a non-zero offset slides the sets of expression profiles against each other along the time axis, which reveals genes with similar trajectories that are shifted in time. Putative targets of a known transcription factor can thus be identified by using its profile as a template and evaluating the list of best-matching genes for different offset values (see Zhu et al., 2002).
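The offset can be sketched as a partial alignment: slide one series against the other, keep only the overlapping time points and run DTW on those. The sign convention (a[t] faces b[t + offset]), the function names and the oscillating profiles are hypothetical. GenTχWarper may handle the non-overlapping ends differently. The example illustrates point (2). DTW always matches the first and last time points of both series, so warping alone cannot absorb a shifted start.

```python
import math


def dtw_distance(a, b):
    """Classical (unweighted) DTW distance between two 1-D profiles."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def overlap(a, b, offset):
    """Slide b against a so that a[t] faces b[t + offset]; keep the overlap.

    The sign convention is a hypothetical choice for this sketch.
    """
    start = max(0, -offset)
    stop = min(len(a), len(b) - offset)
    if stop <= start:
        raise ValueError("offset leaves no overlapping time points")
    return a[start:stop], b[start + offset:stop + offset]


def dtw_with_offset(a, b, offset):
    """Partial alignment: DTW distance of the overlapping parts only."""
    a_part, b_part = overlap(a, b, offset)
    return dtw_distance(a_part, b_part)


def best_offset(a, b, offsets):
    """(offset, distance) pair with the smallest DTW distance; first one wins ties."""
    return min(((k, dtw_with_offset(a, b, k)) for k in offsets),
               key=lambda pair: pair[1])


# Hypothetical oscillating profiles sampled half a period out of phase.
tf = [0, 2, 0, 2, 0, 2]
target = [2, 0, 2, 0, 2, 0]
for k in range(3):
    print(k, dtw_with_offset(tf, target, k))
print(best_offset(tf, target, range(3)))  # (1, 0.0)
```

Scanning a range of offsets with `best_offset` mirrors the causality search described in (3). Shorter overlaps sum fewer local costs, so in practice the distance should probably be normalized, for example by the overlap length, before offsets are compared.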
The anchor point option explicitly aligns a time point from one time series with a time point from the other. It is useful in situations similar to those listed above for the offset. For instance, an anchor point helps when detailed information is available about the exact times at which the compared biological processes pass through some fixed state. It also helps when one of the time series is sampled over a shorter time interval than the other.
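One way to enforce an anchor point is to split the problem at the anchor cell, as sketched below in Python under the classical unweighted DTW recurrence. The best warping path through that cell combines an optimal path up to the cell with an optimal path starting from it. The function `dtw_with_anchor` and the profiles are hypothetical, and GenTχWarper does not necessarily implement the option this way.

```python
import math


def dtw_distance(a, b):
    """Classical (unweighted) DTW distance between two 1-D profiles."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_with_anchor(a, b, anchor_a, anchor_b):
    """DTW distance with the warping path forced through (anchor_a, anchor_b).

    Any such path splits at the anchor cell into a path over
    a[:anchor_a + 1] x b[:anchor_b + 1] and one over a[anchor_a:] x b[anchor_b:],
    so both halves can be solved independently. The anchor cell belongs to
    both halves, so its local cost is subtracted once.
    """
    if not (0 <= anchor_a < len(a) and 0 <= anchor_b < len(b)):
        raise IndexError("anchor outside the time series")
    left = dtw_distance(a[:anchor_a + 1], b[:anchor_b + 1])
    right = dtw_distance(a[anchor_a:], b[anchor_b:])
    return left + right - abs(a[anchor_a] - b[anchor_b])


# Hypothetical profiles whose peaks sit at index 2 in a and index 4 in b.
a = [0, 1, 3, 1, 0, 0, 0]
b = [0, 0, 0, 1, 3, 1, 0]
print(dtw_distance(a, b))           # 0.0, unconstrained
print(dtw_with_anchor(a, b, 2, 4))  # 0.0, peaks anchored to each other
print(dtw_with_anchor(a, b, 2, 2))  # 6.0, peak of a forced onto b[2] = 0
```

Anchoring the two peaks to each other leaves the distance at zero, whereas anchoring the peak of a to a flat point of b raises it. An anchor can never lower the unconstrained DTW distance, because it only restricts the set of admissible paths.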
GenTχWarper is supplied with a powerful graphical interface. Visualization panels provide a comparative view of (1) the original expression profiles and (2) their counterparts after DTW alignment. In addition, the alignment mode interface reports the DTW table, with the optimal warping path through it indicated in red, and the final fitscore. In the template matching mode (Fig. 2), the user selects the expression profile of a gene of interest as a template by simply scrolling through a list of gene names.
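The DTW table and the optimal warping path shown in the alignment view can be reproduced with a short Python sketch. It fills the cumulative cost table, then backtracks from the last cell by repeatedly stepping to the cheapest of the three predecessors. Here `*` marks the path cells instead of red. The reported total is simply the final cumulative cost; GenTχWarper's fitscore may be defined or normalized differently. Names and profiles are hypothetical.

```python
import math


def dtw_table_and_path(a, b):
    """Cumulative DTW cost table plus the optimal warping path.

    Returns (D, path, total): D[i][j] is the cheapest cost of aligning
    a[:i + 1] with b[:j + 1], path lists (i, j) pairs from (0, 0) to the
    last cell, and total is D[-1][-1].
    """
    n, m = len(a), len(b)
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = float(abs(a[i] - b[j]))
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            D[i][j] = cost + min(
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
    # Backtrack: from the last cell, step to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((D[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((D[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return D, path, D[n - 1][m - 1]


def print_table(D, path):
    """Print the table; '*' marks cells on the warping path (red in the GUI)."""
    on_path = set(path)
    for i, row in enumerate(D):
        print(" ".join(f"{v:5.1f}{'*' if (i, j) in on_path else ' '}"
                       for j, v in enumerate(row)))


# Hypothetical profiles with the same peak, one time point apart.
a = [0, 0, 1, 3, 1, 0, 0]
b = [0, 1, 3, 1, 0, 0, 0]
D, path, total = dtw_table_and_path(a, b)
print_table(D, path)
print("total cost:", total)  # total cost: 0.0
```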
Case studies with GenTχWarper are available at http://www.psb.ugent.be/cbd/papers/gentxwarper/casestudy/
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: Steen Knudsen
Received on July 27, 2005; revised on September 20, 2005; accepted on November 15, 2005
REFERENCES
Aach, J. and Church, G.M. (2001) Aligning gene expression time series with time warping algorithms. Bioinformatics, 17, 495-508.
Sakoe, H. and Chiba, S. (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process., 26, 43-49.
Zhu, Z., et al. (2002) Computational identification of transcription factor binding sites via a transcription-factor-centric clustering algorithm. J. Mol. Biol., 318, 71-81.



