Bioinformatics 20(Suppl. 1) © Oxford University Press 2004; all rights reserved.
Robust inference of groups in gene expression time-courses using mixtures of HMMs
1 Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany and 2 Center for Applied Computer Science, University of Cologne, Weyertal 80, 50937 Cologne, Germany
Received on January 15, 2004; accepted on March 1, 2004
Motivation: Genetic regulation of cellular processes is frequently investigated using large-scale gene expression experiments that observe changes in expression over time. Such temporal data pose a challenge to classical distance-based clustering methods because of the horizontal dependencies along the time axis. We propose to use hidden Markov models (HMMs) to model these time dependencies explicitly. The HMMs are used in a mixture approach that we show to be superior to clustering. Furthermore, mixtures are a more realistic model of the biological reality, as an unambiguous partitioning of genes into clusters of unique functional assignment is impossible. Use of the mixture increases robustness with respect to noise and allows groups to be inferred at varying levels of assignment ambiguity. A simple approach, partially supervised learning, makes it possible to benefit from prior biological knowledge during training. Our method allows simultaneous analysis of cyclic and non-cyclic genes and copes well with noise and missing values.
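The idea of assigning genes to groups at varying levels of ambiguity can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the GHMM/GQL software. It assumes two hand-set two-state HMMs with Gaussian emissions (a hypothetical "up" and "down" profile), equal mixture weights, and an entropy threshold as the ambiguity criterion. All names and parameter values are illustrative, and no training step is shown.

```python
import numpy as np

def log_gauss(x, means, stds):
    """Log density of x under one Gaussian per hidden state."""
    return -0.5 * np.log(2 * np.pi * stds**2) - 0.5 * ((x - means) / stds) ** 2

def hmm_loglik(series, start, trans, means, stds):
    """Log-likelihood of one time-course via the forward algorithm in log space."""
    log_trans = np.log(trans)
    alpha = np.log(start) + log_gauss(series[0], means, stds)
    for x in series[1:]:
        # Sum over previous states i of alpha[i] + log_trans[i, j], then emit x.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0)
        alpha = alpha + log_gauss(x, means, stds)
    return np.logaddexp.reduce(alpha)

def responsibilities(series, components, weights):
    """Posterior probability of each mixture component given the time-course."""
    log_joint = np.log(weights) + np.array(
        [hmm_loglik(series, *comp) for comp in components]
    )
    return np.exp(log_joint - np.logaddexp.reduce(log_joint))

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def assign(resp, max_entropy=0.3):
    """Return the most probable group, or None if the assignment is too ambiguous.

    The threshold value is an assumption of this sketch.
    """
    return int(np.argmax(resp)) if entropy(resp) <= max_entropy else None

# Hypothetical hand-set components: (start, transition, state means, state stds).
start = np.array([0.99, 0.01])
trans = np.array([[0.7, 0.3], [0.01, 0.99]])
stds = np.array([0.5, 0.5])
up = (start, trans, np.array([-1.0, 1.0]), stds)    # low, then high
down = (start, trans, np.array([1.0, -1.0]), stds)  # high, then low
components = [up, down]
weights = np.array([0.5, 0.5])

# One clearly up-regulated profile and one flat, uninformative profile.
r_up = responsibilities(np.array([-1.0, -1.0, -0.8, 0.9, 1.1, 1.0]), components, weights)
r_flat = responsibilities(np.zeros(6), components, weights)

print("up profile:  ", r_up, "->", assign(r_up))
print("flat profile:", r_flat, "->", assign(r_flat))
```

Lowering `max_entropy` keeps only genes with a clear group membership, while raising it admits more ambiguous genes. A hard clustering would instead force every gene, including the flat profile, into exactly one group.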
Results: We demonstrate biological relevance by detection of phase-specific groupings in HeLa time-course data. A benchmark using simulated data, derived using assumptions independent of those in our method, shows very favorable results compared to the baseline supplied by k-means and two prior approaches implementing model-based clustering. The results stress the benefits of incorporating prior knowledge, whenever available.
Availability: A software package implementing our method is freely available under the GNU general public license (GPL) at http://ghmm.org/gql
Supplementary information: Supplemental material can be found at http://algorithmics.molgen.mpg.de/ExpMix
Contact: schliep{at}molgen.mpg.de
* To whom correspondence should be addressed.