Bioinformatics Advance Access published online on June 26, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp395
Transcriptional landscape estimation from tiling array data using a model of signal shift and drift
1INRA, Mathématique Informatique et Génome UR1077, 78350 Jouy-en-Josas, France
2AgroParisTech/INRA, Mathématiques et Informatique Appliquées UMR518, 16 rue Claude Bernard, 75005 Paris, France
3Technical University of Denmark, Center for Biological Sequence Analysis, Building 208, 2800 Lyngby, Denmark
*To whom correspondence should be addressed. Dr. Pierre Nicolas, E-mail: pierre.nicolas{at}jouy.inra.fr
Abstract
Motivation: High-density oligonucleotide tiling array technology holds the promise of a better description of the complexity and dynamics of transcriptional landscapes. In organisms such as bacteria and yeasts, transcription can be measured on a genome-wide scale with a resolution higher than 25 bp. However, the statistical models currently used to handle these data remain very simple; the most popular is the piecewise constant Gaussian model with a fixed number of breakpoints.
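For illustration only, the piecewise constant Gaussian model with a fixed number of breakpoints can be sketched as a least-squares segmentation solved by dynamic programming. This is a minimal sketch of the baseline model named above, not of the methodology introduced in this paper. It assumes a common variance across segments, so that maximizing the Gaussian likelihood reduces to minimizing the within-segment sum of squared errors. The function name `segment_piecewise_constant` and its arguments are hypothetical.

```python
import numpy as np


def segment_piecewise_constant(y, n_breakpoints):
    """Least-squares fit of a piecewise constant signal (hypothetical helper).

    Returns the breakpoint positions (index of the first probe of each new
    segment) and the total within-segment sum of squared errors.
    Assumes equal variance in all segments.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = n_breakpoints + 1  # number of segments
    if not 1 <= k <= n:
        raise ValueError("need between 1 and len(y) segments")

    # Prefix sums give the cost of any segment in constant time.
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def cost(i, j):
        # Sum of squared deviations of y[i:j] from its mean (j > i).
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[m, j]: minimal cost of splitting y[:j] into m segments.
    best = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1, i] + cost(i, j)
                if c < best[m, j]:
                    best[m, j] = c
                    back[m, j] = i

    # Trace back the start of each segment, skipping the first (always 0).
    breaks = []
    j = n
    for m in range(k, 0, -1):
        i = int(back[m, j])
        if m > 1:
            breaks.append(i)
        j = i
    breaks.reverse()
    return breaks, float(best[k, n])


# Toy signal with three plateaus; two breakpoints are requested in advance.
breaks, sse = segment_piecewise_constant([0, 0, 0, 5, 5, 5, 1, 1], 2)
```

Note that the number of breakpoints must be supplied by the user and that the procedure returns a single segmentation, with a running time quadratic in the number of probes for each additional segment.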
Results: This paper describes a new methodology based on a hidden Markov model that embeds the segmentation of a continuous-valued signal in a probabilistic setting. At a computationally affordable cost, this framework (i) alleviates the difficulty of choosing a fixed number of breakpoints, and (ii) makes it possible to retrieve more information than a single segmentation by giving access to the whole probability distribution of the transcription profile. Importantly, the model is also enriched to account for subtle effects such as signal "drift" and covariates. The relevance of this framework is demonstrated on a Bacillus subtilis dataset.
Availability: The software is distributed under the GPL.
Contact: pierre.nicolas{at}jouy.inra.fr
Associate Editor: Dr. Trey Ideker
Received on January 27, 2009; revised on May 11, 2009; accepted on June 19, 2009