Bioinformatics Vol. 19 Suppl. 2 2003
pages ii16-ii25
© 2003 Oxford University Press
Searching for statistically significant regulatory modules

1 Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
2 Department of Genome Sciences, University of Washington, 1705 NE Pacific Street, Seattle, WA, 98195, USA
Received on March 17, 2003; accepted on June 9, 2003
Motivation: The regulatory machinery controlling gene expression is complex, frequently requiring multiple, simultaneous DNA-protein interactions. The rate at which a gene is transcribed may depend upon the presence or absence of a collection of transcription factors bound to the DNA near the gene. Locating transcription factor binding sites in genomic DNA is difficult because the individual sites are small and tend to occur frequently by chance. True binding sites may be identified by their tendency to occur in clusters, sometimes known as regulatory modules.
Results: We describe an algorithm for detecting occurrences of regulatory modules in genomic DNA. The algorithm, called mcast, takes as input a DNA database and a collection of binding site motifs that are known to operate in concert. mcast uses a motif-based hidden Markov model with several novel features. The model incorporates motif-specific p-values, thereby allowing scores from motifs of different widths and specificities to be compared directly. The p-value scoring also allows mcast to accept only motif occurrences whose p-values fall below a user-specified threshold, while still assigning better scores to occurrences with lower p-values. mcast can search long DNA sequences, modeling length distributions between motifs within a regulatory module, but ignoring length distributions between modules. The algorithm produces a list of predicted regulatory modules, ranked by E-value. We validate the algorithm using simulated data as well as real data sets from fruit fly and human.
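The p-value thresholding idea can be illustrated with a short sketch. This is not the mcast hidden Markov model. It is a simplified greedy clustering, assuming a hypothetical score of -log10(p / threshold) for each accepted motif hit and a hypothetical maximum gap between hits in the same module. The names `Hit`, `hit_score`, `cluster_hits`, `p_thresh` and `max_gap` are illustrative only, and clusters are ranked by summed score rather than by E-value.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    start: int      # position of the motif occurrence in the sequence
    width: int      # motif width in bases
    pvalue: float   # motif-specific p-value of the occurrence


def hit_score(pvalue: float, p_thresh: float) -> Optional[float]:
    """Score a hit (assumed form); return None if it is not below the threshold."""
    if pvalue >= p_thresh:
        return None
    # Lower p-values give higher scores; a hit at the threshold would score 0.
    return -math.log10(pvalue / p_thresh)


def cluster_hits(hits: List[Hit], p_thresh: float = 1e-3,
                 max_gap: int = 50) -> List[Tuple[int, int, float]]:
    """Greedily group accepted hits separated by at most max_gap bases.

    Returns (start, end, total_score) tuples, best score first.
    """
    clusters: List[Tuple[int, int, float]] = []
    cluster_start: Optional[int] = None
    cluster_end = 0
    total = 0.0
    for hit in sorted(hits, key=lambda h: h.start):
        score = hit_score(hit.pvalue, p_thresh)
        if score is None:
            continue  # rejected: p-value not below the threshold
        if cluster_start is not None and hit.start - cluster_end > max_gap:
            # Gap too large: close the current cluster and start a new one.
            clusters.append((cluster_start, cluster_end, total))
            cluster_start, total = None, 0.0
        if cluster_start is None:
            cluster_start = hit.start
            cluster_end = hit.start + hit.width
        else:
            cluster_end = max(cluster_end, hit.start + hit.width)
        total += score
    if cluster_start is not None:
        clusters.append((cluster_start, cluster_end, total))
    return sorted(clusters, key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    example = [Hit(10, 8, 1e-5), Hit(30, 8, 1e-4), Hit(40, 8, 0.5), Hit(500, 8, 1e-7)]
    for start, end, score in cluster_hits(example):
        print(f"{start}-{end}\tscore={score:.2f}")
```

Dividing by the threshold puts hits from motifs of different widths on a common scale, which mirrors the stated motivation for p-value scoring. A real implementation would also model the spacing between motifs rather than applying a hard gap cutoff.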
Availability: http://meme.sdsc.edu/MCAST/paper
Contact: tlb{at}maths.uq.edu.au
* To whom correspondence should be addressed.
Formerly William Noble Grundy, see www.gs.washington.edu/noble/name-change.html