Bioinformatics Vol. 19 no. 18 2003
pages 2369-2380
© 2003 Oxford University Press
Combining phylogenetic data with co-regulated genes to identify regulatory motifs
Department of Genetics, Washington University Medical School, St. Louis, MO 63110, USA
Received on April 3, 2003; revised on June 2, 2003; accepted on June 12, 2003
Motivation: Discovery of regulatory motifs in unaligned DNA sequences remains a fundamental problem in computational biology. Two categories of algorithms have been developed to identify common motifs in a set of DNA sequences. The first can be called a multiple genes, single species approach. It assumes that a degenerate motif is embedded in some or all of the otherwise unrelated input sequences, and it tries to describe a consensus motif and identify its occurrences. It is often applied to co-regulated genes identified experimentally. The second can be called a single gene, multiple species approach. It requires orthologous input sequences and tries to identify unusually well-conserved regions by phylogenetic footprinting. Both approaches perform well, but each has limitations. It is tempting to combine knowledge of co-regulation among different genes with conservation among orthologous genes to improve our ability to identify motifs.
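To make the multiple genes, single species representation concrete, here is a minimal sketch, assuming ungapped, equal-length sites and a uniform background. It tallies a count matrix from hypothetical motif occurrences, reads off a consensus, and scans a sequence with a log-odds weight matrix. The function names, the pseudocount of 0.5 and the example sites are illustrative assumptions, not taken from the paper.

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """Column-wise base counts for equal-length, ungapped motif sites."""
    if not sites or any(len(s) != len(sites[0]) for s in sites):
        raise ValueError("need at least one site, all of the same length")
    return [{b: sum(s[i] == b for s in sites) for b in BASES}
            for i in range(len(sites[0]))]


def consensus(counts):
    """Most frequent base per column (ties go to the earlier base in ACGT)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in counts)


def log_odds(counts, background=None, pseudocount=0.5):
    """Position weight matrix of log2(frequency / background) values."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform
    pwm = []
    for col in counts:
        total = sum(col.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(pwm, seq):
    """(offset, score) for every full-length window; seq must be uppercase ACGT."""
    w = len(pwm)
    return [(i, sum(pwm[j][seq[i + j]] for j in range(w)))
            for i in range(len(seq) - w + 1)]


# Hypothetical occurrences of one motif in promoters of co-regulated genes.
sites = ["TGACTCA", "TGACTAA", "TGAGTCA", "TGACTCA"]
counts = count_matrix(sites)
print(consensus(counts))  # TGACTCA
pwm = log_odds(counts)
# The best-scoring window starts at offset 3, where TGACTCA occurs.
print(max(scan(pwm, "AAATGACTCAGG"), key=lambda hit: hit[1]))
```

Because the sites are treated as unrelated sequences, the signal here comes only from their shared pattern; the single gene, multiple species approach instead draws on conservation across orthologs of one gene.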
Results: Building on the Consensus algorithm previously developed by our group, we introduce a new algorithm, PhyloCon (Phylogenetic Consensus), that takes into account both conservation among orthologous genes and co-regulation of genes within a species. The algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, and then compares profiles representing non-orthologous sequences. Motifs emerge as regions common to these profiles. We present a novel statistic for comparing profiles of DNA sequences and a greedy approach for finding common subprofiles. We demonstrate that PhyloCon performs well on both synthetic and biological data.
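The abstract does not define its profile-comparison statistic or spell out the greedy search, so the following is a minimal sketch under stated assumptions, not PhyloCon itself. Profiles are ungapped count matrices built by a hypothetical `profile` helper from toy sequences. As an illustrative assumption, a pair of columns is scored with an average log-likelihood ratio (ALLR): each column's counts are scored against the other column's pseudocount-smoothed frequencies relative to an assumed uniform background, then averaged over all observations. In place of the paper's greedy search, the sketch scans every ungapped offset between two profiles and keeps the maximum-sum run of column scores using Kadane's algorithm, which is exact rather than greedy.

```python
import math

BASES = "ACGT"
BACKGROUND = {b: 0.25 for b in BASES}  # assumed uniform background


def profile(aligned):
    """Count matrix from ungapped, equal-length aligned orthologous sequences."""
    return [{b: sum(s[i] == b for s in aligned) for b in BASES}
            for i in range(len(aligned[0]))]


def freqs(col, pseudocount=0.5):
    """Pseudocount-smoothed base frequencies of one count column."""
    total = sum(col[b] for b in BASES) + pseudocount * len(BASES)
    return {b: (col[b] + pseudocount) / total for b in BASES}


def allr(col1, col2, bg=BACKGROUND):
    """Average log-likelihood ratio of two count columns (symmetric).

    Counts in each column are scored against the other column's frequencies,
    then averaged over all observations in both columns.
    """
    f1, f2 = freqs(col1), freqs(col2)
    num = sum(col2[b] * math.log(f1[b] / bg[b]) +
              col1[b] * math.log(f2[b] / bg[b]) for b in BASES)
    return num / (sum(col1[b] for b in BASES) + sum(col2[b] for b in BASES))


def best_common_subprofile(p1, p2):
    """Highest-scoring ungapped common segment of two profiles.

    Tries every relative offset, scores aligned column pairs with ALLR and
    keeps the maximum-sum contiguous run (Kadane's algorithm).
    Returns (score, start_in_p1, start_in_p2, length).
    """
    best = (float("-inf"), 0, 0, 0)
    for shift in range(-(len(p2) - 1), len(p1)):
        # Column i of p1 is paired with column i - shift of p2.
        run, run_start = 0.0, 0
        for i in range(max(0, shift), min(len(p1), len(p2) + shift)):
            s = allr(p1[i], p2[i - shift])
            if run <= 0:
                run, run_start = s, i  # start a new run at column i
            else:
                run += s
            if run > best[0]:
                best = (run, run_start, run_start - shift, i - run_start + 1)
    return best


# Toy profiles: each stands in for aligned orthologs of one gene (hypothetical).
p1 = profile(["GGTGACTCAGG"] * 3)
p2 = profile(["CCCTGACTCACC"] * 3)
# About (7.21, 2, 3, 7): the TGACTCA block shared by both profiles.
print(best_common_subprofile(p1, p2))
```

A positive ALLR means two columns favor the same bases, so a contiguous run of positive column scores between profiles of non-orthologous genes is the kind of shared region the abstract describes as an emerging motif.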
Availability: Software available upon request from the authors. http://ural.wustl.edu/softwares.html
Contact: stormo{at}ural.wustl.edu
* To whom correspondence should be addressed at 4566 Scott Avenue, Campus Box 8232, St. Louis, MO 63110, USA.