Bioinformatics Vol. 17 no. 10 2001
Pages 871-877
© 2001 Oxford University Press
Enrichment of regulatory signals in conserved non-coding genomic sequence
1 Informatics Research, Celera Genomics Corporation, 45 West Gude Drive, Rockville, MD 20850, USA
2 Center for Biological Sequence Analysis, The Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark
Received on April 20, 2001; revised on July 6, 2001; accepted on July 6, 2001
Motivation: Whole genome shotgun sequencing strategies generate sequence data prior to the application of assembly methodologies that result in contiguous sequence. Sequence reads can be employed to indicate regions of conservation between closely related species for which only one genome has been assembled. Consequently, by using pairwise sequence alignment methods it is possible to identify novel, non-repetitive, conserved segments in non-coding sequence that exist between the assembled human genome and mouse whole genome shotgun sequencing fragments. Conserved non-coding regions identify potentially functional DNA that could be involved in transcriptional regulation.
Results: Local sequence alignment methods were applied employing mouse fragments and the assembled human genome. In addition, transcription factor (TF) binding sites were detected by aligning their corresponding positional weight matrices to the sequence regions. These methods were applied to a set of transcripts corresponding to 502 genes associated with a variety of different human diseases taken from the Online Mendelian Inheritance in Man database. Using statistical arguments we have shown that conserved non-coding segments contain an enrichment of transcription factor binding sites when compared to the sequence background in which the conserved segments are located. This enrichment of binding sites was not observed in coding sequence. Conserved non-coding segments are not extensively repeated in the genome and therefore their identification provides a rapid means of finding genes with related conserved regions, and consequently potentially related regulatory mechanisms. Conserved segments in upstream regions are found to contain binding sites that are co-localized in a manner consistent with experimentally known pairwise co-occurrences of transcription factors, and they afford the identification of novel co-occurring TF pairs. This study provides a methodology and further evidence to suggest that conserved non-coding regions are biologically significant, since they contain a statistical enrichment of regulatory signals and pairs of signals that enable the construction of regulatory models for human genes.
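As an illustration of the binding-site detection step described above, the following is a minimal sketch of scanning a sequence with a positional weight matrix and comparing site density between a conserved segment and its background. The abstract does not state the scoring scheme, threshold, or matrices used, so the log-odds scoring, the uniform background frequency, the pseudocount, the toy count matrix `COUNTS`, the example sequences, and all function names here are assumptions chosen for illustration rather than the authors' procedure.

```python
import math

BASES = "ACGT"

# Hypothetical toy count matrix for a 4-bp motif (not from the paper).
# Rows are motif positions; columns are counts of A, C, G, T.
COUNTS = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert raw counts to log2 odds against a uniform background (assumed)."""
    matrix = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        matrix.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, score) for each window scoring at or above the threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def site_density(sequence, pwm, threshold):
    """Predicted sites per scanned window position."""
    positions = len(sequence) - len(pwm) + 1
    if positions <= 0:
        return 0.0
    return len(scan(sequence, pwm, threshold)) / positions


# Toy sequences (hypothetical): a "conserved" segment and a "background" region.
pwm = log_odds_matrix(COUNTS)
conserved = "TTACGTACGTAA"
background = "TTTTGGGGCCCCAAAATTTT"
print(site_density(conserved, pwm, 4.0), site_density(background, pwm, 4.0))
```

A higher density in the conserved segment than in the background is the kind of signal the abstract refers to as enrichment. A real analysis would also need a statistical test over many segments, which the abstract does not specify and this sketch does not attempt.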
Contact: samuel.levy{at}celera.com
* To whom correspondence should be addressed.