Alignments anchored on genomic landmarks can aid in the identification of regulatory elements


Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894-6075, USA
*To whom correspondence should be addressed.
Motivation: The transcription start site (TSS) has been located for an increasing number of genes across several organisms. Statistical tests have shown that some cis-acting regulatory elements have positional preferences with respect to the TSS, but few strategies have emerged for locating elements by their positional preferences. This paper elaborates such a strategy. First, we align promoter regions without gaps, anchoring the alignment on each promoter's TSS. Second, we apply a novel word-specific mask. Third, we apply a clustering test related to gapless BLAST statistics. The test examines whether any specific word is placed unusually consistently with respect to the TSS. Finally, our program A-GLAM, an extension of the GLAM program, uses significant word positions as new anchors to realign the sequences. A Gibbs sampling algorithm then locates putative cis-acting regulatory elements. Usually, Gibbs sampling requires a preliminary masking step, to avoid convergence onto a dominant but uninteresting signal from a DNA repeat. However, since the positional anchors focus A-GLAM on the motif of interest, masking DNA repeats during Gibbs sampling becomes unnecessary.
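As a rough illustration of the first step only (a gapless alignment anchored on each promoter's TSS), the Python sketch below records where every octonucleotide word starts relative to the TSS and then tallies how often a word recurs at exactly the same offset. It is not the A-GLAM implementation: the word-specific mask, the clustering test related to gapless BLAST statistics, and the Gibbs sampling step are all omitted, and the raw tally stands in for the statistic purely for demonstration. The function names, the `(sequence, tss_index)` input layout, and the `min_count` threshold are assumptions made for this example.

```python
from collections import Counter, defaultdict


def word_positions(promoters, k=8):
    """Map each k-letter word to its start offsets relative to the TSS.

    `promoters` is a list of (sequence, tss_index) pairs, where tss_index is
    the 0-based index of the TSS within the sequence (hypothetical layout).
    Anchoring on the TSS without gaps means offsets are directly comparable
    across promoters.
    """
    positions = defaultdict(list)
    for seq, tss in promoters:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip words containing ambiguity codes such as N.
            if set(word) <= set("ACGT"):
                positions[word].append(i - tss)
    return positions


def most_concentrated(positions, min_count=2):
    """Rank words by how many occurrences share a single TSS-relative offset.

    This raw count is only a stand-in for a proper significance test.
    """
    ranked = []
    for word, offsets in positions.items():
        best_offset, best_n = Counter(offsets).most_common(1)[0]
        if best_n >= min_count:
            ranked.append((best_n, word, best_offset))
    return sorted(ranked, reverse=True)


# Toy promoters (invented for illustration), each with the TSS at index 10.
promoters = [
    ("GGTATAAAAGGCC", 10),
    ("CCTATAAAAGTTT", 10),
    ("AATATAAAAGCAC", 10),
]
ranked = most_concentrated(word_positions(promoters))
```

In this toy input, the only word shared by all three sequences starts 8 bases upstream of the TSS in each, so it is the only entry in `ranked`. On real data, a word's count at one offset would need to be judged against its background frequency and corrected for multiple testing before being used as a new anchor.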
Results: In a set of human DNA sequences with experimentally characterized TSSs, the placement of 791 octonucleotide words was unusually consistent (multiple test corrected P < 0.05). Alignments anchored on these words sometimes located statistically significant motifs inaccessible to GLAM or AlignACE.
Availability: The A-GLAM program and a list of statistically significant words are available at ftp://ftp.ncbi.nih.gov/pub/spouge/papers/archive/AGLAM/.
Contact: spouge{at}ncbi.nlm.nih.gov
Received on January 15, 2005; accepted on March 27, 2005

