Bioinformatics Advance Access originally published online on November 10, 2006
Bioinformatics 2007 23(2):243-244; doi:10.1093/bioinformatics/btl568
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

PEAKS: identification of regulatory motifs by their position in DNA sequences

Nicolás Bellora 1, Domènec Farré 2 and M. Mar Albà 1,3,*

1 Research Unit on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona 08003, Spain
2 Centre for Genomic Regulation, Barcelona 08003, Spain
3 Catalan Institution for Research and Advanced Studies–Municipal Institute of Medical Research, Barcelona 08003, Spain

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Many DNA functional motifs tend to accumulate or cluster at specific gene locations. These locations can be detected, in a group of gene sequences, as high frequency ‘peaks’ with respect to a reference position, such as the transcription start site (TSS). We have developed a web tool for the identification of regions containing significant motif peaks. We show, by using different yeast gene datasets, that peak regions are strongly enriched in experimentally-validated motifs and contain potentially important novel motifs.

Availability: http://genomics.imim.es/peaks

Contact: malba{at}imim.es

Supplementary information: Supplementary Data are available at Bioinformatics online.

The identification of regulatory motifs in DNA sequences is a challenging problem in bioinformatics. Computational predictions of known motifs, such as transcription factor binding sites (TFBS), often contain an unacceptable number of false positives, owing to the short length and variability of the motifs. Focusing on motifs that are shared by several sequences can increase the specificity of motif predictions. For example, one can select sequences that have been conserved during evolution, a strategy known as phylogenetic footprinting (Lenhard et al., 2003). A different type of evolutionary constraint is related to the position of motifs along the gene sequence. There is ample evidence that many gene expression regulatory motifs show a biased location within promoter sequences (FitzGerald et al., 2004; Xie et al., 2005). That is, they are not randomly distributed but tend to accumulate or cluster in particular regions, forming high-abundance ‘peaks’. This presumably reflects specific requirements of motif-binding proteins that need to interact with each other to regulate transcription. The identification of significant motif peaks can be used to increase the specificity of motif predictions, provide information on promoter structure, and help discover regulatory motifs that are specifically involved in the regulation of genes with similar expression or function. Motivated by the lack of available computational methods to detect motif clustering, we have developed a novel algorithm for this purpose, which we have termed ‘positional footprinting’ and which is implemented in the web server PEAKS.

PEAKS can be used to analyze any group of sequences that share a known reference element, such as the transcription start site (TSS), the initiation codon, a known TFBS or any other predefined site. The aim is to detect any other motifs that show significant clustering at a particular distance from the reference element. In the first step of the procedure, the sequence positions that show matches to motifs from a user-selected library are recorded. Available motif libraries are: (1) compilations of TFBS position-specific weight matrices (PSWMs), (2) all possible DNA words of a given length or (3) pre-built consensus motif collections (Zhu and Zhang, 1999; Harbison et al., 2004). Several PSWM libraries can be used: TRANSFAC (Matys et al., 2003), Jaspar (Sandelin et al., 2004) and PROMO (Messeguer et al., 2002). Using DNA words can aid in the discovery of putative new motifs in different types of DNA sequences. In the second step, the positions of predicted motifs are used to build motif frequency profiles along the sequences. A position is considered positive for a motif if the motif occurs within a sequence window surrounding that position. Increasing the window size above the default value (31) allows the detection of motifs that do not have a very precise location, at the cost of decreasing the significance of motifs located at very well-defined positions (see Supplementary Table S1 for a full list of program parameters). The third step is the calculation of the positional footprinting score, Spf, which measures the relative over-representation of a motif at a particular position (see PEAKS web server for a full mathematical description). The fourth step is the statistical evaluation of the maximum Spf score obtained for each motif. To this end, we apply the same procedure described above to simulated random sequence datasets, which can be generated using an order 1 Markov model, to obtain an empirical p-value associated with the maximum Spf score. If the maximum score is significant, we extract all other positions whose Spf score also exceeds the cut-off set by the p-value threshold; these positions define the significant regions for the motif. The output includes a graphical representation of all the significant motifs and regions, a list of sequences containing significant motifs, motif profile pictures and a summary table.
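
The profile-building and simulation steps can be illustrated with a minimal Python sketch. It is not the PEAKS implementation: it assumes exact matches to a single DNA word, sequences of equal length aligned on the reference element, and a match position defined as the first base of the match. Because the Spf formula is only given on the PEAKS web server, the sketch uses the maximum profile count as a stand-in statistic. The function names, the add-one p-value correction and the fallback for symbols without observed transitions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def motif_profile(seqs, word, window=31):
    """Count, per position, the sequences with a match to `word` starting
    inside a window centred on that position (assumes equal-length seqs)."""
    half = window // 2
    length = len(seqs[0])
    profile = [0] * length
    for seq in seqs:
        # Start positions of exact (possibly overlapping) matches.
        hits = [i for i in range(len(seq) - len(word) + 1)
                if seq.startswith(word, i)]
        positive = set()
        for h in hits:
            positive.update(range(max(0, h - half), min(length, h + half + 1)))
        for p in positive:
            profile[p] += 1  # each sequence is counted once per position
    return profile


def fit_markov1(seqs):
    """Estimate first-symbol and transition counts for an order 1 Markov model."""
    starts = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        starts[s[0]] += 1
        for a, b in zip(s, s[1:]):
            trans[a][b] += 1
    return starts, trans


def simulate(starts, trans, length, rng):
    """Generate one random sequence from the fitted order 1 Markov model."""
    def draw(table):
        symbols = list(table)
        return rng.choices(symbols, weights=[table[x] for x in symbols])[0]

    seq = [draw(starts)]
    while len(seq) < length:
        nxt = trans.get(seq[-1])
        # Assumption: fall back to first-symbol counts if no transition was seen.
        seq.append(draw(nxt) if nxt else draw(starts))
    return "".join(seq)


def empirical_p(seqs, word, window=31, n_sim=200, seed=0):
    """Empirical p-value of the maximum profile count (stand-in for max Spf)."""
    rng = random.Random(seed)
    observed = max(motif_profile(seqs, word, window))
    starts, trans = fit_markov1(seqs)
    length = len(seqs[0])
    exceed = 0
    for _ in range(n_sim):
        sim = [simulate(starts, trans, length, rng) for _ in seqs]
        if max(motif_profile(sim, word, window)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate above zero (a common convention).
    return observed, (exceed + 1) / (n_sim + 1)
```

Calling `empirical_p(seqs, "TGACTCAA")` on a list of aligned promoter strings returns the maximum count and its estimated p-value. The smallest p-value the sketch can report is 1/(n_sim + 1), so a cut-off such as 1e–3 would need at least 1000 simulated datasets.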

Figure 1 shows the output produced by PEAKS for a dataset of 180 yeast genes involved in ribosome biogenesis (Mewes et al., 2002). Sequences spanned from –500 to +100 with respect to the most frequently used TSS (Zhang and Dietrich, 2005). Motifs were detected using exact matches to a consensus motif collection containing 102 different TFBS (Harbison et al., 2004), and a sliding window of 31 nucleotides. An integrated picture (Fig. 1A) was derived from the significant regions in the profiles at p-value < 1e–3 (Fig. 1B). Five of the seven significant motifs, Fhl1, Rap1, Sfp1, Abf1 and Reb1, are known to be involved in the regulation of ribosomal-related genes (Fig. 1C). Yox1 and Skn7 have not, so far, been associated with this function, but their distribution indicates that they are strong candidates. We calculated the ratio between the observed fraction of experimentally-validated motifs falling into a significant region and the fraction of motifs expected in this region under a random motif distribution (size of the significant region divided by the total length of the sequence). The enrichment in real motifs ranged from 2.06 for Fhl1 to 10.67 for Skn7 (Fig. 1C). New putative binding sites for these transcription factors were also discovered. For example, among the 24 different Skn7 motifs in the significant region (–239 to –215) only four were previously known. A second example, using a dataset of 86 yeast genes involved in amino acid metabolism, is provided in Supplementary Figure 1S.
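
The enrichment ratio described above can be written out as a short Python sketch. The function name and arguments are illustrative and do not come from PEAKS. The example counts (4 validated sites inside the region out of 9 in total) are hypothetical, and the sequence length of 600 assumes that –500 to +100 contains no position 0.

```python
def enrichment(validated_in_region, validated_total,
               region_start, region_end, seq_length):
    """Observed fraction of validated sites in a region divided by the
    fraction expected if sites were spread uniformly along the sequence."""
    region_size = region_end - region_start + 1   # inclusive boundaries
    observed = validated_in_region / validated_total
    expected = region_size / seq_length
    return observed / expected


# Hypothetical counts for a region spanning -239 to -215 in 600-nt sequences.
ratio = enrichment(4, 9, -239, -215, 600)
print(f"{ratio:.2f}")
```

A ratio above 1 means that validated sites are denser inside the region than a uniform distribution would predict. Narrow regions yield high ratios even when they hold only a few sites.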


Fig. 1 Results from PEAKS. Dataset comprising 180 Saccharomyces cerevisiae promoter sequences (–500 to +100 with respect to the most frequently used TSS) from genes involved in ribosome biogenesis. Window size 31 nt. (A) Integrated representation. Significant regions were detected for Fhl1, Rap1, Sfp1, Abf1, Reb1, Yox1 and Skn7 motifs (P < 1e–3). Oval width indicates significant region boundaries. Oval height is the relative motif signal (RMS), the ratio between the number of sequences that correspond to the maximum peak and the number of sequences that contain the motif at the P-value cut-off. The table shows the score, P-value (PVAL), number of sequences and position with the maximum score (maximum peak), and the significant regions (ranges). (B) Significant motif profiles. The x-axis represents the sequence positions and the y-axis the number of sequences with a match to the motif. The green line represents the P-value cut-off; regions above the line are significant. To the left of each profile picture are the motif name (e.g. Yox1), the position of the maximum peak (–24 for Yox1) and the associated Spf score (48.4 for Yox1), and below them a link to a list of genes containing significant motifs (motif matches). (C) Description and enrichment in experimentally-validated sites for significant motifs (see main text).

 

    Acknowledgments
 
The authors thank Loris Mularoni, Eduardo Eyras, Robert Castelo and Oscar González for useful discussions during this work. The authors acknowledge support from Fundación Banco Bilbao Vizcaya Argentaria (FBBVA), Plan Nacional de I + D MCyT (BIO2002-04426-C02-01), EC Infobiomed NoE and Fundació ICREA.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: David Rocke

Received on July 4, 2006; revised on October 18, 2006; accepted on November 5, 2006

    REFERENCES

    FitzGerald, P.C., et al. (2004) Clustering of DNA sequences in human promoters. Genome Res., 14, 1562–1574.

    Harbison, C., et al. (2004) Transcriptional regulatory code of a eukaryotic genome. Nature, 431, 99–104.

    Lenhard, B., et al. (2003) Identification of conserved regulatory elements by comparative genome analysis. J. Biol., 2, 13.

    Matys, V., et al. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res., 31, 374–378.

    Messeguer, X., et al. (2002) PROMO: detection of known transcription regulatory elements using species-tailored searches. Bioinformatics, 18, 333–334.

    Mewes, H.W., et al. (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res., 30, 31–34.

    Sandelin, A., et al. (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res., 32, D91–D94.

    Xie, X., et al. (2005) Systematic discovery of regulatory motifs in human promoters and 3'-UTRs by comparison of several mammals. Nature, 434, 338–345.

    Zhang, Z. and Dietrich, F. (2005) Mapping of transcription start sites in Saccharomyces cerevisiae using 5' SAGE. Nucleic Acids Res., 33, 2838–2851.

    Zhu, J. and Zhang, M.Q. (1999) SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 15, 607–611.

