Bioinformatics Advance Access originally published online on November 30, 2004
Bioinformatics 2005 21(8):1747-1749; doi:10.1093/bioinformatics/bti173
Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster
1Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
2Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
3Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: Despite the increasing number of computational tools developed to predict cis-regulatory sequences, the limited availability of high-quality datasets of transcription factor binding sites constrains advances in the bioinformatics of gene regulation. Here we present such a dataset based on a systematic literature curation and genome annotation of DNase I footprints for the fruitfly, Drosophila melanogaster. Using the experimental results of 201 primary references, we annotated 1367 binding sites from 87 transcription factors and 101 target genes in the D.melanogaster genome sequence. These data will provide a rich resource for future bioinformatics analyses of transcriptional regulation in Drosophila, such as constructing motif models, training cis-regulatory module detectors, benchmarking alignment tools and continued text mining of the extensive literature on transcriptional regulation in this important model organism.
Availability: http://www.flyreg.org/
Contact: cbergman{at}gen.cam.ac.uk
The fruitfly Drosophila melanogaster has one of the most highly annotated metazoan genome sequences with respect to gene and transposable element content (Misra et al., 2002; Kaminker et al., 2002). In contrast, the cis-regulatory sequences that control transcription are only just beginning to be incorporated explicitly into the genome annotation, despite the vast literature of functionally characterized cis-regulatory elements that exists for this species (http://www.flybase.org/). The lack of a systematic, publicly available compilation of cis-regulatory sequences for D.melanogaster, comparable to the SCPD in yeast (Zhu and Zhang, 1999), limits progress in the computational analysis of gene regulation for this important model species. The need for such a resource is clear from the fact that cis-regulatory curation efforts of limited scope for genes involved in early development (Ludwig et al., 2000; Spirov et al., 2000; Berman et al., 2002; Papatsenko et al., 2002; Rajewsky et al., 2002; Emberly et al., 2003; Lifanov et al., 2003) have rapidly proven useful for subsequent bioinformatic and comparative studies of gene regulation (Costas et al., 2003; Grad et al., 2004).
To contribute to a comprehensive annotation of cis-regulatory sequences in D.melanogaster, we report here a database of DNase I footprint sequences derived from a systematic literature curation and annotation effort. We have chosen to focus on DNase I footprint data since they are an abundant and high-quality source of data on transcription factor specificity (Galas and Schmitz, 1978). In contrast to previous binding site compilations in Drosophila, these data are derived from the same experimental data type, cover all available aspects of development and are explicitly linked to the finished Release 3 genome sequence coordinates (Celniker et al., 2002). The purpose of this note is to present a basic characterization of these data and to make them available in a single database as a resource for computational analyses of transcriptional regulation in one of the most important model organisms.
Our literature curation yielded a total of 201 references with non-redundant experimental data from DNase I footprinting experiments (see Supplemental Files 1 and 2). Our set of references is a superset of all those meeting the same criteria in previous compilations of binding site data for the Drosophila early embryo (Ludwig et al., 2000; Spirov et al., 2000; Berman et al., 2002; Papatsenko et al., 2002; Rajewsky et al., 2002; Emberly et al., 2003; Lifanov et al., 2003) and Transfac 5.0 (Wingender et al., 2001). The overlap between the present and previous compilations is detailed in Supplemental File 1. Our current work includes information from 113 primary references not present in any previous compilation, doubling the number of references with curated Drosophila DNase I footprint data consolidated in a single public database.
Of the 1367 footprints annotated, 1341 footprints (98%) can be attributed to 101 target genes, with 26 footprints (2%) obtained from chromatin immunoprecipitation experiments having unknown targets (Supplemental File 3). The mean (median) number of footprints annotated per target gene is 13.3 (5), with a skewed distribution (Fig. 1a): the top ten genes (Ubx, Antp, h, ftz, eve, dpp, kni, en, Ddc, Sgs4) contribute nearly half (49%) of the footprints mapping to known targets. Likewise, 1164 (85%) of the 1367 footprints annotated can be attributed to 87 purified or recombinant transcription factors, plus an additional 203 footprints (15%) from unspecified factors of unknown identity derived from crude or purified nuclear extract (Supplemental File 3). The distribution of the number of footprints per factor is also skewed, with a mean (median) number of footprints annotated per factor of 13.4 (6) (Fig. 1). As with the distribution by target, the top ten factors (hb, Trl, ftz, Ubx, en, bcd, Kr, abd-A, z, dl) also contribute nearly half (49%) of the footprints derived from known factors. Although these data represent the most comprehensive collection of binding site data in Drosophila to date, it is clear that binding site information is lacking for the majority of factors and genes, a limitation that can hopefully be overcome in the future by high-throughput experimental techniques (e.g. Bulyk et al., 2001).
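The percentages and means quoted in this paragraph follow from the raw counts by simple arithmetic. As a minimal sketch (the variable and function names are ours and purely illustrative, not part of the database), the figures can be re-derived as follows:

```python
# Counts taken directly from the paragraph above.
total_footprints = 1367
with_known_target = 1341
target_genes = 101
with_known_factor = 1164
factors = 87


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def mean_per_group(count, groups):
    """Return the mean number of footprints per group, to one decimal place."""
    return round(count / groups, 1)


summary = {
    "pct_known_target": percent(with_known_target, total_footprints),
    "pct_unknown_target": percent(total_footprints - with_known_target, total_footprints),
    "mean_per_target": mean_per_group(with_known_target, target_genes),
    "pct_known_factor": percent(with_known_factor, total_footprints),
    "pct_unknown_factor": percent(total_footprints - with_known_factor, total_footprints),
    "mean_per_factor": mean_per_group(with_known_factor, factors),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The medians (5 and 6) and the top-ten shares (49%) cannot be recomputed this way, since they depend on the per-gene and per-factor counts in the supplemental files.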
Individually, the 1363 footprints that map to euchromatic arms (four footprints map to heterochromatic scaffolds) comprise a total of 26 983 bp of DNA sequence, but since nearly half (45%, n = 613) of the footprints annotated overlap at least one other footprint, these data span only 21 372 bp of genomic DNA, or approximately 0.0183% of the Release 3 euchromatic genome sequence. The footprinted sequences annotated range in length from 5 to 140 bp and, surprisingly, have a mean (median) length of 19.8 bp (17 bp) (Fig. 1). In fact, the vast majority (81%, n = 1101) of the footprinted sequences annotated are longer than both the 10.5 bp needed for one turn of the B-form DNA helix (Wolffe, 1998) and the core recognition motif length (5–10 bp) typically reported for most transcription factors. The prevalence of long footprinted sequences may simply result from steric hindrance by the transcription factor preventing access for DNase cleavage, but may also suggest an under-appreciated role for non-core motif nucleotides in transcription factor-DNA interactions and/or a high frequency of homo-cooperative binding interactions. Certainly, the magnitude of overlap among footprinted sequences suggests the possibility of extensive hetero-cooperative interactions in these data. With the resource presented here, these and other hypotheses can now be tested using the wide array of experimental and computational methods available for the functional analysis of transcription factor binding sites.
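The difference between the summed footprint length and the genomic span arises because overlapping footprints are counted once after merging. The following is a minimal sketch of that calculation on a handful of hypothetical coordinates (the intervals and function names are invented for illustration and are not taken from the database):

```python
def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous merged interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_overlapping(intervals):
    """Count intervals that overlap at least one other interval (O(n^2))."""
    count = 0
    for i, (s1, e1) in enumerate(intervals):
        if any(i != j and s1 < e2 and s2 < e1
               for j, (s2, e2) in enumerate(intervals)):
            count += 1
    return count


# Hypothetical footprints on a single chromosome arm, half-open coordinates.
footprints = [(100, 120), (110, 135), (300, 317), (310, 320), (500, 505)]

summed_bp = sum(end - start for start, end in footprints)
merged = merge_intervals(footprints)
spanned_bp = sum(end - start for start, end in merged)
overlapping = count_overlapping(footprints)

print(summed_bp, spanned_bp, overlapping)
```

On real annotations the merge would have to be run separately for each chromosome arm, since coordinates on different arms are not comparable.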
| SUPPLEMENTARY DATA |
|---|
Supplementary data for this paper are available on Bioinformatics online.
| Acknowledgments |
|---|
We thank Nicholas Blanchard for assistance with literature curation; FlyBase Cambridge for access to the Drosophila offprint collection; Michael Ashburner, Douda Bensasson, Thomas Down and Rachel Drysdale for suggestions on data format and representation; and three anonymous reviewers and Nikolaus Rajewsky for helpful comments on the manuscript. This work was supported in part by NIH grants HG00750 and GH002673 to G.Rubin and SEC, respectively. CMB is supported by NIH training grant T32 HL07279 to E.Rubin and by a Royal Society USA Research Fellowship.
Received on August 5, 2004; revised on November 5, 2004; accepted on November 20, 2004
| REFERENCES |
|---|
Berman, B.P., Nibu, Y., Pfeiffer, B.D., Tomancak, P., Celniker, S.E., Levine, M., Rubin, G.M., Eisen, M.B. (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl Acad. Sci. USA, 99, 757–762.
Bulyk, M.L., Huang, X., Choo, Y., Church, G.M. (2001) Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. USA, 98, 7158–7163.
Celniker, S.E., Wheeler, D.A., Kronmiller, B., Carlson, J.W., Halpern, A., Patel, S., Adams, M., Champe, M., Dugan, S.P., Frise, E., et al. (2002) Finishing a whole genome shotgun sequence assembly: release 3 of the Drosophila euchromatic genome sequence. Genome Biol., 3, .
Costas, J., Casares, F., Vieira, J. (2003) Turnover of binding sites for transcription factors involved in early Drosophila development. Gene, 310, 215–220.
Emberly, E., Rajewsky, N., Siggia, E.D. (2003) Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics, 4, 57.
Galas, D.J. and Schmitz, A. (1978) DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res., 5, 3157–3170.
Grad, Y.H., Roth, F.P., Halfon, M.S., Church, G.M. (2004) Prediction of similarly-acting cis-regulatory modules by subsequence profiling and comparative genomics in D.melanogaster and D.pseudoobscura. Bioinformatics, 20, 2738–2750.
Kaminker, J.S., Bergman, C.M., Kronmiller, B., Carlson, J., Svirskas, R., Patel, S., Frise, E., Wheeler, D.A., Lewis, S.E., Rubin, G.M., Ashburner, M., Celniker, S.E. (2002) The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective. Genome Biol., 3, .
Lifanov, A.P., Makeev, V.J., Nazina, A.G., Papatsenko, D.A. (2003) Homotypic regulatory clusters in Drosophila. Genome Res., 13, 579–588.
Ludwig, M.Z., Bergman, C., Patel, N., Kreitman, M. (2000) Evidence for stabilizing selection in a eukaryotic cis-regulatory element. Nature, 403, 564–567.
Misra, S., Crosby, M.A., Mungall, C.J., Matthews, B.B., Campbell, K.S., Hradecky, P., Huang, Y., Kaminker, J.S., Millburn, G.H., Prochnik, S.E., et al. (2002) Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol., 3, .
Papatsenko, D.A., Makeev, V.J., Lifanov, A.P., Regnier, M., Nazina, A.G., Desplan, C. (2002) Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers. Genome Res., 12, 470–481.
Rajewsky, N., Vergassola, M., Gaul, U., Siggia, E.D. (2002) Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics, 3, 30.
Spirov, A.V., Bowler, T., Reinitz, J. (2000) HOX Pro: a specialized database for clusters and networks of homeobox genes. Nucleic Acids Res., 28, 337–340.
Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhauser, R., et al. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res., 29, 281–283.
Wolffe, A. (1998) Chromatin. Academic Press, San Diego, pp. 8.
Zhu, J. and Zhang, M.Q. (1999) SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 15, 607–611.