Bioinformatics Advance Access originally published online on November 30, 2004
Bioinformatics 2005 21(8):1747-1749; doi:10.1093/bioinformatics/bti173

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster

Casey M. Bergman 1,*, Joseph W. Carlson 2,3 and Susan E. Celniker 2,3

1Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
2Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
3Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

*To whom correspondence should be addressed.


    Abstract

Summary: Despite increasing numbers of computational tools developed to predict cis-regulatory sequences, the availability of high-quality datasets of transcription factor binding sites limits advances in the bioinformatics of gene regulation. Here we present such a dataset based on a systematic literature curation and genome annotation of DNase I footprints for the fruitfly, Drosophila melanogaster. Using the experimental results of 201 primary references, we annotated 1367 binding sites from 87 transcription factors and 101 target genes in the D.melanogaster genome sequence. These data will provide a rich resource for future bioinformatics analyses of transcriptional regulation in Drosophila such as constructing motif models, training cis-regulatory module detectors, benchmarking alignment tools and continued text mining of the extensive literature on transcriptional regulation in this important model organism.

Availability: http://www.flyreg.org/

Contact: cbergman{at}gen.cam.ac.uk

The fruitfly Drosophila melanogaster has one of the most highly annotated metazoan genome sequences with respect to gene and transposable element content (Misra et al., 2002; Kaminker et al., 2002). In contrast, the cis-regulatory sequences that control transcription are only just beginning to be incorporated explicitly into the genome annotation, despite the vast literature of functionally characterized cis-regulatory elements that exists for this species (http://www.flybase.org/). The lack of a systematic, publicly available compilation of cis-regulatory sequences for D.melanogaster, such as the SCPD in yeast (Zhu and Zhang, 1999), limits progress in the computational analysis of gene regulation for this important model species. The need for such a resource is clear from the fact that cis-regulatory curation efforts of limited scope for genes involved in early development (Ludwig et al., 2000; Spirov et al., 2000; Berman et al., 2002; Papatsenko et al., 2002; Rajewsky et al., 2002; Emberly et al., 2003; Lifanov et al., 2003) have rapidly proven useful for subsequent bioinformatic and comparative studies of gene regulation (Costas et al., 2003; Grad et al., 2004).

To contribute to a comprehensive annotation of cis-regulatory sequences in D.melanogaster, we report here a database of DNase I footprint sequences derived from a systematic literature curation and annotation effort. We have chosen to focus on DNase I footprint data since they are an abundant and high-quality source of data on transcription factor specificity (Galas and Schmitz, 1978). In contrast to previous binding site compilations in Drosophila, these data are derived from the same experimental data type, cover all available aspects of development and are explicitly linked to the finished Release 3 genome sequence coordinates (Celniker et al., 2002). The purpose of this note is to present a basic characterization of these data and to make them available in a single database as a resource for computational analyses of transcriptional regulation in one of the most important model organisms.

Our literature curation yielded a total of 201 references with non-redundant experimental data from DNase I footprinting experiments (see Supplemental Files 1 and 2). Our set of references is a superset of all those meeting the same criteria in previous compilations of binding site data for the Drosophila early embryo (Ludwig et al., 2000; Spirov et al., 2000; Berman et al., 2002; Papatsenko et al., 2002; Rajewsky et al., 2002; Emberly et al., 2003; Lifanov et al., 2003) and Transfac 5.0 (Wingender et al., 2001). The overlap between the present and previous compilations is detailed in Supplemental File 1. Our current work includes information from 113 primary references not present in any previous compilation, more than doubling the number of references with curated Drosophila DNase I footprint data consolidated in a single public database.

Of the 1367 footprints annotated, 1341 footprints (98%) can be attributed to 101 target genes, with 26 footprints (2%) obtained from chromatin immunoprecipitation experiments having ‘unknown’ targets (Supplemental File 3). The mean (median) number of footprints annotated per target gene is 13.3 (5), with a skewed distribution (Fig. 1a): the top ten genes (Ubx, Antp, h, ftz, eve, dpp, kni, en, Ddc, Sgs4) contribute nearly half (49%) of the footprints mapping to known targets. Likewise, 1164 (85%) of the 1367 footprints annotated can be attributed to 87 purified or recombinant transcription factors, plus an additional 203 footprints (15%) from ‘unspecified’ factors of unknown identity derived from crude or purified nuclear extract (Supplemental File 3). The distribution of the number of footprints per factor is also skewed, with a mean (median) number of footprints annotated per factor of 13.4 (6) (Fig. 1a). As with the distribution by target, the top ten factors (hb, Trl, ftz, Ubx, en, bcd, Kr, abd-A, z, dl) also contribute nearly half (49%) of the footprints derived from known factors. Although these data represent the most comprehensive collection of binding site data in Drosophila to date, it is clear that binding site information is lacking for the majority of factors and genes, a limitation that can hopefully be overcome in the future by high-throughput experimental techniques (e.g. Bulyk et al., 2001).
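
The per-target and per-factor summaries above (mean, median and top-ten share) are simple count statistics. The following is a minimal sketch of how they could be recomputed from a list carrying one label per footprint; the function name `summarize` and the toy labels are illustrative assumptions and are not taken from the database or its file formats.

```python
from collections import Counter
from statistics import mean, median


def summarize(labels, top_n=10):
    """Return (mean, median, top-n share) of footprint counts per label.

    `labels` holds one entry per footprint, e.g. its target gene or its
    transcription factor. This helper is hypothetical, for illustration only.
    """
    counts = Counter(labels)
    per_label = sorted(counts.values(), reverse=True)
    top_share = sum(per_label[:top_n]) / sum(per_label)
    return mean(per_label), median(per_label), top_share


# Toy data (NOT from the database): one target-gene label per footprint.
toy_targets = ["Ubx"] * 6 + ["eve"] * 3 + ["h"] * 2 + ["kni"]
avg, med, share = summarize(toy_targets, top_n=2)

# Arithmetic check of the means reported in the text.
mean_per_target = round(1341 / 101, 1)  # footprints with known targets / target genes
mean_per_factor = round(1164 / 87, 1)   # footprints with known factors / factors
```

With a skewed distribution such as this one, the median sits well below the mean, which is the pattern reported for both target genes and factors.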



Fig. 1 Distribution of the number of annotated footprints per target gene and transcription factor (a), and distribution of annotated footprint lengths (b). In (a), the target gene with the highest number of footprints annotated is Ubx, and the transcription factor with the highest number of footprints annotated is hb.

 
Individually, the 1363 footprints that map to euchromatic arms (four footprints map to heterochromatic scaffolds) comprise a total of 26 983 bp of DNA sequence, but since nearly half (45%, n = 613) of the footprints annotated overlap at least one other footprint, these data span only 21 372 bp of genomic DNA, or approximately 0.0183% of the Release 3 euchromatic genome sequence. The footprinted sequences annotated range in length from 5 to 140 bp and, surprisingly, have a mean (median) length of 19.8 bp (17 bp) (Fig. 1b). In fact, the vast majority (81%, n = 1101) of the footprinted sequences annotated are longer than both the 10.5 bp length needed for one turn of the B-form DNA helix (Wolffe, 1998) and the core recognition motif length (5–10 bp) typically reported for most transcription factors. The prevalence of long footprinted sequences may simply result from steric hindrance by the transcription factor preventing access to DNase cleavage, but may also suggest an under-appreciated role for non-core motif nucleotides in transcription factor–DNA interactions and/or a high frequency of homo-cooperative binding interactions. Certainly, the magnitude of overlap among footprinted sequences suggests the possibility of extensive hetero-cooperative interactions in these data. With the resource presented here, these and other hypotheses can now be tested using the wide array of experimental and computational methods available for the functional analysis of transcription factor binding sites.
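
The difference between the summed footprint length and the genomic span arises because overlapping footprints are counted once after merging. The sketch below illustrates that calculation on toy coordinates; it assumes 1-based inclusive coordinates on a single chromosome arm, and the function names and intervals are hypothetical rather than taken from the database.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals, 1-based inclusive."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous merged interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def count_overlapping(intervals):
    """Count intervals sharing at least one base with another interval."""
    n = 0
    for i, (s1, e1) in enumerate(intervals):
        if any(i != j and s1 <= e2 and s2 <= e1
               for j, (s2, e2) in enumerate(intervals)):
            n += 1
    return n


# Toy footprints on one arm (NOT real coordinates).
toy = [(100, 119), (110, 129), (200, 216), (300, 304)]

summed_bp = sum(e - s + 1 for s, e in toy)      # lengths added individually
merged = merge_intervals(toy)
spanned_bp = sum(e - s + 1 for s, e in merged)  # bases covered at least once
n_overlap = count_overlapping(toy)              # footprints overlapping another
```

For real data the merge would be run separately for each chromosome arm. As a consistency check on the figures in the text, 26 983 bp over 1363 footprints gives the reported mean length of 19.8 bp, and 21 372 bp at 0.0183% implies a euchromatic sequence of roughly 117 Mb.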


    SUPPLEMENTARY DATA
Supplementary data for this paper are available on Bioinformatics online.


    Acknowledgments
 
We thank Nicholas Blanchard for assistance with literature curation; FlyBase Cambridge for access to the Drosophila offprint collection; Michael Ashburner, Douda Bensasson, Thomas Down and Rachel Drysdale for suggestions on data format and representation; and three anonymous reviewers and Nikolaus Rajewsky for helpful comments on the manuscript. This work was supported in part by NIH grants HG00750 and GH002673 to G.Rubin and SEC, respectively. CMB is supported by NIH training grant T32 HL07279 to E.Rubin and by a Royal Society USA Research Fellowship.

Received on August 5, 2004; revised on November 5, 2004; accepted on November 20, 2004

    REFERENCES

    Berman, B.P., Nibu, Y., Pfeiffer, B.D., Tomancak, P., Celniker, S.E., Levine, M., Rubin, G.M., Eisen, M.B. (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl Acad. Sci. USA, 99, 757–762.

    Bulyk, M.L., Huang, X., Choo, Y., Church, G.M. (2001) Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. USA, 98, 7158–7163.

    Celniker, S.E., Wheeler, D.A., Kronmiller, B., Carlson, J.W., Halpern, A., Patel, S., Adams, M., Champe, M., Dugan, S.P., Frise, E., et al. (2002) Finishing a whole genome shotgun sequence assembly: release 3 of the Drosophila euchromatic genome sequence. Genome Biol., 3, .

    Costas, J., Casares, F., Vieira, J. (2003) Turnover of binding sites for transcription factors involved in early Drosophila development. Gene, 310, 215–220.

    Emberly, E., Rajewsky, N., Siggia, E.D. (2003) Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics, 4, 57.

    Galas, D.J. and Schmitz, A. (1978) DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res., 5, 3157–3170.

    Grad, Y.H., Roth, F.P., Halfon, M.S., Church, G.M. (2004) Prediction of similarly-acting cis-regulatory modules by subsequence profiling and comparative genomics in D.melanogaster and D.pseudoobscura. Bioinformatics, 20, 2738–2750.

    Kaminker, J.S., Bergman, C.M., Kronmiller, B., Carlson, J., Svirskas, R., Patel, S., Frise, E., Wheeler, D.A., Lewis, S.E., Rubin, G.M., Ashburner, M., Celniker, S.E. (2002) The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective. Genome Biol., 3, .

    Lifanov, A.P., Makeev, V.J., Nazina, A.G., Papatsenko, D.A. (2003) Homotypic regulatory clusters in Drosophila. Genome Res., 13, 579–588.

    Ludwig, M.Z., Bergman, C., Patel, N., Kreitman, M. (2000) Evidence for stabilizing selection in a eukaryotic cis-regulatory element. Nature, 403, 564–567.

    Misra, S., Crosby, M.A., Mungall, C.J., Matthews, B.B., Campbell, K.S., Hradecky, P., Huang, Y., Kaminker, J.S., Millburn, G.H., Prochnik, S.E., et al. (2002) Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol., 3, .

    Papatsenko, D.A., Makeev, V.J., Lifanov, A.P., Regnier, M., Nazina, A.G., Desplan, C. (2002) Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers. Genome Res., 12, 470–481.

    Rajewsky, N., Vergassola, M., Gaul, U., Siggia, E.D. (2002) Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics, 3, 30.

    Spirov, A.V., Bowler, T., Reinitz, J. (2000) HOX Pro: a specialized database for clusters and networks of homeobox genes. Nucleic Acids Res., 28, 337–340.

    Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhauser, R., et al. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res., 29, 281–283.

    Wolffe, A. (1998) Chromatin. Academic Press, San Diego, pp. 8.

    Zhu, J. and Zhang, M.Q. (1999) SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 15, 607–611.



