Skip Navigation


Bioinformatics Advance Access originally published online on November 22, 2005
Bioinformatics 2006 22(3):381-383; doi:10.1093/bioinformatics/bti794
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/3/381    most recent
bti794v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (21)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Gallo, S. M.
Right arrow Articles by Halfon, M. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Gallo, S. M.
Right arrow Articles by Halfon, M. S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

* The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

REDfly: a Regulatory Element Database for Drosophila

Steven M. Gallo 1, Long Li 2, Zihua Hu 1 and Marc S. Halfon 2,3,*

1Center for Computational Research 140 Farber Hall, State University of New York at Buffalo, 3435 Main Street, Buffalo, NY 14214, USA
2Department of Biochemistry 140 Farber Hall, State University of New York at Buffalo, 3435 Main Street, Buffalo, NY 14214, USA
3Center of Excellence in Bioinformatics and the Life Sciences 140 Farber Hall, State University of New York at Buffalo, 3435 Main Street, Buffalo, NY 14214, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 OVERVIEW OF THE DATA
 EXPRESSION PATTERN ANNOTATION...
 GRAPHICAL DISPLAY, DOWNLOAD AND...
 KNOWN LACUNAE AND FUTURE...
 REFERENCES
 

Summary: Bioinformatics studies of transcriptional regulation in the metazoa are significantly hindered by the absence of readily available data on large numbers of transcriptional cis-regulatory modules (CRMs). Even the richly annotated Drosophila melanogaster genome lacks extensive CRM information. We therefore present here a database of Drosophila CRMs curated from the literature complete with both DNA sequence and a searchable description of the gene expression pattern regulated by each CRM. This resource should greatly facilitate the development of computational approaches to CRM discovery as well as bioinformatics analyses of regulatory sequence properties and evolution.

Availability: http://redfly.ccr.buffalo.edu

Contact: mshalfon{at}buffalo.edu


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 OVERVIEW OF THE DATA
 EXPRESSION PATTERN ANNOTATION...
 GRAPHICAL DISPLAY, DOWNLOAD AND...
 KNOWN LACUNAE AND FUTURE...
 REFERENCES
 
Despite their importance, the transcriptional cis-regulatory modules (CRMs) associated with the majority of genes in the higher eukaryotes are unknown, and few are yet included in genome annotations. This lack of a comprehensive collection of known CRMs presents a considerable roadblock to large-scale computational analyses of transcriptional regulatory sequences. Easy access to a compilation of CRM sequences would have considerable value for subsequent CRM discovery—for instance, by providing training data for supervised learning approaches—as well as for investigations into the nature and evolution of cis-regulatory elements.

Although Drosophila melanogaster has one of the most fully annotated metazoan genomes, fewer than 45 genes are annotated with documented CRMs (http://flybase.bio.indiana.edu/annot/). An additional collection of about 60 CRMs from about 24 different genes involved in early embryonic gene expression has also been developed (Lifanov et al., 2003; Schroeder et al., 2004). This collection has been used for a number of studies (Abnizova et al., 2005; Berman et al., 2004; Costas et al., 2003; Grad et al., 2004; Gupta and Liu, 2005; Lifanov et al., 2003; Papatsenko et al., 2002; Philippakis et al., 2005; Rajewsky et al., 2002; Schroeder et al., 2004; Zhou and Wong, 2004), but is limited by the fact that all of the CRMs are involved in regulating a similar pattern of gene expression and in binding a similar repertoire of transcription factors (TFs). Recently, Bergman et al. (2005) have compiled a comprehensive database of DNAse I footprints in Drosophila. However, the footprint data only detail TF binding sites, and not functional CRM sequences.

We introduce here a database of published Drosophila CRMs, REDfly (Regulatory Element Database for fly). REDfly currently contains over 600 CRMs along with their sequences and a description of the expression patterns for which they are responsible. The goal of REDfly is to provide a comprehensive source of sequence and expression pattern data for Drosophila CRMs.


    OVERVIEW OF THE DATA
 TOP
 ABSTRACT
 INTRODUCTION
 OVERVIEW OF THE DATA
 EXPRESSION PATTERN ANNOTATION...
 GRAPHICAL DISPLAY, DOWNLOAD AND...
 KNOWN LACUNAE AND FUTURE...
 REFERENCES
 
For the initial REDfly release we have focused on sequences that have been demonstrated to be sufficient to regulate gene expression, primarily through reporter gene assays in transgenic animals. Sequences necessary for expression, but not clearly sufficient—e.g. TF binding sites or sequences uncovered by small deletions—are not presently incorporated. Each record contains the DNA sequence of the CRM as well as coordinates mapped to the release 4 genomic sequence (http://www.fruitfly.org/annot/release4.html). We have also noted if the given CRM includes the associated gene's promoter. A more detailed explanation of how sequences were chosen and mapped onto the genomic sequence are provided in the online User's Guide.

REDfly currently contains an excess of 600 CRMs associated with more than 200 genes drawn from over 200 references. Curation of the database will continue both to add newly reported CRMs and to fill in previously reported CRMs; we estimate that better than two-thirds of the reported CRMs are presently included. Approximately 75% of the CRM sequences are <2500 bp in length and 50% <1125 bp. Approximately 25% of the included CRMs overlap other included CRMs, and 18% of the CRMs include their gene's promoter. Greater than 75% of the CRMs regulate gene expression outside of the blastoderm embryo and are thus not included in the previous compilations of Drosophila regulatory elements (Lifanov et al., 2003; Schroeder et al., 2004). The total amount of (non-overlapping) CRM sequence in the database is slightly >1 Mb, or ~0.86% of the total Drosophila euchromatic genome, with sequences from each chromosome represented. The median distance between CRMs ranges from 23.4 kb on chromosome 2L to 275.4 kb on chromosome 4; the maximum distance in most cases is ~10% of the chromosome arm length. A more detailed bioinformatics analysis of the CRMs will be presented elsewhere.


    EXPRESSION PATTERN ANNOTATION AND SEARCHING
 TOP
 ABSTRACT
 INTRODUCTION
 OVERVIEW OF THE DATA
 EXPRESSION PATTERN ANNOTATION...
 GRAPHICAL DISPLAY, DOWNLOAD AND...
 KNOWN LACUNAE AND FUTURE...
 REFERENCES
 
Each CRM has been annotated with a description of the expression pattern that it directs. REDfly uses the Drosophila anatomy ontology (http://obo.sourceforge.net/cgibin/detail.cgi?drosanat; Drysdale, 2001) for assigning expression patterns, which will enable high interoperability with other biological databases.

REDfly has two modes of searching for expression patterns. The ‘Expression Term’ search will search for records whose expression annotation includes the specified term. Alternatively, users can use the ‘Ontology’ search function, which will return records whose expression annotation matches the specified term or any of the descendent terms in the ontology hierarchy. For example, a search for ‘mesoderm’ using the Expression Term search will return only those CRMs whose annotation explicitly includes the word ‘mesoderm’. However, a similar Ontology search will also return mesodermal derivatives such as ‘embryonic somatic muscle’ and ‘cardioblast’. The Ontology search can be initiated either by entering an ontology term in the search box or by browsing the ontology tree in a pop-up window. This enables easy access to the terms and term hierarchy. A link is provided to the FlyBase gene expression report page (Drysdale et al., 2005), which provides a list of genes annotated in FlyBase with the current ontology term. Link-out is also provided to genes with similar expression patterns in the BDGP in situ hybridization database (Tomancak et al., 2002). As mappings between the anatomy ontologies of different organisms are developed, it should be possible to create links to similarly expressed genes in these organisms as well.

The REDfly expression pattern annotation is drawn from the textual descriptions given by authors. As these are provided in the literature in varying levels of detail and are typically not reported using the ontology terms, providing an exact annotation is not always straightforward. We have attempted to err on the side of more general rather than more restrictive assignments (e.g. ‘embryonic muscle system’ versus ‘abdominal dorsal acute muscle’, unless explicitly so annotated by the author). The Ontology search function therefore provides a way to identify CRMs that potentially drive similar spatial patterns of expression despite that expression having been described at different levels of detail in the literature.

The ability to search by expression pattern is a key feature of REDfly that promises to be highly useful for developing models for computational discovery of tissue-specific CRMs (e.g. Grad et al., 2004) and for investigating structural and organizational properties of CRMs (Erives and Levine, 2004; Senger et al., 2004). However, we note that the anatomy ontology does not at this time always provide a means to distinguish subtissue- or organ-level cell populations. Thus, e.g. two entries annotated as ‘wing disc’ may in fact refer to non-overlapping cell types within the disc. Users are therefore encouraged to consult original references for detailed descriptions of expression patterns.


    GRAPHICAL DISPLAY, DOWNLOAD AND LINK-OUT
 TOP
 ABSTRACT
 INTRODUCTION
 OVERVIEW OF THE DATA
 EXPRESSION PATTERN ANNOTATION...
 GRAPHICAL DISPLAY, DOWNLOAD AND...
 KNOWN LACUNAE AND FUTURE...
 REFERENCES
 
A number of options for graphical display, download and link-out to other databases have been provided. From the report page for any record, links are available to display the CRM in the UCSC genome browser (Kent et al., 2002) or in the Generic Genome Browser (Gbrowse; Stein et al., 2002). CRM sequences can be downloaded in multi-FASTA format, in the format for custom Gbrowse annotations, and in CSV or GFFv3 format that includes additional field data. Links are available to the FlyBase report of the associated gene and to the PubMed citation of the primary reference. As noted above, for each expression term associated with a given CRM, it is also possible to link to a list of genes annotated as having the same expression pattern in both FlyBase and in the BDGP in situ hybridization database.


    KNOWN LACUNAE AND FUTURE INCLUSIONS
 TOP
 ABSTRACT
 INTRODUCTION
 OVERVIEW OF THE DATA
 EXPRESSION PATTERN ANNOTATION...
 GRAPHICAL DISPLAY, DOWNLOAD AND...
 KNOWN LACUNAE AND FUTURE...
 REFERENCES
 
A number of potentially important regulatory sequences have not yet been included in REDfly. These include CRMs inferred but not demonstrated to have specific activities based on deletion analysis, either from reporter gene assays or from genomic deletions, as well as silencer and boundary elements. REDfly is also currently limited to CRMs from D.melanogaster, despite the growing number of functionally tested sequences from other fly species. Future updates of REDfly will include such sequences along with a description of the evidence used to support their assignment as CRMs. We also hope to continue to upgrade the expression pattern search functions and the graphical display capabilities, and to improve cross-referencing with other databases.


    Acknowledgments
 
We thank J. Leatherbarrow for assistance with literature curation, Q. Nguyen for programming assistance, H. Apitz, G. Mardon, J. Posakony and J. Wildonger for providing CRM sequences, and E. Wang and S. Sinha for comments on the manuscript and database. M.S.H. is supported by NIH grant HG002489.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Chris Stoeckert

Received on September 29, 2005; revised on November 17, 2005; accepted on November 17, 2005

    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 OVERVIEW OF THE DATA
 EXPRESSION PATTERN ANNOTATION...
 GRAPHICAL DISPLAY, DOWNLOAD AND...
 KNOWN LACUNAE AND FUTURE...
 REFERENCES
 

    Abnizova, I., et al. (2005) Some statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the Drosophila genome: the fluffy-tail test. BMC Bioinformatics, 6, 109[CrossRef][Medline].

    Bergman, C.M., et al. (2005) Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics, 21, 1747–1749[Abstract/Free Full Text].

    Berman, B.P., et al. (2004) Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol, . 5, R61[CrossRef][Medline].

    Costas, J., et al. (2003) Turnover of binding sites for transcription factors involved in early Drosophila development. Gene, 310, 215–220[CrossRef][Web of Science][Medline].

    Drysdale, R. (2001) Phenotypic data in FlyBase. Brief Bioinform, . 2, 68–80[Abstract/Free Full Text].

    Drysdale, R.A., et al. (2005) FlyBase: genes and gene models. Nucleic Acids Res, . 33, (Database issue) D390–D395[Abstract/Free Full Text].

    Erives, A. and Levine, M. (2004) Coordinate enhancers share common organizational features in the Drosophila genome. Proc. Natl Acad. Sci. USA, 101, 3851–3856[Abstract/Free Full Text].

    Grad, Y.H., et al. (2004) Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in Drosophila melanogaster and D.pseudoobscura. Bioinformatics, 20, 2738–2750[Abstract/Free Full Text].

    Gupta, M. and Liu, J.S. (2005) De novo cis-regulatory module elicitation for eukaryotic genomes. Proc. Natl Acad. Sci. USA, 102, 7079–7084[Abstract/Free Full Text].

    Kent, W.J., et al. (2002) The human genome browser at UCSC. Genome Res, . 12, 996–1006[Abstract/Free Full Text].

    Lifanov, A.P., et al. (2003) Homotypic regulatory clusters in Drosophila. Genome Res, . 13, 579–588[Abstract/Free Full Text].

    Papatsenko, D.A., et al. (2002) Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers. Genome Res, . 12, 470–481[Abstract/Free Full Text].

    Philippakis, A.A., et al. (2005) Modulefinder: a tool for computational discovery of cis regulatory modules. Pac. Symp. Biocomput, . 519–530.

    Rajewsky, N., et al. (2002) Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics, 3, 30[CrossRef][Medline].

    Schroeder, M.D., et al. (2004) Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, . 2, e271.

    Senger, K., et al. (2004) Immunity regulatory DNAs share common organizational features in Drosophila. Mol. Cell, 13, 19–32[CrossRef][Web of Science][Medline].

    Stein, L.D., et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res, . 12, 1599–1610[Abstract/Free Full Text].

    Tomancak, P., et al. (2002) Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biol, . 3, RESEARCH0088.

    Zhou, Q. and Wong, W.H. (2004) CisModule: de novo discovery of cis-regulatory modules by hierarchical mixture modeling. Proc. Natl Acad. Sci. USA, 101, 12114–12119[Abstract/Free Full Text].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
W. Fu, P. Ray, and E. P. Xing
DISCOVER: a feature-based discriminative method for motif search in complex genomes
Bioinformatics, June 15, 2009; 25(12): i321 - i329.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
Q. Dai, Y. Yang, and T. Wang
Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison
Bioinformatics, October 15, 2008; 24(20): 2296 - 2302.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
A. K. Holloway, D. J. Begun, A. Siepel, and K. S. Pollard
Accelerated sequence divergence of conserved genomic elements in Drosophila melanogaster
Genome Res., October 1, 2008; 18(10): 1592 - 1601.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
P. R. Haddrill, D. Bachtrog, and P. Andolfatto
Positive and Negative Selection on Noncoding DNA in Drosophila simulans
Mol. Biol. Evol., September 1, 2008; 25(9): 1825 - 1834.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
R. Satija, L. Pachter, and J. Hein
Combining statistical alignment and phylogenetic footprinting to detect regulatory elements
Bioinformatics, May 15, 2008; 24(10): 1236 - 1242.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. B. Noyes, X. Meng, A. Wakabayashi, S. Sinha, M. H. Brodsky, and S. A. Wolfe
A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system
Nucleic Acids Res., May 1, 2008; 36(8): 2547 - 2560.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
O. L. Griffith, S. B. Montgomery, B. Bernier, B. Chu, K. Kasaian, S. Aerts, S. Mahony, M. C. Sleumer, M. Bilenky, M. Haeussler, et al.
ORegAnno: an open-access community-driven resource for regulatory annotation
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D107 - D113.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. S. Halfon, S. M. Gallo, and C. M. Bergman
REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D594 - D598.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
M. R. Kantorovitz, G. E. Robinson, and S. Sinha
A statistical method for alignment-free comparison of regulatory sequences
Bioinformatics, July 1, 2007; 23(13): i249 - i255.
[Abstract] [Full Text] [PDF]


Home page
Genes Dev.Home page
T. Sandmann, C. Girardot, M. Brehme, W. Tongprasit, V. Stolc, and E. E.M. Furlong
A core transcriptional network for early mesoderm development in Drosophila melanogaster
Genes & Dev., February 15, 2007; 21(4): 436 - 449.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/3/381    most recent
bti794v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (21)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Gallo, S. M.
Right arrow Articles by Halfon, M. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Gallo, S. M.
Right arrow Articles by Halfon, M. S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?