Bioinformatics Advance Access originally published online on October 13, 2005
Bioinformatics 2005 21(24):4411-4413; doi:10.1093/bioinformatics/bti714
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Athena: a resource for rapid visualization and systematic analysis of Arabidopsis promoter sequences

Timothy R. O'Connor 1, Curtis Dyreson 2 and John J. Wyrick 1,*

1 School of Molecular Biosciences, Washington State University, Pullman, WA, USA
2 School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: To better understand the regulatory networks that control plant gene expression, tools are needed to systematically analyze and visualize promoter regulatory sequences in Arabidopsis thaliana. We have developed the Athena database, which contains 30 067 predicted Arabidopsis promoter sequences and consensus sequences for 105 previously characterized transcription factor (TF) binding sites. Athena provides four novel tools to facilitate the analysis of promoter sequences: a promoter visualization tool for the rapid inspection of key regulatory sequences in multiple promoters; a TF binding site enrichment tool to identify statistically over-represented TF sites in a user-selected subset of promoters; a data-mining tool to rapidly select promoter sequences containing a specified combination of TF binding sites; and a tool to display the distribution of TF binding site positions in a selected set of promoter sequences.

Availability: http://www.bioinformatics2.wsu.edu/Athena

Contact: jwyrick@wsu.edu


    INTRODUCTION
The plant Arabidopsis thaliana presents an excellent model system for the computational study of promoter sequences and transcription regulation. To aid in this study, a number of databases containing information about Arabidopsis promoter sequences and transcription factor (TF) binding sequences have been developed. Several databases, including PlantCARE (Lescot et al., 2002; Rombauts et al., 1999), PLACE (Higo et al., 1999) and ATTFDB (Davuluri et al., 2003), store consensus binding sequences for plant or Arabidopsis-specific TFs. ATCISDB, which is included in the Agris website along with ATTFDB, catalogs several putative binding sites on a genome-wide scale. This resource also provides visualization tools for individual promoter sequences (Davuluri et al., 2003). AthaMap integrates both Arabidopsis DNA sequence and predicted TF binding sites (Steffens et al., 2004). Through this resource, single segments of DNA are viewable in a textual format with TF binding sites indicated. PlantProm DB curates a collection of plant promoters with experimentally verified transcription start sites (Shahmuradov et al., 2003).

To complement these transcription-related resources, integrated tools are needed for on-the-fly visualization of promoter sequences and for streamlined data mining of promoter regulatory sequences. Tools are also needed to assess the statistical significance of an apparent enrichment of regulatory sequences in a selected subset of promoters. Finally, all of these tools and resources should be integrated within a single interface, so that the output of one tool can be piped automatically into a related tool.

To meet these computational challenges, the Athena database provides a unified, automated interface for the analysis of Arabidopsis promoter sequences. Athena integrates DNA sequence and Gene Ontology (GO) data to facilitate the visualization and the statistical and positional analysis of promoter regulatory elements. These tools should advance the study of the promoters of individual genes and also provide a framework for the systematic analysis of transcriptional regulatory networks in A. thaliana.


    SYSTEM DESCRIPTION
The Athena database and accompanying web interface were constructed using the MySQL DBMS, custom Perl scripts and dynamically generated HTML/JavaScript web pages. Athena contains up to 3 kb of promoter sequence for 30 067 predicted Arabidopsis genes and consensus sequences for 105 previously characterized TF binding sites. Athena also contains GO information for each predicted gene. Promoter sequence and GO data were obtained from TAIR (Rhee et al., 2003); TF binding site consensus sequences were imported from PLACE (Higo et al., 1999) and Agris (Davuluri et al., 2003). Using standard defaults for maximum promoter size (3 kb) and truncating promoter sequences that overlap with upstream genes, we find that the average Arabidopsis promoter has 29.9 predicted TF binding sites.
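
The promoter definition above (at most 3 kb upstream of each gene, truncated where an upstream gene is reached) can be sketched as follows. This is a minimal illustration, not Athena's Perl implementation: it assumes 0-based, half-open gene coordinates and uses the annotated gene start as the reference point, and the names `Gene`, `promoter_interval` and `promoter_sequence` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene model with 0-based, half-open coordinates (assumed convention)."""
    name: str
    chrom: str
    start: int   # first base of the gene
    end: int     # one past the last base of the gene
    strand: str  # "+" or "-"


# Complement table for unambiguous bases; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_interval(gene: Gene, genes: List[Gene], max_len: int = 3000) -> Tuple[int, int]:
    """Chromosome interval [lo, hi) upstream of `gene`, at most `max_len` bp,
    truncated at the nearest neighboring gene lying upstream."""
    others = [g for g in genes if g.chrom == gene.chrom and g.name != gene.name]
    if gene.strand == "+":
        # Upstream means lower coordinates: stop at the closest gene ending before this one.
        lo = max(0, gene.start - max_len)
        for g in others:
            if g.end <= gene.start:
                lo = max(lo, g.end)
        return lo, gene.start
    # Minus strand: upstream means higher coordinates.
    hi = gene.end + max_len
    for g in others:
        if g.start >= gene.end:
            hi = min(hi, g.start)
    return gene.end, hi


def promoter_sequence(gene: Gene, genes: List[Gene], chrom_seq: str, max_len: int = 3000) -> str:
    """Promoter sequence written 5'->3' on the gene's own strand."""
    lo, hi = promoter_interval(gene, genes, max_len)
    seq = chrom_seq[lo:min(hi, len(chrom_seq))]
    return seq if gene.strand == "+" else revcomp(seq)
```

For a minus-strand gene the upstream region lies at higher chromosome coordinates, so the extracted sequence is reverse-complemented to read 5' to 3' toward the gene.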

Promoter visualization
The promoter visualization tool graphically represents key regulatory elements in promoter sequences. In the visualization tool's promoter selection page, the user can enter up to 100 gene accessions (e.g. At5g52310) and select various display options.

Athena was used to visualize 2000 bp of the promoter sequence of the gene erd10 (At1g20450), which is involved in dehydration response in plants. A portion of the resulting web page containing the compact promoter visualization output is shown in Figure 1A. The web page also shows a TF table beneath the promoter image, which provides a key to the color-coded TF binding sites, lists the number of instances of each TF binding site in the promoter and gives the significance of enrichment (P-value) for each TF site in the promoter sequence (calculated using a hypergeometric probability distribution). An example of a TF table is shown in Figure 1B. The alternative form of promoter visualization is the cartoon display (data not shown), which is better at illustrating overlapping TF binding sites than the compact display. The visualization display is interactive; by checking the boxes adjacent to each TF site in the table, the user can select which TF binding sites to display in the promoter image.
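
One way to compute such an enrichment P-value is the upper tail of the hypergeometric distribution. The sketch below assumes that the counts are promoters containing at least one instance of the site (as in the 'P' column of the TF table) and that the background is every promoter in the database; the paper does not specify these details, and `hypergeom_enrichment_p` is a hypothetical name.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric P-value, P(X >= k).

    N: promoters in the background (assumed: all promoters in the database)
    K: background promoters containing the TF site
    n: promoters in the user-selected set
    k: selected promoters containing the TF site
    """
    if not (0 <= n <= N and 0 <= K <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    # Sum the exact probabilities of observing i = k, k+1, ..., min(n, K) hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same value; exact integer arithmetic is used here only to avoid that dependency.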



Figure 1 (A) Compact visualization of the erd10 (At1g20450) promoter. Each TF binding site is indicated by a color-coded hash mark matching those of the TF table. The predicted gene sequence is represented as a gray rectangle; an arrow indicates the start of transcription. Aqua rectangles indicate predicted CpG islands. (B) Enriched TF binding sites present in the promoters of dehydration/cold response genes. The number of promoters containing at least one instance of the TF binding site and the total number of TF binding sites in the selected set of sequences are given in the ‘P’ and ‘S’ columns of the TF table, respectively. (C) Histogram of ABRE-like binding site distribution in dehydration/cold response promoter sequences. The gray distribution backdrop is the random (expected) distribution of these TF binding sites. (D) Histogram of DRE-core motif position distribution. The position of the DRE-core motif was plotted across all dehydration/cold acclimation promoters with the same parameters as in C.

 
Data mining
The data-mining tool enables the user to identify promoters based on the presence of selected TF binding sites in the promoter sequence, or on the GO classification of the corresponding gene. Multiple TF sites are combined with Boolean AND, so only promoters containing all of the selected TF sites are identified. Multiple GO terms are combined with Boolean OR, so a gene annotated with any one of the selected terms is identified.
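
The Boolean semantics described above can be expressed as set operations. In this sketch the TF-site and GO criteria, when both are given, are combined with AND; that combination rule, the in-memory index layout and the function name `mine_promoters` are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Set


def mine_promoters(
    site_index: Dict[str, Set[str]],
    go_index: Dict[str, Set[str]],
    required_sites: Iterable[str] = (),
    go_terms: Iterable[str] = (),
) -> List[str]:
    """Hypothetical query: genes whose promoters contain ALL required TF sites
    (AND) and whose GO annotation includes AT LEAST ONE of `go_terms` (OR).
    An empty criterion is treated as no constraint."""
    required = set(required_sites)
    wanted_terms = set(go_terms)
    hits = []
    for gene in set(site_index) | set(go_index):
        sites = site_index.get(gene, set())
        terms = go_index.get(gene, set())
        if not required <= sites:  # AND over TF sites
            continue
        if wanted_terms and not (wanted_terms & terms):  # OR over GO terms
            continue
        hits.append(gene)
    return sorted(hits)
```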

The data-mining tool was used to select promoters whose corresponding genes were classified under the following three GO terms: response to water deprivation, salinity response and cold acclimation. The mining tool identified 55 genes classified under at least one of these terms (data not shown). The data-mining results include a table of TF binding sites that are over-represented (enriched) in the promoter sequences of dehydration/cold response genes (Fig. 1B). Five of the enriched TF binding sites are classified under the ABRE/ABF or DRE binding site families, which are known to have a significant role in regulating gene expression in response to the three selected stresses (Yamaguchi-Shinozaki and Shinozaki, 1994). The mining results page also includes the list of 55 genes selected, a table of TF sites that are present but not enriched and a table of enriched GO terms (data not shown).

More advanced data-mining queries can be performed using the analysis tool selection page. This page offers optional promoter position constraints, which allow the user to search for TF sites that occur within a specific region of the promoter sequences (e.g. –200 to –100 bp upstream). Athena also includes a custom motif search tool, which allows the user to search for promoters containing a user-specified consensus sequence.
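
A custom consensus search with a position constraint might look like the sketch below. It assumes IUPAC degenerate nucleotide codes in the consensus, scans only the given strand (a complete search would also scan the reverse complement), and numbers positions so that the base immediately upstream of position +1 is –1; these conventions and the names `consensus_to_regex` and `find_sites` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus (e.g. 'ACGTN') into a regex pattern."""
    return "".join(IUPAC[c] for c in consensus.upper())


def find_sites(promoter: str, consensus: str,
               window: Optional[Tuple[int, int]] = None) -> List[int]:
    """Start positions of consensus matches on the given strand.

    Assumes `promoter` is written 5'->3' and ends at the base just upstream of
    position +1, so string index i maps to position i - len(promoter)
    (the last base is -1). `window` = (lo, hi) keeps matches with lo <= pos <= hi.
    """
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    positions = []
    for m in pattern.finditer(promoter.upper()):
        pos = m.start() - len(promoter)
        if window is None or window[0] <= pos <= window[1]:
            positions.append(pos)
    return positions
```

For example, `find_sites(seq, "ACGTG", window=(-200, -100))` would restrict a search to the –200 to –100 region mentioned above.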

Analysis of TF site position bias
The Athena analysis tools can be used to examine the distribution of TF binding sites in promoter sequences. The analysis page contains a histogram tool that graphically displays the positions of a selected TF binding site among the specified promoter sequences. If multiple TF sites are selected, then the histogram tool will display the aggregate positions of all selected TF sites in a single plot.
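
The aggregation behind such a histogram amounts to binning site positions pooled across promoters. This sketch assumes positions relative to a reference start (negative values upstream) and fixed-width bins; the bin size, range and the name `position_histogram` are illustrative choices, not Athena's parameters.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def position_histogram(site_positions: Dict[str, Iterable[int]],
                       bin_size: int = 20, lo: int = -3000,
                       hi: int = 0) -> List[Tuple[int, int]]:
    """Pool site positions from many promoters into bins [start, start + bin_size).

    Positions outside [lo, hi) are ignored. Returns (bin_start, count) pairs,
    including empty bins, ordered from lo upward.
    """
    counts = Counter()
    for positions in site_positions.values():
        for pos in positions:
            if lo <= pos < hi:
                counts[(pos - lo) // bin_size] += 1
    n_bins = -(-(hi - lo) // bin_size)  # ceiling division
    return [(lo + i * bin_size, counts[i]) for i in range(n_bins)]
```

To pool several TF sites into one plot, as the histogram tool does, each gene's position lists for the selected sites can be concatenated before calling the function.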

The histogram tool was used to display the distribution of ABRE-like TF binding sites in the promoter sequences of the previously selected dehydration/cold response genes. Figure 1C shows that ABRE-like TF sites are enriched between positions –40 and –160 in these promoters. This position bias is not simply a consequence of the base composition of the ABRE-like consensus sequence, because random permutations of the consensus do not show a similar position bias (data not shown). In contrast, the DRE core site shows no strong positional bias in the promoters of dehydration/cold response genes (Fig. 1D). These results suggest that the position of the ABRE-like TF site within the promoters of dehydration/cold response genes may be important for its transcriptional regulatory function.
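
The permutation control mentioned above can be sketched by shuffling the consensus string, which preserves its base composition while destroying the motif. The number of permutations, the seeded generator and the name `shuffled_consensi` are assumptions; the paper does not describe how its permutations were generated. Each shuffled consensus would then be scanned and binned in the same way as the original motif, and its position distribution compared with the real one.

```python
import random
from typing import List


def shuffled_consensi(consensus: str, n: int, seed: int = 0,
                      max_tries: int = 10000) -> List[str]:
    """Up to `n` distinct random permutations of `consensus`, excluding the
    original. Each permutation keeps the same letters (base composition)."""
    rng = random.Random(seed)  # seeded for reproducibility
    seen = {consensus}
    shuffled: List[str] = []
    for _ in range(max_tries):
        if len(shuffled) >= n:
            break
        candidate = "".join(rng.sample(consensus, len(consensus)))
        if candidate not in seen:
            seen.add(candidate)
            shuffled.append(candidate)
    return shuffled
```

A consensus made of a single repeated letter has no distinct permutation, so the function returns an empty list in that case rather than looping indefinitely.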


    Acknowledgments
 
We are grateful to John Browse, Mark Lange, Nicholas Provart, Heidi Strom and Bryan Thines for helpful comments, suggestions and discussions on the Athena database and this manuscript. We thank Monique Kohagura for assistance in developing some of the Athena tools and web pages. We thank Justin Fischer and Jason Sikes for web and server support. This work was supported, in part, by American Cancer Society grant RSG-03-181-01-GMC.

Conflict of Interest: none declared.

Received on July 28, 2005; revised on September 15, 2005; accepted on October 10, 2005

    REFERENCES

    Davuluri, R.V., et al. (2003) AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics, 4, 25.

    Higo, K., et al. (1999) Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res., 27, 297–300.

    Lescot, M., et al. (2002) PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res., 30, 325–327.

    Rhee, S.Y., et al. (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res., 31, 224–228.

    Rombauts, S., et al. (1999) PlantCARE, a plant cis-acting regulatory element database. Nucleic Acids Res., 27, 295–296.

    Shahmuradov, I.A., et al. (2003) PlantProm: a database of plant promoter sequences. Nucleic Acids Res., 31, 114–117.

    Steffens, N.O., et al. (2004) AthaMap: an online resource for in silico transcription factor binding sites in the Arabidopsis thaliana genome. Nucleic Acids Res., 32, D368–D372.

    Yamaguchi-Shinozaki, K. and Shinozaki, K. (1994) A novel cis-acting element in an Arabidopsis gene is involved in responsiveness to drought, low-temperature, or high-salt stress. Plant Cell, 6, 251–264.

