Bioinformatics Advance Access originally published online on August 18, 2005
Bioinformatics 2005 21(22):4187-4189; doi:10.1093/bioinformatics/bti635
Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes
1Institute of Microbiology, Technical University Braunschweig Spielmannstrasse 7, D-38106 Braunschweig, Germany
2Institute of Biochemical Engineering, Technical University Braunschweig Gaußstrasse 17, 38106 Braunschweig, Germany
3Department of Informatics, University of Applied Sciences Wolfenbüttel Am Exer 2, 38302 Wolfenbüttel, Germany
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: A new online framework for the accurate and integrative prediction of transcription factor binding sites (TFBSs) in prokaryotes was developed. The system consists of three interconnected modules: (1) the PRODORIC database, a comprehensive data source and extensive collection of TFBSs with corresponding position weight matrices; (2) the pattern matching tool Virtual Footprint for the prediction of genome-based regulons and for the analysis of individual promoter regions; and (3) the interactive genome browser GBpro for the visualization of TFBS search results in their genomic context, with links to gene and regulator-specific information in PRODORIC. The aim of this service is to provide researchers with a free and easy-to-use collection of interconnected tools in the fields of molecular microbiology, infection and systems biology.
Availability: http://www.prodoric.de/vfp
Contact: d.jahn{at}tu-bs.de
| INTRODUCTION |
|---|
The accurate prediction of transcription factor binding sites (TFBSs) and whole regulons is still a crucial step towards the understanding of complex regulatory networks in systems biology. Current bioinformatic methods of pattern recognition usually suffer from low specificity, which often results in an accumulation of false-positive matches (Frech et al., 1997; Benitez-Bellon et al., 2002). Therefore, we developed a new framework for the straightforward evaluation and visualization of in silico results with a focus on bacterial gene regulation. The user can interactively identify putative TFBSs using genome-wide searches and immediately obtain detailed information on the promoters, corresponding genes, operons and encoded proteins found. This includes the genomic localization and detailed information about potentially regulated genes via links to the prokaryotic database of gene regulation (PRODORIC) as well as relevant links to external sources. Moreover, it is possible to evaluate the matches obtained according to their phylogenetic conservation. The software is organized in three major interconnected components: the PRODORIC database, the pattern search tool Virtual Footprint and the genome browser GBpro.
The PRODORIC database
The PRODORIC database is a comprehensive source of prokaryotic genomes and their underlying gene regulatory networks (Münch et al., 2003). Among many other features it contains a compilation of over 2500 TFBSs from several bacterial species, including their interacting transcriptional regulators. All data in PRODORIC are based on experimental evidence that was manually extracted from the original literature. Using this large collection of TFBSs, over 170 species-specific position weight matrices (PWMs) were generated, which serve as a library for the pattern matching tool Virtual Footprint. Besides the TFBS information, PRODORIC contains data on genes and proteins, promoter details, operon structures and links to relevant databases. Table 1 summarizes the data content of PRODORIC in the field of bacterial gene regulation. More recently the database was supplemented with additional information such as expression data from transcriptomics and proteomics experiments and metabolic networks. This integrated approach makes PRODORIC well suited as a platform for systems biology in prokaryotes.
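To illustrate how a PWM can be derived from a set of aligned binding sites and then used to score candidate sequences, the following minimal Python sketch builds a log-odds matrix with pseudocounts and scans a sequence for windows above a threshold. It is not the procedure used by PRODORIC or Virtual Footprint. The function names, the uniform background frequencies, the pseudocount of 1 and the example sites are all assumptions chosen for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PWM (one dict per position) from aligned sites.

    Assumption: all sites have equal length and contain only A, C, G, T.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos].upper() for site in sites)
        column = {}
        for b in BASES:
            freq = (counts[b] + pseudocount) / total
            column[b] = math.log2(freq / background[b])
        pwm.append(column)
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds scores for a sequence of PWM width."""
    return sum(col[base] for col, base in zip(pwm, seq.upper()))


def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if set(window) <= set(BASES):  # skip windows with ambiguous bases
            s = score(pwm, window)
            if s >= threshold:
                hits.append((start, window, s))
    return hits


# Hypothetical example sites, not taken from PRODORIC.
example_sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
example_pwm = build_pwm(example_sites)
for hit in scan(example_pwm, "GGGTTGACAGGG", threshold=3.0):
    print(hit)
```

The threshold controls the trade-off between sensitivity and the number of false-positive matches, which is why matches from such a scan generally need further evaluation in their genomic context.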
Virtual Footprint pattern matcher
The new pattern search tool Virtual Footprint offers fast searches of complex DNA patterns in whole bacterial genomes. Usually the search pattern is defined as a PWM provided by PRODORIC (Fig. 1B). We also added PWMs from other resources (Robison et al., 1998; Salgado et al., 2004). However, in some cases, e.g. when sufficient sequence data are not available, it is necessary to use other pattern definitions (Stormo, 2000). Therefore, we implemented search algorithms using IUPAC consensus sequences and regular expressions (Betel and Hogue, 2002). Some TFBSs are variable not only in their sequence conservation but also in their length. Common examples are two half-sites separated by a variable spacer, the conserved -10 and -35 hexamers found in σ70-regulated promoters and other so-called composite elements (Kel-Margoulis et al., 2000). Therefore, Virtual Footprint allows the definition of bipartite patterns via the combination of up to two subpatterns separated by a variable spacer. Different pattern types can be freely combined, e.g. a PWM with an IUPAC string. This enables a flexible definition of search patterns. The list of obtained matches can be evaluated by several different in silico approaches in accordance with their genomic context. Usually matches are directly linked to the downstream genes. Identified genes or operons are linked to the PRODORIC database and the genome browser GBpro (Fig. 1). If it is not possible to assign a gene to a match, the corresponding genomic location is specified. For a search, the size of the upstream region (distance to the start codon) can be defined, the pattern orientation can be selected and matches in coding regions can be excluded. Matches can be further evaluated by analyzing upstream regions of orthologous genes for the same pattern (Fig. 1D). In this case orthologous sequence stretches from different genomes are extracted using BLAST (Altschul et al., 1990) and then analyzed via Virtual Footprint. This kind of investigation is also called regulog analysis (Alkema et al., 2004). Furthermore, the GC-content of a promoter region and the resulting stacking energy plot shown by the genome browser can help to evaluate matches, since functional targets should be localized in chromosomal regions where the GC-content and stacking energy are expected to be below the average. In addition to the whole genome search, Virtual Footprint allows the analysis of single promoter regions. In this case the upstream sequence of a gene or a pasted user-defined sequence is compared with all PWMs provided by PRODORIC.
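The idea of a bipartite pattern built from IUPAC strings and a variable spacer can be sketched in Python by translating the pattern into a regular expression and searching both strands. This is only an illustration under stated assumptions, not the algorithm implemented in Virtual Footprint. The function names, the hexamers and the spacer range of 15 to 19 bases are hypothetical example values.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(pattern):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[ch] for ch in pattern.upper())


def bipartite_regex(left, right, min_spacer, max_spacer):
    """Combine two IUPAC subpatterns separated by a variable-length spacer."""
    spacer = "[ACGT]{%d,%d}" % (min_spacer, max_spacer)
    return iupac_to_regex(left) + spacer + iupac_to_regex(right)


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_matches(regex, sequence):
    """Return sorted (start, strand, matched_text) tuples for both strands.

    A lookahead is used so that overlapping matches are reported; each
    start position yields at most one match. Reverse-strand starts are
    converted back to forward-strand coordinates.
    """
    seq = sequence.upper()
    lookahead = re.compile("(?=(%s))" % regex)
    hits = [(m.start(), "+", m.group(1)) for m in lookahead.finditer(seq)]
    for m in lookahead.finditer(reverse_complement(seq)):
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Hypothetical promoter-like pattern: two hexamers with a 15-19 base spacer.
promoter_like = bipartite_regex("TTGACA", "TATAAT", 15, 19)
test_sequence = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
print(find_matches(promoter_like, test_sequence))
```

Expressing the spacer as a bounded repetition keeps the two conserved subpatterns fixed while allowing the overall length of the match to vary, which is the property that a single fixed-width matrix cannot capture.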
Virtual Footprint has been successfully applied to define the ResD regulon of Bacillus subtilis (Härtig et al., 2004) and to detect split tRNA genes in Nanoarchaeum equitans (Randau et al., 2005). The program offers many options and settings not described here. A detailed description is available in the online help of the program.
GBpro Genome Browser
GBpro is a genome browser for interactive navigation through all bacterial genomes available in PRODORIC. Genes, promoters and binding sites are displayed in parallel as graphical maps and highlighted sequences. Optionally, the GC-content and stacking energy of a DNA sequence of interest can be visualized. All results of Virtual Footprint are directly linked to GBpro and can thus be visualized in their genomic context (Fig. 1C). Similarly, genes and TFBSs present in PRODORIC are directly linked to GBpro.
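A GC-content profile of the kind that a genome browser can display may be computed with a simple sliding window, as in the following Python sketch. The window size, step size and function names are assumptions for illustration and do not describe how GBpro computes its plots. Stacking energy is omitted because it requires a dinucleotide parameter table that is not given here.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(sequence, window=100, step=10):
    """Return (start, gc_fraction) for each full window along the sequence."""
    profile = []
    for start in range(0, len(sequence) - window + 1, step):
        profile.append((start, gc_content(sequence[start:start + window])))
    return profile


def below_average_windows(sequence, window=100, step=10):
    """Windows whose GC-content is below the average of the whole sequence."""
    mean = gc_content(sequence)
    return [(s, gc) for s, gc in sliding_gc(sequence, window, step) if gc < mean]


# Hypothetical toy sequence: a GC-rich half followed by an AT-rich half.
toy = "GC" * 10 + "AT" * 10
print(below_average_windows(toy, window=10, step=10))
```

Windows flagged in this way could be compared with the positions of predicted binding sites as one additional, purely heuristic line of evidence.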
| Acknowledgments |
|---|
We would like to thank Denise Wätzlich and Lorenz Reimer for technical support and Dr Barbara Schulz for critical proofreading of the manuscript. We also thank Karin and Werner Münch for assistance in web design. This work was funded by the German Federal Ministry of Education and Research (BMBF) for the Bioinformatics Competence Center Intergenomics (Grant No. 031U110A/031U210A).
Conflict of Interest: none declared.
Received on May 11, 2005; revised on August 15, 2005; accepted on August 15, 2005
| REFERENCES |
|---|
Alkema, W.B.L., et al. (2004) Regulog analysis: detection of conserved regulatory networks across bacteria: application to Staphylococcus aureus. Genome Res., 14, 1362–1373.
Altschul, S.F., et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
Baldi, P. and Baisnee, P.F. (2000) Sequence analysis by additive scales: DNA structure for sequences and repeats of all lengths. Bioinformatics, 16, 865–889.
Benitez-Bellon, E., et al. (2002) Evaluation of thresholds for the detection of binding sites for regulatory proteins in Escherichia coli K12 DNA. Genome Biol., 3, 13.
Betel, D. and Hogue, C. (2002) Kangaroo – A pattern-matching program for biological sequences. BMC Bioinformatics, 3, 20.
Frech, K., et al. (1997) Finding protein-binding sites in DNA sequences: the next generation. Trends Biochem. Sci., 22, 103–104.
Härtig, E., et al. (2004) Bacillus subtilis ResD induces expression of the potential regulatory genes yclJK upon oxygen limitation. J. Bacteriol., 186, 6477–6484.
Kel-Margoulis, O.V., et al. (2000) COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation. Nucleic Acids Res., 28, 311–315.
Münch, R., et al. (2003) PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res., 31, 266–269.
Randau, L., et al. (2005) Nanoarchaeum equitans creates functional tRNA from separate genes for their 5'- and 3'-halves. Nature, 433, 537–541.
Robison, K., et al. (1998) A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K12 genome. J. Mol. Biol., 284, 241–254.
Salgado, H., et al. (2004) RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K12. Nucleic Acids Res., 32, 303–306.
Stormo, G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.