Bioinformatics Advance Access originally published online on December 1, 2007
Bioinformatics 2008 24(3):424-425; doi:10.1093/bioinformatics/btm600
BAGET: a web server for the effortless retrieval of prokaryotic gene context and sequence
Biologie Moléculaire du Gène chez les Extrêmophiles, Université Paris-Sud, Institut de Génétique et Microbiologie, CNRS UMR8621, 91405 Orsay Cedex, France
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: BAGET (Bacterial and Archaeal Gene Exploration Tool) is a web service designed to help molecular geneticists and phylogeneticists extract specific gene and protein sequences from completely sequenced prokaryotic genomes. Upon selection of a particular prokaryotic organism and gene, two levels of visual gene context information are provided on a single dynamic page: (i) a graphical representation of a user-defined portion of the chromosome centered on the gene of interest and (ii) the DNA sequence of the query gene, of the immediate neighboring genes and of the intergenic regions, each identified by a consistent color code. The amino acid sequence is provided for protein-coding query genes. Query results can be exported as a rich text format (RTF) word processor file for printing, archival or further analysis.
Availability: http://archaea.u-psud.fr/bin/baget.dll
Contact: jacques.oberto{at}igmors.u-psud.fr
| 1 INTRODUCTION |
|---|
In recent years, prokaryotic molecular genetics has taken a quantum leap with the availability of a large and growing number of completely sequenced bacterial and archaeal genomes. This mass of genetic sequences and related annotations is freely available through the National Center for Biotechnology Information (NCBI); however, access to the primary sequence and sequence context of a given gene is not trivial for wet bench experimentalists or phylogeneticists. Genomic information consists of large text files in GenBank format containing gene features and annotations, followed by one strand of the complete DNA sequence. GenBank files are impractical for routine use: their names are not explicit and their format is designed to be parsed mainly by computer software or interpreted by programmers. Specialized platform-specific software packages such as Vector NTI (Invitrogen) can interpret these files but require program installation and the local download and storage of database files. In addition to the NCBI repository, several web servers, such as TIGR-CMR (Peterson et al., 2001) and the Integrated Microbial Genomes system (Markowitz et al., 2006), have been developed to provide comprehensive access to prokaryotic genomes. Unfortunately, the complexity of these sites forces the user to navigate through a number of interlinked static pages before reaching the requested information. Furthermore, access to intergenic DNA sequences and their mapping is often overlooked. For these reasons I have developed BAGET (Bacterial and Archaeal Gene Exploration Tool), a web server providing fast and effortless access to virtually any prokaryotic gene present in the sequenced archaeal and bacterial genomes of the NCBI repository. The features of this web service are described below.
| 2 FEATURES |
|---|
The BAGET web service graphical interface consists of a single dynamic page compatible with all recent internet browsers and operating systems. BAGET data files and genomes are stored on the server itself and are updated and converted daily from the NCBI repository (see below). Individual chromosomes can be selected from a list of bacterial and archaeal species names. When a given microorganism harbors multiple chromosomes, their respective names are given the suffixes C1, C2, etc., in order of decreasing size. The gene query is executed by entering a case-insensitive name or name substring. If the gene search is successful, the user is provided with one or several entries, which can be selected to generate a report consisting of four parts (Fig. 1):
- The genomic context image depicts a chromosomal interval of user-selected length centered on the query gene. The query gene ORF (or non-coding sequence) is represented as a rightward red arrow in the 5'–3' coding polarity. The other genes are rotated accordingly and drawn in blue if they have the same polarity or in green if they have the opposite one. Rapid switching from one query gene to another is achieved by clicking on the genomic context map. Pointing at a coding sequence displays context-specific information such as its gene name and product.
- The DNA sequence is represented in the same color scheme as the genomic context; it encompasses the query gene and the immediate upstream and downstream neighboring genes. The color coding permits an easy and accurate identification of the open reading frames and non-translated intergenic regions surrounding the query gene. The particular case of overlapping open reading frames is handled in BAGET: the background color of the query gene sequence is that of the overlapping gene.
- The amino acid sequence is provided if the query gene encodes an mRNA.
- An external hyperlink to the NCBI databases is provided.
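The query and display rules described in this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BAGET's actual implementation: the `Gene` record, the helper names `find_genes`, `context_window` and `arrow_color`, and the toy gene names and coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: int  # +1 or -1
    product: str


def find_genes(genes, query):
    """Return genes whose name contains `query`, ignoring case."""
    q = query.lower()
    return [g for g in genes if q in g.name.lower()]


def context_window(genes, query_gene, length):
    """Return genes overlapping an interval of `length` bp centered on the query gene."""
    center = (query_gene.start + query_gene.end) // 2
    lo, hi = center - length // 2, center + length // 2
    return [g for g in genes if g.end >= lo and g.start <= hi]


def arrow_color(gene, query_gene):
    """Red for the query gene, blue for same polarity, green for opposite polarity."""
    if gene == query_gene:
        return "red"
    return "blue" if gene.strand == query_gene.strand else "green"


# Toy data (invented for illustration only).
genes = [
    Gene("abcA", 100, 1500, +1, "hypothetical protein A"),
    Gene("abcB", 1600, 2700, +1, "hypothetical protein B"),
    Gene("xyzR", 2800, 3900, -1, "hypothetical regulator"),
    Gene("farZ", 9000, 11400, +1, "hypothetical protein Z"),
]

hits = find_genes(genes, "ABC")   # matches abcA and abcB
query = hits[1]                   # the user picks abcB
window = context_window(genes, query, 4000)
colors = {g.name: arrow_color(g, query) for g in window}
print(colors)
```

Because the colors depend only on whether a gene shares the polarity of the query gene, the same rule applies when the query gene lies on the reverse strand and the map is flipped so that it points rightward.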
At this stage, BAGET cannot resolve the rare genes located within other genes or the trans-splicing proteins of Nanoarchaeum equitans. These limitations will be addressed in future versions of the program.
| 3 DATABASE MANAGEMENT |
|---|
BAGET database files are managed on the server by the accessory program BagetUpdater, designed for the automated daily incremental retrieval and conversion of the prokaryotic databases in GenBank format from the NCBI repository. For each chromosome, BagetUpdater generates a flat DNA text file and a compact index file listing the name, coordinates, orientation, coding ability and product of each gene.
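The exact layout of these two files is not described here, so the following Python sketch only illustrates how such a pair of files could be used to recover gene and intergenic sequences. It assumes a hypothetical tab-separated index line (name, start, end, strand, coding flag, product) with 1-based inclusive coordinates, and a chromosome held as a plain string of one strand; the function names and the toy sequence are invented for the example.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One line of the assumed (hypothetical) index format."""
    name: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"
    coding: bool  # True if the gene is protein-coding
    product: str


def parse_index_line(line):
    """Parse 'name<TAB>start<TAB>end<TAB>strand<TAB>coding<TAB>product'."""
    name, start, end, strand, coding, product = line.rstrip("\n").split("\t")
    return IndexEntry(name, int(start), int(end), strand, coding == "1", product)


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gene_sequence(chromosome, entry):
    """Return the gene sequence in its own 5'-3' coding polarity."""
    seq = chromosome[entry.start - 1:entry.end]
    return seq if entry.strand == "+" else reverse_complement(seq)


def intergenic(chromosome, left, right):
    """Return the sequence strictly between two neighboring genes (empty if they overlap)."""
    return chromosome[left.end:right.start - 1]


# Toy chromosome: a forward gene, a 3 bp spacer, and the same gene on the reverse strand.
chromosome = "AA" + "ATGCCCTAA" + "GGG" + "TTAGGGCAT" + "CC"
fwd = parse_index_line("fwdX\t3\t11\t+\t1\thypothetical protein")
rev = parse_index_line("revX\t15\t23\t-\t1\thypothetical protein")

print(gene_sequence(chromosome, fwd))    # ATGCCCTAA
print(gene_sequence(chromosome, rev))    # ATGCCCTAA
print(intergenic(chromosome, fwd, rev))  # GGG
```

Storing only one strand and reverse-complementing on demand mirrors the GenBank convention of recording a single strand of the chromosome.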
| 4 CONCLUSIONS |
|---|
Even with the plethora of bioinformatics web servers available today, offering a variety of tools reviewed in the Bioinformatics Links Directory (Fox et al., 2007), access to a given prokaryotic DNA coding sequence together with its 5' and 3' control regions remains difficult. BAGET was therefore developed as a web tool to efficiently extract gene context, DNA and protein sequences from completely sequenced prokaryotic genomes. The query results, presented intuitively on a single dynamic page, assist in the design of primers, PCR amplification templates and DNA cloning strategies. Rapid identification of coding regions and potential regulatory features in the intergenic regions facilitates the design of promoter and gene fusions to reporter genes. Phylogenetic projects might benefit from this tool as well, for the extraction of gene and protein orthologs from a variety of related or distant genomes.
Conflict of Interest: none declared.
| ACKNOWLEDGEMENTS |
|---|
The author wishes to thank the "Centre National pour la Recherche Scientifique" for financial support.
| FOOTNOTES |
|---|
Associate Editor: John Quackenbush
Received on September 13, 2007; revised on November 15, 2007; accepted on November 28, 2007
| REFERENCES |
|---|
Fox JA, et al. Conducting research on the web: 2007 update for the bioinformatics links directory. Nucleic Acids Res (2007) 35:W3–W5.
Markowitz VM, et al. The integrated microbial genomes (IMG) system. Nucleic Acids Res (2006) 34:D344–D348.
Peterson JD, et al. The Comprehensive Microbial Resource. Nucleic Acids Res (2001) 29:123–125.