Bioinformatics Advance Access originally published online on December 1, 2007
Bioinformatics 2008 24(3):424-425; doi:10.1093/bioinformatics/btm600
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

BAGET: a web server for the effortless retrieval of prokaryotic gene context and sequence

Jacques Oberto *

Biologie Moléculaire du Gène chez les Extrêmophiles, Université Paris-Sud, Institut de Génétique et Microbiologie, CNRS UMR8621, 91405 Orsay Cedex, France

*To whom correspondence should be addressed.


    ABSTRACT

Summary: BAGET (Bacterial and Archaeal Gene Exploration Tool) is a web service designed to help molecular geneticists and phylogeneticists extract specific gene and protein sequences from completely sequenced prokaryotic genomes. Upon selection of a particular prokaryotic organism and gene, two levels of visual gene context information are provided on a single dynamic page: (i) a graphical representation of a user-defined portion of the chromosome centered on the gene of interest and (ii) the DNA sequence of the query gene, of the immediate neighboring genes and of the intergenic regions, each identified by a consistent color code. The amino acid sequence is provided for protein-coding query genes. Query results can be exported as a rich text format (RTF) word processor file for printing, archival or further analysis.

Availability: http://archaea.u-psud.fr/bin/baget.dll

Contact: jacques.oberto{at}igmors.u-psud.fr


    1 INTRODUCTION
In recent years, prokaryotic molecular genetics has taken a quantum leap with the availability of a large and growing number of completely sequenced bacterial and archaeal genomes. This mass of genetic sequences and related annotations is freely available through the National Center for Biotechnology Information (NCBI); however, access to the primary sequence and sequence context of a given gene is not trivial for wet bench experimentalists or phylogeneticists. Genomic information consists of large text files in GenBank format containing gene features and annotations, followed by one strand of the complete DNA sequence. GenBank files are impractical for routine use: their names are not explicit and their format is designed to be parsed mainly by computer software or interpreted by programmers. Specialized platform-specific software packages such as Vector NTI (Invitrogen) can interpret these files but require program installation and the local download and storage of database files. In addition to the NCBI repository, several web servers, such as TIGR-CMR (Peterson et al., 2001) and the Integrated Microbial Genomes system (Markowitz et al., 2006), have been developed to provide comprehensive access to prokaryotic genomes. Unfortunately, the complexity of these sites imposes navigation through a number of interlinked static pages before the requested information is reached. Furthermore, access to intergenic DNA sequences and their mapping is often overlooked. For these reasons I have developed BAGET (Bacterial and Archaeal Gene Exploration Tool), a web server providing fast and effortless access to virtually any prokaryotic gene present in the sequenced archaeal and bacterial genomes of the NCBI repository. The features of this web service are described below.


    2 FEATURES
The graphical interface of the BAGET web service consists of a single dynamic page compatible with all recent internet browsers and operating systems. BAGET data files and genomes are stored on the server itself and are updated and converted daily from the NCBI repository (see below). Individual chromosomes can be selected from a list of bacterial and archaeal species names. When a given microorganism harbors multiple chromosomes, their names carry the suffixes C1, C2, etc., assigned in order of decreasing size. A gene query is executed by entering a case-insensitive name or name substring. If the gene search is successful, the user is presented with one or several entries, any of which can be selected to generate a report consisting of four parts (Fig. 1):

  1. The genomic context image depicts a chromosomal interval of user-selected length centered on the query gene. The query gene ORF (or non-coding sequence) is represented as a rightward red arrow in the 5'–3' coding polarity. The other genes are rotated accordingly and drawn in blue if they share the polarity of the query gene or in green if they have the opposite one. Clicking on another gene in the genomic context map switches rapidly to that gene as the new query. Hovering the pointer over a coding sequence displays context-specific information such as the gene name and product.
  2. The DNA sequence is represented in the same color scheme as the genomic context; it encompasses the query gene and the immediate upstream and downstream neighboring genes. The color coding permits an easy and accurate identification of the open reading frames and non-translated intergenic regions surrounding the query gene. The particular case of overlapping open reading frames is handled in BAGET: within the overlap, the query gene sequence is displayed on a background of the overlapping gene's color.
  3. The amino acid sequence is shown if the query gene is protein-coding.
  4. An external hyperlink to the NCBI databases is provided.
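The color and overlap rules of parts 1 and 2 can be illustrated with a short Python sketch. It is not BAGET's implementation, which is not described in this article: the `Gene` record, the function names and the coordinates are all hypothetical, and the logic simply restates the rules above (red for the query gene, blue for the same polarity, green for the opposite one, and a coordinate intersection for overlapping ORFs).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def context_window(genes: List[Gene], query: Gene, length: int) -> List[Gene]:
    """Return the genes intersecting an interval of `length` bp centered on the query."""
    center = (query.start + query.end) // 2
    lo, hi = center - length // 2, center + length // 2
    return [g for g in genes if g.end >= lo and g.start <= hi]


def display_color(gene: Gene, query: Gene) -> str:
    """Red for the query gene, blue for the same polarity, green for the opposite one."""
    if gene == query:
        return "red"
    return "blue" if gene.strand == query.strand else "green"


def overlap(a: Gene, b: Gene) -> Optional[Tuple[int, int]]:
    """Return the shared coordinate range of two genes, or None if they do not overlap."""
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    return (lo, hi) if lo <= hi else None


# Illustrative (made-up) annotations: geneA and geneB overlap by 11 bp.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 390, 900, "+"),
    Gene("geneC", 950, 1500, "-"),
    Gene("geneD", 5000, 5600, "+"),
]
query = genes[1]
for g in context_window(genes, query, 2000):
    shared = overlap(g, query) if g != query else None
    print(g.name, display_color(g, query), shared)
```

Because the color depends only on whether a gene's strand matches that of the query, the same rule holds when the query lies on the reverse strand and the map is flipped so that the query arrow points rightward.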
An alternative gene selection list is also provided; it can be navigated in groups of one hundred genes, listed by chromosomal position. This list is especially useful for fast chromosome browsing or for genomes annotated with numeric gene names only. BAGET reports can be exported as rich text format (RTF) files compatible with all major word processors for printing, storage or further processing with other software.
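The two ways of selecting a gene, by case-insensitive name substring or by browsing a position-ordered list one hundred genes at a time, might be sketched as follows. The `(name, start)` record layout and both function names are assumptions made for illustration; the article does not specify how BAGET stores or searches its gene lists.

```python
from typing import List, Tuple

# Hypothetical record layout: (gene name, start coordinate on the chromosome).
GeneRecord = Tuple[str, int]


def search_genes(records: List[GeneRecord], query: str) -> List[GeneRecord]:
    """Case-insensitive substring match on the gene name."""
    needle = query.casefold()
    return [r for r in records if needle in r[0].casefold()]


def gene_page(records: List[GeneRecord], page: int, page_size: int = 100) -> List[GeneRecord]:
    """Return one page of genes ordered by chromosomal position (page 0 is the first)."""
    ordered = sorted(records, key=lambda r: r[1])
    first = page * page_size
    return ordered[first:first + page_size]


# 250 made-up genes with numeric names only, one every 1000 bp.
records = [(f"gene{i:04d}", 1000 * i) for i in range(1, 251)]

print(search_genes(records, "GENE025"))  # matches regardless of letter case
print(len(gene_page(records, 2)))        # the last, partially filled page
```

Paging by position rather than by name is what makes the list usable for genomes whose genes carry only numeric identifiers.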


Fig. 1. BAGET report for the Escherichia coli yhhL gene. (A) Genomic context panel. The query gene (in red) is surrounded by its neighboring genes of the same or opposite polarity (blue and green, respectively). (B) DNA sequence panel of the query gene and its immediate neighbors. Consistent red color coding identifies the query gene ORF in its genomic and sequence contexts. Gene-specific information is displayed when the mouse pointer hovers over a gene. The blue-background sequence indicates the extent of overlap between the rsmD and yhhL ORFs. (C) Protein sequence panel.

 
At this stage, BAGET cannot resolve the rare genes located within other genes or the trans-splicing proteins of Nanoarchaeum equitans. These limitations will be addressed in future versions of the program.


    3 DATABASE MANAGEMENT
BAGET database files are managed on the server by the accessory program BagetUpdater, designed for the automated, daily, incremental retrieval and conversion of the prokaryotic databases in GenBank format from the NCBI repository. For each chromosome, BagetUpdater generates a flat DNA text file and a compact index file listing the name, coordinates, orientation, coding ability and product of each gene.


    4 CONCLUSIONS
Even with the plethora of bioinformatics web servers available today, offering a variety of tools reviewed in the Bioinformatics Links Directory (Fox et al., 2007), access to a given prokaryotic coding DNA sequence, together with its 5' and 3' control regions, remains difficult. BAGET was therefore developed as a web tool to efficiently extract gene context, DNA and protein sequences from completely sequenced prokaryotic genomes. The query results, presented intuitively on a single dynamic page, assist in the design of primers, PCR amplification templates and DNA cloning strategies. Rapid identification of coding regions and potential regulatory features in the intergenic regions facilitates the design of promoter and gene fusions to reporter genes. Phylogenetic projects might benefit from this tool as well, for the extraction of gene and protein orthologs from a variety of related or distant genomes.

Conflict of Interest: none declared.


    ACKNOWLEDGEMENTS
The author wishes to thank the "Centre National de la Recherche Scientifique" for financial support.


    FOOTNOTES
 
Associate Editor: John Quackenbush

Received on September 13, 2007; revised on November 15, 2007; accepted on November 28, 2007

    REFERENCES

    Fox JA, et al. Conducting research on the web: 2007 update for the bioinformatics links directory. Nucleic Acids Res (2007) 35:W3–W5.

    Markowitz VM, et al. The integrated microbial genomes (IMG) system. Nucleic Acids Res (2006) 34:D344–D348.

    Peterson JD, et al. The Comprehensive Microbial Resource. Nucleic Acids Res (2001) 29:123–125.




This article has been cited by other articles:

    Oberto J, Breuil N, Hecker A, Farina F, Brochier-Armanet C, Culetto E, Forterre P. Qri7/OSGEPL, the mitochondrial version of the universal Kae1/YgjD protein, is essential for mitochondrial genome maintenance. Nucleic Acids Res (2009) 37(16):5343–5352.

