Bioinformatics Advance Access originally published online on October 31, 2006
Bioinformatics 2007 23(1):125-126; doi:10.1093/bioinformatics/btl556
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

GECO–linear visualization for comparative genomics

C. T. Kuenne, R. Ghai, T. Chakraborty and T. Hain*

Institute of Medical Microbiology, Justus-Liebig-University, Frankfurter Strasse 107, D-35392 Giessen, Germany

*To whom correspondence should be addressed.


    ABSTRACT

Summary: In order to understand and interpret phylogenetic and functional relationships between multiple prokaryotic species, qualitative and quantitative data must be correlated and displayed. GECO allows linear visualization of multiple genomes using a client/server based approach by dynamically creating .png- or .pdf-formatted images. It is able to display ortholog relations calculated using BLASTCLUST by color coding ortholog representations. Irregularities on the genomic level can be identified by anomalous G/C composition. Thus, this software will enable researchers to detect horizontally transferred genes, pseudogenes and insertions/deletions in related microbial genomes.

Availability: http://bioinfo.mikrobio.med.uni-giessen.de/geco2/GecoMainServlet

Contact: Carsten.Kuenne{at}mikrobio.med.uni-giessen.de


    1 INTRODUCTION
 
Genome-based studies of microbial evolution focus strongly on genes that are universally conserved across genera and on comparisons of shared and distinct gene sets between different groups. This is especially true when comparing closely related species (frequently pathogenic and non-pathogenic). The increasing number of sequenced genomes has raised considerable interest in devising means for comparative analysis to gain further knowledge of function and phylogeny. In such large-scale genome comparisons, it is important to distinguish between conserved and potentially horizontally transferred genes and to detect frameshifts and insertions/deletions. As horizontally transferred genes often insert preferentially in well defined ‘hot spots’, these loci differ considerably between virulent and non-virulent bacteria. A dynamic visualization method capable of carrying out multiple comparisons is required to observe this process of genome reduction and expansion and to further the understanding of disease mechanisms.

Most existing tools in this category are either commercial: GenomeSCOUT (Suter-Crazzolara and Kurapkat, 2000), ERGO (Overbeek et al., 2003); not locally deployable: MBGD (Uchiyama, 2003), GeConT (Ciria et al., 2004); or do not offer all necessary features: Artemis/ACT (Carver et al., 2005), GenomeComp (Yang et al., 2003), GenomePixelizer (Kozik et al., 2002). The Generic Genome Browser (Stein et al., 2002) could technically deliver on this task, but lacks visual quality and customizability in the interface, which led us to implement our own solution. Currently, GECO includes all 389 publicly available, fully sequenced prokaryotic genomes.


    2 IMPLEMENTATION
 
GECO (abbreviation of comparative genomics) is implemented in Java 1.5 on a Debian 3.1 Linux PC. It uses a client/server architecture, with Java Servlets and Apache Tomcat for the web interface and MySQL as the database backend.


    3 METHODS
 
Orthologs are calculated using BLASTCLUST, which is part of the NCBI BLAST package and uses BLASTP to determine homology of two proteins. It identifies orthologs above a certain cutoff and groups them by single linkage clustering (AB + BC = ABC).

Two genes are considered orthologs if they have: (1) BLASTP E-value < 1 × 10^-6, (2) protein identity > 60%, (3) both coverages (alignment-protein1 and alignment-protein2) > 80% (to filter single domain hits).
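As an illustration only, the three criteria and the single linkage grouping can be sketched in Python. This is not GECO's code (GECO is written in Java and delegates this step to BLASTCLUST); the `Hit` record, its field names and the function names are hypothetical, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise BLASTP hit (hypothetical record, fields named for illustration)."""
    query: str
    subject: str
    evalue: float
    identity: float          # percent identity of the alignment
    query_coverage: float    # percent of the query protein covered by the alignment
    subject_coverage: float  # percent of the subject protein covered by the alignment


def is_ortholog_pair(hit, max_evalue=1e-6, min_identity=60.0, min_coverage=80.0):
    """Apply the three criteria from the text to a single hit."""
    return (hit.evalue < max_evalue
            and hit.identity > min_identity
            and hit.query_coverage > min_coverage
            and hit.subject_coverage > min_coverage)


def cluster_orthologs(hits):
    """Single linkage clustering via union-find: A-B and B-C yield {A, B, C}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for hit in hits:
        if is_ortholog_pair(hit):
            root_q, root_s = find(hit.query), find(hit.subject)
            if root_q != root_s:
                parent[root_q] = root_s

    # Collect every protein that passed the filter under its cluster root.
    clusters = {}
    for protein in list(parent):
        clusters.setdefault(find(protein), set()).add(protein)
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

Proteins that never pass the filter do not appear in any cluster in this sketch; a real pipeline would probably keep them as singletons.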

All proteins that are elements of the same cluster are colored identically. Introducing new genomes to GECO is a straightforward process: (1) data in .embl format are placed in the genome directory, and (2) the update function is run.


    4 FUNCTIONS
 
GECO's query form was designed for simplicity and follows a natural workflow. First, the base gene for the current view is defined by choosing a specific identifier (e.g. dnaA) and an organism. The base track (the topmost track of the image) is centered on that gene, which can optionally be highlighted by a red frame. All subsequent tracks relate to it. The range of nucleotides to be displayed before and after the base gene can be varied. All genes within this range make up the base track. The number and order of genomes to be compared to the base track can be chosen, as can the degree of similarity deemed suitable for an ortholog (currently 60% identity, 80% coverage). Tracks can also be sorted in descending order of the number of orthologs shared with the base track. GECO incorporates a parameter to mask core genes (those present in all selected organisms), so that differences between the tracks are easier to identify. Each gene is associated with a popup box displaying the gene annotation and a customizable link. The width of the resulting image can be varied to compensate for different screen resolutions.
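The core gene mask and the sorting of tracks by shared orthologs can be illustrated with a short Python sketch. The representation of a track as a list of ortholog cluster IDs and both function names are assumptions made for this example; the article does not describe GECO's internal data structures.

```python
def mask_core_clusters(tracks):
    """Drop cluster IDs that occur in every track, i.e. the core genes.

    `tracks` maps organism name -> list of ortholog cluster IDs in display
    order (a hypothetical representation of a track).
    """
    if not tracks:
        return {}
    core = set.intersection(*(set(ids) for ids in tracks.values()))
    return {org: [c for c in ids if c not in core] for org, ids in tracks.items()}


def sort_by_shared_orthologs(base, tracks):
    """Order comparison tracks by descending number of clusters shared with the base track."""
    base_ids = set(tracks[base])
    others = [org for org in tracks if org != base]
    return sorted(others, key=lambda org: len(base_ids & set(tracks[org])), reverse=True)


tracks = {"base": [1, 2, 3, 4], "x": [1, 2], "y": [1, 2, 3, 5]}
masked = mask_core_clusters(tracks)          # clusters 1 and 2 occur everywhere
order = sort_by_shared_orthologs("base", tracks)
```

Here clusters 1 and 2 are present in all three tracks and are masked, and track `y` is listed before `x` because it shares three clusters with the base track instead of two.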

GECO has two distinct modes of operation: ‘pinned’ and ‘area’ (Fig. 1A and B).


Fig. 1 GECO ortholog visualization of the gene rplI of Listeria monocytogenes EGD-e, Bacillus subtilis 168 and Staphylococcus aureus N315 using different display modes. (A) ‘Pinned region mode’ will only find the basegene in all organisms, while (B) ‘whole area mode’ looks for all genes of the basetrack.

 
In ‘pinned mode’, GECO first looks for the ortholog of the base gene in all chosen organisms and then displays each track within the defined base range. If no ortholog is found in an organism, its track is omitted. This is the default mode and is useful for quickly scanning the region around a certain gene in related genomes. The display of GC compositional deviation from the average is only available in ‘pinned mode’.
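The article does not state how the GC deviation is computed. One plausible reading, sketched here in Python under that assumption, is the GC fraction of fixed, non-overlapping windows minus the genome-wide average. The window size and both function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def gc_deviation(genome, start, end, window=1000):
    """GC fraction of each window in genome[start:end] minus the genome-wide average.

    Returns a list of (window_start, deviation) pairs; the last window may be
    shorter than `window`.
    """
    average = gc_fraction(genome)
    deviations = []
    for pos in range(start, end, window):
        chunk = genome[pos:min(pos + window, end)]
        deviations.append((pos, gc_fraction(chunk) - average))
    return deviations
```

Windows with a strongly positive or negative deviation would be candidates for the anomalous G/C composition mentioned in the summary.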

In ‘area mode’, GECO will retrieve all genes of the basetrack first. Orthologs of all these genes are looked up in the selected organisms. If an ortholog could be determined, GECO retrieves the ‘neighbourhood’ of the hit (elongation default: 3000 bp up- and downstream of the ortholog). These regions are then connected by gaps to improve readability. In this manner, all orthologs of the basetrack genes can be shown in a single image, even if they are scattered all over the selected genomes.
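The neighbourhood retrieval can be sketched in Python as follows. Only the 3000 bp default elongation comes from the text. The function name, the clamping to the genome ends (which ignores circular chromosomes) and the merging of overlapping neighbourhoods are assumptions of this example.

```python
def neighbourhoods(ortholog_spans, genome_length, elongation=3000):
    """Extend each ortholog span by `elongation` bp on both sides and merge overlaps.

    `ortholog_spans` is a list of (start, end) coordinates of ortholog hits on
    one genome. The returned regions are the pieces that would be drawn, with a
    gap between consecutive regions.
    """
    extended = sorted(
        (max(0, start - elongation), min(genome_length, end + elongation))
        for start, end in ortholog_spans
    )
    merged = []
    for start, end in extended:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: grow it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


regions = neighbourhoods([(10000, 11000), (15000, 16000), (50000, 51000)], 52000)
```

In this example the first two orthologs lie close enough that their neighbourhoods fuse into one region (7000 to 19000), while the third forms a separate region (47000 to 52000) that would be joined to the first by a gap.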

To improve usability of the visualization servlet, GECO can show a list of all identifiers/annotations and general genomic data (GC content, percentage of coding genes, etc.).


    Acknowledgments
 
The authors wish to thank M. Maier for technical support. T.C. acknowledges support from the Bundesministerium für Bildung und Forschung, Germany (NGFN 01GS0401), T.H. from Pathogenomics (PTJ-Bio//03U213B), and R.G. is a member of the Graduate School ‘Biochemistry of Nucleoprotein Complexes’ supported by the Deutsche Forschungsgemeinschaft.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alex Bateman

Received on August 1, 2006; revised on October 17, 2006; accepted on October 23, 2006

    REFERENCES
 

    Carver, T.J., et al. (2005) ACT: the Artemis comparison tool. Bioinformatics, 21, 3422–3423.

    Ciria, R., et al. (2004) GeConT: gene content analysis. Bioinformatics, 20, 2307–2308.

    Kozik, A., et al. (2002) GenomePixelizer—a visualization program for comparative genomics within and between species. Bioinformatics, 18, 335–336.

    Overbeek, R., et al. (2003) The ERGO (TM) genome analysis and discovery system. Nucleic Acids Res., 31, 164–171.

    Stein, L.D., et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res., 12, 1599–1610.

    Suter-Crazzolara, C. and Kurapkat, G. (2000) An infrastructure for comparative genomics to functionally characterize genes and proteins. Genome Inform. Ser. Workshop Genome Inform., 11, 24–32.

    Uchiyama, I. (2003) MBGD: microbial genome database for comparative analysis. Nucleic Acids Res., 31, 58–62.

    Yang, J., et al. (2003) GenomeComp: a visualization tool for microbial genome comparison. J. Microbiol. Meth., 54, 423–426.





