Bioinformatics Advance Access originally published online on March 1, 2007
Bioinformatics 2007 23(9):1172-1174; doi:10.1093/bioinformatics/btm070
Phylogenetic exploration of bacterial genomic rearrangements
1Université Paul Sabatier, CNRS-LMGM, UMR 5100, 118, route de Narbonne, 31062 Toulouse, Cedex, 2Laboratoire de Génétique Cellulaire INRA UMR444 and 3Laboratoire des Interactions Plantes Microorganismes INRA/CNRS UMR441/2594, Chemin de Borde Rouge BP52627 31326 Castanet Tolosan Cedex, France
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: We present a graphical tool dedicated to the exploration of bacterial genome rearrangements. The exploration relies on the reconstruction of ancestral genomes at each internal node of a gene-order-based phylogenetic tree. The user selects an internal node to visualize the rearrangements between the chromosome inferred for that node and those of its direct descendants on the tree.
Availability: PEGR is available at the Genopole Toulouse Bioinformatics platform.
Supplementary information: Online supplementary data are available at the PEGR web site: http://bioinfo.genopole-toulouse.prd.fr/pegr.
| 1 INTRODUCTION |
|---|
Construction and comparison of the genetic maps of bacterial species have revealed that the overall gene order on chromosomes has not been conserved over long evolutionary timescales. Now that sequenced bacterial genomes are available, more detailed analyses can be performed to elucidate the relationships between the organizational features of chromosomes and the physiology of the cells. Powerful algorithms for reconstructing rearrangement scenarios for multiple species have been developed (Bourque and Pevzner, 2002; Moret et al., 2001; Tang and Moret, 2003). They allow the reconstruction of genome-scale phylogenetic trees and of the ancestral gene order at each node of the tree with high accuracy, even for large datasets. However, these algorithms lack a good interactive visualization component. Indeed, the graphical representation of genomic rearrangements is challenging when more than three species are compared. Broadly, three types of solution can be distinguished among the publicly available tools: (i) the species are compared to a reference and the rearrangements are highlighted between the reference and each of the other species (EnteriX, Florea et al., 2003; VISTA, Frazer et al., 2004); (ii) the genome rearrangements are visualized as a stack of pairwise comparisons in which the species become successively query and subject (ACT, Carver et al., 2005), or blocks of synteny are delineated from a multiple alignment of all species compared (Mauve, Darling et al., 2004a; M-GCAT, Treangen and Messeguer, 2006); and (iii) a comprehensive analysis of backbone and loops can be performed using tools such as MOSAIC (Chiapello et al., 2005). However, none of these tools can draw genomic rearrangements along evolutionary trajectories. Here, we present such a tool, composed of a computational module and a graphical interface.
| 2 METHODS |
|---|
In the first step, pairs of orthologous genes are used as landmarks of chromosome conservation. They allow comparisons between distant species without a heavy computational load. Orthologous genes in pairs of genomes are pre-computed at the amino acid level and stored in a local database. When more than two genomes are compared, the pairs of orthologs are assembled into a large graph of genes linked by orthology relationships. Groups of orthologous genes that contain exactly one gene from each genome are selected as orthologous landmarks between chromosomes; they make up the conserved backbone. Finally, the initial chromosomes are rewritten as pseudo-chromosomes that retain only the selected genes, in their initial order. This procedure ensures that all pseudo-chromosomes have the same length and identical gene content, but they still contain a large number of genes that are conserved in the same relative order in the different genomes. We therefore use the GRIL algorithm (Darling et al., 2004b) to assemble the conserved list of genes into large chromosomal blocks. The pseudo-chromosomes are then reconstructed as signed lists of ordered conserved blocks, where the sign refers to the block orientation.
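The selection of landmark genes and the rewriting of chromosomes described above can be sketched in Python as follows. This is an illustration only, not PEGR's code: the function names `core_groups` and `pseudo_chromosomes`, the `(genome, gene)` tuple representation and the integer labels are assumptions made for the example, and gene orientation and the GRIL block assembly are left out.

```python
from collections import defaultdict


def core_groups(ortholog_pairs, genomes):
    """Return orthologous groups holding exactly one gene from every genome.

    ortholog_pairs: iterable of ((genome, gene), (genome, gene)) tuples.
    genomes: collection of the genome names being compared.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each orthology link joins two genes into the same connected component.
    for a, b in ortholog_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)

    wanted = sorted(genomes)
    # Keep a group only if each genome contributes exactly one gene to it.
    return sorted(
        sorted(members)
        for members in groups.values()
        if sorted(genome for genome, _ in members) == wanted
    )


def pseudo_chromosomes(gene_orders, groups):
    """Keep only core genes, in their original order, relabelled by group.

    gene_orders: dict mapping genome name -> list of gene ids in map order.
    """
    label = {
        member: index
        for index, members in enumerate(groups, start=1)
        for member in members
    }
    return {
        genome: [label[(genome, g)] for g in order if (genome, g) in label]
        for genome, order in gene_orders.items()
    }
```

A gene that lacks an ortholog in one of the genomes, or that has a paralog in one of them, falls outside every retained group and is dropped, so each pseudo-chromosome becomes a permutation of the same set of labels.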
In the second step, the pseudo-chromosomes are used to infer both a phylogenetic tree based on conserved block rearrangements and the ancestral state at each node of this tree. Depending on the user's choice, this is done with either the GRAPPA (Tang and Moret, 2003) or the MGR (Bourque and Pevzner, 2002) program. Both programs can use breakpoint or inversion distances, but inversion distances give better results with bacterial genomes (Bourque and Pevzner, 2002).
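To make the notion of a breakpoint distance concrete, the following Python sketch counts the adjacencies of one signed circular pseudo-chromosome that are absent from another, and tabulates the counts for all pairs. It is a minimal illustration under the usual definition of a breakpoint, not the GRAPPA or MGR implementation; the function names are hypothetical, and the inversion distance, which is harder to compute, is not attempted here.

```python
def adjacencies(chromosome):
    """Set of adjacencies of a signed circular chromosome.

    chromosome: list of non-zero integers; the sign gives block orientation.
    """
    n = len(chromosome)
    found = set()
    for i in range(n):
        a, b = chromosome[i], chromosome[(i + 1) % n]  # wrap around the circle
        # (a, b) read on one strand is (-b, -a) read on the other strand,
        # so store one canonical form of the two.
        found.add(min((a, b), (-b, -a)))
    return found


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that do not occur in g2."""
    return len(adjacencies(g1) - adjacencies(g2))


def distance_matrix(chromosomes):
    """Pairwise breakpoint distances for a dict of name -> signed block list."""
    names = sorted(chromosomes)
    return {
        (p, q): breakpoint_distance(chromosomes[p], chromosomes[q])
        for p in names
        for q in names
    }
```

Because adjacencies are taken on a circle and stored strand-independently, two lists that differ only by the choice of starting block or by reading the opposite strand are at distance zero, whereas a single inversion of an internal segment introduces two breakpoints.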
| 3 GRAPHICAL USER INTERFACE |
|---|
3.1 Genome selection
In this version, the graphical component accommodates only single-chromosome genomes. The PEGR web site includes samples of pre-computed analyses built on bacterial species for which at least three complete genomes from different strains have been published. In addition, the web interface allows the selection of user-defined genomes from a list of closely related species assembled in taxonomic clades. Starting from the genome selection, the conserved blocks are computed and the distance matrix is calculated. Provided that the data can be handled by GRAPPA or MGR (a reasonable number of blocks and inversions), the results can be browsed with the client interface.
3.2 Phylogenetic explorer
Through a unified interface, the PEGR explorer (Fig. 1) provides the user with complementary views of genome conservation and rearrangements (see Supplementary Material: http://bioinfo.genopole-toulouse.prd.fr/pegr).
The tree navigator, based on ATV (Zmasek and Eddy, 2001), is used to choose a chromosomal display through the selection of an internal node. The chromosomal display shows the relationships between the selected node and its direct descendants (children) on the tree. The descendants may be either real genomes or inferred ancestral genomes. Unitary blocks that are in the same order in the ancestral and descendant genomes are merged. These blocks are graphically linked, and different colors distinguish blocks in the same orientation from blocks in reverse orientation between the pair of genomes. One can center the chromosome display on a chosen gene and then, through successive selections, turn the chromosome around. The conserved blocks are built over the core genes, but one can display the locations of non-conserved genes as triangles centered at the insertion sites. One can also superimpose GC skew plots. The context panel shows the conservation of the chromosomal neighborhood of the selected gene. The tool is written in Java 1.4 and runs as an application through Java™ Web Start technology. The source code is available upon request from the authors.
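A GC skew plot of the kind mentioned above is conventionally computed as (G - C)/(G + C) over windows along the chromosome. The sketch below assumes non-overlapping windows of a fixed size by default; the function name, window size and step are illustrative assumptions, since the text does not state how PEGR computes its plots.

```python
def gc_skew(sequence, window=1000, step=None):
    """Return a list of (window_start, skew) pairs along a DNA sequence.

    skew = (G - C) / (G + C) within each window; a window without any
    G or C is given a skew of 0.0. The default step equals the window size.
    """
    step = step or window
    seq = sequence.upper()
    points = []
    # A sequence shorter than one window still yields a single point.
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if g + c else 0.0
        points.append((start, skew))
    return points
```

Plotting the second element of each pair against the first gives the skew curve; a smaller `step` than `window` would give overlapping, smoother windows.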
Currently, the genomic rearrangement scenario and the ancestral states include only blocks that cover all compared genomes; other regions are displayed only in their genomic context. Future developments will improve both the display of genome-specific or partially shared regions and the management of genomes composed of several replicons.
| ACKNOWLEDGEMENTS |
|---|
We thank David Lane for critical reading and helpful comments. We are grateful to Guillaume Bourque for providing us with the pre-release version of MGR. We also thank the reviewers for valuable suggestions to improve the manuscript and the PEGR tool. This work is supported by Genopole Toulouse Midi Pyrenees.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Chris Stoeckert
Received on November 21, 2006; revised on February 16, 2007; accepted on February 22, 2007
| REFERENCES |
|---|
Bourque G, Pevzner PA. Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Res. (2002) 12:26–36.
Carver TJ, et al. ACT: the Artemis comparison tool. Bioinformatics (2005) 21:3422–3423.
Chiapello H, et al. Systematic determination of the mosaic structure of bacterial genomes: species backbone versus strain-specific loops. BMC Bioinformatics (2005) 6:171.
Darling AC, et al. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. (2004a) 14:1394–1403.
Darling AE, et al. GRIL: genome rearrangement and inversion locator. Bioinformatics (2004b) 20:122–124.
Florea L, et al. EnteriX 2003: visualization tools for genome alignments of Enterobacteriaceae. Nucleic Acids Res. (2003) 31:3527–3532.
Frazer KA, et al. VISTA: computational tools for comparative genomics. Nucleic Acids Res. (2004) 32:W273–W279.
Moret BM, et al. A new implementation and detailed study of breakpoint analysis. Pac. Symp. Biocomput. (2001) pp. 583–594.
Tang J, Moret BM. Scaling up accurate phylogenetic reconstruction from gene-order data. Bioinformatics (2003) 19(Suppl. 1):i305–i312.
Treangen TJ, Messeguer X. M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species. BMC Bioinformatics (2006) 7:433.
Zmasek CM, Eddy SR. ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics (2001) 17:383–384.