Bioinformatics Advance Access originally published online on April 6, 2006
Bioinformatics 2006 22(13):1551-1561; doi:10.1093/bioinformatics/btl139

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

GRAST: a new way of genome reduction analysis using comparative genomics

Christina Toft and Mario A. Fares *

Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland Maynooth

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 SYSTEMS AND METHODS
 SAMPLE OUTPUT AND DISCUSSION
 CONCLUSION
 REFERENCES
 

Motivation: Establishment of intra-cellular life involved a profound re-configuration of the genetic characteristics of bacteria, including genome reduction and rearrangements. Understanding the mechanisms underlying these phenomena will shed light on the genome rearrangements essential for the development of an intra-cellular lifestyle. Comparison of genomes with differences in their sizes poses statistical as well as computational problems. Little effort has been made to develop flexible computational tools with which to analyse genome reduction and rearrangements.

Results: Investigation of genome reduction and rearrangements in endosymbionts using a novel computational tool (GRAST) identified the gathering of genes with similar functions. Conserved clusters of functionally related genes (CGSCs) were detected. Gene and gene-cluster non-functionalization/loss was found to be heterogeneous among genome regions, among functional gene categories and over evolutionary time. Results show that gene non-functionalization has accelerated during the last 50 MY of Buchnera's evolution while CGSCs have been static.

Availability: Software is available at http://biology.nuim.ie/staff/mfmolecevolandbioinf.shtml/

Contact: mario.fares{at}nuim.ie


    INTRODUCTION
 
Intra-cellular bacteria are characterized by intimate biochemical and genetic relationships with their hosts, which result in either pathogenic or symbiotic associations. Symbiosis has been largely associated with the emergence of metabolic, ecological and genetic novelties in the host and the bacteria (Gil et al., 2002). The epidemiological behaviour of intra-cellular bacteria relies on specific population genetics factors that have an enormous influence on the mutational dynamics at the genome and proteome levels. The most important of these factors are the strong bottlenecks to which the bacterial effective population sizes are subjected between generations and the absence of lateral gene transfer and recombination (Tamas et al., 2002). Together, these result in a high rate of fixation of slightly deleterious mutations by genetic drift (Rispe and Moran, 2000). This scenario has been confirmed through comparative genomic analyses (Moran and Mira, 2001; Silva et al., 2001; Tamas et al., 2002) and has been associated with the non-functionalization of genes (Andersson and Kurland, 1998; McClelland et al., 2004) followed by disintegration and genome reduction (Andersson and Andersson, 1999; Gil et al., 2002; Silva et al., 2001). As a result, intra-cellular bacteria are expected to form unstable biological systems (Kondrashov, 1988; Lynch et al., 1993). Despite this, the symbiotic relationship between bacteria such as Buchnera aphidicola, the endosymbiont of aphids, and their hosts has been successfully maintained for 100–150 MY. Mechanisms that compensate for the effects of slightly deleterious mutations have thus been proposed (Moran, 1996) and demonstrated (Fares et al., 2002a, b).

Effects attributable to the intracellular life are the reduced genome sizes and the high level of genome rearrangements (Belda et al., 2005; Mira et al., 2001). Understanding the underlying mechanisms responsible for such genome dynamics is instrumental in uncovering the genome rearrangement patterns and genes responsible for the establishment of the intra-cellular lifestyle. These mechanisms may also be crucial in defining the final outcome of the interaction between the biological system of the host and of the bacteria.

An increasing number of computational tools have been developed to visualize genomes, locate genes, determine their function and identify their replication direction (Ciria et al., 2004; Ghai et al., 2004; Gibson and Smith, 2003; Stothard and Wishart, 2005; Vernikos et al., 2003). Other software identifies conserved regions and rearrangements throughout the organism's evolution by comparison of linear genomes (Carver et al., 2005; Chen et al., 2005; Leader, 2004; Xie and Hood, 2003; Yang et al., 2003). Alternatives to these approaches have permitted the comparison of genome lengths, the identification of common genes and the presence and absence of genes or regions when comparing circular genomes (‘Genome versus Genome Protein Hits’ from TIGR CMR, http://www.tigr.org) (Romualdi et al., 2005).

Tools for gene order comparison between two genomes have recently become available. The two genomes are BLASTed against each other and the most significant hits are plotted onto a graph where each of the axes represents one of the genomes (e.g. see Silva et al., 2001). These computational tools differ in that, while some plot all BLAST hits that satisfy a specific threshold, normally set by the user (Celamkoti et al., 2004; Choi et al., 2005), others plot only the hits that are mutual top BLAST hits (GenePlot from NCBI, http://www.ncbi.nlm.nih.gov/sutils/geneplot.cgi).

Websites such as NCBI (http://www.ncbi.nlm.nih.gov), Microbes Online (Alm et al., 2005), STRING (von Mering et al., 2005), KEGG (Arakawa et al., 2005), BuchneraBASE (Prickett et al., 2006) and PLATCOM (Choi et al., 2005) perform a variety of genome comparisons and/or provide information about fully or partly sequenced genomes. However, none of the available software has been designed to investigate genome reduction.

Here we present Genome Reduction Analysis Software Tool (GRAST) that allows the user to investigate genome reduction by comparing an intra-cellular organism (reduced genome) to its closest free-living relatives (reference genomes). The application of this tool for the comparative genomic analysis of free-living bacteria with endosymbiotic bacteria yields information on the main genome dynamics that occurred following the establishment of intra-cellular life, including, among others, genome rearrangements, propensity of functional categories to lose functional genes, gathering of functionally related genes and the genome distribution of junk DNA.


    SYSTEMS AND METHODS
 
Orthologous pairs of genes between the reduced genome and the reference genomes are identified by mutual BLASTP (Altschul et al., 1997) searches of the genes of both genomes. Orthologous gene pairs are those finding each other as top BLAST hits with E-value being lower than a certain cut-off value. In this analysis only orthologous functional genes are compared between the two genomes.
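The mutual top-hit criterion described above can be illustrated with a minimal Python sketch. It is not the GRAST implementation (which is written in PERL): the hit tuples stand in for parsed BLASTP output, and the function names and the default cut-off of 1e-5 are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# A hit is (query_id, subject_id, e_value). These tuples stand in for parsed
# BLASTP output; the layout is an assumption of this sketch.
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit], cutoff: float) -> Dict[str, str]:
    """Return, for each query, the subject with the lowest E-value below the cut-off."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue >= cutoff:
            continue  # hit is not significant enough
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit],
                         cutoff: float = 1e-5) -> List[Tuple[str, str]]:
    """Pairs (a, b) in which a and b are each other's top hit (hypothetical default cut-off)."""
    forward = best_hits(a_vs_b, cutoff)
    backward = best_hits(b_vs_a, cutoff)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```

For example, if gene bu2 of the reduced genome has ec2 as its top hit but ec2 finds bu1 as its own top hit, the pair (bu2, ec2) is not reported, which is the conservative behaviour the text describes.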

Genome sequences
In this study we have compared the genomes of the endosymbiotic bacterium B.aphidicola from the aphids Acyrthosiphon pisum (BAp; Accession number: NC_002528), Schizaphis graminum (BSg; Accession number: NC_004061) and Baizongia pistaciae (BBp; Accession number: NC_004545) with those of their closest free-living relatives Escherichia coli K12 (E.coli; Accession number: NC_000913) and Salmonella typhimurium LT2 (S.typhimurium; Accession number: NC_003197). The two free-living bacteria diverged 100–160 MY ago, a time span similar to that elapsed since the establishment of endosymbiosis in aphids.

Genome rearrangements
GRAST examines three ways in which genes can undergo rearrangements (Fig. 1). First, two adjacent genes in the reference genome can be separated in the rearranged genome by translocation (Fig. 1A). Second, genes can be gathered due to the disintegration of non-functional genes between them (Fig. 1B). Third, genes can be gathered by the translocation of one gene to a nearby region of the other gene or by the movement of both genes to adjacent regions in the reference genomes (Fig. 1C). In the latter case, genes included between gathered genes may have been moved to another region of the genome by other mechanisms such as translocations of complete genome segments (Fig. 1C) or chromosomal segment inversion (Fig. 1D). All of these possible genome rearrangements were studied in B.aphidicola.


 
Fig. 1 Gene rearrangements in the endosymbiont (reduced) genome identified by GRAST. (A) Gene movements can occur through the translocation of neighbour genes in the ancestral genome to different positions in the reduced genome; (B) genes can be gathered as a result of loss of non-functional genes located between them or (C) by translocation in the reduced genome. (D) Gene movements can also occur by gene translocation and genome segment inversion in the reduced genome. CGSCs are defined as segments in the reduced genome in genetic synteny with the reference genome (E).

 
Conserved gene succession clusters (CGSCs)
Genes that remain clustered after genome reduction and do not suffer internal rearrangements are often under strong selective constraints to remain so. For example, genes with similar functions may be kept in proximity to coordinate their expression (Siefert et al., 1997). In GRAST, CGSCs are identified as groups of two or more genes that have retained their gene order following genome reduction (Fig. 1E). For two adjacent genes to be in a CGSC they are required to be in synteny with their orthologues in the reference genome, and any gene between them in the reference genome should have been lost in the reduced genome. We examined CGSCs in each of the B.aphidicola genomes and identified the main rearrangements that occurred in these clusters.
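The CGSC criterion can be sketched in a few lines of Python. This is an illustrative reading of the definition above, not the GRAST code: it assumes linear gene orders (genome circularity and strand are ignored), a one-to-one orthologue mapping, and that a reduced-genome gene without an orthologue ends the current cluster. All names are hypothetical.

```python
from typing import Dict, List


def find_cgscs(reduced_order: List[str], ref_order: List[str],
               ortholog: Dict[str, str]) -> List[List[str]]:
    """Group consecutive reduced-genome genes into conserved gene succession clusters.

    reduced_order / ref_order: gene identifiers in genome order.
    ortholog: reduced-genome gene -> reference-genome orthologue.
    """
    ref_index = {gene: i for i, gene in enumerate(ref_order)}
    retained = set(ortholog.values())  # reference genes that survive in the reduced genome
    clusters: List[List[str]] = []
    current: List[str] = []
    direction = 0  # +1 forward, -1 reversed, 0 not yet determined

    for gene in reduced_order:
        if gene not in ortholog:
            # Assumption of this sketch: a gene without an orthologue breaks the cluster.
            if len(current) >= 2:
                clusters.append(current)
            current, direction = [], 0
            continue
        if current:
            a = ref_index[ortholog[current[-1]]]
            b = ref_index[ortholog[gene]]
            step = 1 if b > a else -1
            between = ref_order[min(a, b) + 1:max(a, b)]
            # Every reference gene between the two orthologues must have been lost,
            # and the reading direction must not flip inside a cluster.
            if direction in (0, step) and all(g not in retained for g in between):
                current.append(gene)
                direction = step
                continue
            if len(current) >= 2:
                clusters.append(current)
        current, direction = [gene], 0

    if len(current) >= 2:
        clusters.append(current)
    return clusters
```

A cluster whose orthologues run in descending reference order corresponds to what the text calls a CGSC that has undergone inversion; the sketch reports it like any other cluster and leaves the orientation to the caller.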

Gathering of functionally related genes
There are three overall functional categories defined by the clusters of orthologous groups of proteins (COG; Tatusov et al., 2003): JKL refers to information storage and processing, D–O to cellular processes and signalling, and C–Q to metabolism. GRAST calculates the probability of observing a gathered pair of genes that belong to the same functional category. The assumption here is that each rearrangement is an independent event and follows no specific order. We can hence estimate the probability of gene gathering under a multinomial density function as follows:

Let z1 and z2 be two genes that have been gathered (Fig. 1C) owing to a specific genome rearrangement mechanism, and let us assume that

Y1 = {{z1, z2}: where both genes belong to the functional category ‘Information storage and processing’}

Y2 = {{z1, z2}: where both genes belong to the functional category ‘Cellular processes’}

Y3 = {{z1, z2}: where both genes belong to the functional category ‘Metabolism’}

Y4 = {{z1, z2}: where both genes belong to two different functional categories}

In this particular case, the probability of the observed number of translocations causing gene gathering is

$$P(n_1, n_2, n_3, n_4) = \frac{n!}{n_1!\, n_2!\, n_3!\, n_4!}\; p_1^{n_1}\, p_2^{n_2}\, p_3^{n_3}\, p_4^{n_4} \qquad (1)$$

where n is the number of translocations, n_i is the number of Y_i observations, and p_i is the probability of observing Y_i and is calculated as follows:

Formula 2(2)
Conversely, the probability of having two genes belonging to two different functional categories gathered is:

$$p_4 = 1 - \sum_{i=1}^{3} p_i \qquad (3)$$
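In general, if we have K functional categories, then the probability of the observed number of translocations causing gene gathering will be:

$$P(n_1, \ldots, n_{K+1}) = \frac{n!}{\prod_{i=1}^{K+1} n_i!}\; \prod_{i=1}^{K+1} p_i^{n_i} \qquad (4)$$

where the (K+1)th outcome corresponds to gathered genes that belong to two different functional categories.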

We evaluated the importance of gene gathering in B.aphidicola genomes and tested the functional relatedness of gathered genes.
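As a worked illustration of the multinomial density used in this section, the following Python sketch evaluates the probability of a set of observed counts n_i given outcome probabilities p_i. The probabilities are taken as inputs, since how each p_i is derived is not reproduced here; exact rational arithmetic is used so that large factorials do not overflow, which plays the role that Math::BigFloat plays in the PERL implementation. The function name is hypothetical.

```python
from fractions import Fraction
from math import factorial
from typing import Sequence


def multinomial_probability(counts: Sequence[int], probs: Sequence) -> Fraction:
    """Probability of observing counts[i] outcomes of type i in n = sum(counts) trials.

    counts: n_1 ... n_k (for three functional categories plus the mixed outcome, k = 4).
    probs:  p_1 ... p_k, which are expected to sum to 1.
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # each division is exact for a multinomial coefficient
    probability = Fraction(coefficient)
    for c, p in zip(counts, probs):
        probability *= Fraction(p) ** c
    return probability
```

Calling `float()` on the result gives a value comparable to the probabilities reported as P(GG), P(GGR) and P(GGM); the returned `Fraction` itself is exact when the inputs are exact.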

Intergenic DNA
The mutational dynamics of non-functional intergenic DNA might shed light on the mechanism of gene non-functionalization and disintegration. Genomes undergoing high fixation rates of slightly deleterious mutations and gene non-functionalization followed by disintegration are expected to show shorter intergenic regions after a certain evolutionary time span (Gomez-Valero et al., 2004). GRAST investigates the dynamics of intergenic region length and tests whether this length has changed in any of the gene categories described in this work (CGSCs, translocated genes or gathered genes categories) between related genomes with different genome sizes.
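A minimal Python sketch of how intergenic lengths can be derived from gene coordinates is given below. It is an illustration under stated assumptions rather than the GRAST procedure: genes are given as 1-based inclusive (start, end) pairs on a linear coordinate, the gap that closes a circular chromosome is ignored, and overlapping genes are assigned an intergenic length of zero.

```python
from typing import List, Tuple


def intergenic_lengths(genes: List[Tuple[int, int]]) -> List[int]:
    """Lengths (bp) of the regions between consecutive genes.

    genes: (start, end) coordinates, 1-based and inclusive, in any order.
    """
    ordered = sorted(genes)
    if not ordered:
        return []
    lengths: List[int] = []
    furthest_end = ordered[0][1]
    for start, end in ordered[1:]:
        gap = start - furthest_end - 1
        lengths.append(max(gap, 0))  # overlapping or nested genes leave no intergenic DNA
        furthest_end = max(furthest_end, end)
    return lengths
```

The resulting list can be summarised with `statistics.mean` and `statistics.median` for each gene category to compare genomes of different sizes, as the section describes.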

Implementation
GRAST is written in PERL and consists of a main program called GRAST.pl that uses a number of other PERL modules. An interface to visualize graphs was built using the PERL module GD.pm. There are two versions of GRAST: one outputs gif-type files and uses the GD and GD::Graph modules to create them; the other outputs svg-type files and uses the GD::SVG modules.

The implementation of the subroutine that calculates the probabilities of gene gathering is complemented by the PERL module Math::BigFloat to deal with the factorial calculations of the number of translocations. Finally, GRAST can be executed through a user interface or using command line arguments. The flow of information and functions in GRAST together with the input and output files generated are depicted in Figure 2. Briefly, GRAST takes as input files the GenBank genome files and extracts the information regarding genome location, direction and amino acid sequences of genes. Then, GRAST performs mutual BLASTP searches to find orthologous genes in the compared genomes and extracts gene function information. GRAST also screens for gene duplications by intra-genomic BLASTP searches and one of the gene copies is removed from later analyses. Finally, GRAST performs the analyses and generates graphs and output files (Fig. 2).


 
Fig. 2 Flow-chart of GRAST with all the options requested by the user. Genome files are read, information for individual genes extracted and BLASTP searches performed by GRAST to find orthologous genes between the compared genomes. Analyses are run to find CGSCs, genes gathering and genes lost and output graphs generated.

 

    SAMPLE OUTPUT AND DISCUSSION
 
Genome plot
The genome plot output by GRAST is a combination of different approaches used in existing software (Fig. 3A and B). GRAST plots the genes that have been identified as orthologous pairs and allows the user to set the cut-off value used to plot genes in the genome. While it is possible to set the cut-off value in programs such as GenomePlot (Choi et al., 2005) and GeneOrder (Celamkoti et al., 2004), these programs plot all genes that satisfy the cut-off value rather than only gene pairs that have been determined to be orthologues, increasing the risk of plotting paralogues. Our approach, however, is liable to miss orthologous genes when the sequences compared are too divergent (Tatusov et al., 1997) and is hence more conservative.


 
Fig. 3 Plot of the orthologous gene pairs generated by GRAST when comparing the reduced genome (BAp) with the reference genomes E.coli (A) and S.typhimurium (B). Axes represent positions in each genome in kilo base pairs (Kbp).

 
Identifying lost, retained and non-common genes after genome reduction
To qualitatively determine the extent of genome modification between the reduced genome and the genomes of the two free-living bacteria, GRAST shows the number of common genes conserved after genome reduction (Fig. 4A) and the non-common genes from both genomes (Fig. 4B). Further, to define the extent of gene loss in the reduced genome, GRAST generates a figure showing simultaneously genome-specific and shared genes between the genomes compared (Fig. 4C). Note that gene non-functionalization would be followed by extreme sequence divergence and therefore might not be identifiable through BLAST searches. Thus, gene loss will hereafter refer to either gene disintegration or non-functionalization.


 
Fig. 4 Schematic representation by GRAST of the common (A), non-common (B) and both common and non-common genes (C) when comparing BAp (inner circle) with E.coli or S.typhimurium (outer circle).

 
Placing genome size modifications, genome rearrangements and gene acquisition in specific time points of the endosymbiotic bacteria evolution would uncover bacteria group-specific patterns of genome dynamics. One way to approach this is through multiple genome comparison. GRAST is a useful tool to compare phylogenetically related genomes and to identify branch specific patterns of gene loss/retention and rearrangements within a phylogeny.

Comparison of the three fully sequenced genomes of B.aphidicola with the two free-living bacteria relatives E.coli and S.typhimurium identifies events specific to each branch of the tree by genome pairwise phylogenetic comparison (Fig. 5). For example, genes retained in BBp but not in BSg and BAp are considered to have been lost in the common ancestor of BSg and BAp. Genes lost in all three Buchnera genomes are considered to have been lost in the most recent common symbiotic ancestor. GRAST analysis clearly shows that, in accordance with previous reports (Gomez-Valero et al., 2004; Silva et al., 2003), Buchnera genomes have been highly static following the establishment of endosymbiosis and genome reduction, since most of the events may have pre-dated the split between the three Buchnera endosymbionts (Fig. 5). However, gene loss has not been homogeneously distributed over time, as most of the gene non-functionalization events occurred during the last 50 MY in the lineages of BAp and BSg (Fig. 5). This is probably due to the loss of important genes involved in recombination, such as recA and recF, in the ancestor of BAp and BSg (Tamas et al., 2002), which halted the removal of slightly deleterious mutations and hence accelerated the non-functionalization of genes. Calculation of the rate of gene loss in this study gives estimates of 1 gene lost every 6.4 MY during the first 90 MY of Buchnera's evolution and 1 gene lost every 2 MY following the split giving rise to BAp and BSg. These rates of gene loss are faster than those of previous works, which reported 1 complete gene elimination per 5–10 MY during the divergence of BAp and BSg (Tamas et al., 2002). The phylogenetic distribution of lost genes is very similar to that reported previously (Silva et al., 2003). Conversely, conserved gene succession clusters (CGSCs) have been conserved during the last 50 MY after the split giving rise to BAp and BSg, with very few lineage-specific CGSCs lost (Fig. 5), which demonstrates that CGSCs have been under selective constraints. From this we conclude that the rate of gene function loss in Buchnera has accelerated during the last 50 MY despite genome stasis regarding CGSCs and genome rearrangements.
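The branch-assignment rule described in this paragraph can be written as a short Python sketch. It assumes the tree topology implied by the text, in which BAp and BSg share a common ancestor to the exclusion of BBp, and it assumes that genes are only lost and never regained; the function and branch names are hypothetical and this is not the GRAST code.

```python
from typing import Iterable, List


def loss_branches(present_in: Iterable[str]) -> List[str]:
    """Branches on which a reference-genome gene is inferred to have been lost.

    present_in: the Buchnera genomes ("BAp", "BSg", "BBp") that retain the gene.
    Returns an empty list when the gene is retained in all three genomes.
    """
    present = set(present_in)
    if not present:
        # Lost in all three genomes: assigned to the common symbiotic ancestor.
        return ["symbiotic ancestor"]
    branches: List[str] = []
    if "BBp" not in present:
        branches.append("BBp")
    if not present & {"BAp", "BSg"}:
        # Missing from both sister lineages: a single loss in their common ancestor.
        branches.append("BAp/BSg ancestor")
    else:
        branches.extend(lineage for lineage in ("BAp", "BSg") if lineage not in present)
    return branches
```

Counting the genes assigned to each branch and dividing the branch duration by that count gives rates in the same form as the "1 gene lost every 6.4 MY" estimate quoted above.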


 
Fig. 5 Branch-specific events of gene loss/non-functionalization and CGSC rearrangements during the evolution of B.aphidicola symbionts. The three Buchnera genomes were compared with their free-living bacteria relatives E.coli and S.typhimurium. Branch lengths in the tree are not time-scaled. Circles represent complete genomes; solid lines, dotted lines, black boxes and grey boxes refer to lost genes, non-common genes between genomes, CGSCs and reverted CGSCs, respectively. CGSCs in each lineage indicate clusters retained in that lineage and lost in the others.

 
Conserved gene succession cluster
The frequency and length of CGSCs indicate how conserved the reduced genome is and how many rearrangements the genome has undergone. The density of CGSCs in the reduced genome was determined by identifying CGSCs in the genomes of E.coli, S.typhimurium and BAp (white blocks in Fig. 6A). We have also identified CGSCs that have undergone gene order reversion (black blocks in Fig. 6A). The results show that specific regions of the reduced genome have a greater density of CGSCs than others. These genome regions may have an important functional role for the organism, given the selective pressure against gene death and to maintain gene order in these clusters.


 
Fig. 6 Schematic representation of the CGSCs rearrangements generated by GRAST. The figure represents the density of CGSCs and CGSCs that underwent inversions in the reduced genome (A) and the percentage of the reduced genome and genes lost that belong to CGSCs (B). The number of genes within CGSCs and genes lost that belong to CGSCs are also shown (C).

 
Comparison of the BAp genome with those of E.coli and S.typhimurium shows that the percentage of genes lost in the CGSCs is significantly lower than the overall percentage of lost genes (grey bar in Fig. 6B). Random loss of genes in the reduced genome would yield similar values for both the mean percentage of genes lost in the genome and the mean percentage of genes lost in the CGSCs. Our results, however, demonstrate that gene-loss events are significantly less frequent in CGSCs, indicating a strong selective pressure to maintain the composition of genes in these clusters. Gene functions have hence been asymmetrically lost in the genome of B.aphidicola, with CGSCs being highly static and inter-cluster genome regions being hyper-dynamic. On the other hand, comparison of the mean, maximum and median numbers of genes in individual CGSCs (Fig. 6C) highlights the heterogeneity in the size and the amount of rearrangements in the CGSCs in comparison with the rest of the reduced genome. Furthermore, most CGSCs have been retained during the last 50 MY, since CGSCs present in the ancestor of BAp and BSg were also detected in these lineages individually (Fig. 5).

Functional categorisation of genes lost in the reduced genome
A number of databases provide information as to the function of the genes present in individual genomes (e.g. COGs; Tatusov et al., 2003). However, no computational tools have been designed to compare the distribution of genes and genes lost in the different functional categories between two genomes. GRAST allows the identification of significantly conserved gene functional categories and of the propensity of each category to lose genes. The gene loss between the different functional categories in BAp, when compared with E.coli and with S.typhimurium, is highly heterogeneous (grey bars in Fig. 7A and B). This heterogeneity is also very significant in some functional categories when compared with the expected value of lost genes (Fig. 7A and B). For example, only 28% of genes involved in translation have been lost compared with the expected 86%. Functional categories that contain a large percentage of the genes of E.coli and of S.typhimurium, and in which the percentage of genes lost differs significantly from the expectation, will be those that are either highly conserved regarding gene non-functionalization or have a high propensity to lose their genes.
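The per-category comparison can be illustrated with a small Python sketch. It computes, for each functional category, the percentage of that category's genes that were lost and the share of all lost genes that fall in the category. The expected value is taken here to be the genome-wide percentage of genes lost, which is an assumption of this sketch (the text does not state how the expectation is derived); the function and key names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def category_loss_summary(category_of: Dict[str, str],
                          lost: Set[str]) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """Summarise gene loss per functional category.

    category_of: reference-genome gene -> functional category (e.g. a COG letter);
                 must not be empty.
    lost:        reference-genome genes with no orthologue in the reduced genome.
    Returns (expected_pct_lost, per-category percentages).
    """
    totals = Counter(category_of.values())
    lost_counts = Counter(category_of[g] for g in lost if g in category_of)
    n_lost = sum(lost_counts.values())
    # Assumption: the expectation is the overall percentage of genes lost.
    expected = 100.0 * n_lost / len(category_of)
    summary: Dict[str, Dict[str, float]] = {}
    for category, n_genes in totals.items():
        k = lost_counts.get(category, 0)
        summary[category] = {
            "pct_of_category_lost": 100.0 * k / n_genes,
            "pct_of_all_lost": 100.0 * k / n_lost if n_lost else 0.0,
        }
    return expected, summary
```

Categories whose `pct_of_category_lost` lies far below the expectation would be candidates for the highly conserved categories mentioned above, and those far above it for categories with a high propensity to lose genes; a formal significance test is outside the scope of this sketch.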


 
Fig. 7 (A) and (B) Percentage of genes lost in each of the functional categories described by Tatusov et al. (2003). Black bars indicate the percentage of the genes in a specific functional category that have been lost, grey bars indicate the percentage of the genes lost belonging to a specific functional category and the solid line indicates the expected percentage of genes lost in the functional categories.

 
Gathering of genes
GRAST allows the investigation of the movement of genes during or after genome reduction by calculating the probability of the gathering of functionally related genes. To test this probability, a simulation of the genome rearrangement is performed in the reference genome and in a model genome that contains the genes found in the two genomes but in synteny with their orthologues in the reference genome. Performing this analysis with BAp shows that the probability of the observed number of gene gatherings, computed by Equation (1), is P(GG) = 1.2924 × 10^-12; P(GGR) = 1.6581 × 10^-3; P(GGM) = 1.6411 × 10^-7 when compared with E.coli and is P(GG) = 3.0494 × 10^-10; P(GGR) = 1.6617 × 10^-3; P(GGM) = 5.0721 × 10^-7 when compared with S.typhimurium. Here GG, GGR and GGM refer to the observed genes gathered and to the expected genes gathered in the reference genomes and in the model genome, respectively. The observed probability of genes gathered is significantly lower than the expectation irrespective of the time point at which rearrangements occurred (before or after genome reduction). No single event of gene gathering was observed in the last 50 MY of Buchnera's evolution, which supports previous reports (Tamas et al., 2002; Silva et al., 2003).

The accuracy of the simulations depends on the number of simulations performed. By default 100 simulations are performed in GRAST and the average value of those simulations is taken. The simulations of the model genome, however, do not always give an accurate prediction of the expected value after genome reduction because simulations are conducted over the genes present in both genomes. In the case of B.aphidicola symbionts this inaccuracy is negligible since 98.76% of its genes have orthologues in the reference genomes.
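The idea of averaging over repeated simulations can be sketched in Python as follows. The source does not describe how an individual rearrangement is simulated, so the model used here is purely an assumption for illustration: each simulated translocation brings together two distinct genes chosen uniformly at random, and the outcome is tallied by whether they share a functional category. The function name, the "mixed" label and the seed parameter are hypothetical.

```python
import random
from typing import Dict, List


def simulate_gathering(categories: List[str], n_translocations: int,
                       n_simulations: int = 100, seed: int = 0) -> Dict[str, float]:
    """Average number of gathered pairs per outcome over repeated simulations.

    categories: functional category of each gene in the genome being simulated.
    Returns the mean count for each category plus "mixed" (pairs from two categories).
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    totals = {label: 0 for label in sorted(set(categories))}
    totals["mixed"] = 0
    for _ in range(n_simulations):
        for _ in range(n_translocations):
            # Assumed model: a translocation gathers two distinct random genes.
            first, second = rng.sample(range(len(categories)), 2)
            if categories[first] == categories[second]:
                totals[categories[first]] += 1
            else:
                totals["mixed"] += 1
    return {label: count / n_simulations for label, count in totals.items()}
```

With the default of 100 simulations, as in GRAST, the averages still carry sampling noise; increasing `n_simulations` narrows it at the cost of run time, which is the trade-off the paragraph above refers to.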

Non-functional intergenic (junk) DNA
Another parameter that could aid in determining whether genome reduction is an ongoing process is the distribution of junk DNA (intergenic DNA) in the reduced genome. The question we asked was whether the length of junk DNA correlates with whether gene pairs have retained succession, been gathered, been translocated or been lost in the reduced genome. Comparison of BAp with E.coli and S.typhimurium supports very similar lengths in their intergenic DNA (Fig. 8A and B). Interestingly, genes belonging to CGSCs present very short junk DNA compared with any of the other gene categories, indicating that these genes may belong to the same transcription unit. Genes that have been translocated, gathered or non-functionalized/lost in the reduced genome present significantly longer junk DNA in the reference genomes than the mean junk DNA length (Fig. 8A and B). This observation suggests a relationship between gene movements and junk DNA lengths. However, further studies should be performed to confirm this. Also, in contrast to previous reports (Gomez-Valero et al., 2004), the mean and median lengths of intergenic DNA are slightly longer for BAp than for E.coli and S.typhimurium, although this difference is not significant.


 
Fig. 8 Distribution of the length of junk DNA in base pairs in the different categories of gene dynamics. The junk DNA lengths of the genes belonging to conserved gene succession, translocated genes in Buchnera, genes lost, genes gathered by translocation or because of the loss of genes between them in the free-living relatives are compared. The junk DNA length in Buchnera compared with (A) E.coli and (B) S.typhimurium is also shown.

 

    CONCLUSION
 
Full genome comparisons are a powerful tool to investigate the most dramatic genome rearrangements between close relatives with either similar or different genome sizes. At present a number of software tools are available to perform different kinds of comparative genomic analyses, although no computational tools provide ways to investigate genome dynamics under a particular biological phenomenon. GRAST offers a user-friendly tool to investigate genome rearrangements following genome reduction. The comparison of the endosymbiotic bacterium of aphids B.aphidicola with its closest free-living relatives E.coli and S.typhimurium using GRAST suggests that genome reduction has been followed by complex dynamics of genome rearrangements. We demonstrate that gene movements have been under a selective pressure to keep functionally related genes gathered and to maintain specific genes physically and functionally clustered and in synteny with the ancestral genome. Also, we uncover heterogeneous selective pressures on genome rearrangements amongst Buchnera lineages. We observe that, in contrast to individual genes, CGSCs have been maintained unaltered during the last 50 MY of Buchnera's evolution. Moreover, junk DNA seems to present more complex dynamics and more detailed studies are needed. Further studies including other intra-cellular bacteria will demonstrate that this analysis has only uncovered the tip of the iceberg.


    Acknowledgments
 
The authors are thankful to Simon Travers for helpful comments on the manuscript. This work was supported by Science Foundation Ireland, under the program of the President of Ireland Young Researcher Award to M.A.F, and the Irish Council for Science, Engineering and Technology and the John & Pat Hume Scholarship to C.T.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Dmitrij Frishman

Received on February 3, 2006; revised on April 4, 2006; accepted on April 4, 2006

    REFERENCES
 

    Alm, E.J., et al. (2005) The MicrobesOnline Web site for comparative genomics. Genome Res., 15, 1015–1022.

    Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Andersson, J.O. and Andersson, S.G. (1999) Genome degradation is an ongoing process in Rickettsia. Mol. Biol. Evol., 16, 1178–1191.

    Andersson, S.G. and Kurland, C.G. (1998) Reductive evolution of resident genomes. Trends Microbiol., 6, 263–268.

    Arakawa, K., et al. (2005) KEGG-based pathway visualization tool for complex omics data. In Silico Biol., 5, 0039.

    Belda, E., et al. (2005) Genome rearrangement distances and gene order phylogeny in gamma-Proteobacteria. Mol. Biol. Evol., 22, 1456–1467.

    Carver, T.J., et al. (2005) ACT: the Artemis comparison tool. Bioinformatics, 21, 3422–3423.

    Celamkoti, S., et al. (2004) GeneOrder3.0: software for comparing the order of genes in pairs of small bacterial genomes. BMC Bioinformatics, 5, 52.

    Chen, T., et al. (2005) The bioinformatics resource for oral pathogens. Nucleic Acids Res., 33, W734–W740.

    Choi, K., et al. (2005) PLATCOM: a platform for computational comparative genomics. Bioinformatics, 21, 2514–2516.

    Ciria, R., et al. (2004) GeConT: gene context analysis. Bioinformatics, 20, 2307–2308.

    Fares, M.A., et al. (2002a) The evolution of the heat-shock protein GroEL from Buchnera, the primary endosymbiont of aphids, is governed by positive selection. Mol. Biol. Evol., 19, 1162–1170.

    Fares, M.A., et al. (2002b) Endosymbiotic bacteria: groEL buffers against deleterious mutations. Nature, 417, 398.

    Ghai, R., et al. (2004) GenomeViz: visualizing microbial genomes. BMC Bioinformatics, 5, 198.

    Gibson, R. and Smith, D.R. (2003) Genome visualization made fast and simple. Bioinformatics, 19, 1449–1450.

    Gil, R., et al. (2002) Extreme genome reduction in Buchnera spp.: toward the minimal genome needed for symbiotic life. Proc. Natl Acad. Sci. USA, 99, 4454–4458.

    Gomez-Valero, L., et al. (2004) The evolutionary fate of non-functional DNA in the bacterial endosymbiont Buchnera aphidicola. Mol. Biol. Evol., 21, 2172–2181.

    Kondrashov, A.S. (1988) Deleterious mutations and the evolution of sexual reproduction. Nature, 336, 435–440.

    Leader, D.P. (2004) BugView: a browser for comparing genomes. Bioinformatics, 20, 129–130.

    Lynch, M., et al. (1993) The mutational meltdown in asexual populations. J. Hered., 84, 339–344.

    McClelland, M., et al. (2004) Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat. Genet., 36, 1268–1274.

    Mira, A., et al. (2001) Deletional bias and the evolution of bacterial genomes. Trends Genet., 17, 589–596.

    Moran, N.A. (1996) Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc. Natl Acad. Sci. USA, 93, 2873–2878[Abstract/Free Full Text].

    Moran, N.A. and Mira, A. (2001) The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol, . 2, RESEARCH0054.

    Prickett, M.D., Page, M., Douglas, A.E., Thomas, G.H. (2006) BuchneraBASE: a post-genomic resource for Buchnera sp. APS. Bioinformatics, 22, 641–642[Abstract/Free Full Text].

    Rispe, C. and Moran, N.A. (2000) Accumulation of deleterious mutations in endosymbionts: Muller's ratchet with two levels of selection. Am. Nat, . 156, 425–441[CrossRef].

    Romualdi, A., et al. (2005) GenColors: accelerated comparative analysis and annotation of prokaryotic genomes at various stages of completeness. Bioinformatics, 21, 3669–3671[Abstract/Free Full Text].

    Siefert, J.L., et al. (1997) Conserved gene clusters in bacterial genomes provide further support for the primacy of RNA. J. Mol. Evol, . 45, 467–472[CrossRef][ISI][Medline].

    Silva, F.J., et al. (2001) Genome size reduction through multiple events of gene disintegration in Buchnera APS. Trends Genet, . 17, 615–618[CrossRef][ISI][Medline].

    Silva, F.J., et al. (2003) Why are the genomes of endosymbiotic bacteria so stable? Trends Genet, . 19, 176–180[CrossRef][ISI][Medline].

    Stothard, P. and Wishart, D.S. (2005) Circular genome visualization and exploration using CGView. Bioinformatics, 21, 537–539[Abstract/Free Full Text].

    Tamas, I., et al. (2002) 50 million years of genomic stasis in endosymbiotic bacteria. Science, 296, 2376–2379[Abstract/Free Full Text].

    Tatusov, R.L., et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4, 41[CrossRef][Medline].

    Tatusov, R.L., et al. (1997) A genomic perspective on protein families. Science, 278, 631–637[Abstract/Free Full Text].

    Vernikos, G.S., et al. (2003) GeneViTo: visualizing gene-product functional and structural features in genomic datasets. BMC Bioinformatics, 4, 53[CrossRef][Medline].

    von Mering, C., et al. (2005) STRING: known and predicted protein–protein associations, integrated and transferred across organisms. Nucleic Acids Res, . 33, D433–D437[Abstract/Free Full Text].

    Xie, T. and Hood, L. (2003) ACGT-a comparative genomics tool. Bioinformatics, 19, 1039–1040[Abstract/Free Full Text].

    Yang, J., et al. (2003) GenomeComp: a visualization tool for microbial genome comparison. J. Microbiol. Methods, 54, 423–426[CrossRef][ISI][Medline].





