Bioinformatics Advance Access originally published online on April 26, 2005
Bioinformatics 2005 21(13):3053-3055; doi:10.1093/bioinformatics/bti460
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

δρ-Web, an online tool to assess composition similarity of individual nucleic acid sequences

M. W. J. van Passel 1, A. C. M. Luyf 2, A. H. C. van Kampen 2, A. Bart 1 and A. van der Ende 1,*

1Department of Medical Microbiology, Academic Medical Center 1100 DE Amsterdam, The Netherlands
2Bioinformatics Laboratory, Academic Medical Center 1100 DE Amsterdam, The Netherlands

*To whom correspondence should be addressed.


    Abstract

Summary: Although whole-genome sequences have been analysed for the presence of anomalous DNA, no dedicated application is currently available to analyse the composition of individual sequence entries, for instance those derived by experimental techniques, such as subtractive hybridization. Since genomic dinucleotide frequency values are conserved between related species, a representative genome sequence can often be found to score for anomalous sequence composition for many of these putative horizontally transferred sequences. We developed the application δρ-web, which enables the determination of the differences between the dinucleotide composition of an input sequence and that of a selected genome in a size-dependent manner. A feature allowing batch comparisons is included as well. In addition, δρ-web allows the analysis of the dinucleotide composition of complete genomes. This provides complementary information for the identification of large anomalous gene clusters.

Availability: The application is available through http://deltarho.amc.uva.nl and the software is available from the authors.

Contact: a.vanderende{at}amc.uva.nl

Supplementary information: An online help file with more extensive user guidelines is supplied at http://deltarho.amc.uva.nl


    INTRODUCTION
From the data obtained by sequencing many different prokaryotic genomes it has been inferred that horizontal gene transfer (HGT) contributed considerably to the shape and evolution of microbial genomes. Currently, the estimates of percentages of putatively horizontally acquired DNA range from 0.5% in the endocellular symbiont Buchnera sp. APS genome to 25% in the Methanosarcina acetivorans genome, with an average of 14% in 116 prokaryotic genomes (Nakamura et al., 2004).

This genomic patchwork is clearly visible in the number of genomic islands (GIs) detected in microbial genomes (Mantri and Williams, 2004). Initially, acquisition of GIs was linked with gain in virulence in pathogenic bacteria. As more genome sequences of environmental strains become available, an increasing variety of acquired gene clusters providing diverse metabolic capacities is being discovered in these non-pathogenic strains, emphasizing that lateral genetic transfer is not limited to virulence traits (Dobrindt et al., 2004).

GIs can be recognized by their composition with regard to codon usage and GC-content, which differs from that of their host's genome. It is of note, however, that not all anomalous DNA (aDNA) in a genome is necessarily horizontally transferred. Ribosomal gene clusters are known to be compositionally dissimilar from the rest of the genome. On the other hand, a sequence horizontally acquired from a donor with a genome compositionally similar to that of the recipient will most probably not be anomalous in composition in the host's genome. In addition, sequences which have been horizontally acquired might become less anomalous in composition over time owing to a process called amelioration (Lawrence and Ochman, 1997). Hence, horizontally acquired DNA that has been obtained relatively recently will be more readily identified by its anomalous composition in the recipient's genome. However, the genomic context of aDNA may also aid in the identification of HGT. It has been indicated previously that the location of GIs between mobile elements, such as phage sequences and insertion sequences, implies a heterologous origin (Blum et al., 1994; Hacker and Kaper, 2000; Karlin, 2001).

Nakamura et al. (2004) provided evidence that transferred genes are biased towards functional categories associated with the cell surface, pathogenicity and DNA-binding genes, although putative horizontally acquired sequences still contain many putative genes with unknown functions. Dobrindt et al. (2004) explain acquisition efficiency mainly in terms of fitness increase. Together, these findings imply that remarkable and diverse capacities are being transferred among micro-organisms, and this has generated great interest in HGT. However, the available databases and applications describing or identifying putative horizontally acquired sequences have focused exclusively on published genome sequences (Garcia-Vallve et al., 2003; Hsiao et al., 2003; Nakamura et al., 2004), and although informative, they do not consider individual sequences, which are still abundant in the public databases.

Besides whole-genome sequencing, alternative techniques are used in vitro to selectively isolate putative horizontally acquired sequences. These include subtractive hybridization, representational difference analysis and adaptor-linked PCR with endonucleases clustered specifically in compositionally atypical regions of the genome (Lisitsyn and Wigler, 1993; Straus and Ausubel, 1990; van Passel et al., 2004). However, no dedicated tool is available to score individual sequences, isolated with these techniques, for their dinucleotide composition dissimilarities compared with a genome sequence, although for many of these putative horizontally transferred sequences a representative genome sequence (i.e. a genomic context) is available.

Our aim was to develop an application to score dinucleotide composition differences of individual sequence entries with a chosen representative host genome sequence.


    METHODS
The approach is based on the dinucleotide relative abundance values, or genome signature. As published previously by Karlin and Burge (1995), each genome has its own typical dinucleotide frequency values, which are conserved between related species. Although the genome signature was found to be relatively constant in 50 kb windows (Karlin, 2001), smaller windows can be used to identify anomalous sequences (van Passel et al., 2004). This is carried out by calculating, in a size-dependent manner, the dinucleotide relative abundance difference between the input sequence and the selected representative genome sequence. In brief, the dinucleotide relative abundance values, $\rho^*_{XY}$, are defined as the frequency of the dinucleotide XY divided by the product of the background frequencies of the individual nucleotides, all counted in the combined sense and reverse complement sequence: $\rho^*_{XY} = f^*_{XY} / (f^*_X f^*_Y)$. $\delta^*$ is the dinucleotide relative abundance difference given by $\delta^*(f,g) = \frac{1}{16} \sum_{XY} |\rho^*_{XY}(f) - \rho^*_{XY}(g)|$, where $\rho^*_{XY}(f)$ denotes the abundance values calculated for input sequence fragment f and $\rho^*_{XY}(g)$ denotes the abundance values calculated for the closely related genome sequence g.
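
The two quantities defined in this section can be sketched in a few lines of Python. This is only an illustration and not the δρ-web source code: the function names `rho_star` and `delta_star` are hypothetical, δ* is assumed to take the usual Karlin and Burge (1995) form (the mean absolute difference of the relative abundances over the 16 dinucleotides), and the handling of ambiguous bases and of nucleotides that are absent from a sequence is our own choice.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rho_star(seq: str) -> dict:
    """Relative abundance of each dinucleotide, counted over both strands."""
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(b for b in strand if b in BASES)
        # Assumption: dinucleotides containing ambiguous bases are skipped.
        di.update(
            strand[i:i + 2]
            for i in range(len(strand) - 1)
            if strand[i] in BASES and strand[i + 1] in BASES
        )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("need at least one unambiguous dinucleotide")
    rho = {}
    for xy in DINUCLEOTIDES:
        f_x, f_y = mono[xy[0]] / n_mono, mono[xy[1]] / n_mono
        f_xy = di[xy] / n_di
        # Assumption: if X or Y is absent the ratio is undefined; use 0.0.
        rho[xy] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return rho


def delta_star(f: str, g: str) -> float:
    """Mean absolute difference between the signatures of f and g."""
    rho_f, rho_g = rho_star(f), rho_star(g)
    return sum(abs(rho_f[xy] - rho_g[xy]) for xy in DINUCLEOTIDES) / 16
```

Because both strands are counted, a sequence and its reverse complement have the same signature, so `delta_star(s, reverse_complement(s))` is zero for any sequence `s`.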


    APPLICATION
We aimed to develop a tool that compares the dinucleotide composition of an input sequence with the composition of a selectable complete genome sequence, and that can also handle batch comparisons. This application, δρ-web, first constructs a collection of genomic fragments of the same length as the input sequence. For all these genomic fragments the δ* values are calculated and depicted in an empirical distribution. To avoid statistically irrelevant computations, we recommend a minimum input sequence length of 1000 bp, which allows adequate dinucleotide counts per sequence. Conversely, the input sequence should not exceed 20 000 bp, as longer sequences may not yield a genomic frequency distribution with ample genomic fragments; however, these cut-off sizes should be considered carefully in relation to the size of the genome in question. Next, the δ* value of the input sequence is compared with the distribution of δ* values of the genomic fragments, which puts the composition of the input sequence in a genomic context. As mentioned previously, because different species contain different amounts of horizontally acquired sequences, the threshold to consider DNA to be anomalous varies accordingly, and a conservative cut-off value is therefore advised. As an example, although Neisseria meningitidis MC58 is thought to contain over 20% of horizontally acquired DNA (Nakamura et al., 2004), we used the conservative threshold of 10% in a previous study (van Passel et al., 2004). However, as compositional dissimilarity is merely indicative of horizontal transfer, more evidence, such as phylogenetic validation or the presence of species-specific motifs such as DNA uptake sequences (Sandberg et al., 2001), is desirable before making claims concerning a heterologous origin.
To determine the probability that the genomic dissimilarity of the input sequence differs from the average genomic dissimilarity of the collection of genomic fragments, δρ-web compares δ* of the input sequence with the empirical distribution of δ* values of the genomic fragments. This empirical distribution is also graphically represented by δρ-web (Figure 1A). The position of δ* or the GC percentage of the input sequence in the plot of the distribution of genomic fragments can be expressed as the percentage of the genomic fragments which have a lower δ* value or a lower GC percentage. Fragments are scored anomalous if δ* has a high dissimilarity value compared with the genomic values (i.e. many genomic fragments have a lower δ*), whereas the GC percentage may be either high or low compared with the genomic values. An extensive list of genomic fractions of horizontally acquired DNA is supplied by Nakamura et al. (2004), which may be used to determine cut-off values for genomic composition dissimilarity values. However, the fractions of horizontally acquired DNA described by Nakamura et al. (2004) are based only on computational approaches; hence, as long as further evidence is lacking, the cut-off values based on these fractions should be considered arbitrary.
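
The scoring procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the δρ-web implementation: the function names (`score_against_genome`, `percent_lower`) are hypothetical, the genome is cut into non-overlapping fragments with any trailing remainder dropped, δ* is assumed to be the mean absolute difference of the 16 dinucleotide relative abundances, and sequences are assumed to consist mainly of unambiguous bases.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rho_star(seq):
    """Dinucleotide relative abundances over sense plus reverse complement."""
    both = (seq, seq.translate(COMPLEMENT)[::-1])
    mono = Counter(b for s in both for b in s if b in BASES)
    di = Counter(
        s[i:i + 2] for s in both for i in range(len(s) - 1)
        if s[i] in BASES and s[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("need at least one unambiguous dinucleotide")
    rho = {}
    for xy in DINUCLEOTIDES:
        denom = (mono[xy[0]] / n_mono) * (mono[xy[1]] / n_mono)
        # Assumption: undefined ratios (absent base) are set to 0.0.
        rho[xy] = (di[xy] / n_di) / denom if denom else 0.0
    return rho


def delta_star(rho_f, rho_g):
    """Assumed form: mean absolute difference over the 16 dinucleotides."""
    return sum(abs(rho_f[xy] - rho_g[xy]) for xy in DINUCLEOTIDES) / 16


def gc_percent(seq):
    acgt = sum(seq.count(b) for b in BASES)
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def percent_lower(value, background):
    """Percentage of background values strictly below the given value."""
    return 100.0 * sum(v < value for v in background) / len(background)


def score_against_genome(query, genome):
    """Place the query's delta* and GC% in the genomic fragment distributions."""
    query, genome = query.upper(), genome.upper()
    size = len(query)
    rho_genome = rho_star(genome)
    # Assumption: non-overlapping fragments; a trailing remainder is dropped.
    fragments = [genome[i:i + size]
                 for i in range(0, len(genome) - size + 1, size)]
    if not fragments:
        raise ValueError("query is longer than the genome")
    frag_delta = [delta_star(rho_star(fr), rho_genome) for fr in fragments]
    frag_gc = [gc_percent(fr) for fr in fragments]
    q_delta = delta_star(rho_star(query), rho_genome)
    q_gc = gc_percent(query)
    return {
        "delta": q_delta,
        "delta_percent_lower": percent_lower(q_delta, frag_delta),
        "gc": q_gc,
        "gc_percent_lower": percent_lower(q_gc, frag_gc),
    }
```

With a cut-off such as the conservative 10% mentioned above, an input sequence would be flagged when `delta_percent_lower` is at least 90, whereas for the GC percentage both very low and very high values of `gc_percent_lower` are of interest.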



Fig. 1 Adapted screenshots from δρ-web in black and white. (A) Graphical output of the genomic dissimilarity (δ*) and GC-percentage scores of the locus encoding the NmeSI restriction modification system (Bart et al., 2001), calculated against a representative genome sequence (N. meningitidis MC58). The selected genome sequence is divided into fragments of the same length as the input sequence. Next, the δ* (upper graph) or the GC-percentage (lower graph) values of all genomic fragments are plotted in a frequency distribution. The δ* value of the input sequence is then compared with the distribution of δ* values of the genomic fragments (vertical line). Both the value of δ* and the GC percentage are given in the respective graph, as well as the percentage of fragments with a lower δ* or GC percentage. (B) Visualization of the genome composition of N. meningitidis MC58 using a window size of 10 000 bp. Both the δ* values (top) and the GC percentage (bottom) distributions are shown, with the respective average in a horizontal line in each graph. The large islands of horizontal transfer (IHT-A, IHT-B and IHT-C) described by Tettelin et al. (2000) are indicated, as well as islands identified by Karlin (2001) and Garcia-Vallve et al. (2003).

 
In addition, δρ-web allows whole-genome composition analysis with a selectable window size, to supply an alternative analysis based on both the GC composition and the genomic signature to visualize large anomalous gene clusters in a prokaryotic genome. This is performed by dividing the genome sequence into non-overlapping windows, after which the composition of each window is compared with the composition of the complete genome. In Figure 1B, not only are the different large islands of horizontal transfer (IHTs) annotated by Tettelin et al. (2000) visible as islands with both high δ* values and aberrant GC percentages, but smaller anomalous gene clusters are visible as well. The island designated B in Figure 1B was previously recognized by Karlin (2001), whereas the island designated X was previously identified by Garcia-Vallve et al. (2003).
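
A whole-genome scan of this kind might look like the sketch below. It is illustrative only and not taken from δρ-web: `scan_genome` is a hypothetical name, δ* is assumed to be the mean absolute difference of the 16 dinucleotide relative abundances, a trailing partial window is dropped, and the toy genome in the usage lines is synthetic.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rho_star(seq):
    """Dinucleotide relative abundances over sense plus reverse complement."""
    both = (seq, seq.translate(COMPLEMENT)[::-1])
    mono = Counter(b for s in both for b in s if b in BASES)
    di = Counter(
        s[i:i + 2] for s in both for i in range(len(s) - 1)
        if s[i] in BASES and s[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("need at least one unambiguous dinucleotide")
    rho = {}
    for xy in DINUCLEOTIDES:
        denom = (mono[xy[0]] / n_mono) * (mono[xy[1]] / n_mono)
        # Assumption: undefined ratios (absent base) are set to 0.0.
        rho[xy] = (di[xy] / n_di) / denom if denom else 0.0
    return rho


def delta_star(rho_f, rho_g):
    """Assumed form: mean absolute difference over the 16 dinucleotides."""
    return sum(abs(rho_f[xy] - rho_g[xy]) for xy in DINUCLEOTIDES) / 16


def gc_percent(seq):
    acgt = sum(seq.count(b) for b in BASES)
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def scan_genome(genome, window):
    """Return (start, delta*, GC%) for each non-overlapping window."""
    genome = genome.upper()
    rho_genome = rho_star(genome)
    rows = []
    # Assumption: a trailing window shorter than `window` is ignored.
    for start in range(0, len(genome) - window + 1, window):
        chunk = genome[start:start + window]
        rows.append((start,
                     delta_star(rho_star(chunk), rho_genome),
                     gc_percent(chunk)))
    return rows


# Toy genome with a compositionally atypical stretch in the middle.
toy = "ACGT" * 250 + "A" * 1000 + "ACGT" * 250
rows = scan_genome(toy, 1000)
mean_delta = sum(r[1] for r in rows) / len(rows)
for start, d, gc in rows:
    flag = "above average" if d > mean_delta else ""
    print(f"{start:>6} delta*={d:.3f} GC={gc:.1f}% {flag}")
```

Plotting the second and third columns against the window start, with their averages as horizontal lines, gives the kind of profile described for Figure 1B.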

In conclusion, δρ-web allows composition similarity scoring for individual prokaryotic sequence entries compared with a selected representative prokaryotic genome sequence, including a many-to-many interface as well as genome composition visualizations.

Received on December 8, 2004; revised on April 15, 2005; accepted on April 20, 2005

    REFERENCES

    Bart, A., et al. (2001) NmeSI restriction-modification system identified by representational difference analysis of a hypervirulent Neisseria meningitidis strain. Infect. Immun., 69, 1816–1820.

    Blum, G., et al. (1994) Excision of large DNA regions termed pathogenicity islands from tRNA-specific loci in the chromosome of an Escherichia coli wild-type pathogen. Infect. Immun., 62, 606–614.

    Dobrindt, U., et al. (2004) Genomic islands in pathogenic and environmental microorganisms. Nat. Rev. Microbiol., 2, 414–424.

    Garcia-Vallve, S., et al. (2003) HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes. Nucleic Acids Res., 31, 187–189.

    Hacker, J. and Kaper, J.B. (2000) Pathogenicity islands and the evolution of microbes. Annu. Rev. Microbiol., 54, 641–679.

    Hsiao, W., et al. (2003) IslandPath: aiding detection of genomic islands in prokaryotes. Bioinformatics, 19, 418–420.

    Karlin, S. (2001) Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes. Trends Microbiol., 9, 335–343.

    Karlin, S. and Burge, C. (1995) Dinucleotide relative abundance extremes: a genomic signature. Trends Genet., 11, 283–290.

    Lawrence, J.G. and Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol., 44, 383–397.

    Lisitsyn, N. and Wigler, M. (1993) Cloning the differences between two complex genomes. Science, 259, 946–951.

    Mantri, Y. and Williams, K.P. (2004) Islander: a database of integrative islands in prokaryotic genomes, the associated integrases and their DNA site specificities. Nucleic Acids Res., 32, D55–D58.

    Nakamura, Y., et al. (2004) Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat. Genet., 36, 760–766.

    Sandberg, R., et al. (2001) Capturing whole-genome characteristics in short sequences using a naive Bayesian classifier. Genome Res., 11, 1404–1409.

    Straus, D. and Ausubel, F.M. (1990) Genomic subtraction for cloning DNA corresponding to deletion mutations. Proc. Natl Acad. Sci. USA, 87, 1889–1893.

    Tettelin, H., et al. (2000) Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science, 287, 1809–1815.

    van Passel, M.W., et al. (2004) An in vitro strategy for the selective isolation of anomalous DNA from prokaryotic genomes. Nucleic Acids Res., 32, e114.

