Bioinformatics Advance Access originally published online on June 9, 2006
Bioinformatics 2006 22(17):2162-2163; doi:10.1093/bioinformatics/btl283

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

SeqVis: Visualization of compositional heterogeneity in large alignments of nucleotides

Joshua W. K. Ho 1,2, Cameron E. Adams 1, Jie Bin Lew 1, Timothy J. Matthews 1, Chiu Chin Ng 1, Arash Shahabi-Sirjani 1, Leng Hong Tan 1, Yu Zhao 1, Simon Easteal 3, Susan R Wilson 4 and Lars S Jermiin 1,2,*

1 School of Biological Sciences, Sydney, Australia
2 Sydney University Biological Informatics and Technology Centre, Sydney, Australia
3 John Curtin School of Medical Research, Australian National University, Canberra, Australia
4 Mathematical Sciences Institute, Australian National University, Canberra, Australia

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: Most phylogenetic methods assume that the sequences evolved under homogeneous, stationary and reversible conditions. Compositional heterogeneity in data intended for studies of phylogeny suggests that the data did not evolve under these conditions. SeqVis, a Java application for analysis of nucleotide content, reads sequence alignments in several formats and plots the nucleotide content in a tetrahedron. Once plotted, outliers can be identified, thus allowing for decisions on the applicability of the data for phylogenetic analysis.

Availability: http://www.bio.usyd.edu.au/jermiin/programs.htm

Contact: lars.jermiin@usyd.edu.au


    1 INTRODUCTION
 
Model-based phylogenetic methods usually assume that the aligned nucleotides evolved under stationary, reversible and homogeneous conditions [for definitions, see e.g. Jayaswal et al. (2005)]. If these conditions are violated by data, then the risk of phylogenetic errors is increased (Ho and Jermiin, 2004; Jermiin et al., 2004).

Alignments of nucleotides may vary compositionally in the sense that the composition may vary across sequences and/or across sites. In the first case, the sites would not have evolved under conditions that are stationary, reversible and homogeneous, and in the second case, the sites would have evolved under different stationary, reversible and homogeneous conditions. In both cases, it would be inappropriate to infer a phylogeny assuming that a single time-reversible Markov process underpins variation in the alignment.

Methods to detect compositional heterogeneity in alignments of nucleotides fall into four categories (Jermiin et al., 2004), with those of the first category using graphs or tables to visualize compositional heterogeneity, and those of the other categories producing test statistics that may be evaluated against expected distributions. Methods of the first category, however, are of limited use for surveys of alignments with many species [as in e.g. Hashimoto et al. (1995)] while methods of the other categories are either statistically invalid or not yet accommodated by the wider scientific community.

Prompted by the second problem, Ababneh et al. (2006) described several matched-pairs tests of homogeneity for analysis of aligned nucleotides. The tests are useful because they provide details on the Markov processes that may have operated during the divergence of sequences. However, surveying the results may be impractical if the data include many sequences, and it may be impossible for the matched-pairs tests of marginal symmetry (Stuart, 1955) and internal symmetry (Ababneh et al., 2006), because estimating these test statistics involves inverting a matrix that is sometimes singular. In such cases, a visual assessment of the data, preferably combined with a matched-pairs test of symmetry (Bowker, 1948), may suffice. Here we present a tool for this visual assessment.
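
To make the matched-pairs test of symmetry concrete, the following Java sketch computes Bowker's (1948) statistic for one pair of aligned sequences. It is not taken from SeqVis: the class and method names are hypothetical, sites with characters other than A, C, G and T are skipped by assumption, and the degrees of freedom are taken as the number of off-diagonal pairs with a non-zero count, which is one common convention rather than the only one.

```java
// Illustrative sketch only; names are hypothetical and not part of SeqVis.
final class BowkerSymmetry {
    private static final String STATES = "ACGT";

    // Cross-tabulate the nucleotides of two aligned sequences into a 4x4 matrix.
    // Assumption: sites with gaps or ambiguity codes in either sequence are skipped.
    static int[][] divergenceMatrix(String x, String y) {
        if (x.length() != y.length()) {
            throw new IllegalArgumentException("sequences must be aligned (equal length)");
        }
        int[][] n = new int[4][4];
        for (int site = 0; site < x.length(); site++) {
            int i = STATES.indexOf(Character.toUpperCase(x.charAt(site)));
            int j = STATES.indexOf(Character.toUpperCase(y.charAt(site)));
            if (i >= 0 && j >= 0) {
                n[i][j]++;
            }
        }
        return n;
    }

    // Bowker's statistic: sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji).
    static double statistic(int[][] n) {
        double sum = 0.0;
        for (int i = 0; i < 4; i++) {
            for (int j = i + 1; j < 4; j++) {
                int total = n[i][j] + n[j][i];
                if (total > 0) {
                    double diff = n[i][j] - n[j][i];
                    sum += diff * diff / total;
                }
            }
        }
        return sum;
    }

    // Assumed convention: one degree of freedom per pair (i, j) with n_ij + n_ji > 0.
    static int degreesOfFreedom(int[][] n) {
        int df = 0;
        for (int i = 0; i < 4; i++) {
            for (int j = i + 1; j < 4; j++) {
                if (n[i][j] + n[j][i] > 0) {
                    df++;
                }
            }
        }
        return df;
    }
}
```

The statistic would then be compared with a chi-square distribution with the returned degrees of freedom. That step is omitted here because the Java standard library does not provide a chi-square distribution function.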


    2 THE PROGRAM AND ITS FEATURES
 
We extended the de Finetti plot (Cannings and Edwards, 1968) to a tetrahedral plot with similar properties (i.e. each observation comprises four variables, a, b, c and d, where a + b + c + d = 1 and 0 ≤ a, b, c, d ≤ 1). Each axis in the plot starts at the center of a surface at value 0, and finishes at the opposite corner at value 1 (Fig. 1A). The nucleotide content of a given sequence is simply the list of shortest distances between its point, P, in the tetrahedron and each of the four surfaces. Visual assessment of the spread of points in the tetrahedron shows the extent of compositional heterogeneity.
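
The mapping from nucleotide content to a point in the tetrahedron can be sketched in Java as a weighted average of the four corners. This is an illustration under stated assumptions, not the SeqVis source: the class name, the method names and the choice of corner coordinates (alternate corners of a cube, assigned to A, C, G and T in that order) are hypothetical, U is counted as T, and other characters are ignored.

```java
// Illustrative sketch only; names and corner coordinates are assumptions, not SeqVis code.
final class TetrahedronPlot {
    private static final String STATES = "ACGT";

    // Corners of a regular tetrahedron (alternate corners of a cube), one per nucleotide.
    private static final double[][] VERTICES = {
        { 1,  1,  1},   // A
        { 1, -1, -1},   // C
        {-1,  1, -1},   // G
        {-1, -1,  1}    // T
    };

    // Relative frequencies of A, C, G and T; gaps and ambiguity codes are ignored.
    static double[] composition(String sequence) {
        int[] counts = new int[4];
        int total = 0;
        for (int s = 0; s < sequence.length(); s++) {
            char ch = Character.toUpperCase(sequence.charAt(s));
            if (ch == 'U') {
                ch = 'T';   // assumption: RNA sequences are treated like DNA
            }
            int idx = STATES.indexOf(ch);
            if (idx >= 0) {
                counts[idx]++;
                total++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("sequence contains no A, C, G or T");
        }
        double[] freq = new double[4];
        for (int i = 0; i < 4; i++) {
            freq[i] = (double) counts[i] / total;
        }
        return freq;
    }

    // The point P is the frequency-weighted average of the four corners.
    static double[] toPoint(double[] freq) {
        double[] p = new double[3];
        for (int i = 0; i < 4; i++) {
            for (int k = 0; k < 3; k++) {
                p[k] += freq[i] * VERTICES[i][k];
            }
        }
        return p;
    }

    // Distance from P to the surface opposite the given corner, divided by the
    // height of the tetrahedron; for these corners it equals (P . v + 1) / 4.
    static double relativeDistanceToFace(double[] p, int vertex) {
        double dot = 0.0;
        for (int k = 0; k < 3; k++) {
            dot += p[k] * VERTICES[vertex][k];
        }
        return (dot + 1.0) / 4.0;
    }
}
```

With these corners, relativeDistanceToFace applied to the point returned by toPoint gives back the corresponding frequency, which is the property described above: each nucleotide's content is the scaled distance between P and the surface opposite that nucleotide's corner.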


Fig. 1 The visual output of SeqVis: (A) a tetrahedron with points corresponding to the nucleotide content of each sequence (all sites); (B) three tetrahedrons displaying the nucleotide content at first, second and third codon sites; (C) the effect of recoding nucleotides—the nucleotide content, in the tetrahedron, is shown after R- and Y-coding of the nucleotides (the two de Finetti plots), and after RY-, KM- and SW-coding of the nucleotides (the three linear plots).

 
To study the nucleotide content of aligned sequences, we developed SeqVis, a Java application that displays the nucleotide composition of a set of sequences within a tetrahedron. SeqVis requires the Java 3D package and the Java Runtime Environment (version 5.0 or later). The program was tested on Windows XP and Mandrake Linux, and supports the following features:
  • SeqVis reads and writes alignments in the sequential PHYLIP format, the NEXUS format and the FASTA format.
  • The tetrahedron can be rotated in all directions, animated and manipulated interactively; all items on display can be changed.
  • By viewing the points orthogonally through one of the surfaces, the distribution of three nucleotides (e.g. C, G, T) may be assessed while ignoring the fourth nucleotide (i.e. A).
  • The nucleotide composition at the codon sites can be surveyed independently and visualized on a single canvas.
  • Sequence information can be obtained by mouse-clicking on points of interest or using inbuilt tools that query the data based on the sequences' names or attributes.
  • A number of analytical tools are provided: e.g. matched-pairs test of symmetry, hierarchical clustering, k-means clustering.
  • On-screen images may be saved in the PNG and JPEG formats.
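
As an illustration of surveying the codon sites independently, the Java sketch below tallies the nucleotide content separately for first, second and third codon sites of one sequence. It is not SeqVis code: the names are hypothetical, and it assumes that the aligned sequence is in frame (column 1 is a first codon site) and that gaps occupy columns without shifting the frame.

```java
// Illustrative sketch only; names are hypothetical and not part of SeqVis.
final class CodonSiteComposition {
    private static final String STATES = "ACGT";

    // Returns freq[site][nucleotide] with site 0..2 (first, second, third codon
    // site) and nucleotide in the order A, C, G, T.
    // Assumption: the sequence is in frame, so column index modulo 3 gives the site.
    static double[][] byCodonSite(String sequence) {
        int[][] counts = new int[3][4];
        for (int s = 0; s < sequence.length(); s++) {
            int idx = STATES.indexOf(Character.toUpperCase(sequence.charAt(s)));
            if (idx >= 0) {
                counts[s % 3][idx]++;   // gaps and ambiguity codes are skipped
            }
        }
        double[][] freq = new double[3][4];
        for (int site = 0; site < 3; site++) {
            int total = 0;
            for (int i = 0; i < 4; i++) {
                total += counts[site][i];
            }
            for (int i = 0; i < 4; i++) {
                // A site without any counted nucleotide is reported as all zeros.
                freq[site][i] = total == 0 ? 0.0 : (double) counts[site][i] / total;
            }
        }
        return freq;
    }
}
```

Each of the three rows can then be treated as a separate observation (a, b, c, d) and plotted in its own tetrahedron.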


    3 EXAMPLES
 
Rokas et al. (2005) inferred a phylogeny of 32 eukaryotes using an alignment of 12 060 amino acids encoded by nuclear genes and discovered compositional heterogeneity among the sequences. We examined the corresponding alignment of nucleotides using SeqVis to get a better understanding of the data's complexity (Fig. 1A and B). The spread of points was greatest at the third codon site but also visible at the first codon site. The matched-pairs test of symmetry showed that none of the codon sites could have evolved under stationary, reversible and homogeneous conditions. Given the structure of the genetic code, a similar conclusion must be drawn about the alignment of amino acids. The visual and statistical assessments of these data thus corroborate Rokas et al.'s (2005) reason for using the LogDet method (Lockhart et al., 1994).

Nucleotides may be recoded to reduce compositional heterogeneity (Woese et al., 1991). The effect of this may be visualized by superimposing the axes of a tetrahedron or a de Finetti plot. We surveyed the alignment of 23S ribosomal RNA molecules from Galtier et al. (1999), and we found that the RY- and KM-coding of nucleotides gave a tighter spread of the points than the SW-coding (Fig. 1C), thus indicating that the RY- and KM-coded alignments are more likely than the SW-coded alignment to be consistent with evolution under stationary, reversible and homogeneous conditions.
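
The two-state recodings named above follow the standard ambiguity codes: R (A or G) versus Y (C or T), K (G or T) versus M (A or C), and S (C or G) versus W (A or T). A minimal Java sketch of such recoding is given below. The class, enum and method names are hypothetical, U is treated as T, and all other characters (gaps, ambiguity codes) are passed through unchanged by assumption. It does not cover the three-state R- and Y-codings shown in the de Finetti plots.

```java
// Illustrative sketch only; names are hypothetical and not part of SeqVis.
final class NucleotideRecoding {

    enum Scheme {
        RY("AG", 'R', 'Y'),   // purines vs pyrimidines
        KM("GT", 'K', 'M'),   // keto vs amino
        SW("CG", 'S', 'W');   // strong vs weak

        private final String firstClass;
        private final char firstSymbol;
        private final char secondSymbol;

        Scheme(String firstClass, char firstSymbol, char secondSymbol) {
            this.firstClass = firstClass;
            this.firstSymbol = firstSymbol;
            this.secondSymbol = secondSymbol;
        }

        char recode(char nucleotide) {
            char ch = Character.toUpperCase(nucleotide);
            if (ch == 'U') {
                ch = 'T';   // assumption: RNA sequences are treated like DNA
            }
            if ("ACGT".indexOf(ch) < 0) {
                return nucleotide;   // leave gaps and ambiguity codes untouched
            }
            return firstClass.indexOf(ch) >= 0 ? firstSymbol : secondSymbol;
        }
    }

    // Recode every character of a sequence under the chosen two-state scheme.
    static String recode(String sequence, Scheme scheme) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int s = 0; s < sequence.length(); s++) {
            out.append(scheme.recode(sequence.charAt(s)));
        }
        return out.toString();
    }
}
```

After recoding, the content of each sequence reduces to a single proportion (for example, the fraction of R), which is why the recoded data can be shown on a linear plot.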

The two examples show that SeqVis is capable of surveying large sets of data. Compared with other visualization methods, like the one used by Hashimoto et al. (1995), SeqVis permits a more informative exploration of the nucleotide content. However, the spread of points should not be used alone in the assessment because it does not take into account the length of the sequences: nucleotide content estimated from short sequences is subject to greater sampling error than that estimated from long sequences.


    Acknowledgments
 
We thank S.-H. Hong and M. A. Charleston for constructive advice, and N. Galtier and A. Rokas for the data. The first eight authors contributed equally to this paper as part of two third-year Bioinformatics Projects at The University of Sydney.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Keith A Crandall

Received on March 20, 2006; accepted on May 26, 2006

    REFERENCES
 

    Ababneh, F., et al. (2006) Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22, 1225–1231.

    Bowker, A.H. (1948) A test for symmetry in contingency tables. J. Am. Stat. Assoc., 43, 572–574.

    Cannings, C. and Edwards, A.W.F. (1968) Natural selection and the de Finetti diagram. Ann. Hum. Genet., 31, 421–428.

    Galtier, N., et al. (1999) A nonhyperthermophilic common ancestor to extant life forms. Science, 283, 220–221.

    Hashimoto, T., et al. (1995) Phylogenetic place of mitochondrial-lacking protozoan, Giardia lamblia, inferred from amino acid sequences of elongation factor 2. Mol. Biol. Evol., 12, 782–793.

    Ho, S.Y.W. and Jermiin, L.S. (2004) Tracing the decay of the historical signal in biological sequence data. Syst. Biol., 53, 623–637.

    Jayaswal, V., et al. (2005) Estimation of phylogeny using a general Markov model. Evol. Bioinf. Online, 1, 62–80.

    Jermiin, L.S., et al. (2004) The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol., 53, 638–643.

    Lockhart, P.J., et al. (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol., 11, 605–612.

    Rokas, A., et al. (2005) Animal evolution and the molecular signature of radiations compressed in time. Science, 310, 1933–1938.

    Stuart, A. (1955) A test for homogeneity of the marginal distributions in a two-way classification. Biometrika, 42, 412–416.

    Woese, C.R., et al. (1991) Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts. Syst. Appl. Microbiol., 14, 364–371.

