Bioinformatics Advance Access originally published online on January 19, 2008
Bioinformatics 2008 24(5):717-718; doi:10.1093/bioinformatics/btn027

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

TreeMos: a high-throughput phylogenomic approach to find and visualize phylogenetic mosaicism

J. D. Moore * and R. G. Allaby

Warwick HRI, University of Warwick, Warwick, CV35 9EF, UK

*To whom correspondence should be addressed.


    ABSTRACT

Summary: TreeMos is a novel high-throughput graphical analysis application that allows the user to search for phylogenetic mosaicism among one or more DNA or protein sequence multiple alignments and additional unaligned sequences. TreeMos uses a sliding window and a local alignment algorithm to identify the nearest neighbour of each sequence segment, and visualizes instances where a segment's nearest neighbour differs from the one identified using the global alignment. Data sets can include whole genome sequences, allowing phylogenomic analyses in which mosaicism may be attributed to recombination between any two points in the genome. TreeMos can be run from the command line or within a web browser, where the relationships between taxa can be explored by drill-through.

Availability: http://www2.warwick.ac.uk/fac/sci/whri/research/archaeobotany

Contact: jonathan.moore{at}warwick.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
Recombination between DNA or RNA molecules occurs during meiosis in eukaryotes, by gene conversion, by illegitimate pairing between paralogons and by lateral gene transfer. Methods of phylogenetic tree reconstruction, used to infer evolutionary relationships between sequences, assume a model of sequence evolution without recombination. Recombination violates this assumption, potentially resulting in a group of sequences with several underlying phylogenies (Posada et al., 2002).

Recombination detection methods have included sequence similarity, distance, phylogenetic, compatibility and substitution distribution approaches (Posada et al., 2002), and a range of tools is available to identify recombination (e.g. Etherington et al., 2005; Milne et al., 2004). A limitation of most tools is that the sequences to be searched must first be aligned in a single multiple alignment. This is appropriate when searching for evidence of recombination in a conserved gene family within or between genomes, but a single multiple alignment cannot be achieved in several circumstances. First, in a large gene superfamily with low sequence similarity, robust alignment may only be possible for each subfamily separately. Second, in genome-scale comparisons, chromosomal rearrangements can prevent the alignment of whole chromosomes or large chromosomal segments. Third, within a conserved gene family, a sequence may have acquired through recombination fragments of a non-homologous sequence from elsewhere in the same or another genome; these fragments will not align with the remaining sequences in the alignment. TreeMos addresses all three cases.

The TreeMos approach is phylogenomic because it allows the genetic information of entire genomes to be incorporated within a single analysis. TreeMos was developed to search for phylogenetic mosaicism in sequences that could not be analysed within a single multiple alignment, using the proteins of the rhodopsin G-Protein Coupled Receptor (GPCR) gene superfamily, which exemplifies the first circumstance above, as a test case (Allaby and Woodwark, 2007). TreeMos looks for high local similarity between sequences in a data set: between non-aligned regions within an alignment, between regions in separate alignments, and between sequences that may be distant homologues or that contain homologous segments (acquired through recombination) within otherwise non-homologous sequences. These regions of high local similarity are then subjected to phylogenetic analysis to identify changes in nearest neighbour (phylogenetic mosaicism), which are displayed as reticulate relationships (Fig. 1).
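The search for high local similarity can be illustrated with the Smith and Waterman (1981) local alignment recurrence. TreeMos release 1.0 uses BLAST, a heuristic, for this step, so the following is only a minimal sketch of the exact dynamic-programming score and not TreeMos code. The function name and the scoring scheme (match +2, mismatch -1, linear gap -2, no substitution matrix) are illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gaps).

    Illustrative scoring only: a simple match/mismatch scheme stands in for a
    substitution matrix such as BLOSUM62, and every gap position costs `gap`.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            # Clamping at zero lets an alignment start anywhere: the "local" part.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# A shared core flanked by unrelated residues still scores highly.
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # prints 8
```

Because scores are clamped at zero, unrelated flanking sequence does not penalize a strongly similar core, which is the property needed to find homologous segments inside otherwise non-homologous sequences.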


Fig. 1. Sample graphical output from TreeMos for the Human AA2A protein sequence, with respect to other proteins in the Human GPCR gene superfamily. AA2A is represented on the left, with its scale in residues. Other members of the superfamily that have phylogenetically anomalous relationships with AA2A are represented on the right at reduced scales.

 
Figure 1 illustrates how visualization can reveal higher-order patterns of phylogenetic mosaicism, such as correlated mosaicism, in which many members of a sequence family resemble a distantly related family more than they resemble each other within a localized sequence region (Allaby and Woodwark, 2007).
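Correlated mosaicism can be made concrete by counting, for each window, how many members of a family share a nearest neighbour from the same foreign family. The sketch below is a hypothetical criterion for illustration, not the procedure of Allaby and Woodwark (2007): it assumes window starts are alignment columns shared by all members of a family, and the function name, record layout, `min_fraction` threshold and toy records are all invented.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (query sequence, query family, window start column, family of the window's LNN)
Anomaly = Tuple[str, str, int, str]


def correlated_windows(anomalies: List[Anomaly], family_sizes: Dict[str, int],
                       min_fraction: float = 0.5) -> List[Tuple[str, int, str, float]]:
    """Windows where at least min_fraction of a family's members share a foreign LNN family.

    Hypothetical criterion: window starts are assumed to be alignment columns
    common to every member of a family.
    """
    members: Dict[Tuple[str, int, str], Set[str]] = defaultdict(set)
    for query, family, start, lnn_family in anomalies:
        if lnn_family != family:  # only affiliations that leave the family count
            members[(family, start, lnn_family)].add(query)
    hits = []
    for (family, start, foreign), queries in sorted(members.items()):
        fraction = len(queries) / family_sizes[family]
        if fraction >= min_fraction:
            hits.append((family, start, foreign, fraction))
    return hits


# Toy records: three of the four F1 members point to family F2 at column 100.
toy_anomalies = [
    ("a1", "F1", 100, "F2"), ("a2", "F1", 100, "F2"), ("a3", "F1", 100, "F2"),
    ("a1", "F1", 200, "F2"), ("b1", "F2", 50, "F1"),
]
print(correlated_windows(toy_anomalies, {"F1": 4, "F2": 3}))  # [('F1', 100, 'F2', 0.75)]
```

A single anomalous member (column 200 above) is treated as ordinary mosaicism; only a shared, family-wide shift towards the same foreign family is reported.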


    2 FEATURES
TreeMos runs on Mac OS X, Windows and Linux, and can be used through a web browser interface, from the command line, or as part of a high-throughput pipeline. The program searches multiple alignments and individual DNA or protein sequences in FASTA format for phylogenetic anomalies. User-adjustable parameters, each with a default value, set the window size over which anomalies are detected, the increment by which the window slides, and the maximum genetic distance between a pair of sequences for them to be considered related (Allaby and Woodwark, 2004).
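The window size and increment parameters can be pictured with a short sketch, assuming windows are taken from the start of each sequence and only full-length windows are kept. The function name and the example protein fragment (an arbitrary string) are hypothetical, and how TreeMos treats a shorter final window is not stated here.

```python
from typing import List, Tuple


def sliding_windows(sequence: str, window_size: int,
                    increment: int) -> List[Tuple[int, str]]:
    """Return (start, segment) pairs for every full-length window.

    Illustrative only: a trailing segment shorter than window_size is dropped,
    which may differ from TreeMos's own behaviour.
    """
    if window_size <= 0 or increment <= 0:
        raise ValueError("window_size and increment must be positive")
    return [(start, sequence[start:start + window_size])
            for start in range(0, len(sequence) - window_size + 1, increment)]


for start, segment in sliding_windows("MNGTEGPNFYVPFSNKTGVV", window_size=10, increment=5):
    print(start, segment)
# 0 MNGTEGPNFY
# 5 GPNFYVPFSN
# 10 VPFSNKTGVV
```

An increment smaller than the window size produces overlapping windows, so a recombination breakpoint is covered by several windows rather than falling on a single boundary.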

For each sequence, its Global Nearest Neighbour (GNN) is identified by comparison with all sequences in the data set, as are the Local Nearest Neighbours (LNNs) of each window within the sequence. Throughout, an automated data screening procedure filters out sequences that are too dissimilar to be reliably aligned (see Supplementary Fig. 1; Allaby and Woodwark, 2007). Nearest neighbours are identified by tree building where enough data are available, and by distance methods otherwise. Where the LNN of a window differs from the GNN, the window is flagged as having a phylogenetically anomalous affiliation. A typical run therefore entails hundreds or thousands of phylogenetic analyses. The resulting set of anomalies is reported in tab-separated text format and visualized as an image for each sequence and for each alignment. Log files record processing steps and any errors, and sets of results can be archived for later browsing.
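The comparison of Local and Global Nearest Neighbours can be sketched as follows, assuming the pairwise distances have already been computed and that "nearest" simply means smallest distance. Because TreeMos prefers tree building when enough data are available, this distance-only version is closer to its distance-method route. The function names, the output column headings and the toy distances are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Distances from one query to every other sequence (or window segment),
# assumed to be precomputed, e.g. by a PHYLIP distance program.
DistanceMap = Dict[str, float]


def nearest_neighbour(distances: DistanceMap, max_distance: float) -> Optional[str]:
    """Closest sequence within max_distance; None if every candidate is screened out."""
    related = {name: d for name, d in distances.items() if d <= max_distance}
    if not related:
        return None
    return min(related, key=related.get)


def find_anomalies(query: str, global_distances: DistanceMap,
                   window_distances: List[Tuple[int, DistanceMap]],
                   max_distance: float) -> List[Tuple[str, Optional[str], int, str, float]]:
    """Flag windows whose Local Nearest Neighbour differs from the Global Nearest Neighbour."""
    gnn = nearest_neighbour(global_distances, max_distance)
    anomalies = []
    for start, distances in window_distances:
        lnn = nearest_neighbour(distances, max_distance)
        if lnn is not None and lnn != gnn:
            anomalies.append((query, gnn, start, lnn, distances[lnn]))
    return anomalies


def to_tsv(rows) -> str:
    """Render anomaly rows as tab-separated text with a header line (None becomes NA)."""
    header = "query\tgnn\twindow_start\tlnn\tdistance"
    body = ["\t".join("NA" if v is None else str(v) for v in row) for row in rows]
    return "\n".join([header] + body)


# Toy distances (invented for illustration) from query seqA to seqB, seqC and seqD.
global_distances = {"seqB": 0.35, "seqC": 0.55, "seqD": 0.80}
window_distances = [
    (0, {"seqB": 0.30, "seqC": 0.50, "seqD": 0.70}),    # LNN equals GNN: not anomalous
    (50, {"seqB": 0.60, "seqC": 0.65, "seqD": 0.40}),   # LNN is seqD: anomalous
    (100, {"seqB": 0.95, "seqC": 0.97, "seqD": 0.99}),  # all too distant: screened out
]
print(to_tsv(find_anomalies("seqA", global_distances, window_distances, max_distance=0.9)))
```

Only the window starting at position 50 is reported, with seqB as the GNN and seqD as the anomalous LNN; the window at 100 produces no row because every candidate exceeds the maximum distance.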

The web browser interface allows all functions to be accessed through local web pages, and the resulting set of anomalous affiliations to be visualized interactively with drill-through between affiliated sequences (no connection to the internet is required).

External packages carry out the analyses underlying the algorithm. In release 1.0, BLAST (Altschul et al., 1990) is used to search for local alignments, CLUSTAL W (Thompson et al., 1994) to construct multiple alignments, and the neighbor, dnadist and protdist programs from the PHYLIP package (Felsenstein, 2005) to carry out neighbor-joining tree and distance calculations. Distances use gamma-distributed among-site rate heterogeneity with a fixed shape parameter of 4, a value based on a recent review of real data (Bofkin and Goldman, 2007). The algorithm is designed to be package-neutral, so the software could readily be modified to use alternative packages that may improve performance. Future plans include incorporating S-Search (Smith and Waterman, 1981) for local alignment and MUSCLE (Edgar, 2004) or MAFFT (Katoh et al., 2005) for multiple alignment. For phylogenetic analyses, we intend to assess PHYML (Guindon and Gascuel, 2003) with parameter optimization, and to select substitution models using likelihood ratio tests.
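How a fixed gamma shape parameter enters a distance correction can be seen in the simplest case, the Jukes-Cantor model with gamma-distributed rates, for which d = (3/4)α[(1 - 4p/3)^(-1/α) - 1], where p is the proportion of differing sites. This is an illustrative assumption: dnadist and protdist offer several substitution models, so this sketch is not necessarily the calculation TreeMos performs, and the function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance assuming equal rates at all sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p: float, alpha: float = 4.0) -> float:
    """Jukes-Cantor distance with gamma-distributed among-site rates (shape alpha)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences over 10 sites
print(p, round(jc_distance(p), 4), round(jc_gamma_distance(p, alpha=4.0), 4))
```

As α grows the gamma-corrected distance converges to the plain Jukes-Cantor distance, while smaller α (stronger rate heterogeneity) gives a larger corrected distance for the same p; fixing α = 4 avoids estimating it separately for each of the many small analyses.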


    3 IMPLEMENTATION
TreeMos is written in Perl and has been tested on Mac OS X 10.4.9, Windows XP with ActivePerl 5.8.6 installed, and SuSE Linux 9.3. Output files are platform-independent, so results generated on Windows can, for example, be loaded on Mac OS X. For graphical navigation, TreeMos uses a personal web server to execute Perl CGI scripts; these have been tested using Mac OS X 10.4.9 personal web sharing and the XAMPP installation of Apache on Windows XP and SuSE Linux 9.3. Executables of the NCBI BLAST (Altschul et al., 1990), PHYLIP (Felsenstein, 2005) and CLUSTAL W (Thompson et al., 1994) packages are distributed with the TreeMos installer. Graphical reports are produced with the GD graphics library (Joye and Boutell, 2007), a binary version of which is included in the installer for Mac OS X Intel platforms. On Mac OS X PowerPC, Windows and Linux platforms, the installer attempts to install the GD library using fink, rpm and cpan, respectively.


    ACKNOWLEDGEMENTS
The development of TreeMos was part-funded by the Biotechnology and Biological Sciences Research Council (BBSRC), UK. We thank the anonymous reviewers for helpful suggestions.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on September 6, 2007; revised on January 16, 2008; accepted on January 16, 2008

    REFERENCES

    Allaby RG, Woodwark M. Phylogenetic analysis reveals extensive phylogenetic mosaicism in the Human GPCR superfamily. Evol. Bioinformatics (2007) 3:155–168.

    Allaby RG, Woodwark M. Phylogenetics in the bioinformatics culture of understanding. Comp. Funct. Genomics (2004) 5:128–146.

    Altschul SF, et al. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.

    Bofkin L, Goldman N. Variation in evolutionary processes at different codon positions. Mol. Biol. Evol. (2007) 24:513.

    Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl. Acids Res. (2004) 32:1792–1797.

    Etherington JG, et al. Recombination Analysis Tool (RAT): a program for the high-throughput detection of recombination. Bioinformatics (2005) 21:278–281.

    Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.6. (2005) Department of Genome Sciences, University of Washington, Seattle. Distributed by the author.

    Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. (2003) 52:696–704.

    Joye PA, Boutell T. gdLibrary 2.0.34 software application. (2007) Available at http://www.libgd.org.

    Katoh K, et al. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl. Acids Res. (2005) 33:511.

    Milne I, et al. TOPALi: software for automatic identification of recombinant sequences within DNA multiple alignments. Bioinformatics (2004) 20:1806–1807.

    Posada D, et al. Recombination in evolutionary genomics. Annu. Rev. Genet. (2002) 36:75–97.

    Smith TF, Waterman MS. Identification of common molecular subsequences. J. Mol. Biol. (1981) 147:195–197.

    Thompson JD, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. (1994) 22:4673–4680.

