Bioinformatics Advance Access originally published online on August 30, 2007
Bioinformatics 2008 24(4):581-583; doi:10.1093/bioinformatics/btm388

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

AWTY (are we there yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics

Johan A.A. Nylander 1, James C. Wilgenbusch 1,*, Dan L. Warren 2 and David L. Swofford 1

1School of Computational Sciences, Florida State University, Tallahassee, Florida 32306 and 2Section of Evolution and Ecology, University of California, Davis, California 95616, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: A key element to a successful Markov chain Monte Carlo (MCMC) inference is the programming and run performance of the Markov chain. However, the explicit use of quality assessments of the MCMC simulations—convergence diagnostics—in phylogenetics is still uncommon. Here, we present a simple tool that uses the output from MCMC simulations and visualizes a number of properties of primary interest in a Bayesian phylogenetic analysis, such as convergence rates of posterior split probabilities and branch lengths. Graphical exploration of the output from phylogenetic MCMC simulations gives intuitive and often crucial information on the success and reliability of the analysis. The tool presented here complements convergence diagnostics already available in other software packages primarily designed for other applications of MCMC. Importantly, the common practice of using trace-plots of a single parameter or summary statistic, such as the likelihood score of sampled trees, can be misleading for assessing the success of a phylogenetic MCMC simulation.

Availability: The program is available as source under the GNU General Public License and as a web application at http://ceb.scs.fsu.edu/awty

Contact: jwilgenb{at}scs.fsu.edu


    1 INTRODUCTION
Despite the growing popularity of MCMC methods in phylogenetics, the use of MCMC convergence diagnostics is still relatively uncommon. Tools for assessing convergence are already available for many statistical models (e.g. Plummer et al., 2005) but they are rarely used in phylogenetic studies [a notable exception is the Tracer software (Rambaut and Drummond, 2004) designed for analyzing time-series plots of substitution model parameters]. This is probably because convergence diagnostics for parameters specific to phylogenetic trees, such as splits and branch lengths, are few and their performance is relatively unexplored.

The difficulties involved with diagnosing convergence in MCMC inference are well documented in the statistical literature (e.g. Brooks and Gelman, 1998; Geweke, 1992) and application of MCMC to Bayesian phylogenetics is no exception. As an example, the most frequently used method for assessing convergence in the phylogenetic literature involves examining trace plots of the likelihood scores for trees sampled by the Markov chain. This approach can, however, be misleading for diagnosing convergence (or lack thereof) as illustrated in the upper row of Figure 1. The plot in the first column shows the output from a Bayesian MCMC simulation where the likelihood trace for two independent runs reaches the same level of apparent stationarity. Posterior probabilities of splits continue, however, to change over the length of the simulation (Fig. 1, third column).


Fig. 1. Examples of output from AWTY. The figure shows the graphical exploration of the output from two different DNA data sets (upper row: 86 sequences, lower row: 62 sequences) analyzed in MrBayes v.3.1.2 (Ronquist and Huelsenbeck, 2003). Two separate simulations (indicated by red and blue colors) were run for each data set using the GTR+I+Γ model. The first graph shows the trace plot of the log likelihood; the sampled values reach apparent stationarity for both simulations in both data sets. The second graph is a bivariate plot of the split frequencies for the first and second run of the simulations. A low correlation (upper row) diagnoses lack of convergence. The third graph shows the cumulative split frequency for a number of selected splits for one of the individual simulations. A trend in frequencies (upper row) diagnoses lack of convergence. The fourth graph shows the cumulative split frequency (upper part) and the corresponding presence (+) and absence (–) for a single split (lower part) over the two simulations. Slow mixing, where the chain moves slowly in parameter space and produces a trend in the cumulative split frequency, is apparent in the upper row. The last graph compares the symmetric tree-difference score (Penny and Hendy, 1985) within and between (dashed line) simulations. A between-run distance well separated from the within-run distance (upper row) diagnoses lack of convergence.
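The within-run versus between-run comparison in the last graph can be illustrated with a minimal Python sketch. It assumes each sampled tree has already been reduced to its set of splits, and it takes the symmetric tree-difference score to be the number of splits present in exactly one of two trees. The function names and the toy data are hypothetical and are not taken from AWTY.

```python
from itertools import combinations, product
from typing import FrozenSet, Sequence, Set

Split = FrozenSet[str]  # one side of a bipartition, as a set of taxon names
Tree = Set[Split]       # a tree reduced to its set of splits (assumed input form)


def symmetric_difference(a: Tree, b: Tree) -> int:
    """Number of splits found in exactly one of the two trees."""
    return len(a ^ b)


def mean_within(run: Sequence[Tree]) -> float:
    """Mean distance over all unordered pairs of trees from one run."""
    dists = [symmetric_difference(a, b) for a, b in combinations(run, 2)]
    return sum(dists) / len(dists)


def mean_between(run1: Sequence[Tree], run2: Sequence[Tree]) -> float:
    """Mean distance over all pairs with one tree from each run."""
    dists = [symmetric_difference(a, b) for a, b in product(run1, run2)]
    return sum(dists) / len(dists)


# Toy data (hypothetical): run 1 samples trees around split {A, B},
# run 2 is stuck on trees built around split {C, D}.
ab, cd = frozenset("AB"), frozenset("CD")
abc, abd, bcd = frozenset("ABC"), frozenset("ABD"), frozenset("BCD")
t1, t2, t3 = {ab, abc}, {ab, abd}, {cd, bcd}
run1 = [t1, t1, t2]
run2 = [t3, t3, t3]

within = mean_within(run1)          # small: trees in run 1 resemble each other
between = mean_between(run1, run2)  # large: the two runs sample different trees
```

In this toy case the between-run distance is well above the within-run distance, which is the pattern the caption describes as diagnosing lack of convergence.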

 
This example emphasizes the fact that trees with similar likelihoods are not necessarily close in parameter (tree) space and that judging the success of an MCMC simulation from the likelihood trace alone might lead to inaccurate and misleading results (Huelsenbeck et al., 2002; Nylander et al., 2004). It also emphasizes that using a range of MCMC diagnostics is important and that graphical exploration of tree-specific parameters is a crucial complement to existing diagnostic tools and should routinely be applied in phylogenetic analyses using MCMC.
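As a rough illustration of the cumulative split frequency discussed above, the following Python sketch tracks the running fraction of sampled trees that contain a given split. It assumes the chain is available as a sequence of split sets, one per sampled tree; the function name and the toy chain are hypothetical and do not reproduce AWTY's own code.

```python
from typing import FrozenSet, List, Sequence, Set

Split = FrozenSet[str]  # one side of a bipartition, as a set of taxon names


def cumulative_split_frequency(samples: Sequence[Set[Split]], split: Split) -> List[float]:
    """Running fraction of sampled trees that contain `split`."""
    freqs = []
    hits = 0
    for n, tree_splits in enumerate(samples, start=1):
        if split in tree_splits:
            hits += 1
        freqs.append(hits / n)
    return freqs


# Toy chain (hypothetical): the split {A, B} is sampled only in the second half.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
chain = [{ac}, {ac}, {ab}, {ab}]

trace = cumulative_split_frequency(chain, ab)
```

A trace that keeps drifting, as this toy one does, suggests the posterior probability of the split has not stabilized, even if the likelihood trace for the same chain looks flat.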


    2 THE AWTY PROGRAM
2.1 Program features
The AWTY program takes as input the phylogenetic trees generated as output by other phylogenetic MCMC programs; MrBayes (Ronquist and Huelsenbeck, 2003), BEAST (Drummond and Rambaut, 2006) and BAMBE (Simon and Larget, 2000) formats are currently supported. A number of diagnostic analyses can then be performed on the trees and visualized graphically. The main focus is on splits or clades, and Figure 1 shows some examples where properties related to splits are compared within and among separate MCMC runs of the same data. Other available features include Geweke's diagnostic (Geweke, 1992) and Brooks and Gelman's R̂-interval diagnostic (Brooks and Gelman, 1998) for branch lengths.
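The idea behind Geweke's diagnostic can be sketched in Python as a z-score comparing the mean of an early segment of a trace with the mean of a late segment. This is a deliberately simplified version: it uses plain sample variances, whereas Geweke's method uses spectral density estimates to account for autocorrelation, so this sketch will tend to overstate significance for correlated samples. The function name, the default window fractions (first 10%, last 50%) and the toy traces are assumptions made for illustration.

```python
from statistics import mean, variance
from typing import Sequence


def geweke_z_naive(trace: Sequence[float], first: float = 0.1, last: float = 0.5) -> float:
    """Z-score for the difference in means between an early and a late window.

    Simplification: standard errors use sample variances and ignore
    autocorrelation (no spectral density estimate).
    """
    n = len(trace)
    early = trace[: int(first * n)]
    late = trace[n - int(last * n):]
    se = (variance(early) / len(early) + variance(late) / len(late)) ** 0.5
    return (mean(early) - mean(late)) / se


# Toy branch-length traces (hypothetical values).
stationary = [0.10, 0.12] * 20               # oscillates around a fixed mean
trending = [0.01 * i for i in range(40)]     # drifts upward throughout the run

z_stationary = geweke_z_naive(stationary)
z_trending = geweke_z_naive(trending)
```

A z-score far from zero, as for the trending trace, indicates that early and late parts of the chain disagree about the mean of the parameter.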

Many of the diagnostics implemented in AWTY are based on a post hoc approach where the output from an MCMC analysis is examined and compared over replicated runs. The underlying assumption is that simulations started from independent starting values should have similar properties at convergence (Brooks and Gelman, 1998). It must be emphasized, however, that this approach cannot guarantee convergence per se but is primarily a method for diagnosing lack of convergence in one (or several) runs. Furthermore, the success of the post hoc approach is dependent on the number of individual runs and the performance and behavior of each individual run. The information on the latter, such as proposal/acceptance ratios, should be included in the overall assessment of the success of an MCMC simulation and should be the focus in further research on MCMC applications in Bayesian phylogenetics.
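A post hoc comparison of replicated runs can be sketched in Python as follows: compute the frequency of every split in each run, then report the largest disagreement between the two runs. The representation of trees as sets of splits, the function names and the toy runs are assumptions for illustration only; AWTY itself presents this comparison graphically as a bivariate plot.

```python
from collections import Counter
from typing import Dict, FrozenSet, Sequence, Set

Split = FrozenSet[str]  # one side of a bipartition, as a set of taxon names


def split_frequencies(samples: Sequence[Set[Split]]) -> Dict[Split, float]:
    """Fraction of sampled trees in which each split occurs."""
    counts = Counter(s for tree in samples for s in tree)
    return {s: c / len(samples) for s, c in counts.items()}


def max_frequency_difference(run1: Sequence[Set[Split]], run2: Sequence[Set[Split]]) -> float:
    """Largest absolute difference in split frequency between two runs."""
    f1, f2 = split_frequencies(run1), split_frequencies(run2)
    # A split missing from one run has frequency 0 in that run.
    return max(abs(f1.get(s, 0.0) - f2.get(s, 0.0)) for s in set(f1) | set(f2))


# Toy runs (hypothetical): each run favors a different split.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
run1 = [{ab}, {ab}, {ab}, {ac}]
run2 = [{ac}, {ac}, {ac}, {ab}]

worst = max_frequency_difference(run1, run2)
```

A large value flags disagreement between runs. A small value is consistent with convergence but, as noted above, does not guarantee it.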

2.2 Implementation details
The main routine in AWTY is written in Perl and uses the program PAUP* (Swofford, 2003) for handling phylogenetic trees. The graphical output is generated by GNUPLOT (Williams and Kelley, 2006). For some of the convergence diagnostics, AWTY uses the CODA package (Plummer et al., 2005) written in R (R Development Core Team, 2006) through the R-from-Perl interface RSPerl (Temple-Lang, 2006). The program can be run using either a command-line UNIX-type interface or a Gtk2/Tk interface provided by the Perl modules Getopt::GUI::Long and QWizard (Hardaker, 2006). In addition, the program comes with a web interface written in PHP4 that runs on any web server such as Apache.


    ACKNOWLEDGEMENTS
Clemens Lakner, Mark Holder, Fredrik Ronquist and Wes Hardaker are thanked for advice on MCMC diagnostics and programming.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Keith Crandall

Received on February 8, 2007; revised on May 25, 2007; accepted on July 20, 2007

    REFERENCES

    Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. J. Comput. Graph. Stat (1998) 7:434–455.

    Drummond AJ, Rambaut A. BEAST v1.4. (2006) http://beast.bio.ed.ac.uk/.

    Geweke J. Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In: Bernardo JM, et al., eds. Bayesian Statistics 4. (1992) Clarendon Press, Oxford, UK.

    Hardaker W. Getopt::GUI::Long Version 0.62, and QWizard Version 3.03. (2006) http://www.cpan.org.

    Huelsenbeck JP, et al. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol (2002) 51:673–688.

    Nylander JAA, et al. Bayesian phylogenetic analysis of combined data. Syst. Biol (2004) 53:47–67.

    Penny D, Hendy MD. The use of tree comparison metrics. Systematic Zool (1985) 34:75–82.

    Plummer M, et al. CODA: output analysis and diagnostics for Markov Chain Monte Carlo simulations. (2006) http://cran.R-project.org.

    R Development Core Team. R: a language and environment for statistical computing, R Foundation for Statistical Computing. (2006) Vienna, Austria. http://www.R-project.org.

    Rambaut A, Drummond AJ. Tracer v1.3. (2004) http://evolve.zoo.ox.ac.uk/software.html.

    Ronquist F, Huelsenbeck JP. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (2003) 19:1572–1574.

    Simon D, Larget B. Bayesian analysis in molecular biology and evolution (BAMBE), version 2.03 beta. (2000) Department of Mathematics and Computer Science, Duquesne University.

    Swofford DL. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). (2003) Version 4. Sinauer Associates, Sunderland, Massachusetts, USA.

    Temple-Lang D. RSPerl Version 0.83. (2006) http://www.omegahat.org/RSPerl.

    Williams T, Kelley C. GNUPLOT, an interactive plotting program, Version 4.0. (2006) http://www.gnuplot.info.

