Bioinformatics Advance Access originally published online on May 5, 2006
Bioinformatics 2006 22(16):2047-2048; doi:10.1093/bioinformatics/btl175

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny

Marc A. Suchard 1,2,* and Benjamin D. Redelings 3

1 Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
2 Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
3 Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27606, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: BAli-Phy is a Bayesian posterior sampler that uses Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, lets more sophisticated substitution models inform the alignment, and automatically uses the information in shared insertions/deletions (indels) to help infer phylogenies.

Availability: Software is available for download at http://www.biomath.ucla.edu/msuchard/bali-phy.

Contact: msuchard{at}ucla.edu


    INTRODUCTION
Phylogenetic methods that reconstruct the evolutionary tree relating molecular sequences traditionally condition on a single, sometimes poorly estimated, multiple sequence alignment (Holder and Lewis, 2003). This alignment specifies which residues across the sequences derive from a common ancestral residue. Conditioning on a poor alignment derived from an inaccurate guide-tree can bias evolutionary inference (Lake, 1991). This concern is particularly pertinent for highly diverse sequences, whose correct alignment is not obvious. We present BAli-Phy, a novel Bayesian program that simultaneously estimates the alignment and the phylogenetic tree relating molecular sequences. Joint estimation sidesteps the bias inherent in estimating the alignment first and the tree second.


    SOFTWARE OVERVIEW
Redelings and Suchard (2005) introduce a joint estimation model for alignment and phylogeny and describe a Markov chain Monte Carlo (MCMC) approach to generate random samples from the joint model posterior. We briefly review the salient features of the model here. Conditional on a given alignment, the model employs standard continuous-time Markov chain (CTMC) processes to describe residue substitution along the branches of an unknown tree relating the sequences. To remove this conditioning and treat alignments as unknown parameters, the model further assumes a prior distribution over all possible alignments. We construct this distribution from a set of hidden Markov models with affine gap penalties that describe the pairwise alignments along each branch of the tree.
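To make the affine-gap idea concrete, the following Python sketch scores a single pairwise alignment path under a simple three-state Markov chain. It is illustrative only: the state labels (`M`, `D`, `I`), the gap-opening probability `delta` and the gap-extension probability `epsilon` are hypothetical names, and the sketch omits the branch-length dependence and the start/end terms of the model described by Redelings and Suchard (2005).

```python
import math

def log_prior_pairwise(path: str, delta: float, epsilon: float) -> float:
    """Log-probability of one pairwise alignment path under a toy affine-gap chain.

    path: column states, 'M' (residue in both sequences), 'D' (deletion),
    'I' (insertion). The chain is assumed to start in M; end terms are omitted.
    """
    # From M, each gap type opens with probability delta; a gap extends with
    # probability epsilon or closes back to M. Direct D <-> I switches are
    # disallowed in this sketch (probability 0).
    trans = {
        ("M", "M"): 1 - 2 * delta,
        ("M", "D"): delta,
        ("M", "I"): delta,
        ("D", "D"): epsilon,
        ("D", "M"): 1 - epsilon,
        ("I", "I"): epsilon,
        ("I", "M"): 1 - epsilon,
    }
    logp = 0.0
    prev = "M"
    for state in path:
        p = trans.get((prev, state), 0.0)
        if p == 0.0:
            return float("-inf")  # impossible path under this toy chain
        logp += math.log(p)
        prev = state
    return logp

one_long_gap = log_prior_pairwise("MMDDMM", delta=0.05, epsilon=0.5)
two_short_gaps = log_prior_pairwise("MDMMDM", delta=0.05, epsilon=0.5)
```

With these values the single gap of length two is 18 times more probable a priori than two separate gaps of length one, since the log-ratio is log((1 − 2δ)ε / (δ(1 − ε))) = log 18. Opening a gap is penalized more heavily than extending one, which is the affine-gap behaviour.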

To generate posterior samples, BAli-Phy employs a Metropolis-within-Gibbs approach (Tierney, 1994). We construct the random-scan Gibbs cycle from straightforward Metropolis–Hastings proposals for updating branch lengths and substitution and indel parameters, together with several novel steps for updating the alignment and topology. These latter steps combine subtree transfer operators with dynamic programming through the Forward–Backward algorithm (Scott, 2002) and extend the work of Holmes and Bruno (2001) to give the sampler good convergence properties.
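The Forward–Backward step can be illustrated on a toy hidden Markov model. The sketch below runs the forward algorithm and then samples a hidden-state path backward from its exact conditional distribution (forward filtering, backward sampling). The two-state model and its parameters are hypothetical. In BAli-Phy the hidden states describe alignment columns, so this shows the generic technique, not the program's own implementation. A brute-force enumeration is included only as a check.

```python
import random
from itertools import product

def forward(init, trans, emit, obs):
    """Forward algorithm: alpha[t][k] = P(obs[0..t], hidden state at t = k)."""
    n = len(init)
    alpha = [[init[k] * emit[k][obs[0]] for k in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            emit[k][obs[t]] * sum(prev[j] * trans[j][k] for j in range(n))
            for k in range(n)
        ])
    return alpha

def backward_sample(alpha, trans, rng):
    """Sample a hidden path from P(path | obs) using the stored forward messages."""
    n = len(alpha[0])
    # Last state: P(s_T = k | obs) is proportional to alpha[T][k].
    path = [rng.choices(range(n), weights=alpha[-1])[0]]
    # Earlier states: P(s_t = j | s_{t+1}, obs) is proportional to
    # alpha[t][j] * trans[j][s_{t+1}].
    for t in range(len(alpha) - 2, -1, -1):
        nxt = path[-1]
        weights = [alpha[t][j] * trans[j][nxt] for j in range(n)]
        path.append(rng.choices(range(n), weights=weights)[0])
    path.reverse()
    return path

def brute_force_joint(init, trans, emit, obs):
    """Enumerate P(path, obs) for every path (exponential cost; check only)."""
    n = len(init)
    joint = {}
    for path in product(range(n), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        joint[path] = p
    return joint

# Hypothetical two-state HMM with binary observations.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

alpha = forward(init, trans, emit, obs)
likelihood = sum(alpha[-1])  # P(obs), summed over all hidden paths
path = backward_sample(alpha, trans, random.Random(1))
```

The forward pass costs time linear in the sequence length, whereas the brute-force check grows exponentially. Realistic alignment HMMs would also need log-space arithmetic or rescaling to avoid numerical underflow, which this sketch omits.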

BAli-Phy also contains a number of tools to summarize the joint posterior samples. The most important of these is Alignment-gild, which produces alignment uncertainty (AU, pronounced ‘gold’) plots. AU plots depict an estimate of the maximum a posteriori alignment, annotated to identify features (residues or gaps) whose positions vary across the posterior samples. Alignment-gild renders these plots in HTML for cross-platform viewing.
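One simple way to quantify positional variability from posterior samples is sketched below. For each residue, the sketch records which residues from other sequences share its column. It then reports the fraction of samples in which that set of partners matches a reference alignment. This is an assumption-laden stand-in, not Alignment-gild's algorithm. The reference is taken to be the most frequently sampled alignment as a crude proxy for the maximum a posteriori estimate, and the function names are hypothetical.

```python
from collections import Counter

def residue_columns(alignment):
    """Map each residue (sequence index, residue index) to the other residues in its column."""
    next_residue = [0] * len(alignment)  # running residue counter per sequence
    partners = {}
    for col in range(len(alignment[0])):
        members = []
        for i, row in enumerate(alignment):
            if row[col] != "-":
                members.append((i, next_residue[i]))
                next_residue[i] += 1
        column = frozenset(members)
        for m in members:
            partners[m] = column - {m}
    return partners

def residue_support(samples):
    """Return a reference alignment and, per residue, the fraction of samples
    in which that residue has exactly the same column partners as in the reference."""
    # Hypothetical stand-in for the MAP estimate: the most frequently sampled alignment.
    reference = Counter(tuple(s) for s in samples).most_common(1)[0][0]
    ref_partners = residue_columns(reference)
    sample_partners = [residue_columns(s) for s in samples]
    support = {
        residue: sum(sp[residue] == ref for sp in sample_partners) / len(samples)
        for residue, ref in ref_partners.items()
    }
    return reference, support

# Three hypothetical posterior samples for the sequences ACGT and ACTGT.
samples = [
    ["AC-GT", "ACTGT"],
    ["AC-GT", "ACTGT"],
    ["ACG-T", "ACTGT"],
]
reference, support = residue_support(samples)
```

In this example, the G of the first sequence and the T and G at positions 3 and 4 of the second sequence receive support 2/3 because their column assignments differ in one of the three samples. Every other residue has support 1. An AU-style display could map such values to background colours.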

BAli-Phy can also sample from more traditional phylogenetic models in which the alignment of the leaf sequences is fixed. In this case, users can either include the indel-process prior, so that BAli-Phy samples the alignment states of the internal nodes, or exclude this prior completely. When the indel prior is included, indels shared by common descent influence the posterior. Excluding it yields a residue-substitution-only model such as that sampled by MrBayes (Huelsenbeck and Ronquist, 2001).

BAli-Phy currently implements several CTMC processes for residue substitution, including the JC69, HKY85 and TN93 nucleotide models, several codon-based models and empirically estimated amino acid models. Gamma-distributed rate variation and invariant-sites extensions are available. Sequential estimation generally assumes a naive substitution model during the alignment phase. In contrast, joint estimation lets any of these more sophisticated processes inform the alignment. We distribute BAli-Phy as C++ source code and precompiled binaries. BAli-Phy should run on any hardware with a modern operating system such as Linux, Windows or Mac OS X.
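As an example of one of these CTMC processes, the sketch below implements the closed-form JC69 transition probabilities. It also adds an invariant-sites component to the likelihood of two aligned, gap-free nucleotide sequences. It assumes branch lengths are measured in expected substitutions per site and omits gamma-distributed rate variation. The function names are hypothetical, and this is not BAli-Phy's code.

```python
import math

def jc69_prob(x: str, y: str, t: float) -> float:
    """JC69 probability that base x becomes base y along a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def pair_log_likelihood(seq1: str, seq2: str, t: float, p_inv: float = 0.0) -> float:
    """Log-likelihood of two aligned, gap-free sequences under JC69+INV."""
    total = 0.0
    for x, y in zip(seq1, seq2):
        variable = 0.25 * jc69_prob(x, y, t)  # equilibrium frequency 1/4 times P(y | x, t)
        invariant = 0.25 if x == y else 0.0   # an invariant site can only be constant
        total += math.log((1.0 - p_inv) * variable + p_inv * invariant)
    return total

def ml_distance(seq1: str, seq2: str, p_inv: float = 0.0) -> float:
    """Crude grid-search maximum-likelihood estimate of t (illustration only)."""
    grid = [i / 1000 for i in range(1, 3000)]
    return max(grid, key=lambda t: pair_log_likelihood(seq1, seq2, t, p_inv))

s1 = "ACGTACGTAC"
s2 = "ACGTTCGTAA"  # differs from s1 at 2 of 10 sites
t_hat = ml_distance(s1, s2)
```

For p_inv = 0, the grid estimate agrees, to grid resolution, with the closed-form JC69 distance −(3/4) ln(1 − 4p/3) ≈ 0.233, where p = 0.2 is the proportion of differing sites. Setting p_inv > 0 moves probability mass onto constant sites, which is what the invariant-sites extension models.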


    EXAMPLE
Considerable debate surrounds the early history of life on Earth. Molecular sequences across the Tree of Life are highly diverse and troublesome to align. Phylogenies based on some arbitrary alignments suggest three major domains of all organisms: Eubacteria, Archaea and Eukaryotes. Phylogenies based on other alignments support four, in which the Archaea are subdivided into Euryarchaeota and Eocytes (Brown and Doolittle, 1997). We examine elongation factor 1α/Tu sequences from 24 species using the WAG+Γ+INV substitution model. Figure 1 presents the most probable evolutionary tree relating these sequences and a portion of the AU plot. This highly supported tree divides life into at least four domains, supporting the Eocyte hypothesis without conditioning on a single alignment. The shared indel event labeled ‘A’ in the figure has been hypothesized previously (Rivera and Lake, 1992) and strengthens this conclusion.


Figure 1
Fig. 1 Maximum a posteriori topology for 24 EF-1α/Tu sequences across the Tree of Life (left) and two separate portions of the alignment uncertainty (AU) plot (right) for a subsample of these sequences. Branch lengths equal posterior mean estimates, and line style depicts partition credibility. In the AU plot, well-resolved entries have a red background, whereas less certain entries have backgrounds tending towards violet, based on an approximate probability that each entry is homologous with a residue at the root in each column. Four different types of topologically informative insertion/deletion events (A, B, C and D) are highlighted.

 

    Acknowledgments
 
B.D.R. was supported by NIH training grant GM008185 and NSF training grant DGE9987641. M.A.S. is partially supported by NIH grant GM068955 and USPHS grant CA16042.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Keith A Crandall

Received on February 17, 2006; revised on April 28, 2006; accepted on May 1, 2006

    REFERENCES

    Brown, J. and Doolittle, W. (1997) Archaea and the prokaryote-to-eukaryote transition. Microbiol. Mol. Biol. Rev., 61, 456–502.

    Holder, M. and Lewis, P. (2003) Phylogeny estimation: traditional and Bayesian approaches. Nat. Rev. Genet., 4, 275–284.

    Holmes, I. and Bruno, W. (2001) Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics, 17, 803–820.

    Huelsenbeck, J. and Ronquist, F. (2001) MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics, 17, 754–755.

    Lake, J. (1991) The order of sequence alignment can bias the selection of tree topology. Mol. Biol. Evol., 8, 378–385.

    Redelings, B. and Suchard, M. (2005) Joint Bayesian estimation of alignment and phylogeny. Syst. Biol., 54, 401–418.

    Rivera, M. and Lake, J. (1992) Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science, 257, 74–76.

    Scott, S. (2002) Bayesian methods for hidden Markov models: recursive computing in the 21st century. J. Am. Stat. Assoc., 97, 337–351.

    Tierney, L. (1994) Markov chains for exploring posterior distributions (with discussion). Ann. Stat., 22, 1701–1762.




This article has been cited by other articles:

I. Miklos, A. Novak, R. Satija, R. Lyngso, and J. Hein. Stochastic models of sequence evolution including insertion–deletion events. Statistical Methods in Medical Research, October 1, 2009; 18(5): 453–485.

L.-J. Liang, R. E. Weiss, B. Redelings, and M. A. Suchard. Improving phylogenetic analyses by incorporating additional information from genetic sequence databases. Bioinformatics, October 1, 2009; 25(19): 2530–2536.

B. Misof and K. Misof. A Monte Carlo Approach Successfully Identifies Randomness in Multiple Sequence Alignments: A More Objective Means of Data Exclusion. Syst. Biol., May 20, 2009; syp006v1.

M. Anisimova and C. Kosiol. Investigating Protein-Coding Sequence Evolution with Probabilistic Codon Substitution Models. Mol. Biol. Evol., February 1, 2009; 26(2): 255–271.

K. Somogyi, B. Sipos, Z. Penzes, E. Kurucz, J. Zsamboki, D. Hultmark, and I. Ando. Evolution of Genes and Repeats in the Nimrod Superfamily. Mol. Biol. Evol., November 1, 2008; 25(11): 2337–2347.

A. Novak, I. Miklos, R. Lyngso, and J. Hein. StatAlign: an extendable software package for joint Bayesian estimation of alignments and evolutionary trees. Bioinformatics, October 15, 2008; 24(20): 2403–2404.

A. D. Fernandes and W. R. Atchley. Site-specific evolutionary rates in proteins are better modeled as non-independent and strictly relative. Bioinformatics, October 1, 2008; 24(19): 2177–2183.

K. M. Wong, M. A. Suchard, and J. P. Huelsenbeck. Alignment Uncertainty and Genomic Analysis. Science, January 25, 2008; 319(5862): 473–476.

R. K. Bradley and I. Holmes. Transducers: an emerging probabilistic framework for modeling indels on trees. Bioinformatics, December 1, 2007; 23(23): 3258–3262.

A. Tomovic and E. J. Oakeley. Quality estimation of multiple sequence alignments by Bayesian hypothesis testing. Bioinformatics, September 15, 2007; 23(18): 2488–2490.

X. Didelot and D. Falush. Inference of Bacterial Microevolution Using Multilocus Sequence Data. Genetics, March 1, 2007; 175(3): 1251–1266.