Bioinformatics Advance Access originally published online on August 6, 2009
Bioinformatics 2009 25(19):2530-2536; doi:10.1093/bioinformatics/btp473

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Improving phylogenetic analyses by incorporating additional information from genetic sequence databases

Li-Jung Liang 1,*, Robert E. Weiss 2, Benjamin Redelings 3 and Marc A. Suchard 2,4

1 Department of Medicine Statistics Core, 2 Department of Biostatistics, UCLA School of Public Health, Los Angeles, CA 90095, 3 Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27606 and 4 Departments of Biomathematics and Human Genetics, UCLA School of Medicine, Los Angeles, CA 90095, USA

*To whom correspondence should be addressed.


Abstract

Motivation: Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to those in the analysis of interest.

Results: We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion–deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BAli-Phy; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
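The idea of recycling posterior samples can be illustrated with a simplified sketch; it is not the authors' algorithm. It assumes (i) each auxiliary alignment has already been analysed on its own, leaving a stored list of posterior draws for one scalar parameter; (ii) those analyses used a prior that is flat over the sampled range, so the importance weight of a stored draw reduces to the density of the new hierarchical prior; and (iii) a normal random-effects distribution θ_k ~ N(μ, τ²) with a flat prior on μ and an inverse-gamma prior on τ², standing in for the paper's semiparametric model. The function names, hyperparameter values and synthetic data are hypothetical.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def recycle_gibbs(draws, n_iter=2000, a0=2.0, b0=1.0, seed=1):
    """Hypothetical Gibbs sampler for theta_k ~ N(mu, tau^2) that reuses stored
    per-dataset posterior draws instead of rerunning each analysis.

    Assumes the stored draws came from analyses with a flat prior on theta,
    so each draw's importance weight is just the current N(mu, tau^2) density.
    draws: list of K lists of posterior samples. Returns (mu, tau) samples.
    """
    rng = random.Random(seed)
    K = len(draws)
    theta = [d[0] for d in draws]  # start each dataset at its first stored draw
    mu, tau = sum(theta) / K, 1.0
    out = []
    for _ in range(n_iter):
        # Step 1: reweight every stored draw under the current hierarchical
        # prior and resample one per dataset (sampling-importance-resampling).
        for k, d in enumerate(draws):
            logw = [normal_logpdf(x, mu, tau) for x in d]
            m = max(logw)  # subtract the max for numerical stability
            w = [math.exp(lw - m) for lw in logw]
            theta[k] = rng.choices(d, weights=w, k=1)[0]
        # Step 2: tau^2 | theta, mu ~ InvGamma(a0 + K/2, b0 + SS/2),
        # drawn as the reciprocal of a Gamma(shape, scale = 1/rate) precision.
        ss = sum((t - mu) ** 2 for t in theta)
        precision = rng.gammavariate(a0 + K / 2, 1.0 / (b0 + ss / 2))
        tau = math.sqrt(1.0 / precision)
        # Step 3: mu | theta, tau^2 ~ N(mean(theta), tau^2 / K) under a flat prior.
        mu = rng.gauss(sum(theta) / K, tau / math.sqrt(K))
        out.append((mu, tau))
    return out


def predictive_prior(samples, burn_in=500, seed=2):
    """Draws of theta_new ~ N(mu, tau^2), averaged over hyperparameter samples;
    these serve as an informative prior for the parameter in a new analysis."""
    rng = random.Random(seed)
    return [rng.gauss(mu, tau) for mu, tau in samples[burn_in:]]


if __name__ == "__main__":
    gen = random.Random(0)
    # Synthetic stand-ins for per-alignment posterior draws of, e.g., an indel rate.
    centers = [0.8, 1.0, 1.2, 0.9, 1.1]
    draws = [[gen.gauss(c, 0.3) for _ in range(200)] for c in centers]
    prior = predictive_prior(recycle_gibbs(draws))
    m = sum(prior) / len(prior)
    sd = math.sqrt(sum((p - m) ** 2 for p in prior) / (len(prior) - 1))
    print(f"informative prior: mean {m:.2f}, sd {sd:.2f}")
```

Each sweep only resamples stored draws, so the costly per-alignment phylogenetic runs are done once. That reuse is the saving that recycling posterior samples provides.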

Contact: liangl{at}ucla.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Martin Bishop


Received on February 7, 2009; revised on July 28, 2009; accepted on July 28, 2009
