Bioinformatics Advance Access originally published online on January 19, 2007
Bioinformatics 2007 23(7):815-824; doi:10.1093/bioinformatics/btm015
Assessment of phylogenomic and orthology approaches for phylogenetic inference
¹Center for Molecular and Biomolecular Informatics/Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Center, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands, ²Centraalbureau voor Schimmelcultures, Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands and ³Bioinformatics Group, Department of Biology, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
*To whom correspondence should be addressed.
Abstract
Motivation: Phylogenomics integrates the vast amount of phylogenetic information contained in complete genome sequences, and is rapidly becoming the standard for reliably inferring species phylogenies. There are, however, fundamental differences between the ways in which phylogenomic approaches like gene content, superalignment, superdistance and supertree integrate the phylogenetic information from separate orthologous groups. Furthermore, they all depend on the method by which the orthologous groups are initially determined. Here, we systematically compare these four phylogenomic approaches, in parallel with three approaches for large-scale orthology determination: pairwise orthology, cluster orthology and tree-based orthology.
Results: Including various phylogenetic methods, we apply a total of 54 fully automated phylogenomic procedures to the fungi, the eukaryotic clade with the largest number of sequenced genomes, for which we retrieved a gold standard phylogeny from the literature. Phylogenomic trees based on gene content show, relative to the other methods, a bias in the tree topology that parallels convergence in lifestyle among the species compared, indicating convergence in gene content.
Conclusions: Complete genomes are no guarantee for good or even consistent phylogenies. However, the large amounts of data in genomes enable us to carefully select the data most suitable for phylogenomic inference. In terms of performance, the superalignment approach, combined with restrictive orthology, is the most successful in recovering a fungal phylogeny that agrees with current taxonomic views, and allows us to obtain a high-resolution phylogeny. We provide solid support for what has grown to be a common practice in phylogenomics during its advance in recent years.
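As a rough illustration of the superalignment idea named above, the following sketch concatenates the alignments of separate orthologous groups into one long alignment per species, padding with gaps where a species has no member in a group. It is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `build_superalignment`, the gap-padding rule, and the toy species and sequences are all hypothetical.

```python
from typing import Dict, List


def build_superalignment(
    groups: List[Dict[str, str]],
    species: List[str],
    gap: str = "-",
) -> Dict[str, str]:
    """Concatenate per-group alignments into one superalignment.

    Each element of `groups` maps a species name to its aligned sequence
    for one orthologous group. Species missing from a group are padded
    with gap characters (an assumption made for this sketch).
    """
    parts: Dict[str, List[str]] = {name: [] for name in species}
    for alignment in groups:
        # All sequences within one group alignment must share one length.
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for name in species:
            parts[name].append(alignment.get(name, gap * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical toy data: two orthologous groups, three species.
toy_groups = [
    {"A": "MKV", "B": "MRV", "C": "MKI"},
    {"A": "GA-T", "C": "GAST"},  # species B has no ortholog in this group
]
superalignment = build_superalignment(toy_groups, ["A", "B", "C"])
for name, sequence in superalignment.items():
    print(name, sequence)
```

The concatenated sequences would then be passed to a single tree-building step, whereas a supertree approach would instead infer one tree per orthologous group and combine the trees afterwards.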
Contact: dutilh{at}cmbi.ru.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Martin Bishop
Received on October 30, 2006; revised on January 15, 2007; accepted on January 15, 2007

