Skip Navigation


Bioinformatics Advance Access originally published online on August 23, 2006
Bioinformatics 2006 22(21):2688-2690; doi:10.1093/bioinformatics/btl446
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary data
Right arrowOA All Versions of this Article:
22/21/2688    most recent
btl446v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (63)
Google Scholar
Right arrow Articles by Stamatakis, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Stamatakis, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models

Alexandros Stamatakis

Swiss Federal Institute of Technology Lausanne, School of Computer and Communication Sciences Lab Prof. Moret, STATION 14, CH-1015 Lausanne, Switzerland


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 OPTIMIZATIONS OF RAxML
 3 RESULTS AND DISCUSSION
 4 CONCLUSION AND FUTURE...
 REFERENCES
 

Summary: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+{Gamma} yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2–3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25 057 (1463 bp) and 2182 (51 089 bp) taxa, respectively.

Availability: icwww.epfl.ch/~stamatak

Contact: Alexandros.Stamatakis{at}epfl.ch

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 OPTIMIZATIONS OF RAxML
 3 RESULTS AND DISCUSSION
 4 CONCLUSION AND FUTURE...
 REFERENCES
 
Phylogenetic inference with the maximum likelihood (ML) method is NP-hard (Chor and Tuller, 2005). Despite the algorithmic complexity and the high-computational cost of ML, significant progress has been achieved with the release of fast and accurate programs such as PHYML (Guindon and Gascuel, 2003), IQPNNI (Minh et al., 2005), MrBayes (Ronquist and Huelsenbeck, 2003), GARLI (Zwickl, 2006) and RAxML (Stamatakis et al., 2005). Most of these programs allow for inference of 1000 taxon trees on a single CPU in <24 h.

This paper describes the new version of RAxML [Randomized axelerated maximum likelihood for high performance computing (RAxML-VI-HPC, v2.0.1)], which is significantly faster than the previous versions of RAxML due to simple, yet very efficient technical optimizations and a slight alteration of the search algorithm. In addition, RAxML has been parallelized with MPI to enable parallel bootstrapping and multiple inferences on distinct starting trees on PC clusters. Moreover, it implements bifurcating and multifurcating constraint trees and the capability to assign and estimate separate model parameters1 for individual genes of multi-gene alignments (mixed/partitioned models).

The main focus is on the computation of huge trees (≥1000 taxa) for real-world data and the comparative performance study with GARLI, IQPNNI, MrBayes and PHYML. Since the efficiency of the novel optimizations in RAxML-VI-HPC increases with the number of taxa, less significant performance improvements will be observed on smaller datasets. Performance comparisons of RAxML with other popular ML programs on smaller datasets, including simulated alignments, can be found in Hordijk and Gascuel (2005), Stamatakis et al. (2005) and Zwickl (2006). Finally, the experimental study also shows that the GTR+CAT approximation [see Stamatakis (2006) for a detailed description] can be efficiently deployed as a replacement for the significantly more compute- and memory intensive GTR+{Gamma} model.

Some of the largest published ML-based analyses to date have been conducted using RAxML (Robertson et al., 2005; Ley et al., 2005, 2006). On-going work includes the computation of a backbone tree for Bacteria with ~9000 taxa, a phylogeny for Acer with 582 taxa, and the analysis of a mammalian multi-gene alignment comprising 2182 sequences.


    2 OPTIMIZATIONS OF RAxML
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 OPTIMIZATIONS OF RAxML
 3 RESULTS AND DISCUSSION
 4 CONCLUSION AND FUTURE...
 REFERENCES
 
A detailed description of the optimizations listed below is provided in the on-line supplement. The main improvements cover:

  • An efficient mechanism to store and re-store topologies and branch lengths via rearrangement descriptors.
  • A consequent re-use of partial likelihood vectors.
  • A dynamic adaptation of the rearrangement distance.
  • Low-level optimization of the GTR+CAT and GTR+{Gamma} likelihood functions.
  • An efficient re-implementation of Maximum Parsimony starting tree computations.

An important and generally applicable insight from those optimizations is that storing and re-storing an unrooted tree topology with 2n–3 branch lengths and 2n–2 nodes can become a major performance bottleneck for trees with >1000 taxa. It is thus important to store alternative topologies as a sequence of topological changes applied to the current topology rather than as complete data object. Only the consequent avoidance of storage operations reveals the actual power of the Lazy Subtree Rearrangement (LSR) mechanism introduced in Stamatakis et al. (2005).

Another issue which becomes important for huge trees is to determine a ‘good’ rearrangement distance, i.e. re-insertion radius for the LSR moves. In RAxML-VI the algorithm initially determines the best rearrangement distance by applying distances of 5, 10, ... , 25 for one iteration of LSRs, to the starting tree. The minimum rearrangement distance which yields the best likelihood improvement on the starting tree is then selected for the inference. Despite the extra computations which are performed, a ‘good’ rearrangement distance pays off in terms of likelihood units for huge alignments with large evolutionary diameters (e.g. the 6722 and 7769 taxa alignments, see Supplementary Table 2).


    3 RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 OPTIMIZATIONS OF RAxML
 3 RESULTS AND DISCUSSION
 4 CONCLUSION AND FUTURE...
 REFERENCES
 
The exact experimental set-up as well as the results are described in detail in the on-line supplement. Table and Figure numbers also refer to the on-line supplement.

Results in Supplementary Table 2 show that RAxML-VI-HPC clearly outperforms RAxML-V in terms of inference times. In addition, due to the usage of a ‘good’ rearrangement setting it also yields significantly better log-likelihood values on the larger and more diverse datasets ≥4000 taxa. Supplementary Figure 3 shows the significant computational advantages of the GTR+CAT over the GTR+{Gamma} implementation in RAxML-VI.

Supplementary Tables 3–6 indicate that RAxML-VI-HPC outperforms other current sequential phylogeny programs, on huge datasets with respect to inference times, memory consumption as well as final log-likelihood values. In addition, the performance advantage with respect to run-times increases with growing alignment size (Supplementary Table 5). Another important result is that the GTR+CAT approximation (Supplementary Table 3) can be used to significantly reduce memory consumption and still yield significantly better GTR+{Gamma} likelihood values (Supplementary Table 4) than competing programs.

GARLI terminated within approximately the same time as RAxML-VI-HPC on the six smaller datasets and yielded the second-best likelihood score in all cases. This is an astonishing achievement for several reasons: GARLI implements a genetic search algorithm and was executed under GTR+{Gamma}. Moreover, it maintains a whole population of trees in memory, including some intelligently selected (Zwickl, 2006) partial likelihood vectors as well as all tree topologies. Thus, it is expected to be slower than the RAxML hill-climbing algorithm. This extraordinary performance is due to the sophisticated implementation of the likelihood function and promising algorithmic ideas (Zwickl, 2006) such that the forthcoming publication about GARLI is surely something to look forward to. Note that, the parallel genetic search algorithm of GARLI performs a distinct and more thorough search, that yields, e.g. better final trees on the 1000 taxon alignment (Zwickl, 2006). However, the focus of the current study is on the strictly sequential versions of all programs.

The performance of the new version of MrBayes is also remarkable. Given that it has to maintain four distinct Markov chains, the relatively low memory consumption in combination with acceptable likelihood values after 60 h under GTR+{Gamma}, the performance is quite impressive. As Bayesian inference conceptually differs from pure ML-based inference, a comparison based on likelihood scores is certainly not fair since it uses MrBayes as an ML heuristic. MrBayes has mainly been included owing to its popularity.

IQPNNI and PHYML both suffer from a relatively inefficient technical implementation. The high memory consumption of IQPNNI and PHYML is due to a different memory organization which uses two likelihood vectors per branch (3n – 6 vectors) instead of one per inner node (n – 2 vectors).

Moreover, PHYML uses NNI moves which only exploit a very small fraction of the search space. A solution to this problem has been proposed by Hordijk and Gascuel (2005). However, the respective program is currently only available as proof-of-concept implementation (W. Hordijk and O. Gascuel, personal communication) and cannot be used for large trees owing to numerical problems.

In the final analysis, it can be stated that technical implementation aspects are becoming increasingly important and can yield significant performance improvements. In addition, in all programs there exist excellent algorithmic ideas which in the optimal case could significantly advance the field, when merged into one program.


    4 CONCLUSION AND FUTURE WORK
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 OPTIMIZATIONS OF RAxML
 3 RESULTS AND DISCUSSION
 4 CONCLUSION AND FUTURE...
 REFERENCES
 
The new version VI of RAxML has been presented, which incorporates efficient technical optimizations, parallel OpenMP- and MPI-based implementations, and a mixed model implementation. A thorough experimental study on large real-world datasets shows that RAxML can find better trees with a significantly lower memory consumption within similar or less time than the best competing program.

Future work will mainly cover the development of new methods for rapid bootstrapping. Despite the fact, that RAxML and GARLI allow for inference of huge trees with ML in reasonable times, conducting a full biological analysis still requires at least 100 or 1000 bootstraps which places the computational burden much higher than for the inference of a single ML tree.


    Acknowledgments
 
The author would like to thank Derrick Zwickl, Wim Hordijk, Olivier Gascuel, B.Q. Minh, L.S. Vinh and Bret Larget for useful discussions on experimental set-up and their programs. He would also like to thank Usman Roshan, Charles Robertson, Josh Wilcox, Robin Gutell and Daniel Dalevi for providing the alignment data. Funding to pay the Open Access publication charges for this article was provided by Swiss Confederation Funding.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Keith A Crandall

1CAT and {Gamma} cannot be used simultaneously in the same analysis. Back

Received on May 15, 2006; revised on July 21, 2006; accepted on August 16, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 OPTIMIZATIONS OF RAxML
 3 RESULTS AND DISCUSSION
 4 CONCLUSION AND FUTURE...
 REFERENCES
 

    Chor, B. and Tuller, T. (2005) Maximum likelihood of evolutionary trees: hardness and approximation. Bioinformatics, 21, 97–106.

    Guindon, S. and Gascuel, O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol, . 52, 696–704[CrossRef][ISI][Medline].

    Hordijk, W. and Gascuel, O. (2005) Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood. Bioinformatics, 21, 4338–4347[Abstract/Free Full Text].

    Ley, R., et al. (2005) Obesity alters gut microbial ecology. Proc. Natl Acad. Sci. USA, 102, 11070–11075[Abstract/Free Full Text].

    Ley, R.E., et al. (2006) Unexpected diversity and complexity of the guerrero negro hypersaline microbial mat. Appl. Envir. Microbiol, . 72, 3685–3695[Abstract/Free Full Text].

    Minh, B.Q., et al. (2005) pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies. Bioinformatics, 21, 3794–3796[Abstract/Free Full Text].

    Robertson, C., et al. (2005) Phylogenetic diversity and ecology of environmental Archaea. Curr. Opin. Microbiol, . 8, 638–642[ISI][Medline].

    Ronquist, F. and Huelsenbeck, J. (2003) Mrbayes 3: bayesian phylogenetic inference under mixed models. Bioinformatics, 19, 1572–1574[Abstract/Free Full Text].

    Stamatakis, A. (2006) Phylogenetic models of rate heterogeneity: a high performance computing perspective. Proceedings of the IPDPS2006Rhodos, Greece.

    Stamatakis, A., et al. (2005) Raxml-iii: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics, 21, 456–463[Abstract/Free Full Text].

    Zwickl, D. (2006) Genetic algorithm approaches for the phylogenetic analysis of large biologiical sequence datasets under the maximum likelihood criterion. PhD thesis,, TX University of Texas at Austin.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Mol Biol EvolHome page
R. Burri, H. N. Hirzel, N. Salamin, A. Roulin, and L. Fumagalli
Evolutionary Patterns of MHC Class II B in Owls and Their Implications for the Understanding of Avian MHC Evolution
Mol. Biol. Evol., June 1, 2008; 25(6): 1180 - 1191.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
I. Ruiz-Trillo, A. J. Roger, G. Burger, M. W. Gray, and B. F. Lang
A Phylogenomic Investigation into the Origin of Metazoa
Mol. Biol. Evol., April 1, 2008; 25(4): 664 - 672.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
A. Zuccaro, C. L. Schoch, J. W. Spatafora, J. Kohlmeyer, S. Draeger, and J. I. Mitchell
Detection and Identification of Fungi Intimately Associated with the Brown Seaweed Fucus serratus
Appl. Envir. Microbiol., February 15, 2008; 74(4): 931 - 941.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
T. A. Isenbarger, M. Finney, C. Rios-Velazquez, J. Handelsman, and G. Ruvkun
Miniprimer PCR, a New Lens for Viewing the Microbial World
Appl. Envir. Microbiol., February 1, 2008; 74(3): 840 - 849.
[Abstract] [Full Text] [PDF]


Home page
ANN BOT (LOND)Home page
G. W. Grimm and T. Denk
ITS Evolution in Platanus (Platanaceae): Homoeologues, Pseudogenes and Ancient Hybridization
Ann. Bot., February 1, 2008; 101(3): 403 - 419.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
N. Cusimano, L.-B. Zhang, and S. S. Renner
Reevaluation of the cox1 Group I Intron in Araceae and Angiosperms Indicates a History Dominated by Loss rather than Horizontal Transfer
Mol. Biol. Evol., February 1, 2008; 25(2): 265 - 276.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
J. B. Dacks, P. P. Poon, and M. C. Field
Phylogeny of endocytic components yields insight into the process of nonendosymbiotic organelle evolution
PNAS, January 15, 2008; 105(2): 588 - 593.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
J. W. Sahl, R. Schmidt, E. D. Swanner, K. W. Mandernack, A. S. Templeton, T. L. Kieft, R. L. Smith, W. E. Sanford, R. L. Callaghan, J. B. Mitton, et al.
Subsurface Microbial Diversity in Deep-Granitic-Fracture Water in Colorado
Appl. Envir. Microbiol., January 1, 2008; 74(1): 143 - 152.
[Abstract] [Full Text] [PDF]


Home page
Integr. Comp. Biol.Home page
D. Q. Matus, K. M. Halanych, and M. Q. Martindale
The Hox gene complement of a pelagic chaetognath, Flaccisagitta enflata
Integr. Comp. Biol., December 1, 2007; 47(6): 854 - 864.
[Abstract] [Full Text] [PDF]


Home page
Integr. Comp. Biol.Home page
W. E. Browne, S. H. D. Haddock, and M. Q. Martindale
Phylogenetic analysis of lineage relationships among hyperiid amphipods as revealed by examination of the mitochondrial gene, cytochrome oxidase I (COI)
Integr. Comp. Biol., December 1, 2007; 47(6): 815 - 830.
[Abstract] [Full Text] [PDF]


Home page
Int ImmunolHome page
K. Rannikko, C. Ortutay, and M. Vihinen
Immunity genes and their orthologs: a multi-species database
Int. Immunol., December 1, 2007; 19(12): 1361 - 1370.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
V. Soria-Carrasco, G. Talavera, J. Igea, and J. Castresana
The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees
Bioinformatics, November 1, 2007; 23(21): 2954 - 2956.
[Abstract] [Full Text] [PDF]


Home page
Plant CellHome page
J. K. Hane, R. G.T. Lowe, P. S. Solomon, K.-C. Tan, C. L. Schoch, J. W. Spatafora, P. W. Crous, C. Kodira, B. W. Birren, J. E. Galagan, et al.
Dothideomycete Plant Interactions Illuminated by Genome Sequencing and EST Analysis of the Wheat Pathogen Stagonospora nodorum
PLANT CELL, November 1, 2007; 19(11): 3347 - 3368.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
J. R. Spear, H. A. Barton, C. E. Robertson, C. A. Francis, and N. R. Pace
Microbial Community Biofabrics in a Geothermal Mine Adit
Appl. Envir. Microbiol., October 1, 2007; 73(19): 6172 - 6180.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
K. M. Haen, B. F. Lang, S. A. Pomponi, and D. V. Lavrov
Glass Sponges and Bilaterian Animals Share Derived Mitochondrial Genomic Features: A Common Ancestry or Parallel Evolution?
Mol. Biol. Evol., July 1, 2007; 24(7): 1518 - 1527.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
M. P. Heinicke, W. E. Duellman, and S. B. Hedges
From the Cover: Major Caribbean and Central American frog faunas originated by ancient oceanic dispersal
PNAS, June 12, 2007; 104(24): 10092 - 10097.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. Gottschling, A. Stamatakis, I. Nindl, E. Stockfleth, A. Alonso, and I. G. Bravo
Multiple Evolutionary Mechanisms Drive Papillomavirus Diversification
Mol. Biol. Evol., May 1, 2007; 24(5): 1242 - 1258.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary data
Right arrowOA All Versions of this Article:
22/21/2688    most recent
btl446v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (63)
Google Scholar
Right arrow Articles by Stamatakis, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Stamatakis, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?