Skip Navigation


Bioinformatics Advance Access originally published online on September 5, 2007
Bioinformatics 2007 23(23):3263-3264; doi:10.1093/bioinformatics/btm432
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/23/3263    most recent
btm432v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (2)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Holmes, I.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Holmes, I.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Phylocomposer and phylodirector: analysis and visualization of transducer indel models

Ian Holmes

Department of Bioengineering, University of California, Berkeley CA, USA

To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 ACKNOWLEDGEMENTS
 REFERENCES
 

Summary: Finite-state string transducers are probabilistic tools similar to Hidden Markov Models that can be systematically extended to large number of sequences related by indel and substitution processes on phylogenetic trees. The number of states in such models grows exponentially with the number of nodes in the tree, with the consequence that even quite small trees can be difficult to analyze or visualize. Here, we present two tools, phylocomposer and phylodirector, for working with string transducers. The former tool implements previously described composition algorithms for extending transducers to arbitrary tree topologies, while the latter generates short animations for arbitrary input alignments and phylogenetic trees, illustrating the state path through the composed transducer.

Availability: Phylocomposer and phylodirector are freely available at http://biowiki.org/PhyloComposer and http://biowiki.org/PhyloDirector

Contact: ihh{at}berkeley.edu

String transducers are finite-state automata with input and output tapes. They are closely related to Hidden Markov Models (HMMs) and indeed may be regarded as conditionally-normalized Pair HMMs. Given a string transducer and a phylogenetic guide tree, one can systematically derive a multi-sequence HMM that correctly handles nested indels and neighbor-dependent substitutions (Holmes, 2003).

The algorithm to do this, known as transducer composition, has considerable application throughout bioinformatics. These applications include ‘statistical alignment’ (Holmes and Bruno, 2001), comparative genome annotation (Lunter et al., 2006), phylogeny (Lunter et al., 2005) and measurement of evolutionary rates (Holmes, 2005).

Figure 1 and Table 1 illustrate why the composition problem is nontrivial, prompting automation: the number of states in the composed machine increases exponentially with number of nodes in the tree. Even though the single-branch transducer (A) corresponds to the simplest possible indel model (Thorne et al., 1991), the largest composed transducer (E) has 620 transitions.


Figure 1
View larger version (47K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Composed phylo-transducers for the TKF91 model (Thorne et al., 1991), generated using phylocomposer and GraphViz. The basic transducer (A), which is self-similar when composed in series (B), can be used to infer common ancestors (C) or sample over alignments (D) or phylogenies (E).

 

View this table:
[in this window]
[in a new window]

 
Table 1. State spaces of composed TKF91 transducers

 
These data were generated with phylocomposer. The program takes as input a file describing a phylogenetic tree, one or more transducers (represented as Moore machines) and a mapping from branches to transducers. The file format is based on Lisp S-expressions (Rivest, 1997). The output is a description of the composed phylo-transducer, including the algebraic structure of transition and emission probabilities. In addition, the program generates dotfiles suitable for input to GraphViz, which was used to produce Figure 1. It can also be used for dynamic programming alignment and inference. The program is written in C++ and requires the GNU gcc and make utilities.

The phylocomposer distribution includes the examples of Figure 1 along with more realistic transducers, incorporating geometric and mixture-of-geometric length distributions (i.e. affine and general convex gap penalties).

Applications of the HMMs generated by the program include multiple alignment [including progressive (Higgins and Sharp, 1989), multidimensional (Sankoff and Cedergren, 1983) and MCMC (Holmes and Bruno, 2001)]; simultaneous phylogeny and alignment (Lunter et al., 2005); inference of ancestral ‘protosequences’; measurement of evolutionary rates and analysis of indel models.

Figure 2 illustrates the output of phylodirector. The program takes as input a Stockholm-format alignment, including a Newick-format phylogenetic tree. The output is an MPEG-format animation illustrating the state path through the composed phylo-transducer required to generate the tree, along with a still image for each column of the alignment. The program requires Perl, the GD graphics library (and GD.pm Perl module) and the Berkeley MPEG Encoder.


Figure 2
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Three frames from a phylodirector movie, showing alignments of 12 Drosophila species to Flybase gene CG31973.

 
Animations produced by phylodirector are available at biowiki.org/PhyloFilm.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 ACKNOWLEDGEMENTS
 REFERENCES
 
IH was funded in part by NIH/NHGRI grant 1R01GM076705-01.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on April 9, 2007; revised on August 1, 2007; accepted on August 17, 2007

    REFERENCES
 TOP
 ABSTRACT
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Higgins DG, Sharp PM. Fast and sensitive multiple sequence alignments on a microcomputer. Comput. Appl. Biosci (1989) 5:151–153.[Abstract/Free Full Text]

    Holmes I. Using guide trees to construct multiple-sequence evolutionary HMMs. Bioinformatics (2003) 19(Suppl. 1):i147–i157.[Abstract]

    Holmes I. Using evolutionary expectation maximization to estimate indel rates. Bioinformatics (2005) 21:2294–2300.[Abstract/Free Full Text]

    Holmes I, Bruno WJ. Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics (2001) 17:803–820.[Abstract/Free Full Text]

    Lunter G, et al. Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics (2005) 6:83.[CrossRef][Medline]

    Lunter G. Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput. Biol (2006) 2.

    Rivest R. S-expressions. Internet Working Draft. (1997) http://theory.lcs.mit.edu/rivest/sexp.txt.

    Sankoff D, Cedergren RJ. Simultaneous comparison of three or more sequences related by a tree. In: Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison—Sankoff D, Kruskal JB, eds. (1983) chapter 9, Addison-Wesley, Reading, MA, pp. 253–264.

    Thorne JL, et al. An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol (1991) 33:114–124.[CrossRef][Web of Science][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
R. Satija, L. Pachter, and J. Hein
Combining statistical alignment and phylogenetic footprinting to detect regulatory elements
Bioinformatics, May 15, 2008; 24(10): 1236 - 1242.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/23/3263    most recent
btm432v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (2)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Holmes, I.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Holmes, I.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?