Bioinformatics Advance Access originally published online on August 21, 2008
Bioinformatics 2008 24(20):2401-2402; doi:10.1093/bioinformatics/btn453
ProfDistS: (profile-) distance based phylogeny on sequence—structure alignments
1Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, 97074 Würzburg, Germany
*To whom correspondence should be addressed.
ABSTRACT
Motivation: The Profile Neighbor Joining (PNJ) algorithm, as implemented in the software ProfDist, is computationally efficient in reconstructing very large trees. Besides the huge amount of sequence data, RNA secondary structure is important in RNA alignment analysis and phylogenetic reconstruction.
Results: ProfDistS provides a phylogenetic workflow that uses individual RNA secondary structures to reconstruct phylogenies from sequence-structure alignments, using PNJ with either manual or iterative, automatic profile definition. ProfDistS can also handle protein sequences.
Availability: ProfDistS is freely available for non-commercial use for the Windows, Linux and Mac operating systems at http://profdist.bioapps.biozentrum.uni-wuerzburg.de.
Contact: tobias.mueller{at}biozentrum.uni-wuerzburg.de; matthias.wolf{at}biozentrum.uni-wuerzburg.de
1 THE WORKFLOW OF PROFDISTS
An important task in sequence analysis is the reconstruction of phylogenetic trees from molecular sequences. Distance-based approaches such as NJ (Saitou and Nei, 1987) and BIONJ (Gascuel, 1997), and profile distance-based approaches such as Profile Neighbor Joining (PNJ) (Müller et al., 2004), are known to be computationally efficient, allowing the reconstruction of very large trees. Beyond the huge amount of sequence data now available, the structural level has become a challenging issue in RNA sequence-structure alignment analysis and phylogenetic reconstruction (Bauer et al., 2007; Hudelot et al., 2003; Seibel et al., 2006; Siebert and Backofen, 2005; Smith et al., 2004). In contrast to the previous ProfDist version (Friedrich et al., 2005), ProfDistS can handle RNA sequence-structure alignments. To our knowledge, this is the first phylogenetic tool that uses individual RNA secondary structures for reconstructing phylogenies. Alignments based on such individual secondary structures, available for example from the ITS2 database (Selig et al., 2008), are routinely constructed with tools such as 4SALE (Seibel et al., 2006), MARNA (Siebert and Backofen, 2005), LARA (Bauer et al., 2007) or RNAforester (Hochsmann et al., 2004).
To obtain a joint substitution model of sequence and structure evolution, we convert the character alphabet as described in Seibel et al. (2006), mapping the sequence and secondary structure information of every single RNA sequence to an artificial character sequence. The tree reconstruction algorithm works on a 12-letter alphabet comprising the four nucleotides in three structural states (unpaired, paired left and paired right, e.g. A., A(, A), U., etc.). Such a substitution model naturally combines a general time reversible (GTR) model at the sequence level with a substitution model of morphological features of the structures.
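The alphabet conversion can be sketched in Python as follows. This is a minimal illustration, assuming the structure is given in dot-bracket notation and that an alignment gap is written as "-" at the same position in both strings; the function name `encode` and the token spelling (nucleotide followed by its structure symbol, as in the examples above) are hypothetical, and the internal encoding used by ProfDistS or 4SALE may differ.

```python
# Minimal sketch (hypothetical names): combine an RNA sequence and its
# dot-bracket secondary structure into the 12-letter sequence-structure
# alphabet (4 nucleotides x 3 structural states). The token spelling
# ("A.", "A(", "A)", ...) mirrors the examples in the text; the encoding
# used internally by ProfDistS/4SALE may differ.

NUCLEOTIDES = "ACGU"
STATES = ".()"  # unpaired, paired left, paired right

# The 12 combined letters: "A.", "A(", "A)", "C.", ..., "U)".
ALPHABET = [nuc + state for nuc in NUCLEOTIDES for state in STATES]


def encode(sequence: str, structure: str) -> list[str]:
    """Return one combined letter per alignment position.

    Assumption: an alignment gap is written as "-" in both the sequence
    and the structure string and is passed through unchanged.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    encoded = []
    # Treat DNA input as RNA by mapping T to U.
    for nuc, state in zip(sequence.upper().replace("T", "U"), structure):
        if nuc == "-" and state == "-":
            encoded.append("-")
            continue
        if nuc not in NUCLEOTIDES or state not in STATES:
            raise ValueError(f"unsupported position {nuc!r}/{state!r}")
        encoded.append(nuc + state)
    return encoded


if __name__ == "__main__":
    print(encode("GCAUUGC", "((...))"))
```

For example, `encode("GCAUUGC", "((...))")` yields `['G(', 'C(', 'A.', 'U.', 'U.', 'G)', 'C)']`, so a base pair such as G(/C) becomes two distinct letters that a substitution model can treat differently from unpaired G. and C.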
Simple correction formulas such as the well-known Jukes and Cantor formula (Jukes and Cantor, 1969) are extended to work on sequence-structure alignments. Based on the GTR RNA sequence-structure specific substitution model (Seibel et al., 2006), evolutionary distances between sequence-structure pairs are estimated by maximum likelihood, and this estimation is also extended to the profile level. Other substitution models could be included in this phylogenetic framework (Schöniger and von Haeseler, 1994; Smith et al., 2004). Phylogenies can now be reconstructed seamlessly at the RNA sequence-structure level with a pipeline consisting of the ITS2 database (Selig et al., 2008), the sequence-structure alignment editor 4SALE (Seibel et al., 2006) and the phylogenetic reconstruction tool ProfDistS.
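As an illustration of how a simple correction formula carries over to the 12-letter alphabet, the sketch below uses the standard k-state generalization of the Jukes and Cantor correction, d = -((k-1)/k) ln(1 - k p/(k-1)), which reduces to the familiar nucleotide formula for k = 4. That ProfDistS uses exactly this form is an assumption, the maximum-likelihood GTR distances are not shown, and the helper names are hypothetical.

```python
import math

# Minimal sketch: the k-state generalization of the Jukes and Cantor (1969)
# correction, applied with k = 12 to the sequence-structure alphabet.
# Assumption: the extension used in ProfDistS may differ in detail; the
# helper names are hypothetical.


def p_distance(x, y, gap="-"):
    """Proportion of differing letters, ignoring positions with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(x, y) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable (gap-free) positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p, k=12):
    """Correct an observed mismatch proportion p for multiple substitutions.

    d = -((k - 1) / k) * ln(1 - k * p / (k - 1)); for k = 4 this is the
    classic nucleotide formula. Returns math.inf once p reaches the
    saturation level (k - 1) / k, where the logarithm is undefined.
    """
    b = (k - 1) / k
    if p >= b:
        return math.inf
    return -b * math.log(1.0 - p / b)


if __name__ == "__main__":
    x = ["G(", "C(", "A.", "U.", "-", "G)", "C)"]
    y = ["G(", "C(", "G.", "U.", "A.", "G)", "C)"]
    p = p_distance(x, y)  # one mismatch over six gap-free positions
    print(p, jukes_cantor(p))
```

Because all 12 letters are treated as equally exchangeable, this correction ignores the structure-aware rate differences captured by the GTR sequence-structure model; it is best read as a baseline.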
The power of ProfDist at the sequence-profile level has already been demonstrated by phylogenies reconstructed from 1269 metazoan 18S rRNA gene sequences (Gerlach et al., 2007) and from about 100 algal 18S rRNA gene sequences (Müller et al., 2004; Vanormelingen et al., 2007). Additionally, Gerlach et al. (2007) demonstrated the semi-automatic profile definition. ProfDistS still provides the fully iterative and automatic profile definition. Furthermore, ProfDistS supports the semi-automatic use of taxonomic lineage information (Wheeler et al., 2000) for an a priori profile definition, for sequences in different formats (FASTA and EMBL). If the taxonomic classification of the sequences is unknown to the user, lineage information for all sequences deposited at NCBI is available via the ProfDistS web page. Figure 1 sketches all features of ProfDistS.
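The idea of an a priori profile definition can be sketched as follows: sequences are grouped by a chosen taxonomic rank of their lineage, each group is summarized as column-wise letter frequencies, and two profiles are compared by the expected proportion of mismatches. This is a simplified sketch under stated assumptions. Lineages are assumed to be already parsed into lists of taxon names from the root downwards, and the expected p-distance stands in for the maximum-likelihood profile distances used by PNJ (Müller et al., 2004). All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Simplified sketch of a priori profile definition (hypothetical names).
# Assumptions: lineages are already parsed into lists of taxon names from
# the root downwards, aligned sequences have equal length, "-" marks gaps,
# and the expected p-distance below stands in for the maximum-likelihood
# profile distances actually used by PNJ.


def group_by_lineage(lineages, rank):
    """Group sequence names by the taxon at position `rank` of their lineage.

    Lineages shorter than `rank` fall back to their deepest available taxon.
    """
    groups = defaultdict(list)
    for name, lineage in lineages.items():
        groups[lineage[min(rank, len(lineage) - 1)]].append(name)
    return dict(groups)


def build_profile(sequences, gap="-"):
    """Column-wise relative frequencies of the non-gap letters of a group."""
    profile = []
    for column in zip(*sequences):
        counts = Counter(c for c in column if c != gap)
        total = sum(counts.values())
        # An all-gap column yields an empty dict.
        profile.append({c: n / total for c, n in counts.items()})
    return profile


def profile_p_distance(p1, p2):
    """Expected mismatch proportion between random members of two profiles.

    Averaged over columns that contain at least one non-gap letter in both.
    """
    diffs = []
    for f1, f2 in zip(p1, p2):
        if f1 and f2:
            match = sum(freq * f2.get(c, 0.0) for c, freq in f1.items())
            diffs.append(1.0 - match)
    if not diffs:
        raise ValueError("no shared informative columns")
    return sum(diffs) / len(diffs)
```

Sequences may be plain strings or lists of 12-letter tokens, since `zip(*sequences)` only needs position-wise indexing. For instance, the profiles of `["AC", "AG"]` and `["AC"]` agree fully in the first column and half the time in the second, giving an expected p-distance of 0.25.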
The new version, ProfDistS, can also process protein sequences. Evolutionary distances, such as LogDet distances or maximum-likelihood distances based on a suitable GTR substitution model like the VTML model (Müller and Vingron, 2000), are estimated from the sequence data. For all sequence data (DNA, RNA, RNA sequence-structure and protein), alternative models can easily be loaded via the graphical user interface, which has been fully redesigned based on the Qt framework (Trolltech, 2008).
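For protein (or nucleotide) data, a LogDet distance can be computed from the joint frequency matrix F of aligned letter pairs. The sketch below uses the common paralinear form d = -(1/k)[ln det F - (ln det Π_x + ln det Π_y)/2], where Π_x and Π_y are diagonal matrices of the letter frequencies in each sequence and k is the alphabet size. Whether ProfDistS uses this variant or another normalization is an assumption, and the function names are hypothetical.

```python
import math

# Minimal sketch of a LogDet (paralinear) distance. The function names are
# hypothetical, and the exact LogDet variant used by ProfDistS is an
# assumption.


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def logdet_distance(x, y, alphabet):
    """d = -(1/k) * [ln det F - (ln det Pi_x + ln det Pi_y) / 2].

    F is the k x k matrix of joint letter-pair frequencies over positions
    where both letters belong to `alphabet` (gaps and other symbols are
    skipped). Returns math.inf when det F <= 0 or a letter is missing.
    """
    index = {letter: i for i, letter in enumerate(alphabet)}
    k = len(alphabet)
    pairs = [(index[a], index[b]) for a, b in zip(x, y) if a in index and b in index]
    if not pairs:
        raise ValueError("no comparable positions")
    f = [[0.0] * k for _ in range(k)]
    for i, j in pairs:
        f[i][j] += 1.0 / len(pairs)
    pi_x = [sum(row) for row in f]                             # letter frequencies in x
    pi_y = [sum(f[i][j] for i in range(k)) for j in range(k)]  # letter frequencies in y
    det_f = determinant(f)
    if det_f <= 0.0 or min(pi_x) == 0.0 or min(pi_y) == 0.0:
        return math.inf
    # Pi_x and Pi_y are diagonal, so ln det is the sum of the logs.
    log_det_pi = sum(map(math.log, pi_x)) + sum(map(math.log, pi_y))
    return -(math.log(det_f) - 0.5 * log_det_pi) / k
```

With the 20 amino acids as `alphabet`, short sequences in which some amino acid does not occur give det F = 0; this sketch then returns `math.inf` rather than applying any correction.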
A number of tools allow phylogenetic tree reconstruction. We have previously shown the improved performance of PNJ compared to Neighbor-Joining (Müller et al., 2004). To our knowledge, ProfDistS is the first software to date that includes both sequence and individual structure information for phylogenetic tree reconstruction. In contrast, tools such as Phase (Jow et al., 2002) or rRNA phylogeny (Smith et al., 2004) are based only on consensus structure information. Grajales et al. (2007) calculated morphometric distances based on individual structures; however, their approach models sequence and structure independently. In ProfDistS, both types of information, sequences and their individual structures, are used to construct profile-based trees, allowing time-efficient reconstruction of trees for hundreds or thousands of sequences.
ACKNOWLEDGEMENT
We would like to thank Alexander Keller and Philipp Seibel for their help with the Mac version of ProfDistS.
Funding: German Research Foundation (DFG) (Mu 2831/1-1).
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: Martin Bishop
Received on June 27, 2008; revised on August 19, 2008; accepted on August 19, 2008
REFERENCES
Bauer M, et al. Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization. BMC Bioinformatics (2007) 8:271.
Friedrich J, et al. ProfDist: a tool for the construction of large phylogenetic trees based on profile distances. Bioinformatics (2005) 21:2108–2109.
Gascuel O. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. (1997) 14:685–695.
Gerlach D, et al. Deep metazoan phylogeny. In Silico Biol. (2007) 7:151–154.
Grajales A, et al. Phylogenetic reconstruction using secondary structures of internal transcribed spacer 2 (ITS2, rDNA): finding the molecular and morphological gap in Caribbean gorgonian corals. BMC Evol. Biol. (2007) 7:90.
Hochsmann M, et al. Pure multiple RNA secondary structure alignments: a progressive profile approach. IEEE/ACM Trans. Comput. Biol. Bioinform. (2004) 1:53–62.
Hudelot C, et al. RNA-based phylogenetic methods: application to mammalian mitochondrial RNA sequences. Mol. Phylogenet. Evol. (2003) 28:241–252.
Jow H, et al. Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution. Mol. Biol. Evol. (2002) 19:1591–1601.
Jukes T, Cantor C. Evolution of protein molecules. In: Munro H, ed. Mammalian Protein Metabolism. New York, USA: Academic Press (1969) 21–132.
Müller T, et al. Accurate and robust phylogeny estimation based on profile distances: a study of the Chlorophyceae (Chlorophyta). BMC Evol. Biol. (2004) 4:20.
Müller T, Vingron M. Modeling amino acid replacement. J. Comput. Biol. (2000) 7:761–776.
Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. (1987) 4:406–425.
Schöniger M, von Haeseler A. A stochastic model for the evolution of autocorrelated DNA sequences. Mol. Phylogenet. Evol. (1994) 3:240–247.
Seibel PN, et al. 4SALE – a tool for synchronous RNA sequence and secondary structure alignment and editing. BMC Bioinformatics (2006) 7:498.
Selig C, et al. The ITS2 Database II: homology modelling RNA structure for molecular systematics. Nucleic Acids Res. (2008) 36:D377–D380.
Siebert S, Backofen R. MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics (2005) 21:3352–3359.
Smith AD, et al. Empirical models for substitution in ribosomal RNA. Mol. Biol. Evol. (2004) 21:419–427.
Trolltech (2008) http://trolltech.com/products/qt/.
Vanormelingen P, et al. The systematics of a small spineless Desmodesmus species, D. costato-granulatus (Sphaeropleales, Chlorophyceae), based on ITS2 rDNA sequence analyses and cell wall morphology. J. Phycol. (2007) 43:378–396.
Wheeler DL, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. (2000) 28:10–14.

