Bioinformatics Advance Access originally published online on October 12, 2004
Bioinformatics 2005 21(6):817-820; doi:10.1093/bioinformatics/bti060
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ParIS Genome Rearrangement server
1Theoretical Biology and Ecology Modelling Group, Hungarian Academy of Sciences and Eötvös Loránd University Pázmány Péter Sétány 1/c. H-1117 Budapest, Hungary
2Collegium Budapest, Institute for Advanced Study Szentháromság u. 2. H-1014 Budapest, Hungary
3Genome Analysis and Bioinformatics Group, Oxford Centre for Gene Function, Department of Statistics, Oxford University 1 South Parks Road, Oxford OX1 3TG, UK
*To whom correspondence should be addressed.
| Abstract |
|---|
|
|
|---|
Summary: ParIS Genome Rearrangement is a web server for a Bayesian analysis of unichromosomal genome pairs. The underlying model allows inversions, transpositions and inverted transpositions. The server generates a Markov chain using a Partial Importance Sampler technique, and samples trajectories of mutations from this chain. The user can specify several marginalizations to the posterior: the posterior distribution of number of mutations needed to transform one genome into another, length distribution of mutations, number of mutations that have occurred at a given site. Both text and graphical outputs are available. We provide a limited server, a downloadable unlimited server that can be installed locally on any linux/Unix operating system, and a database of mitochondrial gene orders.
Availability: http://www.colbud.hu:8765/paris.html and http://doob.stats.ox.ac.uk/~miklos/paris.html
Contact: miklosi{at}ramet.elte.hu
Genome rearrangement events consist of not only inversions and (for multichromosomal genomes) translocations but also transpositions and inverted transpositions. Unfortunately, parsimony algorithms which find the minimum number of mutations needed to transform one genome into another are available only for inversions and translocations (Hannenhalli, 1996; Hannenhalli and Pevzner, 1999; Tesler, 2002). Algorithms considering all type of mutations are only approximations (Blanchette et al., 1996; Gu et al., 1999; Eriksen, 2001). We started a Bayesian approach to address the problem of rearranging genomes by inversions, transpositions and inverted transpositions (Miklós, 2003) which combines the methods of Blanchette et al. (1996) and Larget et al. (2002). The former method considers all types of mutations; however, it is a greedy approximation, and therefore, hard to statistically describe the distribution it gives. The latter method is a Bayesian approach yielding a statistically well-defined posterior distribution, however, it considers only inversions. Our Bayesian approach is based on an evolutionary model allowing all type of mutations. We defined a Markov chain converging to the posterior distribution of trajectories of mutations transforming a genome into another, and empirically showed that the convergence was fast and the posterior distribution of a number of mutations reasonably predicted the true number of mutations that recurred.
Several changes have been made since the first version of our MCMC sampler, to make it faster and easier to use. The most remarkable change in the new version is that we change only a part of the actual trajectory in the Markov chain. Although it provides slower mixing per accepted step in the Markov chain, the acceptance ratio is greater in this sampler, and thus, the overall performance becomes better. Minor algorithmic changes were also implemented, and as a result a sampling step is performed about 1015 times faster than in the first version.
The method has been integrated into a web server. We provide an on-line server, for which the size of the genomes and the length of the Markov chain is limited, and the users are not allowed to submit a job until their previous job is completed. We also offer a downloadable version, which can be installed locally to the users computer without the above-mentioned limitations and a database of gene orders found in the mitochondrial genomes.
The user does not have to transform genomes into signed permutations, the server automatically selects the intersection of common genes in the two genomes and generates the corresponding signed permutation. The server provides the following statistics in the text as .pdf and .ps files:
- The joint posterior distribution of number of inversions and transpositions happened (Fig. 1).
- The posterior distribution of length of inversions and transpositions (Fig. 2).
- The posterior distribution of number of events happened at a site. For example, a mutation causing a breakpoint between genes +g1 and +g2 counts as a mutation after g1 and a mutation before g2 (Fig. 3).
- The log-likelihood trace (Fig. 4).
- The trajectories sampled by the Markov chain (a sample trajectory in Table 1).
|
|
|
|
|
The MCMC sampler and postscript generating algorithms are written in C. The web interface is written in php 4.3. The php code generates a Perl script, which runs in background and invokes the compiled C codes. In the downloadable unlimited version, we have provided details on how to use the compiled C codes in command line, which helps the user to write short scripts driving long runs of investigations. The command line version works without having to install a web server.
| Acknowledgments |
|---|
We thank the Collegium Budapest for providing a web server for this project. Special thanks to Susan Hutchinson and Irmtraud M. Meyer for their help and to the anonymous referees for their useful comments. This work was funded by EPSRC grant HAMJW and MRC grant HAMKA. I.M. was also supported by a Békésy György postdoctoral fellowship.
Received on May 25, 2004; revised on June 29, 2004; accepted on July 7, 2004
| REFERENCES |
|---|
|
|
|---|
Blanchette, M., Kunisawa, T., Sankoff, D. (1996) Parametric genome rearrangement. Gene, 172, GC11GC17[CrossRef][Medline].
Eriksen, N. (2001) (1+
)-Approximation of sorting by reversals and transpositions. Proceedings of WABI2001 BRICS, , Denmark LNCS 2149 University of Aarhus, pp. 227237.
Gu, Q.-P., Peng, S., Sudborough, I.H. (1999) A 2-approximation algorithm for genome rearrangements by reversals and transpositions. Theoret. Comput. Sci., 210, 327339[CrossRef].
Hannenhalli, S. (1996) Polynomial algorithm for computing translocation distance between genomes. Proceedings of CPM1996, , CA , pp. 168185.
Hannenhalli, S. and Pevzner, P.A. (1999) Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. J. ACM, 46, 127[CrossRef].
Larget, B., Simon, D.L., Kadane, J.B. (2002) Bayesian phylogenetic inference from animal mitochondrial genome arrangements. J. R. Stat. Soc. B., 64, 681695[CrossRef].
Miklós, I. (2003) MCMC genome rearrangement. Bioinformatics, 19, ii130ii137[Abstract].
Tessler, G. (2002) GRIMM: genome rearrangements web server. Bioinformatics, 18, 492493
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||



