Bioinformatics Advance Access published online on January 9, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp006
Multiperm: shuffling multiple sequence alignments while approximately preserving dinucleotide frequencies
(1) Departments of Computer Science and Engineering and (2) Genome Sciences, University of Washington, Seattle WA 98195-2350, USA; (3) Department of Natural Sciences, Faculty of Life Sciences, University of Copenhagen, 1870 Frederiksberg C, Denmark
*To whom correspondence should be addressed. Dr. Parvez Anandam, E-mail: anandam{at}u.washington.edu, ruzzo{at}cs.washington.edu
Abstract
Summary: Assessing the statistical significance of structured RNA predicted from multiple sequence alignments relies on the existence of a good null model. We present here a random shuffling algorithm, Multiperm, that preserves not only the gap and local conservation structure in alignments of arbitrarily many sequences, but also the approximate dinucleotide frequencies. No shuffling algorithm that simultaneously preserves these three characteristics of a multiple (beyond pairwise) alignment has been available to date. As one benchmark, we show that it produces shuffled exonic sequences whose folding free energy is closer to that of the native sequences than is the folding free energy of shuffled alignments that do not preserve dinucleotide frequencies.
Availability: The Multiperm GNU C++ source code is available at http://www.anandam.name/multiperm
Contact: anandam{at}u.washington.edu; ruzzo{at}cs.washington.edu
Supplementary information: One supplemental figure illustrating the algorithm is available online.
Associate Editor: Prof. Dmitrij Frishman
Received on November 3, 2008; revised on December 2, 2008; accepted on December 30, 2008
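
To make the contrast drawn in the Summary concrete, the following Python sketch shows a simple column shuffle of the kind Multiperm is compared against, not Multiperm itself. It permutes alignment columns only among columns that share the same gap pattern and a coarse conservation level, so those two properties are preserved, and it includes a helper for counting dinucleotides so the drift in dinucleotide frequencies can be inspected. All function names, the four-bin conservation measure, and the toy alignment are assumptions made for illustration; none of them come from the Multiperm source code.

```python
import random
from collections import Counter, defaultdict


def gap_pattern(column):
    """Return a tuple marking which rows have a gap ('-') in this column."""
    return tuple(ch == "-" for ch in column)


def conservation_class(column, bins=4):
    """Coarse conservation level (assumed measure, for illustration only):
    the fraction of non-gap residues equal to the most common residue,
    binned into `bins` levels."""
    residues = [ch for ch in column if ch != "-"]
    if not residues:
        return 0
    top = Counter(residues).most_common(1)[0][1]
    return min(bins - 1, int(bins * top / len(residues)))


def shuffle_columns(alignment, rng):
    """Permute columns only within groups that share both gap pattern and
    conservation class. This keeps gap and local conservation structure,
    but makes no attempt to preserve dinucleotide frequencies."""
    columns = list(zip(*alignment))
    groups = defaultdict(list)
    for idx, col in enumerate(columns):
        groups[(gap_pattern(col), conservation_class(col))].append(idx)

    shuffled = list(columns)
    for indices in groups.values():
        permuted = indices[:]
        rng.shuffle(permuted)
        for src, dst in zip(indices, permuted):
            shuffled[dst] = columns[src]
    return ["".join(row) for row in zip(*shuffled)]


def dinucleotide_counts(alignment):
    """Count adjacent residue pairs in each sequence after removing gaps."""
    counts = Counter()
    for row in alignment:
        seq = row.replace("-", "")
        counts.update(seq[i:i + 2] for i in range(len(seq) - 1))
    return counts


# Toy alignment (hypothetical data, three rows of equal length).
alignment = ["ACGU-ACGU", "ACGA-ACGU", "AC-UUACGU"]
shuffled = shuffle_columns(alignment, random.Random(0))

print(shuffled)
print("native:  ", dict(dinucleotide_counts(alignment)))
print("shuffled:", dict(dinucleotide_counts(shuffled)))
```

Comparing the two printed count tables on a real alignment shows how far a column shuffle of this kind can move the dinucleotide composition, which is the property the Summary says Multiperm additionally preserves in approximate form.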