Bioinformatics Advance Access originally published online on December 4, 2008
Bioinformatics 2009 25(3):295-301; doi:10.1093/bioinformatics/btn630
Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment
1Department of Engineering, University of California, Santa Cruz CA, USA and 2EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
*To whom correspondence should be addressed.
Abstract
Motivation: Multiple sequence alignment is a cornerstone of comparative genomics. Much work has been done to improve methods for this task, particularly for the alignment of small sequences, and especially for amino acid sequences. However, less work has been done to make methods that are promising at small scale practical for the alignment of much larger genomic sequences.
Results: We take the method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In so doing we develop a set of new technical methods, combined in a framework we term sequence progressive alignment, because it allows us to iteratively compute an alignment by passing over the input sequences from left to right. The result is that we massively decrease the memory consumption of the program relative to a naive implementation. The engineering challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We further show the strong performance of Pecan using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs that has been, and is being, used by a number of whole-genome comparative genomic projects.
Availability: The Pecan program is freely available at http://www.ebi.ac.uk/~bjp/pecan/. Pecan whole genome alignments can be found in the Ensembl genome browser.
Contact: benedict{at}soe.ucsc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Alfonso Valencia
Received on September 11, 2008; revised on November 28, 2008; accepted on December 2, 2008