Bioinformatics Advance Access originally published online on February 12, 2004
Bioinformatics 20(10) © Oxford University Press 2004; all rights reserved.
Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems
Department of Chemistry and Biochemistry, Molecular Biology Institute, Center for Genomics and Proteomics, University of California, Los Angeles, CA 90095-1570, USA
Received on May 25, 2003; revised on January 5, 2004; accepted on January 6, 2004
Advance Access Publication February 12, 2004
Motivation: Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA), which can be combined with existing methods such as progressive alignment. This is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions).
Results: We have developed a new Partial Order–Partial Order alignment algorithm that optimally aligns a pair of MSAs and can therefore be applied directly to progressive alignment methods such as CLUSTAL. Using this algorithm, we show that the combined Progressive POA alignment method yields results comparable with the best available MSA programs (CLUSTALW, DIALIGN2, T-COFFEE) but is far faster. For example, depending on the level of sequence similarity, aligning 1000 sequences, each 500 amino acids long, took 15 min (at 90% average identity) to 44 min (at 30% identity) on a standard PC. For large alignments, Progressive POA was 10–30 times faster than the fastest of the three previous methods (CLUSTALW). These data suggest that POA-based methods can scale to much larger alignment problems than is possible for previous methods.
Availability: The POA source code is available at http://www.bioinformatics.ucla.edu/poa
Contact: leec@mbi.ucla.edu
* To whom correspondence should be addressed.