Bioinformatics Vol. 18 no. 3 2002
Pages 452-464
© 2002 Oxford University Press
Multiple sequence alignment using partial order graphs
1 Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095-1570, USA
Received on August 3, 2001; revised on September 26, 2001; accepted on October 9, 2001
Motivation: Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this reduction loses information needed for accurate alignment and introduces gap scoring artifacts.
Results: We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm, Partial Order Alignment (POA), to guarantee that the optimal alignment of each new sequence versus each sequence in the MSA will be considered. Moreover, the algorithm introduces a new edit operator, homologous recombination, which is important for multidomain sequences. The algorithm is also faster than existing MSA algorithms (linear time complexity), enabling construction of massive and complex alignments (e.g. an alignment of 5000 sequences in 4 h on a Pentium II). We demonstrate the utility of this algorithm on a family of multidomain SH2 proteins, and on EST assemblies containing alternative splicing and polymorphism.
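As a rough illustration of the idea of aligning a sequence directly to a graph by pairwise dynamic programming, the following Python sketch scores a sequence against a small partial order graph. It is not the POA implementation: the `Node` class, the `align_to_graph` function, the global (end-to-end) scoring and the linear gap penalty are all assumptions chosen for brevity. The only change from ordinary pairwise alignment is that a graph node may have several predecessors, so each recurrence takes a maximum over them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One residue in a hypothetical partial order graph."""
    base: str
    preds: list = field(default_factory=list)  # indices of predecessor nodes


def align_to_graph(nodes, seq, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of `seq` against a DAG.

    Assumes `nodes` is listed in topological order (every predecessor
    index is smaller than the node's own index).
    """
    n, m = len(nodes), len(seq)
    # Row 0 is a virtual start node; row i + 1 belongs to nodes[i].
    score = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    score[0] = [j * gap for j in range(m + 1)]

    for i, node in enumerate(nodes, start=1):
        # Nodes without predecessors hang off the virtual start row.
        preds = [p + 1 for p in node.preds] or [0]
        for j in range(m + 1):
            # Graph node aligned to a gap in the sequence.
            best = max(score[p][j] for p in preds) + gap
            if j > 0:
                s = match if node.base == seq[j - 1] else mismatch
                # Graph node aligned to seq[j - 1].
                best = max(best, max(score[p][j - 1] for p in preds) + s)
                # Sequence residue aligned to a gap in the graph.
                best = max(best, score[i][j - 1] + gap)
            score[i][j] = best

    # A global alignment must finish on a node that has no successor.
    has_successor = {p for node in nodes for p in node.preds}
    ends = [i + 1 for i in range(n) if i not in has_successor] or [0]
    return max(score[e][m] for e in ends)


# Toy graph holding two aligned sequences, ACGT and AGGT:
# node 0 (A) branches to node 1 (C) or node 2 (G), rejoining at node 3 (G).
graph = [
    Node("A"),
    Node("C", [0]),
    Node("G", [0]),
    Node("G", [1, 2]),
    Node("T", [3]),
]

for query in ("ACGT", "AGGT", "AGT"):
    print(query, align_to_graph(graph, query))
```

On a graph with a single linear path the recurrence reduces to standard pairwise global alignment, which is why no separate profile step is needed in this sketch.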
Availability: The partial order alignment program POA is available at http://www.bioinformatics.ucla.edu/poa.
Contact: leec{at}mbi.ucla.edu
* To whom correspondence should be addressed.
2 Current address: Center for Applied Mathematics, 657 Rhodes Hall, Cornell University, Ithaca, NY 14853, USA.