Bioinformatics Advance Access originally published online on October 6, 2007
Bioinformatics 2007 23(24):3289-3296; doi:10.1093/bioinformatics/btm485
Biclustering as a method for RNA local multiple sequence alignment
1Department of Electrical and Computer Engineering, 2School of Biological Sciences, Section of Integrative Biology and 3Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences, while the grouping of the sequences can itself affect the alignment. This is precisely the kind of dual situation biclustering is intended to address.
Results: We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences.
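As a toy illustration of the biclustering idea described above (not BlockMSA's actual representation or algorithm, which the abstract does not specify), the sketch below treats sequences as rows and fixed-length substrings (k-mers) as columns of an implicit binary matrix, then searches for the group of sequences that shares the largest set of k-mers. The function names, the choice of k-mers as columns, the area-based objective and the exhaustive search are all assumptions made for illustration. The search is exponential in the number of sequences and suitable only for tiny inputs.

```python
from itertools import combinations


def kmer_sets(seqs, k):
    """For each sequence, collect the set of k-mers it contains."""
    return [{s[i:i + k] for i in range(len(s) - k + 1)} for s in seqs]


def best_bicluster(seqs, k=4, min_seqs=2):
    """Return (row_indices, shared_kmers) maximizing rows * columns.

    Hypothetical objective: the 'area' of the all-ones submatrix in the
    implicit sequence-by-k-mer matrix. Exhaustive over sequence subsets.
    """
    sets = kmer_sets(seqs, k)
    best = (frozenset(), frozenset())
    best_area = 0
    for size in range(min_seqs, len(seqs) + 1):
        for rows in combinations(range(len(seqs)), size):
            # k-mers present in every selected sequence
            shared = set.intersection(*(sets[r] for r in rows))
            area = len(rows) * len(shared)
            if area > best_area:
                best_area = area
                best = (frozenset(rows), frozenset(shared))
    return best


if __name__ == "__main__":
    rows, cols = best_bicluster(["GGGAUCCA", "AAGAUCCU", "CCCCCCCC"])
    print(sorted(rows), sorted(cols))
```

The selected rows play the role of a sequence group, and the shared k-mers play the role of candidate conserved subsequences within that group. Grouping and local conservation are found together rather than one after the other.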
BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension of that suite to larger problem sizes. Alignments of two large datasets of current biological interest, T box sequences and Group IC1 introns, were also evaluated. The results were compared with alignments computed by the ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using Sum of Pairs Score (SPS) and Consensus Count.
Results for the benchmark suite are sensitive to problem size. On problems of 15 or more sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions.
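For readers unfamiliar with the Sum of Pairs score used in the comparison above, the following is a minimal sketch. It assumes the common definition, namely the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. The abstract does not state the exact variant or tool used, so the function names and the gap character `-` are illustrative assumptions.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, ungapped position).
    Assumes all rows have equal length and '-' marks a gap.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        # every pair of residues in this column counts as aligned
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference-aligned residue pairs recovered by `test`."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)


if __name__ == "__main__":
    reference = ["ACGU", "ACGU"]
    test = ["ACGU-", "AC-GU"]
    print(sum_of_pairs_score(test, reference))
```

Both alignments must list the same sequences in the same order. A score of 1.0 means every aligned pair in the reference was recovered.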
Availability: BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/
Contact: shuwang2006{at}gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Keith Crandall
Received on April 21, 2007; revised on August 20, 2007; accepted on September 14, 2007
