Skip Navigation


Bioinformatics Advance Access originally published online on January 28, 2008
Bioinformatics 2008 24(6):791-797; doi:10.1093/bioinformatics/btn032
This Article
Right arrow Full Text
Right arrow Full Text (Print PDF)
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/6/791    most recent
btn032v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Lam, T. W.
Right arrow Articles by Yiu, S. M.
PubMed
Right arrow PubMed Citation
Right arrow Articles by Lam, T. W.
Right arrow Articles by Yiu, S. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Compressed indexing and local alignment of DNA

T. W. Lam 1,*, W. K. Sung 2, S. L. Tam 1, C. K. Wong 1 and S. M. Yiu 1

1Department of Computer Science, University of Hong Kong, Hong Kong, China and 2Department of Computer Science, National University of Singapore, Singapore

*To whom correspondence should be addressed.


   Abstract

Motivation: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing very long strings such as the human genome in the main memory. For example, a BWT index for the human genome (with about 3 billion characters) occupies just around 1 G bytes. However, these indexes are designed for exact pattern matching, which is too stringent for biological applications. The demand is often on finding local alignments (pairs of similar substrings with gaps allowed). Without indexing, one can use dynamic programming to find all the local alignments between a text T and a pattern P in O(|T||P|) time, but this would be too slow when the text is of genome scale (e.g. aligning a gene with the human genome would take tens to hundreds of hours). In practice, biologists use heuristic-based software such as BLAST, which is very efficient but does not guarantee to find all local alignments.

Results: In this article, we show how to build a software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments. Experiments reveal that BWT-SW is very efficient (e.g. aligning a pattern of length 3 000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically for a simpler similarity model (with gaps disallowed), and we show that the expected running time is O(|T|0.628|P|) for random strings. As far as we know, BWT-SW is the first practical tool that can find all local alignments. Yet BWT-SW is not meant to be a replacement of BLAST, as BLAST is still several times faster than BWT-SW for long patterns and BLAST is indeed accurate enough in most cases (we have used BWT-SW to check against the accuracy of BLAST and found that only rarely BLAST would miss some significant alignments).

Availability: www.cs.hku.hk/~ckwong3/bwtsw

Contact: twlam{at}cs.hku.hk

Associate Editor: Thomas Lengauer


Received on August 29, 2007; revised on December 8, 2007; accepted on January 22, 2008

Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.