Bioinformatics Vol. 19 no. 10 2003
Pages 1221-1226
© 2003 Oxford University Press

Fast sequence clustering using a suffix array algorithm

Ketil Malde *, Eivind Coward and Inge Jonassen

Department of Informatics, University of Bergen, HIB, N5020 Norway

Received on September 19, 2002; revised on November 15, 2002 and January 16, 2003; accepted on January 21, 2003

Motivation: Efficient clustering is important for handling the large number of available EST sequences. Most contemporary methods are based on a form of all-against-all comparison, which results in quadratic time complexity. A different approach is needed to keep up with the rapid growth of EST data.

Results: A new, fast EST clustering algorithm is presented. Sub-quadratic time complexity is achieved by using an algorithm based on suffix arrays. A prototype implementation has been developed and run on a benchmark data set. The resulting clusterings are validated by comparison with clusterings produced by other methods, and the results are promising.
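To illustrate the general idea of avoiding all-against-all comparison, the following is a minimal Python sketch, not the algorithm of the paper. It assumes a simplified criterion: two sequences belong to the same cluster if they share an exact substring of at least `k` characters. The function name `cluster_by_shared_substring`, the threshold `k`, and the use of union-find are illustrative assumptions. The suffix array here is built by naive sorting, which is not itself sub-quadratic; a practical implementation would use an efficient construction method.

```python
def cluster_by_shared_substring(seqs, k):
    """Single-linkage clusters of sequences sharing an exact substring of length >= k.

    Illustrative sketch only: the matching criterion is an assumption,
    and the suffix sort below is naive rather than efficient.
    """
    # Generalized suffix array: every suffix of length >= k,
    # stored as (sequence index, offset) and sorted lexicographically.
    suffixes = [(i, j) for i, s in enumerate(seqs) for j in range(len(s) - k + 1)]
    suffixes.sort(key=lambda p: seqs[p[0]][p[1]:])

    # Union-find over sequence indices.
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Suffixes that share their first k characters are contiguous in the
    # sorted order, so comparing neighbours is enough to link every group.
    for (i1, j1), (i2, j2) in zip(suffixes, suffixes[1:]):
        if seqs[i1][j1:j1 + k] == seqs[i2][j2:j2 + k]:
            parent[find(i1)] = find(i2)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy sequences, not real EST data.
example = ["ACGTACGTAA", "GTACGTAACC", "TTTTGGGGCC", "GGGGCCAAAT", "CATCATCAT"]
print(cluster_by_shared_substring(example, 6))
```

After sorting, each sequence pair is never compared directly. Only neighbouring suffixes are inspected, which is what makes a suffix array attractive as a starting point for clustering large sequence collections.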

Availability: The source code for the prototype implementation is available under a GPL license from http://www.ii.uib.no/~ketil/bio/

Contact: ketil{at}ii.uib.no

* To whom correspondence should be addressed.






