Bioinformatics Vol. 18 no. 3 2002
Pages 379-388
© 2002 Oxford University Press

Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs

Martti T. Tammi 1, Erik Arner 1, Tom Britton 2 and Björn Andersson 1,*

1 Department of Genetics and Pathology, Rudbeck Laboratory
2 Department of Mathematics, Uppsala University, Uppsala, Sweden

Received on June 22, 2001 ; revised on October 11, 2001 ; accepted on October 18, 2001

An increasingly important problem in genome sequencing is the failure of the commonly used shotgun assembly programs to correctly assemble repetitive sequences. The assembly of non-repetitive regions, or of regions containing repeats considerably shorter than the average read length, is in practice easy to solve, while longer repeats have been a difficult problem. Here we present a statistical method to separate arbitrarily long, almost identical repeats, which makes it possible to correctly assemble complex repetitive sequence regions. The differences between repeat units may be as low as 1% and the sequencing error may be up to ten times higher. The method is based on the realization that comparing only a part of all overlapping sequences in a data set at a time does not generate enough information for a conclusive analysis. Our method uses optimal multi-alignments consisting of all the overlaps of each read. This makes it possible to determine defined nucleotide positions, DNPs, which constitute the differences between the repeat units. Differences between repeats are distinguished from sequencing errors using statistical methods, where the probabilities of obtaining certain combinations of candidate DNPs are calculated using the information from the multi-alignments. The use of DNPs and combinations of DNPs will allow for optimal and rapid assemblies of repeated regions. This method can resolve repeats that differ in only two positions in a read length, which is the theoretical limit for repeat separation. We predict that this method will be highly useful in shotgun sequencing in the future.
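
The abstract does not give the statistic itself, so the following is only an illustrative sketch of the general idea that deviations shared by the same reads at two alignment columns are unlikely to be independent sequencing errors. It is not the authors' method: the hypergeometric tail test is a swapped-in stand-in, and the names `GAP`, `minority_reads`, `hypergeom_tail`, `candidate_dnp_pairs`, `min_coincidence` and `alpha` are all hypothetical. The multi-alignment is assumed to be a list of equal-length strings, one per read, with `.` marking columns a read does not cover.

```python
from collections import Counter
from itertools import combinations
from math import comb

GAP = "."  # assumed marker: the read does not cover this column


def minority_reads(alignment, col):
    """Return (covering, deviating) sets of read indices for one column.

    A read "deviates" if its base differs from the column's most common base.
    """
    bases = {i: row[col] for i, row in enumerate(alignment) if row[col] != GAP}
    if not bases:
        return set(), set()
    consensus, _ = Counter(bases.values()).most_common(1)[0]
    deviating = {i for i, b in bases.items() if b != consensus}
    return set(bases), deviating


def hypergeom_tail(n, a, b, k):
    """P(overlap >= k) if a and b deviating reads fall independently among n reads."""
    total = comb(n, b)
    hits = sum(comb(a, j) * comb(n - a, b - j) for j in range(k, min(a, b) + 1))
    return hits / total


def candidate_dnp_pairs(alignment, min_coincidence=2, alpha=0.01):
    """List column pairs whose deviations coincide in the same reads.

    Returns tuples (col1, col2, coincidences, p_value). The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    width = len(alignment[0])
    cols = {c: minority_reads(alignment, c) for c in range(width)}
    pairs = []
    for c1, c2 in combinations(range(width), 2):
        cov1, dev1 = cols[c1]
        cov2, dev2 = cols[c2]
        shared = cov1 & cov2            # reads covering both columns
        d1, d2 = dev1 & shared, dev2 & shared
        k = len(d1 & d2)                # reads deviating at both columns
        if k < min_coincidence:
            continue
        p = hypergeom_tail(len(shared), len(d1), len(d2), k)
        if p <= alpha:
            pairs.append((c1, c2, k, p))
    return pairs
```

In this sketch a single stray mismatch never forms a pair, because it cannot coincide with a deviation in the same read at another column at least `min_coincidence` times. A pair of columns is reported only when the same subset of reads deviates at both, which loosely mirrors the abstract's statement that repeats must differ in at least two positions within a read length to be separable.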

Contact: bjorn.andersson{at}genpat.uu.se

* To whom correspondence should be addressed.




This article has been cited by other articles:

S. J. Lindsay, J. K. Bonfield, and M. E. Hurles
Shotgun haplotyping: a novel method for surveying allelic sequence variation
Nucleic Acids Res., October 12, 2005; 33(18): e152.

B. Chevreux, T. Pfisterer, B. Drescher, A. J. Driesel, W. E.G. Muller, T. Wetter, and S. Suhai
Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs
Genome Res., June 1, 2004; 14(6): 1147 - 1159.

M. T. Tammi, E. Arner, E. Kindlund, and B. Andersson
Correcting errors in shotgun sequences
Nucleic Acids Res., August 1, 2003; 31(15): 4663 - 4672.
