© Oxford University Press

Alignments of DNA and protein sequences containing frameshift errors

Xiaojun Guan and Edward C. Uberbacher

Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6364, USA. E-mail: x3g{at}ornl.gov

Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have significant error rates and often generate artefactual insertions and deletions of bases (indels), which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data depends on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only subfragments of the correct translation are available in any given frame. We present a new algorithm that can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to that of Smith-Waterman. Its time complexity is O(nm), where n and m are the lengths of the two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
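To illustrate how a dynamic-programming alignment can absorb frameshifts while staying O(nm), here is a minimal sketch. It is not the authors' algorithm: the function name `frameshift_align`, the toy match/mismatch scores, the linear gap penalty and the fixed frameshift penalty are all hypothetical choices for illustration. The idea is a Smith-Waterman-style local alignment in which a DNA codon is aligned to a protein residue, and an extra transition lets the alignment skip one or two nucleotides (at a penalty), moving into a different reading frame.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# so AMINO[k] is the amino acid for the k-th codon ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical scoring parameters chosen for illustration only.
MATCH, MISMATCH = 5, -3
GAP = 8          # one codon or one residue aligned to a gap
FRAMESHIFT = 12  # skipping 1 or 2 nucleotides (changes the reading frame)


def aa_score(a: str, b: str) -> int:
    """Toy substitution score; a real search would use a matrix such as BLOSUM62."""
    return MATCH if a == b else MISMATCH


def frameshift_align(dna: str, protein: str) -> int:
    """Best local alignment score of a DNA sequence against a protein.

    H[i][j] is the best score of an alignment ending after dna[:i] and
    protein[:j]. Every cell looks back a constant number of cells, so the
    running time is O(n*m) for n nucleotides and m residues.
    """
    dna, protein = dna.upper(), protein.upper()
    n, m = len(dna), len(protein)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(m + 1):
            score = 0  # local alignment: a new alignment may start here
            if j > 0:
                score = max(score, H[i][j - 1] - GAP)  # residue vs. gap
            if i >= 3:
                score = max(score, H[i - 3][j] - GAP)  # codon vs. gap
                if j > 0:
                    # Codons containing ambiguous bases (e.g. N) translate to "X".
                    aa = CODON_TABLE.get(dna[i - 3:i], "X")
                    score = max(score, H[i - 3][j - 1] + aa_score(aa, protein[j - 1]))
            # Frameshift: skip one nucleotide (spurious insertion) or two
            # nucleotides (the remainder of a codon that lost a base).
            score = max(score, H[i - 1][j] - FRAMESHIFT)
            if i >= 2:
                score = max(score, H[i - 2][j] - FRAMESHIFT)
            H[i][j] = score
            best = max(best, score)
    return best


if __name__ == "__main__":
    protein = "MKTAYIAKQR"
    dna = "ATGAAAACTGCTTATATTGCAAAGCAACGT"  # encodes the protein in frame
    dna_fs = dna[:15] + "G" + dna[15:]       # one spurious base after codon 5
    print(frameshift_align(dna, protein))     # 50: ten matches
    print(frameshift_align(dna_fs, protein))  # 38: ten matches, one frameshift
```

With the spurious base, the first five codons and the last five lie in different reading frames, so no single frame of a six-frame translation contains the whole protein; under these toy scores any frameshift-free alignment scores at most 30. The frameshift transition instead recovers all ten matches for the cost of one penalty (50 - 12 = 38).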


Received on June 30, 1995; accepted on October 16, 1995



This article has been cited by other articles:


E. Perrodou, C. Deshayes, J. Muller, C. Schaeffer, A. Van Dorsselaer, R. Ripp, O. Poch, J.-M. Reyrat, and O. Lecompte
ICDS database: interrupted CoDing sequences in prokaryotic genomes
Nucleic Acids Res., January 1, 2006; 34(suppl_1): D338 - D343.


C. Médigue, M. Rose, A. Viari, and A. Danchin
Detecting and Analyzing DNA Sequencing Errors: Toward a Higher Quality of the Bacillus subtilis Genome Sequence
Genome Res., November 1, 1999; 9(11): 1116 - 1127.


