A tool for aligning very similar DNA sequences
Department of Computer Science and Information Management, Providence University, Shalu, Taichung, Taiwan 43309
1National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
2Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
3To whom correspondence should be addressed
Results: We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is designed to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information.
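The computational problem stated above (locate the region of the longer sequence that is closest to the shorter sequence, then list the single-nucleotide changes that convert one into the other) can be illustrated with a small sketch. The code below is not sim3's algorithm, which the abstract does not describe; it is the textbook quadratic-time dynamic program with unit-cost edits and a free start and end in the longer sequence, and it would be far too slow for the 100 kb against 1 megabase case mentioned above. The function name `align_to_region` and the edit-tuple layout are hypothetical, chosen only for illustration.

```python
def align_to_region(short, long_seq):
    """Return (distance, start, end, edits) for the region long_seq[start:end]
    that is closest to `short` under unit-cost insert/delete/substitute.

    Illustrative O(len(short) * len(long_seq)) sketch; NOT the sim3 method.
    """
    m, n = len(short), len(long_seq)
    # dist[i][j]: fewest edits converting short[:i] into some substring of
    # long_seq that ends at position j. Row 0 is all zeros, so the matched
    # region may start anywhere in long_seq at no cost.
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i  # nothing of long_seq used: delete i bases of short
        for j in range(1, n + 1):
            sub = dist[i - 1][j - 1] + (short[i - 1] != long_seq[j - 1])
            dele = dist[i - 1][j] + 1   # drop a base of short
            ins = dist[i][j - 1] + 1    # add a base of long_seq to short
            dist[i][j] = min(sub, dele, ins)

    # Free end: choose the best end position in the last row (first on ties).
    end = min(range(n + 1), key=lambda j: dist[m][j])

    # Trace back to recover the region start and the edit script.
    edits = []
    i, j = m, end
    while i > 0:
        if j > 0 and dist[i][j] == dist[i - 1][j - 1] + (short[i - 1] != long_seq[j - 1]):
            if short[i - 1] != long_seq[j - 1]:
                edits.append(("substitute", i - 1, short[i - 1], long_seq[j - 1]))
            i -= 1
            j -= 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", i - 1, short[i - 1], ""))
            i -= 1
        else:
            edits.append(("insert", i, "", long_seq[j - 1]))
            j -= 1
    edits.reverse()
    return dist[m][end], j, end, edits


if __name__ == "__main__":
    distance, start, end, edits = align_to_region("ACGTTGCA", "TTTTACGTAGCATTTT")
    print(distance, start, end, edits)
```

Each edit tuple holds the operation, the 0-based position in the shorter sequence, the base removed and the base introduced. Counting every insertion, deletion or substitution as one unit reflects the abstract's point that the scoring models sequencing errors rather than evolutionary processes.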
Availability: A version of sim3 for UNIX machines can be obtained by anonymous ftp from ncbi.nlm.nih.gov, in the pub/sim3 directory.
Contact: For portable versions for Macs and PCs, contact zjing@sunset.nlm.nih.gov.
Received on August 14, 1996; accepted on October 8, 1996