Vol. 20 no. 1 2004, pages 67-74
Bioinformatics © Oxford University Press 2004; all rights reserved.

Distribution of words with a predefined range of mismatches to a DNA probe in bacterial genomes

O. Michael Melko 1,*,§ and Arcady R. Mushegian 1,2

1 Stowers Institute for Medical Research, Kansas City, MO 64110, USA and 2 Department of Microbiology, Molecular Genetics and Immunology, University of Kansas Medical Center, Kansas City, KS 66160, USA

Received on March 25, 2003; revised on June 11, 2003; accepted on July 22, 2003

Motivation: Hybridization of oligonucleotides with longer nucleotide sequences is an essential step in nucleic acid biosynthesis in vitro and in vivo, in oligonucleotide-based diagnostics, and in therapeutic applications of oligonucleotides. A major factor determining the sensitivity and selectivity of hybridization is the number of base-pair mismatches that occur in an ungapped alignment of the oligonucleotide (probe) and a longer sequence (target).

Results: The k-distance match count between the probe and the target is defined as the number of ungapped alignments between the two sequences that have exactly k mismatches, and the k-neighbor match count is defined as the sum of the j-distance match counts for j between 0 and k. We derive a novel formula for the probability of a k-distance match. This formula is based on the assumption that the target is strand-symmetric Bernoulli text (i.e. nucleotides are independently, identically distributed in the target and satisfy Chargaff's second parity rule). Our model predicts that the GC-content in both the probe and the target significantly affects the match count expectation. The ratio of k-neighbor match counts in two distinct genomes for a given probe is a measure of its specificity. We calculated such ratios for pairs of bacterial genomes with different combinations of length, GC-content and phylogenetic distance. Examination of the extreme values of these ratios indicates that probes with a high discriminative power exist for each tested pair.
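The definitions above can be illustrated with a short Python sketch. It is not the authors' C++ code, and the probability function is not claimed to be their published formula. It is a direct derivation from the stated strand-symmetric Bernoulli assumption, under which P(G) = P(C) = g/2 and P(A) = P(T) = (1 - g)/2 for target GC-content g. A probe position therefore matches with probability g/2 if it is G or C and (1 - g)/2 if it is A or T, and the number of mismatches is a sum of two binomials. All function names are hypothetical. The sketch assumes uppercase A/C/G/T strings and scans only the given strand of the target.

```python
from math import comb


def mismatch_profile(probe: str, target: str) -> list[int]:
    """counts[k] = number of ungapped alignments (target offsets) with exactly k mismatches."""
    n = len(probe)
    counts = [0] * (n + 1)
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        mismatches = sum(1 for a, b in zip(probe, window) if a != b)
        counts[mismatches] += 1
    return counts


def k_neighbor_count(probe: str, target: str, k: int) -> int:
    """k-neighbor match count: sum of the j-distance counts for j = 0..k."""
    if k < 0:
        return 0
    return sum(mismatch_profile(probe, target)[: k + 1])


def neighbor_ratio(probe: str, genome_a: str, genome_b: str, k: int) -> float:
    """Ratio of k-neighbor counts in two genomes (a specificity measure)."""
    a = k_neighbor_count(probe, genome_a, k)
    b = k_neighbor_count(probe, genome_b, k)
    if b == 0:
        return float("inf") if a else float("nan")
    return a / b


def k_distance_probability(probe: str, gc: float, k: int) -> float:
    """P(exactly k mismatches at one offset) in strand-symmetric Bernoulli text.

    Assumption: P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2, i.i.d. positions.
    """
    n = len(probe)
    s = sum(1 for c in probe if c in "GC")  # probe positions that are G or C
    w = n - s                               # probe positions that are A or T
    p_gc = gc / 2                           # match probability at a G/C position
    p_at = (1 - gc) / 2                     # match probability at an A/T position
    matches = n - k
    total = 0.0
    # Split the required matches into i at G/C positions and j at A/T positions.
    for i in range(max(0, matches - w), min(s, matches) + 1):
        j = matches - i
        total += (comb(s, i) * p_gc**i * (1 - p_gc) ** (s - i)
                  * comb(w, j) * p_at**j * (1 - p_at) ** (w - j))
    return total


def expected_k_distance_count(probe: str, target_len: int, gc: float, k: int) -> float:
    """Expected k-distance match count over all offsets of one strand of the target."""
    offsets = max(0, target_len - len(probe) + 1)
    return offsets * k_distance_probability(probe, gc, k)


if __name__ == "__main__":
    print(mismatch_profile("ACG", "ACGACG"))        # [2, 0, 0, 2]
    print(k_distance_probability("ACGT", 0.5, 0))   # 0.00390625
```

Because G/C and A/T probe positions match with different probabilities whenever g differs from 0.5, the sketch reflects the qualitative point that the GC-content of both probe and target shapes the expected match count. Counting the reverse-complement strand as well would roughly double the number of offsets; whether and how the paper does so is not stated here.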

Supplementary information: Stowers Institute Technical Report No. 0002, C++ source code, Mathematica notebooks and other information are available at http://www.stowers-institute.org/labs/bioinformatics/omm/index.htm

Contact: omm{at}stowers-institute.org

* To whom correspondence should be addressed at

§ Present address: Northern State University, 1200 S. Jay Street, NSU Box 713, Aberdeen, SD 57401, USA
