Bioinformatics Advance Access originally published online on June 9, 2005
Bioinformatics 2005 21(16):3333-3339; doi:10.1093/bioinformatics/bti530

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Thermodynamic properties of DNA sequences: characteristic values for the human genome

Ryan T. Koehler * and Nicolas Peyret

Applied Biosystems 850 Lincoln Centre Drive, Foster City, CA 94404, USA

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSION
 REFERENCES
 

Motivation: Central to many molecular biology techniques as ubiquitous as PCR and Southern blotting is the design of oligonucleotide (oligo) probes and/or primers possessing specific thermodynamic properties. Here, we use validated theoretical methods to generate distributions of predicted thermodynamic properties for DNA oligos of various lengths. These distributions facilitate immediate appreciation of typical thermodynamic values for oligos of various lengths.

Results: Distributions of melting temperature (Tm), free energy (ΔG°) and fraction hybridized, or fraction bound (Fb), are presented for oligos of length 10–50 bases sampled from the human genome. The effects of changing temperature, oligo and salt concentrations, constraining G+C content, and introducing mismatches are exemplified. Our results provide the first survey of typical and limiting thermodynamic values evaluated on a genomic scale. Described numbers comprise useful ‘rules of thumb’ that are applicable to most technologies dependent upon DNA oligo design.

Contact: Koehlert{at}appliedbiosystems.com

Supplementary information: Supplementary Data is available at http://bioinformatics.oxfordjournals.org


    1 INTRODUCTION
Molecular biology techniques such as PCR (Saiki et al., 1988), Southern blotting (Southern, 1975), sequencing by hybridization (Fodor et al., 1993), 5' nuclease assays for allelic discrimination (Livak, 1999), DASH (Prince et al., 2001), molecular beacon based assays (Bonnet et al., 1999), antigene targeting (Freier, 1993) and DNA microarrays (Shoemaker and Linsley, 2002) rely on the great specificity and selectivity afforded by DNA probes. This specificity can be traced to the delicate balance between sequence and thermodynamic properties exhibited by nucleic acids.

Thermodynamic parameters for DNA duplex formation are accurately predicted using the nearest-neighbor model (De Voe and Tinoco, 1962; Gray and Tinoco, 1970; Borer et al., 1974). This model has been extended beyond Watson–Crick sequences to those with mismatches and dangling ends (SantaLucia and Hicks, 2004) and is very accurate for predicting thermodynamic properties of oligonucleotides (oligos) as well as longer nucleotide polymers (SantaLucia, 1998), under a variety of experimental conditions.

An accurate working knowledge of oligo thermodynamic quantities is of great practical importance. Knowledge of melting temperature (Tm), free energy of hybridization and fraction of oligos hybridized (Fb) greatly facilitates optimization of oligo sensitivity and specificity for a variety of applications including PCR primer and probe design.

While calculated thermodynamic properties are widely used in oligo design schemes and software (Rychlik and Rhoads, 1989; Peyret and SantaLucia, 1999 http://ozone2.chem.wayne.edu/Hyther/hythermenu.html; Le Novère, 2001), such applications almost invariably focus on achieving specific thermodynamic values for sequences in hand. This narrow scope fails to provide global insight about how important thermodynamic quantities vary in response to changes in sequence, oligo length or experimental conditions. For example, it is essentially impossible to estimate Fb without explicitly calculating it for an oligo sequence. Generally, one cannot even guess upper or lower bounds of thermodynamic values for a generic oligo sequence of some length. Working knowledge of typical and limiting thermodynamic values expected for oligos under a range of experimentally relevant conditions provides a very valuable foundation to anyone undertaking oligo design. For example, one might anticipate that 90% of all possible 16mer DNA oligos completely hybridize under some specific conditions, or that no possible 16mer DNA oligo should hybridize to a target under other conditions. Such rules of thumb can provide a rational starting point for choosing experimental parameters and algorithmic threshold values that are routinely employed in oligo design and screening schemes. Previous studies have focused on the effect of experimental condition variations on random oligo free energy distributions (Rahman and Gräfe, 2004). Here, we present the first comprehensive survey of oligo thermodynamic quantities evaluated on a genomic scale by studying model oligos sampled from the human genome. 
Our intent is to provide a generalized overview of typical and limiting values for thermodynamic quantities (Tm, ΔG° and Fb) in relation to sequence length (10–50 bases), physical conditions (salt concentration, oligo concentration and temperature), constraints on base composition (G+C composition), nearest-neighbor parameter set [Breslauer (Breslauer et al., 1986) versus SantaLucia (SantaLucia, 1998)] and introduction of mismatches. We have utilized such distributions to guide investigational researchers with assay design at Applied Biosystems for several years (Koehler and Peyret, 2002).


    2 METHODS
2.1 Datasets
One hundred thousand (100K) 50 base sequences were sampled from the UCSC human genome (hg17; May 2004; http://hgdownload.cse.ucsc.edu/downloads.html#human), such that sequences were minimally spaced 10 000 bases apart. For each combination of physical conditions and length considered (10–50 bases), subsequences corresponding to the first n bases of the sampled sequences were evaluated using the nearest-neighbor model. In the case where composition was constrained, n-mers with G+C content outside the limited range 25–75% (in the first n bases) were removed prior to evaluation (though 100K samples were still evaluated).
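The sampling scheme described above can be sketched as follows. This is only an illustrative sketch, not the authors' script: the function names, the slot-based spacing strategy and the reading of "minimally spaced 10 000 bases apart" as a gap between consecutive windows are all assumptions. Unlike the procedure in the text, the sketch does not resample to restore the sample count after G+C filtering.

```python
import random


def sample_windows(sequence, n_samples, window=50, min_spacing=10_000, seed=0):
    """Return n_samples windows whose gaps are at least min_spacing bases.

    Assumption: the sequence is cut into slots of (window + min_spacing)
    bases and at most one window is taken from the start of each slot.
    """
    stride = window + min_spacing
    n_slots = (len(sequence) - window) // stride + 1
    if n_samples > n_slots:
        raise ValueError("sequence too short for the requested sample count")
    rng = random.Random(seed)
    slots = sorted(rng.sample(range(n_slots), n_samples))
    return [sequence[s * stride: s * stride + window] for s in slots]


def gc_fraction(oligo):
    """Fraction of G and C bases in an oligo."""
    return sum(base in "GC" for base in oligo.upper()) / len(oligo)


def prefixes(windows, n, gc_range=None):
    """First n bases of each window, optionally filtered on G+C fraction."""
    kept = []
    for window_seq in windows:
        oligo = window_seq[:n]
        if gc_range is None or gc_range[0] <= gc_fraction(oligo) <= gc_range[1]:
            kept.append(oligo)
    return kept
```

For example, `prefixes(windows, 16, gc_range=(0.25, 0.75))` would give the 16mers that pass the 25–75% G+C constraint.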

2.2 Calculation of thermodynamic quantities
Calculation of enthalpy (ΔH°), free energy (ΔG°), entropy (ΔS°), Tm and Fb for base pairing of each oligonucleotide to its complement is based on the nearest-neighbor model. The model assumes a two-state cooperative transition between random coils and duplex (Petersheim and Turner, 1983; Freier et al., 1986). The most accurate DNA nearest-neighbor parameters were used for our calculations (Allawi and SantaLucia, 1997; Allawi and SantaLucia, 1998a, b,c; SantaLucia, 1998; Peyret et al., 1999; Bommarito et al., 2000). Breslauer nearest-neighbor parameters (Breslauer et al., 1986) were also used for comparison purposes. Free energy values were corrected for monovalent cation (SantaLucia, 1998) and magnesium concentrations (Peyret, 2000). Detailed equations are provided as Supplementary Material. Minimum folding energy of the strands was calculated to evaluate how secondary structure affects distributions. This was done using the Vienna package (Hofacker et al., 1994) parameterized for DNA, using the most up-to-date parameter set available (SantaLucia and Hicks, 2004).
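The detailed equations are in the Supplementary Material and are not reproduced here. As a hedged sketch, the textbook two-state relations can be written as below: free energy at temperature T is ΔH° − TΔS°, and for a non-self-complementary duplex with strand A in excess over strand B, Tm = ΔH° / (ΔS° + R ln([A] − [B]/2)). The function names and the example ΔH° and ΔS° values are hypothetical, and no salt correction is applied.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g(dh, ds, temp_c):
    """Free energy (kcal/mol) at temp_c; dh in kcal/mol, ds in kcal/(mol*K)."""
    return dh - (temp_c + 273.15) * ds


def melting_temperature(dh, ds, conc_a, conc_b):
    """Two-state Tm in degrees C for strand A in excess over strand B.

    At Tm half of B is in duplex, so K = 1 / (conc_a - conc_b / 2).
    """
    return dh / (ds + R * math.log(conc_a - conc_b / 2.0)) - 273.15


# Hypothetical duplex: dH = -150 kcal/mol, dS = -0.41 kcal/(mol*K)
print(delta_g(-150.0, -0.41, 50.0))
print(melting_temperature(-150.0, -0.41, 5e-7, 1e-12))
```

In practice ΔH° and ΔS° would come from summing nearest-neighbor parameters over the sequence; here they are simply supplied as inputs.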

2.3 Distribution plots
Distributions were generated by evaluating thermodynamic quantities for each of the (100K) randomly chosen n-mers at each length considered. For each length, median and percentile-bounding values were identified and plotted. The final distribution plot is obtained by combining the values obtained for each oligo length. Figure 1 illustrates the construction of a distribution, in this case for Tm. Dedicated scripts were used to process and plot (Grubb, 2002) each distribution. Using 100K oligos, distributions obtained between different random samplings of the genome are indistinguishable, indicating that our sampling is robust.
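A per-length summary of this kind can be sketched as below. This is an illustrative assumption about the bookkeeping, not the authors' script: the center-most 50, 80 and 98% bands are taken to be bounded by the 25th/75th, 10th/90th and 1st/99th percentiles, and percentiles use simple linear interpolation.

```python
import math


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of an ascending list."""
    if not sorted_values:
        raise ValueError("no values")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    frac = rank - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def summarize(values_by_length, levels=(1, 10, 25, 50, 75, 90, 99)):
    """Map each oligo length to its percentile values.

    values_by_length: dict of length -> list of a quantity (e.g. Tm).
    """
    summary = {}
    for length, values in sorted(values_by_length.items()):
        ordered = sorted(values)
        summary[length] = {p: percentile(ordered, p) for p in levels}
    return summary
```

Plotting the 50th percentile against length gives the median trace, and the paired percentiles give the shaded bands.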



Fig. 1 Genesis of Tm distribution plots.

 

    3 RESULTS
As a convenient and useful point of reference, we start by describing thermodynamic quantity distributions (Tm, ΔG° and Fb) calculated with physical conditions routinely used for PCR (Sorscher, 1997). Specifically, we use 50°C temperature, 0.05 M [Na+], 0.0013 M [Mg2+], 5 × 10⁻⁷ M [DNA]A and 1 × 10⁻¹² M [DNA]B. For these reference conditions, DNA A represents an oligo, such as a PCR primer, and DNA B represents the corresponding target, such as genomic DNA. With distributions for reference conditions in hand, we next explore how introducing dangling ends and perturbing salt concentration, oligo concentration and temperature alter oligo thermodynamic values. Differences stemming from alternative nearest-neighbor parameter sets and the impact of restricting oligo G+C content are also discussed. Finally, we study the effect of introducing base mismatches in both non-competing and competing reaction contexts.

Distributions are presented for each thermodynamic quantity evaluated as a function of sequence length. This format allows one to easily identify typical and limiting values for oligos of a given length, as well as to compare distributions of these quantities among lengths. For all plots, a thick black line is used to denote median values and different shades of gray demarcate the center-most 50, 80 and 98 percentiles as illustrated in Figure 1. Tables of all percentile values and additional plots are available as Supplementary Material.


    4 DISCUSSION
4.1 Hybridization indicators
Melting temperature is probably the most widely used indicator of duplex stability. If we assume a two-state transition (single-stranded state and duplex state), the melting temperature is the temperature at which half of the limiting nucleic acid strand in solution is single-stranded. A major drawback of Tm as an indicator of duplex stability is the inability to differentiate between oligos that have similar Tm values, when assays are performed at temperatures removed from Tm. This is because the fraction hybridized (see below) curve representing the transition from single-strand to duplex may have different slopes as a function of temperature for sequences with similar Tm. Another indicator of duplex stability is the free energy of binding, ΔG°. This is sensitive to assay temperature but does not reflect the influence of concentrations on duplex stability. A third indicator that is sensitive to both concentration and temperature is the fraction hybridized, Fb. This expresses the percent of the limiting nucleic acid strand that is in duplex state under given temperature and strand concentration conditions. An Fb of 100% indicates complete hybridization, while an Fb close to zero indicates no appreciable hybridization. Tm is defined as the temperature at which Fb equals 50%. The Fb indicator is clearly the most informative as it directly relates to measurable assay characteristics, such as fluorescence.
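To make the relationship between these indicators concrete, the following sketch computes Fb for a simple two-state equilibrium A + B ⇌ AB and compares two hypothetical oligos that share the same Tm but differ in ΔH°. It is a textbook mass-action calculation under stated assumptions, not the authors' implementation (their equations are in the Supplementary Material); the function names and the ΔH° values are illustrative only.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_bound(dh, ds, temp_c, conc_a, conc_b):
    """Fraction of the limiting strand in duplex for A + B <-> AB."""
    t = temp_c + 273.15
    k = math.exp(-(dh - t * ds) / (R * t))  # equilibrium constant
    # Solve k*x^2 - (k*(A+B)+1)*x + k*A*B = 0 for duplex x, using the
    # numerically stable form of the smaller root.
    b = k * (conc_a + conc_b) + 1.0
    disc = b * b - 4.0 * k * k * conc_a * conc_b
    duplex = 2.0 * k * conc_a * conc_b / (b + math.sqrt(disc))
    return duplex / min(conc_a, conc_b)


def entropy_for_tm(dh, tm_c, conc_a, conc_b):
    """dS (kcal/(mol*K)) giving the requested Tm, with A in excess over B."""
    return dh / (tm_c + 273.15) - R * math.log(conc_a - conc_b / 2.0)


A, B = 5e-7, 1e-12  # reference oligo and target concentrations (M)
for dh in (-80.0, -160.0):  # hypothetical enthalpies, same Tm of 60 C
    ds = entropy_for_tm(dh, 60.0, A, B)
    print(dh, [round(fraction_bound(dh, ds, t, A, B), 3) for t in (55, 60, 65)])
```

Both oligos are half hybridized at 60°C, but the oligo with the larger enthalpy magnitude has the steeper transition, so the two differ noticeably at 55 and 65°C.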

4.2 Accuracy of the thermodynamic predictions
SantaLucia (1998) predicted the thermodynamics of polymers that had been experimentally measured and showed that the nearest-neighbor model accurately predicted both oligo and polymer thermodynamics. Consequently, oligos considered in this work (10–50 bases) should be predicted accurately, provided the two-state model (Petersheim and Turner, 1983; Freier et al., 1986) applies. Typical errors in thermodynamic predictions of free energy and Tm are 4% and 2–3°C, respectively (SantaLucia et al., 1996; Allawi and SantaLucia, 1997).

Oligos that have a propensity to form secondary structures or slipped duplexes sometimes do not exhibit two-state transitions (Allawi and SantaLucia, 1997). To ensure that our distributions are accurate, we also evaluated thermodynamics using a correction for single-strand folding. For this analysis three coupled equilibria were considered and Fb distributions were deduced (see Supplementary Material for equations). The distributions obtained with and without correction for single-strand folding were essentially identical. This result does not mean that secondary structure should be neglected. However, for the oligos and conditions considered here, significant secondary structure able to disrupt hybridization is rare.

In the case of oligos targeting genomic DNA it is harder to predict potential folding of the target region. For this reason some predictions carry inaccuracies that only bench-work (UV spectroscopy, fluorescent melts, etc.) can resolve. However, collecting experimental data on all potential sequences possible for targeting a genomic region is prohibitively expensive. Computational predictions are, therefore, the main alternative and a required first step in any probe/primer design workflow.

4.3 Reference conditions
Reference distributions of Tm, ΔG° and Fb for oligos evaluated with typical PCR conditions are shown in Figure 2A, B and C, respectively. For these conditions, the median Tm value (Fig. 2A) varies between 32 (10mer) and 74°C (50mer), tending to plateau and increase almost linearly for oligos longer than 30 nt with a slope of ~0.35°C per added base. The width of the Tm distribution spans 35–45°C, though most values (80%) fall within ~10°C of the median for any given length. While typical values are important, consideration of limiting values for any given length is also informative. For example, the plot suggests that virtually no 15mer should have a Tm > 72°C, that no 25mer should have a Tm < 45°C, and that to achieve a Tm of 80°C, an oligo must have at least ~21 bases.



Fig. 2 Melting temperature (A), free energy (B) and fraction hybridized (C) distributions for blunt-ended oligonucleotides in typical PCR conditions (reference conditions): temperature = 50°C, [Na+] = 0.05 M, [Mg2+] = 0.0013 M, [DNA]A = 5 × 10⁻⁷ M, [DNA]B = 1 × 10⁻¹² M. Melting temperature.

 
The free energy distribution for our reference conditions (Fig. 2B) shows that median values decrease almost linearly from –5 (10mer) to –37 kcal/mol (50mer) with a slope of ~0.8 kcal/mol per added base. As length increases, the width of the distribution also increases due to the growing sequence diversity.

The Fb distribution (Fig. 2C) shows that median values range from 0 (10mer) to 100% (16mer and longer sequences). This means that >50% of all sampled 10mers are single-strand, while >50% of sampled 16mers are fully hybridized at reference conditions. The rather sharp transition from single-strand to fully hybridized occurs over a length corresponding to the addition of 10 bases. The width of the distribution also grows with sequence length to reflect the increasing diversity of the sequence composition. If we only consider the most typical 98% of oligos, we see that <2% of 10mer targets are >50% hybridized, whereas 98% of 32mer targets are essentially 100% hybridized.

In many applications, when hybridizing to genomic DNA segments, probes and primers are shorter than their targets. As a consequence the duplex formed has overhangs. Thermodynamic studies have shown that the first overhanging base is responsible for most of the energetic contribution of the overhang (Ricceli et al., 2002). The stability of DNA sequences with dangling ends has been systematically investigated (Bommarito et al., 2000), and most dangling ends have stabilizing contributions.

Comparing distributions with (Fig. 2D–F) and without (Figure S1) dangling ends on both sides of the duplex in reference conditions shows how the distributions shift. Median Tm values shift by 2.6°C (10mers) down to <0.1°C (46mers and longer oligos), while the distribution width remains constant. Similarly, lengths required for median Fb values of 100% (full hybridization) are 19 and 20 for sequences with and without dangling ends, respectively. Median free energy values also reflect a 0.4 kcal/mol stabilization when dangling ends are present. The widths of the distributions with and without dangling ends are similar.

4.4 Variation of salt concentration, oligo concentration and temperature
Figure 3A, D and B depict Fb distributions in effective cation concentrations of 1.69 × 10⁻², 1.69 × 10⁻¹ and 1.69 M, respectively. The value 1.69 × 10⁻¹ M is derived from the salt correction equation (SantaLucia, 1998; Peyret, 2000) (see Supplementary Material, Equation S3) and corresponds to reference conditions with 0.05 M [Na+] and 0.0013 M [Mg2+]. As we increase cation concentrations, the higher salt reduces the oligo length required for full hybridization. Looking at the median Fb curves, these lengths are 30, 20 and 15 bases from low to high salt, respectively. Similarly, the Tm distribution curves shift by ~12°C for each 10-fold increment in salt concentration (Figure S2). The distribution shift seen with changes in salt is slightly greater for longer oligos, because of the length dependence present in the salt correction equation.
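Equation S3 itself is not reproduced in this text. As a hedged illustration, one commonly used form of magnesium correction converts Mg2+ into an equivalent monovalent concentration, [Na+]eq = [Na+] + 3.3 √[Mg2+] (concentrations in M). That this is the form used here is an assumption, but it does reproduce the quoted effective concentration for the reference conditions. The function name and the `mg_factor` parameter are hypothetical.

```python
import math


def effective_monovalent(na_molar, mg_molar, mg_factor=3.3):
    """Equivalent monovalent cation concentration in M.

    Assumed form: [Na+]eq = [Na+] + mg_factor * sqrt([Mg2+]).
    """
    return na_molar + mg_factor * math.sqrt(mg_molar)


# Reference conditions: 0.05 M Na+ and 0.0013 M Mg2+
print(round(effective_monovalent(0.05, 0.0013), 3))  # 0.169
```

The low and high salt cases in Figure 3 correspond to this value divided and multiplied by 10.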



Fig. 3 Effect of salt concentration, oligo concentration and temperature variations on fraction hybridized distributions. Panels A, D and B represent fraction hybridized distributions for effective salt concentrations of 0.0169, 0.169 and 1.69 M, respectively. Panels C, D and E represent fraction hybridized distributions for oligo concentrations of 5 × 10⁻⁹, 5 × 10⁻⁷ and 5 × 10⁻⁵ M, respectively. Panels F, D and G represent fraction hybridized distributions for assay temperatures of 40, 50 and 60°C, respectively. All non-specified experimental conditions correspond to the reference conditions used in Figure 2.

 
Figure 3C–E depict Fb distributions with oligo concentrations of 5 × 10⁻⁹, 5 × 10⁻⁷ and 5 × 10⁻⁵ M, respectively. The Fb curves show that higher concentrations allow shorter oligos to reach complete hybridization. For instance, median lengths for complete hybridization are 24, 20 and 16 bases for the low, reference and high oligo concentrations, respectively. Similarly, Tm distribution curves (Figure S3) shift by >12°C (10mer) to 3°C (50mer) for each 100-fold increment. For long oligos, the concentration change has less impact, as can be inferred from the Tm equation (see Supplementary Material, Equation S5).

Figure 3F, D and G depict Fb distributions at temperatures of 40, 50 and 60°C, respectively. Higher temperatures require longer lengths for oligos to fully hybridize, as evidenced by the median lengths of 15, 20 and 28 bases that are required for full hybridization at 40, 50 and 60°C. As expected, this trend is also seen in the free energy distribution, which decreases linearly with length. For temperatures of 40, 50 and 60°C, the slope roughly decreases by 0.2 kcal/mol for each 10°C variation (Figure S4).

4.5 Thermodynamic parameter sets
The nearest-neighbor thermodynamic parameters of Breslauer et al. (1986) continue to be widely used and implemented in commercial oligo design software, despite the fact that a newer set of unified parameters provides significantly improved predictions (SantaLucia, 1998).

Comparison of predictions performed with each set shows that the Breslauer parameters predict shorter oligos to be more hybridized than the unified parameters. For example, the median length for full hybridization is 17 bases using the Breslauer parameters but 20 bases using the unified parameters (Figure S5). In accordance with the Fb differences, median Tm values for 10mers are ~30°C with both parameter sets, but Tm values for 50mers differ by ~16°C (Figure S5). These observations suggest that the Breslauer parameter set tends to overestimate hybridization stability, and so oligos designed to match a particular Tm or hybridization level using these parameters may actually be somewhat shorter than intended.

4.6 Constraining sequence composition
Constraining sequence base composition to a G + C of 25–75% for our reference conditions does not alter Tm and Fb median values (Figure S6). However, the widths of the distributions are significantly reduced when this simple composition filter is applied to limit oligo diversity.

For Tm, the unconstrained width is ~38°C for all considered lengths, but this reduces to ~25°C when the G+C constraint is applied. While the overall distribution width narrows, the width associated with the more typical Tm values remains largely unchanged. This can be attributed to the fact that most sampled oligos are balanced in terms of G+C content. Sequences that are filtered out have highly skewed G+C, and since Tm roughly correlates with G+C content, removing these sequences shrinks the distribution width by eliminating extreme cases.

The effect of the G+C constraint on Fb is analogous to the Tm results. Oligos giving rise to extreme values are removed from the sample while the range of more typical values is largely unchanged. From a practical point of view, the base composition example illustrates how application of even simple filters can significantly reduce the variation of oligo thermodynamics.

4.7 Sequences with mismatches
The effect of single mismatches on Tm, ΔG° and Fb is shown in Figure 4. Comparing Tm distributions for perfect match and mismatch cases (Fig. 4A and B, respectively) reveals that median Tm values differ by ~20°C for 10mers down to ~2°C for oligos longer than ~35 bases. This trend is expected, because the relative destabilizing impact of a single mismatch diminishes as oligo length is increased. The distribution of differences between perfect match and mismatch cases is shown in Figure 4C, where the diminishing impact of a single mismatch as length is increased is obvious. Considering limiting cases, we see the most destabilizing single mismatch in a 10 base oligo drops Tm by >40°C, and the least destabilizing mismatch drops Tm by ~7°C. For 25mers, the largest drop associated with a single base mismatch is <10°C. It is interesting to compare calculated Tm differences to Tm differences obtained in DASH experiments (Prince et al., 2001), where differential melting profiles are used to identify mismatched sequences. For oligos of 15 and 17 bases, Tm differences associated with a single mismatch were found to be 4–12°C in a magnesium-free buffer. In Figure 4C, the majority (80%) of oligos of this length have calculated Tm differences of ~5 to ~14.5°C with our reference conditions. DASH experiments have also found that increasing probe length from 13 to 25 bases results in a 3-fold decrease in the Tm drop associated with a single mismatch (Howell et al., 1999). The median calculated Tm difference associated with a single mismatch in Figure 4C is ~12°C for 13mers and ~4°C for 25mers, which agrees with the DASH results. Compared to the perfect match case, introducing a single mismatch costs the equivalent of ~4 bases on average, if complete hybridization is desired (right shift of the distribution in Fig. 4E versus D).
Considering limiting lengths, most sampled 10mers with a single mismatch are <10% hybridized, and a small number of oligos longer than 40 bases containing a single mismatch are marginally destabilized (i.e. Fb<100%). At reference conditions, the impact of a single mismatch on Fb is most pronounced for oligos of ~15–18 bases, with the largest average Fb difference seen for 17mers. This is evidenced by the hump in Figure 4F. Shorter oligos are, on average, less hybridized to start with, so destabilizing mismatches cause less disruption than is the case with 17mers. Conversely, longer oligos tend to fully hybridize even with sequence mismatches. Thus under reference conditions, 17mers are most likely to show the greatest discrimination between perfect and mismatch cases. Significantly, consideration of Tm differences alone does not lead one to this result.



Fig. 4 Effect of single mismatch introduction on Tm and fraction hybridized distributions. (A and B) Tm distributions for perfect match sequences and sequences with one mismatch, respectively, are shown. (C) The difference between the two previous Tm distributions is shown. (D and E) Fraction hybridized distributions for perfect match sequences and sequences with one mismatch, respectively, are shown. (F) The difference between the two previous fraction hybridized distributions is shown. Experimental conditions correspond to the reference conditions described in the Methods section.

 
We also investigated the effect of multiple base mismatches on hybridization. Fb distributions shift progressively rightward as the number of mismatches increases (Figure S7). This is expected, since longer oligos are required to compensate for the destabilization introduced by the mismatches. The median oligo lengths for full hybridization are ~20, 24, 27 and 35 bases when there are 0, 1, 2 and 4 mismatches, respectively. The mismatch examples considered represent cases where oligos are in excess relative to targets. This situation occurs, for example, when primers or probes are used to amplify and/or interrogate genomic DNA. Any non-target site in the genomic sequence that is similar enough to the intended target (e.g. similar length with few mismatches) may give rise to spurious product and/or readout that compromises an assay. This situation also occurs with microarrays, where oligos are surface-bound, and each may potentially bind to different target sites in the milieu of interrogated DNA. Non-specific signals will be observed, if conditions allow for binding of non-target sequences with mismatches. We note, however, that since the thermodynamic parameters and equilibrium assumption implicit in the currently used model may not hold for surface-bound oligos (Bahnot et al., 2003), results presented in Figure S7 may not be directly applicable to microarray systems. An important design consideration for oligos employed in genomic-context assays is to ensure target specificity. This commonly involves scanning the target genome for similar non-target sites, for example, with BLAST (Altschul et al., 1990). Given a list of similar non-target ‘hits’ to consider, the question then becomes: At what level of similarity to the intended target does a non-target hit pose a risk? Assuming an oligo must appreciably hybridize a non-target site to yield spurious signals, one can identify approximate limits from the mismatch distributions. 
For example, under reference conditions, most 23mers (90th percentile) fully hybridize to perfect targets while very few hybridize to target-similar sites with four mismatches. Thus (in this system), hits less than ~80% similar to target (e.g. 20/23 matching bases) are unlikely to pose a threat. While specific conditions, assay-imposed design constraints, and even different genomes will probably influence results, theoretical limits of this sort are nonetheless useful starting points for setting algorithmic thresholds.

Figure 5 shows Fb distributions of perfect match oligos that are in competition with single mismatch oligos for a common target strand. Competition in this context means that the target site to which the oligos are hybridizing is at a limiting concentration (i.e. excess oligos); hence either perfect match or mismatch oligos may bind to the target, but not both oligos at once. At equal concentrations, the perfect match oligo overwhelmingly out-competes the single mismatches under reference conditions (Fig. 5A). However, an appreciable fraction of single mismatch oligos longer than ~25 bases is still hybridized to a noticeable extent. This is implied by failure of perfect oligos to achieve Fb = 100%, which yields the light gray band across the top of Figure 5A. In this competition case, total Fb is roughly the sum of perfect and mismatch Fb contributions and has a maximum of 100%. Thus, 1% (99th percentile) of single mismatch oligos have an Fb ~10% (100% minus the 99th percentile trace in Figure 5A) and 10% (90th percentile) have an Fb ~4%. These trends are constant for oligos longer than ~25 bases because both perfect and mismatch sequences favor full hybridization at these lengths, and the free energy ‘cost’ of a single mismatch is independent of length. It is interesting to compare mismatch hybridization with and without competition from the perfect match oligo. With competition from a perfect match oligo, single mismatch oligo hybridization is very limited regardless of length. The fact that most targets for oligos >30 nt are >90% hybridized to perfect match oligos means that these targets cannot be >10% hybridized to mismatched oligos (color band at top of Fig. 5A). Without competition, however, targets for oligos longer than 30 bases fully hybridize to mismatch oligos (Fig. 5F).



Fig. 5 Effect of competition between perfect match sequences and sequences with one mismatch on fraction hybridized distributions. (A) The fraction hybridized distribution for the perfect match sequence when competing with mismatched sequences is shown. (B) The same fraction hybridized distribution for an assay carried out at 65°C is shown. (C) The same fraction hybridized distribution for an assay carried out with 100-fold excess of mismatch strand is shown.

 
At increased temperature, the free-energy advantage of a perfect match over a mismatch oligo decreases. Because of this, mismatch oligos now viably compete for target sites; this dramatically reduces the number of perfect match oligos that dominantly hybridize. Figure 5B shows the Fb distribution for the perfect match oligo evaluated with the competitive model at 65°C (with equal perfect and mismatch oligo concentrations). The distribution is significantly depressed and right-shifted relative to the 50°C example (Fig. 5A). Distributions of the mismatch Fb (data not shown) reveal that >90% of targets for oligos longer than ~25 bases are >5% hybridized to mismatch oligos at 65°C, while at 50°C ~10% of such targets are hybridized to a similar extent. Thus, increasing temperature decreases relative specificity between perfect and single mismatch oligos when the two oligos compete for the same target site. As expected, increasing the concentration of a mismatch oligo tends to drive the perfect match oligo off the target. This is shown in Figure 5C, where the mismatch oligo is 100-fold in excess of the perfect match oligo under reference conditions. In this case, about half (i.e. median) of the targets for perfect match oligos longer than 25 bases are ~50% bound, 10% (90th percentile) are only ~15% hybridized and 99.9% of targets for perfect match oligo are less than ~97% hybridized.
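The competition scenario described above can be sketched with a simple mass-action model. This is a simplified illustration rather than the coupled-equilibria treatment of the Supplementary Material: it assumes both oligos are in large excess over the target, so free oligo concentrations equal total concentrations, and the free energies used (−20 and −17 kcal/mol) are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def competitive_fractions(dg_pm, dg_mm, temp_c, conc_pm, conc_mm):
    """Fractions of a limiting target bound to perfect match and mismatch oligos.

    dg_pm, dg_mm: duplex free energies (kcal/mol) at temp_c.
    Assumes both oligos are in large excess over the target.
    """
    t = temp_c + 273.15
    w_pm = conc_pm * math.exp(-dg_pm / (R * t))  # K_pm * [PM]
    w_mm = conc_mm * math.exp(-dg_mm / (R * t))  # K_mm * [MM]
    denom = 1.0 + w_pm + w_mm
    return w_pm / denom, w_mm / denom


# Equal concentrations, then a 100-fold excess of the mismatch oligo
print(competitive_fractions(-20.0, -17.0, 50.0, 5e-7, 5e-7))
print(competitive_fractions(-20.0, -17.0, 50.0, 5e-7, 5e-5))
```

In this model the ratio of mismatch-bound to perfect-match-bound target depends only on the free energy difference and the concentration ratio, which is consistent with the length-independent behavior noted for longer oligos.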

The above simulation is applicable to single nucleotide polymorphism (SNP) genotyping assays, where two oligos, designed to complement one or the other allelic base at the SNP site, compete for a single target. Assay specificity depends on the perfect match probe largely out-competing the single mismatch probe. For example, 5' nuclease assays (Livak, 1999) and molecular beacon-based SNP genotyping assays (Marras et al., 1999) require optimized oligo designs so that the mismatched oligo does not generate sufficient signal to make the assay readout ambiguous. To further enhance discrimination, modified oligos (e.g. minor groove binder, MGB, probes) may also be used (Kutyakin et al., 2000). This simulation is also applicable to microarrays where two surface-bound oligos, differing by a single base, are directed to a given target (Lipshutz et al., 1999). One oligo matches the target perfectly, while the other contains a mismatch and is used to evaluate non-specific background signal. A target binding to both perfect match and single mismatch oligos lowers the assay signal-to-noise ratio. As noted above, however, the currently used model may not hold for surface-bound oligos, so these results are not directly applicable to microarray systems (Bahnot et al., 2003).


    5 CONCLUSION
 
Distribution plots provide a useful starting point for applications requiring design of probes or primers to be employed under various experimental conditions. In addition, the trends made evident as conditions are perturbed provide general insights about the relationship between oligo thermodynamics and experimentally controllable parameters. For example, if oligo length is free to vary while full hybridization is required (e.g. PCR primer design), one can identify upper and lower bounds on the range of oligo lengths likely to be encountered. This can be useful for evaluating potential specificity or synthesis issues (i.e. short oligos are more likely to occur multiple times in a given genome and synthesis of some longer oligos may be problematic). Conversely, in situations where length is fixed (e.g. combinatorial probe libraries), conditions may be chosen so that, for example, at least 90% of oligos are fully hybridized.
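
As an illustration of the fixed-length case, the sketch below scans a grid of temperatures and reports the highest one at which a required share of a probe library is still hybridized. It is a toy example under stated assumptions: two-state binding with the oligo in excess, a ‘fully hybridized’ cutoff of Fb ≥ 99%, and a five-member library whose ΔH and ΔS values are hypothetical placeholders rather than predictions for real sequences. All function names are likewise illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_bound(dh, ds, temp_c, oligo_conc):
    """Two-state fraction of target bound, oligo in excess.

    dh in kcal/mol, ds in kcal/(mol*K), oligo_conc in mol/L.
    """
    t = temp_c + 273.15
    dg = dh - t * ds
    w = oligo_conc * math.exp(-dg / (R_KCAL * t))
    return w / (1.0 + w)


def share_hybridized(oligos, temp_c, oligo_conc, cutoff=0.99):
    """Share of the library whose Fb meets the cutoff at temp_c."""
    hits = sum(
        fraction_bound(dh, ds, temp_c, oligo_conc) >= cutoff
        for dh, ds in oligos
    )
    return hits / len(oligos)


def highest_passing_temperature(oligos, temps_c, oligo_conc,
                                min_share=0.90, cutoff=0.99):
    """Highest temperature in temps_c at which at least min_share of
    the library is hybridized to the cutoff; None if no temperature passes."""
    passing = [
        t for t in temps_c
        if share_hybridized(oligos, t, oligo_conc, cutoff) >= min_share
    ]
    return max(passing) if passing else None


# Hypothetical (dH, dS) pairs standing in for a fixed-length probe library.
toy_library = [
    (-150.0, -0.400), (-140.0, -0.380), (-160.0, -0.430),
    (-130.0, -0.355), (-155.0, -0.410),
]
best = highest_passing_temperature(toy_library, [40.0, 50.0, 60.0, 70.0], 1e-7)
print("highest temperature with >= 90% of library hybridized:", best)
```

For a real design task, the toy library would be replaced by nearest-neighbor predictions for the actual probe set, and the temperature grid by whichever experimental parameter (temperature, salt or oligo concentration) is free to vary.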

Our goal is to provide a broad overview of the theoretical behavior of DNA thermodynamics from which rules of thumb may be gleaned. Our distribution plots facilitate this by concisely conveying typical and limiting thermodynamic values for oligos of practical length. To our knowledge, this is the first comprehensive analysis of thermodynamic distributions on a genomic scale. We anticipate that the distributions presented here will provide useful reference information for researchers devising probe design and screening methodologies. Of course, distribution plots based on actual experimental conditions and assay-imposed oligo constraints will be most appropriate for specific applications.


    Acknowledgments
 
We thank Francisco De La Vega, Gene Spier and Hadar Avi-Itzhak for helpful suggestions and critical reading of this manuscript.

Conflict of Interest: none declared.

Received on April 6, 2005; revised on May 13, 2005; accepted on June 6, 2005

    REFERENCES
 

    Allawi, H.T. and SantaLucia, J., Jr. (1997) Thermodynamics and NMR of internal G–T mismatches in DNA. Biochemistry, 36, 10581–10594.

    Allawi, H.T. and SantaLucia, J., Jr. (1998a) Nearest-neighbor thermodynamic parameters for internal GA mismatches in DNA. Biochemistry, 37, 2170–2179.

    Allawi, H.T. and SantaLucia, J., Jr. (1998b) Nearest-neighbor thermodynamics of internal AC mismatches in DNA: sequence dependence and pH effect. Biochemistry, 37, 9435–9444.

    Allawi, H.T. and SantaLucia, J., Jr. (1998c) Thermodynamics of internal CT mismatches in DNA. Nucleic Acids Res., 26, 2694–2701.

    Altschul, S.F., et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

    Bahnot, G., et al. (2003) The importance of thermodynamic equilibrium for high throughput gene expression arrays. Biophys. J., 84, 124–135.

    Bommarito, S., et al. (2000) Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res., 28, 1929–1934.

    Bonnet, G., et al. (1999) Thermodynamic basis of the enhanced specificity of structured DNA probes. Proc. Natl Acad. Sci. USA, 96, 6171–6176.

    Borer, P.N., et al. (1974) Stability of ribonucleic acid double-stranded helices. J. Mol. Biol., 86, 843–853.

    Breslauer, K.J., et al. (1986) Predicting DNA duplex stability from the base sequence. Proc. Natl Acad. Sci. USA, 83, 3746–3750.

    De Voe, H. and Tinoco, I., Jr. (1962) The hypochromism of helical polynucleotides. J. Mol. Biol., 4, 518–527.

    Fodor, S.P.A., et al. (1993) Multiplexed biochemical assays with biological chips. Nature, 364, 555–556.

    Freier, S.M. (1993) Hybridization: considerations affecting antisense drugs. In Crooke, S.T. and Lebleu, B. (Eds.). Antisense Research and Applications, CRC Press, Boca Raton, FL, pp. 67–82.

    Freier, S.M., et al. (1986) Stability of XGCGCp, GCGCYp, and XGCGCYp helices: an empirical estimate of the energetics of hydrogen bonds in nucleic acids. Biochemistry, 25, 3214–3219.

    Gray, D.M. and Tinoco, I., Jr. (1970) A new approach to the study of sequence-dependent properties of polynucleotides. Biopolymers, 9, 223–244.

    Grubb, S.C. (2002) Ploticus 2.04 data display engine.

    Hofacker, I.L., et al. (1994) Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie, 125, 167–188.

    Howell, W.M., et al. (1999) Dynamic allele-specific hybridization. Nat. Biotechnol., 17, 87–88.

    Koehler, R.T. and Peyret, N. (2002) Distributions of thermodynamic values for DNA oligomers. A theoretical survey of Tm and hybridization extent. In Florea, L., Walenz, B. and Hannenhalli, S. (Eds.). Currents in Computational Molecular Biology, pp. 107.

    Kutyakin, I., et al. (2000) 3'-Minor groove binder-DNA probes increase sequence specificity at PCR extension temperatures. Nucleic Acids Res., 28, 655–661.

    Le Novère, N. (2001) MELTING, computing the melting temperature of nucleic acid duplex. Bioinformatics, 17, 1226–1227.

    Lipshutz, R.J., et al. (1999) High Density Synthetic Oligo Arrays. Nat. Biotechnol., 21, 20–24.

    Livak, K.J. (1999) Allelic discrimination using fluorogenic probes and the 5' nuclease assay. Genet. Anal., 14, 143–149.

    Marras, S.A.E., et al. (1999) Multiplex detection of single-nucleotide variations using molecular beacons. Genet. Anal., 14, 151–156.

    Petersheim, M. and Turner, D.H. (1983) Base-stacking and base-pairing contributions to helix stability: Thermodynamics of double-helix formation with CCGG, CCGGp, CCGGAp, ACCGGp, CCGGUp, and ACCGGUp. Biochemistry, 22, 256–263.

    Peyret, N. (2000) Prediction of nucleic acid hybridization: parameters and algorithms. PhD Thesis, Department of Chemistry, Wayne State University, Detroit, MI.

    Peyret, N. and SantaLucia, J., Jr. (1999) HYTHER.

    Peyret, N., et al. (1999) Nearest-neighbor thermodynamics and NMR of DNA sequences with internal AA, CC, GG, and TT mismatches. Biochemistry, 38, 3468–3477.

    Prince, J.A., et al. (2001) Robust and accurate single nucleotide polymorphism genotyping by dynamic allele-specific hybridization (DASH): design criteria and assay validation. Genome Res., 11, 152–162.

    Rahman, S. and Gräfe, C. (2004) Mean and variance of the Gibbs free energy of oligonucleotides in the nearest-neighbor model under varying conditions. Bioinformatics, 20, 2928–2933.

    Ricceli, P.V., et al. (2002) Melting studies of dangling-ended DNA hairpins: effect of end length, loop sequence, and biotinylation of loop bases. Nucleic Acids Res., 30, 4088–4093.

    Rychlik, W. and Rhoads, R.E. (1989) A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA. Nucleic Acids Res., 17, 8543–8551.

    Saiki, R.K., et al. (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science, 239, 487–494.

    SantaLucia, J., Jr. (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl Acad. Sci. USA, 95, 1460–1465.

    SantaLucia, J., Jr., et al. (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry, 35, 3555–3562.

    SantaLucia, J., Jr. and Hicks, D. (2004) The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct., 33, 415–440.

    Shoemaker, D.D. and Linsley, P.S. (2002) Recent developments in DNA microarrays. Curr. Opin. Microbiol., 5, 334–337.

    Sorscher, D.H. (1997) DNA amplification techniques. In Coleman, W.B. and Tsongalis, G.J. (Eds.). Molecular Diagnostics for the Clinical Laboratorian, Humana Press, NJ, pp. 93–94.

    Southern, E.M. (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol., 98, 503–517.





