Skip Navigation


Bioinformatics Advance Access originally published online on March 15, 2005
Bioinformatics 2005 21(10):2375-2382; doi:10.1093/bioinformatics/bti379
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/10/2375    most recent
bti379v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (4)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Leber, M.
Right arrow Articles by Schrader, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Leber, M.
Right arrow Articles by Schrader, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

A fractional programming approach to efficient DNA melting temperature calculation

Markus Leber 1, Lars Kaderali 2,*, Alexander Schönhuth 2 and Rainer Schrader 2

1Institute for Biochemistry, University of Cologne Zülpicher Straße 47, Köln, D-50674, Germany
2Center for Applied Computer Sciences, University of Cologne Weyertal 80, Köln, D-50931, Germany

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 INTRODUCTION
 METHODS
 RESULTS AND EVALUATION
 CONCLUSION
 REFERENCES
 

Motivation: In a wide range of experimental techniques in biology, there is a need for an efficient method to calculate the melting temperature of pairings of two single DNA strands. Avoiding cross-hybridization when choosing primers for the polymerase chain reaction or selecting probes for large-scale DNA assays are examples where the exact determination of melting temperatures is important. Beyond being exact, the method has to be efficient, as these techniques often require the simultaneous calculation of melting temperatures of up to millions of possible pairings. The problem is to simultaneously determine the most stable alignment of two sequences, including potential loops and bulges, and calculate the corresponding melting temperature.

Results: As the melting temperature can be expressed as a fraction in terms of enthalpy and entropy differences of the corresponding annealing reaction, we propose to use a fractional programming algorithm, the Dinkelbach algorithm, to solve the problem. To calculate the required differences of enthalpy and entropy, the Nearest Neighbor model is applied. Using this model, the substeps of the Dinkelbach algorithm in our problem setting turn out to be calculations of alignments which optimize an additive score function. Thus, the usual dynamic programming techniques can be applied. The result is an efficient algorithm to determine melting temperatures of two DNA strands, suitable for large-scale applications such as primer or probe design.

Availability: The software is available for academic purposes from the authors. A web interface is provided at http://www.zaik.uni-koeln.de/bioinformatik/fptm.html.

Contact: kaderali{at}zpr.uni-koeln.de


    INTRODUCTION
 TOP
 Abstract
 INTRODUCTION
 METHODS
 RESULTS AND EVALUATION
 CONCLUSION
 REFERENCES
 
Along with the continously increasing amount of genomic information comes the need for fast and reliable methods and techniques able to handle the available data. In a wide range of applications, the fact that two single strands of DNA can hybridize, thus forming a duplex, is exploited extensively. By making use of hybridization, information can be copied as well as detected.

An example for copying information is the widely used standard procedure of polymerase chain reactions (PCR). It requires short, single-stranded oligonucleotides, being complementary to the amplicon start and end, respectively, as primers for the reaction. A well-chosen primer is critical for the reaction. Each primer should hybridize only to the intended target, and not to any other part of the sequence, or to other sequences. Moreover, PCR only works properly at specific temperature ranges. If several PCR reactions are to be carried out simultaneously (multiplex PCR), primers have to be chosen such that each of them works under the same reaction conditions, and, given these conditions, does not cross-hybridize to other template or primer DNA. For such reactions, choice of proper temperatures for the experiment becomes important.

DNA microarrays can be used to generate expression profiles of cells or to identify biological agents (viruses or bacteria). On a typical glass or silicon substrate, it is possible to deposit over 5000 such samples or spots per cm2, or more than 1 000 000 per microarray. The deposited single stranded DNA sequence (‘probe’) represents one gene or any other characteristic sequence fragment of interest in the experiment. The microarray is brought in contact with the fluorescently marked DNA sample (‘target sequences’), and probe and target DNA will bind according to their complementary nature. Thus, the DNA can be identified (or gene expression changes monitored) based on the resulting fluorescence pattern.

The decisive step in DNA microarray construction is the selection of sensitive and specific probe sequences. All probes on the microarray must hybridize exclusively to their intended targets, and reaction conditions such as the reaction temperature must be chosen carefully. To be able to do so, the melting temperatures TM of the probes chosen and their hybridizations, desired or not, and involving mismatches, bulges and loops, have to be determined accurately.

Denaturation (i.e. the process of unwinding two annealed strands of DNA) and hybridization of two single strands of DNA is a chemical reaction and can thus be described using basic terms of chemistry. They are transitional processes, i.e. these processes undergo several intermediate steps. Taking a duplex and increasing the temperature will lead, for example, to a state where initial parts of the double strand have already been unwound, while the rest is still annealed. However, since a full description of the corresponding reaction model would make this problem computationally untractable, a simplification is necessary. For this purpose we assume a two-state transition. This assumption is reasonable for sufficiently short oligos; it means that the two DNA strands either hybridize completely or do not interact at all, resulting in a reversible equilibrium reaction [compare Owczarzy et al. (1997)]

(1)
where S1 and S2 are two single strands, which together form a duplex D. KD is the equilibrium constant of this chemical reaction.

Clearly, this assumption of a two-state process is a simplification. It is well known that there are oligonucleotides that do not behave in this two-state manner, and in such cases the actual melting temperature may be quite different from the one predicted. However, this assumption of a two-state behaviour is necessary to make the problem computationally tractable. Our previous results indicate that the model is still suitable for large-scale applications of probe and primer design (Kaderali and Schliep, 2002; Kaderali et al., 2003).

Nucleic acid melting temperatures can be calculated by using the formula

(2)
where {Delta}H and {Delta}S are the enthalpy and entropy changes of the annealing reaction, R is the universal gas constant, and CT = [S1] + [S2] + 2[D] is the total molar concentration of the strands (Freier et al., 1986; Ornstein and Fresco, 1983; Owczarzy et al., 1997; Rychlik and Rhoads, 1989). The calculation of melting temperatures by means of formula (2) reduces the problem to determining the corresponding enthalpy and entropy differences. For this, a thermodynamical model of an annealing reaction of two strands of DNA allowing for fast computation is needed. We use the widely established Nearest Neighbor model (see Methods Section).

The Nearest Neighbor model requires that the actual base pairs formed are known beforehand. However, for probes or primers binding at arbitrary positions, this is not necessarily known. The question for ‘undesired’ hybridization asks for potential binding sites of a given primer or probe, where mismatches, bulges, etc., must be considered. Thus, the problem of computing melting temperatures of pairs of sequences is further coupled to finding a thermodynamically optimal alignment that represents the most stable duplex configuration. This translates to optimizing a target function being the quotient of two parameters given by the alignment. In Kaderali and Schliep (2002), a straightforward greedy approach has been developed returning the optimal configuration in slightly more than 50% of pairs of sequences. Building on this work, we propose a fractional programming approach. This assures that the most stable configuration and thus the exact melting temperature of two sequences is found.

Related work
Computation of melting temperatures. For the computation of melting temperatures, a number of programs are available, most of them primarily designed for primer and probe selection. Most of the programs explicitly concerned with the determination of melting temperatures rely on the Nearest Neighbor model as a thermodynamical basis. DAN (Bleasby, 1999; emboss dan. http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/dan.html) is one example. It evaluates the melting temperature and the GC-content of duplexes. MeltTemp (Accelrys, 2002; MELTTEMP. http://www.biology.wustl.edu/gcg/melttemp.html.) and HyTher (Peyret and SantaLucia Jr., 2002; Peyret et al., 1999) mainly differ from DAN in using different Nearest Neighbor parameters. POLAND (Steger, 1994) simulates transition curves of double-stranded nucleic acid sequences and is based on Poland's algorithm (Poland, 1974). MELTING (Le Novere, 2001) allows the user to choose different parameter sets for the Nearest Neighbor model. Note that none of these programs is able to introduce gaps in the alignments. Moreover, none of them is designed as a high-throughput approach.

Zuker (1989) has presented a dynamic programming algorithm for the calculation of RNA secondary structure, yielding a unimolecular folding algorithm that could be applicable to DNA melting as well. However, since speed of the calculations is decisive as, very often, melting temperatures of millions of pairings have to be calculated (Kaderali and Schliep, 2002), this algorithm cannot be used for our purposes.

Programs for primer and probe selection also need methods to determine the stability of duplices. Most primer design programs like OSP (Hillier and Green, 1991), Primer3 (Rozen and Skaletzky, 2000), Primo (Li et al., 1997), GeneFisher (Giegerich et al., 1996), PCRDiag (Dopazo and Sobrino, 1993), CODEHOP (Rose et al., 1998), DoPrimer (Kaempke et al., 2001), HYBSimulator (Hyndman et al., 1996), U-PRIMER (Hsieh et al., 2003), MULTIPCR (Nicodeme and Steyaert, 1997) and VectorNTISuite (Gorelenkov et al., 2001), using different thermodynamic models to explain duplex stability, are intended to handle only limited amounts of data. No efficiency considerations are made here.

High-throughput probe selection programs usually, as a payoff for efficiency, relax exactness of melting temperature calculations. Li and Stormo (2000) select probes based on word frequencies and common sequence comparison before computing exact melting temperatures for chosen probes only. Kurata and Suyama (1999) also avoid thermodynamical models in favor of faster and simpler criteria. Rouillard et al. (2002) focus on the design of longer oligos. In Rahmann's (2003) most recent approach, oligos are chosen looking for longest common substrings. As these approaches do not rely on sound thermodynamical models, error rates for such crucial issues like cross-hybridization cannot be provided intrinsically. Without a method-independent post-processing step, it would remain unclear to what degree the returned oligos falsify experimental outcomes.

To our knowledge our method is the only one computing the stability of duplices exactly (i.e. according to a sound thermodynamical model) and efficiently.

Fractional programming. Fractional programming is the method of choice for any problem relating to the optimization of fractional target functions and thus can be found in quite a lot of applications. An overview of the subject is provided by Craven (1988), Dinkelbach (1967) and Megiddo (1979). Recently, the fractional programming approach has been used to determine normalized local sequence alignments based on the classical Smith–Waterman algorithm (Arslan et al., 2001). Note that in our case the substeps of the fractional programming algorithm correspond to meaningful terms in the language of thermodynamics.


    METHODS
 TOP
 Abstract
 INTRODUCTION
 METHODS
 RESULTS AND EVALUATION
 CONCLUSION
 REFERENCES
 
Thermodynamics: the Nearest Neighbor model
To provide a good and simple enough model for calculating melting temperatures according to Equation (2), we have to look closer at the very nature of a hybridization reaction, to distinguish between those effects whose contributions to energetic considerations are decisive, and those that are not. The Nearest Neighbor model is based on the observation that two kinds of interactions between bases in nucleic acids play a prevalent role (Cupal, 1997; Delcourt and Blake, 1990): Base pairing in the plane of the bases due to hydrogen bonding between base pairs in the two opposing strands, and base stacking along the strands due to London dispersion forces and hydrophobic effects. As base pairing is an effect coming from forces between opposing bases, and base stacking effects can be calculated by considering only neighboring bases within a strand, the total energy of the annealing reaction can be estimated by considering only the contributions of pairs of opposing bases (Breslauer et al., 1986). Enthalpy and entropy parameters of pairs of opposing base pairs are simply added up to obtain the energy changes for the whole strand. The Nearest Neighbor model is well established; only complex extensions of it would yield significant improvements [see SantaLucia and Hicks (2004) for a recent and detailed discussion of the subject].

Nearest Neighbor parameters. The determination of reliable Nearest Neighbor parameters still is an area of active research, multiple labs having measured and published different (and contradictory) results (see, e.g. Delcourt and Blake, 1990; Steger, 1994 for discussions). Most recently a database containing comprehensive sets of DNA–DNA parameters, estimated by combining measured values from different work groups, has been built (SantaLucia and Hicks, 2004). Parameters for DNA–RNA and RNA–RNA binding are also available. As our method allows to exchange parameter sets, new and more reliable datasets, when available, can easily be plugged in and replace the outdated ones.

Algorithms: fractional programming and the Dinkelbach algorithm
Consider the fractional program

(3)
where S is a compact set and f, g : S -> R are continuous functions and additionally g > 0. Then the parametric program

(4)
is related to the fractional program in the following sense:

PROPOSITION 1
is an optimal solution of (3) if and only if is an optimal solution of max {f(x) – g(x) | x S}, where is the only null of F(q) = max {f(x) – qg(x) | x S}, q R. Furthermore, = q().

The proof can be found in Schaible (1978). The following picture might help clarify the situation: Each x S can be associated with a real-valued line

with slope (–gx) coloneg(x) and y-axis intercept fx colone f(x). Note that as S is a compact set and f and g are continuous, for all q R, the set {(–gx)q + fx | x S} is compact as well and thus its maximum can be found. This is what has to be done when solving the parametric program (4). Finding the null of

now, as g > 0 makes sure that all lines have negative slope, translates to finding the line which crosses the x-axis last. It can be easily seen that F is a continuous, convex, strictly monotonically decreasing function, which is the envelope of the set of lines described above (Fig. 1).



View larger version (10K):
[in this window]
[in a new window]
 
Fig. 1 The alignment/melting temperature problem with three different possible alignments (solid lines). The respective (temperature-sensitive) most stable configuration is indicated by the dashed line and the way the Dinkelbach algorithm finds the melting temperature and the corresponding alignment is displayed by the dotted line. Vertical sections thereof correspond to finding an optimal alignment given a specific temperature while diagonal sections correspond to finding the melting temperature of a configuration determined by an alignment.

 
Based on these facts, Dinkelbach proposes the following procedure for finding the maximum of (3):
  1. Choose q1 < [e.g. take any for a feasible ]. Set i = 1.
  2. Determine an optimal solution xi for max {f(x) qig(x) | x S}.
  3. If F(qi) ≤ {varepsilon} ({varepsilon} > 0 for some predetermined tolerance), consider qi to be a sufficiently good approximation of . Else go to step 4.
  4. Set qi+1 = q(xi) and i = i + 1. Go back to step 2.
Dinkelbach has shown that this procedure produces a sequence qi, q1 < q2 < ··· ≤ , converging to . The decisive criteria for an efficient implementation of the Dinkelbach algorithm are the availability of a sufficiently fast method for determining the maximum of the parametric program and the number of iteration steps needed, i.e. the rate of convergence of the sequence {qi}.

REMARK 1
The condition of f, g being continuous functions over a compact set of real numbers is only needed to ensure the existence of maxima of the sets of lines evaluated at a single position and thus can be relaxed to f, g being any real-valued functions over an arbitrary set S such that the corresponding sets of line values can be maximized.

We further will have to deal with these modified conditions.

Adaptation to our problem
The aim of our efforts is to find a solution to the fractional program

(5)
where S is the set of all possible alignments of two sequences and {Delta}H, {Delta}S are the enthalpy and entropy differences of the corresponding chemical reaction. Note first that the necessary conditions for step 2 of the algorithm of Remark 1 apply because S is finite. With S being finite it is moreover assured that the solution of Equation (5) can be found with the Dinkelbach algorithm in a finite number of steps. We further have to make sure that the denominator is >0. This can be done by multiplying both the numerator and the denominator by –1 as the denominator is usually negative.

We now give an outline of the strategies applied for carrying out steps 1 and 2 of the Dinkelbach algorithm.

Step 1. For the initialization step, any fast heuristic producing good approximations of the exact melting temperature may be taken. It has to be assured that the initial temperature to be taken is lower than the exact value in order to make the algorithm work properly. It should be considered that the heuristic is fast enough for not slowing down the algorithm too much. See Running Time for a detailed discussion.

Step 2. Dynamic Programming Step 2 translates to

(6)
(after having multiplied numerator and denominator by –1), which is the negative difference of Gibb's free energy, –{Delta}G. Maximizing this term now has a concrete meaning: finding the alignment for which, at this temperature, it is most likely that the strands build duplices. The target function for a given alignment z of sequences x and y can be calculated additively:

(7)
where k is the length of the alignment, xi, yi {A, C, G, T, –} are the bases (extended by ‘–’ if necessary) of the sequences to be aligned and ({Delta}G)xixi+1/yiyi+1 is the free energy according to the Nearest Neighbor model for a pairing of neighboring bases xi, xi+1 paired with yi, yi+1. This optimization problem has thus an objective function which is additively decomposable and therefore turns out to be efficiently solvable with standard dynamic programming techniques. The dynamic programming recursion becomes (for the sake of simplicity denote –G with )

(8)

(9)

(10)
where are the already calculated values of intermediary alignments of the first k with the first l bases; the third index refers to the three possible types of alignment [0 for aligning xi with yj, 1 for xi with a gap and 2 for yj with a gap (xi, yj {A, C, G, T)], and , and are the respective values which have to be added according to the Nearest Neighbor model. This is a slight modification to usual alignment calculations as parameters stem from two neighboring base pairs and not from single base pairings.

By initializing the border of the dynamic programming table with zeros, we assure that initial gaps do not lower TM; by looking for the result not just in cell (|x|, |y|), but in cells (s, |y|) and (|x|, t) for all s = 1.|x| and t = 1.|y| and choosing the maximum value found, the same is true for terminal gaps. Furthermore, a special symbol is used to denote the beginning and the end of a sequence, allowing the use of dangling end thermodynamic parameters in the computation (Bommarito et al., 2000).

Implementation
The algorithm was implemented in C++ and compiles with the gnu compiler on most systems. The program offers a command line interface and can thus be easily called from other programs as a subroutine for TM determination. In addition, the source code is available on request from the authors, and the algorithm can thus be compiled directly into other third-party packages. A web-interface is available at http://www.zaik.uni-koeln.de/bioinformatik/fptm.html. The software has furthermore been built into the latest version of the SBEprimer package (Kaderali et al., 2003) which compiles under Windows, and in this version offers a convenient graphical user interface.


    RESULTS AND EVALUATION
 TOP
 Abstract
 INTRODUCTION
 METHODS
 RESULTS AND EVALUATION
 CONCLUSION
 REFERENCES
 
An example where the greedy algorithm described in Kaderali and Schliep (2002) fails is given by

{bti379i1}

with a calculated optimum melting temperature of 47.6°C, for which the greedy algorithm returns the temperature 38.4°C and the alignment

{bti379i2}

The example illustrates the typical problem the greedy algorithm has: It tries to maximize the duplex stability early in the process of building the alignment (for prefixes of the sequences), and may have to accept costly mismatches or insertions later in the duplex as a consequence of such prior greed. The iterative procedure of the Dinkelbach method eliminates this problem.

Experimental validation
The Nearest Neighbor model has extensively been validated experimentally. SantaLucia and Hicks (2004) report that it fits TM within 1.6°C on average. In addition, we analyzed data recently published by Pozhitkov (2003) with our method. The data shows the influence of the position of a mismatch within a DNA–RNA duplex on the fluorescence observed after hybridization to a DNA chip. Pozhitkov compared perfectly matching and single-mismatch hybridization for five different probes on a chip. Hybridization and washing were carried out in 5 x SSC solution, which is 1 M sodium ions. The system was incubated for 24 h at 45°C, and then washing was done three times at room temperature. As results for two of these experiments exhibited irregularities due to experimentally induced noise (e.g. one showing higher intensity for mismatches than for the perfectly matching duplex) only three of the probes were used for evaluation. Figures 35 show fluorescense intensities (normalized to fluorescense of the perfectly matching duplex, light grey bars) for the three experiments with different probes, being perfectly complementary and with mismatches at positions 6, 11, and at the end.



View larger version (9K):
[in this window]
[in a new window]
 
Fig. 3 Experimentally determined and calculated fluorescense for algae sample, for the perfectly matching duplex and with mismatches at the end and at positions 6 and 11, respectively. The sample's DNA is GAT TGT GCA ATA CTC CAA CC.

 


View larger version (9K):
[in this window]
[in a new window]
 
Fig. 5 Experimentally determined and calculated fluorescense for ostrac sample, with sequence ATC ACT CGC GCA TAA GTT AG.

 
Calculations and their evaluation—the van't Hoff equation. For single-phase transitions, TM is defined as the point where 50% of the sequences are in the denatured and 50% are in the duplex state. As fluorescence is approximately proportional to the number of duplexes, i.e. the concentration of duplexes in terms of the chemical equations given above, we had to establish a relationship between melting temperatures and fluorescence levels at a given temperature. We model temperature dependency of the equilibrium constant KD of the annealing reaction [Equation (1)] using the well-known van't Hoff equation (e.g. Wedler, 1997)

(11)
Solving the van't Hoff differential equation under the initial condition K(TM) = (CT/4)–1 given by the equilibrium constant at melting temperature, where CT is the total molar concentration of strands, and inserting

(12)
(where [D](T) is the duplex concentration at temperature T using the relationships CT = [S1] + [S2] + 2[D], KD = [D]/[S1][S2] and assuming [S1] = [S2]) into the solution yields a sigmoidal functional dependence of the concentration of duplexes and temperature (Fig. 2).



View larger version (8K):
[in this window]
[in a new window]
 
Fig. 2 Temperature-dependent concentration of duplexes according to the van't Hoff equation.

 
Using this function and the proportional relation of concentration and fluorescence, estimations of fluorescence levels at any temperature could be given. Figures 35 show experimentally observed and predicted fluorescense as calculated by our algorithm. Figure 5 shows good agreement between theory and experiment; in Figure 4, the relative trends between experiment and prediction are correct, even though the absolute fluorescense intensities differ somewhat. For the duplices with mismatches at position 6 and 11 in Figure 3, this is even more pronounced. It is unclear whether the experiment is noisy or the model makes wrong predictions in these cases. Clearly, the indirect measurement of duplex concentration via fluorescence intensities is an additional source of noise, and hybridization to a chip is different from reaction conditions in solution, for which Nearest Neighbor parameters are derived, which may explain some of the variability seen. Newer parameters for chip hybridization, when available, may diminish this problem. It is worthwile noting though, that for alignments involving mismatches, the parameters used appear to be overly conservative, predicting a more stable duplex than is experimentally observed. This may be desirable when trying to avoid cross-hybridizations on a DNA microarray.



View larger version (9K):
[in this window]
[in a new window]
 
Fig. 4 Experimentally determined and calculated fluorescense for cyclop sample, with DNA sequence ACA TAC ATG GAT CAC CCC TC.

 
Running time
A necessity for a melting temperature algorithm to be applicable in large-scale applications such as probe design for DNA chips or primer design for multiplex PCR reactions is efficiency of the individual computations.

Step 1 of the Dinkelbach procedure requires choosing a good starting temperature. The closer the starting point is to the actual melting temperature, the fewer iterations are required in the procedure. However, one has to ensure that the adopted temperature is not above the true melting temperature, as otherwise the method will not converge.

We compared two different heuristics for choosing the starting temperature for the algorithm. An obvious initial temperature is T = 0 K. Simply setting the initial temperature to zero is the fastest method of getting a value below the exact temperature. As this temperature is certainly the lowest one possible, one may, as a result, have to put up with some more iteration steps. We evaluated the number of iterations required for pairs of random sequences of different lengths. Thirty thousand pairs of two random sequences were generated for each length of 7, 15, 30, 45 and 60 bases. Each pair was aligned using the fractional programming approach, and the number of iterations of the algorithm required were counted. A clear correlation can be seen between sequence length and the number of iteration steps required. On average, the algorithm requries four iterations.

An alternative way is to set the start temperatures to the values generated by the greedy melting temperature algorithm described in Kaderali and Schliep (2002). This procedure needs about the same time as a single iteration step of the Dinkelbach algorithm, but already produces the exact results in more than 50% of the cases, so that no more iteration steps have to be carried out. If the result of this algorithm is not the exact value, two iterations on average give the exact result. Here too, sequence length matters; see Figure 6. On average about two iterations fewer than when initializing T = 0 K are required. It thus pays off to use the greedy algorithm for initialization, at least if the sequences to be aligned are not very similar. The more similar the sequences are, the fewer iterations are required on average. For perfectly complementary duplexes, the greedy algorithm already produces the exact melting temperature (data not shown).



View larger version (19K):
[in this window]
[in a new window]
 
Fig. 6 Number of iterations when initializing with greedy algorithm, for different lengths of aligned sequences.

 

    CONCLUSION
 TOP
 Abstract
 INTRODUCTION
 METHODS
 RESULTS AND EVALUATION
 CONCLUSION
 REFERENCES
 
We have presented a novel method to efficiently and exactly calculate melting temperatures of DNA-duplexes for large-scale applications such as probe-design or primer-design problems, where thousands of melting temperatures need to be determined quickly. The method builds on previous work combining melting temperature computations according to the Nearest Neighbor model with a simultaneously optimized sequence alignment (Kaderali and Schliep, 2002). By making use of a fractional programming approach, our method presented here overcomes a flaw of the previously presented method, now being exact at the price of an approximately doubled execution time. As our method is based on a sound thermodynamical model, its outcomes are more reliable than those of other approaches designed to be efficient, which, for example, search for common substrings or optimize scores of evolutionary alignments. Given the increase in processor speed observed since publication of the previous methods, high-throughput applications using fast pre-processing steps in combination with suitable data structures are a plausible extension of this work.


    Acknowledgments
 
The authors would like to acknowledge funding from the BMBF through the Cologne University Bioinformatics Centre (CUBIC). Thanks to Alexander Pozhitkov for helpful discussions.

Received on January 4, 2005; revised on February 21, 2005; accepted on March 6, 2005

    REFERENCES
 TOP
 Abstract
 INTRODUCTION
 METHODS
 RESULTS AND EVALUATION
 CONCLUSION
 REFERENCES
 

    Arslan, A., Egecioglu, O., Pevzner, P. (2001) A new approach to sequence comparison: normalized sequence alignment. Proceedings of RECOMB 2001, , Montreal, Canada , pp. 2–11.

    Bommarito, S., et al. (2000) Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res., 28, 1929–1934[Abstract/Free Full Text].

    Breslauer, K., et al. (1986) Predicting DNA duplex stability from the base sequence. Proc. Natl Acad. Sci. USA, 83, 3746–3750[Abstract/Free Full Text].

    Craven, B. Fractional Programming, (1988) , Berlin Helderman Verlag.

    Cupal, J. (1997) The density of states of RNA secondary structures. Master's thesis University of Vienna.

    Delcourt, S.G. and Blake, R. (1990) Stacking energies in DNA. J. Biol. Chem., 266, 15160–15169.

    Dinkelbach, W. (1967) On nonlinear fractional programming. Manage. Sci., 13, 492–498.

    Dopazo, J. and Sobrino, F. (1993) A computer program for the design of PCR primers for diagnosis of highly variable genomes. J. Virol. Methods, 41, 157–166[CrossRef][Web of Science][Medline].

    Freier, S., et al. (1986) Improved free-energy parameters for predictions of RNA duplex stability. Proc. Natl Acad. Sci. USA, 83, 9373–9377[Abstract/Free Full Text].

    Giegerich, R., Meyer, F., Schleiermacher, C. (1996) Genefisher—software support for the detection of postulated genes. Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, 1996 AAAI Press, pp. 68–77.

    Gorelenkov, V., et al. (2001) Set of novel tools for PCR Primer Design. BioTechniques, 31, 1326–1330[Web of Science][Medline].

    Hillier, L. and Green, P. (1991) OSP: a computer program for choosing PCR and DNA sequencing primers. PCR Meth. Appl., 1, 124–128[Medline].

    Hsieh, M., et al. (2003) An efficient algorithm for minimal primer set selection. Bioinformatics, 19, 285–286[Abstract/Free Full Text].

    Hyndman, D., et al. (1996) Software to determine optimal oligonucleotide sequences based on hybridization simulation data. BioTechniques, 20, 1090–1097[Web of Science][Medline].

    Kaderali, L. and Schliep, A. (2002) Selecting signature oligonucleotides to identify organisms using DNA arrays. Bioinformatics, 18, 1340–1349[Abstract/Free Full Text].

    Kaderali, L., et al. (2003) Primer-design for multiplexed genotyping. Nucleic Acids Res., 31, 1796–1802[Abstract/Free Full Text].

    Kaempke, T., et al. (2001) Efficient primer design algorithms. Bioinformatics, 17, 214–225[Abstract/Free Full Text].

    Kurata, K. and Suyama, A. (1999) Probe design for DNA chips. Genome Inform., 10, 225–226.

    Le Novere, N. (2001) Melting, computing the melting temperature of nucleic acid duplex. Bioinformatics, 17, 1226–1227[Abstract/Free Full Text].

    Li, F. and Stormo, G. (2000) Selecting optimum DNA oligos for microarrays. IEEE International Symposium on Bio-Informatics and Biomedical Engineering (BIBE).

    Li, P., et al. (1997) PRIMO: a primer design program that applies base quality statistics for automated large-scale DNA sequencing. Genomics, 40, , pp. 476–485[CrossRef][Web of Science][Medline].

    Megiddo, N. (1979) Combinatorial optimization with rational objective functions. Math. Oper. Res., 4, 414–424[Abstract/Free Full Text].

    Nicodeme, P. and Steyaert, J. (1997) A computer support for genotyping by PCR-multiplex. Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology AAAI Press, pp. 210–213.

    Ornstein, R. and Fresco, J. (1983) Correlation of Tm and sequence of DNA duplexes with {Delta} h computed by and improved empirical potential method. Biopolymers, 22, 1979–2000[CrossRef][Web of Science][Medline].

    Owczarzy, R., et al. (1997) Predicting sequence-dependent melting stability of short duplex DNA oligomers. Biopolymers, 44, 217–239[CrossRef][Web of Science][Medline].

    Peyret, J. and SantaLucia, J., Jr. (2002) Hyther version 1.0. Wayne State University.

    Peyret, N., et al. (1999) Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A-A, C-C, G-G and T-T mismatches. Biochemistry, 38, 3468–3477[CrossRef][Medline].

    Poland, D. (1974) Recursion relation generation of probability profiles for specific-sequence macromolecules with long-range correlations. Biopolymers, 13, 1859–1871[CrossRef][Web of Science][Medline].

    Pozhitkov, A. (2003) Molecular taxonomy. Bioinformatics and practical evaluation. Ph.D. Thesis Institute of Genetics, University of Cologne.

    Rahmann, S. (2003) Fast large scale oligonucleotide selection using the longest common factor approach. J. Bioinform. Comput. Biol., 1, 343–361[CrossRef][Medline].

    Rose, T., et al. (1998) Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res., 26, 1628–1635[Abstract/Free Full Text].

    Rouillard, J.-M., et al. (2002) OligoArray: genome-scale oligonucleotide design for microarrays. Bioinformatics, 18, 486–487[Abstract/Free Full Text].

    Rozen, S. and Skaletzky, H. (2000) Primer3 on the WWW for general users and for biologist programmers. In Krawetz, S. and Misener, S. (Eds.). Bioinformatics Methods and Protocols: Methods in Molecular Biology., , Totowa, NJ Humana Press, pp. 365–386.

    Rychlik, W. and Rhoads, R. (1989) A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA. Nucleic Acids Res., 17, 8543–8551[Abstract/Free Full Text].

    SantaLucia, J.J. and Hicks, D. (2004) The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct., 33, 415–440[CrossRef][Web of Science][Medline].

    Schaible, S. (1978) Analyse und Anwendungen von Quotientenprogrammen, Vol. 42, Mathematical Systems in Economics. Verlag Anton Hain.

    Steger, G. (1994) Thermal denaturation of double-stranded nucleic acids: prediction of temperatures critical for gradient gel electrophoresis and polymerase chain reaction. Nucleic Acids Res., 22, 2760–2768[Abstract/Free Full Text].

    Wedler, G. Lehrbuch der physikalischen Chemie, (1997) Wiley-VCH.

    Zuker, M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52[Abstract/Free Full Text].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
T. Mann, R. Humbert, M. Dorschner, J. Stamatoyannopoulos, and W. S. Noble
A thermodynamic approach to PCR primer design
Nucleic Acids Res., July 1, 2009; 37(13): e95 - e95.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
J. Duitama, D. M. Kumar, E. Hemphill, M. Khan, I. I. Mandoiu, and C. E. Nelson
PrimerHunter: a primer design tool for PCR-based virus subtype identification
Nucleic Acids Res., May 1, 2009; 37(8): 2483 - 2492.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
J. D. Gans and M. Wolinsky
Improved assay-dependent searching of nucleic acid sequence databases
Nucleic Acids Res., July 1, 2008; 36(12): e74 - e74.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
W. Tembe, N. Zavaljevski, E. Bode, C. Chase, J. Geyer, L. Wasieloski, G. Benson, and J. Reifman
Oligonucleotide fingerprint identification for microarray-based pathogen diagnostic assays
Bioinformatics, January 1, 2007; 23(1): 5 - 13.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
H. H. Nour-Eldin, B. G. Hansen, M. H. H. Norholm, J. K. Jensen, and B. A. Halkier
Advancing uracil-excision based cloning towards an ideal technique for cloning PCR fragments
Nucleic Acids Res., October 6, 2006; 34(18): e122 - e122.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/10/2375    most recent
bti379v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (4)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Leber, M.
Right arrow Articles by Schrader, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Leber, M.
Right arrow Articles by Schrader, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?