Physical Principles and Visual-OMP Software for Optimal PCR Design
[chapter]
John SantaLucia
2007
Msphere
The physical principles of DNA hybridization and folding are described within the context of how they are important for designing optimal PCRs. The multi-state equilibrium model for computing the concentrations of competing unimolecular and bimolecular species is described. Seven PCR design "myths" are stated explicitly, and alternative proper physical models for PCR design are described. This chapter provides both a theoretical framework for understanding PCR design and practical guidelines
... users. The Visual-OMP (oligonucleotide modeling platform) package from DNA Software, Inc. is also described.

From: Methods in Molecular Biology, vol. 402: PCR Primer Design. Edited by: A. Yuryev. © Humana Press, Totowa, NJ

$\Delta G^\circ_T = -RT \ln K$

... thermodynamics under a wide variety of solution conditions that occur in biological assays, including PCR. At DNA Software, Inc., further empirical equations have been measured (under NIH SBIR funding) for magnesium, DMSO, glycerol, formamide, urea, many fluorophores, and many modified nucleotides, including PNA, LNA, morpholino, phosphorothioate, alkynyl pyrimidines, the universal pairing base inosine (11), and others (S. Morosyuk and J. SantaLucia, unpublished results). For several PCR applications, the parameters for PNA, LNA, and inosine (among others) are important and unique to Visual-OMP. We have also determined complete parameters for DNA-RNA hybridization, including mismatches, salt dependence, and dangling ends (M. Tsay, S. Morosyuk, and J. SantaLucia, unpublished results), which are useful for designing reverse-transcription PCR and hybridization-based assays.

Computation of Tm from ΔH° and ΔS°

By combining Eqs. 12 and 13, one can derive the following expression:

$T_m = \dfrac{\Delta H^\circ}{\Delta S^\circ + R \ln\left([A_{tot}] - [B_{tot}]/2\right)}$   (17)

where Tm is in kelvin, [Atot] is the total molar concentration of the strand that is in excess (typically the primer), and [Btot] is the total molar concentration of the strand that is lower in concentration (typically the target strand). In Eq. 17, if [Atot] = [Btot], it is easy to show that the [Atot] − [Btot]/2 term equals Ct/4, where Ct = [Atot] + [Btot]. Importantly, all the Tm equations above apply only to "two-state transitions" (i.e., molecules that form only random coil and duplex states); they do not apply to transitions that involve partially folded or partially hybridized intermediate structures.
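Eq. 17, Tm = ΔH°/(ΔS° + R ln([Atot] − [Btot]/2)), can be evaluated with a minimal Python sketch. It assumes ΔH° in kcal/mol, ΔS° in cal/(mol·K), and R = 1.9872 cal/(mol·K). The function name `tm_two_state` and the example duplex values are hypothetical.

```python
import math

R_CAL = 1.98720  # gas constant, cal/(mol*K)


def tm_two_state(dH_kcal, dS_cal, a_tot, b_tot):
    """Two-state melting temperature in kelvin.

    dH_kcal: duplex ΔH° (kcal/mol); dS_cal: duplex ΔS° (cal/(mol*K));
    a_tot: total concentration (M) of the strand in excess (typically primer);
    b_tot: total concentration (M) of the limiting strand (typically target).
    """
    if a_tot < b_tot:
        raise ValueError("a_tot must be the strand in excess (a_tot >= b_tot)")
    # Tm = ΔH° / (ΔS° + R ln([Atot] - [Btot]/2)); ΔH° converted to cal/mol
    return 1000.0 * dH_kcal / (dS_cal + R_CAL * math.log(a_tot - b_tot / 2.0))


# Hypothetical duplex: ΔH° = -150 kcal/mol, ΔS° = -400 cal/(mol*K),
# 0.5 µM primer and 1 nM target.
tm_k = tm_two_state(-150.0, -400.0, 0.5e-6, 1e-9)
print(f"Tm = {tm_k:.1f} K ({tm_k - 273.15:.1f} °C)")
```

With equal strand concentrations, the logarithm's argument reduces to Ct/4. Like the equation itself, this sketch is meaningful only for two-state transitions, not for transitions with partially folded or partially hybridized intermediates.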
For such multi-state transitions, the definition of the Tm changes to the following: the temperature at which half of a particular strand (usually the lower-concentration strand, which is the target in PCR) forms a particular structure (e.g., the duplex hybrid), while the remainder of that limiting strand is distributed among all other intermediates and random coil. Sometimes the Tm is undefined because there is no temperature at which half the strands form a particular structure.

Multi-State Coupled Equilibrium Calculations

The principle of calculating the amount bound for a two-state transition was described in Subheading 2.1. The two-state model (see Eq. 3), however, can be deceptive because many equilibria can compete with the desired equilibrium (see Fig. 3). In addition to target secondary structure, other competing species can form: folded primer, mismatched hybrids, and primer homodimers (and primer heterodimers when more than one primer is present, as is typical in PCR). It is desirable to compute the concentrations of all the species in such a coupled multi-state system. This can be accomplished by generalizing the approach described above for the two-state case (see Eqs. 5-11).

Fig. 3. Seven-state model for hybridization (AB match) with competing equilibria for unimolecular folding (A_F and B_F), homodimers (A_2 and B_2), and mismatch hybridization (AB mismatch). By Le Chatelier's principle, the presence of the competing equilibria will decrease the concentration of AB (match). To compute the concentrations of all the species, a numerical approach is used (described in the text).

... Furthermore, the unified nearest-neighbor (NN) parameters were extended by my laboratory to allow for accurate calculation of mismatches, dangling ends, salt effects, and other secondary structural elements, all of which are important in PCR (2).
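The text does not spell out the numerical method used in Visual-OMP. The following is therefore only a minimal Python sketch of one way to solve the coupled mass-action and mass-balance equations for the Fig. 3 model. It assumes ideal dilute-solution mass action and equilibrium constants already computed at the temperature of interest (e.g., from ΔG°T = −RT ln K). The names `seven_state`, `k_ab`, `k_mm`, and so on are hypothetical.

The approach reduces the problem to one unknown, the free primer concentration. For a fixed free primer concentration, the target mass balance is a quadratic in the free target concentration, so the sketch bisects on the free primer concentration until the primer mass balance closes.

```python
import math


def _free_b(a, b_tot, k_bf, k_b2, k_ab, k_mm):
    """Free target B for a given free primer A, from B's mass balance.

    b_tot = b*(1 + k_bf + (k_ab + k_mm)*a) + 2*k_b2*b**2 is quadratic in b;
    the rationalized positive root below also works when k_b2 == 0.
    """
    c1 = 1.0 + k_bf + (k_ab + k_mm) * a
    return 2.0 * b_tot / (c1 + math.sqrt(c1 * c1 + 8.0 * k_b2 * b_tot))


def seven_state(a_tot, b_tot, k_ab, k_mm=0.0, k_af=0.0, k_bf=0.0,
                k_a2=0.0, k_b2=0.0, iters=200):
    """Equilibrium concentrations (M) for the coupled model of Fig. 3.

    Bimolecular constants (k_ab, k_mm, k_a2, k_b2) are in 1/M;
    folding constants (k_af, k_bf) are dimensionless.
    """
    def total_a(a):
        # Total primer accounted for when the free primer concentration is a.
        b = _free_b(a, b_tot, k_bf, k_b2, k_ab, k_mm)
        return a * (1.0 + k_af) + 2.0 * k_a2 * a * a + (k_ab + k_mm) * a * b

    lo, hi = 0.0, a_tot  # total_a(a) increases monotonically with a
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_a(mid) > a_tot:
            hi = mid
        else:
            lo = mid
    a = 0.5 * (lo + hi)
    b = _free_b(a, b_tot, k_bf, k_b2, k_ab, k_mm)
    return {
        "A": a, "B": b,
        "AF": k_af * a, "BF": k_bf * b,
        "A2": k_a2 * a * a, "B2": k_b2 * b * b,
        "AB": k_ab * a * b, "AB_mm": k_mm * a * b,
    }


# Hypothetical example: 1 µM primer, 1 nM target, K(AB match) = 1e7 1/M.
two_state = seven_state(1e-6, 1e-9, k_ab=1e7)
coupled = seven_state(1e-6, 1e-9, k_ab=1e7, k_bf=10.0)  # target folding competes
print(two_state["AB"], coupled["AB"])
```

With every competing constant set to zero, the result reduces to the two-state solution. Turning on target folding lowers the AB (match) concentration, which is the Le Chatelier effect described in the Fig. 3 caption.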
Myth 3: Designing Forward and Reverse Primers to Have Matching Tm's Is the Best Strategy to Optimize PCR

Nearly all "experts" in PCR design would claim to believe in myth 3, and most current software packages base their design strategy on it. Some careful thought, however, quickly reveals the deficiencies of that approach. The Tm is the temperature at which half of the limiting strand (in PCR, the target) is bound by primer. This provides intuitive insight for very simple reactions, but it does not reveal the behavior (i.e., the amount of primer bound to target) at the annealing temperature. The PCR annealing temperature is typically chosen to be about 10 °C below the Tm. However, different primers have different ΔH° of binding, which results in different slopes of the melting transition at the Tm. Thus, two primers with the same Tm can bind their targets to different extents at the annealing temperature.

The quantity that is important for PCR design is the amount of primer bound to target at the annealing temperature. Obtaining equal primer binding requires solving the equilibrium equations as discussed in Subheading 2.1. If the primers have equal concentrations bound to target, then they will be equally extended by DNA polymerase, resulting in efficient amplification. This principle is illustrated in Fig. 5. Differences in primer binding are amplified with each cycle of PCR, thereby reducing the amplification efficiency and providing opportunities for artifacts to develop. The strategy of matching Tm's is thus flawed. Nonetheless, because single-target PCR is fairly robust, such inaccuracies are somewhat tolerated, particularly if one allows for experimental optimization of the temperature cycling protocol for each PCR. In multiplex and other complex assays, however, the design flaws from matched Tm's become crucial and lead to failure.
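The slope argument can be made concrete with a hedged two-state Python sketch that ignores the competing species discussed elsewhere in this chapter. It uses the following hypothetical inputs:

- ΔH° values of −150 and −100 kcal/mol.
- 0.5 µM primer and 1 nM target.
- A designed Tm of 60 °C for both primers.

ΔS° for each primer is back-calculated from Tm = ΔH°/(ΔS° + R ln([Atot] − [Btot]/2)) so that both primers have exactly the same Tm. The function names are hypothetical.

```python
import math

R = 1.98720e-3  # gas constant, kcal/(mol*K)


def dS_for_tm(dH, tm, p_tot, t_tot):
    """ΔS° (kcal/(mol*K)) that gives the requested two-state Tm (K)."""
    return dH / tm - R * math.log(p_tot - t_tot / 2.0)


def fraction_target_bound(dH, dS, temp, p_tot, t_tot):
    """Two-state fraction of target (t_tot, M) bound by primer (p_tot, M) at temp (K)."""
    k = math.exp(-(dH - temp * dS) / (R * temp))  # K = exp(-ΔG°/RT)
    s = p_tot + t_tot + 1.0 / k
    # Smaller root of x^2 - s*x + p_tot*t_tot = 0, written in a cancellation-free form.
    duplex = 2.0 * p_tot * t_tot / (s + math.sqrt(s * s - 4.0 * p_tot * t_tot))
    return duplex / t_tot


p_tot, t_tot = 0.5e-6, 1e-9   # hypothetical primer and target concentrations (M)
tm = 333.15                   # both primers designed for Tm = 60 °C
ta = tm - 10.0                # annealing temperature 10 °C below Tm
for dH in (-150.0, -100.0):   # hypothetical ΔH° values (kcal/mol)
    dS = dS_for_tm(dH, tm, p_tot, t_tot)
    at_tm = fraction_target_bound(dH, dS, tm, p_tot, t_tot)
    at_ta = fraction_target_bound(dH, dS, ta, p_tot, t_tot)
    print(f"dH={dH}: bound at Tm={at_tm:.3f}, at Ta={at_ta:.4f}")
```

In this toy case, both primers bind exactly half the target at Tm. At the annealing temperature, however, the primer with the more negative ΔH° leaves roughly an order of magnitude less target unbound. Matching Tm's therefore does not guarantee matched binding.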
An additional problem with using two-state Tm's for primer design is that they do not account for the rather typical case in which target secondary structure competes with primer binding. The two-state approximation is therefore typically invalid for PCR, and the two-state Tm is not directly related to the actual behavior in the PCR. The physical principle that does account for the effects of competing secondary structure, mishybridization, primer dimers, and so on is called "multi-state equilibrium," as described in Subheading 2.4. Below, an alternative design strategy is suggested in which primers are carefully designed so that many PCRs can be made to work optimally at a single PCR condition, thereby enabling high-throughput PCR without the need ...

... activity have a much higher incidence of primer dimer formation and mishybridization artifacts. Thus, for PCR, "proofreading" activity can actually be harmful.

Myth 5: A BLAST Search Is the Best Method for Determining the Specificity of a Primer

To minimize mispriming, several PCR texts suggest performing a BLAST search, and such capability is part of some primer design packages, such as GCG, Vector NTI, and Visual-OMP. However, a BLAST search is not the appropriate screen for mispriming, because sequence identity is not a good approximation to duplex thermodynamics, which is the quantity that actually controls primer binding. For example, BLAST scores a GC pair and an AT pair identically (as matches), whereas it is well known that duplex stability depends on both the G + C content and the sequence, which is why the NN model is most appropriate. In addition, different mismatches contribute differently to duplex stability. For example, a G·G mismatch contributes as much as −2.2 kcal/mol to duplex stability at 37 °C, whereas a C·C mismatch can destabilize a duplex by as much as +2.5 kcal/mol.
Thus, mismatches can contribute ΔG° over a range of 4.7 kcal/mol, which corresponds to a factor of about 2000 in the equilibrium constant. In addition, the thermodynamics of DNA-DNA duplex formation are quite different from those of DNA-RNA hybridization. Clearly, thermodynamic parameters will predict mispriming better than sequence similarity will.

BLAST also uses a minimum 8-nt "word length," which must be a perfect match. This makes the BLAST algorithm fast, but it also means that BLAST will miss structures that have fewer than eight consecutive matches. Because GT, GG, and GA mismatches are stable and occur commonly when a primer is scanned against an entire genome, this perfect-match word requirement can cause BLAST to miss thermodynamically important hybridization events. BLAST also does not properly score the gaps that result in bulges in the duplexes.

DNA Software, Inc. is developing a new algorithm called ThermoBLAST that retains the computational efficiency of BLAST, so that genomic searches can be accomplished rapidly, but uses thermodynamic scoring for base pairs, dangling ends, single mismatches, bulges, tandem mismatches, and other motifs. Figure 10 gives some examples of strong hybridization that would be missed by BLAST but detected by ThermoBLAST. The computational efficiency of ThermoBLAST is achieved using a variant of the bimolecular dynamic programming algorithm that was invented at DNA Software, Inc.
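Two quantitative points in the Myth 5 discussion can be checked with a short Python sketch:

1. **Mismatch range as a fold change.** It converts the 4.7 kcal/mol mismatch range (−2.2 to +2.5 kcal/mol) into a fold change in K using K2/K1 = exp(ΔΔG°/RT) at 37 °C.
2. **Perfect-match words.** It runs a toy ungapped identity check, not BLAST's actual seeding code. The check shows that a site matching the primer at 17 of 20 positions contains no perfect 8-nt word, so a search that requires such a word would never examine it.

The function names and sequences are hypothetical. The site is written in the same orientation as the primer, so a "match" means identity. Whether such a site actually hybridizes depends on its thermodynamics, which this sketch does not compute.

```python
import math

R = 1.98720e-3  # gas constant, kcal/(mol*K)


def k_ratio(ddg_kcal, temp_c=37.0):
    """Fold change in equilibrium constant for a ΔΔG° of ddg_kcal: exp(ΔΔG°/RT)."""
    return math.exp(ddg_kcal / (R * (temp_c + 273.15)))


def longest_exact_run(primer, site):
    """Longest stretch of consecutive identical positions (ungapped, same orientation)."""
    best = run = 0
    for p, s in zip(primer, site):
        run = run + 1 if p == s else 0
        best = max(best, run)
    return best


def has_seed(primer, site, word=8):
    """True if an exact word of length `word` exists (the perfect-match seed condition)."""
    return longest_exact_run(primer, site) >= word


# Mismatch range from the text: -2.2 (G·G) to +2.5 (C·C) kcal/mol at 37 °C.
print(f"K changes by ~{k_ratio(2.2 + 2.5):.0f}-fold")

# Hypothetical 20-nt primer and a site with three evenly spaced mismatches.
primer = "ACGTACGTACGTACGTACGT"
site = "ACGTATGTACGCACGTATGT"
matches = sum(p == s for p, s in zip(primer, site))
print(matches, longest_exact_run(primer, site), has_seed(primer, site))
```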
doi:10.1007/978-1-59745-528-2_1
pmid:17951788
fatcat:tghqvlsslzcspmi673q4m4q32m