Bioinformatics Advance Access originally published online on September 3, 2004
Bioinformatics 2005 21(3):385-387; doi:10.1093/bioinformatics/bti006

Bioinformatics vol. 21 issue 3 © Oxford University Press 2005; all rights reserved.

SNPbox: a modular software package for large-scale primer design

Stefan Weckx, Peter De Rijk, Christine Van Broeckhoven and Jurgen Del-Favero*

Department of Molecular Genetics (VIB8), Bioinformatics Unit, Flanders Interuniversity Institute for Biotechnology, University of Antwerp, Universiteitsplein 1, B-2610 Antwerpen, Belgium

*To whom correspondence should be addressed.


    Abstract

Summary: We developed a modular software package, SNPbox, that automates and standardizes the generation of PCR primers and is used in the strategy for constructing single nucleotide polymorphism (SNP) maps. In this strategy, the focus of primer design can be either on the validation of annotated public SNPs or on SNP discovery in exon regions or extended genomic regions, both by resequencing. SNPbox relies on Primer3 for the primer design and combines this program with other publicly available software tools such as BLAST, Spidey and RepeatMasker, and with newly developed algorithms. Primer conditions were chosen such that PCR amplification is uniform across amplicons, facilitating the use of high-throughput genetic platforms. SNPbox can also be used for the design of primer sets for mutation analysis, STR marker genotyping and microarray oligo design. Of the 2500 primer sets designed by SNPbox, 95% successfully amplified genomic DNA under uniform PCR conditions.

Availability: The software is available from the authors upon request.

Contact: jurgen.delfavero{at}ua.ac.be

Supplementary information: SNPbox_supplement.


    INTRODUCTION
Single nucleotide polymorphisms (SNPs) are the most frequent DNA sequence variations in the human genome, with an average spacing of 1–2 kb (Cooper et al., 1985; Holden, 2002; Sachidanandam et al., 2001), and are therefore the markers of choice in genetic studies aiming to identify susceptibility genes for complex diseases (Rafalski, 2002). SNPs can be retrieved from public databases such as dbSNP (Sherry et al., 1999, 2001), HGVbase (Fredman et al., 2002) and JSNP (Hirakawa et al., 2002). However, the majority of SNPs in these databases have not yet been validated as true polymorphisms and/or their polymorphic content still needs to be determined in the population under investigation (Vieux et al., 2002; Marth et al., 2001). As a result, the map density obtained when considering only validated public SNPs is often too low for detailed genetic studies. To efficiently construct high-density SNP maps, a combined strategy of SNP validation and discovery is required. For both steps, high-quality PCR and sequencing primers need to be generated, preferably in a fast, automated process, while carefully taking repeat sequences into account. Furthermore, the use of these primers in high-throughput laboratory environments requires that they amplify DNA under consistent and well-defined conditions.

A number of primer design tools are now available as web applications or as stand-alone programs (Chen et al., 2003; Haas et al., 1998, 2003; Li et al., 1997; Proutski and Holmes, 1996; Raddatz et al., 2001; Rozen and Skaletsky, 2003). Although the efficiency of these programs is beyond dispute, most of them can design only one primer set at a time and are therefore less useful in large-scale primer design projects.

We present SNPbox, a program offering a modular strategy for highly automated and standardized primer design in the construction of high-density SNP maps and in mutation analysis based on resequencing of target sequences.


    STRATEGY
SNPbox automates the primer design for a number of well-defined genomic sequences, hereafter called ‘objects’. These objects are the starting points to define targets for which Primer3 will design primers within a frame of 70 bp 5' and 3' of the target. The default length of a target is 450 bp but can be changed if required. If an object is shorter than the optimal target length, it is first symmetrically extended 5' and 3' until the optimal length is reached (Fig. 1A). When the object is >450 bp, multiple overlapping targets are defined. Small, neighboring objects are joined into one or more targets. When SNPbox encounters repeat sequences while selecting suitable target sequences, they can be included in the target depending on their size, nature and distance to the object. Repeats that are included should be <300 bp and belong to the interspersed repeat class. Polymorphic repeats are excluded from a target sequence since these sequences often result in problematic sequence reads. In addition, SNPbox is not allowed to design primers within repeat sequences. Several potential scenarios are illustrated in the Supplementary file.
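The target definition described above can be illustrated with a minimal Python sketch. It is not the SNPbox implementation: the function names, the half-open coordinate convention, the handling of odd remainders and the equal-length tiling rule are assumptions made for illustration. Only the 450 bp target length, the 70 bp frame and the 35 bp default overlap (stated for the saturation module) come from the text.

```python
import math
from typing import List, Tuple

OPTIMAL_TARGET = 450  # default target length in bp (from the text)
FRAME = 70            # primer design frame on each side of a target, in bp
OVERLAP = 35          # default target overlap in bp (saturation module)

def extend_object(start: int, end: int,
                  target_len: int = OPTIMAL_TARGET) -> Tuple[int, int]:
    """Symmetrically extend a short object [start, end) to target_len."""
    missing = target_len - (end - start)
    if missing <= 0:
        return start, end
    left = missing // 2
    right = missing - left  # assumption: an odd base goes to the 3' side
    return start - left, end + right

def tile_object(start: int, end: int, target_len: int = OPTIMAL_TARGET,
                overlap: int = OVERLAP) -> List[Tuple[int, int]]:
    """Cover a long object with overlapping targets no longer than target_len.

    Assumption: targets are given (nearly) equal lengths so that each one
    stays as close as possible to the optimal target length.
    """
    length = end - start
    n = max(1, math.ceil((length - overlap) / (target_len - overlap)))
    size = math.ceil((length + (n - 1) * overlap) / n)
    step = size - overlap
    return [(start + i * step, min(start + i * step + size, end))
            for i in range(n)]

def primer_frames(t_start: int, t_end: int, frame: int = FRAME):
    """Return the 5' and 3' windows in which primers may be selected."""
    return (t_start - frame, t_start), (t_end, t_end + frame)
```

For example, `extend_object(1000, 1100)` widens a 100 bp object by 175 bp on each side, and `tile_object(0, 1000)` splits a 1000 bp object into three targets that overlap by 35 bp.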



Fig. 1 SNPbox: frames, objects and targets. (A) Symmetric extension of an object till the optimal target length is reached. ‘e’ represents the extension which is the same at both sides. ‘f’ is the primer design frame of 70 bp in which Primer3 is allowed to select primers. (B) Example of the saturation module. Repeats R1 and R2 fulfill the inclusion criteria. Repeat R3 is less than the inclusion cutoff value of 300 bp, but cannot be included because of the polymorphic character of the repeat. R4 represents a repeat belonging to the interspersed repeat class, but is excluded because the repeat size is longer than the threshold size of 300 bp. The repeats R5 and R6 are interspersed repeats, but since the inter-repeat distance is <140 bp (two times the primer selection frame), they are considered as one large repeat that exceeds the threshold size of 300 bp.
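The repeat inclusion rules summarized in the caption can be sketched in Python as follows. The `Repeat` class, the function names and the coordinates are hypothetical. The sketch also assumes that two repeats closer than 140 bp are merged into a single span covering both, and that a merged repeat counts as polymorphic if either part is; the source states neither detail explicitly.

```python
from dataclasses import dataclass
from typing import List

MAX_REPEAT = 300  # inclusion cutoff in bp (from the text)
MIN_GAP = 140     # two times the 70 bp primer selection frame

@dataclass
class Repeat:
    start: int
    end: int            # half-open coordinates
    interspersed: bool  # False for polymorphic repeats

def merge_close_repeats(repeats: List[Repeat],
                        min_gap: int = MIN_GAP) -> List[Repeat]:
    """Treat repeats separated by less than min_gap as one large repeat."""
    merged: List[Repeat] = []
    for rep in sorted(repeats, key=lambda r: r.start):
        if merged and rep.start - merged[-1].end < min_gap:
            last = merged[-1]
            merged[-1] = Repeat(last.start, max(last.end, rep.end),
                                last.interspersed and rep.interspersed)
        else:
            merged.append(Repeat(rep.start, rep.end, rep.interspersed))
    return merged

def is_included(rep: Repeat, max_len: int = MAX_REPEAT) -> bool:
    """A repeat may be part of a target if it is interspersed and short."""
    return rep.interspersed and (rep.end - rep.start) < max_len

# Illustrative coordinates loosely following R1 and R3 to R6 of Fig. 1B
repeats = [
    Repeat(0, 200, True),      # short interspersed repeat: included
    Repeat(1000, 1100, False), # short but polymorphic: excluded
    Repeat(2000, 2400, True),  # interspersed but longer than 300 bp
    Repeat(3000, 3200, True),  # these two lie 100 bp apart and are
    Repeat(3300, 3500, True),  # therefore judged as one 500 bp repeat
]
decisions = [is_included(r) for r in merge_close_repeats(repeats)]
```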

 

    IMPLEMENTATION
SNPbox holds three modules for automated primer design: an SNP module, an exon module and a saturation module. All sequences can be provided as a FASTA file or as a GenBank gi number. In the latter case, the sequence is downloaded directly from the NCBI using EFetch (http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html). Primer design always relates to a genomic sequence that, upon first use, is masked for repeats using RepeatMasker (http://repeatmasker.org) and an adapted version of Sputnik (http://www.espressosoftware.com). This adapted version is used to detect microsatellite repeats and single base stretches, and its output is arranged in three classes: repeats with fewer than eight repeat units, repeats with eight or more repeat units and single base stretches of at least eight identical bases.
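The three repeat classes can be made concrete with a short Python sketch. It does not reproduce Sputnik or its output format: the `(unit, copies)` representation, the class labels and the regular expression are assumptions for illustration, and the treatment of single base stretches shorter than eight bases is a guess, since the text does not mention them.

```python
import re
from typing import List, Tuple

MIN_UNITS = 8  # threshold of eight repeat units or bases (from the text)

def classify_repeat(unit: str, copies: int) -> str:
    """Assign a tandem repeat to one of the three classes in the text."""
    if len(unit) == 1:
        # Assumption: shorter single base runs are simply not reported.
        return "single_base_stretch" if copies >= MIN_UNITS else "below_threshold"
    return "repeat_8_or_more_units" if copies >= MIN_UNITS else "repeat_under_8_units"

def single_base_stretches(seq: str,
                          min_len: int = MIN_UNITS) -> List[Tuple[int, int, str]]:
    """Find runs of at least min_len identical bases as (start, end, base)."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]
```

For instance, `classify_repeat("CA", 12)` falls in the class with eight or more repeat units, while `single_base_stretches("GATC" + "A" * 8 + "CG")` reports the run of eight adenines.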

The SNP module allows primer design for the validation of public SNPs. In order to map public SNPs on a given sequence, the BLAST program (Altschul et al., 1997) is used to align the genomic DNA to a database containing the HGVbase SNPs. SNPs are selected only if they fulfill the following criteria: a sequence similarity of ≥95% over a minimum length of 40 bp with a maximum E-value of 1E–10. The positions of the SNPs in the genomic sequence are determined and used to define the object as the region 30 bp upstream and 30 bp downstream of each SNP. Objects found within a region of 300 bp are joined into one object.

In the exon module, coding sequences are identified within a genomic sequence by aligning cDNA and/or expressed sequence tag (EST) sequences using the Spidey program (Wheelan et al., 2001). In the object definition, exons are symmetrically extended by 50 bp on both sides to include the branch point and the splice sites. If an excluded repeat sequence lies near an exon, the extension is reduced to 25 bp or omitted altogether. Objects within a region of 250 bp are joined into one object.
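The SNP selection criteria and the object definition of the SNP module can be sketched in Python as follows. The `Hit` record and the function names are hypothetical and do not reflect BLAST's output format or SNPbox internals. The sketch also assumes one reading of the joining rule, namely that neighboring objects are merged when the combined object spans at most 300 bp.

```python
from typing import Iterable, List, NamedTuple, Tuple

MIN_IDENTITY = 95.0  # percent sequence similarity (from the text)
MIN_LENGTH = 40      # minimum alignment length in bp
MAX_EVALUE = 1e-10   # maximum E-value
FLANK = 30           # bp upstream and downstream of each SNP
JOIN_WITHIN = 300    # region size in bp within which objects are joined

class Hit(NamedTuple):
    snp_id: str
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bp
    evalue: float
    snp_pos: int     # SNP position on the genomic sequence

def passes_filter(hit: Hit) -> bool:
    """Apply the three selection criteria to one alignment."""
    return (hit.identity >= MIN_IDENTITY
            and hit.length >= MIN_LENGTH
            and hit.evalue <= MAX_EVALUE)

def snp_objects(positions: Iterable[int], flank: int = FLANK,
                join_within: int = JOIN_WITHIN) -> List[Tuple[int, int]]:
    """Build half-open objects around SNPs and join close neighbors."""
    objects: List[Tuple[int, int]] = []
    for pos in sorted(positions):
        start, end = pos - flank, pos + flank + 1
        # Assumption: join when the combined object still fits in the region
        if objects and end - objects[-1][0] <= join_within:
            objects[-1] = (objects[-1][0], max(objects[-1][1], end))
        else:
            objects.append((start, end))
    return objects
```

With SNPs at positions 1000, 1100 and 1500, the first two objects are joined into one of 161 bp, while the third remains separate.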

In the saturation module, the objects are the parts of the genomic sequence between the excluded repeats and can consist of a regulatory region, introns of a specified gene, a complete gene or an extended chromosomal region. Targets are defined with a default overlap of 35 bp. Taking into account the 70 bp frame in which primers are selected, the real overlap between the amplicons will be a maximum of 175 bp, including the primers. Since the length of an object is not necessarily a multiple of the optimal target length, SNPbox aims to design targets that approach the optimal target length as closely as possible (Fig. 1B).

The output of SNPbox consists of an HTML page with a graphical representation of the annotated genomic sequence and hyperlinks to a variety of files, allowing easy inspection of data. A tab-delimited file contains the primer sequences, genomic position and PCR amplification conditions. GC-content is also calculated per 50 bp of amplicon and summarized as an average, a minimal and a maximal GC-content.
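Two of the quantities above lend themselves to a small worked example in Python. The maximum amplicon overlap follows from the text: adjacent targets overlap by 35 bp, and each amplicon can reach up to 70 bp into its neighbor, giving 35 + 2 × 70 = 175 bp. The GC-content summary is only a sketch: the function name, the use of non-overlapping windows and the choice to average over the whole amplicon are assumptions, since the text does not specify them.

```python
from typing import Dict

def max_amplicon_overlap(target_overlap: int = 35, frame: int = 70) -> int:
    """Target overlap plus one primer frame from each neighboring amplicon."""
    return target_overlap + 2 * frame

def gc_summary(amplicon: str, window: int = 50) -> Dict[str, float]:
    """Summarize GC-content (in percent) over consecutive windows.

    Assumptions: windows do not overlap, a shorter final window is kept,
    and the average is taken over the whole amplicon.
    """
    seq = amplicon.upper()
    if not seq:
        raise ValueError("empty amplicon")

    def gc_percent(s: str) -> float:
        return 100.0 * (s.count("G") + s.count("C")) / len(s)

    windows = [gc_percent(seq[i:i + window])
               for i in range(0, len(seq), window)]
    return {"mean": gc_percent(seq), "min": min(windows), "max": max(windows)}
```

A 150 bp amplicon made of a GC-only window, an AT-only window and a mixed window would be summarized with a minimum of 0%, a maximum of 100% and an average of 50%.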

SNPbox was successfully used in our laboratory to design primers for about 2500 targets, all on human DNA. In >95% of cases, the PCR amplifications resulted in one specific amplicon of the expected size using the built-in PCR conditions without the need for optimization. SNPbox was also used to design primer sets for all exons of the human genome, based on Ensembl data. For the 208,202 exons, 227,187 objects were designated, and for 98.62% of these a target could be defined. SNPbox designed primer pairs for 98.53% of the targets, resulting in a global success rate of 97.17%, the product of the two rates (Weckx et al., 2004).

In conclusion, given that the standardization of the ‘in silico’ primer design for defined targets produced high success rates for both primer selection and subsequent PCR amplification, the software package SNPbox is a valuable asset for laboratories involved in resequencing projects, particularly when aimed at generating long-distance, high-density SNP maps.


    Acknowledgments
 
We thank Dirk Van den Bossche for technical support, and Dominique Audenaert, Godelieve Claes and Rosa Rademakers for their valuable feedback during the development, fine-tuning and use of SNPbox. This work was in part funded by the Special Research Fund of the University of Antwerp, the Fund for Scientific Research Flanders and the Interuniversity Attraction Poles program P5/19 of the Belgian Federal Science Policy Office.

Received on February 10, 2004; revised on July 5, 2004; accepted on August 26, 2004

    REFERENCES

    Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Chen, S.H., Lin, C.Y., Cho, C.S., Lo, C.Z., Hsiung, C.A. (2003) Primer Design Assistant (PDA): a web-based primer design tool. Nucleic Acids Res., 31, 3751–3754.

    Cooper, D.N., Smith, B.A., Cooke, H.J., Niemann, S., Schmidtke, J. (1985) An estimate of unique DNA sequence heterozygosity in the human genome. Hum. Genet., 69, 201–205.

    Fredman, D., Siegfried, M., Yuan, Y.P., Bork, P., Lehvaslaiho, H., Brookes, A.J. (2002) HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res., 30, 387–391.

    Haas, S., Vingron, M., Poustka, A., Wiemann, S. (1998) Primer design for large scale sequencing. Nucleic Acids Res., 26, 3006–3012.

    Haas, S.A., Hild, M., Wright, A.P., Hain, T., Talibi, D., Vingron, M. (2003) Genome-scale design of PCR primers and long oligomers for DNA microarrays. Nucleic Acids Res., 31, 5576–5581.

    Hirakawa, M., Tanaka, T., Hashimoto, Y., Kuroda, M., Takagi, T., Nakamura, Y. (2002) JSNP: a database of common gene variations in the Japanese population. Nucleic Acids Res., 30, 158–162.

    Holden, A.L. (2002) The SNP consortium: summary of a private consortium effort to develop an applied map of the human genome. Biotechniques, 26.

    Li, P., Kupfer, K.C., Davies, C.J., Burbee, D., Evans, G.A., Garner, H.R. (1997) PRIMO: a primer design program that applies base quality statistics for automated large-scale DNA sequencing. Genomics, 40, 476–485.

    Marth, G., Yeh, R., Minton, M., Donaldson, R., Li, Q., Duan, S., Davenport, R., Miller, R.D., Kwok, P.Y. (2001) Single-nucleotide polymorphisms in the public domain: how useful are they? Nat. Genet., 27, 371–372.

    Proutski, V. and Holmes, E.C. (1996) Primer Master: a new program for the design and analysis of PCR primers. Comput. Appl. Biosci., 12, 253–255.

    Raddatz, G., Dehio, M., Meyer, T.F., Dehio, C. (2001) PrimeArray: genome-scale primer design for DNA-microarray construction. Bioinformatics, 17, 98–99.

    Rafalski, A. (2002) Applications of single nucleotide polymorphisms in crop genetics. Curr. Opin. Plant Biol., 5, 94–100.

    Rozen, S. and Skaletsky, H.J. (2003) Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol., 132, 365–386.

    Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G., Sherry, S., Mullikin, J.C., Mortimore, B.J., Willey, D.L., et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409, 928–933.

    Sherry, S.T., Ward, M., Sirotkin, K. (1999) dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res., 9, 677–679.

    Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., Sirotkin, K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.

    Vieux, E.F., Kwok, P.Y., Miller, R.D. (2002) Primer design for PCR and sequencing in high-throughput analysis of SNPs. Biotechniques, 32.

    Weckx, S., De Rijk, P., Van Broeckhoven, C., Del Favero, J. (2004) SNPbox: web-based high-throughput primer design from gene to genome. Nucleic Acids Res., 32, W170–W172.

    Wheelan, S.J., Church, D.M., Ostell, J.M. (2001) Spidey: a tool for mRNA-to-genomic alignments. Genome Res., 11, 1952–1957.

