Bioinformatics Advance Access originally published online on January 18, 2007
Bioinformatics 2007 23(6):774-776; doi:10.1093/bioinformatics/btl657
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

snp.plotter: an R-based SNP/haplotype association and linkage disequilibrium plotting package

Augustin Luna 1 and Kristin K. Nicodemus 1,2,*

1 GCAP/CBDB, NIMH/NIH, 10 Center Drive, Room 4S-235, Bethesda, MD 20814, USA and 2 Epidemiology, Johns Hopkins SPH, Baltimore, MD, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: snp.plotter is a newly developed R package that produces high-quality plots of results from genetic association studies. The main features of the package include options to display a linkage disequilibrium (LD) plot below the P-value plot using either the r² or D′ LD metric, to set the X-axis to equal spacing or to use the physical map of markers, and to specify plot labels, colors, symbols and the LD heatmap color scheme. snp.plotter can plot single SNP and/or haplotype data and can simultaneously plot multiple sets of results. R is a free software environment for statistical computing and graphics available for most platforms. The proposed package provides a simple way to convey both association and LD information in a single appealing graphic for genetic association studies.

Availability: Downloadable R package and example datasets are available at http://cbdb.nimh.nih.gov/~kristin/snp.plotter.html and http://www.r-project.org

Contact: nicodemusk@mail.nih.gov


    1 INTRODUCTION
Genetic association studies have been an important strategy for identifying susceptibility genes for a range of diseases, including Alzheimer disease, deep vein thrombosis, inflammatory bowel disease, hypertriglyceridemia, diabetes and schizophrenia (Morton, 2005). Single nucleotide polymorphisms (SNPs) are commonly used to test for a statistical association between a disease phenotype and either single markers or multiple markers combined in haplotype-based analyses. SNPs that are tightly linked may be correlated, a phenomenon known as linkage disequilibrium (LD). Knowledge of LD aids in selecting the SNPs and haplotypes to be tested for association with a disease (Abecasis et al., 2005) and in localizing a putative causal variant. Given the importance of LD to genetic association studies, researchers often plot association results alongside the LD present in the chromosomal region or gene examined. However, the LD plot and the association plot are frequently created separately with different software, which makes the two plots difficult to align and the resulting graphic unclear. We propose snp.plotter, which produces Portable Document Format (PDF) or Encapsulated PostScript (EPS) images of genetic association results for single SNP and/or haplotype data, together with a corresponding LD heatmap, in one correctly aligned graphic.


    2 SOFTWARE OVERVIEW
snp.plotter is a package for R, the freely available statistical computing and graphics environment, which runs on several platforms including Windows, Mac OS and UNIX/Linux (R Development Core Team, 2006). Nearly all aspects of the images produced by snp.plotter are customizable, including labels, symbols, colors and color schemes, the LD metric, the P-value threshold for plotting, the Y-axis scale, and lines marking user-specified P-value thresholds. snp.plotter can plot multiple sets of single SNP and haplotype association results. Haplotype results can be plotted using global and/or individual haplotype P-values. P-values may be plotted at the markers' physical positions or evenly spaced; even spacing helps separate results in regions with dense SNP maps. Figures are produced in two print widths (3.5 and 7 inches), corresponding to one and two columns of a printed page, respectively, in resolution-independent formats (PDF and EPS) for ease of use in manuscript preparation. snp.plotter figures can be easily imported into LaTeX documents, and because the formats are resolution-independent, figures can be converted into raster image formats such as JPG, PNG and BMP at any desired resolution without loss of quality.
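As an illustration of the two X-axis options, the following Python sketch (not snp.plotter's own R code) maps marker locations to horizontal positions either in proportion to physical distance or at equal intervals. The function name x_positions and the [0, 1] coordinate range are assumptions made for this example; the locations are those of the four SNPs in the example SNP.FILE of Section 3.

```python
from typing import List


def x_positions(locs: List[int], even: bool = False) -> List[float]:
    """Map marker locations (bp) to x-coordinates in [0, 1] (hypothetical helper).

    even=False: positions proportional to physical distance (physical map).
    even=True:  markers equally spaced in map order, ignoring distances.
    """
    n = len(locs)
    if n == 0:
        return []
    if n == 1:
        return [0.5]  # a single marker is simply centred
    order = sorted(range(n), key=lambda i: locs[i])  # marker indices in map order
    xs = [0.0] * n
    if even:
        for rank, i in enumerate(order):
            xs[i] = rank / (n - 1)
    else:
        lo, hi = locs[order[0]], locs[order[-1]]
        span = (hi - lo) or 1  # guard against all markers at one location
        for i in range(n):
            xs[i] = (locs[i] - lo) / span
    return xs


locs = [126272509, 126274467, 126275017, 126275750]  # rs1-rs4 from the example SNP.FILE
print([round(x, 3) for x in x_positions(locs)])             # physical map
print([round(x, 3) for x in x_positions(locs, even=True)])  # equal spacing
```

With the physical map, rs2 to rs4 share roughly the right 40% of the axis; with equal spacing each SNP receives the same share, which is the property that helps in densely genotyped regions.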


    3 DATA INPUT
snp.plotter uses four types of input file: a configuration file, a single SNP file and a haplotype file for each result set, and a genotype file. All are plain text, and the data files are tab-delimited. The configuration file is the preferred way to run snp.plotter because it allows users to save preferred settings and avoids writing long R commands. An example configuration file is shown below:

SNP.FILE=snp20_ss.txt,snp20_ss2.txt
HAP.FILE=snp20_haplo.txt,snp20_haplo2.txt
GENOTYPE.FILE=snp20_geno.txt
DISP.LDMAP=TRUE
COLOR.LIST=blue,red
SYMBOLS=circle-fill,square
LD.TYPE=rsquare
IMAGE.TYPE=pdf
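A configuration file in this KEY=VALUE form is straightforward to read programmatically. The Python sketch below is only an illustration and not snp.plotter's parser: it assumes that the keys describing several result sets (SNP.FILE, HAP.FILE, COLOR.LIST, SYMBOLS) hold comma-separated lists, that TRUE/FALSE denote logical values, and it adds a hypothetical check that each result set has its own color and symbol.

```python
from typing import Dict, List, Union

# Keys treated as comma-separated lists (assumption based on the example file).
LIST_KEYS = {"SNP.FILE", "HAP.FILE", "COLOR.LIST", "SYMBOLS"}

ConfigValue = Union[str, bool, List[str]]


def parse_config(text: str) -> Dict[str, ConfigValue]:
    """Parse KEY=VALUE lines into a dict (illustrative sketch)."""
    config: Dict[str, ConfigValue] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no setting
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {raw!r}")
        key, value = key.strip(), value.strip()
        if value in ("TRUE", "FALSE"):
            config[key] = value == "TRUE"  # R-style logical
        elif key in LIST_KEYS:
            config[key] = [item.strip() for item in value.split(",")]
        else:
            config[key] = value
    return config


def check_result_sets(config: Dict[str, ConfigValue]) -> None:
    """Hypothetical check: one color and one symbol per single SNP result set."""
    n_sets = len(config.get("SNP.FILE", []))
    for key in ("COLOR.LIST", "SYMBOLS"):
        if key in config and len(config[key]) != n_sets:
            raise ValueError(f"{key} has {len(config[key])} entries for {n_sets} result sets")


EXAMPLE_CONFIG = """\
SNP.FILE=snp20_ss.txt,snp20_ss2.txt
HAP.FILE=snp20_haplo.txt,snp20_haplo2.txt
GENOTYPE.FILE=snp20_geno.txt
DISP.LDMAP=TRUE
COLOR.LIST=blue,red
SYMBOLS=circle-fill,square
LD.TYPE=rsquare
IMAGE.TYPE=pdf
"""

cfg = parse_config(EXAMPLE_CONFIG)
check_result_sets(cfg)
print(cfg["SNP.FILE"], cfg["DISP.LDMAP"], cfg["LD.TYPE"])
```

Keeping list-valued keys explicit means a single result set still parses to a one-element list, so the per-set color and symbol entries can be paired with the result files by position.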

The single SNP result file, SNP.FILE, must include four columns, ASSOC, SNP.NAME, LOC and SS.PVAL, giving for each SNP the direction of association (positive or negative, indicating a susceptibility or protective allele), the SNP label, its location and its P-value.


ASSOC  SNP.NAME  LOC        SS.PVAL
+      rs1       126272509  0.065
-      rs2       126274467  0.029
+      rs3       126275017  0.046
-      rs4       126275750  0.005
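Because SNP.FILE is a plain tab-delimited table, its required columns can be checked before plotting. The Python sketch below is illustrative only: it uses the column names given in the text, assumes ASSOC holds '+' or '-' (the negative sign is assumed for the example rows printed without '+'), and adds −log10(P), a scale commonly used for association plots. The function name read_snp_file is hypothetical.

```python
import csv
import io
import math
from typing import Dict, List

REQUIRED_SNP_COLUMNS = ["ASSOC", "SNP.NAME", "LOC", "SS.PVAL"]


def read_snp_file(text: str) -> List[Dict]:
    """Read a tab-delimited SNP.FILE and validate it (illustrative sketch)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_SNP_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"SNP file is missing required columns: {missing}")
    rows = []
    for rec in reader:
        assoc = rec["ASSOC"].strip()
        if assoc not in ("+", "-"):  # assumed coding of the direction of association
            raise ValueError(f"unexpected ASSOC value {assoc!r}")
        pval = float(rec["SS.PVAL"])
        if not 0.0 < pval <= 1.0:
            raise ValueError(f"P-value out of range for {rec['SNP.NAME']}: {pval}")
        rows.append({
            "assoc": assoc,
            "name": rec["SNP.NAME"].strip(),
            "loc": int(rec["LOC"]),
            "pval": pval,
            "neg_log10_p": -math.log10(pval),  # larger values = stronger evidence
        })
    return sorted(rows, key=lambda r: r["loc"])  # map order along the X-axis


EXAMPLE_SNP = (
    "ASSOC\tSNP.NAME\tLOC\tSS.PVAL\n"
    "+\trs1\t126272509\t0.065\n"
    "-\trs2\t126274467\t0.029\n"
    "+\trs3\t126275017\t0.046\n"
    "-\trs4\t126275750\t0.005\n"
)

for r in read_snp_file(EXAMPLE_SNP):
    print(r["assoc"], r["name"], r["loc"], round(r["neg_log10_p"], 2))
```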

Haplotype results are specified with three required columns, ASSOC, GBL.PVAL and IND.PVAL, giving the direction of association, the global P-value and the individual P-value for each haplotype, followed by one column per SNP containing that SNP's allele in the haplotype. Alleles are coded 1 for the major allele and 2 for the minor allele, and all haplotypes for the same set of SNPs should be grouped together in the file. SNP labels in HAP.FILE must match those in SNP.FILE, but only SNPs that are part of a haplotype need to be included.


ASSOC  GBL.PVAL  IND.PVAL  rs1  rs2  rs3  rs4
-      0.015     0.004     1    1    1
+      0.015     0.062     1    2    2
+      0.075     0.079          1    1    1
+      0.075     0.039          2    2    2
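The grouping rule above can be checked automatically. In this Python sketch (illustrative, not part of snp.plotter), the column names follow the text (ASSOC, GBL.PVAL, IND.PVAL), SNPs outside a haplotype are assumed to be left blank, and the example rows assume that the first two haplotypes span rs1 to rs3 and the last two span rs2 to rs4, as a 3-SNP sliding window would produce. The function name haplotype_blocks is hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

FIXED_HAP_COLUMNS = ("ASSOC", "GBL.PVAL", "IND.PVAL")


def haplotype_blocks(text: str, snp_names: List[str]) -> List[Tuple[Tuple[str, ...], List[Dict]]]:
    """Group HAP.FILE rows by the SNPs each haplotype covers (illustrative sketch)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    fields = reader.fieldnames or []
    missing = [c for c in FIXED_HAP_COLUMNS if c not in fields]
    if missing:
        raise ValueError(f"haplotype file is missing required columns: {missing}")
    snp_cols = [f for f in fields if f not in FIXED_HAP_COLUMNS]
    unknown = [s for s in snp_cols if s not in snp_names]
    if unknown:
        raise ValueError(f"SNP labels not found in SNP.FILE: {unknown}")

    blocks: List[Tuple[Tuple[str, ...], List[Dict]]] = []
    seen = set()
    for rec in reader:
        # SNPs with a non-blank allele are the ones this haplotype spans (assumed convention)
        covered = tuple(s for s in snp_cols if (rec[s] or "").strip())
        alleles = "".join(rec[s].strip() for s in covered)
        if set(alleles) - {"1", "2"}:
            raise ValueError(f"alleles must be coded 1 (major) or 2 (minor): {alleles}")
        if not blocks or blocks[-1][0] != covered:
            if covered in seen:  # this SNP set already appeared earlier in the file
                raise ValueError(f"haplotypes for {covered} are not grouped together")
            seen.add(covered)
            blocks.append((covered, []))
        blocks[-1][1].append({
            "assoc": rec["ASSOC"].strip(),
            "gbl_pval": float(rec["GBL.PVAL"]),
            "ind_pval": float(rec["IND.PVAL"]),
            "alleles": alleles,
        })
    return blocks


EXAMPLE_HAP = (
    "ASSOC\tGBL.PVAL\tIND.PVAL\trs1\trs2\trs3\trs4\n"
    "-\t0.015\t0.004\t1\t1\t1\t\n"
    "+\t0.015\t0.062\t1\t2\t2\t\n"
    "+\t0.075\t0.079\t\t1\t1\t1\n"
    "+\t0.075\t0.039\t\t2\t2\t2\n"
)

for snps, haps in haplotype_blocks(EXAMPLE_HAP, ["rs1", "rs2", "rs3", "rs4"]):
    print(snps, [h["alleles"] for h in haps])
```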

Genotype data are supplied as pedigree files in a modified LINKAGE format; this marker information is used to create the LD plot and may come from the controls of a case-control study or the founders of a family-based study. An optional additional file, PALETTE.FILE, specifies the color scheme of the LD plot as hexadecimal HTML color codes (e.g. #FF0000), one color per line. The first and last colors correspond to the lowest and highest values of the chosen LD metric, respectively.
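One way such a palette could be applied is sketched below in Python. It is an assumption for illustration, not snp.plotter's implementation: the LD value (r² or |D′|, both on a 0 to 1 scale) is divided into equal-width bins, one per color, so that the first color marks the lowest values and the last color the highest. The palette colors and the function names read_palette and ld_to_color are hypothetical.

```python
import string
from typing import List


def read_palette(text: str) -> List[str]:
    """Read one hexadecimal HTML color code (e.g. #FF0000) per line."""
    colors = [line.strip() for line in text.splitlines() if line.strip()]
    for c in colors:
        if len(c) != 7 or c[0] != "#" or any(ch not in string.hexdigits for ch in c[1:]):
            raise ValueError(f"not a #RRGGBB color code: {c!r}")
    return colors


def ld_to_color(value: float, palette: List[str]) -> str:
    """Map an LD value in [0, 1] to a color using equal-width bins (assumption)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"LD value outside [0, 1]: {value}")
    index = min(int(value * len(palette)), len(palette) - 1)  # 1.0 falls in the last bin
    return palette[index]


palette = read_palette("#FFFFFF\n#FFCC00\n#FF6600\n#CC0000\n")  # hypothetical palette
print([ld_to_color(v, palette) for v in (0.0, 0.3, 0.5, 1.0)])
```

Equal-width bins are the simplest choice; a longer palette gives a smoother gradient without changing the mapping logic.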


    4 snp.plotter USAGE
snp.plotter uses the grid graphics package to create and place individual graphic elements, and the genetics package to calculate linkage disequilibrium (Warnes and Leisch, 2005). The LD heatmap is drawn with modified code from the LDheatmap package (Shin et al., 2006). Once snp.plotter and its dependencies are installed, snp.plotter can be loaded into R with this command:

library(snp.plotter)

snp.plotter is then run with the following command, which writes the figure to the current working directory:

snp.plotter(config.file="config.txt")
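The LD values shown in the heatmap are calculated by the genetics package from the genotype file. To illustrate what the two metrics measure, the Python sketch below computes |D′| and r² for a pair of SNPs from counts of the four two-SNP haplotypes. This is a simplification: it assumes phase-known haplotype counts, whereas genotype data are unphased and haplotype frequencies must first be estimated. The function name ld_metrics is hypothetical.

```python
from typing import Tuple


def ld_metrics(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic SNPs from phased haplotype counts.

    A/a are the two alleles at the first SNP and B/b those at the second SNP.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at SNP 2
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    if min(p_A, p_a, p_B, p_b) == 0.0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_AB - p_A * p_B  # disequilibrium coefficient D
    # Largest |D| attainable given the allele frequencies (Lewontin's D_max)
    d_max = min(p_A * p_b, p_a * p_B) if d > 0 else min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * p_a * p_B * p_b)  # squared allelic correlation
    return d_prime, r2


print(ld_metrics(50, 0, 0, 50))    # complete LD
print(ld_metrics(25, 25, 25, 25))  # no LD
print(ld_metrics(30, 20, 0, 50))   # haplotype aB absent
```

In the third call one haplotype is absent, so |D′| equals 1 while r² is only 3/7 (about 0.43), which illustrates how the two metrics can give quite different pictures of the same pair of SNPs.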

An optional web interface to snp.plotter, built with the Rpad R package, is also available for download. Because Rpad gives users complete access to any R command and any system command, the web interface is best suited to intranet environments (Short and Grosjean, 2006). snp.plotter must be installed on the machine running Rpad; instructions for server deployment are given on the Rpad website. The interface supports most features but is limited to a single result set. Users with basic knowledge of HTML and R can extend the interface to change the options presented or to perform any additional analyses they require.


    5 EXAMPLE
The HapMap Project catalogs SNPs in populations of African, Asian and European ancestry (The International HapMap Consortium, 2005). Sample data for 20 SNPs were obtained from HapMap, and two case-control populations, each with 500 cases and 500 controls, were simulated using the Simulation of Haplotype Heterogeneity, Interaction and Population Stratification (SH2IPS) R package (Nicodemus and Luna, 2006). Logistic regression was used to test the association of each SNP with the disease phenotype, and haplotype association was evaluated with haplo.stats using a 3-SNP sliding window (Schaid et al., 2002). Figure 1 shows the results plotted with snp.plotter: single SNP and global haplotype P-values for the two populations, with an adjoining LD plot based on the r² metric.
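The 3-SNP sliding window used for the haplotype analysis can be illustrated with a short Python sketch. It only enumerates the windows (the haplotype score tests themselves were run with haplo.stats), and the SNP labels rs1 to rs20 are hypothetical placeholders for the 20 HapMap SNPs.

```python
from typing import List, Tuple


def sliding_windows(snps: List[str], width: int = 3) -> List[Tuple[str, ...]]:
    """Return all runs of `width` consecutive SNPs, in map order."""
    if not 1 <= width <= len(snps):
        raise ValueError("window width must be between 1 and the number of SNPs")
    return [tuple(snps[i:i + width]) for i in range(len(snps) - width + 1)]


snps = [f"rs{i}" for i in range(1, 21)]  # placeholder labels for 20 SNPs in map order
windows = sliding_windows(snps)
print(len(windows), windows[0], windows[-1])
```

With 20 SNPs and a width of 3 there are 18 overlapping windows. In the HAP.FILE layout described in Section 3, each window would correspond to one grouped block of haplotype rows sharing a global P-value.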


Figure 1
Fig. 1. Association results and LD for two populations simulated from HapMap data, plotted with snp.plotter. Haplotypes are indicated by symbols connected by a solid line, and single SNPs by unconnected symbols. Dotted lines represent P-value thresholds.

 

    ACKNOWLEDGEMENTS
We are grateful to Dr Daniel Weinberger, Dr Steven Huffaker, and Anushka Aqil for comments and feedback and to Dr Richard Coppola for help with Rpad.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on November 21, 2006; revised on November 21, 2006; accepted on December 21, 2006

    REFERENCES

    Abecasis GR, et al. (2005) Linkage disequilibrium: ancient history drives the new genetics. Hum. Hered., 59, 118–124.

    The International HapMap Consortium (2005) A haplotype map of the human genome. Nature, 437, 1299–1320.

    Morton NE (2005) Linkage disequilibrium maps and association mapping. J. Clin. Invest., 115, 1425–1430.

    Nicodemus KK, Luna A (2006) Simulation of haplotype heterogeneity, interaction, and population stratification. R package version 1.0.

    R Development Core Team (2006) R: a language and environment for statistical computing.

    Schaid DJ, et al. (2002) Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet., 70, 425–434.

    Shin J, et al. (2006) LDheatmap: graphical display of pairwise linkage disequilibria between SNPs. R package version 0.2.

    Short T, Grosjean P (2006) Rpad: workbook-style, web-based interface to R. R package version 1.1.1.

    Warnes G, Leisch F (2005) Genetics: population genetics. R package version 1.2.0.

