Bioinformatics Advance Access originally published online on June 16, 2006
Bioinformatics 2006 22(16):1933-1934; doi:10.1093/bioinformatics/btl288
Building chromosome-wide LD maps
1 Software Engineering Department, University of Granada, Granada 18071, Spain
2 Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: BMapBuilder builds maps of pairwise linkage disequilibrium (LD) in either two or three dimensions. The optimized resolution allows for graphical display of LD for single nucleotide polymorphisms (SNPs) in a whole chromosome.
Availability: The program is coded in Java, which runs on all relevant operating systems, including Windows, Mac and Unix/Linux, and is available from http://bios.ugr.es/BMapBuilder
Contact: sebas{at}bu.edu
Supplementary information: Maps displaying chromosome-wide LD are available at http://bios.ugr.es/BMapBuilder/supplementary
One of the current goals of measuring genome-wide linkage disequilibrium (LD) is to estimate recombination in a population and its implications for gene mapping and association studies. This is, for example, one of the aims of the International HapMap Project (IHP) (HapMap-Consortium, 2005).
Because of their simplicity, pairwise measures of LD, such as D' or r², are the most popular measures used to capture the strength of association between pairs of SNPs (Ott, 1999). Visual inspection of genome-wide LD requires graphical methods able to display large-scale patterns of LD. The commonly used graphical displays, sliding windows (Dawson et al., 2002) and LD decay plots (Reich et al., 2001), display average pairwise LD inside a region. The first displays average LD over regions determined by a window of constant size; the second displays patterns of LD decay over increasing physical or genetic distances. These plots, however, do not show pairwise LD but compress the information into a one-dimensional plot. A tool able to display two-dimensional pairwise LD along a whole chromosome would therefore allow researchers to examine recombination, as well as the distribution of haplotype blocks (HapMap-Consortium, 2005), across a whole chromosome without information loss.
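For readers unfamiliar with the two measures, the standard textbook definitions of D' and r² can be sketched as follows. This is only an illustration that assumes the haplotype and allele frequencies are already known; it is not the ML or Bayesian estimation from genotype data discussed in this paper, and the function name `ld_measures` is hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    Assumes known frequencies; no estimation from genotypes is done here.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # monomorphic loci give 0
    return d_prime, r2
```

With `p_a = 0.5`, `p_b = 0.25` and `p_ab = 0.25`, the sketch gives |D'| = 1 but r² = 1/3, which illustrates why the two measures can lead to visibly different maps.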
A two-dimensional display of LD is available in the software Haploview (Barrett et al., 2005), but its resolution (more than 50 times greater than the optimal resolution) is suboptimal. Furthermore, in our experience this software can generate LD maps only for regions with at most about 6000 SNPs. The suboptimal resolution of Haploview implies that a map created to display LD for 1000 pairs of SNPs would require 50 912 pixels, and hence 50 windows on a common screen with a resolution of 1024 x 768, to be displayed. The optimal-resolution map would require only 1000 pixels and could be displayed in a single window.
To the best of our knowledge, there is no software that is able to build pairwise LD maps for fast screening of large datasets such as those produced by the IHP. To address this limitation, we have developed BMapBuilder: a program that can generate bitmap images to represent genome-wide pairwise LD, and can build LD maps over an entire chromosome. The program takes as input a text file (tab-, blank- or comma-delimited) with one row for each pair of SNPs and at least three columns: the positions i and j of the two SNPs in the pair and the corresponding measure of LD. It creates an LD map that is saved as a PNG file. BMapBuilder provides users with a wide choice of resolutions. A resolution equal to s means that, for each pairwise estimate of LD between SNPs at loci i and j, i < j, a square of s x s pixels is plotted. The color intensity of each square follows the legend in Figure 1 when the selected color for the map is red. Equivalent scales are used for the other two color options of BMapBuilder: green and blue.
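The input format and the role of the resolution s can be illustrated with a minimal Python sketch. It is not BMapBuilder's Java implementation: the function names are hypothetical, the file is assumed to have no header row and integer positions, and the linear white-to-red scale is an assumption standing in for the legend of Figure 1.

```python
import re
import struct
import zlib


def parse_pairs(text):
    """Parse rows of 'pos_i pos_j ld' separated by tabs, blanks or commas."""
    pairs = []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) < 3:
            continue  # skip blank or incomplete rows
        pairs.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return pairs


def rasterize(pairs, s=1):
    """Return a square grid of 0-255 intensities, one s x s block per SNP pair."""
    # Rank the distinct positions so that each SNP gets a consecutive index.
    positions = sorted({p for i, j, _ in pairs for p in (i, j)})
    index = {pos: k for k, pos in enumerate(positions)}
    n = len(positions) * s
    grid = [[0] * n for _ in range(n)]
    for i, j, ld in pairs:
        a, b = sorted((index[i], index[j]))  # keep i < j: upper triangle only
        value = round(255 * min(max(ld, 0.0), 1.0))
        for y in range(a * s, (a + 1) * s):
            for x in range(b * s, (b + 1) * s):
                grid[y][x] = value
    return grid


def write_png(path, grid):
    """Write the grid as an 8-bit RGB PNG, from white (0) to red (255)."""
    def chunk(tag, data):
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    height, width = len(grid), len(grid[0])
    # Each scanline starts with filter byte 0 (no filtering).
    raw = b"".join(
        b"\x00" + b"".join(bytes((255, 255 - v, 255 - v)) for v in row)
        for row in grid
    )
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    with open(path, "wb") as fh:
        fh.write(b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
                 + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
```

For example, `write_png("map.png", rasterize(parse_pairs(open("pairs.txt").read()), s=4))` would draw each pair as a 4 x 4 square, where `pairs.txt` is a placeholder file name. With s = 1 the image side equals the number of distinct SNPs, which is what makes a chromosome-wide map feasible.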
To optimize the resolution without losing information, BMapBuilder can use only one pixel to represent the estimate of LD between a pair of SNPs, such as D' or r². Higher resolution can also be chosen to magnify the map for smaller DNA regions, and the current implementation allows a maximum resolution of 20. If, besides the estimate of a measure of LD for each pair of SNPs, the user provides two columns with allele frequencies, the program builds maps with different thresholds on the Minor Allele Frequencies (MAF) to reduce the bias towards disequilibrium of the Maximum Likelihood (ML) estimator of D' (Gabriel et al., 2002; Teare et al., 2002). We emphasize that BMapBuilder allows users to use any pairwise measure of LD, including the standard D' and r², regardless of how they are computed. The input file can be generated with programs such as Haploview or BLink (Abad-Grau and Sebastiani, 2006). While Haploview uses ML to compute D' and r², BLink uses a Bayesian approach with a proper prior that reduces the bias toward disequilibrium of D' in small samples.
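The MAF thresholding can be pictured with the short sketch below. The text does not state how the threshold is applied, so the sketch assumes that a pair is kept only when both SNPs reach the threshold, and that each row carries the two allele frequencies as its fourth and fifth fields. The function names are hypothetical.

```python
def minor_allele_frequency(freq):
    """Frequency of the rarer allele at a biallelic SNP."""
    return min(freq, 1.0 - freq)


def filter_by_maf(rows, threshold):
    """Keep (pos_i, pos_j, ld) for rows whose two SNPs both have MAF >= threshold.

    rows: iterable of (pos_i, pos_j, ld, freq_i, freq_j) tuples.
    Requiring both SNPs to pass is an assumption made for this illustration.
    """
    return [
        (i, j, ld)
        for i, j, ld, freq_i, freq_j in rows
        if minor_allele_frequency(freq_i) >= threshold
        and minor_allele_frequency(freq_j) >= threshold
    ]
```

Building one map per threshold then amounts to calling `filter_by_maf` with each threshold value and drawing the surviving pairs.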
Figure 1 (back image, panel b) shows a reduced image of a map generated by BMapBuilder using a genotype dataset published by the IHP. The data consist of 19 854 SNPs genotyped in chromosome 22 using samples from 30 trios of the Yoruba population. The data are available in post-makeped format at http://bios.ugr.es/BMapBuilder/supplementary/data.html. To build the map we used the traditional ML estimates of D' (available from the IHP website). The Supplementary Material also contains maps that were built using the ML estimates of r², as well as the novel Bayesian estimator of D' that is implemented in BLink (Abad-Grau and Sebastiani, 2006). The website reports maps with resolution s = 1, so that only one pixel is used to graphically display the magnitude of LD. Larger-resolution maps with s = 4 can be examined by double-clicking on each map. If one wants to focus attention on smaller DNA regions, BMapBuilder can build maps with higher resolutions. As an example, Figure 1 (middle and front images, panel b) shows two zoomed maps of LD using overlapping subsets of SNPs. While the chromosome-wide map displays LD for all 19 854 SNPs, the middle image displays LD for 6050 SNPs and the front image displays LD for 1200 SNPs.
The figure highlights the LD landscape of the whole chromosome so that regions of higher LD can be identified by visual inspection, and patterns of LD can be compared across different populations. The maps in Figure 1, panels c and d, display LD for the first strand of chromosome 22, up to physical position 29 437 558. There are 8000 SNPs genotyped in the IHP for this strand in the Yoruba population (panel c) and 7569 in the CEPH population (panel d). Wider blocks are evident in the LD map of the CEPH population, in agreement with results showing lower haplotype diversity in the European-descent population (Reich et al., 2001; Daly et al., 2001).
The software also includes a 3D display option for smaller sets of SNPs. As an example, 3D maps can be created to visually examine the effect of other factors influencing LD, such as allele frequencies or types of polymorphisms. In BMapBuilder, different polymorphisms can be coded with different colors chosen by the user, so that colored 3D maps are created. By using transparency proportional to the estimator, 3D maps can be projected into 2D maps in which colors represent types of polymorphisms and estimator values are represented by the transparency. Compared with standard 2D LD maps, these projections carry more information through the different color choices. These images constitute a novel visual tool for a first step in identifying patterns of LD that involve other factors.
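The projection described above, with color for the type of polymorphism and transparency for the estimator, can be sketched as ordinary alpha compositing. This is an illustration under assumptions, not the program's code: the function names are hypothetical, the opacity is assumed to grow linearly with the LD estimate, and the background is assumed to be white.

```python
def to_rgba(color, ld):
    """Attach an alpha channel proportional to the LD estimate (clamped to [0, 1])."""
    r, g, b = color
    alpha = round(255 * min(max(ld, 0.0), 1.0))
    return (r, g, b, alpha)


def over_white(rgba):
    """Composite an RGBA pixel over a white background (standard 'over' blend)."""
    r, g, b, a = rgba
    t = a / 255
    return tuple(round(c * t + 255 * (1 - t)) for c in (r, g, b))
```

A pair in complete LD keeps the full color chosen for its polymorphism type, while a pair with no LD fades to the background, so both factors remain readable in a single 2D image.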
| Acknowledgments |
|---|
The authors thank Marco Ramoni who inspired them to write this paper. This work was supported by NHLBI grants R21 HL080463-01 and NIDDK 1R01DK069646-01A1 and the Spanish Research Program under projects TIN2004-07672-C03-02 and TIN2005-09098-C05-03.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on April 25, 2006; revised on May 26, 2006
| REFERENCES |
|---|
Abad-Grau, M.M. and Sebastiani, P. (2006) Bayesian correction for SNP ascertainment bias. In Torra, V., Narukawa, Y., Valls, A. and Domingo-Ferrer, J. (eds), Modeling Decisions for Artificial Intelligence: Third International Conference, MDAI 2006, Tarragona, Spain, April 3–5, 2006. LNCS, Vol. 3885. Springer, Berlin/Heidelberg, pp. 262–273.
Barrett, J.C., et al. (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263–265.
Daly, M.J., et al. (2001) High-resolution haplotype structure in the human genome. Nat. Genet., 29, 229–232.
Dawson, E., et al. (2002) A first-generation linkage disequilibrium map of human chromosome 22. Nature, 418, 544–548.
Gabriel, S., et al. (2002) The structure of haplotype blocks in the human genome. Science, 296, 2225–2229.
HapMap-Consortium, T.I. (2005) A haplotype map of the human genome. Nature, 437, 1299–1320.
Ott, J. (1999) Analysis of Human Genetic Linkage. Johns Hopkins University Press, Baltimore, MD.
Reich, D.E., et al. (2001) Linkage disequilibrium in the human genome. Nature, 411, 199–204.
Teare, M.D., et al. (2002) Sampling distribution of summary linkage disequilibrium measures. Ann. Hum. Genet., 21, 263–265.
