Bioinformatics Advance Access originally published online on October 23, 2006
Bioinformatics 2006 22(24):3061-3066; doi:10.1093/bioinformatics/btl540
SequenceLDhot: detecting recombination hotspots
Department of Mathematics and Statistics, Lancaster University Lancaster LA1 4YF, UK
ABSTRACT
Motivation: There is much local variation in recombination rates across the human genome, with the majority of recombination occurring in recombination hotspots: short regions of around 2 kb in length that have much higher recombination rates than neighbouring regions. Knowledge of this local variation is important, e.g. in the design and analysis of association studies for disease genes. Population genetic data, such as that generated by the HapMap project, can be used to infer the location of these hotspots. We present a new, efficient and powerful method for detecting recombination hotspots from population data.
Results: We compare our method with four current methods for detecting hotspots. It is orders of magnitude quicker, and has greater power, than two related approaches. It appears to be more powerful than HotspotFisher, though less accurate at inferring the precise positions of the hotspot. It was also more powerful than LDhot in some situations: particularly for weaker hotspots (10 to 40 times the background rate) when SNP density is lower (< 1/kb).
Availability: Program, data sets, and full details of results are available at: http://www.maths.lancs.ac.uk/~fearnhea/Hotspot.
Contact: p.fearnhead{at}lancs.ac.uk
1 INTRODUCTION
There is currently much interest in understanding the fine-scale variation in the recombination rate across the human genome, and detecting the presence of recombination hotspots. Primarily this is because knowledge of this variation will inform the design and analysis of association studies for complex diseases (Zondervan and Cardon, 2004; Hirschhorn and Daly, 2005), and also because of interest in the evolutionary forces affecting recombination hotspots (Jeffreys and Neumann, 2002; Pineda-Krch and Redfield, 2005; Myers et al., 2005).
In recent years, recombination hotspots have been found by direct observation of crossovers in sperm (Jeffreys et al., 2001, 2005; Kauppi et al., 2005). While these give accurate measurements of current local recombination rates, and position of recombination hotspots, analysis of sperm is costly and time-consuming, and has so far been restricted to a small number of genetic regions.
To learn about genome-wide variation in recombination rates and hotspots, analysis of population genetic diversity data has proven more successful (Crawford et al., 2004; McVean et al., 2004; Winckler et al., 2005; Ptak et al., 2005; Fearnhead and Smith, 2005; Myers et al., 2005), particularly due to the large amount of single nucleotide polymorphism (SNP) genotype data describing genetic variation in different human populations (The International HapMap Consortium, 2005).
Here we describe a new method for detecting recombination hotspots from population genetic data. This method uses the approximate marginal likelihood method of Fearnhead and Donnelly (2002) and is closely related to the methods described in Fearnhead et al. (2004) and Fearnhead and Smith (2005). The approach scans through a chromosomal region of interest, and considers fitting a recombination hotspot at a set of possible locations (from a pre-specified grid). For each possible hotspot location, a likelihood ratio statistic is calculated for the test of whether a hotspot is present. The set of likelihood ratio statistic values can then be used to visually show the evidence for a recombination hotspot at different positions along the chromosome and to flag likely locations for hotspots (see Section 2.3 for more details).
Whilst similar, this approach differs from those of Fearnhead et al. (2004) and Fearnhead and Smith (2005). For both these approaches the chromosomal region was split up into a series of sub-regions (defined to each contain a specified number of consecutive SNPs), and then the likelihood curve for the recombination rate was calculated for each sub-region under the assumption of a constant recombination rate within that sub-region. [The methods described in Fearnhead et al. (2004) and Fearnhead and Smith (2005) differ in how they combine the information from these separate sub-regions into evidence for hotspots at different locations.]
The advantage of the new approach described here is firstly computational, with CPU times being reduced by over an order of magnitude (see Section 4.1). This is because the Monte Carlo effort for calculating the likelihood ratio statistic for a hotspot at each possible position can be curtailed when it becomes obvious that there is either little or overwhelming evidence for a hotspot (see Section 2.4). Second, the method allows for a more accurate estimate of the background recombination rate by using the PACL method of Li and Stephens (2003) [see Smith and Fearnhead (2005) and the discussion in Fearnhead and Smith (2005)], and allows this background rate to vary across large chromosomal regions. Finally, the new approach can be applied more accurately to regions where the SNP density is low. For such regions, the earlier approaches would estimate a constant recombination rate over potentially large sub-regions, and any signal from a hotspot within such a sub-region would be weakened by averaging a small hotspot with larger non-hotspot (background) regions. The new method avoids this by always fitting an appropriate hotspot model, consisting of a small hotspot region flanked by a background region.
We have compared our new method for detecting hotspots with the earlier methods of Fearnhead et al. (2004) and Fearnhead and Smith (2005), as well as the HotspotFisher program of Li et al. (2006) and the LDhot program of McVean et al. (2004) and Myers et al. (2005).
2 METHOD
Our method takes as input haplotype data from n chromosomes, each typed at L SNPs in a specific region. Our method also assumes an estimate of the background recombination rate across the whole region, though this background rate can be allowed to vary across the region. Details of one approach to both phasing genotype data and estimating the background recombination rate, the one used for the results given in this paper, are given in Section 2.4.
2.1 Overview of method
Our approach is to consider a grid of possible hotspot positions, and to evaluate the evidence for the presence of a hotspot at each of these positions. To define the grid, we specify a hotspot width, $w$, and a spacing, $l$ (see Section 2.2). Assume the $L$ SNPs are at ordered positions $x_1, \ldots, x_L$, and without loss of generality relabel positions so that $x_1 = 1$. Let $N$ be the largest integer such that $N \cdot l + w < x_L$. Then our algorithm consists of the following loop:
For $i = 0, \ldots, N$:
(1) Consider a hotspot from position $i \cdot l$ to $i \cdot l + w$. Denote by $\rho_H$ the recombination rate within the hotspot and by $\rho_B$ the background recombination rate close to this hotspot.
(2) Choose $S$ SNPs close to this hotspot, and summarize the data by the sequences defined solely by the alleles at these $S$ SNPs.
(3) Use the approximate marginal likelihood method of Fearnhead and Donnelly (2002) to estimate $LR_i$, the likelihood-ratio statistic for $\rho_H > \rho_B$ against $\rho_H = \rho_B$, for the data chosen in (2).
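As a rough illustration of how the grid in the loop above might be laid out, the following Python sketch computes $N$ and the candidate hotspot windows from a list of SNP positions. It is a minimal sketch assuming integer base-pair positions; the function name `hotspot_grid` is hypothetical and not part of sequenceLDhot.

```python
def hotspot_grid(snp_positions, w=2000, l=1000):
    """Hypothetical helper: candidate hotspot windows (start, end) for i = 0, ..., N.

    Positions are relabelled so that the first SNP is at position 1, and N is
    the largest integer with N*l + w < x_L.
    """
    positions = sorted(snp_positions)
    shift = positions[0] - 1          # relabel so that x_1 = 1
    x_last = positions[-1] - shift    # x_L in relabelled coordinates
    n_max = (x_last - w - 1) // l     # largest N with N*l + w < x_L (integer bp)
    # range() is empty when the region is shorter than a single window
    return [(i * l, i * l + w) for i in range(n_max + 1)]


grid = hotspot_grid([1001, 5000, 12001])
print(len(grid), grid[0], grid[-1])  # 10 (0, 2000) (9000, 11000)
```

Windows are returned in relabelled coordinates; adding back the shift would map them onto the original SNP coordinates.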
The output of the method is a set of likelihood ratio statistics $LR_0, \ldots, LR_N$ for the presence of a hotspot of width $w$ starting at positions $0, l, \ldots, N \cdot l$. These likelihood ratio statistics are estimated using an importance sampling approach (Fearnhead and Donnelly, 2001, 2002). They are estimated under a standard neutral coalescent model, though the likelihood under such a model has been shown to be robust for inference of relative recombination rates (Smith and Fearnhead, 2005).
Whilst this is a non-regular inference problem, simulation studies (Fearnhead and Donnelly, 2002; Fearnhead et al., 2004) suggest that the null distribution of the likelihood ratio statistic is approximately an equal mixture of a point mass at 0 and a chi-squared distribution with one degree of freedom. A plot of the likelihood ratio statistic against hotspot position (see Fig. 1) can give a picture of the evidence for the presence of a hotspot across the chromosomal region. Details of how we use this output to predict the positions of hotspots are given in Section 2.3.
2.2 Details of method
The method requires specifying a number of parameters. We describe here the default choices of our method, which are suitable for analysing human population genetic data and are the values we used for the results shown in this paper. First we chose the hotspot width, w = 2000, and spacing, l = 1000. The width is based on evidence that hotspots are of the order of 1 to 2 kb (Jeffreys et al., 2001), and the spacing is based on a trade-off between computational cost and accuracy. Both were chosen based on analysis of the SeattleSNP data (see Section 3): we found that the setting w = 2000 gave slightly higher power than w = 1500 or w = 1000, while using l = 1000 was more powerful than l = 2000, but there was no increase in power when reducing l further. In calculating the likelihood ratio statistic in step (3) we allow the hotspot recombination rate to range between 10 and 100 times the background rate.
The choice of the number of SNPs, S, in step (2) of the algorithm is again a trade-off between the information in the data summary, and the computational cost and Monte Carlo error in the estimate of the likelihood ratio statistic. We chose S = 7, which for the SeattleSNP data of Section 3 appears to give a noticeable improvement over S = 6, while increasing S further did not appear to substantially improve performance.
The algorithm for choosing which SNPs to keep in step (2) is based upon the intuition that the most informative set of SNPs will have larger minor allele frequency and be equally spaced in or close to the putative hotspot.
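The exact selection rule is not spelled out here, so the following is only a hypothetical illustration of the stated intuition, not the algorithm used by sequenceLDhot. It places S equally spaced target positions around the putative hotspot and, for each target, greedily picks the unused SNP that best trades off minor allele frequency (MAF) against distance from the target. The window width (`span`, defaulting to four hotspot widths) and the `penalty` weight are arbitrary assumptions.

```python
def choose_snps(positions, mafs, start, end, s=7, span=None, penalty=0.5):
    """Hypothetical heuristic: pick s SNP indices, roughly equally spaced around
    the putative hotspot [start, end] and favouring high minor allele frequency."""
    if s < 2 or len(positions) < s:
        raise ValueError("need s >= 2 and at least s SNPs")
    center = (start + end) / 2
    if span is None:
        span = 4 * (end - start)       # assumed width of the window of SNPs
    step = span / (s - 1)
    targets = [center - span / 2 + k * step for k in range(s)]
    chosen = []
    for t in targets:
        # score = MAF minus a penalty for distance from the target (in units of step)
        best = max(
            (i for i in range(len(positions)) if i not in chosen),
            key=lambda i: mafs[i] - penalty * abs(positions[i] - t) / step,
        )
        chosen.append(best)
    return sorted(chosen)


positions = list(range(0, 21000, 1000))
print(choose_snps(positions, [0.3] * 21, 9000, 11000))
```

With equal MAFs this simply picks the SNPs nearest the targets; raising the MAF of a nearby SNP can pull the choice towards it.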
2.3 Summarising output
Given the likelihood ratio statistic $LR_i$ for one putative hotspot position, the simplest approach is to predict the presence of a hotspot if $LR_i > c$ for some cutoff c. The approximate null distribution of the likelihood ratio statistic can be used to specify a suitable value for c. Values of c = 10 and 12 would produce a false-positive approximately once in every 1200 and 3700 independent tests, respectively (and are the values we use for the results in this paper). Given a hotspot spacing of 1 kb, and making the conservative approximation that tests for hotspot positions are independent, this would suggest a false-positive rate of <1/1.2 Mb and <1/3.7 Mb, respectively.
However this simple approach is likely to predict a number of hotspot positions for each true hotspot, as each real hotspot will overlap a number of the putative hotspot positions within our grid. Furthermore even a hotspot near, but not overlapping, the putative hotspot may produce some evidence of a hotspot, as its presence will reduce the amount of linkage disequilibrium (LD) within the sub-region covered by the S SNPs chosen in step (2) of the method. Thus the patterns generated by the S SNPs may fit a hotspot model better than a no-hotspot model, even if the hotspot is in the wrong position.
As a result we summarize the output of our method by a set of disjoint extended hotspot regions, which are defined to be contiguous regions with evidence for a hotspot. Each extended hotspot region contains at least one putative hotspot with LR > c. Extending out from this putative hotspot, we then include all hotspot positions with LR > 4 (corresponding to evidence of a hotspot at approximately the 97.5% significance level), and all hotspot positions that overlap with a more distant hotspot position that has LR > c. The idea is to describe, in an automated way, a contiguous region that contains all hotspot positions whose LR values may have been affected by the presence of the putative hotspot, and thus to avoid inferring clusters of nearby hotspots, all but one of which are likely to be false positives. (More accurate methods may be possible, but this ad hoc approach appears to work well in practice.)
Within each extended hotspot region we then infer a single hotspot, whose position is chosen to be the hotspot position with the largest likelihood ratio value within that extended hotspot region.
See Figure 1 for an example of output of our method, and the definition of the extended hotspot regions and the inferred hotspots.
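The construction of extended hotspot regions can be sketched in Python as follows. This is one possible reading of the rule described above, not the sequenceLDhot source: a grid position is marked if its statistic exceeds 4, or if its window overlaps (that is, lies fewer than w/l grid steps away from) a position with LR > c; maximal runs of marked positions that contain at least one LR > c become extended regions, and the position with the largest statistic in each run is reported as the inferred hotspot. The function name is hypothetical.

```python
def extended_regions(lr, c, w=2000, l=1000, weak=4.0):
    """Hypothetical sketch: (first_idx, last_idx, hotspot_idx) for each extended region."""
    n = len(lr)
    reach = (w - 1) // l   # windows i and j overlap when |i - j| * l < w
    marked = [v > weak for v in lr]
    for j, v in enumerate(lr):
        if v > c:
            # also mark every grid position whose window overlaps this strong one
            for i in range(max(0, j - reach), min(n, j + reach + 1)):
                marked[i] = True
    regions = []
    i = 0
    while i < n:
        if not marked[i]:
            i += 1
            continue
        start = i
        while i < n and marked[i]:
            i += 1
        run = range(start, i)
        if any(lr[k] > c for k in run):   # keep only runs seeded by LR > c
            best = max(run, key=lambda k: lr[k])
            regions.append((start, i - 1, best))
    return regions


print(extended_regions([0, 1, 5, 13, 6, 0, 0, 2, 11, 3, 0], c=10))
```

Here the two runs around positions 3 and 8 each yield one extended region, and a run of moderate values with no LR > c would be discarded.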
2.4 Implementation
To obtain haplotype data from genotype data we used PHASEv2.1 (Stephens et al., 2001; Stephens and Donnelly, 2003). To obtain estimates of the background recombination rate we used the inferred recombination rates within PHASEv2.1 obtained under the MR flag (Li and Stephens, 2003). These estimates are based on a model which allows a different recombination rate between each pair of consecutive SNPs. To estimate the background recombination rate at a position x we took the median of all the recombination rates estimated within a 100 kb window centered on x. (The program sequenceLDhot is able to directly input the appropriate output from PHASEv2.1, though alternative methods for estimating the background recombination rate can be used, and their output input directly into sequenceLDhot.)
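A minimal Python sketch of the median-window estimate, assuming the per-interval recombination rates between consecutive SNPs have already been extracted from the PHASE output (the parsing step is not shown). The helper name and the convention of assigning each interval to the window by its midpoint are assumptions made for illustration.

```python
from statistics import median


def background_rate(x, snp_positions, interval_rates, window=100_000):
    """Hypothetical helper: median of the per-interval rates whose interval
    midpoint lies within a window of the given width centred on x."""
    if len(interval_rates) != len(snp_positions) - 1:
        raise ValueError("need one rate per pair of consecutive SNPs")
    half = window / 2
    rates = [
        r
        for a, b, r in zip(snp_positions, snp_positions[1:], interval_rates)
        if abs((a + b) / 2 - x) <= half
    ]
    if not rates:
        raise ValueError("no intervals in window")
    return median(rates)


# one interval (2000-3000) carries a hotspot-like rate
print(background_rate(2500, [0, 1000, 2000, 3000, 4000, 5000], [0.5, 0.6, 40.0, 0.4, 0.5]))
```

Because a median is used, a single interval with a very high rate, such as one spanning a hotspot, has little influence on the background estimate.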
The final detail of implementing our method is the number of Monte Carlo simulations used within step (3) of the method. We allowed this to vary across different putative hotspots, depending on the evidence for a hotspot. We specified a minimum, $N_0$, and a maximum, $K \cdot N_0$, number of iterations. After $k \cdot N_0$ iterations, for each $k = 1, 2, \ldots, K - 1$, we checked the current estimate of the likelihood ratio statistic $LR_i$. If $LR_i < 4$ or $LR_i > 20$ then we stopped the Monte Carlo simulations for that putative hotspot. The idea is to stop the simulations if there is either little or overwhelming evidence for a hotspot. The cut-off values were chosen to differ by a factor of approximately 2 from the cutoffs of $LR_i > 10$ and $LR_i > 12$ considered for detecting hotspots.
This idea of curtailing the Monte Carlo simulation in step (3) substantially reduces the computational cost of the method, and was found to have no noticeable effect on the performance of the method. For the results shown here we chose $N_0 = 300$ and $K = 50$. The Monte Carlo method in step (3) uses bridge sampling (Meng and Wong, 1996). For each set of 300 Monte Carlo simulations we used 100 simulations from each of three driving values (see Fearnhead and Donnelly, 2001): one being the background recombination rate and two being rates consistent with a hotspot.
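The curtailed Monte Carlo scheme amounts to a simple early-stopping loop. In the Python sketch below, `lr_after(n)` is a hypothetical stand-in for the importance-sampling (bridge-sampling) estimate of the likelihood ratio statistic after n iterations; only the stopping logic, with the defaults $N_0 = 300$, $K = 50$ and thresholds 4 and 20 given above, is illustrated.

```python
def curtailed_lr(lr_after, n0=300, k_max=50, low=4.0, high=20.0):
    """Hypothetical sketch: run up to k_max * n0 iterations, checking after every n0.

    lr_after(n) is assumed to return the current LR estimate based on n iterations.
    Returns (final LR estimate, number of iterations used).
    """
    for k in range(1, k_max):
        n = k * n0
        lr = lr_after(n)
        if lr < low or lr > high:   # little or overwhelming evidence: stop early
            return lr, n
    n = k_max * n0
    return lr_after(n), n


# an estimate that settles above 20 after 900 iterations is stopped at the third check
print(curtailed_lr(lambda n: 30.0 if n >= 900 else 10.0))  # (30.0, 900)
```

Only ambiguous positions, with estimates between 4 and 20, run to the full $K \cdot N_0$ iterations, which is where the computational saving comes from.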
3 DATA AND OTHER METHODS
We compare our method with four recent methods:
- the likelihood ratio method of Fearnhead et al. (2004);
- the penalised likelihood of Fearnhead and Smith (2005), code available from http://www.maths.lancs.ac.uk/~fearnhea;
- the HotspotFisher program of Li et al. (2006) which is available from http://bioinfo.au.tsinghua.edu.cn/member/~lijun; and
- the LDhot program of Myers et al. (2005).
Our comparisons were based on three sets of simulated data taken from a number of recent papers. The names we use for each set of simulations, based upon the real data the simulations try to mimic, together with brief descriptions are as follows:
SeattleSNP. These data sets attempt to mimic data from the SeattleSNP, and consist of 200 independent data sets, each for a 25 kb region sampled from a European and African American population; the sample sizes are 23 and 24 individuals, respectively, and 100 data sets contain no hotspots, and the other 100 each contain a single hotspot. The background recombination rates varied between 0.1 and 5 cM per megabase (cM/Mb; mean 1.2 cM/Mb), and the hotspot recombination rates varied between 50 and 75 cM/Mb. These are taken from Fearnhead and Smith (2005).
HapMap Encode. These data sets consist of one hundred 200 kb regions, sampled in three populations (European, Asian and African). Each region contains a random number of hotspots (mean close to 4), and 90% of recombination events occur within the hotspots. Sample sizes are 90 individuals for each population. These are the HQ = 90% data sets from Li et al. (2006).
HumanChimp. These data sets are taken from Winckler et al. (2005) and were generated empirically from real data. They consist of data from three different Encode regions (4q26, 7q21 and 7q31), in European, African and chimp populations. In humans, the average background rates in the three regions were 0.5 to 0.6 cM/Mb, 0.4 cM/Mb and 0.4 cM/Mb, respectively. Each data set consists of a 100 kb region with a 2 kb hotspot at position 49 to 51 kb. A range of hotspot intensities was considered, and we give results for hotspots of the following intensities (all cM/Mb): 0.8, 4, 8, 16, 40, 80 and 160. Sample sizes were 60, 60 and 38, respectively. SNP density varied considerably: 1.8/kb (European), 2.5/kb (African) and 0.6/kb (chimp).
For further details of these data sets, see the original papers. The SeattleSNP and HapMap Encode simulations both used the cosi program of Schaffner et al. (2005).
The HapMap Encode simulated data sets are available from http://bioinfo.au.tsinghua.edu.cn/member/~lijun and the other data sets are available from http://www.maths.lancs.ac.uk/~fearnhea/Hotspot.
4 RESULTS
We now give the results of our new method, sequenceLDhot, on the three sets of simulated data described in Section 3. All simulated data sets provide genotype information, so we first inferred haplotypes and estimated background recombination rates using PHASEv2.1. For the power results we show for sequenceLDhot, we treat a true hotspot as found if it overlaps with an inferred hotspot, and we count as false-positives any inferred hotspots that do not overlap with a true hotspot. (Thus we ignore the extended hotspot regions when calculating power and false-positive rates, which makes comparisons with existing methods fair.)
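The overlap-based scoring rule can be written in a few lines. The Python sketch below assumes hotspots are given as half-open (start, end) intervals in base pairs; the function names are illustrative only.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def score(true_hotspots, inferred_hotspots):
    """Return (number of true hotspots found, number of false positives)."""
    found = sum(any(overlaps(t, h) for h in inferred_hotspots) for t in true_hotspots)
    false_pos = sum(
        not any(overlaps(h, t) for t in true_hotspots) for h in inferred_hotspots
    )
    return found, false_pos


# a true hotspot at 49-51 kb; one inferred hotspot overlaps it, one does not
print(score([(49_000, 51_000)], [(48_000, 50_000), (120_000, 122_000)]))  # (1, 1)
```

Power is then the number found divided by the number of true hotspots, pooled over data sets.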
4.1 SeattleSNPs
Our first comparison is with the likelihood ratio (LR) method of Fearnhead et al. (2004) and the penalised likelihood (PL) method of Fearnhead and Smith (2005). First, the computation required by sequenceLDhot is substantially smaller than that of either the LR or PL methods. All methods require the use of PHASE to infer haplotypes. For an example data set, sequenceLDhot took 10 min to analyse the resulting haplotype data, whereas both the likelihood ratio and penalised likelihood methods took of the order of 4 h.
Table 1 gives the results of the three methods. In testing for hotspots we used a cutoff value of c = 10, as this gave comparable false-positive rates to the other two methods. The power of sequenceLDhot is greater than that of either of the two alternative approaches, with power averaged across the two populations being 69% for sequenceLDhot, 65% for the penalised likelihood method, and 50% for the LR method.
We also used HotspotFisher to analyse these data, but this program performed poorly. Across both populations HotspotFisher had 14 false-positives and a power of 30%. The reason for this appears to be the small size of each data set, together with the fact that we used the program's default settings, which had been chosen based on analysing 200 kb regions with lower SNP density. In particular, HotspotFisher had difficulty estimating the background recombination rates. The false-positives tended to occur in data sets where the background recombination rates were substantially underestimated (three data sets contained multiple false-positives). Similarly, the lack of power again appears to be partly due to errors in estimating the background recombination rate, but this time due to overestimates of the rates for some data sets.
4.2 HapMap Encode
We next analysed the HapMap Encode simulated data, to provide a fairer comparison with HotspotFisher. While each data set consists of 200 kb sequenced in 90 individuals, to speed up our method (in particular to reduce the CPU cost of PHASE) we subsampled just 45 individuals, and analysed the first and last 110 kb of sequence separately. We split the sequence in this way so that putative hotspots close to 100 kb would still be surrounded by enough informative SNPs to avoid any loss of power. Our method then took on the order of 12 h to analyse a single 200 kb data set (including running both PHASE and sequenceLDhot), compared with a few minutes for HotspotFisher.
Results are given in Table 2. When testing for hotspots we used a cutoff of c = 12 so that our method had similar false-positive rates to HotspotFisher. Our method shows a noticeable improvement in power over HotspotFisher: averaged over the three populations, its power is 79% compared with 67%. However, for inferred hotspots, HotspotFisher is more accurate at locating the hotspot position.
One noticeable problem with our method is in detecting individual hotspots when they cluster together. In these cases our method will tend to infer a large extended hotspot region, and thus a single hotspot. The power of our method increases by 8% if we count as detected all hotspots that lie fully within any extended hotspot region. (The average hotspot region is approximately 5 to 6 kb in length.)
4.3 HumanChimp
Finally we analysed the HumanChimp data. We ran sequenceLDhot on all data sets, and HotspotFisher on the human datasets (we had technical difficulties with running HotspotFisher on the chimp data). We also obtained results for LDhot from Table S3 of Winckler et al. (2005). These data sets only enable us to compare power (as opposed to false-positive rates), as they all contain a known hotspot, but the recombination landscape in the remaining part of the region is unknown.
The results for sequenceLDhot and HotspotFisher give the power of each method for seven different hotspot recombination rates, ranging from 0.8 to 160 cM/Mb (compared with a background rate in humans of approximately 0.4 cM/Mb). We pooled results for all three regions, and also pooled results from both human populations.
For comparison between sequenceLDhot and HotspotFisher we chose a cutoff value of c = 12 since for this value the two methods had similar false-positive rates for the HapMap Encode data. The resulting estimated power curves for both methods are shown by the black and blue curves in Figure 2. Again we see that sequenceLDhot is more powerful at inferring hotspots than HotspotFisher.
Table S3 of Winckler et al. (2005) gives the power of LDhot (at a 5% significance level) for different hotspot intensities. For human hotspots, they only give power values for hotspots with intensity close to 8 cM/Mb; for chimp hotspots they give power values for a range of hotspot intensities. We have plotted these values in Figure 2 (green curve). For comparison we also give power curves for sequenceLDhot with cutoff c = 5, which gives a similar nominal significance level to LDhot. The results suggest that LDhot is more powerful at detecting hotspots of strength close to 8 cM/Mb (20 times the background rate) in the human data. For the chimp data it appears that sequenceLDhot is more powerful for weaker hotspots (up to around 16 cM/Mb), and LDhot is more powerful for inferring hotspots that are stronger than 16 cM/Mb.
5 DISCUSSION
Our new method has a number of advantages over existing methods. It is substantially quicker, and appears to be more powerful, than the likelihood ratio method of Fearnhead et al. (2004) and the penalised likelihood method of Fearnhead and Smith (2005). The gain in computational speed is substantial, and the new method can be scaled up to genome-wide data. For example, analysing a 200 kb data set in the HapMap Encode analysis took of the order of 12 h of computing time, so analysing a genome-wide data set would take of the order of 1000 CPU days, which is practicable as the analysis is trivially parallelisable.
Our method appears to be more powerful at detecting hotspots than HotspotFisher, though the latter method is both quicker than ours, and can localize the position of the hotspots more accurately. The reason it appears more accurate at inferring the position of the hotspots is likely to be due to the finer grid it uses for putative hotspots.
A comparison with LDhot is more difficult, as this method is currently not publicly available. In this comparison we could only compare the power of the different methods, not their false-positive rates. While we attempted to perform a comparison where the methods had similar putative false-positive rates, there was no way to check these in practice.
However, this comparison based on the published power results of LDhot suggests that sequenceLDhot may be more powerful for weaker hotspots, and perhaps for data with lower SNP density, whereas LDhot is more powerful for stronger hotspots and data with higher SNP density. Both these results seem plausible. First, the power of sequenceLDhot relies on correctly choosing informative SNPs to be used to calculate the likelihood ratio statistics. If it chooses these well, then it fully utilizes the information contained in these SNPs, but a poor choice may mean that it misses hotspots that would be obvious from a different choice of SNPs. This may mean that, compared with other methods, it works poorly for stronger hotspots, where the occasional choice of a poor set of SNPs will keep its power slightly below 100%. Second, as SNP density increases substantially (say >1 SNP/kb), sequenceLDhot is unable to fully utilize this extra information, as it always calculates the likelihood ratio statistics based on a fixed number of SNPs. By comparison, LDhot will continue to be able to take account of the information in these extra SNPs regardless of how high SNP density becomes.
When implementing our method, various choices need to be made. In terms of summarizing the output, the most important one is the choice of the cutoff c. We suggest (Section 2.3) choosing c by (i) specifying an appropriate false-positive rate; and (ii) using the approximate null distribution of the likelihood ratio statistic to obtain a cutoff that approximately gives this false-positive rate. We can get some idea of how accurate step (ii) is from the results of analysing the SeattleSNP and HapMap Encode data sets. For the SeattleSNP data we used c = 10, which has a putative false-positive rate of 1/1.2 Mb and an empirical false-positive rate of 1/1.25 Mb; for the HapMap Encode data we used c = 12, which has a putative false-positive rate of 1/3.7 Mb and an empirical rate of 1/5 Mb. The close agreement between the putative and empirical rates is encouraging, particularly as there are sources of uncertainty (e.g. the phasing of the genotype data and the estimate of the background recombination rate) that are not formally accounted for in the calculation of the likelihood ratio statistics.
In terms of implementing the method, there are numerous choices regarding how to estimate the background recombination rate, the choice of w and l, and how many SNPs, S, to use per sub-region. Any method can be used to produce estimates of the background recombination rate [e.g. LDhat (McVean et al., 2002)]. We chose to estimate the background recombination rate using PHASEv2.1 primarily because it allows joint inference of the phase of the data and of the background recombination rate. The advantage of this is primarily one of ease of implementation: genotype data need to be pre-processed only by PHASEv2.1, rather than by two programs, one to estimate the haplotypes and one to estimate the background recombination rate. The choice of w, l and S appears well-calibrated for analysing human data, but care may be needed in applying the method to data from other organisms (for example, if hotspot widths differ from the 1 to 2 kb observed in humans, then both w and l will need to be changed).
Acknowledgments
The idea for this work came out of discussions with the Statistical Genetics Group, Department of Statistics, University of Oxford. I thank Simon Myers and Jun Li for sending me the HumanChimp and HapMap Encode data sets, respectively. This work was supported by EPSRC grant GR/S18786/01.
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: Martin Bishop
Received on August 4, 2006; revised on September 25, 2006; accepted on October 17, 2006
REFERENCES
Crawford, D.C., et al. (2004) Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet., 36, 700-706.
Fearnhead, P. and Donnelly, P. (2001) Estimating recombination rates from population genetic data. Genetics, 159, 1299-1318.
Fearnhead, P. and Donnelly, P. (2002) Approximate likelihood methods for estimating local recombination rates (with discussion). JRSS Series B, 64, 657-680.
Fearnhead, P., et al. (2004) Application of coalescent methods to reveal fine-scale rate variation and recombination hotspots. Genetics, 167, 2067-2081.
Fearnhead, P. and Smith, N.G.C. (2005) A novel method with improved power to detect recombination hotspots from polymorphism data reveals multiple hotspots in human genes. Am. J. Hum. Genet., 77, 781-794.
Hirschhorn, J.N. and Daly, M.J. (2005) Genome-wide association studies for complex diseases and complex traits. Nat. Rev. Genet., 6, 95-108.
Jeffreys, A.J., et al. (2001) Intensely punctate meiotic recombination in the class II region of the Major Histocompatibility Complex. Nat. Genet., 29, 217-222.
Jeffreys, A.J. and Neumann, R. (2002) Reciprocal crossover asymmetry and meiotic drive in a human recombination hotspot. Nat. Genet., 31, 267-271.
Jeffreys, A.J., et al. (2005) Human recombination hotspots hidden within regions of strong marker association. Nat. Genet., 37, 601-606.
Kauppi, L., et al. (2005) Localized breakdown in linkage disequilibrium does not always predict sperm crossover hot spots in the human MHC class II region. Genomics, 86, 13-24.
Li, J., et al. (2006) A new method for detecting human recombination hotspots and its applications to the HapMap ENCODE data. Am. J. Hum. Genet., 79, 628-639.
Li, N. and Stephens, M. (2003) Modelling LD, and identifying recombination hotspots from SNP data. Genetics, 165, 2213-2233.
McVean, G.A.T., et al. (2002) A coalescent method for detecting recombination from gene sequences. Genetics, 160, 1231-1241.
McVean, G.A.T., et al. (2004) The fine-scale structure of recombination rate variation in the human genome. Science, 304, 581-584.
Meng, X. and Wong, W.H. (1996) Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica, 6, 831-860.
Myers, S., et al. (2005) A fine-scale map of recombination rates and hotspots across the human genome. Science, 310, 321-324.
Pineda-Krch, M. and Redfield, R.J. (2005) Persistence and loss of meiotic recombination hotspots. Genetics, 169, 2319-2333.
Ptak, S.E., et al. (2005) Fine-scale recombination patterns differ between chimpanzees and humans. Nat. Genet., 37, 429-434.
Schaffner, S.F., et al. (2005) Calibrating a coalescent simulation of human genome sequence variation. Genome Res., 15, 1576-1583.
Smith, N.G.C. and Fearnhead, P. (2005) A comparison of three estimators of the population-scaled recombination rate: accuracy and robustness. Genetics, 171, 2051-2062.
Stephens, M. and Donnelly, P. (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet., 73, 1162-1169.
Stephens, M., et al. (2001) A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet., 68, 978-989.
The International HapMap Consortium. (2005) A haplotype map of the human genome. Nature, 437, 1299-1320.
Winckler, W., et al. (2005) Comparison of fine-scale recombination rates in humans and chimpanzees. Science, 308, 107-111.
Zondervan, K.T. and Cardon, L.R. (2004) The complex interplay among factors that influence allelic association. Nat. Rev. Genet., 5, 89-100.