Bioinformatics Advance Access originally published online on November 21, 2006
Bioinformatics 2007 23(2):255-256; doi:10.1093/bioinformatics/btl580
WHAP: haplotype-based association analysis
1 Center for Human Genetic Research, MGH, Boston, MA, USA
2 Broad Institute of Harvard and MIT, Cambridge, MA, USA
3 Genome Research Center, University of Hong Kong, Pokfulam, Hong Kong
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: We describe a software tool to perform haplotype-based association analysis, for quantitative and qualitative traits, in population and family samples, using single nucleotide polymorphism or multiallelic marker data. A range of tests is offered: omnibus and haplotype-specific tests; prospective and retrospective likelihoods; covariates and moderators; sliding window analyses; permutation P-values. We focus on the ability to flexibly impose constraints on haplotype effects, which allows for a range of conditional haplotype-based likelihood ratio tests: for example, whether an allele has an effect independent of its haplotypic background, or whether a single variant can explain the overall association at a locus. We illustrate using these tests to dissect a multi-locus association.
Availability: WHAP is a C/C++ program, freely available from the author's website: http://pngu.mgh.harvard.edu/purcell/whap/
Contact: shaun{at}pngu.mgh.harvard.edu
| 1 INTRODUCTION |
|---|
WHAP performs haplotype-based association analysis, using a method similar to other recent methods (Schaid et al., 2002; Zaykin et al., 2002; Dudbridge, 2003). We use a weighted maximum likelihood model to account for the potential ambiguity in individuals' statistically-inferred haplotypes. For H haplotypes, two types of basic tests are available: haplotype-specific tests, i.e. H separate 1 degree-of-freedom tests of each specific haplotype compared with all others; and a single omnibus H − 1 degree-of-freedom test, jointly testing all haplotypes. In this Application Note, we introduce WHAP and describe a class of conditional tests that allow multi-locus associations to be dissected by asking "does X have an effect independent of Z?". Here, X and Z can be single, or multiple, markers or haplotypes.
| 2 METHODS |
|---|
WHAP implements an extension of Sham et al. (2004), which performed well in an independent and comprehensive comparison of haplotype-based case/control association methods (Cordell, 2006). Briefly, an observed, unphased multi-locus genotype is denoted G; a phased haplotype pair H; a phenotype Y; paternal, maternal and offspring indicators are P, M and O. For individuals without genotyped parents, a modified EM algorithm (Clayton, 2003, www-gene.cimr.cam.ac.uk/clayton/software/snphap.txt) is used to obtain the set of possible phases and their probabilities $P(H_h \mid G)$. When present, both parents are phased separately; assuming independence between paternal and maternal chromosomes and applying basic Mendelian laws, we calculate the probability of each offspring phase consistent with the observed offspring genotypes and parental phase, giving phase probabilities $P(H_O \mid G_O, H_P, H_M)$. Parental genotypes also allow partitioning of genetic effects into separate between-family (based on the expected offspring haplotype counts for each possible parental phase) and within-family (the difference between the observed offspring haplotype counts and the between component) components (Abecasis et al., 2000). By default, between and within components are equated, but other models can be specified. The contribution to the likelihood from each individual is

$$P(Y \mid G) = \sum_{h} P(Y \mid H_h)\, P(H_h \mid G)$$

where $P(Y \mid H_h)$ is parameterized following ordinary linear or logistic regression models and maximum likelihood is used to estimate the regression coefficients as described in Sham et al. (2004).
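The phase-weighted likelihood contribution above can be illustrated with a minimal sketch. It assumes a logistic model with additive haplotype effects; the function names, the intercept, the coefficient values and the phase probabilities are all hypothetical choices for illustration and are not taken from WHAP itself.

```python
import math

def logistic_prob(y, hap_pair, intercept, betas):
    """P(Y = y | haplotype pair) under an assumed additive logistic model.

    Each copy of a haplotype adds its coefficient to the linear predictor;
    haplotypes missing from `betas` are treated as the reference (effect 0).
    """
    eta = intercept + sum(betas.get(hap, 0.0) for hap in hap_pair)
    p_case = 1.0 / (1.0 + math.exp(-eta))
    return p_case if y == 1 else 1.0 - p_case

def individual_likelihood(y, phases, intercept, betas):
    """P(Y | G) as the sum over phases h of P(Y | H_h) * P(H_h | G)."""
    return sum(
        logistic_prob(y, hap_pair, intercept, betas) * phase_prob
        for hap_pair, phase_prob in phases
    )

# Hypothetical individual whose genotype is compatible with two phases.
betas = {"H5": 0.7}                      # illustrative log odds ratio for H5
phases = [(("H1", "H5"), 0.8),           # P(H_h | G) values are made up
          (("H2", "H4"), 0.2)]

lik_case = individual_likelihood(1, phases, intercept=-0.5, betas=betas)
lik_control = individual_likelihood(0, phases, intercept=-0.5, betas=betas)
```

Because the phase probabilities sum to one, `lik_case` and `lik_control` also sum to one; the weighting simply averages the regression model over the phases that remain plausible for that individual.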
Finally, a retrospective likelihood using P(G|Y) in place of P(Y|G) can be used, which can have some advantages in selected samples (Sham et al., 2000) including the case of affected-only offspring designs. Omitting the individual subscript, the conditional likelihood is in the form
[Equation image not recovered from the source.]
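For orientation, Bayes' theorem relates a retrospective likelihood to the prospective one defined earlier. The expression below is a generic sketch and is not necessarily the exact form the authors display; the symbol $G^{*}$ (ranging over all possible multi-locus genotypes) and the population genotype probabilities $P(G)$ are notation assumed here for illustration.

```latex
P(G \mid Y) \;=\; \frac{P(Y \mid G)\, P(G)}{\sum_{G^{*}} P(Y \mid G^{*})\, P(G^{*})},
\qquad
P(Y \mid G) \;=\; \sum_{h} P(Y \mid H_h)\, P(H_h \mid G)
```

The denominator is simply the marginal probability of the phenotype, so conditioning on Y in this way is what makes the likelihood suitable for samples ascertained on trait values.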
2.1 Conditional tests
The framework described above allows for a range of basic haplotype tests. WHAP also allows the user to flexibly equate haplotype effects, to group haplotypes and perform cladistic tests, or to test null hypotheses of homogeneous effects rather than no effects. As shown below, this also facilitates conditional testing: once an association signal has been detected, conditional tests can be useful in determining which variant, or variants, is most likely to be causal, as opposed to showing only indirect association due to linkage disequilibrium with the causal variant.
Standard association analysis is concerned with detection, i.e. whether X is associated with the phenotype, where X is an allele, genotype, haplotype or set of haplotypes. In contrast, conditional tests are concerned with dissection, or how to best characterize multiple-marker associations, by asking whether X is associated with the phenotype independent of something else, Z. We will consider two (related) conditional tests: the independent effect test ("is this variant associated independent of everything else?") and the sole-variant test ("is everything else associated independent of this variant?"). In both cases, "everything else" refers to the local haplotypic background as determined by the markers under study. It is important to note that, unlike all other tests, the sole-variant test looks for one particular marker or haplotype showing a non-significant result, given a significant omnibus test result. This would be consistent with the variant being causal (i.e. nothing else has an independent effect), although it does not of course prove causality (as the effect might represent indirect association with local ungenotyped variation).
To illustrate the conditional tests, we simulated a dataset (also available from the author's website) of 200 cases and 200 controls containing 5 single nucleotide polymorphisms (SNPs) and 6 common haplotypes. Table 1 shows the estimated frequencies and association results for both conditional and unconditional SNP and haplotype tests. Haplotype H5 (AACTA, not uniquely tagged by any single SNP) was set to be the single risk-increasing variant for disease. Importantly, the significant omnibus result (p = 0.00186) suggests that it is appropriate to proceed to conditional testing.
Tests can be represented by the models applied under the alternate and null hypotheses. Each model is described by the coefficients for the six haplotypes in order (1 is the reference category; equality constraints only apply within a particular model). Consider five models: Model I {1, 1, 1, 1, 1, 1}; Model II {1, β2, β3, β4, β5, β6}; Model III {1, 1, 1, 1, β1, 1}; Model IV {1, β2, β2, β3, β4, β5}; Model V {1, 1, β1, β1, β1, 1}. Various tests can be constructed as likelihood ratio tests involving two nested models: the omnibus test compares II (alternate) versus I (null); haplotype-specific test for H5 is III versus I; the single SNP test for S5 is V versus I; the independent effect test for S5 is II versus IV; the sole-variant test for S5 is II versus V; the sole-variant test for H5 is II versus III. In WHAP these tests are easily constructed with a few simple commands: the --alt, --null and --constrain keywords. It is also possible to test for independent effects using locus-coding rather than haplotype-coding by entering a variant as a covariate coded to represent allelic dosage and performing a 1 df test controlling for Z, the background haplotypic effects.
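The bookkeeping behind these nested comparisons can be sketched in a few lines. The sketch below encodes the five models as tuples of parameter labels, checks that each null model is nested in its alternate, and counts degrees of freedom as the difference in free parameters (the usual likelihood ratio convention, assumed here). The label strings, function names and test names are hypothetical and do not reflect WHAP's own input syntax.

```python
# Each model lists one parameter label per haplotype (H1..H6);
# "ref" marks haplotypes fixed to the reference category.
MODELS = {
    "I":   ("ref", "ref", "ref", "ref", "ref", "ref"),
    "II":  ("ref", "b2", "b3", "b4", "b5", "b6"),
    "III": ("ref", "ref", "ref", "ref", "b1", "ref"),
    "IV":  ("ref", "b2", "b2", "b3", "b4", "b5"),
    "V":   ("ref", "ref", "b1", "b1", "b1", "ref"),
}

def free_parameters(model):
    """Number of distinct non-reference coefficients in a model."""
    return len(set(model) - {"ref"})

def is_nested(null, alt):
    """True if every pair of haplotypes equated under alt is also equated under null."""
    n = len(alt)
    return all(
        null[i] == null[j]
        for i in range(n)
        for j in range(i + 1, n)
        if alt[i] == alt[j]
    )

def lrt_df(alt_name, null_name):
    """Degrees of freedom for a likelihood ratio test of alt versus null."""
    alt, null = MODELS[alt_name], MODELS[null_name]
    if not is_nested(null, alt):
        raise ValueError(f"Model {null_name} is not nested in Model {alt_name}")
    return free_parameters(alt) - free_parameters(null)

# (alternate, null) pairs as listed in the text.
TESTS = {
    "omnibus": ("II", "I"),
    "haplotype-specific H5": ("III", "I"),
    "single SNP S5": ("V", "I"),
    "independent effect S5": ("II", "IV"),
    "sole-variant S5": ("II", "V"),
    "sole-variant H5": ("II", "III"),
}

for name, (alt, null) in TESTS.items():
    print(f"{name}: Model {alt} vs Model {null}, df = {lrt_df(alt, null)}")
```

Under this convention the omnibus test has H − 1 = 5 degrees of freedom for six haplotypes, and the sole-variant tests compare the full model against one in which only the variant of interest retains a free coefficient.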
Single SNP tests (SS) indicate that three SNPs are associated at p < 0.05, one is near-significant at this threshold and one is clearly not associated. There are no independent effects (IE), however; S2 and S4 are in complete linkage disequilibrium and so their effects cannot be separated statistically. The SNP-based sole-variant tests (SVS) are inconclusive: all are significant, indicating that no single SNP can explain the original omnibus association (i.e. no single SNP is a necessary and sufficient cause of the omnibus association).
We see a significant haplotypic omnibus and two haplotype-specific (HS) associations: H1 and H5 (the disease haplotype). The H1 association reflects a subtlety of haplotype-specific testing. Imagine three haplotypes, HA, HB and HC, with frequencies 0.90, 0.05 and 0.05. A risk effect of HC will induce an apparent protective effect of HA, as 50% of not-HA haplotypes are HC (whereas only just over 5% of not-HB haplotypes are HC). It would clearly be misleading, however, to report a common protective HA haplotype and a rare disease HC haplotype as distinct findings. Although HA is indeed protective relative to HC, this interpretation misses that haplotypes should really be grouped as HA and HB versus HC.
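The arithmetic behind the HA, HB and HC example is short enough to check directly. The helper name below is hypothetical; the frequencies are those given in the text.

```python
def share_among_others(freqs, focal, other):
    """Fraction of all non-`focal` haplotypes that are `other`."""
    return freqs[other] / (1.0 - freqs[focal])

freqs = {"HA": 0.90, "HB": 0.05, "HC": 0.05}

# The comparison group for the HA-specific test is everything that is not HA.
hc_among_not_ha = share_among_others(freqs, "HA", "HC")   # 0.05 / 0.10, about 0.50

# The comparison group for the HB-specific test is dominated by HA instead.
hc_among_not_hb = share_among_others(freqs, "HB", "HC")   # 0.05 / 0.95, about 0.053
```

Half of the comparison group for HA carries the risk haplotype, so HA looks protective, while the comparison group for HB is barely affected.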
The haplotype-based sole-variant tests (SVH) clarify the pattern of haplotype-specific results: as one would expect, H5 is the sole haplotype that can explain the omnibus association. Finally, if we add the causal variant as a sixth SNP (i.e. a single SNP unique to H5) we see that this sixth SNP passes the SNP-based sole-variant test (result not shown in Table 1): the non-significant result (p = 0.207) indicates that controlling for this single SNP can explain the entire omnibus association result, while no other SNP can do so.
| 3 CONCLUSION |
|---|
WHAP offers a range of methods for haplotype-based association analysis, with a focus on conditional tests to dissect genetic effects. Although such tests will often be under-powered and inconclusive, other times they will help resolve strong multi-locus association signals as in the example mentioned above. Other features of WHAP such as dominant and recessive genetic models, multi-allelic markers and covariates (having main and interacting effects) are described in the on-line documentation. Finally, it should be noted that WHAP is designed for candidate gene studies, or studies of small to moderately-sized chromosomal regions: different tools will be needed for whole genome association studies.
| Acknowledgments |
|---|
S.P. acknowledges UK MRC grant G9901258 and National Eye Institute grant EY-12562.
Conflict of Interest: None declared.
| FOOTNOTES |
|---|
Associate Editor: Keith A Crandall
Received on August 2, 2006; revised on October 26, 2006; accepted on November 13, 2006
| REFERENCES |
|---|
Abecasis, G.R., et al. (2000) A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet., 66, 279–292.
Clayton, D. (2003) SNPHAP software program.
Cordell, H. (2006) Estimation and testing of genotype and haplotype effects in case-control studies: comparison of weighted regression and multiple imputation procedures. Genet. Epidemiol., 30, 259–275.
Dudbridge, F. (2003) Pedigree disequilibrium tests for multilocus haplotypes. Genet. Epidemiol., 25, 115–121.
Schaid, D.J., et al. (2002) Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet., 70, 425–34.
Sham, P.C., et al. (2000) Variance components QTL linkage analysis: conditioning on trait values. Genet. Epidemiol., 19, S22–S28.
Sham, P.C., et al. (2004) Haplotype association analysis of discrete and continuous traits using mixture of regression models. Behav. Genet., 34, 207–214.
Zaykin, D.V., et al. (2002) Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum. Hered., 53, 79–91.