Bioinformatics Advance Access originally published online on November 21, 2006
Bioinformatics 2007 23(2):255-256; doi:10.1093/bioinformatics/btl580
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

WHAP: haplotype-based association analysis

Shaun Purcell 1,2,*, Mark J. Daly 1,2 and Pak C. Sham 3

1 Center for Human Genetic Research, MGH, Boston, MA, USA
2 Broad Institute of Harvard and MIT, Cambridge, MA, USA
3 Genome Research Center, University of Hong Kong, Pokfulam, Hong Kong

*To whom correspondence should be addressed.


    ABSTRACT

Summary: We describe a software tool to perform haplotype-based association analysis, for quantitative and qualitative traits, in population and family samples, using single nucleotide polymorphism or multiallelic marker data. A range of tests is offered: omnibus and haplotype-specific tests; prospective and retrospective likelihoods; covariates and moderators; sliding window analyses; permutation P-values. We focus on the ability to flexibly impose constraints on haplotype effects, which allows for a range of conditional haplotype-based likelihood ratio tests: for example, whether an allele has an effect independent of its haplotypic background, or whether a single variant can explain the overall association at a locus. We illustrate using these tests to dissect a multi-locus association.

Availability: WHAP is a C/C++ program, freely available from the author's website: http://pngu.mgh.harvard.edu/purcell/whap/

Contact: shaun@pngu.mgh.harvard.edu


    1 INTRODUCTION
WHAP performs haplotype-based association analysis, using a method similar to other recent methods (Schaid et al., 2002; Zaykin et al., 2002; Dudbridge, 2003). We use a weighted maximum likelihood model to account for the potential ambiguity in individuals' statistically-inferred haplotypes. For H haplotypes, two types of basic tests are available: haplotype-specific tests, i.e. H separate 1 degree-of-freedom tests of each specific haplotype compared with all others; and a single omnibus H – 1 degree-of-freedom test, jointly testing all haplotypes. In this Application Note, we introduce WHAP and describe a class of conditional tests that allow multi-locus associations to be dissected by asking ‘does X have an effect independent of Z?’. Here, X and Z can be single, or multiple, markers or haplotypes.


    2 METHODS
WHAP implements an extension of Sham et al. (2004), which performed well in an independent and comprehensive comparison of haplotype-based case/control association methods (Cordell, 2006). Briefly, an observed, unphased multi-locus genotype is denoted G; a phased haplotype pair H; a phenotype Y; paternal, maternal and offspring indicators are P, M and O. For individuals without genotyped parents, a modified E–M algorithm (Clayton, 2003, www-gene.cimr.cam.ac.uk/clayton/software/snphap.txt) is used to obtain the set of possible phases and their probabilities $P(H_h \mid G)$. When present, both parents are phased separately; assuming independence between paternal and maternal chromosomes and applying basic Mendelian laws, we calculate the probability of each offspring phase consistent with the observed offspring genotypes and parental phase, giving phase probabilities $P(H_O \mid G_O, H_P, H_M)$.

Parental genotypes also allow genetic effects to be partitioned into separate between-family (based on the expected offspring haplotype counts for each possible parental phase) and within-family (the difference between the observed offspring haplotype counts and the between component) components (Abecasis et al., 2000). By default, between and within components are equated, but other models can be specified.

The contribution to the likelihood from each individual is $P(Y \mid G) = \sum_h P(Y \mid H_h)\, P(H_h \mid G)$, where $P(Y \mid H_h)$ is parameterized following ordinary linear or logistic regression models and maximum likelihood is used to estimate the regression coefficients as described in Sham et al. (2004).
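The weighted likelihood described above can be sketched in a few lines. This is a minimal illustration and not WHAP's implementation: it assumes unrelated individuals, biallelic SNPs coded as minor-allele counts, haplotype frequencies that are already known (the E–M estimation step is not shown), Hardy-Weinberg proportions for phase weights, and a logistic model with additive haplotype effects. All function and variable names are hypothetical.

```python
import math
from itertools import product


def consistent_phases(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of minor-allele counts (0, 1 or 2), one per SNP.
    Haplotypes are tuples of 0/1 alleles.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites: 0 -> allele 0, 2 -> allele 1
    phases = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        phases.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(phases)


def phase_probabilities(genotype, freq):
    """P(H_h | G), assuming Hardy-Weinberg proportions and known frequencies."""
    weights = {}
    for h1, h2 in consistent_phases(genotype):
        w = freq.get(h1, 0.0) * freq.get(h2, 0.0)
        if h1 != h2:
            w *= 2.0  # two orderings of a heterozygous haplotype pair
        weights[(h1, h2)] = w
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("genotype is impossible under the given frequencies")
    return {phase: w / total for phase, w in weights.items() if w > 0.0}


def individual_likelihood(y, genotype, freq, beta0, beta):
    """P(Y | G) = sum_h P(Y | H_h) P(H_h | G) for a binary trait (y is 0 or 1)."""
    lik = 0.0
    for (h1, h2), p_phase in phase_probabilities(genotype, freq).items():
        # Assumed additive coding: one coefficient per haplotype copy.
        eta = beta0 + beta.get(h1, 0.0) + beta.get(h2, 0.0)
        p_case = 1.0 / (1.0 + math.exp(-eta))
        lik += (p_case if y == 1 else 1.0 - p_case) * p_phase
    return lik


# Hypothetical two-SNP example: a double heterozygote has two possible phases.
freq = {(0, 0): 0.4, (1, 1): 0.3, (0, 1): 0.2, (1, 0): 0.1}
print(phase_probabilities((1, 1), freq))
print(individual_likelihood(1, (1, 1), freq, beta0=-1.0, beta={(1, 1): 0.5}))
```

Summing the logarithm of `individual_likelihood` over individuals would give the sample log-likelihood to be maximized over the regression coefficients.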

Finally, a retrospective likelihood using $P(G \mid Y)$ in place of $P(Y \mid G)$ can be used, which can have some advantages in selected samples (Sham et al., 2000), including the case of affected-only offspring designs. Omitting the individual subscript, the conditional likelihood is of the form

$$P(G \mid Y) = \frac{\sum_{g_1} P(Y \mid H)\, P(H)}{\sum_{g_0} P(Y \mid H)\, P(H)}$$

where the numerator sum over $g_1$ includes all phases consistent with the individual's or trio's observed genotypes, whereas the denominator sum over $g_0$ is over all possible phases, regardless of observed genotypes.
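As a rough illustration of the retrospective likelihood, the sketch below assumes it takes the Bayes-rule form: the sum of P(Y|H) P(H) over phases consistent with the observed genotype, divided by the same sum over all possible phases. It covers unrelated individuals only (not trios), assumes known haplotype frequencies, Hardy-Weinberg proportions and an additive logistic model, and uses hypothetical names throughout.

```python
import math
from itertools import combinations_with_replacement


def pair_prior(h1, h2, freq):
    """P(H) for an unordered haplotype pair under Hardy-Weinberg proportions."""
    p = freq[h1] * freq[h2]
    return p if h1 == h2 else 2.0 * p


def p_y_given_pair(y, h1, h2, beta0, beta):
    """P(Y | H) under an assumed additive logistic model (y is 0 or 1)."""
    eta = beta0 + beta.get(h1, 0.0) + beta.get(h2, 0.0)
    p_case = 1.0 / (1.0 + math.exp(-eta))
    return p_case if y == 1 else 1.0 - p_case


def genotype_of(h1, h2):
    """Unphased genotype (minor-allele counts) implied by a haplotype pair."""
    return tuple(a + b for a, b in zip(h1, h2))


def retrospective_likelihood(y, genotype, freq, beta0, beta):
    """P(G | Y): consistent phases in the numerator, all phases in the denominator."""
    num = den = 0.0
    for h1, h2 in combinations_with_replacement(sorted(freq), 2):
        term = p_y_given_pair(y, h1, h2, beta0, beta) * pair_prior(h1, h2, freq)
        den += term
        if genotype_of(h1, h2) == tuple(genotype):
            num += term
    return num / den


# Hypothetical two-SNP example with four haplotypes.
freq = {(0, 0): 0.4, (1, 1): 0.3, (0, 1): 0.2, (1, 0): 0.1}
print(retrospective_likelihood(1, (1, 1), freq, beta0=-1.0, beta={(1, 1): 0.7}))
```

With all haplotype effects set to zero, P(Y|H) cancels and the function returns the population genotype frequency P(G), which is a convenient sanity check.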

2.1 Conditional tests
The framework described above allows for a range of basic haplotype tests. WHAP also allows the user to flexibly equate haplotype effects, to group haplotypes and perform cladistic tests, or to test null hypotheses of homogeneous effects rather than no effects. As shown below, this also facilitates conditional testing: once an association signal has been detected, conditional tests can be useful in determining which variant, or variants, is most likely to be causal, as opposed to showing only indirect association due to linkage disequilibrium with the causal variant.

Standard association analysis is concerned with detection, i.e. whether X is associated with the phenotype, where X is an allele, genotype, haplotype or set of haplotypes. In contrast, conditional tests are concerned with dissection, or how to best characterize multiple-marker associations, by asking whether X is associated with the phenotype independent of something else, Z. We will consider two (related) conditional tests: the independent effect test (is this variant associated independent of everything else?) and the sole-variant test (is everything else associated independent of this variant?). In both cases, ‘everything else’ refers to the local haplotypic background as determined by the markers under study. It is important to note that, unlike all other tests, the sole-variant test looks for one particular marker or haplotype showing a non-significant result, given a significant omnibus test result. This would be consistent with the variant being causal (i.e. nothing else has an independent effect) although this does not of course prove this (as the effect might represent indirect association to local ungenotyped variation).

To illustrate the conditional tests, we simulated a dataset (also available from the author's website) of 200 cases and 200 controls containing 5 single nucleotide polymorphisms (SNPs) and 6 common haplotypes. Table 1 shows the estimated frequencies and association results for both conditional and unconditional SNP and haplotype tests. Haplotype H5 (AACTA, not uniquely tagged by any single SNP) was set to be the single risk-increasing variant for disease. Importantly, the significant omnibus result (Formula, p = 0.00186) suggests that it is appropriate to proceed to conditional testing.


Table 1 SNP and haplotype main effect and conditional effect tests

 
Tests can be represented by the models applied under the alternate and null hypotheses. Each model is described by the coefficients for the six haplotypes in order (1 is the reference category; equality constraints only apply within a particular model). Consider five models: Model I {1, 1, 1, 1, 1, 1}; Model II {1, β2, β3, β4, β5, β6}; Model III {1, 1, 1, 1, β1, 1}; Model IV {1, β2, β2, β3, β4, β5}; Model V {1, 1, β1, β1, β1, 1}. Various tests can be constructed as likelihood ratio tests involving two nested models: the omnibus test compares II (alternate) versus I (null); the haplotype-specific test for H5 is III versus I; the single SNP test for S5 is V versus I; the independent effect test for S5 is II versus IV; the sole-variant test for S5 is II versus V; the sole-variant test for H5 is II versus III. In WHAP these tests are easily constructed with a few simple commands: the --alt, --null and --constrain keywords. It is also possible to test for independent effects using locus-coding rather than haplotype-coding by entering a variant as a covariate coded to represent allelic dosage and performing a 1 df test controlling for Z, the background haplotypic effects.
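The bookkeeping behind these nested comparisons can be sketched as follows. The snippet is not WHAP syntax and does not fit any model; it only encodes the five constraint vectors above as label tuples (a hypothetical representation), checks that the null model is nested in the alternate, and counts the degrees of freedom of each likelihood ratio test as the difference in free coefficients.

```python
# Each model lists one coefficient label per haplotype H1..H6; "1" is the
# reference category and equal labels within a model mean equated effects.
MODELS = {
    "I":   ("1", "1", "1", "1", "1", "1"),
    "II":  ("1", "b2", "b3", "b4", "b5", "b6"),
    "III": ("1", "1", "1", "1", "b1", "1"),
    "IV":  ("1", "b2", "b2", "b3", "b4", "b5"),
    "V":   ("1", "1", "b1", "b1", "b1", "1"),
}


def partition(model):
    """Group haplotype indices (1-based) that share a coefficient label."""
    groups = {}
    for idx, label in enumerate(model, start=1):
        groups.setdefault(label, set()).add(idx)
    return {frozenset(g) for g in groups.values()}


def n_free(model):
    """Number of free coefficients: one per group, minus the reference group."""
    return len(partition(model)) - 1


def is_nested(null, alt):
    """The null is nested in the alternate if it only merges the alternate's groups."""
    null_groups = partition(null)
    return all(any(g <= ng for ng in null_groups) for g in partition(alt))


def lrt_df(alt, null):
    """Degrees of freedom of the likelihood ratio test of alt versus null."""
    if not is_nested(null, alt):
        raise ValueError("null model is not nested in the alternate model")
    return n_free(alt) - n_free(null)


TESTS = [
    ("omnibus", "II", "I"),
    ("haplotype-specific, H5", "III", "I"),
    ("single SNP, S5", "V", "I"),
    ("independent effect, S5", "II", "IV"),
    ("sole-variant, S5", "II", "V"),
    ("sole-variant, H5", "II", "III"),
]
for name, alt, null in TESTS:
    print(f"{name}: {alt} vs {null}, df = {lrt_df(MODELS[alt], MODELS[null])}")
```

Under this counting the omnibus test has H − 1 = 5 degrees of freedom, the haplotype-specific, single SNP and independent effect tests have 1, and the two sole-variant tests have 4.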

Single SNP tests (SS) indicate that three SNPs are associated at p < 0.05, one is near-significant at this threshold and one is clearly not associated. There are no independent effects (IE), however; S2 and S4 are in complete linkage disequilibrium and so their effects cannot be separated statistically. The SNP-based sole-variant tests (SVS) are inconclusive: all are significant, indicating that no single SNP can explain the original omnibus association (i.e. no single SNP is a necessary and sufficient cause of the omnibus association).

We see a significant haplotypic omnibus and two haplotype-specific (HS) associations: H1 and H5 (the disease haplotype). The H1 association reflects a subtlety of haplotype-specific testing. Imagine three haplotypes, HA, HB and HC, with frequencies 0.90, 0.05 and 0.05. A risk effect of HC will induce an apparent protective effect of HA, as 50% of not-HA haplotypes are HC (whereas only just over 5% of not-HB haplotypes are HC). It would clearly be misleading, however, to report a protective, common HA haplotype and a rare, disease HC haplotype as distinct findings. Although HA is indeed protective relative to HC, this interpretation misses that the haplotypes should really be grouped as HA, HB versus HC.
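The arithmetic behind the three-haplotype example can be checked directly. The helper below is a hypothetical illustration: it computes what fraction of the comparison group in a haplotype-specific test (all haplotypes other than the one tested) is made up of the risk haplotype HC.

```python
# Frequencies from the three-haplotype example in the text.
freq = {"HA": 0.90, "HB": 0.05, "HC": 0.05}


def share_among_others(target, excluded, freq):
    """Fraction of haplotypes other than `excluded` that are `target`."""
    return freq[target] / (1.0 - freq[excluded])


# Comparison group when testing HA: half of it is the risk haplotype HC.
print(f"HC among not-HA: {share_among_others('HC', 'HA', freq):.3f}")
# Comparison group when testing HB: HC is only a small fraction of it.
print(f"HC among not-HB: {share_among_others('HC', 'HB', freq):.3f}")
```

Because the not-HA group is half HC, a risk effect of HC shows up as a marked apparent protective effect of HA, while the not-HB group is dominated by HA and barely shifts.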

The haplotype-based sole-variant tests (SVH) clarify the pattern of haplotype-specific results: as one would expect, H5 is the sole haplotype that can explain the omnibus association. Finally, if we add the causal variant as a sixth SNP (i.e. a single SNP unique to H5), this sixth SNP passes the SNP-based sole-variant test (result not shown in Table 1), i.e. it yields a non-significant result (Formula, p = 0.207). This indicates that controlling for this single SNP can explain the entire omnibus association result, while no other SNP can do so.


    3 CONCLUSION
WHAP offers a range of methods for haplotype-based association analysis, with a focus on conditional tests to dissect genetic effects. Although such tests will often be under-powered and inconclusive, at other times they will help resolve strong multi-locus association signals, as in the example above. Other features of WHAP, such as dominant and recessive genetic models, multi-allelic markers and covariates (having main and interacting effects), are described in the on-line documentation. Finally, it should be noted that WHAP is designed for candidate gene studies, or studies of small to moderately-sized chromosomal regions: different tools will be needed for whole genome association studies.


    Acknowledgments
 
S.P. acknowledges UK MRC grant G9901258 and National Eye Institute grant EY-12562.

Conflict of Interest: None declared.


    FOOTNOTES
 
Associate Editor: Keith A. Crandall

Received on August 2, 2006; revised on October 26, 2006; accepted on November 13, 2006

    REFERENCES

    Abecasis, G.R., et al. (2000) A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet., 66, 279–292.

    Clayton, D. (2003) SNPHAP software program.

    Cordell, H. (2006) Estimation and testing of genotype and haplotype effects in case-control studies: comparison of weighted regression and multiple imputation procedures. Genet. Epidemiol., 30, 259–275.

    Dudbridge, F. (2003) Pedigree disequilibrium tests for multilocus haplotypes. Genet. Epidemiol., 25, 115–121.

    Schaid, D.J., et al. (2002) Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet., 70, 425–434.

    Sham, P.C., et al. (2000) Variance components QTL linkage analysis: conditioning on trait values. Genet. Epidemiol., 19, S22–S28.

    Sham, P.C., et al. (2004) Haplotype association analysis of discrete and continuous traits using mixture of regression models. Behav. Genet., 34, 207–214.

    Zaykin, D.V., et al. (2002) Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum. Hered., 53, 79–91.



