Bioinformatics Advance Access originally published online on February 18, 2007
Bioinformatics 2007 23(8):1038-1039; doi:10.1093/bioinformatics/btm058
A new JAVA interface implementation of THESIAS: testing haplotype effects in association studies
INSERM, UMR S 525 and Université Pierre et Marie Curie-Paris6, UMR S 525, Paris, France
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: THESIAS (Testing Haplotype EffectS In Association Studies) is a popular software package for carrying out haplotype association analysis in unrelated individuals. In addition to the command-line interface, a graphical JAVA interface is now available, allowing one to run THESIAS in a user-friendly manner. New functionalities have also been added to THESIAS, including the analysis of polychotomous phenotypes and X-linked polymorphisms.
Availability: The software package, including documentation and example data files, is freely available at http://genecanvas.ecgene.net. The source code is also available upon request.
Contact: david.tregouet{at}chups.jussieu.fr
| 1 INTRODUCTION |
|---|
Haplotype analysis has become an essential step when investigating an association between several polymorphisms within a gene and a phenotype. Haplotype-based analysis may help to distinguish the true effect of a polymorphism from an effect that is due to its linkage disequilibrium with other variants (Frere et al., 2006; Tregouet et al., 2003a). Haplotypes may serve as better markers for unknown functional variants than single nucleotide polymorphisms (SNPs). Importantly, they may also define functional units whose effects cannot be predicted from what is known of the individual effect of each variant that enters into their combination. When investigating unrelated individuals, haplotypes generally cannot be deduced from genotypes and must be statistically inferred. This explains why a large amount of work has been devoted to the development of statistical tools for inferring haplotypes, and in particular for simultaneously estimating haplotype frequencies and haplotype effects [see (Adkins, 2004; Marchini et al., 2006; Niu, 2004; Salem et al., 2005; Weale, 2004) for reviews of methods and software]. We initially developed maximum likelihood methods for haplotype-based association analysis of quantitative and binary (case-control) phenotypes (Tregouet et al., 2002), and these methods were later extended to matched case-control and survival analysis (Tregouet and Tiret, 2004). All methods were implemented in the THESIAS (Testing Haplotype EffectS In Association Studies) software, which is based on the Stochastic-EM (SEM) algorithm (Tregouet et al., 2003b). The SEM algorithm has the advantage over the standard EM algorithm of being more robust to lack of convergence and to convergence to local maxima of the likelihood (Tregouet et al., 2003b).
Moreover, unlike software that treats inferred haplotypes as observed, our algorithm is a multiple-imputation algorithm that never assigns fixed haplotypes to individuals, and is therefore not subject to type I error inflation (Curtis and Sham, 2006).
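To make the stochastic step concrete, the following toy Python sketch estimates haplotype frequencies for two biallelic SNPs, where only double heterozygotes are phase-ambiguous. It is an illustration under simplifying assumptions (no phenotype, no covariates, Hardy–Weinberg proportions, hypothetical function names) and is not the THESIAS implementation. At each iteration, the phase of every ambiguous individual is drawn at random from its current posterior distribution instead of being averaged over, frequencies are re-estimated by counting, and the final estimate averages the iterations that follow a burn-in period.

```python
import random
from collections import Counter

HAPLOTYPES = ("AB", "Ab", "aB", "ab")


def resolve(a, b):
    """Haplotype pair for a genotype that is not doubly heterozygous.

    a and b are the numbers of copies (0, 1 or 2) of alleles A and B.
    """
    first = ["A"] * a + ["a"] * (2 - a)
    second = ["B"] * b + ["b"] * (2 - b)
    return (first[0] + second[0], first[1] + second[1])


def sem_haplotype_frequencies(genotypes, n_iter=200, burn_in=50, seed=0):
    """Toy stochastic-EM estimate of two-SNP haplotype frequencies."""
    rng = random.Random(seed)
    freq = {h: 0.25 for h in HAPLOTYPES}  # arbitrary starting values
    running = Counter()
    n_chrom = 2 * len(genotypes)
    for it in range(n_iter):
        counts = Counter()
        for a, b in genotypes:
            if a == 1 and b == 1:
                # Stochastic step: sample the phase (AB/ab or Ab/aB)
                # from its posterior given the current frequencies.
                w_cis = freq["AB"] * freq["ab"]
                w_trans = freq["Ab"] * freq["aB"]
                total = w_cis + w_trans
                p_cis = w_cis / total if total > 0 else 0.5
                pair = ("AB", "ab") if rng.random() < p_cis else ("Ab", "aB")
            else:
                pair = resolve(a, b)
            counts.update(pair)
        # Maximization step: frequencies from the completed data.
        freq = {h: counts[h] / n_chrom for h in HAPLOTYPES}
        if it >= burn_in:
            for h in HAPLOTYPES:
                running[h] += freq[h]
    kept = n_iter - burn_in
    return {h: running[h] / kept for h in HAPLOTYPES}
```

A call such as `sem_haplotype_frequencies([(1, 1), (2, 2), (0, 0), (1, 2)])` returns a dictionary of the four estimated frequencies, which sum to 1.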
| 2 IMPLEMENTATION AND METHODS |
|---|
THESIAS has already been used by several research groups [e.g. (Inoue et al., 2006; Meyre et al., 2005; Qi et al., 2006; Wootton et al., 2006)], but wider use was limited by the lack of a graphical interface. This is why we developed a graphical JAVA-based interface allowing one to run THESIAS in a user-friendly way (Fig. 1). Genotypes, phenotype and covariates are selected through scroll lists, while options are selected through check boxes. A detailed description of how to use THESIAS is given in a documentation file.
In addition, the polytomous logistic model (Ananth and Kleinbaum, 1997) has now been implemented in THESIAS, allowing one to address the association of haplotypes with a categorical phenotype. THESIAS has also been extended to handle X-linked polymorphisms and to allow an offset variable in the standard haplotype logistic regression analysis.
From a set of di- or tri-allelic polymorphisms genotyped in a sample of unrelated individuals, THESIAS provides the following outputs and analyses:
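As a sketch of what such a model looks like, a polytomous (generalized-logit) regression with additive haplotype effects can be written as follows for a phenotype $Y$ with categories $0, \dots, C$, category 0 being the reference, and an individual carrying the haplotype pair $(h_1, h_2)$. The notation $\alpha_c$ and $\beta_{c,h}$ is introduced here for illustration and is an assumption, not taken from the THESIAS documentation.

```latex
\log \frac{P(Y = c \mid h_1, h_2)}{P(Y = 0 \mid h_1, h_2)}
  = \alpha_c + \beta_{c,h_1} + \beta_{c,h_2},
  \qquad c = 1, \dots, C,
  \qquad \beta_{c,h_{\mathrm{ref}}} = 0 .
```

Under this parameterization, $\exp(\beta_{c,h})$ is the odds ratio, for category $c$ versus the reference category, associated with one copy of haplotype $h$ relative to the reference haplotype $h_{\mathrm{ref}}$.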
- Allele frequencies for each SNP
- Tests for deviation from Hardy–Weinberg equilibrium at each locus using the standard $\chi^2$ goodness-of-fit test
- Haplotype frequencies and their standard errors (SE) estimated from the genotypic data at each locus
- Pairwise linkage disequilibrium (LD) coefficients between SNPs. LD is expressed in terms of both $r^2$ and $D'$, the latter being the ratio of the unstandardized coefficient to its maximal or minimal value. Statistical significance is then tested by means of a $\chi^2$ test with 1 df (Thompson et al., 1988).
- Squared correlation coefficient ($R^2$) between true and predicted haplotype dosage, as described by Stram et al. (2003)
- Haplotype effects (SE) on the phenotype. Effects are expressed either as haplotypic odds ratio, mean effect or hazard ratio by reference to the most frequent haplotype. Some effects (such as those that are associated with rare haplotypes) can be fixed to 0.
- Testing the null hypothesis of no association of haplotypes with the phenotype by means of the likelihood-ratio test (with k – 1 degrees of freedom for k haplotypes)
- Testing for deviation from the default assumption of additivity of haplotype effects on the phenotype
- Adjusting for covariates and testing interactions between these covariates and haplotypes
- Testing specific hypotheses about haplotype effects by setting appropriate constraints on regression parameters. Two types of hypotheses can be tested:
  - Homogeneity of haplotype effects: for example, testing $\beta_{ABC} = \beta_{AbC} = \beta_{abC}$ enables one to test whether the effects of the three haplotypes carrying the C allele are homogeneous, where $\beta_i$ is the effect associated with haplotype $i$
  - Homogeneity of allele effects according to different haplotypic backgrounds: for example, testing $\beta_{ABC} - \beta_{ABc} = \beta_{AbC} - \beta_{Abc} = \beta_{aBC} - \beta_{aBc}$ enables one to test whether the effect of the C/c polymorphism is homogeneous across the three haplotypic backgrounds AB-, Ab- and aB-
- Performing haplotype analysis using missing genotypic data that are inferred from non-missing genotypes and the estimated haplotype frequencies.
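The single-locus and pairwise statistics listed above can be illustrated with a short Python sketch. It is a minimal illustration for biallelic, polymorphic loci only, not the THESIAS code: the function names are hypothetical, the 1-df LD test is taken in the common form $\chi^2 = n r^2$ with $n$ the number of chromosomes (an assumption about the exact statistic used), and $D'$ follows the definition given in the list, which yields a value between 0 and 1.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def hwe_chi2(n_AA, n_Aa, n_aa):
    """Goodness-of-fit test of Hardy-Weinberg equilibrium (biallelic locus)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_sf_1df(chi2)


def pairwise_ld(f_AB, p_A, p_B, n_chromosomes):
    """D, D', r^2 and a 1-df test from haplotype and allele frequencies."""
    d = f_AB - p_A * p_B  # unstandardized LD coefficient
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    if d > 0:
        # Ratio of D to its maximal value.
        d_prime = d / min(p_A * (1 - p_B), (1 - p_A) * p_B)
    elif d < 0:
        # Ratio of D to its minimal (negative) value.
        d_prime = d / -min(p_A * p_B, (1 - p_A) * (1 - p_B))
    else:
        d_prime = 0.0
    chi2 = n_chromosomes * r2  # assumed form of the 1-df statistic
    return {"D": d, "D_prime": d_prime, "r2": r2,
            "chi2": chi2, "p_value": chi2_sf_1df(chi2)}
```

For example, `hwe_chi2(30, 40, 30)` compares the observed genotype counts with the counts expected under Hardy–Weinberg proportions (25, 50, 25), and `pairwise_ld(0.5, 0.5, 0.5, 200)` describes two SNPs in complete LD.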
| 3 CONCLUSION |
|---|
THESIAS is one of the most complete software packages for haplotype analysis in unrelated individuals: it can handle quantitative, binary (matched and unmatched case-control), survival and polychotomous phenotypes while adjusting for covariates and testing haplotype × environment interactions. A graphical JAVA-based interface is now available for THESIAS, which should make the software easier to use. This JAVA implementation is mainly devoted to the in-depth analysis of SNPs within a candidate gene or a candidate region and may not be appropriate for the analysis of a very large number of SNPs, for which sliding-window approaches would be preferable. The C source code underlying THESIAS is, however, available upon request, so that anyone interested in sliding-window approaches can develop their own batch-mode program. The main limitations of the THESIAS program are that all computations are performed under the assumption of Hardy–Weinberg equilibrium at the haplotypic level, that prospective likelihood is used for case-control analysis instead of retrospective likelihood (Epstein and Satten, 2003), and that, in its current version, only di- and tri-allelic polymorphisms can be investigated. The statistical core of THESIAS is written in ANSI C. Dynamic link libraries compiled for Windows, Linux and some Unix systems are linked to a JAVA-based interface requiring the Java Standard Edition Runtime Environment version 1.4.2 or later. THESIAS is available free of charge from http://genecanvas.ecgene.net/ and is distributed under the GPL license.
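For readers considering such a batch-mode program, the window-generation step of a sliding-window analysis can be sketched in a few lines of Python. The function name, the SNP identifiers and the default step are hypothetical, and how each window would then be passed to the analysis program is deliberately left out, since that invocation is not described here.

```python
def sliding_windows(snps, size, step=1):
    """Return consecutive windows of `size` SNPs, advancing by `step`."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive integers")
    # The last window starts at index len(snps) - size.
    return [snps[i:i + size] for i in range(0, len(snps) - size + 1, step)]


# Hypothetical ordered SNP identifiers along a region.
windows = sliding_windows(["rs1", "rs2", "rs3", "rs4"], size=3)
```

Each element of `windows` is one set of adjacent SNPs that would be analysed as a haplotype block in its own run.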
| ACKNOWLEDGEMENTS |
|---|
This work was funded by a grant from the French Ministry of Research (ACI IMPBIO No 032619) and in part by the BMBF-BioChancePlus project No 010131P3428.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on December 26, 2006; revised on February 9, 2007; accepted on February 9, 2007
| REFERENCES |
|---|
Adkins RM. Comparison of the accuracy of methods of computational haplotype inference using a large empirical dataset. BMC Genet. (2004) 5:22–29.
Ananth CV, Kleinbaum DG. Regression models for ordinal responses: a review of methods and applications. Int. J. Epidemiol. (1997) 26:1323–1333.
Curtis D, Sham PC. Estimated haplotype counts from case-control samples cannot be treated as observed counts. Am. J. Hum. Genet. (2006) 78:729–730.
Epstein MP, Satten GA. Inference on haplotype effects in case-control studies using unphased genotype data. Am. J. Hum. Genet. (2003) 73:1316–1329.
Frere C, et al. Fine mapping of quantitative trait nucleotides underlying thrombin-activatable fibrinolysis inhibitor antigen levels by a transethnic study. Blood (2006) 108:1562–1568.
Inoue K, et al. Search on chromosome 17 centromere reveals TNFRSF13B as a susceptibility gene for intracranial aneurysm: a preliminary study. Circulation (2006) 113:2002–2010.
Marchini J, et al. A comparison of phasing algorithms for trios and unrelated individuals. Am. J. Hum. Genet. (2006) 78:437–450.
Meyre D, et al. Variants of ENPP1 are associated with childhood and adult obesity and increase the risk of glucose intolerance and type 2 diabetes. Nat. Genet. (2005) 37:863–867.
Niu T. Algorithms for inferring haplotypes. Genet. Epidemiol. (2004) 27:334–347.
Qi L, et al. Genetic variation in IL6 gene and type 2 diabetes: tagging-SNP haplotype analysis in large-scale case-control study and meta-analysis. Hum. Mol. Genet. (2006) 15:1914–1920.
Salem RM, et al. A comprehensive literature review of haplotyping software and methods for use with unrelated individuals. Hum. Genomics (2005) 2:39–66.
Stram DO, et al. Modeling and E-M estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals. Hum. Hered. (2003) 55:179–190.
Thompson EA, et al. The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII apolipoprotein genes. Am. J. Hum. Genet. (1988) 42:113–124.
Tregouet DA, et al. Specific haplotypes of the P-selectin gene are associated with myocardial infarction. Hum. Mol. Genet. (2002) 11:2015–2023.
Tregouet DA, et al. SELPLG gene polymorphisms in relation to plasma SELPLG levels and coronary artery disease. Ann. Hum. Genet. (2003a) 67:504–511.
Tregouet DA, Escolano S, Tiret L, Mallet A, Golmard JL. A new maximum likelihood algorithm for haplotype-based association analysis: the SEM algorithm. Ann. Hum. Genet. (2003b) 165–177.
Tregouet DA, Tiret L. Cox proportional hazards survival regression in haplotype-based association analysis using the stochastic-EM algorithm. Eur. J. Hum. Genet. (2004) 12:971–974.
Weale ME. A survey of current software for haplotype phase inference. Hum. Genomics (2004) 1:141–144.
Wootton PT, et al. Tagging-SNP haplotype analysis of the secretory PLA2IIa gene PLA2G2A shows strong association with serum levels of sPLA2IIa: results from the UDACS study. Hum. Mol. Genet. (2006) 15:355–361.