Bioinformatics Advance Access originally published online on August 25, 2005
Bioinformatics 2005 21(20):3935-3937; doi:10.1093/bioinformatics/bti643
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oxfordjournals.org

PAWE-3D: visualizing power for association with error in case–control genetic studies of complex traits

Derek Gordon 1,*, Chad Haynes 1, Jon Blumenfeld 2,3 and Stephen J. Finch 4

1 Laboratory of Statistical Genetics, Rockefeller University, 1230 York Avenue, New York, NY 10021, USA
2 The Rogosin Institute, 505 East 70th Street, New York, NY 10021, USA
3 Weill Medical College of Cornell University, 1300 York Avenue, New York, NY 10021, USA
4 Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA

*To whom correspondence should be addressed.


    Abstract

Summary: A website that plots power and sample size calculations over a range of up to eight parameters (including diagnostic misclassification error parameters) for two commonly used statistical tests of genetic association, the linear trend test and the genotypic test of association.

Availability: This method is made available via the website http://linkage.rockefeller.edu/pawe3d/

Contact: pawe3d{at}linkage.rockefeller.edu

Power and sample size calculations are a critical part of the study design for genetic association analysis. Traditionally, statistical power for linkage or association analysis is computed by specifying genetic model parameters, such as the disease allele frequency and the conditional probabilities Pr(affected | j copies of disease allele), where j = 0, 1 or 2 for a diallelic disease locus (Boehnke, 1986; Weeks et al., 1990; Gordon et al., 2002; Purcell et al., 2003; De La Vega et al., 2005). The conditional probabilities are often referred to as penetrances (Ott, 1999). Equivalently, one can specify the genotype relative risks (Schaid and Sommer, 1993) and the prevalence of the disease (Sham, 1998; Purcell et al., 2003). Although these values can usually be estimated with a high degree of accuracy for Mendelian disorders, they are typically unknown for complex diseases (Ulgen et al., 2004). One statistical method for dealing with such uncertainty is to consider a range of values for the parameters. One can then report either the worst-case scenario (i.e. the smallest power or largest required sample size observed over the range) or median power and/or sample size values (Cox and Hinkley, 1979). This approach has been considered in several genetic applications over the past several years (Gordon et al., 1997; Vieland, 1998; Abreu et al., 1999; Cousin et al., 2003; Gordon et al., 2005; Zheng et al., 2005). One advantage is that researchers can observe a distribution of power values for the range of parameter values considered, including minimum, median, average and maximum power.


    VISUALIZING THE POWER
We have implemented a method to visualize power and sample size for two commonly used statistical tests for genetic association, the linear trend test (Cochran, 1954; Armitage, 1955) and the genotypic test of association (Gordon et al., 2002). The linear trend test is actually a class of tests that are functions of weights. For genetic case–control association analyses, Sasieni (1997) made recommendations for the choice of weights assuming different underlying genetic models. Power and/or sample size are computed through derivation of the respective test's non-centrality parameter (Mitra, 1958; Chapman and Nam, 1968). The power for a fixed sample size of cases and controls and minimal sample size for a fixed power, each at a specified significance level, are computed as functions of genotype relative risks for the heterozygote (R1) and for the homozygous risk allele (R2), disease allele frequency (pd), marker allele frequency (p1) of the SNP allele in coupling with disease allele, measure of disequilibrium between disease and SNP locus [D' (Lewontin, 1964) or r² (Fisher, 1970; Weir, 1990)] and disease prevalence (K). Alternatively, if one is studying a quantitative trait locus (QTL) and one specifies lower and upper cutoffs for definition of affected and unaffected individuals, then power and/or sample size are calculated as functions of QTL variance, the dominance/additive ratio, the frequency of the QTL ‘increaser’ allele, marker allele frequency (p1) of the SNP allele in coupling with QTL increaser allele and a measure of disequilibrium between the QTL and SNP locus (D' or r² as above) (Purcell et al., 2003).
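
As an illustration of how power follows from a non-centrality parameter, the calculation for the linear trend test can be sketched in Python as below. This is a minimal sketch and not the PAWE-3D implementation. It assumes Hardy–Weinberg proportions, that the genotyped SNP is itself the disease locus (so pd = p1 and D' = 1), and no misclassification. The non-centrality expression is a common large-sample approximation based on the pooled (null) variance, which may differ in detail from the derivation the tool uses. All function names are hypothetical.

```python
from statistics import NormalDist

def genotype_freqs(pd, r1, r2, k):
    """Case and control genotype frequencies; index i = copies of disease allele.

    Assumes Hardy-Weinberg proportions and penetrances f_i = f_0 * R_i.
    """
    g = [(1 - pd) ** 2, 2 * pd * (1 - pd), pd ** 2]
    rr = [1.0, r1, r2]
    f0 = k / sum(gi * ri for gi, ri in zip(g, rr))  # baseline penetrance
    pen = [f0 * ri for ri in rr]
    if max(pen) > 1.0:
        raise ValueError("penetrance above 1: parameters are inconsistent")
    cases = [gi * fi / k for gi, fi in zip(g, pen)]
    controls = [gi * (1 - fi) / (1 - k) for gi, fi in zip(g, pen)]
    return cases, controls

def trend_ncp(cases, controls, n_cases, n_controls, weights=(0, 1, 2)):
    """Approximate non-centrality parameter of the linear trend test."""
    n = n_cases + n_controls
    pooled = [(n_cases * p + n_controls * q) / n
              for p, q in zip(cases, controls)]
    diff = sum(x * (p - q) for x, p, q in zip(weights, cases, controls))
    mean = sum(x * m for x, m in zip(weights, pooled))
    var = sum(x * x * m for x, m in zip(weights, pooled)) - mean ** 2
    return (n_cases * n_controls / n) * diff ** 2 / var

def power_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with the given non-centrality.

    Uses the fact that a non-central chi-square with 1 df is (Z + sqrt(ncp))^2.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)  # square root of the chi-square critical value
    root = ncp ** 0.5
    return (1 - z.cdf(crit - root)) + z.cdf(-crit - root)

# Illustrative values only (not taken from the article).
cases, controls = genotype_freqs(pd=0.2, r1=1.5, r2=2.25, k=0.1)
print(power_1df(trend_ncp(cases, controls, n_cases=200, n_controls=200)))
```

Reversing the weights, e.g. (2, 1, 0) instead of (0, 1, 2), leaves the non-centrality parameter unchanged, because the statistic depends on the weights only up to a linear transformation.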

Furthermore, because of work documenting the effects of diagnostic misclassification on the power of the linear trend test (Zheng and Tian, 2005) and the genotypic test of association (Bross, 1954; Edwards et al., 2005), we also include the misclassification probabilities θ (the probability of misclassifying a true affected as an observed unaffected) and φ (the probability of misclassifying a true unaffected as an observed affected).
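
To see why misclassification reduces power, one can compute the genotype frequencies among observed cases and controls as mixtures of the true frequencies. The Python sketch below applies Bayes' rule under the assumption that the error probabilities do not depend on genotype and that K is the population prevalence of true affected status. These are assumptions of this sketch, and the function name is hypothetical.

```python
def observed_freqs(cases, controls, k, theta, phi):
    """Genotype frequencies among observed cases and observed controls.

    theta: P(true affected is recorded as unaffected)
    phi:   P(true unaffected is recorded as affected)
    Errors are assumed independent of genotype.
    """
    true_pos = k * (1 - theta)         # true affected, recorded affected
    false_pos = (1 - k) * phi          # true unaffected, recorded affected
    false_neg = k * theta              # true affected, recorded unaffected
    true_neg = (1 - k) * (1 - phi)     # true unaffected, recorded unaffected
    obs_cases = [(true_pos * p + false_pos * q) / (true_pos + false_pos)
                 for p, q in zip(cases, controls)]
    obs_controls = [(false_neg * p + true_neg * q) / (false_neg + true_neg)
                    for p, q in zip(cases, controls)]
    return obs_cases, obs_controls

# Illustrative frequencies only (not taken from the article).
true_cases = [0.50, 0.40, 0.10]
true_controls = [0.70, 0.25, 0.05]
obs_cases, obs_controls = observed_freqs(true_cases, true_controls,
                                         k=0.1, theta=0.03, phi=0.03)
print(obs_cases, obs_controls)
```

Each observed group is a blend of the two true groups, so the case–control difference in genotype frequencies shrinks toward zero, and with it the non-centrality parameter of either test.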

In total, there are eight disease model parameters required for the determination of power and/or sample size at a given significance level, assuming a diallelic disease or QTL and a marker locus that are in disequilibrium. Our webtool, PAWE-3D, allows one to perform power calculations considering a range of values for any subset of the eight parameters (with the remaining parameters specified at a single value). If we consider a range for only one parameter, the resulting figure is a graph. If we consider a range for exactly two parameters, the resulting figure is a contour plot. If we consider a range for three or more parameters, the resulting figure is a histogram. The figures are created by randomly sampling from either a Uniform or a Beta distribution for 100 000 data points in the n-dimensional cube defined by the parameter intervals and computing power and/or sample size for these data points.
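
The sampling step described above can be sketched in Python as follows. This is a minimal illustration, not the PAWE-3D code: the helper names are hypothetical, the Beta shape parameters are left to the caller, and `toy_power` is a stand-in for a real power calculation rather than a genetic model. The sketch draws far fewer than 100 000 points to keep it quick.

```python
import random
from statistics import mean, median

def draw(lo, hi, rng, beta_shape=None):
    """One draw from [lo, hi]: Uniform, or a rescaled Beta(a, b) if given."""
    u = rng.random() if beta_shape is None else rng.betavariate(*beta_shape)
    return lo + (hi - lo) * u

def sample_values(fn, ranges, n_points, seed=0, beta_shape=None):
    """Evaluate fn at n_points random points of the box defined by ranges."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_points):
        point = {name: draw(lo, hi, rng, beta_shape)
                 for name, (lo, hi) in ranges.items()}
        values.append(fn(**point))
    return values

def histogram(values, n_bins=10):
    """Counts of values in n_bins equal-width bins spanning their range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against identical values
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def toy_power(r1, r2, k):
    """Stand-in for a real power function; NOT a genetic model."""
    return min(1.0, 0.2 * (r1 - 1) + 0.2 * (r2 - 1) + k)

ranges = {"r1": (1.5, 2.5), "r2": (1.5, 2.5), "k": (0.40, 0.50)}
values = sample_values(toy_power, ranges, n_points=1000)
summary = {"min": min(values), "median": median(values),
           "mean": mean(values), "max": max(values)}
print(summary)
print(histogram(values))
```

With three varying parameters, the natural display is the histogram; with one or two, the same sampled values could instead be plotted against the varying parameter(s).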


    EXAMPLE SAMPLE SIZE CALCULATION
Consider the following example, gleaned from a case–control genetic association study design of modifier loci in the PKD1 locus (Rossetti et al., 2002, 2003) for polycystic kidney disease. In this design, affected (respectively unaffected) status is defined by being at high (respectively low) risk to develop premature end-stage renal failure as determined by the diagnostic instrument used in the Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) cohort (Chapman et al., 2003). The prevalence (K) of case individuals is ~40–50%, although we assume that we have equal numbers of cases and controls when performing the statistical analysis. Sequencing of several polymorphic SNPs in these patients indicates that the large majority of these SNPs have a minor allele frequency (p1) of 0.01. If we assume that one of the SNPs is a modifier locus that increases risk for being a case, then the disease allele frequency (pd) equals the minor allele frequency and D' (or r²) between the two SNPs is 1.0.

Since we anticipate the modifier loci will have small to moderate genotype relative risks (Schaid and Sommer, 1993), we consider genotype relative risks R1 and R2 in the range [1.5, 2.5]. We also consider misclassification probabilities θ and φ in the range [0.00, 0.03]. Using the information above, we consider a prevalence K in the range [0.40, 0.50]. Thus, we compute sample sizes considering ranges for five parameters and consequently, our resultant figure will be a histogram. In this example, we use a Uniform prior distribution for the parameters considered.

In Figure 1, we present the histogram of sample size calculations (cases and controls) for the linear trend test for power = 0.80, case-to-control ratio = 1.0 and significance level = 5%. The weights considered for the linear trend test are X0 = 2, X1 = 1, X2 = 0, where Xi is the weight corresponding to the genotype having i copies of the SNP minor allele. Note that the sample sizes assume that this test statistic is the one being used for analysis. In Table 1, we present sample size values corresponding to specified percentile thresholds.
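
For a fixed case-to-control ratio, the non-centrality parameter of a 1-df test grows in proportion to the total sample size, so the minimal sample size for a target power can be found by first solving for the required non-centrality. The Python sketch below does this by bisection. It is an illustration under that proportionality assumption; the function names and the per-subject non-centrality value are hypothetical and not taken from the article.

```python
from math import ceil
from statistics import NormalDist

def power_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with non-centrality ncp."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    root = ncp ** 0.5
    return (1 - z.cdf(crit - root)) + z.cdf(-crit - root)

def required_ncp(power=0.80, alpha=0.05):
    """Smallest non-centrality that reaches the target power (bisection)."""
    lo, hi = 0.0, 1.0
    while power_1df(hi, alpha) < power:   # expand until the target is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if power_1df(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi

def required_total_n(ncp_per_subject, power=0.80, alpha=0.05):
    """Total sample size, given the non-centrality contributed per subject."""
    return ceil(required_ncp(power, alpha) / ncp_per_subject)

# Hypothetical per-subject non-centrality, for illustration only.
print(required_ncp(0.80, 0.05))
print(required_total_n(0.01))
```

In a full calculation, the per-subject value would come from the test's non-centrality parameter evaluated at one parameter setting and divided by the total sample size used to compute it.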



Fig. 1 Histogram of sample size values in linear trend test example allowing for phenotype misclassification error.

 

Table 1 Sample size values for a given percentile in linear trend test example allowing for phenotype misclassification error

 
From Table 1, we see that a total sample size of 587 (respectively 1005, 2825) is sufficient to achieve 80% power for at least half (respectively 75%, all) of the genetic model parameter settings. In the spirit of minimax theory (Cox and Hinkley, 1979), these results give researchers a way of determining the worst-case sample size requirements.
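
Percentile thresholds of this kind can be read off a list of computed sample sizes with a nearest-rank rule, as in the Python sketch below. The sample sizes in the example are made up for illustration and are not the values behind Table 1, and the function name is hypothetical.

```python
from math import ceil

def percentile_nearest_rank(values, pct):
    """Smallest value that is >= at least pct percent of the values."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Made-up sample sizes, one per sampled parameter setting.
sizes = [400, 520, 610, 750, 900, 1200, 1600, 2400]
for pct in (50, 75, 100):
    print(pct, percentile_nearest_rank(sizes, pct))
```

The 100th percentile is the worst case: a study of that size reaches the target power at every sampled parameter setting.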


    Acknowledgments
 
The authors acknowledge grants received from the National Institutes of Health, K01-HG00055 and MH44292.

Conflict of Interest: none declared.

Received on May 31, 2005; revised on August 18, 2005; accepted on August 22, 2005

    REFERENCES

    Abreu, P.C., et al. (1999) Direct power comparisons between simple LOD scores and NPL scores for linkage analysis in complex diseases. Am. J. Hum. Genet., 65, 847–857.

    Armitage, P. (1955) Tests for linear trends in proportions and frequencies. Biometrics, 11, 375–386.

    Boehnke, M. (1986) Estimating the power of a proposed linkage study: a practical computer simulation approach. Am. J. Hum. Genet., 39, 513–527.

    Bross, I. (1954) Misclassification in 2 x 2 tables. Biometrics, 10, 478–486.

    Chapman, A.B., et al. (2003) Renal structure in early autosomal-dominant polycystic kidney disease (ADPKD): the Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) cohort. Kidney Int., 64, 1035–1045.

    Chapman, D.G. and Nam, J.M. (1968) Asymptotic power of chi square tests for linear trends in proportions. Biometrics, 24, 315–327.

    Cochran, W.G. (1954) Some methods for strengthening the common chi-squared tests. Biometrics, 10, 417–451.

    Cousin, E., et al. (2003) Association studies in candidate genes: strategies to select SNPs to be tested. Hum. Hered., 56, 151–159.

    Cox, D.R. and Hinkley, D.V. (1979) Theoretical Statistics. CRC Press, Boca Raton.

    De La Vega, F.M., et al. (2005) Power and sample size calculations for genetic case/control studies using gene-centric SNP maps: application to Human Chromosomes 6, 21, and 22 in three populations. Hum. Hered., 60, 43–60.

    Edwards, B.J., et al. (2005) Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies. BMC Genet., 6, 18.

    Fisher, R.A. (1970) Statistical Methods for Research Workers, 14th ed. Hafner/MacMillan, New York.

    Gordon, D., et al. (1997) Association of posterior p-values of S.A.G.E. SIBPAL proportion-IBD and Haseman–Elston statistics for ACTHR112. Genet. Epidemiol., 14, 629–634.

    Gordon, D., et al. (2002) Power and sample size calculations for case–control genetic association tests when errors are present: application to single nucleotide polymorphisms. Hum. Hered., 54, 22–33.

    Gordon, D., et al. (2005) Power for complex trait genetic association. Clin. Neuroscience Res., 5, 31–35.

    Lewontin, R.C. (1964) The interaction of selection and linkage. I. General considerations; heterotic models. Genetics, 49, 49–67.

    Mitra, S.K. (1958) On the limiting power function of the frequency chi-square test. Ann. Math. Stat., 29, 1221–1233.

    Ott, J. (1999) Analysis of Human Genetic Linkage. Johns Hopkins, Baltimore.

    Purcell, S., et al. (2003) Genetic power calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics, 19, 149–150.

    Rossetti, S., et al. (2002) The position of the polycystic kidney disease 1 (PKD1) gene mutation correlates with the severity of renal disease. J. Am. Soc. Nephrol., 13, 1230–1237.

    Rossetti, S., et al. (2003) Association of mutation position in polycystic kidney disease 1 (PKD1) gene and development of a vascular phenotype. Lancet, 361, 2196–2201.

    Sasieni, P.D. (1997) From genotypes to genes: doubling the sample size. Biometrics, 53, 1253–1261.

    Schaid, D.J. and Sommer, S.S. (1993) Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am. J. Hum. Genet., 53, 1114–1126.

    Sham, P. (1998) Statistics in Human Genetics. J. Wiley and Sons, Inc., New York.

    Ulgen, A., et al. (2004) Percentiles of the null distribution of 2 maximum lod score tests. Hum. Hered., 57, 39–48.

    Vieland, V.J. (1998) Bayesian linkage analysis, or: how I learned to stop worrying and love the posterior probability of linkage. Am. J. Hum. Genet., 63, 947–954.

    Weeks, D.E., et al. (1990) SLINK: a general simulation program for linkage analysis. Am. J. Hum. Genet., 47, A204 Suppl.

    Weir, B.S. (1990) Genetic Data Analysis: Methods for Discrete Population Genetic Data. Sinauer Associates, Inc., Sunderland.

    Zheng, G. and Tian, X. (2005) The impact of diagnostic error on testing genetic association in case-control studies. Stat. Med., 24, 869–882.

    Zheng, G., et al. (2005) On averaging power for genetic association and linkage studies. Hum. Hered., 59, 14–20.

