Skip Navigation


Bioinformatics Advance Access originally published online on April 21, 2005
Bioinformatics 2005 21(14):3097-3104; doi:10.1093/bioinformatics/bti456
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
21/14/3097    most recent
bti456v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (20)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Jung, S.-H.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Jung, S.-H.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Sample size for FDR-control in microarray data analysis

Sin-Ho Jung

Department of Biostatistics and Bioinformatics, CALGB Statistical Center Hock Plaza, Suite 802,2424 Erwin Road Duke University Durham, NC 27705, USA


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 FALSE DISCOVERY RATE
 3 SAMPLE SIZE CALCULATION
 4 NUMERICAL STUDIES
 5 DISCUSSION
 REFERENCES
 

Summary: We consider identifying differentially expressing genes between two patient groups using microarray experiment. We propose a sample size calculation method for a specified number of true rejections while controlling the false discovery rate at a desired level. Input parameters for the sample size calculation include the allocation proportion in each group, the number of genes in each array, the number of differentially expressing genes and the effect sizes among the differentially expressing genes. We have a closed-form sample size formula if the projected effect sizes are equal among differentially expressing genes. Otherwise, our method requires a numerical method to solve an equation. Simulation studies are conducted to show that the calculated sample sizes are accurate in practical settings. The proposed method is demonstrated with a realstudy.

Contact: jung005{at}mc.duke.edu


    1 INTRODUCTION
 TOP
 Abstract
 1 INTRODUCTION
 2 FALSE DISCOVERY RATE
 3 SAMPLE SIZE CALCULATION
 4 NUMERICAL STUDIES
 5 DISCUSSION
 REFERENCES
 
Microarray method has been widely used for identifying differentially expressing genes, called prognostic genes, in the subjects with different types of disease. Statistical procedures to identify differentially expressing genes involve a serious multiple comparison problem since we perform as many hypothesis testings as the number of the candidate genes in microarrays. If we use a type I error rate {alpha} in each testing, then the probability to reject any hypothesis will greatly exceed the intended overall {alpha} level. In order to avoid this pitfall, two approaches are widely used: false discovery rate (FDR) control and family-wise error rate (FWER)control.

Sample size calculation is a critical procedure when designing a microarray study. There have been several publications on sample size estimation in the microarray context, e.g. Simon et al. (2002). Some focused on exploratory and approximate relationships among statistical power, sample size (or the number of replicates) and effect size (often, in terms of fold-change), and used the most conservative Bonferroni adjustment for controlling FWER (the probability to discover one or more genes when none of the genes under consideration is prognostic) without any attempt to incorporate the underlying correlation structure (Wolfinger et al., 2001; Black and Doerge, 2002; Pan et al., 2002; Cui and Churchill, 2003). Jung et al. (2005) incorporated the correlation structure to derive an accurate sample size when controlling the FWER.

Some researchers proposed a new concept of testing error called FDR, defined as the expected value of the proportion of the non-prognostic genes among the discovered genes (Benjamini and Hochberg, 1995; Storey, 2002). Controlling this quantity relaxes the multiple testing criteria compared with controlling the FWER in general, and consequently increases the number of declared significant genes. Operating and numerical characteristics of FDR are elucidated in recent publications (Genovese and Wasserman, 2002; Dudoit et al., 2003).

Lee and Whitmore (2002) considered multiple group cases, including the two-sample case, using ANOVA models and derived the relationship between the effect sizes and the FDR based on a Bayesian perspective. They discuss a power analysis without involving the multiple testing issue. Müller et al. (2004) chose a pair of testing errors, including FDR, and minimized one while controlling the other at a specified level using a Bayesian decision rule. They proposed a simulation algorithm to demonstrate the relationship between the sample size and the chosen testing errors based on asymptotic results. This approach requires specification of complicated parametric models for prior and data distributions, and extensive computing for the Bayesian simulations. Most of the existing studies for FDR-control do not show the explicit relationship between the sample size and the effect sizes because of different reasons. For example, Lee and Whitmore (2002) and Gadbury et al. (2004) modelled a distribution of p-values from pilot studies to produce sample size estimates but did not provide an explicit sample size formula. None of the aforementioned studies based on FDR evaluated their sample sizes using simulations.

In this paper, we propose a sample size estimation procedure for FDR-control. We derive the sample size required for a specified number of true rejections (i.e. identifying the prognostic genes) while controlling the FDR at a desired level. As input parameters, we specify the allocation proportions between two groups, the total number of candidate genes, the number of prognostic genes, the effect sizes of the prognostic genes in addition to the required number of true rejections and the FDR level. In general, our procedure requires solving an equation using a numerical method, such as the bisection method. However, if the effect sizes are equal among all prognostic genes, the equation can be solved to give a closed form formula. We review the background of FDR and its estimation method in Section 2, and propose a new sample size method in Section 3. In Section 4, we discuss simulation studies that are conducted to show that the calculated sample sizes are accurate, and demonstrate an application of our method to a real study. van den Oord and Sullivan (2003) considered a similar setting for sample size calculation, but their formulation is so general that they do not provide an explicit formula in any specific case.


    2 FALSE DISCOVERY RATE
 TOP
 Abstract
 1 INTRODUCTION
 2 FALSE DISCOVERY RATE
 3 SAMPLE SIZE CALCULATION
 4 NUMERICAL STUDIES
 5 DISCUSSION
 REFERENCES
 
Suppose that we conduct m multiple tests, of which the null hypotheses are true for m0 tests and the alternative hypotheses are true for m1(=mm0) tests. The tests declare that, of the m0 null hypotheses, A0 hypotheses are null (true negative) and R0 hypotheses are alternative (false rejection, false discovery or false positive). Among the m1 alternative hypotheses, A1 are declared null (false negative) and R0 are declared alternative (true rejection, true discovery or true positive). Table 1 summarizes the outcome of m hypothesis tests.


View this table:
[in this window]
[in a new window]
 
Table 1 Outcomes of m multiple tests

 
Benjamini and Hochberg (1995) define the FDR as

(1)
Note that this expression is undefined if Pr(R = 0) > 0. To avoid this issue, Benjamini and Hochberg (1995) redefine the FDR as

(2)
These two definitions are identical if Pr(R = 0) = 0, in which case we have FDR = E(R0/R|R > 0) ({equiv}pFDR, which will be defined below).

If m = m0, then FDR = 1 by any critical value with Pr(R = 0) = 0. Pointing out this issue, Storey and Tibshirani (2003) defines the second factor in the right-hand side of Equation (2) as pFDR,

and proposes to control this quantity instead of FDR. Storey (2002) claims that Pr(R > 0) {approx} 1 with a large m, so that pFDR is equivalent to FDR. We accept this argument in this paper and do not distinguish between FDR and pFDR. Hence, definitions (1) and (2) are considered to be equal. We observed R > 0 in all of the simulations conducted in Section 4.

Benjamini and Hochberg (1995) propose a multi-step procedure to control the FDR at a specified level. However, this is known to be conservative, and the conservativeness increases in m0, see, e.g. Storey et al. (2004).

Suppose that, in the j-th testing, we reject the null hypothesis Hj if the p-value pj is smaller than or equal to {alpha} (0,1). Assuming independence of the m p-values, we have

which equals m0{alpha}, where m–1op(m) -> 0 in probability as m -> {infty} (Storey, 2002). Ignoring the error term, we have

(3)
where . Given {alpha}, estimation of FDR by Equation (3) requires estimation of m0.

For the estimation of m0, Storey (2002) assumes that the histogram of m p-values is a mixture of m0 p-values that are corresponding to the true null hypotheses and following U(0,1) distribution, and m1 p-values that are corresponding to the alternative hypotheses and expected to be close to 0. Consequently, for a chosen constant {lambda} away from 0, none (or few, if any) of the latter m1 p-values will fall above {lambda}, so that the number of p-values above {lambda}, , can be approximated by the expected frequency among the m0 p-values above {lambda} from U(0,1) distribution, i.e. m0/(1 {lambda}). Hence, given {lambda}, m0 is estimated by

By combining this m0 estimator with Equation (3), Storey (2002) obtains

For an observed p-value pj, Storey (2002) defines the q-value, the minimum FDR level at which we reject Hj, as

This formula is reduced to

if FDR({alpha}) is strictly increasing in {alpha}, see Theorem 2 of Storey and Tibshirani (2003). Supporting Material shows that this assumption holds if the power function of the individual tests is concave in {alpha}, which is the case when the test statistics follow the standard normal distribution under the null hypotheses. We reject Hj (or, equivalently, discover gene j) if qj is smaller than or equal to the prespecified FDR level.

The independence assumption among m test statistics was relaxed to independence only among m0 test statistics corresponding to the null hypotheses by Storey and Tibshirani (2001) and to weak independence among all m test statistics by Storey and Tibshirani (2003) and Storey et al. (2004). These approaches are implemented in the statistical package called SAM (see Storey and Tibshirani, 2003).


    3 SAMPLE SIZE CALCULATION
 TOP
 Abstract
 1 INTRODUCTION
 2 FALSE DISCOVERY RATE
 3 SAMPLE SIZE CALCULATION
 4 NUMERICAL STUDIES
 5 DISCUSSION
 REFERENCES
 
Let and denote the set of genes for which the null and alternative hypotheses are true, respectively. Note that the cardinalities of and are m0 and m1, respectively. Since the estimated FDR is invariant to the order of the genes, we may rearrange the genes and set and .

By Storey (2002) and Storey and Tibshirani (2001) for large m and under independence (or weak dependence) among the test statistics, we have

where for h = 0,1, {xi}j({alpha}) = P(pj ≤ {alpha}) is the marginal power of the single {alpha}-test applied to gene . So, from (3), we have

(4)
by omitting the error term.

Let Xij (Yij) denote the expression level of gene j for subject i in group 1 (and group 2, respectively) with common variance . We consider two-sample t-tests,

for hypothesis j (=1,...,m), where nk is the number of subjects in group k(= 1,2), j and j are sample means of {Xij,i = 1,...,n1} and {Yij,i = 1,...,n2}, respectively, and is the pooled sample variance. We assume a large sample (i.e. nk -> {infty}), so that Tj ~ N(0,1) for . Let n=n1+n2 denote the total sample size, and ak=nk/n the allocation proportion for group k.

Let {delta}j denote the effect size for gene j in the fraction of its standard error, i.e.

At the moment, we consider one-sided tests, Hj:{delta}j = 0 against j:{delta}j > 0, by assuming {delta}j > 0 for and {delta}j = 0 for . The two-sided testing case is briefly discussed at the end of this section. Note that, for large n, for , so that we have

where denotes the survivor function and is the upper 100{alpha}-th percentile of N(0,1). Hence, Equation (4) is expressed as

(5)
From Equation (5), FDR is decreasing in {delta}j, n and |a1–1/2|. Further, FDR is increasing in {alpha} (see Supporting Material). If the effect sizes are equal among the prognostic genes, FDR is increasing in {pi}0=m0/m. It is easy to show that FDR increases from 0 to m0/m as {alpha} increases from 0 to 1.

At the design stage of a study, m is decided by the microarray chips chosen for experiment and m1, and a1 are projected based on experience or from pilot data if any. The only variables undecided in Equation (5) are {alpha} and n. With all other design parameters fixed, FDR is controlled at a certain level by the chosen {alpha} level. So, we want to find the sample size n that will guarantee a certain number, say r1(≤ m1), of true rejections with FDR controlled at a specified level f.

In Equation (5), the expected number of true rejections is

(6)
In multiple testing controlling FDR, E(R1)/m1 plays the role of the power of a conventional testing (see Lee and Whitmore, 2002; van den Oord and Sullivan, 2003). With E(R1) and the FDR level set at r1 and f, respectively, Equation (5) is expressed as

By solving this equation with respect to {alpha}, we obtain

Given m0, {alpha}* is the marginal type I error level for r1 true rejections with the FDR controlled at f. With {alpha} and E(R1) replaced by {alpha}* and r1, respectively, Equation (6) yields an equation h(n) = 0, where

(7)
We obtain the sample size by solving this equation. In general, solving the equation h(n) = 0 requires a numerical approach, such as the bisection method:

  1. Choose s1 and s2 such that 0 < s1 < s2 and h1 h2 < 0, where hk = h(sk) for k = 1,2. (If h1 h2 > 0 and h1 > 0, then choose a smaller s1; if h1 h2 > 0 and h2 < 0, then choose a larger s2.)
  2. For s3 = (s1 + s2)/2, calculate h3 = h(s3).
  3. If h1h3 < 0, then replace s2 and h2 with s3 and h3, respectively. Else, replace s1 and h1 with s3 and h3, respectively. Go to (2).
  4. Repeat (2) and (3) until |s1s3| < 1 and |h3| < 1, and obtain the required sample size n = [s3] + 1, where [s] is the largest integer smaller than s.

If we do not have prior information on the effect sizes, we may want to assume equal effect sizes {delta}j = {delta} (>0) for . In this case, Equation (7) is reduced to

and, by solving h(n) = 0, we obtain a closed form formula:

(8)
where {alpha}* = r1f/{m0(1 – f)} and ß* = 1 – r1/m1. Note that Equation (8) is the conventional sample size formula when we want to detect an effect size of {delta} with power 1 – ß* while controlling the type I error level at {alpha}*.

In summary, our sample size calculation proceeds as follows:

  1. Specify the input parameters:
    1. f = FDR level;
    2. r1 = number of true rejections;
    3. ak = allocation proportion for group k(=1,2);
    4. m = total number of genes for testing;
    5. m1 = number of prognostic genes (m0 = mm1);
    6. = effect sizes for prognostic genes.

  2. Obtain the required sample size:
    1. If the effect sizes are constant {delta}j = {delta} for ,

      where {alpha}* = r1f/{m0(1 – f)} and ß* = 1 – r1/m1.

    2. Otherwise, solve h(n) = 0 using the bisection method, where

      and {alpha}* = r1f/{m0(1–f)}.


Given sample sizes n1 and n2, one may want to check how many true rejections are expected as if we want to check the power in a conventional testing. In this case, we solve the equations for r1. For example, when the effect sizes are constant, {delta}j = {delta} for , we solve the equation

with respect to r1, where {alpha}*(r1) = r1f/{m0(1 – f)} and ß*(r1) = 1 – r1/m1.

EXAMPLE 1
(One-sided tests and constant effect sizes) Suppose that we want to design a microarray study on m = 4000 candidate genes, among which about m1 = 40 genes are expected to be differentially expressing between two patient groups. Note that m0 = mm1 = 3960. Constant effect sizes, {delta}j = {delta} = 1, for the m1 prognostic genes are projected. About equal number of patients are expected to enter the study from each group, i.e. a1 = a2 = 0.5. We want to discover r1 = 24 prognostic genes by one-sided tests with the FDR controlled at f = 1% level. Then

and ß* = 1 – 24/40 = 0.4, so that z{alpha}* = 3.841 and zß* = 0.253. Hence, from Equation (8), the required sample size is given as

or n1 = n2 = 34.

EXAMPLE 2
(One-sided tests and varying effect sizes) We assume (m,m1,a1,r1,f) = (4000,40,0.5,24,0.01), {delta}j = 1 for 1 ≤ j ≤ 20 and {delta}j = 1/2 for 21 ≤ j ≤ 40. Then

and z{alpha}* = 3.841, so that we have

Table 2 displays the bisection procedure with starting values s1 = 100 and s2 = 200. The procedure stops after seven iterations and gives n = [147.7] + 1 = 148.


View this table:
[in this window]
[in a new window]
 
Table 2 The bisection procedure for Example 2

 

3.1 Two-sided tests
Suppose one wants to test Hj:{delta}j = 0 against j:{delta}j != 0. We reject Hj if |Tj| > z{alpha}/2 for a certain {alpha} level, and obtain the power function . In this case, {alpha}* is the same as that for one-sided test case, i.e.

but Equation (7) is changed to

(9)

If the effect sizes are constant, i.e. {delta}j = {delta} for , then we have a closed form formula

(10)
where {alpha}* = r1f/{m0(1–f)} and ß* = 1 – r1/m1.

Now we derive the relationship between the sample size for one-sided test case and that for two-sided test case. Suppose that the input parameters m, m1, a1 and are fixed and we want r1 true rejections in both cases. Without loss of generality, we assume that the effect sizes are non-negative. The only difference between the two cases is the parts of {alpha}* in Equation (7) and {alpha}*/2 in Equation (9). Let f1 and f2 denote the FDR levels for one- and two-sided testing cases, respectively. Then, the two formulae will give exactly the same sample size as far as these two parts are identical, i.e.

which yields f1 = f2/(2 – f2). In other words, with all other parameters fixed, the sample size for two-sided tests to control the FDR at f can be obtained using the sample size formula for one-sided tests [Equation (7)] by setting the target FDR level at f/(2 – f). Note that this value is slightly larger than f/2. The same relationship holds when the effect sizes for prognostic genes are constant.

EXAMPLE 3
(Two-sided tests and constant effect sizes) We assume (m,m1,{delta},a1,r1,f) = (4000,40,1,0.5,24,0.01) as in Example 1, but we want to use two-sided tests here. Then

and ß* = 1 – 24/40 = 0.4, so that z{alpha}*/2 = 4.008 and zß* = 0.253. Hence, from Equation (10), the required sample size is given as

By the above argument, we obtain exactly the same sample size using formula (8) and f = 0.01/(2 – 0.01) = 0.005025. Note that this sample size is slightly larger than n = 68 which was obtained for one-sided tests in Example 1.

3.2 Exact formula based on t-distribution
If the gene expression level, or its transformation, is a normal random variable and the available resources are so limited that only a small sample size can be considered, then one may want to use the exact formula based on t-distributions, rather than that based on normal approximation. In one-sided testing case, Equation (5) will be modified to

where T{nu},{eta}(t) is the survivor function for the non-central t-distribution with {nu} degrees of freedom and non-centrality parameter {eta}, and is the upper 100{alpha}-th percentile of the central t-distribution with {nu} degrees of freedom. The required sample size n for r1 true rejections with the FDR controlled at f solves hT(n)=0, where

and {alpha}* = r1f/{m0(1 – f)}. If the effect sizes are constant among the prognostic genes, then the equation reduces to

but, contrary to the normal approximation case, we do not have a closed form sample size formula since n is included in both the degrees of freedom and the non-centrality parameter of the t-distribution functions.

Similarly, the sample size for two-sided t-tests can be obtained by solving hT(n) = 0, where

and {alpha}* = r1f/{m0(1 f)}. Note that the sample size for FDR = f with two-sided testings is the same as that for FDR = f/(2 – f) with one-sided testings as in the testing based on normal approximation.


    4 NUMERICAL STUDIES
 TOP
 Abstract
 1 INTRODUCTION
 2 FALSE DISCOVERY RATE
 3 SAMPLE SIZE CALCULATION
 4 NUMERICAL STUDIES
 5 DISCUSSION
 REFERENCES
 
In order to investigate the accuracy of the proposed sample size formula, we conducted extensive simulation studies. We set m = 4000, m1 = 40 or 200, constant effect sizes {delta} = 0.5 or 1, and a1 = 0.5 or 0.7. We want r1 to be 30, 60 or 90% of m1 while controlling the FDR level at f = 1, 5 or 10% using one-sided p-values. Given a design setting, we first calculate the sample size n using formula (8),which is based on normal approximation, and then generate N = 5000 samples of size n from independent normal distributions under the same setting. From each simulation sample, the number of true rejections are counted while controlling the FDR at the specified level using the Storey's approach discussed in Section 2 with {lambda} = 0.5. The first, second and third quartiles, Q1, Q2 and Q3, of the observed true rejections, , are estimated from the 5000 simulation samples. Table 3 reports n and the three quartiles of for each design setting. We observe that n increases in |a1 – 1/2| and r1, and decreases in {delta} and FDR. The median, Q2, of is close to the nominal r1 overall except when (a1,m1,{delta},r1) = (0.5,200,1,60) or (0.7,200,1,60), for which n is relatively small and r1 tends to be overestimated. With a large n, r1 is very accurately estimated, i.e. Q2 is close to r1 and the interquartile range (Q3Q1) is small. The interquartile range of increases in r1, but does not seem to be much dependent on the FDR level.


View this table:
[in this window]
[in a new window]
 
Table 3 Sample size n for r1 (=30, 60 or 90% of m1) true rejections at FDR = 1, 5 or 10% level by one-sided tests when m = 4000, m1 = 40 or 200, {delta} = 0.5 or 1, a1 = 0.5 or 0.7

 
Figure 1 displays the empirical distribution of from 5000 simulations. With (a1,m1,{delta}) fixed at (0.5,40,0.5), the four figures are generated for (1) (r1,FDR) = (12,0.01), (2) (r1,FDR) = (12,0.1), (3) (r1,FDR) = (36,0.01) and (4) (r1,FDR) = (36,0.1). Note that is distributed around the nominal r1 under each setting. The distributions are truncated by 0 from below and m1 = 40 from above, so that they will be skewed to the right if r1 is close to 0 and to the left if r1 is close to m1. As mentioned above, the distribution of does not seem to depend on FDR, and has less dispersion with a larger r1.



View larger version (23K):
[in this window]
[in a new window]
 
Fig. 1 Distribution of the observed number of true rejections, , from 5000 simulations under (a1,m1,{delta})=(0.5,40,0.5) and (r1,FDR) = (12,0.01) in (a); (12,0.1) in (b); (36,0.01) in (c); (36,0.1) in (d).

 
Now, we consider a case where we have pilot data. Golub et al. (1999) explored m = 6810 genes extracted from bone marrow in n = 38 patients, of which n1 = 27 with acute lymphoblastic leukaemia and n2 = 11 with acute myeloid leukaemia, in order to identify the susceptible genes with potential clinical heterogeneity in the two subclasses of leukaemia. Suppose that we use the dataset from this study as pilot data in designing a new study with the same study objective. For gene j(=1,...,6810), we calculated the sample means j, j and the sample variances

and estimated the effect sizes

from the pilot data. In order to reflect the variability of the estimated effect sizes and for a slightly conservative sample size, we multiply 0.6 with the observed effect sizes, i.e. , in the following sample size calculation. We assume that the top m1 = 50 genes with the largest effect sizes in absolute value are prognostic. Suppose that we want to identify 60% of the prognostic genes, i.e. r1 = 0.6 x 50 = 30, while controlling the FDR at f = 1% level using two-sided P-values. Based on the pilot data, we set a1 = 0.7({approx}27/38)and m = 7000. In this case, we have

so that z{alpha}*/2=4.088. From Equation (9), we solve

using the bisection method, and obtain n = 58, or (n1,n2) {approx} (41,17). We generated 5000 simulation samples of size n = 58 under the design setting, and observed the quartiles Q2(Q1,Q3) = 30(28,33) from the empirical distribution of the observed true rejections. Note that the median Q2 exactly matches the projected r1 in this case. Table 4 reports the sample sizes under different settings: m1 = 50 or 100; r1/m1 = 0.3or 0.6; and FDR = 1, 5 or 10%.


View this table:
[in this window]
[in a new window]
 
Table 4 Sample size n for r1 (=30 or 60% of m1) true rejections at FDR = 1, 5 or 10% level by two-sided P-values when m = 7000, m1 = 50 or 100, a1 = 0.7

 
A referee raised a question about the accuracy of our sample size estimate when the gene expression data are correlated or have other distributions than normal distributions. In order to address this issue, we at first consider normal gene expression data with a block compound symmetry (CS) correlation structure: there are 400 independent blocks, and each block consists of 10 dependent genes with a CS structure and correlation coefficient {rho} = 0.6. The first half of Table 5 reports the distribution of the empirical true rejections under (a1,m1,{delta}) = (0.5,40,1) and r1 = 12 or 24. We assume that the prognostic genes belong to the first four blocks. Note that the estimated sample sizes are given in Table 3 under the same design settings. From Table 5, we observe that the median Q2 of the observed true rejections is close to the nominal r1 as in the independent data case. However, the interquartile range is almost doubled from that under independence, from ~5 to ~10. In the second set of simulations, we generate gene expression data from a correlated asymmetric distribution: for b = 1,...,400 and 10(b – 1) + 1 ≤ j ≤ 10b,

where {rho} = 0.6 and ek,1,...,ek,4000,{varepsilon}k,1,...,{varepsilon}k,400 are i.i.d. random variables from the {chi}2-distribution with 2 degrees of freedom. Note that both (X1,...,Xm) and (Y1,...,Ym) have marginal variances 1, and the same block CS correlation structure as in the above correlated normal data case. The second half of Table 5 reports the simulation results. We observe almost the same results as in the correlated normal data case. Benjamini and Yekutieli (2001) investigate general distributional assumptions for the control of FDR.


View this table:
[in this window]
[in a new window]
 
Table 5 Simulation results from normal or mixture of -distributions with 400 independent blocks and CS correlation structure with {rho} = 0.6 within each block of size 10

 

    5 DISCUSSION
 TOP
 Abstract
 1 INTRODUCTION
 2 FALSE DISCOVERY RATE
 3 SAMPLE SIZE CALCULATION
 4 NUMERICAL STUDIES
 5 DISCUSSION
 REFERENCES
 
Microarray has been a major high-throughput assay method to display DNA or RNA abundance for a large number of genes concurrently. Discovery of the prognostic genes should be made taking multiplicity into account, but also with enough statistical power to identify important genes successfully. Owing to the costly nature of microarray experiments, however, often only a small sample size is available and the resulting data analysis does not give reliable answers to the investigators. If the findings from a small study look promising, a large-scale study may be developed to confirm the findings using appropriate statistical tools. Our sample size formula will play the role in the design stage of such a confirmatory study. It can be used to check the statistical power, r1/m1, of a small-scale pilot study too.

The proposed method is to calculate the sample size for a specified number of true rejections (or the expected number of true rejections given a sample size) while controlling the FDR at a given level. The input variables to be pre-specified are total number of genes for testing m, projected number of prognostic genes m1, allocation proportions ak between groups and effect sizes for the prognostic genes. The method does not require any heavy computation, such as Monte Carlo simulations, so that we get a sample size in a second. Especially, if the effect sizes among the prognostic genes are the same, we have a closed form formula that can be calculated using a scientific calculator and a normal distribution table. The proposed method can be used to design a new study based on the parameter values estimated from the pilot data.

It is shown through simulations that the formula based on normal approximation works well overall, even when the expression levels are weakly correlated or have skewed distributions. If there exists dependency among the genes, the observed number of true rejections tends to have a wide variation around the nominal r1. The computer program for sample size calculation is available from the author.


    Acknowledgments
 
The author wants to thank the two reviewers for their valuable comments.

Received on December 8, 2004; revised on March 30, 2005; accepted on April 6, 2005

    REFERENCES
 TOP
 Abstract
 1 INTRODUCTION
 2 FALSE DISCOVERY RATE
 3 SAMPLE SIZE CALCULATION
 4 NUMERICAL STUDIES
 5 DISCUSSION
 REFERENCES
 

    Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289–300.

    Benjamini, Y. and Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. Ann. Stat., 29, 1165–1188[CrossRef].

    Black, M.A. and Doerge, R.W. (2002) Calculation of the minimum number of replicate spots required for detection of significant gene expression fold change in microarray experiments. Bioinformatics, 18, 1609–1616[Abstract/Free Full Text].

    Cui, X. and Churchill, G.A. (2003) How many mice and how many arrays? Replication in mouse cDNA microarray experiments. Methods of Microarray DataAnalysis II, , Norwell, MA Kluwer Academic Publishers, pp. 139–154.

    Dudoit, S., et al. (2003) Multiple hypothesis testing in microarray experiments. Stat. Sci., 18, 71–103[CrossRef][Web of Science].

    Gadbury, G.L., et al. (2004) Power and sample size estimation in high dimensional biology. Stat. Meth. Med. Res., 13, 325–338[Abstract/Free Full Text].

    Genovese, C. and Wasserman, L. (2002) Operating characteristics and extensions of the false discovery rate procedure. J. R. Stat. Soc. Ser. B, 64, 499–517[CrossRef].

    Golub, T.R., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537[Abstract/Free Full Text].

    Jung, S.H., et al. (2005) Sample size calculation for multiple testing in microarray data analysis. Biostatistics, 6, 157–169[Abstract].

    Lee, M.L.T. and Whitmore, G.A. (2002) Power and sample size for DNA microarray studies. Stat Med., 21, 3543–3570[CrossRef][Web of Science][Medline].

    Müller, P., et al. (2004) Optimal sample size for multiple testing: the case of gene expression microarrays. J. Am. Stat. Assoc., 99, 990–1001[CrossRef].

    Pan, W., et al. (2002) How many replicated of arrays are required to detect geneexpression changes in microarray experiments? A mixture model approach. Genome Biol., 3, 1–10.

    Simon, R., et al. (2002) Design of studies with DNA microarrays. Genet. Epidemiol., 23, 21–36[CrossRef][Web of Science][Medline].

    Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B, 64, 479–498[CrossRef].

    Storey, J.D. (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. Ann. Stat., 31, 2013–2035[CrossRef].

    Technical Report 2001-28 Storey, J.D. and Tibshirani, R. (2001) Estimating false discovery rates under dependence, with applications to DNA microarrays. , CA Department of Statistics, Stanford University.

    Storey, J.D. and Tibshirani, R. (2003) SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In Parmigiani, G., Garrett, E.S., Irizarry, R.A., Zeger, S.L. (Eds.). The Analysis of Gene Expression Data: Methods and Software, , New York Springer.

    Storey, J.D., et al. (2004) Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Stat. Soc. Ser. B, 66, 187–205[CrossRef].

    van den Oord, E.J.C.G. and Sullivan, P.F. (2003) A framework for controlling false discovery rates and minimizing the amount of genotyping in gene-finding studies. Hum. Hered., 56, 188–199[Web of Science][Medline].

    Wolfinger, R.D., et al. (2001) Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol., 8, 625–637[CrossRef][Web of Science][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BiostatisticsHome page
P. de Valpine, H.-M. Bitter, M. P. S. Brown, and J. Heller
A simulation-approximation approach to sample size planning for high-dimensional classification studies
Biostat., July 1, 2009; 10(3): 424 - 435.
[Abstract] [Full Text] [PDF]


Home page
BiostatisticsHome page
Y. Lai
A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data
Biostat., October 1, 2007; 8(4): 744 - 755.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
W. Zhang, A. Carriquiry, D. Nettleton, and J. C.M. Dekkers
Pooling mRNA in microarray experiments and its effect on power
Bioinformatics, May 15, 2007; 23(10): 1217 - 1224.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
P. Liu and J. T. G. Hwang
Quick calculation for sample size while controlling false discovery rate with application to microarray analysis
Bioinformatics, March 15, 2007; 23(6): 739 - 746.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
G. J. M. Rosa, N. de Leon, and A. J. M. Rosa
Review of microarray experimental design strategies for genetical genomics studies
Physiol Genomics, December 13, 2006; 28(1): 15 - 23.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
S.-H. Jung and W. Jang
How accurately can we control the FDR in analyzing microarray data?
Bioinformatics, July 15, 2006; 22(14): 1730 - 1736.
[Abstract] [Full Text] [PDF]


Home page
Brief BioinformHome page
S. B. Pounds
Estimation and control of multiple testing error rates for microarray studies
Brief Bioinform, March 1, 2006; 7(1): 25 - 36.



Home page
BioinformaticsHome page
S. Pounds and C. Cheng
Sample size determination for the false discovery rate
Bioinformatics, December 1, 2005; 21(23): 4263 - 4271.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
21/14/3097    most recent
bti456v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (20)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Jung, S.-H.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Jung, S.-H.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?