Bioinformatics Advance Access originally published online on April 21, 2005
Bioinformatics 2005 21(14):3097-3104; doi:10.1093/bioinformatics/bti456
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sample size for FDR-control in microarray data analysis
Department of Biostatistics and Bioinformatics, CALGB Statistical Center Hock Plaza, Suite 802,2424 Erwin Road Duke University Durham, NC 27705, USA
| Abstract |
|---|
|
|
|---|
Summary: We consider identifying differentially expressing genes between two patient groups using microarray experiment. We propose a sample size calculation method for a specified number of true rejections while controlling the false discovery rate at a desired level. Input parameters for the sample size calculation include the allocation proportion in each group, the number of genes in each array, the number of differentially expressing genes and the effect sizes among the differentially expressing genes. We have a closed-form sample size formula if the projected effect sizes are equal among differentially expressing genes. Otherwise, our method requires a numerical method to solve an equation. Simulation studies are conducted to show that the calculated sample sizes are accurate in practical settings. The proposed method is demonstrated with a realstudy.
Contact: jung005{at}mc.duke.edu
| 1 INTRODUCTION |
|---|
|
|
|---|
Microarray method has been widely used for identifying differentially expressing genes, called prognostic genes, in the subjects with different types of disease. Statistical procedures to identify differentially expressing genes involve a serious multiple comparison problem since we perform as many hypothesis testings as the number of the candidate genes in microarrays. If we use a type I error rate
in each testing, then the probability to reject any hypothesis will greatly exceed the intended overall
level. In order to avoid this pitfall, two approaches are widely used: false discovery rate (FDR) control and family-wise error rate (FWER)control. Sample size calculation is a critical procedure when designing a microarray study. There have been several publications on sample size estimation in the microarray context, e.g. Simon et al. (2002). Some focused on exploratory and approximate relationships among statistical power, sample size (or the number of replicates) and effect size (often, in terms of fold-change), and used the most conservative Bonferroni adjustment for controlling FWER (the probability to discover one or more genes when none of the genes under consideration is prognostic) without any attempt to incorporate the underlying correlation structure (Wolfinger et al., 2001; Black and Doerge, 2002; Pan et al., 2002; Cui and Churchill, 2003). Jung et al. (2005) incorporated the correlation structure to derive an accurate sample size when controlling the FWER.
Some researchers proposed a new concept of testing error called FDR, defined as the expected value of the proportion of the non-prognostic genes among the discovered genes (Benjamini and Hochberg, 1995; Storey, 2002). Controlling this quantity relaxes the multiple testing criteria compared with controlling the FWER in general, and consequently increases the number of declared significant genes. Operating and numerical characteristics of FDR are elucidated in recent publications (Genovese and Wasserman, 2002; Dudoit et al., 2003).
Lee and Whitmore (2002) considered multiple group cases, including the two-sample case, using ANOVA models and derived the relationship between the effect sizes and the FDR based on a Bayesian perspective. They discuss a power analysis without involving the multiple testing issue. Müller et al. (2004) chose a pair of testing errors, including FDR, and minimized one while controlling the other at a specified level using a Bayesian decision rule. They proposed a simulation algorithm to demonstrate the relationship between the sample size and the chosen testing errors based on asymptotic results. This approach requires specification of complicated parametric models for prior and data distributions, and extensive computing for the Bayesian simulations. Most of the existing studies for FDR-control do not show the explicit relationship between the sample size and the effect sizes because of different reasons. For example, Lee and Whitmore (2002) and Gadbury et al. (2004) modelled a distribution of p-values from pilot studies to produce sample size estimates but did not provide an explicit sample size formula. None of the aforementioned studies based on FDR evaluated their sample sizes using simulations.
In this paper, we propose a sample size estimation procedure for FDR-control. We derive the sample size required for a specified number of true rejections (i.e. identifying the prognostic genes) while controlling the FDR at a desired level. As input parameters, we specify the allocation proportions between two groups, the total number of candidate genes, the number of prognostic genes, the effect sizes of the prognostic genes in addition to the required number of true rejections and the FDR level. In general, our procedure requires solving an equation using a numerical method, such as the bisection method. However, if the effect sizes are equal among all prognostic genes, the equation can be solved to give a closed form formula. We review the background of FDR and its estimation method in Section 2, and propose a new sample size method in Section 3. In Section 4, we discuss simulation studies that are conducted to show that the calculated sample sizes are accurate, and demonstrate an application of our method to a real study. van den Oord and Sullivan (2003) considered a similar setting for sample size calculation, but their formulation is so general that they do not provide an explicit formula in any specific case.
| 2 FALSE DISCOVERY RATE |
|---|
|
|
|---|
Suppose that we conduct m multiple tests, of which the null hypotheses are true for m0 tests and the alternative hypotheses are true for m1(=m m0) tests. The tests declare that, of the m0 null hypotheses, A0 hypotheses are null (true negative) and R0 hypotheses are alternative (false rejection, false discovery or false positive). Among the m1 alternative hypotheses, A1 are declared null (false negative) and R0 are declared alternative (true rejection, true discovery or true positive). Table 1 summarizes the outcome of m hypothesis tests.
|
Benjamini and Hochberg (1995) define the FDR as
![]() | (1) |
![]() | (2) |
pFDR, which will be defined below).
If m = m0, then FDR = 1 by any critical value with Pr(R = 0) = 0. Pointing out this issue, Storey and Tibshirani (2003) defines the second factor in the right-hand side of Equation (2) as pFDR,
![]() |
1 with a large m, so that pFDR is equivalent to FDR. We accept this argument in this paper and do not distinguish between FDR and pFDR. Hence, definitions (1) and (2) are considered to be equal. We observed R > 0 in all of the simulations conducted in Section 4. Benjamini and Hochberg (1995) propose a multi-step procedure to control the FDR at a specified level. However, this is known to be conservative, and the conservativeness increases in m0, see, e.g. Storey et al. (2004).
Suppose that, in the j-th testing, we reject the null hypothesis Hj if the p-value pj is smaller than or equal to
(0,1). Assuming independence of the m p-values, we have
![]() |
, where m1op(m)
0 in probability as m
(Storey, 2002). Ignoring the error term, we have
![]() | (3) |
. Given
, estimation of FDR by Equation (3) requires estimation of m0.
For the estimation of m0, Storey (2002) assumes that the histogram of m p-values is a mixture of m0 p-values that are corresponding to the true null hypotheses and following U(0,1) distribution, and m1 p-values that are corresponding to the alternative hypotheses and expected to be close to 0. Consequently, for a chosen constant
away from 0, none (or few, if any) of the latter m1 p-values will fall above
, so that the number of p-values above
,
, can be approximated by the expected frequency among the m0 p-values above
from U(0,1) distribution, i.e. m0/(1
). Hence, given
, m0 is estimated by
![]() |
![]() |
For an observed p-value pj, Storey (2002) defines the q-value, the minimum FDR level at which we reject Hj, as
![]() |
![]() |
) is strictly increasing in
, see Theorem 2 of Storey and Tibshirani (2003). Supporting Material shows that this assumption holds if the power function of the individual tests is concave in
, which is the case when the test statistics follow the standard normal distribution under the null hypotheses. We reject Hj (or, equivalently, discover gene j) if qj is smaller than or equal to the prespecified FDR level. The independence assumption among m test statistics was relaxed to independence only among m0 test statistics corresponding to the null hypotheses by Storey and Tibshirani (2001) and to weak independence among all m test statistics by Storey and Tibshirani (2003) and Storey et al. (2004). These approaches are implemented in the statistical package called SAM (see Storey and Tibshirani, 2003).
| 3 SAMPLE SIZE CALCULATION |
|---|
|
|
|---|
Let
and
denote the set of genes for which the null and alternative hypotheses are true, respectively. Note that the cardinalities of
and
are m0 and m1, respectively. Since the estimated FDR is invariant to the order of the genes, we may rearrange the genes and set
and
.
By Storey (2002) and Storey and Tibshirani (2001) for large m and under independence (or weak dependence) among the test statistics, we have
![]() |
for h = 0,1,
j(
) = P(pj
) is the marginal power of the single
-test applied to gene
. So, from (3), we have
![]() | (4) |
Let Xij (Yij) denote the expression level of gene j for subject i in group 1 (and group 2, respectively) with common variance
. We consider two-sample t-tests,
![]() |
j and
j are sample means of {Xij,i = 1,...,n1} and {Yij,i = 1,...,n2}, respectively, and
is the pooled sample variance. We assume a large sample (i.e. nk
), so that Tj
N(0,1) for
. Let n=n1+n2 denote the total sample size, and ak=nk/n the allocation proportion for group k.
Let
j denote the effect size for gene j in the fraction of its standard error, i.e.
![]() |
j = 0 against
j:
j > 0, by assuming
j > 0 for
and
j = 0 for
. The two-sided testing case is briefly discussed at the end of this section. Note that, for large n,
for
, so that we have
![]() |
denotes the survivor function and
is the upper 100
-th percentile of N(0,1). Hence, Equation (4) is expressed as
![]() | (5) |
j, n and |a11/2|. Further, FDR is increasing in
(see Supporting Material). If the effect sizes are equal among the prognostic genes, FDR is increasing in
0=m0/m. It is easy to show that FDR increases from 0 to m0/m as
increases from 0 to 1.
At the design stage of a study, m is decided by the microarray chips chosen for experiment and m1,
and a1 are projected based on experience or from pilot data if any. The only variables undecided in Equation (5) are
and n. With all other design parameters fixed, FDR is controlled at a certain level by the chosen
level. So, we want to find the sample size n that will guarantee a certain number, say r1(
m1), of true rejections with FDR controlled at a specified level f.
In Equation (5), the expected number of true rejections is
![]() | (6) |
![]() |
, we obtain
![]() |
* is the marginal type I error level for r1 true rejections with the FDR controlled at f. With
and E(R1) replaced by
* and r1, respectively, Equation (6) yields an equation h(n) = 0, where
![]() | (7) |
- Choose s1 and s2 such that 0 < s1 < s2 and h1 h2 < 0, where hk = h(sk) for k = 1,2. (If h1 h2 > 0 and h1 > 0, then choose a smaller s1; if h1 h2 > 0 and h2 < 0, then choose a larger s2.)
- For s3 = (s1 + s2)/2, calculate h3 = h(s3).
- If h1h3 < 0, then replace s2 and h2 with s3 and h3, respectively. Else, replace s1 and h1 with s3 and h3, respectively. Go to (2).
- Repeat (2) and (3) until |s1s3| < 1 and |h3| < 1, and obtain the required sample size n = [s3] + 1, where [s] is the largest integer smaller than s.
If we do not have prior information on the effect sizes, we may want to assume equal effect sizes
j =
(>0) for
. In this case, Equation (7) is reduced to
![]() |
![]() | (8) |
* = r1f/{m0(1 f)} and ß* = 1 r1/m1. Note that Equation (8) is the conventional sample size formula when we want to detect an effect size of
with power 1 ß* while controlling the type I error level at
*. In summary, our sample size calculation proceeds as follows:
- Specify the input parameters:
- f = FDR level;
- r1 = number of true rejections;
- ak = allocation proportion for group k(=1,2);
- m = total number of genes for testing;
- m1 = number of prognostic genes (m0 = m m1);
= effect sizes for prognostic genes.
- f = FDR level;
- Obtain the required sample size:
- If the effect sizes are constant
j =
for
,
where
* = r1f/{m0(1 f)} and ß* = 1 r1/m1.
- Otherwise, solve h(n) = 0 using the bisection method, where
and
* = r1f/{m0(1f)}.
- If the effect sizes are constant
Given sample sizes n1 and n2, one may want to check how many true rejections are expected as if we want to check the power in a conventional testing. In this case, we solve the equations for r1. For example, when the effect sizes are constant,
j =
for
, we solve the equation
![]() |
*(r1) = r1f/{m0(1 f)} and ß*(r1) = 1 r1/m1.
EXAMPLE 1
(One-sided tests and constant effect sizes) Suppose that we want to design a microarray study on m = 4000 candidate genes, among which about m1 = 40 genes are expected to be differentially expressing between two patient groups. Note that m0 = m m1 = 3960. Constant effect sizes,j =
= 1, for the m1 prognostic genes are projected. About equal number of patients are expected to enter the study from each group, i.e. a1 = a2 = 0.5. We want to discover r1 = 24 prognostic genes by one-sided tests with the FDR controlled at f = 1% level. Then
and ß* = 1 24/40 = 0.4, so that z
* = 3.841 and zß* = 0.253. Hence, from Equation (8), the required sample size is given as
or n1 = n2 = 34.
EXAMPLE 2
(One-sided tests and varying effect sizes) We assume (m,m1,a1,r1,f) = (4000,40,0.5,24,0.01),j = 1 for 1
j
20 and
j = 1/2 for 21
j
40. Then
and z
* = 3.841, so that we have
Table 2 displays the bisection procedure with starting values s1 = 100 and s2 = 200. The procedure stops after seven iterations and gives n = [147.7] + 1 = 148.
View this table:
[in this window]
[in a new window]
Table 2 The bisection procedure for Example 2
3.1 Two-sided tests
Suppose one wants to test Hj:
j = 0 against
j:
j
0. We reject Hj if |Tj| > z
/2 for a certain
level, and obtain the power function
. In this case,
* is the same as that for one-sided test case, i.e.
![]() |
![]() | (9) |
If the effect sizes are constant, i.e.
j =
for
, then we have a closed form formula
![]() | (10) |
* = r1f/{m0(1f)} and ß* = 1 r1/m1.
Now we derive the relationship between the sample size for one-sided test case and that for two-sided test case. Suppose that the input parameters m, m1, a1 and
are fixed and we want r1 true rejections in both cases. Without loss of generality, we assume that the effect sizes are non-negative. The only difference between the two cases is the parts of
* in Equation (7) and
*/2 in Equation (9). Let f1 and f2 denote the FDR levels for one- and two-sided testing cases, respectively. Then, the two formulae will give exactly the same sample size as far as these two parts are identical, i.e.
![]() |
EXAMPLE 3
(Two-sided tests and constant effect sizes) We assume (m,m1,,a1,r1,f) = (4000,40,1,0.5,24,0.01) as in Example 1, but we want to use two-sided tests here. Then
and ß* = 1 24/40 = 0.4, so that z
*/2 = 4.008 and zß* = 0.253. Hence, from Equation (10), the required sample size is given as
By the above argument, we obtain exactly the same sample size using formula (8) and f = 0.01/(2 0.01) = 0.005025. Note that this sample size is slightly larger than n = 68 which was obtained for one-sided tests in Example 1.
3.2 Exact formula based on t-distribution
If the gene expression level, or its transformation, is a normal random variable and the available resources are so limited that only a small sample size can be considered, then one may want to use the exact formula based on t-distributions, rather than that based on normal approximation. In one-sided testing case, Equation (5) will be modified to
![]() |
,
(t) is the survivor function for the non-central t-distribution with
degrees of freedom and non-centrality parameter
, and
is the upper 100
-th percentile of the central t-distribution with
degrees of freedom. The required sample size n for r1 true rejections with the FDR controlled at f solves hT(n)=0, where
![]() |
* = r1f/{m0(1 f)}. If the effect sizes are constant among the prognostic genes, then the equation reduces to
![]() |
Similarly, the sample size for two-sided t-tests can be obtained by solving hT(n) = 0, where
![]() |
* = r1f/{m0(1 f)}. Note that the sample size for FDR = f with two-sided testings is the same as that for FDR = f/(2 f) with one-sided testings as in the testing based on normal approximation. | 4 NUMERICAL STUDIES |
|---|
|
|
|---|
In order to investigate the accuracy of the proposed sample size formula, we conducted extensive simulation studies. We set m = 4000, m1 = 40 or 200, constant effect sizes
= 0.5 or 1, and a1 = 0.5 or 0.7. We want r1 to be 30, 60 or 90% of m1 while controlling the FDR level at f = 1, 5 or 10% using one-sided p-values. Given a design setting, we first calculate the sample size n using formula (8),which is based on normal approximation, and then generate N = 5000 samples of size n from independent normal distributions under the same setting. From each simulation sample, the number of true rejections are counted while controlling the FDR at the specified level using the Storey's approach discussed in Section 2 with
= 0.5. The first, second and third quartiles, Q1, Q2 and Q3, of the observed true rejections,
, are estimated from the 5000 simulation samples. Table 3 reports n and the three quartiles of
for each design setting. We observe that n increases in |a1 1/2| and r1, and decreases in
and FDR. The median, Q2, of
is close to the nominal r1 overall except when (a1,m1,
,r1) = (0.5,200,1,60) or (0.7,200,1,60), for which n is relatively small and r1 tends to be overestimated. With a large n, r1 is very accurately estimated, i.e. Q2 is close to r1 and the interquartile range (Q3 Q1) is small. The interquartile range of
increases in r1, but does not seem to be much dependent on the FDR level.
|
Figure 1 displays the empirical distribution of
from 5000 simulations. With (a1,m1,
) fixed at (0.5,40,0.5), the four figures are generated for (1) (r1,FDR) = (12,0.01), (2) (r1,FDR) = (12,0.1), (3) (r1,FDR) = (36,0.01) and (4) (r1,FDR) = (36,0.1). Note that
is distributed around the nominal r1 under each setting. The distributions are truncated by 0 from below and m1 = 40 from above, so that they will be skewed to the right if r1 is close to 0 and to the left if r1 is close to m1. As mentioned above, the distribution of
does not seem to depend on FDR, and has less dispersion with a larger r1.
|
Now, we consider a case where we have pilot data. Golub et al. (1999) explored m = 6810 genes extracted from bone marrow in n = 38 patients, of which n1 = 27 with acute lymphoblastic leukaemia and n2 = 11 with acute myeloid leukaemia, in order to identify the susceptible genes with potential clinical heterogeneity in the two subclasses of leukaemia. Suppose that we use the dataset from this study as pilot data in designing a new study with the same study objective. For gene j(=1,...,6810), we calculated the sample means
j,
j and the sample variances
![]() |
![]() |
, in the following sample size calculation. We assume that the top m1 = 50 genes with the largest effect sizes in absolute value are prognostic. Suppose that we want to identify 60% of the prognostic genes, i.e. r1 = 0.6 x 50 = 30, while controlling the FDR at f = 1% level using two-sided P-values. Based on the pilot data, we set a1 = 0.7(
27/38)and m = 7000. In this case, we have
![]() |
*/2=4.088. From Equation (9), we solve
![]() |
(41,17). We generated 5000 simulation samples of size n = 58 under the design setting, and observed the quartiles Q2(Q1,Q3) = 30(28,33) from the empirical distribution of the observed true rejections. Note that the median Q2 exactly matches the projected r1 in this case. Table 4 reports the sample sizes under different settings: m1 = 50 or 100; r1/m1 = 0.3or 0.6; and FDR = 1, 5 or 10%.
|
A referee raised a question about the accuracy of our sample size estimate when the gene expression data are correlated or have other distributions than normal distributions. In order to address this issue, we at first consider normal gene expression data with a block compound symmetry (CS) correlation structure: there are 400 independent blocks, and each block consists of 10 dependent genes with a CS structure and correlation coefficient
= 0.6. The first half of Table 5 reports the distribution of the empirical true rejections under (a1,m1,
) = (0.5,40,1) and r1 = 12 or 24. We assume that the prognostic genes belong to the first four blocks. Note that the estimated sample sizes are given in Table 3 under the same design settings. From Table 5, we observe that the median Q2 of the observed true rejections is close to the nominal r1 as in the independent data case. However, the interquartile range is almost doubled from that under independence, from
5 to
10. In the second set of simulations, we generate gene expression data from a correlated asymmetric distribution: for b = 1,...,400 and 10(b 1) + 1
j
10b,
![]() |
= 0.6 and ek,1,...,ek,4000,
k,1,...,
k,400 are i.i.d. random variables from the
2-distribution with 2 degrees of freedom. Note that both (X1,...,Xm) and (Y1,...,Ym) have marginal variances 1, and the same block CS correlation structure as in the above correlated normal data case. The second half of Table 5 reports the simulation results. We observe almost the same results as in the correlated normal data case. Benjamini and Yekutieli (2001) investigate general distributional assumptions for the control of FDR.
|
| 5 DISCUSSION |
|---|
|
|
|---|
Microarray has been a major high-throughput assay method to display DNA or RNA abundance for a large number of genes concurrently. Discovery of the prognostic genes should be made taking multiplicity into account, but also with enough statistical power to identify important genes successfully. Owing to the costly nature of microarray experiments, however, often only a small sample size is available and the resulting data analysis does not give reliable answers to the investigators. If the findings from a small study look promising, a large-scale study may be developed to confirm the findings using appropriate statistical tools. Our sample size formula will play the role in the design stage of such a confirmatory study. It can be used to check the statistical power, r1/m1, of a small-scale pilot study too.
The proposed method is to calculate the sample size for a specified number of true rejections (or the expected number of true rejections given a sample size) while controlling the FDR at a given level. The input variables to be pre-specified are total number of genes for testing m, projected number of prognostic genes m1, allocation proportions ak between groups and effect sizes for the prognostic genes. The method does not require any heavy computation, such as Monte Carlo simulations, so that we get a sample size in a second. Especially, if the effect sizes among the prognostic genes are the same, we have a closed form formula that can be calculated using a scientific calculator and a normal distribution table. The proposed method can be used to design a new study based on the parameter values estimated from the pilot data.
It is shown through simulations that the formula based on normal approximation works well overall, even when the expression levels are weakly correlated or have skewed distributions. If there exists dependency among the genes, the observed number of true rejections tends to have a wide variation around the nominal r1. The computer program for sample size calculation is available from the author.
| Acknowledgments |
|---|
The author wants to thank the two reviewers for their valuable comments.
Received on December 8, 2004; revised on March 30, 2005; accepted on April 6, 2005
| REFERENCES |
|---|
|
|
|---|
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289300.
Benjamini, Y. and Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. Ann. Stat., 29, 11651188[CrossRef].
Black, M.A. and Doerge, R.W. (2002) Calculation of the minimum number of replicate spots required for detection of significant gene expression fold change in microarray experiments. Bioinformatics, 18, 16091616
Cui, X. and Churchill, G.A. (2003) How many mice and how many arrays? Replication in mouse cDNA microarray experiments. Methods of Microarray DataAnalysis II, , Norwell, MA Kluwer Academic Publishers, pp. 139154.
Dudoit, S., et al. (2003) Multiple hypothesis testing in microarray experiments. Stat. Sci., 18, 71103[CrossRef][ISI].
Gadbury, G.L., et al. (2004) Power and sample size estimation in high dimensional biology. Stat. Meth. Med. Res., 13, 325338
Genovese, C. and Wasserman, L. (2002) Operating characteristics and extensions of the false discovery rate procedure. J. R. Stat. Soc. Ser. B, 64, 499517[CrossRef].
Golub, T.R., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531537
Jung, S.H., et al. (2005) Sample size calculation for multiple testing in microarray data analysis. Biostatistics, 6, 157169[Abstract].
Lee, M.L.T. and Whitmore, G.A. (2002) Power and sample size for DNA microarray studies. Stat Med., 21, 35433570[CrossRef][ISI][Medline].
Müller, P., et al. (2004) Optimal sample size for multiple testing: the case of gene expression microarrays. J. Am. Stat. Assoc., 99, 9901001[CrossRef].
Pan, W., et al. (2002) How many replicated of arrays are required to detect geneexpression changes in microarray experiments? A mixture model approach. Genome Biol., 3, 110.
Simon, R., et al. (2002) Design of studies with DNA microarrays. Genet. Epidemiol., 23, 2136[CrossRef][ISI][Medline].
Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B, 64, 479498[CrossRef].
Storey, J.D. (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. Ann. Stat., 31, 20132035[CrossRef].
Technical Report 2001-28 Storey, J.D. and Tibshirani, R. (2001) Estimating false discovery rates under dependence, with applications to DNA microarrays. , CA Department of Statistics, Stanford University.
Storey, J.D. and Tibshirani, R. (2003) SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In Parmigiani, G., Garrett, E.S., Irizarry, R.A., Zeger, S.L. (Eds.). The Analysis of Gene Expression Data: Methods and Software, , New York Springer.
Storey, J.D., et al. (2004) Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Stat. Soc. Ser. B, 66, 187205[CrossRef].
van den Oord, E.J.C.G. and Sullivan, P.F. (2003) A framework for controlling false discovery rates and minimizing the amount of genotyping in gene-finding studies. Hum. Hered., 56, 188199[ISI][Medline].
Wolfinger, R.D., et al. (2001) Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol., 8, 625637[CrossRef][ISI][Medline].
This article has been cited by other articles:
![]() |
Y. Lai A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data Biostat., October 1, 2007; 8(4): 744 - 755. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. Zhang, A. Carriquiry, D. Nettleton, and J. C.M. Dekkers Pooling mRNA in microarray experiments and its effect on power Bioinformatics, May 15, 2007; 23(10): 1217 - 1224. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Liu and J. T. G. Hwang Quick calculation for sample size while controlling false discovery rate with application to microarray analysis Bioinformatics, March 15, 2007; 23(6): 739 - 746. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. J. M. Rosa, N. de Leon, and A. J. M. Rosa Review of microarray experimental design strategies for genetical genomics studies Physiol Genomics, December 13, 2006; 28(1): 15 - 23. [Abstract] [Full Text] [PDF] |
||||
![]() |
S.-H. Jung and W. Jang How accurately can we control the FDR in analyzing microarray data? Bioinformatics, July 15, 2006; 22(14): 1730 - 1736. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. B. Pounds Estimation and control of multiple testing error rates for microarray studies Brief Bioinform, March 1, 2006; 7(1): 25 - 36. |
||||
![]() |
S. Pounds and C. Cheng Sample size determination for the false discovery rate Bioinformatics, December 1, 2005; 21(23): 4263 - 4271. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||








































-distributions with 400 independent blocks and CS correlation structure with 


