Skip Navigation


Bioinformatics Advance Access originally published online on November 25, 2004
Bioinformatics 2005 21(8):1502-1508; doi:10.1093/bioinformatics/bti162
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/8/1502    most recent
bti162v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (15)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Tsai, C.-A.
Right arrow Articles by Chen, J. J.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Tsai, C.-A.
Right arrow Articles by Chen, J. J.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Sample size for gene expression microarray experiments{dagger}

Chen-An Tsai 1, Sue-Jane Wang 2, Dung-Tsa Chen 3 and James J. Chen 1,*

1Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration Jefferson, AR 72079, USA
2Division of Biometrics II, Office of Biostatistics, Center for Drug Evaluation and Research, Food and Drug Administration Rockville, MD 20857, USA
3Biostatistics and Bioinformatics Unit, University of Alabama at Birmingham 153 Wallace Tumor Institute, Birmingham, AL 35294, USA

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 

Motivation: Microarray experiments often involve hundreds or thousands of genes. In a typical experiment, only a fraction of genes are expected to be differentially expressed; in addition, the measured intensities among different genes may be correlated. Depending on the experimental objectives, sample size calculations can be based on one of the three specified measures: sensitivity, true discovery and accuracy rates. The sample size problem is formulated as: the number of arrays needed in order to achieve the desired fraction of the specified measure at the desired family-wise power at the given type I error and (standardized) effect size.

Results: We present a general approach for estimating sample size under independent and equally correlated models using binomial and beta-binomial models, respectively. The sample sizes needed for a two-sample z-test are computed; the computed theoretical numbers agree well with the Monte Carlo simulation results. But, under more general correlation structures, the beta-binomial model can underestimate the needed samples by about 1–5 arrays.

Contact: jchen{at}nctr.fda.gov


    1 INTRODUCTION
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
DNA microarray technology provides tools for studying the expression profiles of a large number of distinct genes simultaneously. A universal objective in microarray experiments is to identify a subset of genes that are differentially expressed between the different samples (treatments) of interest. A common approach is to perform a univariate analysis of treatment mean differences for each gene, and then identify the significant genes. Univariate statistical methods, such as bootstrap or permutation test (e.g., Dudoit et al., 2002) have been used to identify significant genes. Before conducting a microarray experiment, one important issue that needs to be decided is the number of arrays required in order for the results to be statistically interpretable. Many authors have proposed methods for sample size and power calculations (e.g., Lee and Whitmore, 2002; Zien et al., 2003; Wang and Chen, 2004; Gadbury et al., 2004; Muller et al., 2004).

The problem of calculating sample size needed in microarray experiments is similar to the sample size and power calculation in clinical trials or other scientific experiments. Nevertheless, microarray experiments often involve hundreds or thousands of genes, and in a typical experiment only a fraction of genes is expected to be differentially expressed; in addition, the measured intensities among different genes may be correlated. Since a large number of tests is made, using nominal significance levels without multiplicity adjustment could lead to a high chance of false positive findings. There are two approaches to choosing a significance level, the family-wise error rate (FWE) and the false discovery rate (FDR). The FWE approach is to control the probability of making one or more false positives (e.g., Westfall and Young, 1993). The FWE approach could present a problem in analysis since the analysis tends to screen out all but a handful of genes that show extreme differential expressions if the number of genes is very large. The FDR approach concerns controlling the expectation of false positives among the positives declared (Benjamini and Hochberg, 1995). The FDR criterion is a reasonable compromise since making a few wrong decisions is generally not a serious problem as long as the majority of decisions are correct. In this paper, the sample size calculation is based on fixing the expected number of false positives using the individual comparison-wise error rate.

A general goal in microarray experiments is to determine relationships between genes or gene clusters for identifying biological functions or studying relationships between genes and samples for predicting specific biological outcomes (or diseases). When the objective is to identify a few genes for prediction or follow-up confirmation, then the control of the false positive rate, either the family-wise or false discovery error rate, is appropriate. When the objective is to develop genetic profiles, then a procedure that will select a large number of potentially differentially expressed genes is more appropriate. Sample size planning should be based on the experimental objective. This paper introduces three measures, sensitivity, true discovery (positive predictivity) and accuracy rates, for different experimental objectives. We propose a general procedure for estimating sample size in microarray studies. The methodology takes account of multiple testing and dependence among differentially expressed genes.


    2 METHODS
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
Let m denote the number of genes studied in an array of which m0 and m1 are the numbers of non-differentially and differentially expressed genes, respectively. Let {alpha} be the comparison-wise error (CWE) rate for each individual test. Assume an equal probability that a truly differential expressed gene is significant, denoted by (1 – ß) (individual comparison-wise power). Given the significance level {alpha}, the results of m tests can be summarized as a 2 x 2 table (Table 1).


View this table:
[in this window]
[in a new window]
 
Table 1 Possible outcomes when testing m hypotheses

 
The fraction V/m0 is the proportion of genes that are declared significant among all genes that are, in fact, not differentially expressed; U/m1 is the proportion of genes that are declared significant among all genes that are truly differentially expressed. The fraction U/m1 represents the sensitivity and 1 – V/m0 represents the specificity. V/R is the proportion of declared significant genes that are not differentially expressed. U/R is the proportion of declared significant genes that are truly differentially expressed (positive predictivity or predictive value of a positive test), and S/A is the proportion of not-declared significant genes that are, indeed, not differentially expressed (negative predictability or predictive value of a negative test). The fractions V/R, U/R and (U + S)/m represent the proportions of false discovery, true discovery and accuracy, respectively.

A number of statistical error-controlled criteria have been proposed to determine differentially expressed genes in terms of V and/or R. The FWE is the probability of at least one false rejection Pr(V ≥ 1), regardless of the number of genes tested. The simplest FWE method is the Bonferroni correction by setting the CWE level at {alpha} = FWE/m. When m is very large [the probability P(V ≥ 1) can be large], the use of the FWE criterion may require a large number of arrays. Alternatively, Benjamini and Hochberg (1995) defined the false discovery rate as the expectation of V/R; that is, FDR = E(V/R) for R > 0 and FDR=0 for R = 0. Benjamini and Hochberg (2000) proposed an FDR-controlled procedure by finding a CWE level such that {alpha} ≤ r · q*/m0, where q* is the desired FDR level and r is the number of significances (pi ≤ {alpha}) from the observed data set. Delongchamp et al. (2004) proposed an ROC-like approach based on minimizing the total cost from making false positive (v/m0) and false negative errors (t/m1). Recently, Muller et al. (2004) proposed a simulation approach to evaluate the expected false negative rate and power across sample sizes under a decision framework.

In this paper, the sample size formulation is based on the criterion of fixing the expected number of false positives E(V) or setting the CWE level at {alpha} = v/m0, where v is a pre-specified number of false positives (1 or 2). For example, using {alpha} = v/1,000 = 0.001 would expect one false positive with 1000 non-differentially expressed genes tested. This approach is also considered by Lee and Whitmore (2002). They provided an approximation for the relation between v and FWE: v {approx}ln(1–FWE). For example, v = 1 approximately corresponds to FWE = 0.632.

2.1 Sample size calculation under independence model
Assume the intensity responses among m genes are independent. The outcome (significance or non-significance) of a univariate test on a differentially expressed gene can be modelled by a Bernoulli Ui with success probability (1 – ß). Given m0 and m1, the random variable for the number of true discoveries is binomially distributed Bin (m1, 1 – ß). The probability of at least u true discoveries can be computed by summing the binomial probabilities:

(1)
{varphi} is interpreted as the (family-wise) power of identifying at least u out of m1 truly differentially expressed genes for the given CWE {alpha} and power (1 – ß). Equation (1) is the fundamental formula for the sample size calculation under the independent model.

For a single gene, the sample size problem is typically formulated as the number of arrays needed in order to ensure the power (1 – ß) of detecting the mean difference (standardized effect size) {delta} at a pre-specified Type I error {alpha}. For the two-sample z-test given {alpha} and {delta}, the number of arrays needed in each group in order to achieve the desired power (1 – ß) is

(2)
where Z{alpha} is the {alpha} percentile of a standard normal distribution. For simultaneous analysis of a set of genes, sample size depends on {alpha}, (1 – ß) and {delta} of each individual gene. We assume an equal standardized effect size for all differentially expressed genes; thus, the power (1 – ß) of detecting any differentially expressed gene is constant. Given {alpha}, to detect all differentially expressed genes (u = m1) at the desired family-wise power of {varphi} the required comparison-wise power (1 – ß) for individual genes can be calculated from Equation (1): 1 – ß = {varphi}1/m1. When m1 is moderate or large, (1 – ß) is close to 1, even for small {varphi}. For example, when m1 = 100 and {varphi} = 0.6, then 1 – ß = 0.995. Thus, a large number of arrays would be needed in order to identify all differentially expressed genes. We propose a less stringent criterion by identifying (at least) a specified fraction of truly differentially expressed genes at the desired family-wise power, conditional on fixing the number of false positives. Depending on the experimental objective, the fraction can be based on one of the three measures, sensitivity (u/m1), true discovery (u/r) and accuracy (m0v+u)/m.

The use of sensitivity measure {lambda} = u/m1 for gene selection is recently proposed by Wang and Chen (2004). Specifically, to detect at least the fraction {lambda} of truly differentially expressed genes at the desired family-wise power {varphi}{lambda}, the required comparison-wise power (1 – ß) for an individual gene is calculated by solving Equation (1):

where u = [m1 {lambda}], and the notation [m1 {lambda}] denotes the largest integer less than m1 {lambda}. The sample size needed, then, can be calculated using Equation (2) for given {alpha} and {delta}. Wang and Chen (2004) tabulated the required sample sizes for one- and two-sample t-tests for various combinations of {alpha}, {delta}, {lambda}, {pi}1 and m, where {pi}1 = m1/m.

The true discovery rate {eta} = u/r is an alternative measure for gene selection. To achieve at least the true discovery rate {eta} at the desired family-wise power {varphi}{eta}, the comparison-wise power (1 – ß) can be computed by setting u = v · {eta}/(1 – {eta}) in Equation (1). Let {eta} = 1 – q*, where q* = v/r is the FDR. The criterion to achieve at least the true discovery rate {eta} is equivalent to the control of the FDR at q*. In terms of the FDR criterion, for specified v and q*, the number of true discoveries should be at least u = v(1/q* – 1) in order to have FDR ≤ q*. The third measure is the overall accuracy rate {tau} = (m0v + u)/m. The criterion to achieve at least an overall accuracy rate of {tau} is equivalent to identifying at least u = m {tau}m0 + v = m({tau}{pi}0) + v truly differentially expressed genes, where {pi}0 = m0/m. Since m is large and v is small, {tau} should be greater than {pi}0.

2.2. Sample size calculation under dependence
This assumption of independence may not be realistic since the measured intensities among different genes are likely to be correlated. It may be reasonable to assume that measured intensities among the non-differentially expressed genes as well as the intensities between differentially and non-differentially expressed genes are independent. We further assume the intensities among the differentially expressed genes are equally correlated, that is, E(Ui) = 1 – ß, Var(Ui) = ß(1 – ß) and Corr(Ui,Uj) = {theta}. The mean and variance of the total number of true discoveries are E(U) = m1 (1 – ß) and Var(U) = m1ß(1 – ß)[1 + {theta}(m1 – 1)]. The distribution of U is a generalized binomial. If {theta} = 0, then U becomes a binomial. When {theta} > 0, U has the over-dispersion binomial, and {theta} is known as the intra-cluster correlation. The commonly known beta-binomial distribution is used to model the number of true discoveries U:

where B(a,b) = {Gamma}(a){Gamma}(b)/{Gamma}(a + b), where {Gamma}(·) is the gamma function, a > 0 and b > 0. Under the re-parameterization (1 – ß) = a/(a + b) and {theta} = (a + b + 1)–1, the mean and variance of U are E(U) = m1a/(a + b) and Var(U) = m1ab/(a + b)2 [1 + (m1 – 1)/(a + b + 1)]. The probability of at least u true discoveries can be computed by summing the beta-binomial probabilities:

(3)
Equation (3) is the fundamental formula for the sample size calculation under the equally correlated model. For given {varphi} and {theta}, the required comparison-wise power (1 – ß) = a/(a + b) is obtained by solving the beta-binomial tailed probability [Equation (3)]. The needed sample size, then, can be computed using Equation (2).

In practice, the intra-cluster correlation {theta} between Bernoulli random variables Ui and Uj needs to be estimated. Let t1,...,tm1 be the test statistics corresponding to the m1 random variables U1,...,Um1; each pair (ti,tj) is bivariate normally distributed with a common correlation {rho}. The correlation coefficient between Ui and Uj (1 ≤ i,j ≤ m1) is {theta} = [c – (1 – ß)2]/[ß(1 – ß)], where and F is the standard bivariate normal distribution function with correlation coefficient {rho} (Ahn and Chen, 1995). Thus, the relation between the correlation of the bivariate normal {rho} and intra-cluster correlation {theta} under the beta-binomial model is

(4)
It should be noted that since {theta} depends on {alpha}, {rho}, {delta} and n, calculation of sample size requires solving Equations (2), (3) and (4) simultaneously. Finally, the assumption of an equal correlation is approximated by using the mean correlations

(5)

2.3 Estimating the number of non-differentially expressed genes
Hsueh et al. (2003) investigated three methods and proposed two additional methods, a least squares (LS) and a mean of differences (MD) method, to estimate m0. They showed via Monte Carlo simulation that the LS method that used information from both null and alternative distributions was the least biased method. The MD method and the Benjamini and Hochberg (2000) method were conservative which over-estimated m0. The FWE and FDR approaches focus on the false positive errors, the conservative estimates such as the MD estimate should be used to ensure the control of FWE or FDR. We use the MD estimate to illustrate the sample size and power calculation for an example data set in the next section.

It should be noted that the MD estimate assumes that the majority of the genes are expected to be non-differentially expressed. In planning a new experiment, the proposed procedure needs to either estimate or have prior knowledge of the number of differentially expressed genes.


    3 RESULTS
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
3.1 Sample size needed in a two-sample comparison
In clinical trials, sample size estimation often uses a z-test approach [Equation (2)] assuming a known targeted variability of the effect size learned from earlier clinical experience. We illustrate the sample size calculation in a common microarray experiment to study changes in gene expression between a control and a treatment group. The proposed procedure can be extended to other tests such as the one-sample or two-sample t-test, or the k-sample ANOVA F-tests.

We consider a balanced design with an equal number of arrays in the control and treatment groups. The parameter values used are as follows: m = 10 000 and 1000; {pi}1 = 0.05, 0.10 and 0.20; {delta} = 2; v = 1 and 2; {lambda} = 0.8, 0.9 and 0.95; {eta} = 0.95 and 0.99; {tau} = 0.95 and 0.99; {varphi}{lambda} = {varphi}{eta} = {varphi}{tau} = 0.80 and 0.90. The intra-cluster correlations are {theta} = 0.00 and 0.22 for binomial and beta-binomial models, respectively. Note that the relation between {rho} and {theta} depends on {delta} and n [Equation (4)]. The {theta} = 0.22 is used to simplify computation. More details on the {rho} and {theta}, and on {rho} and n are given in Sections 3.3 and 4. The numbers of arrays needed are tabulated in Tables 2 and 3 for the binomial (independent) and beta-binomial (equally correlated) models, respectively.


View this table:
[in this window]
[in a new window]
 
Table 2 Sample size needed for each group in order to achieve the sensitivity ({lambda}), true discovery ({eta}) or accuracy ({tau}) at the desired family-wise power {varphi} = 80 and 90% at the expected number of false positive v = 1 and 2 for m =10 000 and 1000

 

View this table:
[in this window]
[in a new window]
 
Table 3 Sample size needed for each group in order to achieve the sensitivity ({lambda}), true discovery ({eta}), or accuracy ({tau}) at the desired family-wise power {varphi} = 80 and 90% at the expected number of false positive v = 1 and 2 for m = 10 000 and 1000

 
For the sensitivity measure ({lambda}), the independent model (Table 2) and correlated model (Table 3) show very similar patterns. The numbers of arrays needed are between 8 and 16 under the independent model and between 9 and 18 under the correlated model. The correlated model generally needs one or two additional arrays. The needed number of arrays decreases as v increases or m decreases, as expected. It appears that the unknown fraction {pi}1 does not have much effect on the needed sample size.

For the true discovery measure ({eta}), similar to {lambda} the unknown fraction {pi}1 does not have much effect on the needed sample size. But, unlike {lambda} the sample size can increase or decrease, depending on the specified values of v and m, and the model. When m = 10 000, the number of arrays needed are between 2 and 7 under the independent model and between 4 and 10 under the correlated model. That is, more arrays are needed under the correlated model. However, when m = 1000, the number of arrays needed are between 3 and 21 under the independent model and between 4 and 16 under the correlated model. Fewer arrays are needed under the correlated model. In general, the needed number of arrays increases as v increases from 1 to 2 with few exceptions; for example, the number needed decreases when m = 1000, {varphi} = 0.9 and {pi}1 = 5 and 10%. The number needed increases considerably for {pi}1 = 20%. It is of interest to know that the numbers of arrays needed are less than 10 when {eta} = 0.95 corresponding to FDR (q* = 0.05) except for v = 2, m = 1000 and {pi}1 = 10%.

The accuracy measure ({tau}) and sensitivity ({lambda}) have very similar patterns. The needed number of arrays decreases as v increases or m decreases; and the correlated model needs more arrays than the independent model. But, unlike the sensitivity measure, the number of arrays needs increases considerably as {pi}1 increases. Much more arrays are needed to achieve the 99% accuracy rate. It is noteworthy to point out that for {tau} = 0.95 and {pi}1 = 5%, the needed number of array is 1 or 2 under the independent model. An explanation is that to achieve an accuracy rate of {tau} it suffices to identify u = m({tau}{pi}0)+v differentially expressed genes; for {tau} = 0.95 and {pi}0 = 0.95, then u = 1. Thus, one or two arrays per group would meet this criterion.

3.2 Simulation experiment
A Monte Carlo simulation was conducted to validate the tabulated theoretical results in Tables 2 and 3. We only present the results for the correlated model with m = 10 000, {pi}1 = 0.1 and v = 1. In the equally correlated model, 9000 genes were generated under the null model N(0,1) and the remaining 1000 genes were generated under the equi-correlated multivariate normal ({rho} = 0.25) with the effect size {delta} = 2. The corresponding range of {theta} is between 0.18 and 0.58. The number of arrays considered ranged from 2 to 20. These numbers covered the range reported in Tables 2 and 3. One thousand samples were simulated in each case. For each simulated sample, we computed the z-statistic. A gene was declared as significant if its p-value was less than or equal to {alpha} = 1/9000 = 0.000111, based on the two-sided test. We then calculated the numbers of false positives (v), false negatives (t), true positives (u) and true negatives (s) out of the 1000 repetitions. The empirical estimates of {lambda}, {eta} and {tau} and the empirical estimates of family-wise powers {varphi}{lambda}, {varphi}{eta} and {varphi}{tau} were computed. The powers were estimated as the probability that the number of significant genes achieved at least the desired family power {varphi}. For example, for {varphi}{lambda} = 90%, the empirical power was calculated as the proportion of times out of the 1000 iterations such that at least 90% of truly differentially expressed genes were significant. Simulations were conducted in Alpha workstation using FORTRAN 90.

Table 4 shows the simulation results from the equi-correlated model. These results also generally agree with the results shown in Table 3. For examples, for {varphi}{lambda} = 0.8, the numbers of arrays needed are 13, 15 and 16 for {lambda} = 0.80, 0.90 and 0.95, respectively. The corresponding estimates (Table 4) are 0.891, 0.944 and 0.962 with the estimated powers 0.91, 0.86 and 0.76, respectively. Note that the power 0.76 is slightly smaller than the desired {varphi} = 80%. For {varphi}{tau} = 0.8, the number of arrays needed are 10 and 15 (Table 3) for {tau} = 0.95 and 0.99, respectively. The corresponding estimates (Table 4) are 0.973 and 0.994 with the empirical powers 0.96 and 0.86, respectively. Note that the estimated power 0.96 is larger than the desired 80%. Table 4 indicates that and with n = 9. That is, nine per group appears to be sufficient.


View this table:
[in this window]
[in a new window]
 
Table 4 The empirical estimates of the sensitivity ({lambda}), true discovery ({eta}) and accuracy ({tau}), and empirical family-wise power, proportion of times, out of 1000, in which the number of genes selected achieved the desired fraction for m = 10 000 and m1 = 1000, and {pi}1 = 0.10, {delta} = 2, n = 220, under the equally correlated model ({rho} = 0.25)

 
To evaluate the robustness of the method derived from the equally correlated model, further simulations were conducted under the models with general correlation structures. Tables 5 and 6 are the simulation results from two general correlation models. In the first model, the 1000 differentially expressed genes were divided into 10 equally correlated groups with 100 genes per group. The correlations for the ten groups are 0.05, 0.10,...,0.50. The correlations between genes from different groups are zero. Thus, the correlation matrix for the 1000 genes consists of 10 100 x 100 diagonal blocks, denoted by {Sigma}1. The second correlation model is unstructured, denoted by {Sigma}2. The pairwise correlations among the 1000 genes were randomly generated from a beta(2,6) distribution. (We also examined a model with two equally correlated groups, the results are similar to the 10 equally correlated model. Simulation results on the independent model and the four correlated models are available from the authors.)


View this table:
[in this window]
[in a new window]
 
Table 5 The empirical estimates of the sensitivity ({lambda}), true discovery ({eta}) and accuracy ({tau}), and empirical family-wise power, proportion of times, out of 1000, in which the number of genes selected achieved the desired fraction for m = 10 000 and m1 = 1000, and {pi}1 = 0.10, {delta} = 2, n = 220, under the correlation model {Sigma}1 with 10 equally correlated groups

 

View this table:
[in this window]
[in a new window]
 
Table 6 The empirical estimates of the sensitivity ({lambda}), true discovery ({eta}) and accuracy ({tau}), and empirical family-wise power, proportion of times, out of 1000, in which the number of genes selected achieved the desired fraction for m = 10 000 and m1 = 1000, and {pi}1 = 0.10, {delta} = 2, n = 220, under unstructured correlation model, {Sigma}2

 
Tables 5 and 6 show that the sample sizes calculated under the beta-binomial model generally underestimate the needed sample sizes. For example, for {varphi}{lambda} = 0.8 in Table 3, the numbers of arrays needed are 13, 15, and 16 for {lambda} = 0.80, 0.90 and 0.95, respectively. The needed sample sizes for model {Sigma}1 in Table 5 are 16, 18 and 20 with corresponding estimates of 0.850, 0.925 and 0.963 and with estimated powers of 0.88, 0.84 and 0.81, respectively. The needed sample sizes for model {Sigma}2 in Table 6 are 17, 19 and 20 with corresponding estimates of 0.893, 0.950 and 0.996 and with estimated powers of 0.88, 0.87 and 0.80, respectively. For {varphi}{tau} = 0.8 in Table 3, the number of arrays needed are 5 and 6 for {eta} = 0.95 and 0.99, respectively. The needed sample sizes for model {Sigma}1 in Table 5 are 6 and 8 with corresponding estimates of 0.985 and 0.996 and with estimated powers 0.93 and 0.86, respectively. The needed sample sizes for model {Sigma}2 in Table 6 are 6 and 9 with corresponding estimates of 0.965 and 0.993 and with estimated powers of 0.86 and 0.89, respectively. In summary, for the correlation models and the parameters considered in Tables 5 and 6, the equally correlated beta-binomial model can underestimate the needed sample sizes by 1–3 arrays with the true discovery measure and by 3–4 arrays with the sensitivity or accuracy measure.

3.3 Application
We use the well-known colon tumor data set (Alon et al., 1999) for illustration. This data set consists of 22 normal and 40 tumor colon tissue samples on 2000 human genes with highest minimal intensity across the 62 samples. The two-sample t-statistic with unequal variances was used:

where and are the sample means (sample variances) of gene i in the control and treatment groups, respectively. The p-values were computed based on 200 000 permutations. The estimate of the number of non-differentially expressed genes from the mean of differences (MD) method was . We first studied the effect of sample size on the estimates of {lambda}, {eta} and {tau}. We randomly selected 10, 15 and 20 arrays without replacement from each sample. In addition, we also considered the un-balanced design with the ratio 1:2; that is, the numbers of arrays for the tumors samples were 20, 30 and 40. Two significance levels were evaluated {alpha} = 1/1721 and 2/1721 which corresponded to v = 1 and 2, respectively.

Table 7 shows the observed numbers of significant genes (r) and the estimated , and . As the number of arrays increases, all three measures increase. Also, when {alpha} increases from 1/1721 to 2/1721, both the sensitivity and accuracy increase, but the true discovery rate decreases. The maximum sensitivity is only about 25%. The true discovery and accuracy rates are 97.2 and 89.5%, respectively. For the complete data set, is 98.2% for v = 1 and is 97.2% for v = 2. The corresponding FDR estimates are 1.8 and 2.8% with the number of significances 56 and 72, respectively. These results are consistent with the results of Tables 26. Furthermore, applying the Benjamini and Hochberg FDR-controlled procedure (2000), the number of significances is 121 for q* = 5% and is 28 for q* = 1%.


View this table:
[in this window]
[in a new window]
 
Table 7 The number of significant genes (r) and the estimated sensitivity (), true discovery () and accuracy () for different sample sizes (normal, tumor): balanced design (10,10), (15,15), (20,20); unbalanced design (10,20), (15,30), (20,40), (22,40)

 
Next, we used this data set to illustrate the sample size calculation in a pilot study in planning a new experiment. The estimated parameter values are assumed to be true values. Based on and , the 2000 genes can be divided into two sets according to their p-values. That is, the set of differentially expressed genes consisted of the 279 smallest p-values. The estimated proportion of differentially expressed gene is 279/2000 = 0.139. The distributions of the standardized effect sizes among the 279 differentially expressed genes were calculated. The 25-, 50-, and 75-percentiles of the standardized effect sizes were 0.70, 0.79 and 0.92, respectively; and the mean is 0.85. The average of the estimated pairwise correlations were {rho} = 0.441 using Equation (5). Based on the above values, we set {pi}1 = 0.15, {delta} = 0.8, {rho} = 0.45, and v = 1. Given {pi}1, {delta}, {theta} and v and the specified measure {lambda}, {eta}, or {tau} and the desired family-wise power {varphi}, the number of arrays needed was calculated by solving the Equations (2), (3) and (4) simultaneously. Table 8 shows the number of arrays needed for {varphi} = 0.8 and 0.9, and {lambda} = 0.8, 0.9; {eta} = 0.95, 0.99. These numbers are consistent with the previous results. The true discovery measure needs the least number of arrays. The numbers needed for {lambda} = 0.90 and {tau} = 0.99 are similar.


View this table:
[in this window]
[in a new window]
 
Table 8 Sample size needed for each group in order to achieve the sensitivity ({lambda}), true discovery ({eta}) or accuracy ({tau}) at the desired family-wise power {varphi} = 80 and 90% at the significance level {alpha} = 1/1700 for m =2,000, {pi}1 = 15%, {delta} = 0.8, and {rho} = 0.45, these values being based on the estimates from the colon tumor data set

 

    4 DISCUSSION
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
Current methods proposed for power and sample size analyses, to our knowledge, do not consider the dependency of expression levels among genes. When the tests are correlated, the number of significances can be modelled by a generalized binomial model. In this paper, we propose using the beta-binomial to model the number of true positives (u). The beta-binomial distribution is mathematically tractable. Its probability can be easily computed using the beta or gamma functions. The beta-binomial model assumes an equal standardized effect size and an equal correlation among the differentially expressed genes. The assumed equal standardized effect size can be replaced by the minimum, mean or some percentile of the standardized effect size among all genes. Using the minimum will give a conservative estimate on the number of replicates. Depending on the study objective, the standardized effect size can also be the expected log intensity ratio divided by a pre-specified variability or by maximum variability resulting in conservative estimates. With regard to the correlations among genes, the binomial model likely results in underestimation of the true needed sample size.

The proposed approach is based on fixing the expected number of false positives E(v). This approach is a generalization of the Bonferroni approach. Setting {alpha} = FWE/m0 in the Bonferroni approach can be regarded as fixing the expected number (fraction) of false positives at FWE < 1. When the expression levels among the non-differentially expressed genes are independent, the number of false positives v has the binomial distribution Bin(m0, {alpha}). For specified v and {alpha} = v/m0, the number of false positives in a given experiment should be close to v since m0 is large. However, if the expression levels among the non-differentially expressed genes are correlated, then the number of false positives v becomes an over-dispersed (generalized) binomial. Let {theta}' > 0 be the average correlation among the non-differentially expressed genes, the number of false positive has an over-dispersed binomial with the variance m0 {alpha} (1 – {alpha}) [1 + {theta}'(m0 – 1)]. Thus, the observed number of false positives can be much larger or much smaller than v. In other words, when the test statistics are dependent, the actual Type I error rate may not be close to its nominal level.

In summary, microarray experiments have been widely used in many biomedical applications. Different studies have different objectives. Different objectives require different experimental design and analysis strategy. Microarray data analysis involves multiple testing of thousand genes. Unlike the analysis of multiple endpoints in clinical trials which focuses on the control of false positive error rate, the microarray data consists of a large number of genes that are not expected to be different. Three measures for gene identification are proposed for sample size planning. If the objective is to identify a small number of truly differentially expressed biomarker genes for prediction or for further confirmation, then the true discovery measure with small v(v ≤ 1) can be used. If the objective is to develop genetic profiling, then the sensitivity or accuracy measure is more appropriate. In general, sample size calculation varies with the type of microarrays, experimental designs, number of treatment groups and other factors. The simulation approach given in the example can be used to estimate the number of arrays needed for the choice of the parameter specifications, underlying model and statistical method. Once a proper statistical model and method of analysis are determined, the number of arrays needed can be computed by a simulation method for a range of parameter values and correlation structures of interest.


    Footnotes
 
{dagger}The views presented in this paper are those of the authors and not necessarily representing those of the US Food and Drug Administration. Back

Received on October 28, 2004; revised on November 16, 2004; accepted on November 17, 2004

    REFERENCES
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 

    Ahn, H. and Chen, J.J. (1995) Generation of over-dispersed and under-dispersed binomial variates. J. Comput. Graphical Statist., 4, 55–64.

    Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl Acad. Sci. USA, 96, 6745–6750[Abstract/Free Full Text].

    Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.

    Benjamini, Y. and Hochberg, Y. (2000) On the adaptive control of the false discovery rate in multiple testing with independent statistics. J. Educat. Behav. Statist., 25, 60–83[CrossRef].

    Delongchamp, R.R., Bowyer, J.F., Chen, J.J., Kodell, R.L. (2004) Multiple-testing strategy for analyzing cDNA array data on gene expression. Biometrics, 60, 774–782[CrossRef][Web of Science][Medline].

    Dudoit, S., Yang, Y.H., Callow, M.J., Speed, T.P. (2002) Statistical methods for identifying differential expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12, 111–139[Web of Science].

    Gadbury, G.L., Page, G.P., Edwards, J., Kayo, T., Prolla, T.A., Weindruch, R.W., Permana, P.A., Mountz, J., Allison, D.B. (2004) Power and sample size estimation in high dimensional biology. Statist. Method Med. Res., 13, 325–338.

    Hsueh, H., Chen, J.J., Kodell, R.L. (2003) Comparison of methods for estimating number of true null hypothesis in multiplicity testing. J. Biopharm. Statist., 13, 675–689[CrossRef][Medline].

    Lee, M.-L.T. and Whitmore, G.A. (2002) Power and sample size for DNA microarray studies. Statist. Med., 21, 3543–3570.

    Muller, P., Parmigiani, G., Robert, C., Rousseau, J. (2004) Optimal sample size for multiple testing: the case of gene expression microarrays. J. Am. Statist. Assoc., 99, 990–1001[CrossRef].

    Wang, S.-J. and Chen, J.J. (2004) Sample size for identifying differentially expressed genes in microarray experiments. J. Comput. Biol., 11, 714–726[CrossRef][Web of Science][Medline].

    Westfall, P.H. and Young, S.S. Resampling-Based Multiple Testing, (1993) , New York John Wiley and Sons.

    Zien, A., Fluck, J., Zimmer, R., Lengauer, T. (2003) Microarrays: How many do you need? J. Comput. Biol., 10, 653–667[CrossRef][Web of Science][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BiostatisticsHome page
T. Oura, S. Matsui, and K. Kawakami
Sample size calculations for controlling the distribution of false discovery proportion in microarray experiments
Biostat., October 1, 2009; 10(4): 694 - 705.
[Abstract] [Full Text] [PDF]


Home page
BiostatisticsHome page
P. de Valpine, H.-M. Bitter, M. P. S. Brown, and J. Heller
A simulation-approximation approach to sample size planning for high-dimensional classification studies
Biostat., July 1, 2009; 10(3): 424 - 435.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
K.-S. Lynn, L.-L. Li, Y.-J. Lin, C.-H. Wang, S.-H. Sheng, J.-H. Lin, W. Liao, W.-L. Hsu, and W.-H. Pan
A neural network model for constructing endophenotypes of common complex diseases: an application to male young-onset hypertension microarray data
Bioinformatics, April 15, 2009; 25(8): 981 - 988.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
J. Seo, H. Gordish-Dressman, and E. P. Hoffman
An interactive power analysis tool for microarray hypothesis testing and generation
Bioinformatics, April 1, 2006; 22(7): 808 - 814.
[Abstract] [Full Text] [PDF]


Home page
Brief BioinformHome page
S. B. Pounds
Estimation and control of multiple testing error rates for microarray studies
Brief Bioinform, March 1, 2006; 7(1): 25 - 36.



Home page
BioinformaticsHome page
S. Pounds and C. Cheng
Sample size determination for the false discovery rate
Bioinformatics, December 1, 2005; 21(23): 4263 - 4271.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/8/1502    most recent
bti162v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (15)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Tsai, C.-A.
Right arrow Articles by Chen, J. J.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Tsai, C.-A.
Right arrow Articles by Chen, J. J.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?