Skip Navigation


Bioinformatics Advance Access originally published online on April 27, 2006
Bioinformatics 2006 22(14):1730-1736; doi:10.1093/bioinformatics/btl161
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/14/1730    most recent
btl161v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (6)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Jung, S.-H.
Right arrow Articles by Jang, W.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Jung, S.-H.
Right arrow Articles by Jang, W.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

How accurately can we control the FDR in analyzing microarray data?

Sin-Ho Jung 1,* and Woncheol Jang 2

1 Department of Biostatistics and Bioinformatics, Duke University NC 27710, USA
2 Institute of Statistics and Decision Sciences, Duke University NC 27705, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 FDR-BASED MULTIPLE TESTING...
 3 A SIMULATION-BASED METHOD
 4 NUMERICAL STUDIES
 5 DISCUSSION
 APPENDIX
 REFERENCES
 

Summary: We want to evaluate the performance of two FDR-based multiple testing procedures by Benjamini and Hochberg (1995, J. R. Stat. Soc. Ser. B, 57, 289–300) and Storey (2002, J. R. Stat. Soc. Ser. B, 64, 479–498) in analyzing real microarray data. These procedures commonly require independence or weak dependence of the test statistics. However, expression levels of different genes from each array are usually correlated due to coexpressing genes and various sources of errors from experiment-specific and subject-specific conditions that are not adjusted for in data analysis. Because of high dimensionality of microarray data, it is usually impossible to check whether the weak dependence condition is met for a given dataset or not. We propose to generate a large number of test statistics from a simulation model which has asymptotically (in terms of the number of arrays) the same correlation structure as the test statistics that will be calculated from the given data and to investigate how accurately the FDR-based testing procedures control the FDR on the simulated data. Our approach is to directly check the performance of these procedures for a given dataset, rather than to check the weak dependency requirement. We illustrate the proposed method with real microarray datasets, one where the clinical endpoint is disease group and another where it is survival.

Contact: jung0005{at}mc.duke.edu


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 FDR-BASED MULTIPLE TESTING...
 3 A SIMULATION-BASED METHOD
 4 NUMERICAL STUDIES
 5 DISCUSSION
 APPENDIX
 REFERENCES
 
Microarrays are high throughput technology measuring the expression levels of a large number of genes simultaneously. Discovering the genes whose expression levels are associated with a clinical endpoint, such as disease type or survival, involves a serious multiple testing problem. Suppose that, for j = 1, ... , m, we want to test the null hypothesis

Formula
against the alternative hypothesis

Formula
Without an appropriate adjustment for the multiplicity, many discoveries will be false positives. Two multiple testing type I error rates have been used to tackle this issue: family-wise error rate (FWER) and false discovery rate (FDR). Usually, expression levels on different genes are correlated due to coexpressing genes and shared noises from experiment-specific and subject-specific conditions that are not adjusted for in analysis. The correlation, together with the high dimensionality, has a strong influence on the testing results controlling these type I errors. FWER is defined as the probability of at least one false positive. So, a multiple testing procedure controlling the FWER requires the null distribution of the test statistics while maintaining the correlation among test statistics. The permutation method proposed by Westfall and Young (1993) usually approximates the null distribution for FWER-control well. Refer to Huang et al. (2005) for the cases where the permutation method may not be appropriate.

Some investigators consider controlling the probability of one false rejection out of thousands of genes to be too strict and advocate an FDR-based multiple testing instead. Suppose that there are m genes among which the null hypotheses, Hj, are true for m0 genes, called non-prognostic genes, and the alternative hypotheses, Formula, are true for m1(= mm0) genes, called prognostic genes. Based on the calculated p-values, we may reject or accept the null hypotheses. Let R denote the number of genes for which the null hypotheses are rejected (discovery), and R0 denote, among these R genes, the number of genes that the null hypotheses are true (false discovery). Then, the FDR is defined as the expected value of the proportion R0/R.

Benjamini and Hochberg (1995) propose a step-up procedure to control the FDR. They prove that this procedure conservatively controls the FDR if the test statistics for the m0 non-prognostic genes are independent. The conservativeness becomes more serious as the proportion of prognostic genes m1/m increases. Benjamini and Yekutieli (2001) loosen the independence assumption among m0 non-prognostic genes to a weak dependence assumption called positive regression.

Pointing out the conservativeness of the Benjamini and Hochberg's procedure, Storey (2002) proposes a procedure that is considered to be more accurately controlling the FDR when m is large as in most microarray data. Under the independence assumption on m genes, he shows that the FDR is approximated by

Formula
and replaces m0 with an estimated value Formula. He later loosens the independence assumption for a weak dependence assumption among whole m genes (Storey, 2003) or among m0 non-prognostic genes (Storey and Tibshirani, 2001). Jung (2005) derives a sample size formula for the Storey's procedure under the weak dependence assumption among m genes.

For an accurate control of the FDR, we need the joint distribution of (R0, R), which will be decided by the joint distribution of the test statistics, under the true effect size for each gene and the dependency among m test statistics. The aforementioned FDR-based procedures have been shown to be reliably controlling the FDR through simulations based on the independence or weak dependence assumptions. However, because of high-dimensionality of microarray data and complexity of the real correlation structure, it is almost impossible to check whether the weak dependence condition holds for a given dataset or not. As an approach to tackling this difficulty, we propose to generate a large number of random vectors from the asymptotic distribution of the test statistics, to apply an FDR-based procedure and to see how accurately the procedure controls the FDR on the simulated data.

When a new dataset is given, we calculate the effect size (mean and standard deviation in case of a two-sample t-test) of each gene. For a chosen m1 value within a range of interest, we identify the top m1 genes in terms of effect size. The simulation is modeled as follows. We specify the effect sizes of these m1 genes by the observed ones from the data. For the remaining m0 = mm1 genes, the effect sizes are set at 0. Lin (2005) proposes a simulation algorithm to generate the null distribution of correlated test statistics conditioning on given data. By modifying his algorithm, we generate a large number of test statistics under the observed effect sizes and correlation structure. For a nominal FDR level to be chosen for analysis, we apply an FDR-based multiple testing procedure to each simulation sample and count the numbers of false and total discoveries. The true FDR is estimated by the empirical average of their proportions through a large number of simulations. By comparing the empirical FDR with the nominal one, we can decide how accurate the data analysis will be. Although the simulated test statistics have the same covariances as those observed from the data, the simulation scheme does not require the preliminary estimation of the covariance matrix. Our method is suggested when a sufficient number of arrays are available. The proposed method is applied to the real datasets by Golub et al. (1999) and Beer et al. (2002).


    2 FDR-BASED MULTIPLE TESTING PROCEDURES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 FDR-BASED MULTIPLE TESTING...
 3 A SIMULATION-BASED METHOD
 4 NUMERICAL STUDIES
 5 DISCUSSION
 APPENDIX
 REFERENCES
 
We assume that, for large number of arrays, the distribution of the test statistics is approximated by a multivariate normal distribution. Let p1, ... , pm denote the p-values for m genes that are calculated by a resampling method or the theoretical distribution of the test statistics. In this paper, we obtain them from the standard normal distribution based on the large sample approximation. Let p(1) ≤···≤ p(m) denote the ordered p-values for m genes, and H(j) for j = 1, ... , m the corresponding null hypotheses. Benjamini and Hochberg (1995) propose to reject H(j) for all j ≤ J = max{j : p(j) ≤ jq*/m}. They prove that this procedure controls the FDR below q* if the m0 non-prognostic genes are independent.

Suppose that we reject Hj if pj < {alpha}. Storey (2002), under independence or a weak dependence among m0 null genes, claims that

Formula
which equals m0{alpha} with the error term ignored. Here, m–1 op(m) -> 0 in probability as m -> {infty} . Hence, with R replaced by the observed value r, we have

Formula 1(1)
Now, given {alpha}, estimation of FDR({alpha}) requires estimation of m0 only. Storey (2002) proposes to estimate m0 by

Formula 1
for a chosen constant {lambda} away from 0, such as 0.5. By combining this estimator with (1), we obtain

Formula 1
For an observed p-value pj, Storey (2003) defines q-value, the minimum FDR level at which we reject Hj, as qj = inf{alpha}≥pj Formula 1. Jung (2005) shows that this formula is simplified to Formula 1 in a two-sample testing case. This procedure is implemented by a computer package called SAM (Significance Analysis of Microarrays) (Tusher et al., 2001).

This procedure has been shown to reliably control the FDR under independence (Storey, 2002) or a weak dependence (Storey, 2003), such as block compound symmetry, by simulations. Given a real dataset, however, it is difficult to check whether the weak dependence requirement holds or not. In this paper, we propose a direct approach for evaluation of the FDR-based multiple testing methods in real data analysis. This approach uses a simulation algorithm by Lin (2005) modified for estimation of the true FDR.


    3 A SIMULATION-BASED METHOD
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 FDR-BASED MULTIPLE TESTING...
 3 A SIMULATION-BASED METHOD
 4 NUMERICAL STUDIES
 5 DISCUSSION
 APPENDIX
 REFERENCES
 
In this section, we propose a simulation method to generate a large number of test statistics whose correlation structure, given the data, is asymptotically identical to that of the test statistics to be calculated from the given dataset. There are two versions of large sample approximations in this paper: one with respect to large m and the other with respect to large n. All asymptotic results here and after are with respect to large n.

3.1 Two-sample t-test case: when the clinical outcome is dichotomous
Let xkij denote the expression level of gene j(= 1, ... , m) from subject i(= 1, ... , nk) in disease group k(= 1,2). We assume that {(xki1, ... , xkim), 1 ≤ i ≤ nk} are independent and identically distributed (IID) random vectors from a multivariate distribution with means (µk1, ... , µkm), variances Formula 1 and correlation matrix {Gamma} = ({rho}jj')1≤j,j'≤m. Let Formula 1 and Formula 1 denote the sample means and pooled variances, respectively, estimated from the data. If the sample sizes are large, then the test statistics

Formula 1
are approximately normal with mean

Formula 1
and covariance {Gamma}.

We want to generate random vectors with asymptotically the same distribution as (W1, ... , Wm) using a simulation method. Let ({varepsilon}ki, 1 ≤ i ≤ nk, k = 1,2) be IID N(0,1) random variables, which are independent of the dataset, and Formula 1 with

Formula 1
denote a new dataset. Also, let

Formula 1
denote the sample means of the new dataset, and

Formula 1
the resulting test statistics. Then, given the dataset {(xki1, ... , xkim), 1 ≤ i ≤ nk, k = 1,2}, each Formula 1 is a weighted average of the independent normal random variables. It follows that, given the data, Formula 1 is normal with means

Formula 1
for j = 1, ... , m and covariance matrix Formula 1 which are consistent estimators of {delta}j for j = 1, ... , m and {Gamma}. Hence, the conditional joint distribution of Formula 1 given the data is asymptotically identical to the unconditional joint distribution of (W1, ... , Wm). See Appendix for a proof with general test statistics. Note that our simulation method requires calculation of sample means and variances from data, but not the correlation coefficients.

The above procedure is based on an equal variance-covariance assumption in two groups. However, some genes may possibly have unequal variances and covariances between the two groups even under Hj : µ1j = µ2j. In this case, the null distribution of the t-test statistics based on equal variance-covariance assumption may not be approximated by the standard normal distribution. If the equal variance-covariance assumption is questionable, we can use the same simulation method with Formula 1 replaced by Formula 1 in the denominators of Wj, Formula 1 and Formula 1, where Formula 1 are sample variances for group k(= 1,2). In this case, the effect sizes will be expressed as

Formula 1
where Formula 1 is the population variance for gene j(= 1, ... , m) in group k(= 1,2).

3.2 Cases for cox regression and general test statistics
A large family of test statistics for Hj : {theta}j = {theta}j0 versus Hj : {theta}j != {theta}j0 can be written as (U1, ... , Um) = {U1({theta}10), ... , Um({theta}m0)}, where

Formula 1
and, for large n, Uij = Uij({theta}j0) can be expressed as a function of the data from subject i only, so that U1j, ... , Unj are independent. Let µij({theta}j) = E(Uij) and Formula 1. If E{Uj({theta})} is a smooth function and E{Uj({theta})} = 0 has a unique solution, then the solution Formula 1 to Uj({theta}) = 0 is consistent to {theta}j. Further, by the central limit theorem, (U1, ... , Um) is approximately normal with means µj({theta}j) and covariances {sigma}jj' that can be consistently estimated by Formula 1 and

Formula
respectively, where Formula 1. Let Formula 1 and Formula 1.

For IID N(0,1) random variables {varepsilon}1, ... , {varepsilon}n that are independent of the data, let

Formula (2)
Let Formula 1 and Formula 1. By Appendix, the conditional joint distribution of Formula 1 given the data is asymptotically identical to the unconditional joint distribution of (W1, ... , Wm). Note that the standardized effect sizes of test statistics Formula 1 are Formula 1.

The two-sample t-test case discussed in Section 3.1 is a specific example of this formulation. As another example, we consider the cases to discover the genes associated with a survival endpoint. Let, for subject i(= 1, ... , n), Ti denote the time to an event, such as tumor recurrence or death, and (zi1, ... , zim) denote the gene expression data from m genes. Survival time may be censored due to loss to follow-up or study completion, so that we observe Xi = min(Ti, Ci) together with a censoring indicator {Delta}i = I(Ti ≤ Ci), where Ci is the censoring time that is assumed to be independent of Ti given the gene expression data (zi1, ... , zim). A given dataset will be expressed as {(Xi, {Delta}i, zi1, ... , zim),1 ≤ i ≤ n}. Let Yi(t) = I(Xi ≥ t) and Ni(t) = {Delta}iI(Xi ≤ t) be the at-risk and event processes for patient i, respectively, and let Formula 1 and Formula 1.

Suppose that, for subject i, zij is related to the hazard rate by

Formula 3(3)
where {lambda}j0(t) is the unknown baseline hazard specific to gene j (Cox, 1972). The hypotheses are expressed as Hj : {theta}j = 0 versus Formula 3. The partial MLE, Formula 3, solves the partial score function Formula 3, where

Formula 3
Let

Formula 3
and

Formula
where d{Lambda}j0(t) = {lambda}j0(t)dt and

Formula 3
refer to Andersen and Gill (1982). Then the partial score test statistic (U1, ... , Um) = (U1(0), ... , Um(0)) is approximately normal with mean Formula 3 and variance-covariances {sigma}jj' that can be consistently estimated by Formula 3 and

Formula
respectively. Let Formula 3. Note that µj = 0 under Hj.

The regression model (3) may not hold for some genes. By Lin and Wei (1989), whether model (3) is valid or not, the score statistic Uj is a meaningful measure of association between zij and Ti, and the test statistic Formula 3 is asymptotically N(0,1) under Hj. For robust testing against potential outliers in gene expression data, Jung et al. (2005) propose to use the rank of zij among gene j observations, z1j, ... , znj, as the covariate of model (3), rather than the raw expression level.

For IID N(0,1) random variables {varepsilon}1, ... , {varepsilon}n which are independent of the data, let

Formula
and Formula 3. Then, the conditional joint distribution of Formula 3 given the data is asymptotically identical to the unconditional joint distribution of (W1, ... , Wm).


    4 NUMERICAL STUDIES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 FDR-BASED MULTIPLE TESTING...
 3 A SIMULATION-BASED METHOD
 4 NUMERICAL STUDIES
 5 DISCUSSION
 APPENDIX
 REFERENCES
 
In this section, we take real microarray data and demonstrate our simulation-based method discussed in the previous section to check how well the FDR-control procedures will work under the dependency embedded in a given dataset.

An accurate estimation of the FDR, to be discussed below, requires identification of the prognostic genes and their effect sizes in addition to the correlation structure among the genes. While the second part in the right-hand side of (2), Formula 3, is to approximate the correlation among m test statistics, the first part, Formula 3, is a function of their effect sizes. For an accurate calculation of the FDR, we need to know which genes are prognostic. Since none of the observed Formula 3 would be exactly 0, we first have to specify m1 and identify m1 genes with large effect sizes as follows. Given a dataset, we first calculate Formula 3 (Formula 3 in two-sample t-test case) and sj. For a chosen m1, we identify the genes with top m1 effect sizes in absolute value. The effect sizes for these m1 genes will be set at the observed absolute values Formula 3, but those for the remaining m0 = mm1 genes will be set at {delta}j = 0. By the same arguments as in Appendix, the change in effect sizes does not change the correlation structure of the test statistics. In order to simplify the procedure, we take the absolute values of the effect sizes and use one-sided tests. We observe similar results by using two-sided tests and the raw effect sizes.

Now, we generate a large number of sets (say, B = 4,000) of the test statistics Formula 3, where

Formula
Let Formula 3 denote the set of m test statistics from the b-th (b = 1, ... , B) simulation. Also, let

Formula 3
denote the numbers of false rejections and total rejections, respectively, from the b-th simulation sample. Here, z1–{alpha} is the 100(1 {alpha})-th percentile of N(0, 1)distribution. For a large B, the true FDR is approximated by

Formula 3
If we want to control the FDR at q* level, then we have to find the corresponding {alpha} = {alpha}* by solving Formula 3 using the bisection method, and reject all genes with p-values smaller than {alpha}*. We call this procedure ‘exact method’ since it exactly controls the empirical FDR from the simulations based on the true parameter values. We estimate the average true rejections and false rejections by

Formula 3
and

Formula 3
respectively, where r1b = rbr0b. The calculated Formula 3 and Formula 3 will be compared with those by the Storey's method, called SAM, and the method by Benjamini and Hochberg (1995), called BH, that are discussed in Section 2. The exact method uses a common critical value for m p-values like SAM, so that it can be used as a gold standard for SAM. Note that the exact method cannot be used in a real data analysis because we do not know which genes are prognostic. We can replace z1–{alpha} with a resampling-based quantile. However, we use the theoretical standard normal quantile based on the asymptotic normality.

SAM and BH are applied to each simulated set of test statistics to control the FDR level at q*, calculate r0b and rb, and estimate the FDR level, at which these procedures are really controlling, as the average of r0b/rb,

Formula 3
through the B simulations. If these procedures accurately control the FDR, then Formula 3 is expected to be close to q*.

4.1 An example for two-sample t-tests
An example dataset is taken from the golubTrain object in golubEsets package (version 1.0.1) in Bioconductor release 1.7. (Gentleman et al., 2004). Golub et al. (1999) explore m = 6810 genes extracted from bone marrow in 38 patients, of which n1 = 27 with acute lymphoblastic leukemia (ALL) and n2 = 11 with acute myeloid leukemia (AML), in order to identify the genes with potential clinical heterogeneity being differentially expressed in the two subclasses of leukemia. Genes useful to distinguish ALL from AML may provide insight into cancer pathogenesis and patient treatment. We conduct simulation studies for m1 = 20, 60, 100, 400, 1000 and 2500, and q* = 0.5, 0.4, 0.3, 0.2, 0.1, 0.05 and 0.01. Using {lambda} = 0.5, we obtain Formula 3(Formula 3) from the original data.

Table 1 reports the estimated FDR, Formula 3, number of true and false rejections, Formula 3and Formula 3, respectively, for SAM and BH methods. Only Formula 3 and Formula 3 are reported for the exact method since it controls the FDR accurately. BH is always more conservative than SAM, i.e. Formula 3. For example, when m1 = 20 and the nominal FDR is set at q* = 5% level, SAM and BH control the FDR at Formula 3 and 8.09%, respectively. The ratio Formula 3 increases in m1 as in independent case (Storey, 2002). With m1 ≤ 400, SAM (BH) is conservative if q* ≥ 0.3(q* ≥ 0.2) and anti-conservative otherwise. SAM controls the FDR accurately when m1 ≥ 100 and q* ≤ 0.3. So does BH when m1 ≥ 400 and q* ≤ 0.2. However, the bias of SAM and BH in Formula 3 is more serious with a smaller m1 value.


View this table:
[in this window]
[in a new window]
 
Table 1 Simulation results for Golub et al. (1999)

 
There exists a similar trend in Formula 3 to that in Formula 3, i.e. with m1 ≤ 400, both SAM and BH have smaller Formula 3 than the exact method for q* ≥ 0.2 and larger Formula 3 for q* < 0.2. SAM and BH have almost the same Formula 3 for m1 ≤ 400, but, with a larger m1, the former tends to have a larger Formula 3.

SAM always has higher false rejections, Formula 3, than the other two methods. The discrepancy is more noticeable with smaller m1. BH also has higher false rejections than the exact method when m1 ≤ 400. With m1 ≥ 1000, however, BH tends to have lower Formula 3 than the exact method except for a small q* such as 1 or 5%.

Figure 1 reports the distribution of R0/R observed from the 4000 simulations for q* = 0.5 or 0.01 with m1 fixed at 20. When q* = 0.5, R0/R is highly distributed around 0 and 1. When q* = 0.01, R0/R has a large density around 0 and is widely distributed in the rest of the range. Note that the horizontal axes of Figure 1c and d are rescaled using a square root transformation to show the distribution of R0/R around q* = 0.01 better. At each q* level, the distributions of R0/R for SAM and BH look almost the same, so that it is difficult to observe any difference in the amount of conservativeness or anti-conservativeness between the two procedures. Furthermore, the distributions are widely spread over the range of [0, 1], so that the figures do not clearly show the location shift of the distributions from the nominal q*.


Figure 1
View larger version (11K):
[in this window]
[in a new window]
 
Fig. 1 Distribution of R/Ro from 4000 simulations under q* = 0.5 and 0.01 and m1 = 20. (a) q* = 0.5 (SAM), (b) q* = 0.5 (BH), (c) q* = 0.01 (SAM), (d) q*=0.01 (BH).

 
4.2 An example with survival data
Beer et al. (2002) generated expression profiles of m = 4966 genes to discover the genes that can predict disease progression. The data include n = 86 stage I or III lung cancer patients, of whom 24 patients have disease progressions. By controlling the FWER at 10% level, Jung et al. (2005) discovered two genes whose expression levels are significantly associated with the time to progression. In this section, we consider the same test statistics standardized by the standard errors as described in Section 3.2. Simulations are conducted at similar settings to those in the previous example.

The simulation results are reported in Table 2. We observe that both SAM and BH conservatively control the FDR in the simulation settings. BH is slightly more conservative than SAM, but they become similar as m1 decreases. SAM controls the FDR very accurately for q* ≤ 0.1 which is the range of interest in usual data analyses. SAM becomes more accurate with a larger m1. Using {lambda} = 0.5, we obtain Formula 3(Formula 3) from the original data.


View this table:
[in this window]
[in a new window]
 
Table 2 Simulation results for Beer et al. (2002)

 
In all the simulation settings, the exact method has the largest Formula 3 and BH has the smallest Formula 3, although the difference is small. When controlling the FDR at q* = 10% level, the exact method has only about Formula 3 of true rejections for m1 ≥ 60. The other two methods have lower true rejection rates.

When m1 ≤ 200, the exact method has the smallest Formula 3 and SAM has the largest Formula 3 among the three methods. The discrepancy in Formula 3 among the three methods becomes smaller as m1 increases and q* decreases.


    5 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 FDR-BASED MULTIPLE TESTING...
 3 A SIMULATION-BASED METHOD
 4 NUMERICAL STUDIES
 5 DISCUSSION
 APPENDIX
 REFERENCES
 
Benjamini and Hochberg (1995) and Benjamini and Yekutieli (2001) prove that their approach conservatively controls the FDR under independence and weak dependence of m0 test statistics for which null hypotheses are true. However, from the first example of Section 4, we found that the BH method can be anti-conservative in a general correlation combined with a small number of differentially expressing genes, m1, and a small nominal FDR level, q*. Storey et al. (2004) also claim that SAM procedure conservatively controls the FDR under weak dependency. From the first example, it is shown that SAM can be anti-conservative too (with small m1 and q*).

If an FDR-based procedure does not control the FDR accurately in a real data analysis, there may be two potential reasons: (1) the test statistics are heavily correlated or (2) the null distribution of each test statistic cannot be approximated by the standard normal distribution because the sample size is not large enough for the normal approximation of the test statistics. In order to avoid issue (2) and focus our discussion on (1), we generated {varepsilon}s from N(0, 1) distribution so that the simulated test statistics have exactly normal distributions. As mentioned in Appendix, however, if n is large enough, {varepsilon}s can be generated from any distribution with mean 0 and variance 1, such as Formula 3.

From simulations not reported in this paper, we observed that SAM closely estimates E{R0({alpha})}E{R–1({alpha})} = {alpha}m0 E{R–1({alpha})} rather than the FDR, E{R0({alpha})/R({alpha})}.

As mentioned previously, our procedure is to check the performance of the existing multiple testing methods for a given dataset, so that it is different from the simulation-based multiple testing procedures used for data analysis. However, a similar simulation method can be used to derive a testing procedure too, see Lin (2005). Given {gamma} isin (0, 1), van der Laan et al. (2004) and Lehmann and Romano (2005) propose a conservative procedure to control the false discovery proportion, defined as P(R0/R > {gamma}), at a certain level. Our simulation method can be easily modified to evaluate the conservativeness of their procedure for a given dataset.

We have discussed our procedure in terms of microarray data, but it can be used for any high dimensional data involving multiple testing using dependent test statistics, e.g. proteomic data. The simulation programs are written in Fortran 77. Uniform random numbers were generated using RAN2 subroutine of Press et al. (1980).


    APPENDIX
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 FDR-BASED MULTIPLE TESTING...
 3 A SIMULATION-BASED METHOD
 4 NUMERICAL STUDIES
 5 DISCUSSION
 APPENDIX
 REFERENCES
 
It suffices to show that the conditional joint distribution of Formula 3 given the data, Formula 3, is asymptotically identical to the unconditional joint distribution of (U1, ... , Um). As discussed in Section 3.2, (U1, ... , Um) is asymptotically normal with means and covariances that can be consistently estimated by Formula 3 and Formula 3 for 1 ≤ j, j' ≤ m, respectively.

Given the data, Formula 3 and Formula 3are constants, so that Formula 3 for j = 1, ... , m are weighted sums of IID N(0, 1) random variables, {varepsilon}1, ... , {varepsilon}n. Hence, Formula 3 is normal with means

Formula
and covariances

Formula
which concludes the proof.

In fact, {varepsilon}i's can be generated from any distribution with mean 0 and variance 1, such as Formula 3. In this case, the conditional joint distribution of Formula 3 given Formula 3 is approximated by the same distribution by the central limit theorem.


    Acknowledgments
 
The authors want to thank the two reviewers for their valuable comments.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: David Rocke

Received on December 5, 2005; revised on April 11, 2006; accepted on April 23, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 FDR-BASED MULTIPLE TESTING...
 3 A SIMULATION-BASED METHOD
 4 NUMERICAL STUDIES
 5 DISCUSSION
 APPENDIX
 REFERENCES
 

    Anderson, P.K. and Gill, R.D. (1982) Cox's regression model for counting processes: a large sample study. Ann. Stat, . 10, 1100–1120.

    Beer, D.G., et al. (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med, . 8, 816–824[Web of Science][Medline].

    Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289–300.

    Benjamini, Y. and Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. Ann. Stat, . 29, 1165–1188[CrossRef].

    Cox, D.R. (1972) Regression models and life-tables (with discussion). J. R. Stat. Soc. Ser. B, 34, 187–220.

    Gentleman, R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol, . 5, R80[CrossRef][Medline].

    Golub, T.R., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537[Abstract/Free Full Text].

    Huang, Y., Xu, H., Calian, V., Hsu, J.C. (2005) To permute or not permute. Technical Report 756. , OH Department of Statistics, Ohio State University.

    Jung, S.-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics, 21, 3097–3104[Abstract/Free Full Text].

    Jung, S.-H., et al. (2005) A multiple testing procedure to associate gene expression levels with survival. Stat. Med, . 21, 3097–3104.

    Lehmann, E.L. and Romano, J.P. (2005) Generalizations of the familywise error rate. Ann. Stat, . 33, 1138–1154[CrossRef].

    Lin, D.Y. (2005) An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics, 21, 781–787[Abstract/Free Full Text].

    Lin, D.Y. and Wei, L.J. (1989) The robust inference for the Cox proportional hazards model. J. Am. Stat. Assoc, . 84, 1074–1078[CrossRef][Web of Science].

    Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. Numerical Recipes: The Art of Scientific Computing, (1980) , NY Cambridge University Press.

    Storey, J.D. (2002) A direct approach to false discovery rates under dependence. J. R. Stat. Soc. Ser. B, 64, 479–498[CrossRef].

    Storey, J.D. (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. Ann. Stat, . 31, 2013–2035[CrossRef].

    Storey, J.D., et al. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Stat. Soc. Ser. B, 66, 187–205[CrossRef].

    Storey, J.D. and Tibshirani, R. (2001) Estimating false discovery rates under dependence, with applications to DNA microarrays. , CA Technical Report 2001–28 Department of Statistics, Stanford University.

    Tusher, V., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121[Abstract/Free Full Text].

    van der Laan, M.J., et al. (2004) Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives. Stat. Appl. Gene. Mol. Biol, . 3, 15.

    Westfall, P.H. and Young, S.S. Resampling-based Multiple Testing: Examples and Methods for P-value Adjustment, (1993) , NY Wiley.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
J. Clin. Endocrinol. Metab.Home page
E. C. Kaizer, C. L. Glaser, D. Chaussabel, J. Banchereau, V. Pascual, and P. C. White
Gene Expression in Peripheral Blood Mononuclear Cells from Children with Diabetes
J. Clin. Endocrinol. Metab., September 1, 2007; 92(9): 3705 - 3711.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
T. S. Mehta, S. O. Zakharkin, G. L. Gadbury, and D. B. Allison
Epistemological issues in omics and high-dimensional biology: give the people what they want
Physiol Genomics, December 13, 2006; 28(1): 24 - 32.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/14/1730    most recent
btl161v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (6)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Jung, S.-H.
Right arrow Articles by Jang, W.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Jung, S.-H.
Right arrow Articles by Jang, W.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?