Bioinformatics Advance Access originally published online on June 23, 2008
Bioinformatics 2008 24(15):1655-1661; doi:10.1093/bioinformatics/btn310
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
On correcting the overestimation of the permutation-based false discovery rate estimator
Department of Statistics, University of Nebraska Lincoln, Lincoln, NE 68526, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR), which is defined as the expected percentage of the number of false positive genes among the claimed significant genes. As a consequence, the accuracy of the FDR estimators will be important for correctly controlling FDR. Xie et al. found that the standard permutation method of estimating FDR is biased and proposed to delete the predicted differentially expressed (DE) genes in the estimation of FDR for one-sample comparison. However, we notice that the formula of the FDR used in their paper is incorrect. This makes the comparison results reported in their paper unconvincing. Other problems with their method include the biased estimation of FDR caused by over- or under-deletion of DE genes in the estimation of FDR and by the implicit use of an unreasonable estimator of the true proportion of equivalently expressed (EE) genes. Due to the great importance of accurate FDR estimation in microarray data analysis, it is necessary to point out such problems and propose improved methods.
Results: Our results confirm that the standard permutation method overestimates the FDR. With the correct FDR formula, we show the method of Xie et al. always gives biased estimation of FDR: it overestimates when the number of claimed significant genes is small, and underestimates when the number of claimed significant genes is large. To overcome these problems, we propose two modifications. The simulation results show that our estimator gives more accurate estimation.
Contact: szhang3{at}unl.edu
| 1 INTRODUCTION |
|---|
|
|
|---|
The use of microarray technology makes it possible to monitor the expression levels of thousands of genes simultaneously. A common goal of analyzing the genome-wide expression data generated from this technology is to detect differentially expressed (DE) genes. Now, as the cost of microarray experiments keeps decreasing, replicated microarray experiments are feasible.
Numerous methods (parametric and non-parametric) have been introduced to detect DE genes. Some of the most well-known parametric approaches include the regression approach of Thomas et al. (2001), the empirical Bayes (EB) methods of Newton et al. (2001) and Kendziorski et al. (2003) and the linear models and EB methods of Smyth (2004). Among the non-parametric methods, some well known names include the EB method of Efron et al. (2001), the significance analysis of microarray (SAM) of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2003).
False discovery rate (FDR) introduced by Benjamini and Hochberg (1995) is now commonly used as the choice of the Type I error rate in microarray studies. It is defined as the expected percentage of false positive (FP) genes among the claimed significant genes. It was proved that in many cases controlling FDR is more appropriate compared to controlling family-wise error rate (FWER) since the FDR approaches typically reject more null hypotheses than the FWER approaches (Benjamini and Yekutieli, 2001; Yekutieli and Benjamini, 1999). Several FDR controlling methods are implemented in the R multtest package (Pollard et al., 2004).
However, the true FDR is unknown in practice. Hence, the estimated FDR will serve as the criterion to compare different methods when controlling the error rates. The comparison results are reasonable only if the estimated FDR approximates the true FDR well. The most common method of estimating the FDR is to use the permutation method. However, it has been reported in the literature that the permutation-based FDR estimator tends to overestimate the true FDR. A number of papers has discussed the correction of the overestimation problem of the permutation method (Guo and Pan, 2005; Pan, 2003; Zhao and Pan, 2003; Zhang, 2006).
Xie et al. (2005) also noticed the overestimation problem of standard permutation method. Their paper showed that the over-estimation of FDR is caused by the fact that the distribution of null statistics generated from the permutation method is more dispersed than the true null distribution of the test statistics. To solve the problem, they proposed to exclude the predicted DE genes from the estimation of FDR. However, we find that their proposed method has serious under- or over-estimation problem depending on the number of genes declared significant. In addition, we found that they used an incorrect formula of FDR, and hence the comparison results reported in their paper are not correct and the conclusions they drew might be misleading. More seriously, we found that Xie et al. (2005) implicitly used an estimator of the proportion of equivalently expressed (EE) genes (
0) which can only provide good estimate of
0 when the number of genes declared significant is equal or close to the true number of DE genes in the microarray data and is otherwise biased.
| 2 METHODS |
|---|
|
|
|---|
2.1 The test statistics and the null statistics
As in Xie et al. (2005), only one-sample comparison will be considered in this article. Suppose that Yij is the expression level of gene i in array j (i=1,2, ..., n; j=1, ..., k). The goal is to test the following hypothesis: H0:E(Yij)=0 against H1:E(Yij)
0. We use the same three test statistics as in Xie et al. (2005) for the purpose of comparison:- The mean statistic:
,
- The t-statistic:
,
- The SAM statistic:
,
, where b=1, ..., B, and i=1, ..., n.
2.2 Method for FDR estimation
Given the test statistics Zi and a fixed cutoff value d, define TS(d)=#{i:|Zi|>d} as the total number of significant genes; FP(d)=#{i:|Zi|>d,i
EE} as the number of FP genes, where EE is the set of all EE genes;
0 as the proportion of EE genes; and
as its estimator. According to Storey and Tibshirani (2003), the FDR can be approximated as
|
| (1) |
|
| (2) |
To estimate FDR, the standard method is to use the permutated null statistics. Define
|
| (3) |
0. Storey and Tibshirani (2003) suggested to estimate the FDR by
|
| (4) |
d'}, where Si is the SAM statistic and d' is chosen so that the number of genes not in set D(d) is the same as TS(d). In other words, D(d)=
–TS(d), where
is the set of all genes. |
| (5) |
|
| (6) |
|
| (7) |
In Xie et al. (2005), the above method was proved to be able to correct the FDR overestimation problem of the permutation method effectively. However, our study has found that (6) has four major problems:
- In Xie et al. (2005), the true FDR formula (2) is incorrectly defined as
This mistake will affect the evaluation of their proposed FDR estimator.
(8)
- In Xie et al. (2005), the SAM statistic was used to define the set D(d), which is used in (5) to estimate the number of FP even if the test statistic is the mean or t-statistics. This is unreasonable. If one has chosen the mean or t-statistic as the test statistic, why would he/she use a different statistic to estimate the number of FP? The only explanation is that the mean statistic and the t-statistic do not provide results as good as the SAM statistic does. Note that the mean statistic and the t-statistic can be viewed as two extreme cases of the SAM statistic with the fudge factor equal to
and 0, respectively. It is well known that the performance of the testing procedure based on the mean statistic and the t-statistic is generally inferior to that based on the SAM statistic.
- It can be seen from (7) that Xie et al. (2005) implicitly uses
=1–TS(d)/n as an estimate of
0. Noticing that TS(d) is the number of claimed significant genes, such
can range from 0 to 1 for TS(d) from n to 0. As a consequence, one will always under- or over-estimate
0 unless TS(d)=the true number of DE genes.
- The over- or under-estimation of FDR due to under- or over- deletion of genes, which will be discussed in Section 2.3.
2.3 Our proposed method for FDR estimation
Considering the unreasonable estimates
of Xie et al. (2005) may provide, we suggest estimating
0 by the method introduced in Storey and Tibshirani (2003), which is implemented in SAM. In their paper, they calculated P-values for each gene. Denote the P-values by p1,p2,...,pn. Then,
0 is estimated by
, where
is a tuning parameter. As we can see,
is a constant no matter how TS(d) changes. In addition, after a test statistic Zi is determined, we use the same test statistic for both identifying the DE genes and defining the set D(d). In other words, D(d)={i:|Zi|
{d}. With
and this new D(d), we propose the following FDR estimator
|
| (9) |
The estimator
corrects Xie et al.'s method by using a more reasonable estimator of
0. However, another question comes to light: Is removing all the predicted DE genes a proper way of estimating the FDR? As we know, what we really want is to remove all the DE genes and use all the EE genes to construct the null statistics. However, in those predicted DE genes, there are some genes which are actually EE genes, but are falsely identified as positive (FP genes). It is obvious that the FP genes are the EE genes with the greatest test statistics in absolute values. Therefore, excluding such genes will cause underestimation of the tail of the null distribution. In Section 3.2, we will show that removing all the predicted DE genes gives significantly different FP estimates from those obtained by removing the true DE genes (which is not feasible in practice but good for comparison).
Since removing all predicted DE genes will cause underestimation of the FDR, an intuitive solution would be to add the FP genes back into the pool of the genes for the estimation of the FDR. For this purpose, we propose the following two-step procedure to estimate the FDR, in which the first step is to remove all the predicted DE genes and the second step is trying to re-include the possible FP genes to construct the null statistics:
- Suppose Zi is the test statistic, for any given d>0, any gene i with |Zi|>d is said to be significant. Let TS(d)=#{i:|Zi|>d}, D(d)={i:|Zi|
{d},
, and
.
- Using
from Step 1, let D(d')={i:|Zi|
{d'}, d' is chosen such that the number of genes not in D(d') is
. Then following the same procedure as Step 1, we get
, and

(10)
The idea behind our proposed method is as follows: when the number of predicted DE genes is greater than the true number of significant genes, there will be a substantial number of FP genes in them. Since removing all predicted DE genes will cause biased estimation of the FDR, we only remove the genes which we consider are most likely to be true DE genes.
| 3 RESULTS |
|---|
|
|
|---|
3.1 Problems caused by using Xie et al.'s estimate of
0In Xie et al. (2005),
0 is estimated by
0 by
To show this, 5 (=k) replicates of 4000(=n) genes are generated, among which 400 are DE genes and the others are EE genes. The expression levels Yij for EE genes are generated from N(0,4) and Yij for DE gene are generated from N(µi,4), while µi
N(0,16). The SAM, mean and t-statistics are used as the test statistics. Our purpose is to compare the FDR estimator of Xie et al. (2005) (
) from (7) and one of our proposed estimator (
) from (9). The values of the standard FDR estimator from (4) and the true FDR values are also plotted as references.
Overestimation of FDR when TS(d) is smaller than the true number of DE genes.
In this scenario, TS(d) is set to be from 100 to 200, which is much less than the true number of DE genes (=400). In Figure 1, as we expected,
always overestimates the true FDR while
provides much less biased estimates. In some cases,
still gives overestimation. This overestimation is caused by the fact that
also always overestimates the true
0, but to a much lesser degree.
|
Underestimation of FDR when TS(d) is greater than the true number of DE genes.
The same simulation set-up is used as above except now TS(d) is set to be from 500 to 600, which is greater than the the true number of DE genes (=400).
As shown in Figure 2, for the t and SAM statistics, Xie et al.'s method underestimates the true FDR while our proposed method gives much more accurate estimates. However, for the mean statistic, our method does not give any improvement over Xie et al.'s method. The reason is that the SAM statistic was used to predict DE genes in Xie et al. (2005) while our method
uses the same mean statistic in both predicting the DE genes and estimating the FDR. The better performance of Xie et al.'s method in this case is due to the use of the SAM statistic in predicting DE genes, rather than the method itself. As it can be seen from the top plot of Figure 2, our estimator
performs significantly better than Xie et al.'s method when the SAM statistic is used.
|
3.2 Underestimation caused by removing the predicted DE genes
In this section, we show that removing all predicted DE genes will lead to an underestimation of the true FP number. We generate $n=4000$ genes with $k=5$, while 150 of them are DE genes. The expression levels for EE and DE genes are generated in the same way as in Section 3.1. The number of claimed significant genes is set to be 150, which is the number of true DE genes. Table 1 lists the true FP number, the estimated FP number with 150 predicted DE genes removed (
From Table 1, we can see
is always less than
. This shows removing predicted DE genes gives a smaller estimate of FP number than that of removing the true DE genes.
|
3.3 Performance of our methods
To evaluate the performance of our methods, the same simulation set-ups are used as those in Section 3.2. We want to see whether our proposed estimator
We compare four different FDR estimation methods: the standard estimator
from (4), Xie et al. (2005) estimator
from (7), and two estimators we proposed:
from (9),
from (10).
Figure 3 shows that the estimator of Xie et al. (2005) always significantly underestimates the true FDR's. The estimator
also underestimates FDR due to over-deletion, but is much better than Xie et al.'s estimator for the SAM statistic. For the mean and t-statistics, Xie et al.'s estimator outperforms
sometimes due to the same reason discussed previously—the use of the SAM statistic in obtaining the predicted DE genes. In contrast,
does not have this problem. However, for the SAM statistic and the t-statistic,
slightly overestimates the true FDR. This overestimation is not caused by the estimator
, but by the overestimation of
0 caused by
. To see this, we replaced
in (9) for
and in (10) for
with the true
0=3850/4000. Figure 4 shows the comparison between the true FDR and the estimated FDR from (9) and (10) with the true value of
0. We can see that
now gives smaller estimates of FDR for all three test statistics compared to Figure 3. Another fact worth noticing in Figure 3 and 4 is that when the number of claimed significant genes is small,
does not show much advantage. The reason is that, in such a case, most of the significant genes are true DE genes and the number of FP genes is much smaller than the number of true DE genes. Hence, removing the FP genes is not going to have significant impact on the estimation of the FDR.
|
|
3.4 Comparisons under other simulation set-ups
We also want to see how the ratio of induced (I) and repressed (R) genes influences the performance of the FDR estimators. Here, k=5, n=4000 and there are 150 DE genes. The expression level Yij for EE genes are generated from N(0,4). For DE genes, n' of them are generated from N(4,4), and the rest of them are generated from N(–4,4); where n'=150,100,50,0. We set the number of claimed significant genes as 300. The results reported in Table 2 are the averages from 50 replications.
|
The results confirm that our methods are stable to the change of ratios of the induced and repressed genes.
We have also conducted another simulation which tries to mimic the real data. Similar simulation set-up as above is used except the expression level Yij for EE genes are generated from N(0,
) while 

Gamma(4,2) and Yij for DE gene are generated from N(µi,
) while µi
N(0,16), 

Gamma(4,2).
From Figure 5, we can see that the results are similar as before for the SAM and t-statistics: the standard method always overestimates and method of Xie et al. (2005) always underestimates.
performs better than Xie et al.'s method and
always performs the best.
|
3.5 Biological data
In Zhong et al. (2004), duplications and deletions in an evolved strain (DD2459) were identified by a whole-genome Escherichia coli MG1655 spotted DNA microarray experiment with three replicates. Thirty-eight genes have been confirmed to be true duplicated/deleted genes by rtPCR. To compare our proposed estimator
|
| 4 DISCUSSION |
|---|
|
|
|---|
In this article, we have showed that the bias-corrected FDR estimator proposed in Xie et al. (2005) uses an inappropriate estimate of
0 and still has severe under- or over-estimation problem. We have proposed two new modifications to overcome those problems. Simulation studies and application to real data have confirmed that our estimator Current null statistics are constructed by randomly assigning the + or – signs to replicates of genes. As a consequence, the number of + and – signs can be different in this random assignment. Mean expression levels of EE genes will always be 0 regardless of the way of assigning the signs. However, when there is an unbalanced number of + and –, the mean expression levels of DE genes will not be 0, which may cause the null statistics of DE genes to have different distributions from that of EE genes. Hence, it is intuitive to deduce that if we make the number of + and – stay balanced, this problem can be avoided. In Pan (2003) and Zhang (2006), they proposed a series of such kind of balanced null statistics, which have the same distribution for both DE and EE genes. It would be interesting to compare the performance of our FDR estimators and estimators based on balanced null statistics in the future research.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
We would like to thank the Associate Editor and three referees for their constructive comments which have significantly improved the quality of the article. We also thank Dr Shaobin Zhong for providing us the microarray data.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alex Bateman
Received on February 14, 2008; revised on June 11, 2008; accepted on June 12, 2008
| REFERENCES |
|---|
|
|
|---|
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a pratical and powerful approach to multiple testing. J. R. Stat. Soc (1995) 57:289–300.
Benjamini Y, Yekutieli D. The control of the False discovery rate in multiple testing under dependency. Ann. Stat (2001) 29:1165–1188.[CrossRef]
Efron B, et al. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc (2001) 96:1151–1160.[CrossRef][Web of Science]
Guo X, Pan W. Using weighted permutation scores to detect differential gene expression with microarray data. J. Comput. Biol (2005) 3:989–1006.
Kendziorski CM, et al. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med (2003) 22:3899–3914.[CrossRef][Web of Science][Medline]
Kerr MK, et al. Analysis of variance for gene expression microarray data. J. Comput. Biol (2000) 7:19–837.
Newton MA, et al. On differentially variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol (2001) 8:37–52.[CrossRef][Web of Science][Medline]
Pan W, et al. A mixture model approach to detecting differentially expressed genes with microarray data. Funct. Integr. Genomics (2003) 3:117–124.[CrossRef][Medline]
Pan W. On the use of permutation in the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics (2003) 19:1333–1040.
Pollard KS, et al. Multiple testing procedures: R multtest package and applications to genomics. (2004) last accessed date December 2004. 164. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper, Available at http://www.bepress.com/ucbbiostat/paper164.
Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Gene. Mol. Biol (2004) 3. Article 3.
Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.
Thomas JG, et al. An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res (2001) 11:1227–1236.
Tusher VG, et al. Significant analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA (2001) 98:5116–5121.
Xie Y, et al. A note on using permutation based false discovery rate estimate to compare different analysis methods for microarray data. Bioinformatics (2005) 21:4280–4288.
Yekutieli D, Benjamini Y. Resampling based false discovery rate controlling multiple testing procedure for correlated test statistics. J. Stat. Plann. Inference (1999) 82:171–196.[CrossRef]
Zhao Y, Pan W. Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments. Bioinformatics (2003) 19:1046–1054.
Zhang S. An improved nonparametric approach for detecting differentially expressed genes with replicated microarray data. Stat. Appl. Gene. Mol. Biol (2006) 5. Article 30.
Zhong S, et al. Evolutionary genomics of ecological specialization. Proc. Natl Acad. Sci. USA (2004) 101:11719–11724.
This article has been cited by other articles:
![]() |
Y. Xie Comments on 'On correcting the overestimation of the permutation-based false discovery rate estimator' Bioinformatics, October 15, 2008; 24(20): 2420 - 2420. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||





