Skip Navigation


Bioinformatics Advance Access originally published online on June 23, 2008
Bioinformatics 2008 24(15):1655-1661; doi:10.1093/bioinformatics/btn310
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
24/15/1655    most recent
btn310v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Jiao, S.
Right arrow Articles by Zhang, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Jiao, S.
Right arrow Articles by Zhang, S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

On correcting the overestimation of the permutation-based false discovery rate estimator

Shuo Jiao and Shunpu Zhang *

Department of Statistics, University of Nebraska Lincoln, Lincoln, NE 68526, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR), which is defined as the expected percentage of the number of false positive genes among the claimed significant genes. As a consequence, the accuracy of the FDR estimators will be important for correctly controlling FDR. Xie et al. found that the standard permutation method of estimating FDR is biased and proposed to delete the predicted differentially expressed (DE) genes in the estimation of FDR for one-sample comparison. However, we notice that the formula of the FDR used in their paper is incorrect. This makes the comparison results reported in their paper unconvincing. Other problems with their method include the biased estimation of FDR caused by over- or under-deletion of DE genes in the estimation of FDR and by the implicit use of an unreasonable estimator of the true proportion of equivalently expressed (EE) genes. Due to the great importance of accurate FDR estimation in microarray data analysis, it is necessary to point out such problems and propose improved methods.

Results: Our results confirm that the standard permutation method overestimates the FDR. With the correct FDR formula, we show the method of Xie et al. always gives biased estimation of FDR: it overestimates when the number of claimed significant genes is small, and underestimates when the number of claimed significant genes is large. To overcome these problems, we propose two modifications. The simulation results show that our estimator gives more accurate estimation.

Contact: szhang3{at}unl.edu


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
The use of microarray technology makes it possible to monitor the expression levels of thousands of genes simultaneously. A common goal of analyzing the genome-wide expression data generated from this technology is to detect differentially expressed (DE) genes. Now, as the cost of microarray experiments keeps decreasing, replicated microarray experiments are feasible.

Numerous methods (parametric and non-parametric) have been introduced to detect DE genes. Some of the most well-known parametric approaches include the regression approach of Thomas et al. (2001), the empirical Bayes (EB) methods of Newton et al. (2001) and Kendziorski et al. (2003) and the linear models and EB methods of Smyth (2004). Among the non-parametric methods, some well known names include the EB method of Efron et al. (2001), the significance analysis of microarray (SAM) of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2003).

False discovery rate (FDR) introduced by Benjamini and Hochberg (1995) is now commonly used as the choice of the Type I error rate in microarray studies. It is defined as the expected percentage of false positive (FP) genes among the claimed significant genes. It was proved that in many cases controlling FDR is more appropriate compared to controlling family-wise error rate (FWER) since the FDR approaches typically reject more null hypotheses than the FWER approaches (Benjamini and Yekutieli, 2001; Yekutieli and Benjamini, 1999). Several FDR controlling methods are implemented in the R multtest package (Pollard et al., 2004).

However, the true FDR is unknown in practice. Hence, the estimated FDR will serve as the criterion to compare different methods when controlling the error rates. The comparison results are reasonable only if the estimated FDR approximates the true FDR well. The most common method of estimating the FDR is to use the permutation method. However, it has been reported in the literature that the permutation-based FDR estimator tends to overestimate the true FDR. A number of papers has discussed the correction of the overestimation problem of the permutation method (Guo and Pan, 2005; Pan, 2003; Zhao and Pan, 2003; Zhang, 2006).

Xie et al. (2005) also noticed the overestimation problem of standard permutation method. Their paper showed that the over-estimation of FDR is caused by the fact that the distribution of null statistics generated from the permutation method is more dispersed than the true null distribution of the test statistics. To solve the problem, they proposed to exclude the predicted DE genes from the estimation of FDR. However, we find that their proposed method has serious under- or over-estimation problem depending on the number of genes declared significant. In addition, we found that they used an incorrect formula of FDR, and hence the comparison results reported in their paper are not correct and the conclusions they drew might be misleading. More seriously, we found that Xie et al. (2005) implicitly used an estimator of the proportion of equivalently expressed (EE) genes ({pi}0) which can only provide good estimate of {pi}0 when the number of genes declared significant is equal or close to the true number of DE genes in the microarray data and is otherwise biased.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
2.1 The test statistics and the null statistics
As in Xie et al. (2005), only one-sample comparison will be considered in this article. Suppose that Yij is the expression level of gene i in array j (i=1,2, ..., n; j=1, ..., k). The goal is to test the following hypothesis: H0:E(Yij)=0 against H1:E(Yij)!=0. We use the same three test statistics as in Xie et al. (2005) for the purpose of comparison:

  1. The mean statistic: Formula ,
  2. The t-statistic: Formula ,
  3. The SAM statistic: Formula ,
where Formula , Formula , and V0 is the fudge factor used to stabilize the variance.In this article, we will focus on the permutation-based method for estimating the FDR. The key issue in the permutation-based method is the generation of the so-called null statistics (the values of the test statistic when the genes are EE). For convenience, we shall use Zi as a general notation to denote the test statistic and use zi to denote its corresponding null statistic. In the standard permutation method, one set of null statistics is calculated by applying the test statistic to one set of permuted data. The set of permuted data is obtained by randomly assigning the ‘+’ or ‘–’ signs on each Yi1,..., Yik (SAM). Suppose the number of permutations is B, applying the test statistic to the b-th set of permutated data will create the b-th set of null statistics zFormula, where b=1, ..., B, and i=1, ..., n.

2.2 Method for FDR estimation
Given the test statistics Zi and a fixed cutoff value d, define TS(d)=#{i:|Zi|>d} as the total number of significant genes; FP(d)=#{i:|Zi|>d,iisinEE} as the number of FP genes, where EE is the set of all EE genes; {pi}0 as the proportion of EE genes; and Formula as its estimator. According to Storey and Tibshirani (2003), the FDR can be approximated as


Formula 1

(1)
A practical version of FDR is the false discovery proportions (FDP) defined by


Formula 2

(2)

To estimate FDR, the standard method is to use the permutated null statistics. Define


Formula 3

(3)
Notice that Formula is actually an estimate of FP(d)/{pi}0. Storey and Tibshirani (2003) suggested to estimate the FDR by


Formula 4

(4)
However, as shown in Xie et al. (2005), although the null statistics of EE genes have the true null distribution of test statistics, the null statistics of DE genes are more dispersed than those of EE genes. As a result, the empirical distribution of the null statistics from all genes is not a good approximation to the true null distribution. To overcome this problem, Xie et al. (2005) proposed a new FDR estimator. Their idea is as follows: since the over-estimation problem of standard permutation method is caused by the DE genes, using only EE genes to construct the null distribution will avoid this problem. Nevertheless, in practice which genes are EE genes is unknown. Therefore, they proposed to use the predicted EE genes to estimate the FDR. Their FDR estimation procedure works as follows: suppose Zi is the test statistic and Si is the SAM statistic, for any given d>0, any gene i with |Si|>d is said to be significant. TS(d) is defined the same as before. Define a set of non-significant genes D(d)={i:|Si|≤d'}, where Si is the SAM statistic and d' is chosen so that the number of genes not in set D(d) is the same as TS(d). In other words, D(d)={Omega}TS(d), where {Omega} is the set of all genes. Formula is then estimated by constructing B sets of null statistics as before. The only difference is that only genes in D(d) are going to be used this time. Let


Formula 5

(5)
Then, the FDR is estimated by


Formula 6

(6)
Note that Formula in (6) is the average number of significant genes found from the genes in D(d). We can re-write (6) in the form of (4) as


Formula 7

(7)
where Formula can be viewed as the average number of significant genes if all n genes are EE and Formula is the estimated proportion of EE genes in the microarray data.

In Xie et al. (2005), the above method was proved to be able to correct the FDR overestimation problem of the permutation method effectively. However, our study has found that (6) has four major problems:

  1. In Xie et al. (2005), the true FDR formula (2) is incorrectly defined as


    Formula 8

    (8)
    This mistake will affect the evaluation of their proposed FDR estimator.

  2. In Xie et al. (2005), the SAM statistic was used to define the set D(d), which is used in (5) to estimate the number of FP even if the test statistic is the mean or t-statistics. This is unreasonable. If one has chosen the mean or t-statistic as the test statistic, why would he/she use a different statistic to estimate the number of FP? The only explanation is that the mean statistic and the t-statistic do not provide results as good as the SAM statistic does. Note that the mean statistic and the t-statistic can be viewed as two extreme cases of the SAM statistic with the fudge factor equal to {infty} and 0, respectively. It is well known that the performance of the testing procedure based on the mean statistic and the t-statistic is generally inferior to that based on the SAM statistic.
  3. It can be seen from (7) that Xie et al. (2005) implicitly uses Formula =1–TS(d)/n as an estimate of {pi}0. Noticing that TS(d) is the number of claimed significant genes, such Formula can range from 0 to 1 for TS(d) from n to 0. As a consequence, one will always under- or over-estimate {pi}0 unless TS(d)=the true number of DE genes.
  4. The over- or under-estimation of FDR due to under- or over- deletion of genes, which will be discussed in Section 2.3.

2.3 Our proposed method for FDR estimation
Considering the unreasonable estimates Formula of Xie et al. (2005) may provide, we suggest estimating {pi}0 by the method introduced in Storey and Tibshirani (2003), which is implemented in SAM. In their paper, they calculated P-values for each gene. Denote the P-values by p1,p2,...,pn. Then, {pi}0 is estimated by Formula , where {lambda} is a tuning parameter. As we can see, Formula is a constant no matter how TS(d) changes. In addition, after a test statistic Zi is determined, we use the same test statistic for both identifying the DE genes and defining the set D(d). In other words, D(d)={i:|Zi|≤{d}. With Formula and this new D(d), we propose the following FDR estimator


Formula 9

(9)
where Formula .

The estimator Formula corrects Xie et al.'s method by using a more reasonable estimator of {pi}0. However, another question comes to light: Is removing all the predicted DE genes a proper way of estimating the FDR? As we know, what we really want is to remove all the DE genes and use all the EE genes to construct the null statistics. However, in those predicted DE genes, there are some genes which are actually EE genes, but are falsely identified as positive (FP genes). It is obvious that the FP genes are the EE genes with the greatest test statistics in absolute values. Therefore, excluding such genes will cause underestimation of the tail of the null distribution. In Section 3.2, we will show that removing all the predicted DE genes gives significantly different FP estimates from those obtained by removing the true DE genes (which is not feasible in practice but good for comparison).

Since removing all predicted DE genes will cause underestimation of the FDR, an intuitive solution would be to add the FP genes back into the pool of the genes for the estimation of the FDR. For this purpose, we propose the following two-step procedure to estimate the FDR, in which the first step is to remove all the predicted DE genes and the second step is trying to re-include the possible FP genes to construct the null statistics:

  1. Suppose Zi is the test statistic, for any given d>0, any gene i with |Zi|>d is said to be significant. Let TS(d)=#{i:|Zi|>d}, D(d)={i:|Zi|≤{d}, Formula , and Formula .
  2. Using Formula from Step 1, let D(d')={i:|Zi|≤{d'}, d' is chosen such that the number of genes not in D(d') is Formula . Then following the same procedure as Step 1, we get Formula , and


    Formula 10

    (10)

where Formula .

The idea behind our proposed method is as follows: when the number of predicted DE genes is greater than the true number of significant genes, there will be a substantial number of FP genes in them. Since removing all predicted DE genes will cause biased estimation of the FDR, we only remove the genes which we consider are most likely to be true DE genes.


    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
3.1 Problems caused by using Xie et al.'s estimate of {pi}0
In Xie et al. (2005), {pi}0 is estimated by Formula . As stated before, we would expect to see over-or under-estimation of FDR by this method because of the over-or under-estimation of {pi}0 by Formula .

To show this, 5 (=k) replicates of 4000(=n) genes are generated, among which 400 are DE genes and the others are EE genes. The expression levels Yij for EE genes are generated from N(0,4) and Yij for DE gene are generated from Ni,4), while µi~N(0,16). The SAM, mean and t-statistics are used as the test statistics. Our purpose is to compare the FDR estimator of Xie et al. (2005) (Formula ) from (7) and one of our proposed estimator (Formula ) from (9). The values of the standard FDR estimator from (4) and the true FDR values are also plotted as references.

Overestimation of FDR when TS(d) is smaller than the true number of DE genes.

In this scenario, TS(d) is set to be from 100 to 200, which is much less than the true number of DE genes (=400). In Figure 1, as we expected, Formula always overestimates the true FDR while Formula provides much less biased estimates. In some cases, Formula still gives overestimation. This overestimation is caused by the fact that Formula also always overestimates the true {pi}0, but to a much lesser degree.


Figure 1
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. FDR curves of different estimation methods using the SAM, mean and t-statistics. There are 400 DE genes among 4000 genes. The number of claimed significant gene ranges from 100 to 200. Figure 1 is used as the estimate of {pi}0. Our method 1 is the estimator Figure 1 from (9).

 
Underestimation of FDR when TS(d) is greater than the true number of DE genes.

The same simulation set-up is used as above except now TS(d) is set to be from 500 to 600, which is greater than the the true number of DE genes (=400).

As shown in Figure 2, for the t and SAM statistics, Xie et al.'s method underestimates the true FDR while our proposed method gives much more accurate estimates. However, for the mean statistic, our method does not give any improvement over Xie et al.'s method. The reason is that the SAM statistic was used to predict DE genes in Xie et al. (2005) while our method Formula uses the same mean statistic in both predicting the DE genes and estimating the FDR. The better performance of Xie et al.'s method in this case is due to the use of the SAM statistic in predicting DE genes, rather than the method itself. As it can be seen from the top plot of Figure 2, our estimator Formula performs significantly better than Xie et al.'s method when the SAM statistic is used.


Figure 2
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. FDR curves of different estimation methods using the SAM mean and t-statistics. There are 400 DE genes among 4000 genes. The number of claimed significant gene ranges from 500 to 600. Our method 1 is the estimator Figure 2 from (9).

 
3.2 Underestimation caused by removing the predicted DE genes
In this section, we show that removing all predicted DE genes will lead to an underestimation of the true FP number. We generate $n=4000$ genes with $k=5$, while 150 of them are DE genes. The expression levels for EE and DE genes are generated in the same way as in Section 3.1. The number of claimed significant genes is set to be 150, which is the number of true DE genes. Table 1 lists the true FP number, the estimated FP number with 150 predicted DE genes removed (Formula ), and the estimated FP number with 150 true DE genes removed (Formula ). The results reported are the averages from 50 replicates.

From Table 1, we can see Formula is always less than Formula . This shows removing predicted DE genes gives a smaller estimate of FP number than that of removing the true DE genes.


View this table:
[in this window]
[in a new window]

 
Table 1. Comparison of estimated FP numbers and the true FP numbers using the SAM, mean and t-statistics

 
3.3 Performance of our methods
To evaluate the performance of our methods, the same simulation set-ups are used as those in Section 3.2. We want to see whether our proposed estimator Formula from (10) can overcome the problems or at least has some advantages over other estimators.

We compare four different FDR estimation methods: the standard estimator Formula from (4), Xie et al. (2005) estimator Formula from (7), and two estimators we proposed: Formula from (9), Formula from (10).

Figure 3 shows that the estimator of Xie et al. (2005) always significantly underestimates the true FDR's. The estimator Formula also underestimates FDR due to over-deletion, but is much better than Xie et al.'s estimator for the SAM statistic. For the mean and t-statistics, Xie et al.'s estimator outperforms Formula sometimes due to the same reason discussed previously—the use of the SAM statistic in obtaining the predicted DE genes. In contrast, Formula does not have this problem. However, for the SAM statistic and the t-statistic, Formula slightly overestimates the true FDR. This overestimation is not caused by the estimator Formula , but by the overestimation of {pi}0 caused by Formula . To see this, we replaced Formula in (9) for Formula and in (10) for Formula with the true {pi}0=3850/4000. Figure 4 shows the comparison between the true FDR and the estimated FDR from (9) and (10) with the true value of {pi}0. We can see that Formula now gives smaller estimates of FDR for all three test statistics compared to Figure 3. Another fact worth noticing in Figure 3 and 4 is that when the number of claimed significant genes is small, Formula does not show much advantage. The reason is that, in such a case, most of the significant genes are true DE genes and the number of FP genes is much smaller than the number of true DE genes. Hence, removing the FP genes is not going to have significant impact on the estimation of the FDR.


Figure 3
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. FDR curves of different estimation methods using the SAM, mean and t-statistics. There are 150 DE genes among 4000 genes. The number of claimed significant gene ranges from 20 to 400. Figure 3 is used as estimate of {pi}0. Our methods 1 and 2 are the estimators Figure 3 from (9) and Figure 3 from (10), respectively.

 

Figure 4
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. FDR curves of different estimation methods using the SAM, mean and t-statistics. There are 150 DE genes among 4000 genes. The number of claimed significant gene ranges from 20 to 400. The true {pi}0=3850/4000 is used as estimate of {pi}0. Our methods 1 and 2 are the estimators Figure 4 from (9) and Figure 4 from (10), respectively.

 
3.4 Comparisons under other simulation set-ups
We also want to see how the ratio of induced (I) and repressed (R) genes influences the performance of the FDR estimators. Here, k=5, n=4000 and there are 150 DE genes. The expression level Yij for EE genes are generated from N(0,4). For DE genes, n' of them are generated from N(4,4), and the rest of them are generated from N(–4,4); where n'=150,100,50,0. We set the number of claimed significant genes as 300. The results reported in Table 2 are the averages from 50 replications.


View this table:
[in this window]
[in a new window]

 
Table 2. Comparison of the performance of FDR estimator when the ratio of induced and repressed genes changes

 
The results confirm that our methods are stable to the change of ratios of the induced and repressed genes.

We have also conducted another simulation which tries to mimic the real data. Similar simulation set-up as above is used except the expression level Yij for EE genes are generated from N(0,{sigma}Formula) while {sigma}Formula~Gamma(4,2) and Yij for DE gene are generated from Ni,{sigma}Formula) while µi~N(0,16), {sigma}Formula~Gamma(4,2).

From Figure 5, we can see that the results are similar as before for the SAM and t-statistics: the standard method always overestimates and method of Xie et al. (2005) always underestimates. Formula performs better than Xie et al.'s method and Formula always performs the best.


Figure 5
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. FDR curves of different estimation methods using the SAM, mean, and t-statistics. Mimicking the real data. There are 150 DE genes among 4000 genes. The number of claimed significant gene ranges from 20 to 400. Our methods 1 and 2 are the estimators Figure 5 from (9) and Figure 5 from (10), respectively.

 
3.5 Biological data
In Zhong et al. (2004), duplications and deletions in an evolved strain (DD2459) were identified by a whole-genome Escherichia coli MG1655 spotted DNA microarray experiment with three replicates. Thirty-eight genes have been confirmed to be true duplicated/deleted genes by rtPCR. To compare our proposed estimator Formula with Xie et al.'s estimator Formula , we used this data to construct a table summarizing the upper bound of true FDR (the proportion of detected DE genes which are not in the confirmed 38 DE genes), FDR estimates given by Formula and Formula for different number of total significant genes (TS(d)). Because the confirmed 38 true DE genes are mostly genes with largest mean in absolute value, we can see from Table 3 that the mean statistic gives the smallest FDR upper bound while the t-statistic does not detect any one of the 38 true DE genes. Table 3 also shows that Formula always gives more accurate FDR estimates than Formula .


View this table:
[in this window]
[in a new window]

 
Table 3. Comparison of the performance of Table 3 and Table 3 using microarray data from Zhong et al. (2004)

 

    4 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
In this article, we have showed that the bias-corrected FDR estimator proposed in Xie et al. (2005) uses an inappropriate estimate of {pi}0 and still has severe under- or over-estimation problem. We have proposed two new modifications to overcome those problems. Simulation studies and application to real data have confirmed that our estimator Formula gives significantly better FDR estimates than Formula in Xie et al. (2005).

Current null statistics are constructed by randomly assigning the ‘+’ or ‘–’ signs to replicates of genes. As a consequence, the number of ‘+’ and ‘–’ signs can be different in this random assignment. Mean expression levels of EE genes will always be 0 regardless of the way of assigning the signs. However, when there is an unbalanced number of ‘+’ and ‘–’, the mean expression levels of DE genes will not be 0, which may cause the null statistics of DE genes to have different distributions from that of EE genes. Hence, it is intuitive to deduce that if we make the number of ‘+’ and ‘–’ stay balanced, this problem can be avoided. In Pan (2003) and Zhang (2006), they proposed a series of such kind of ‘balanced’ null statistics, which have the same distribution for both DE and EE genes. It would be interesting to compare the performance of our FDR estimators and estimators based on ‘balanced’ null statistics in the future research.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We would like to thank the Associate Editor and three referees for their constructive comments which have significantly improved the quality of the article. We also thank Dr Shaobin Zhong for providing us the microarray data.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alex Bateman

Received on February 14, 2008; revised on June 11, 2008; accepted on June 12, 2008

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Benjamini Y, Hochberg Y. Controlling the false discovery rate: a pratical and powerful approach to multiple testing. J. R. Stat. Soc (1995) 57:289–300.

    Benjamini Y, Yekutieli D. The control of the False discovery rate in multiple testing under dependency. Ann. Stat (2001) 29:1165–1188.[CrossRef]

    Efron B, et al. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc (2001) 96:1151–1160.[CrossRef][Web of Science]

    Guo X, Pan W. Using weighted permutation scores to detect differential gene expression with microarray data. J. Comput. Biol (2005) 3:989–1006.

    Kendziorski CM, et al. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med (2003) 22:3899–3914.[CrossRef][Web of Science][Medline]

    Kerr MK, et al. Analysis of variance for gene expression microarray data. J. Comput. Biol (2000) 7:19–837.

    Newton MA, et al. On differentially variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol (2001) 8:37–52.[CrossRef][Web of Science][Medline]

    Pan W, et al. A mixture model approach to detecting differentially expressed genes with microarray data. Funct. Integr. Genomics (2003) 3:117–124.[CrossRef][Medline]

    Pan W. On the use of permutation in the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics (2003) 19:1333–1040.[Abstract/Free Full Text]

    Pollard KS, et al. Multiple testing procedures: R multtest package and applications to genomics. (2004) last accessed date December 2004. 164. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper, Available at http://www.bepress.com/ucbbiostat/paper164.

    Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Gene. Mol. Biol (2004) 3. Article 3.

    Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.[Abstract/Free Full Text]

    Thomas JG, et al. An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res (2001) 11:1227–1236.[Abstract/Free Full Text]

    Tusher VG, et al. Significant analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA (2001) 98:5116–5121.[Abstract/Free Full Text]

    Xie Y, et al. A note on using permutation based false discovery rate estimate to compare different analysis methods for microarray data. Bioinformatics (2005) 21:4280–4288.[Abstract/Free Full Text]

    Yekutieli D, Benjamini Y. Resampling based false discovery rate controlling multiple testing procedure for correlated test statistics. J. Stat. Plann. Inference (1999) 82:171–196.[CrossRef]

    Zhao Y, Pan W. Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments. Bioinformatics (2003) 19:1046–1054.[Abstract/Free Full Text]

    Zhang S. An improved nonparametric approach for detecting differentially expressed genes with replicated microarray data. Stat. Appl. Gene. Mol. Biol (2006) 5. Article 30.

    Zhong S, et al. Evolutionary genomics of ecological specialization. Proc. Natl Acad. Sci. USA (2004) 101:11719–11724.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
Y. Xie
Comments on 'On correcting the overestimation of the permutation-based false discovery rate estimator'
Bioinformatics, October 15, 2008; 24(20): 2420 - 2420.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
24/15/1655    most recent
btn310v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Jiao, S.
Right arrow Articles by Zhang, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Jiao, S.
Right arrow Articles by Zhang, S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?