Skip Navigation


Bioinformatics Advance Access originally published online on August 16, 2005
Bioinformatics 2005 21(20):3865-3872; doi:10.1093/bioinformatics/bti626
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow A correction has been published
Right arrow All Versions of this Article:
21/20/3865    most recent
bti626v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Pawitan, Y.
Right arrow Articles by Ploner, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Pawitan, Y.
Right arrow Articles by Ploner, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oxfordjournals.org

Bias in the estimation of false discovery rate in microarray studies

Yudi Pawitan 1,*, Karuturi R. Krishna Murthy 2, Stefan Michiels 3 and Alexander Ploner 1

1Department of Medical Epidemiology and Biostatistics, Karolinska Institutet 17177 Stockholm, Sweden
2Genome Institute of Singapore Singapore
3Unit of Biostatistics and Epidemiology, Institute Gustave Roussy Villejuif, France

*To whom correspondence should be adderessed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 

Motivation: The false discovery rate (FDR) provides a key statistical assessment for microarray studies. Its value depends on the proportion {pi}0 of non-differentially expressed (non-DE) genes. In most microarray studies, many genes have small effects not easily separable from non-DE genes. As a result, current methods often overestimate {pi}0 and FDR, leading to unnecessary loss of power in the overall analysis.

Methods: For the common two-sample comparison we derive a natural mixture model of the test statistic and an explicit bias formula in the standard estimation of {pi}0. We suggest an improved estimation of {pi}0 based on the mixture model and describe a practical likelihood-based procedure for this purpose.

Results: The analysis shows that a large bias occurs when {pi}0 is far from 1 and when the non-centrality parameters of the distribution of the test statistic are near zero. The theoretical result also explains substantial discrepancies between non-parametric and model-based estimates of {pi}0. Simulation studies indicate mixture-model estimates are less biased than standard estimates. The method is applied to breast cancer and lymphoma data examples.

Availability: An R-package OCplus containing functions to compute {pi}0 based on the mixture model, the resulting FDR and other operating characteristics of microarray data, is freely available at http://www.meb.ki.se/~yudpaw

Contact: yudi.pawitan{at}meb.ki.se and alexander.ploner{at}meb.ki.se


    1 INTRODUCTION
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
The false discovery rate (FDR) is defined as the expected proportion of false positives among the list of genes declared to be differentially expressed (DE). This directly useful interpretation allows the researcher to balance the length of their candidate list against its quality, and this makes the FDR a more useful metric for controlling the false positive rate than the standard P-value. Benjamini and Hochberg (1995) originally introduced a procedure to control FDR by a sequential P-value adjustment, but it is clear that the FDR can be considered simply as an unknown parameter to be estimated from data [Storey (2002), Tsai et al. (2003), Broët et al. (2004), Pounds and Cheng (2004), Datta and Datta (2005), Dalmasso et al. (2005), Grant et al. (2005), Pawitan et al. (2005)]. The purpose of this paper is to discuss the bias problem in the current estimation of FDR.

It is evident from theory that FDR depends on the proportion {pi}0 of non-differentially expressed (non-DE) genes. In our experience with many microarray data, it is a common feature that there are many genes with small effects. When coupled with small sample size, these genes become barely separable from truly non-DE genes. Using a simple example later, we will show that this is the key determinant of bias in the estimation of {pi}0 and FDR. We develop a mixture-model theory to explain exactly the reasons of bias in two-sample comparison problems. The mixture model also suggests a less-biased estimation of {pi}0. In summary, novel contributions of the paper include (i) a theoretical explanation and formula of the bias in the current estimation of {pi}0 and FDR, hence providing a theoretical connection between the model-based and non-parametric estimation of {pi}0, and (ii) a practical likelihood-based procedure to improve the estimation of {pi}0 based on the mixture model.

Although the ideas apply generally, we consider the common two-group comparison problem using a certain statistic Z to compare the mean log-expression level. The distribution F of the observed statistics is a mixture of the form

(1)
where {pi}0 is the proportion of truly non-DE genes, F0(z) is the distribution of the statistics for non-DE genes, and F1(z) for DE genes. Using a symmetric two-sided procedure, so that we declare DE genes if |Z| > c for a given critical value c > 0, we can compute the proportion of declared DE genes as {F(–c) + 1 – F(c)}. If F is symmetric, this is equal to 2{1 – F(c)}, but we will use the more general asymmetric formulas. Similarly, the classical significance level is equal to {F0(–c) + 1 – F0(c)}. The FDR as a function of c is then given by

The estimation of the FDR from the observed statistics z1, ..., zm for m genes requires therefore three quantities: {pi}0, F0 and F. The estimate of F is immediate from the observed statistics, using the empirical distribution function. The null distribution F0 can be generated using a permutation argument, since for non-DE genes the group labels can be permuted without changing the distribution (Efron et al., 2001). A commonly used formula for estimating {pi}0 (Storey, 2002) is

(2)
for a certain choice of {lambda}, and the P-values are associated with statistics zis. This nonparametric or model-free formula has been justified intuitively with the argument that the largest P-values come from non-DE genes, and for these genes the P-values are uniformly distributed between 0 and 1, and

so the estimate (2) looks reasonable. Simple choices of {lambda} such as 0.5 or 0.75 are often used; Storey (2002) proposed optimizing the mean-squared error in the estimation of FDR, but usually there is little difference in the resulting estimate. Since the density f(p) of the P-value distribution is expected to be monotone decreasing, all existing non-parametric estimation procedures of {pi}0 essentially attempt to estimate f(1), the right-hand limit of the P-value density.

Hence, given the estimates , and , we can compute the empirical FDR by

This simple formula is not guaranteed to produce an FDR estimate that is sensibly monotone in c, but we can impose monotonicity by taking the cumulative minimum:

(3)
This monotone estimate is used throughout the paper.

In terms of the distribution of P-values we have an equivalent formula

(4)
where is the empirical distribution of P-values, i.e. the proportion of P-values less than or equal to {alpha}. If we order the P-values from smallest to largest as p(1), ..., p(m), at {alpha} = p(k), we have , and exactly

where the upper bound is the unconstrained version of the Benjamini and Hochberg (1995) adjusted P-value. The corresponding monotone constraint is imposed by taking a cumulative minimum over larger P-values. For a large number of genes, the constrained form of (4) is also essentially the same as the Q-value in Storey (2002).

To demonstrate the bias problem in the estimation of FDR, we start with a relatively simple theoretical model. Later we show that this model captures most of the characteristics of real data. Suppose we have two groups with n arrays per group, the proportion of truly non-DE genes is {pi}0, and for DE genes the log-fold changes (in log-2 scale) are distributed at ±D with equal proportions. In group 1, the log-expression values are normally distributed with mean zero and variance {sigma}2. In group 2, the non-DE genes are log-normal with mean zero and variance {sigma}2, while the DE genes are log-normal with mean ±D and variance {sigma}2. So, the distribution of the observed t-statistics is a mixture of the form

(5)

(6)
where F0(t) is the central t-distribution with degrees of freedom df = 2n – 2, and G1(t) and G2(t) are non-central t distributions with df = 2n – 2 and non-centrality parameters and –{delta}, respectively (Pawitan et al., 2005).

Figure 1a shows the histogram of P-values obtained from 10 000 t-statistics simulated from the model above with n = 10 arrays per group, {pi}0 = 0.6, D = 1.33 and {sigma} = 1. The P-values are computed using the the empirical null distribution F0 based on permutations (which in this case is almost identical to the theoretical null distribution F0). The horizontal line at the y-axis equal to {pi}0 = 0.6 matches the level of the histogram at larger P-values, so the bias in estimating {pi}0 using (2) is negligible. Figure 1b on the other hand shows the P-values for a dataset simulated from the same parameters, except for a smaller log-fold change D = 0.45. Here the bias is obvious, as the histogram clearly converges at a much higher level than {pi}0 = 0.6.



View larger version (26K):
[in this window]
[in a new window]
 
Fig. 1 (a) Histogram of 10 000 P-values of the t-statistics for two-group comparisons of data simulated for {pi}0 = 0.6 and log-fold change D = 1.33. The horizontal line is at {pi}0 = 0.6. (b) The same as (a), except for a smaller log-fold change of D = 0.45. (c) Estimated FDR (solid) and true FDR (dashed) for the data simulated with a large log-fold change of D = 1.33. The grey lines indicate a pointwise 90% confidence band for the estimated FDR, based on 100 additional simulated datasets. The curves are very close to each other. (d) The same as (c), except for the smaller log-fold change of D = 0.45.

 
Figure 1c and d show the effect of the bias in the estimation of {pi}0 on the estimate of FDR. For the data with a large effect size in Figure 1c, the estimated FDR (solid line) and the true FDR (dashed line) are within the 90%-confidence band for the estimated FDR. However, for the data with small effect size in Figure 1d, there is a large difference between the estimated and the true FDR, evidently larger than the sampling variability in the estimated FDR as indicated by the confidence bands. To distinguish it from the latter estimate, sometimes we will call this bias-inflated estimated FDR the apparent FDR. Note that this same problem occurred in the simulation results (Table 5) of Taylor et al. (2005), though the authors did not comment on it.


    2 METHODS
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
2.1 Conventional estimation of {pi}0
Conventional approaches to estimating {pi}0 can be motivated by the model (1)

If {pi}0 is close to 1, or if there is an interval [a,b] so that F1(z) is almost constant between a and b, then we can use the resulting approximation

(7)
to estimate {pi}0 by

Intuitively, as the DE genes have non-zero effects, for t-statistics F1(z) should be almost constant between –a and a, where a is a small positive value. For example, we might use

for a certain choice of {alpha}, for example 0.1 or 0.25, where t1, ..., tm are the observed t-statistics. We also get the standard estimator (2) by using the P-value as the non-parametric test statistic and setting e.g. [a,b] = [0.5, 1]. Alternatively, we can take the limit of the interval [a, b] in (7) to a point z0 representative of the null hypothesis, e.g. z0 = 0 when using a t-statistic, in which case we get

However, these simple estimates can be seriously biased if {pi}0 is far from 1 and there is no suitable interval [a, b]. The latter means F1 does not have a flat region, occurring when over-expressed and under-expressed genes do not separate cleanly, as happens when there are many genes with small effects. At best these estimates only work as loose upper bounds for {pi}0.

2.2 Bias for a simple alternative
To analyse the bias theoretically, from model (5), the resulting distribution D(p) of the P-value is also a mixture of the form

using the previous F0, G1 and G2 for the central and non-central t distributions. Unfortunately the non-central t distribution is much too complicated for direct analysis and computation. For our purpose, however, we will use an approximation

i.e. we approximate the non-central t distribution by a shifted central t distribution. The approximation is excellent in the center part of the distribution, which is what we need for estimating the bias. A numerical derivative method of the non-central distribution leads to very close results.

Using some calculus, the P-value density f(p) is a mixture

(8)



where f0(t) is the density of the central t distribution with 2n – 2 degrees of freedom.

As previously noted, a non-parametric estimate of {pi}0 such as (2) tries to estimate f(1), so from (8), it is a biased estimate and the amount of bias is given by

(9)
Note that the relative error in estimating {pi}0 caused by the bias is the same as the relative error in the FDR:

2.3 Bias for a general mixture alternative
Let us assume that the log-fold changes are distributed at discrete values {D0 {equiv} 0, D1, ..., Dq}, with probabilities {{pi}0, {pi}1, ..., {pi}q}. Let N = n1 + n2 be the combined sample size in both groups, where n1 is the number of arrays in group 1 and n2 in group 2. The previous results can then be generalized as follows: the distribution of the test statistics is a mixture

(10)
where Gi(t) is the non-central t-distribution with N – 2 degrees of freedom and non-centrality parameter

For i = 0 we have {delta}0 = 0 and G0(t) {equiv} F0(t), the central t distribution with N – 2 degrees of freedom. The resulting distribution D(p) of the two-sided P-value is a mixture of the form

where, as before, we approximate the non-central t distribution by a shifted central t distribution.

The P-value density f(p) is a mixture

(11)

where f0(t) is the density of the central t distribution with N–2 degrees of freedom. Hence the bias in the standard estimation of {pi}0 is

2.4 Estimating {pi}0 from the general mixture model
Since the bias formula depends on the unknown mixing probabilities, in practice it does not help us in estimating {pi}0. The key problem in the standard methods such as (2) is that genes with small effects have large P-values, so any method that relies on the proportion of large P-values will be seriously biased. Using the observed statistics t1, ..., tm as data, an alternative estimate of {pi}0 that is much less biased can be obtained from the mixture model (10). In this model, we have 2q+1 unknown parameters

where . Note that no assumptions are made regarding individual variances {sigma}2. The number of components q can be chosen large enough so that the model captures the distribution of the observed statistics. Specifying too many components typically leads to replication of certain components, splitting the corresponding p0 across several very similar non-centrality parameters. For an objective choice, we can use a model selection criterion such as the (Akaike information criterion (AIC) [e.g. Pawitan (2001), Section 13.5] or the Bayesian Information Criterion (BIC)).

As a working model, assume that the statistics are independent, so that we can immediately apply a likelihood-based method to estimate the parameters. (In practice, the effect of dependence on inference can be accounted for by using the bootstrap.) Based on the data t1,...,tm, the log-likelihood is

where fj(t) is the non-central t-density with N – 2 degrees of freedom and non-centrality parameter {delta}j, and {delta}0 {equiv} 0. An attractive feature of this mixture model is that, as shown in the previous section, the non-centrality parameter {delta}j can be interpreted in terms of effect size. Furthermore, if the total sample size N is >30, the t-density fj() can be well approximated by the normal density with mean {delta}j and variance one, which leads to simpler formulas.

A commonly used algorithm to estimate mixture models is given by the EM algorithm, described e.g. in [Pawitan (2001), Section 12.5]. We found, however, that because there is typically a large number of genes, the EM algorithm is too slow for routine use. This is compounded by the problem of local maxima in the likelihood surface, so a proper estimation requires replications with many different starting values. We derive a much faster algorithm based on prebinning the data into a relatively fine grid of intervals. We first split the range of the observed statistics into B equispaced bins (Tk–1, Tk)s, for k = 1, ..., B. Let y = (y1, ..., yB) be the number of statistics tis that fall in the bins. Then y has a multinomial distribution with probabilities ({alpha}1, ..., {alpha}B), where

(12)
using F(t) in (10). The log-likelihood based on the binned data y is

(13)
where the unknown parameters enter via the {alpha}ks, using (10) and (12). To deal with the constraints on the probabilities, we use the logistic transform

so the log-likelihood (13) can be maximized as a function of ({varphi}1, ..., {varphi}q) without constraints.

Given the complete set of estimated mixture parameters, we could theoretically proceed to estimate the FDR using the estimated mixture distribution F(t) in (10). We have, however, found that that it is difficult for the mixture model to capture the tail behavior of F(t) properly, and that the estimation of the mixture parameters corresponding to the contribution of the DE genes can be sensitive to the degree of binning used. In contrast, the estimation of {pi}0 is robust towards the effects of binning, and appears unbiased for sufficient effect sizes; see the simulation results below. Intuitively, this is not surprising, as the estimation of {pi}0 relies on the center of the distribution, which has better signal-to-noise ratio, whereas the estimation of the {pi}j for j > 0, and the corresponding effects {delta}js, will rely to a larger degree on the sparsely populated tail bins of the distribution.

In practice, we suggest therefore the use of the mixture model to estimate {pi}0 only, and the use of the empirical distribution functions to estimate F(t) and F0(t) for the purpose of computing the FDR. The other parameters of the mixture model can still be usefully interpreted as descriptive statistics, or for planning the sample size of future studies (Pawitan et al., 2005). The estimated effect sizes provide information about bias of the standard FDR estimate. An R-package OCplus containing functions to compute {pi}0 based on the mixture model, the resulting FDR and other operating characteristics of microarray data, is freely available at http://www.meb.ki.se/~yudpaw.


    3 RESULTS
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
3.1 Determinants of bias
From formula (9), we can show that the bias b in estimating {pi}0 is a decreasing function of the non-centrality parameter {delta}; see Figure 2a. For example, for our previous example, using n = 10, {pi} = 0.6 and D = 0.45, we have {delta} = 1 and f(1) = 0.84, and the bias in estimating {pi}0 is 0.24 or 40%. Recall that

combining all relevant design parameters of the experiment: it increases with larger n and larger log-fold change D, but decreases with larger measurement variance. Hence a large positive bias occurs in small experiments or when there are many genes with small effects or strictly with large coefficient of variation. The bias factor becomes negligible only when {delta} is >2.5 or 3. In Figure 1, for n = 10 and D = 1.33 we have {delta} = 3.0, so there is little bias. The curves in Figure 2a vary very little for different n, so the bias depends on the group size n almost exclusively through the non-centrality parameter {delta}.



View larger version (14K):
[in this window]
[in a new window]
 
Fig. 2 (a) The constant h1(1) in the bias term (9) as a function of the non-centrality parameter {delta}, for n = 5 (dashed), n = 10 (solid) and n = 20 (dotted). (b) Contour plot of relative error when using (2) for estimating {pi}0, as a function of {pi}0 and D/{sigma}, for n = 10. The same relative error applies to the apparent FDR based on that estimate.

 
The bias of and consequently the estimated FDR on the other hand will depend in a non-trivial manner on the scaled effect size D/{sigma}. Figure 2b shows the contour plot of the relative error in the estimation of both {pi}0 and FDR for a fixed group size of n = 10. As expected, the error decreases with increasing effect size and percentage of non-DE genes. This effect is even more pronounced for larger sample sizes (figure not shown).

3.2 Simulated data
To illustrate the theoretical result, we first go back to the previous simulation example with D = 0.45 in Figure 1b and d. Figure 3a shows that the theoretical density f(p) from (8) follows closely the histogram of the P-values. The value f(1) = 0.84—the upper horizontal dashed line—matches the level of the histogram for large P-values. We can also compute the biased FDR using f(1) instead of the true {pi}0, using the mixture model

(14)
where F(t) and F0(t) are the same as in (5), but F1b(t) is a biased version of F1(t), since the mixture must add up to the same F(t). The resulting FDR is shown in Figure 3b and follows closely the apparent FDR. Thus the bias in estimating the FDR is largely explained by the bias in estimating {pi}0.



View larger version (16K):
[in this window]
[in a new window]
 
Fig. 3 (a) Histogram of 10 000 P-values of the t-statistics for two-group comparisons; the horizontal lines are at {pi}0 = 0.6 and f(1) = 0.84. The log-fold change is D = 0.45. (b) The apparent FDR (solid) and the biased FDR implied by the model (dashed), together with 90% confidence bands for the estimated FDR.

 
To assess the factors influencing the estimation of {pi}0, we conducted a larger simulation study using model (5) with a simple two-sided alternative (q = 2). We varied the proportion of non-DE genes {pi}0 = 0.5, 0.7 and 0.9, the effect size as measured by the non-centrality parameter {delta} = 0.5, 1 and 2, the number of arrays per group n = 10 and 30. We set the number of genes to 10 000, and each simulation set-up is run 25 times. The boxplots in Figure 4 summarize the estimation of {pi}0.



View larger version (18K):
[in this window]
[in a new window]
 
Fig. 4 Simulation results of the the estimation of {pi}0 in the mixture model (5). Each boxplot summarizes 25 independent replications. The target parameter {pi}0 is drawn as a dashed line, and the biased values f(1) for {pi}0 derived in (8) are shown as connected dots at the top of each panel.

 
The results indicate that the non-centrality parameter is by far the most important factor when estimating using the mixture model. For a comparatively large effect {delta} = 2, all estimates are very close to the true value, regardless of {pi}0 or n. For an intermediate effect {delta} =1, the estimates still appear unbiased, but with larger variability. Only for the small effects {delta} = 0.5, we see clear bias in . From this simulation we expect to be able to properly estimate components with {delta} around 1 or larger, but have to be more careful if the estimated {delta} is less than one.

3.3 Breast cancer data
We applied the mixture-based estimation of to expression data collected from patients suffering from hereditary breast cancer (Hedenfalk et al., 2001). The patients had mutations either of the BRCA1 gene (n1 = 7) or the BRCA2 gene (n2 = 8). We compared the two types of tumors using a t-test with pooled variance, where the null distribution was generated via the permutation of group labels.

Figure 5a shows the histogram of the P-values from 3170 genes, with a strong spike near zero, indicating many genes with small P-values. The histogram converges at around 0.67, which is the estimate of {pi}0 using the standard formula (2) with {lambda} = 0.7. The resulting apparent FDR curve in Figure 5b (solid) yields 155 genes with FDR <5%, at a critical value of c = 3.31.



View larger version (13K):
[in this window]
[in a new window]
 
Fig. 5 (a) Histogram of 3170 P-values for comparison of 7 BRCA1 versus 8 BRCA2 breast tumors. The horizontal lines are drawn at 0.67 [using formula (2)] and 0.43 (the mixture estimate). (b) The apparent FDR using the biased estimate (solid line), the corrected FDR using the estimate based on the mixture model (dashed line) and the biased FDR implied by the mixture model (dotted line).

 
The AIC for the mixture model is minimized at q = 3 components. This provides the following estimates:

This suggests as a less-biased estimate of {pi}0. Using the profile likelihood of {pi}0, we obtained the nominal 95% confidence interval of 0.4 < {pi}0 < 0.53. (As we commented previously, a more precise interval can be found by bootstrapping.) Using the mixture estimate 0.43 instead of the biased estimate 0.67 for {pi}0, we calculated the corrected FDR curve as before, shown as dashed line in Figure 5b. We found 215 genes with corrected FDR <5%, at a critical level of c = 2.96. To show that the mixture model largely captures the observed distribution, we also show in Figure 5b the biased FDR implied by the mixture model (dotted), which is close to the apparent FDR but slightly larger in the tail area.

3.4 Lymphoma data
We next applied the mixture model approach to the analysis of 240 cases of diffuse large B-cell lymphoma data described in Rosenwald et al. (2002). The authors used a custom-made DNA microarray chip to collect measurements for 7399 probes. The average clinical follow up for patients was 4.4 years, with 138 deaths occurring during this period. For this analysis, we ignore the censoring information and only compare the 102 survivors versus 138 non-survivors, using again a t-test with pooled variance and a permutation null distribution.

Figure 6a shows the histogram of the P-values, which converges at ~0.84, which is also the standard estimate of {pi}0 using formula (2) with {lambda} = 0.7. The resulting apparent FDR curve in Figure 6b (solid line) indicates 15 genes with FDR <10%, at critical value c = 3.71.



View larger version (14K):
[in this window]
[in a new window]
 
Fig. 6 (a) Histogram of 7399 P-values for comparison of 102 survivors versus 138 non-survivors of lymphoma. The horizontal lines are at 0.84 [using formula (2)] and 0.59 (the mixture estimate). (b) The apparent FDR using the biased estimate (solid line), the corrected FDR using the estimate based on the mixture model (10) (dashed line) and the biased FDR implied by the mixture model (dotted).

 
The mixture model with q = 2 has the lowest AIC and provides the following estimates:

This means that a less-biased estimate of {pi}0 is (95% confidence interval 0.39 < {pi}0 < 0.69). The corrected FDR resulting from the mixture model (dashed line in Fig. 6b) is again smaller than the apparent FDR, and we find 105 genes with estimated FDR <10%, at critical level c = 3.01. The biased FDR implied by the mixture model (dotted) is also close to the apparent FDR (solid).


    4 DISCUSSION
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 
The massive multiplicity of gene expression microarray data creates a serious problem when we try to discover genes that are DE between experimental conditions. The classical control of the family-wise error rate (FWER) is overly conservative. When used for gene discovery, the microarray is essentially a screening tool that allows researchers to generate a list of candidate genes that needs to be confirmed by different means. In this context, controlling FWER—the probability of any gene being falsely declared DE—is not relevant, and most researchers will accept a small number of false positive genes as the price for finding a good candidate list. This explains why the FDR has become popular for assessing the differential expression of genes.

Recent methods of estimation of FDR are either non-parametric, based on the observed distribution of P-values, or based on mixture models [e.g. Pounds and Cheng (2004), Broët et al. (2004)]. These two methods can produce different results, and our analysis shows that they are not estimating the same quantity, and that the non-parametric estimate of {pi}0 is severely biased upwards if there are many genes with small effects. Intuitively, this occurs because some fraction of these genes are not separable from non-DE genes. As shown in our analysis, the term ‘small effect’ is objectively captured by a small non-centrality parameter, so it is a function of sample size and measurement variance as well as the true effect size. In general, the bias problem is worse in small experiments. The FDR estimated using the biased estimate of {pi}0 is also biased, so it results in unnecessary loss of power.

The concept of local false discovery rate (fdr, to distinguish it from the global FDR) (Efron et al., 2001) also needs an estimate of {pi}0 and its current use suffers from the bias problem mentioned in this paper.

Note that in (1) we can only get estimates for F and F0 easily using the empirical distribution functions. After we plug in these estimates, the equation still contains two unknown parameters, {pi}0 and F1. Current approaches estimate {pi}0 first, and use this biased estimate to compute . We have shown that this can lead to severe bias in the subsequent estimation of FDR. We suggest that under these circumstances, it is possible to get less bias by estimating {pi}0 and F1 jointly, using a method that we have described.

Recently Efron (2004) argued that genes of small effects are not ‘interesting’ and they should be considered ‘null’ genes. He did not, however, consider the effect of these genes on the bias in estimating {pi}0 and FDR. The problem with such genes is that it requires a large sample to detect them. However, since it is now becoming economically feasible to run large microarray studies to detect these genes, it is not at all obvious at what effect size one should consider these genes uninteresting. Genes that are uninteresting in an experiment with sample size n = 10 can become interesting when n = 40 or larger. It is not unrealistic to imagine studies with triple or quadruple the sample size of many current studies, thus reducing the FDR substantially and allowing the discovery of genes with small effects.

We have shown that the multiplicity of genes in a microarray study allows an estimation of the proportion of genes with relatively small effects. Very likely, the majority of these genes are not detectable in the experiment, i.e. the sensitivity is small (Pawitan et al., 2005). However, we believe it is important to know the proportions and effect sizes, since they will inform us how large our experiment should be to detect them. Our simulation study indicates, however, that there is an inherent limit in the determination of effect size, approximately around a non-centrality parameter of {delta} = 1. At this size the mixture-based estimate is less conservative than the current procedure based on (2). As we have shown in the examples, the mixture-model leads to distinct estimates of {pi}0 and FDR compared to the current estimates.

Conflict of Interest: Murthy declares that he is currently conducting research supported by the Genome Institute of Singapore, Singapore.

Received on June 8, 2005; revised on July 11, 2005; accepted on August 10, 2005

    REFERENCES
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 

    Benjamini, Y. and Hochberg, Y. (1995) Controling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.

    Broët, P., et al. (2004) A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments. Bioinformatics, 20, 2562–2571[Abstract/Free Full Text].

    Dalmasso, C., et al. (2005) A simple procedure for estimating the false discovery rate. Bioinformatics, 21, 660–668[Abstract/Free Full Text].

    Datta, S. and Datta, S. (2005) Empirical Bayes screening of many P-values with applications to microarray studies. Bioinformatics, 21, 1987–1994[Abstract/Free Full Text].

    Efron, B., et al. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Soc., 96, 1151–1160.

    Efron, B. (2004) Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Am. Stat. Soc., 99, 96–104.

    Grant, G.R., et al. (2005) A practical false discovery rate approach to identifying patterns of differential expression in microarray data. Bioinformatics, 21, 2684–2690[Abstract/Free Full Text].

    Hedenfalk, I., et al. (2001) Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med., 344, 539–548[Abstract/Free Full Text].

    Pawitan, Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood, (2001) , Oxford Oxford University Press.

    Pawitan, Y., et al. (2005) False discovery rate, sensitivity and sample size for microarray studies. Bioinformatics, 21, 3017–3024[Abstract/Free Full Text].

    Pounds, S. and Cheng, C. (2004) Improving false discovery rate estimation. Bioinformatics, 20, 1737–1745[Abstract/Free Full Text].

    Rosenwald, A., et al. (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med., 346, 1937–1947[Abstract/Free Full Text].

    Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Statist. Soc. B., 64, 479–498[CrossRef].

    Taylor, J., et al. (2005) The ‘miss rate’ for the analysis of gene expression data. Biostatistics, 6, 111–117[Abstract].

    Tsai, C.A., et al. (2003) Estimation of false discovery rates in multiple testing: application to gene microarray data. Biometrics, 59, 1071–1081[CrossRef][Web of Science][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
M. Zhang, C. Yao, Z. Guo, J. Zou, L. Zhang, H. Xiao, D. Wang, D. Yang, X. Gong, J. Zhu, et al.
Apparently low reproducibility of true differential expression discoveries in microarray studies
Bioinformatics, September 15, 2008; 24(18): 2057 - 2063.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
Y. Pawitan, S. Calza, and A. Ploner
Estimation of false discovery proportion under general dependence
Bioinformatics, December 15, 2006; 22(24): 3025 - 3031.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
T. S. Mehta, S. O. Zakharkin, G. L. Gadbury, and D. B. Allison
Epistemological issues in omics and high-dimensional biology: give the people what they want
Physiol Genomics, December 13, 2006; 28(1): 24 - 32.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Ploner, S. Calza, A. Gusnanto, and Y. Pawitan
Multidimensional local false discovery rate for microarray studies
Bioinformatics, March 1, 2006; 22(5): 556 - 565.
[Abstract] [Full Text] [PDF]


Home page
J. Bacteriol.Home page
L. Wolfram, E. Haas, and P. Bauerfeind
Nickel Represses the Synthesis of the Nickel Permease NixA of Helicobacter pylori
J. Bacteriol., February 15, 2006; 188(4): 1245 - 1250.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow A correction has been published
Right arrow All Versions of this Article:
21/20/3865    most recent
bti626v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Pawitan, Y.
Right arrow Articles by Ploner, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Pawitan, Y.
Right arrow Articles by Ploner, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?