Bioinformatics Advance Access originally published online on February 1, 2008
Bioinformatics 2008 24(7):943-949; doi:10.1093/bioinformatics/btn049
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Bayesian models based on test statistics for multiple hypothesis testing problems
1Department of Bioinformatics and Computational Biology and 2Department of Systems Biology, The University of Texas, M. D. Anderson Cancer Center, Houston, TX 77030, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool.
Results: Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
Contact: yuanji{at}mdanderson.org
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
The problem of multiple testing is routinely encountered in bioinformatics research and is currently the subject of intense study. Recent Bayesian perspectives on this problem include Berry and Hochberg (1999), Müller et al. (2004, 2007) and Scott and Berger (2006). Other studies applying multiple testing approaches in bioinformatics, primarily in the context of microarray experiments, include those of Efron et al. (2001), McLachlan et al. (2006), Newton et al. (2004) and West (2003).
Like most Bayesian testing procedures, we propose to approach the multiple testing problem by estimating the posterior probabilities that individual null hypotheses are true. These probabilities are studied through the introduction of latent binary random variables that indicate whether each null hypothesis is true or false. Conditionally on the values of the binary indicators, test statistics are modeled as being generated from one of two distributions: (1) the distribution under the null hypothesis and (2) the distribution under the alternative hypothesis. Ideally, the test statistics would be independent and their null distributions would be known. For example, the nominal null distribution might be assumed to be a central F- or a central
2-distribution. In what follows, we refer to this distribution as the theoretical null. However, in most bioinformatics experiments—such as gene expression studies—many genes are functionally dependent. Consequently, the corresponding test statistics are no longer independent, and their joint distribution is not accurately described by the product measure obtained from the theoretical null under the null hypothesis. Instead, the test statistics follow a different distribution, called the empirical null, which has been discussed recently in the statistical literature (Efron, 2004, 2007).
Empirical null is an estimated distribution for the test statistics under the null hypothesis when the test statistics can no longer be considered as a random sample from the theoretical null distribution, e.g. when the test statistics are dependent. The term empirical is used to indicate that the null distribution is defined based on observation, rather than on theory. Therefore, different bioinformatics studies may yield different empirical null distributions, which we need to estimate. Figure 1 is taken from Efron (2007) as an illustration of the empirical null. The two panels represent histograms of test statistics from two bioinformatics studies. In both panels, the theoretical null distribution should be the standard normal N(0,1). The density of N(0,1) is too narrow in the left panel and too wide in the right. Instead, Efron (2007) estimated two empirical null densities as shown in the red curves. Apparently, the traditional statistical inference under the theoretical N(0,1) null density would yield very different results from those based on the empirical null for these two examples. Figure 2 plots the histogram of the computed F-statistics for an example to be considered in this article. Similarly, we found that the estimated empirical null (red curve) is also different from the theoretical null.
|
|
A defining feature of our approach is the parsimonious description of the distributions of test statistics under both null and alternative hypotheses. Under both models, we assume that the distribution of the test statistics can be modeled using a non-central version of a theoretical null distribution (e.g. a non-central
2 or non-central F distribution). Through judicious modeling of the non-centrality parameters, we are able to marginalize over the prior distribution on the non-centrality parameter to obtain a closed-form expression for the distributions of the test statistics under both the null and alternative hypotheses. These distributions depend on two scale parameters, which we estimate using the values of all test statistics.
As a simple illustration of our method, we focus on the analysis of data collected for the purpose of studying differential expression of genes. The conflicting hypotheses we wish to test are the null hypotheses of no differential expression versus the alternative that genes are differentially expressed. In our analysis, we consider the case in which the tests of hypotheses are based on either parametric F-statistics or non-parametric
2-statistics derived from Mann–Whitney tests. This methodology extends easily to other test statistics, including t and z tests (Johnson, 2005), as well as test statistics based on other non-parametric modeling (Yuan and Johnson, 2006).
An important step in statistical inference is model checking, in which one assesses whether a proposed model fits the data at hand. Although few previously proposed multiple comparison procedures incorporate model diagnosis, we propose simple methodology to assess our model assumptions. The resultant diagnostic is based on a comparison of sample quantiles and theoretical quantiles of the test statistics (Johnson, 2007). Provided that posterior samples are available from the Markov chain Monte Carlo (MCMC) algorithm used to compute posterior model probabilities, our diagnostic procedure requires no additional simulation and so can be implemented without additional computational cost. A simple computer program written in R is available upon request of the first author.
The remainder of the article is organized as follows. In Section 2, we present the Bayesian hierarchical models that underlie our methodology. We then describe in detail the form of the tests in the special cases of two-sample comparisons based on F-statistics and
2-statistics. In Section 3, we apply an algorithm that uses estimated posterior model probabilities to determine the threshold for declaring significance of the tests based on Bayesian false discovery rates (FDR). We describe model checking procedures in Section 4, and demonstrate our methodology using a simulation study in Section 5. We apply our method to two practical data sets in Section 6, one from an ongoing small interfering RNA (siRNA) screening experiment and the other from a published microarray experiment. In Section 7, we conclude with a brief discussion of the results.
| 2 METHODOLOGY |
|---|
|
|
|---|
Let fi denote the test statistics collected for the purpose of testing the null hypothesis H0i against the alternative hypothesis H1i, for i = 1, ... , m. As explained previously, to account for deviations from the theoretical null, we assume that the empirical null distribution of the test statistics fi follows a non-central version of the theoretical null distribution. In the case of a two-sample comparison with F-statistics, we show in the Section 3 that this empirical null distribution can often be modeled as a scaled F-distribution. Under the alternative hypothesis, this result is obtained by assuming that the difference between the true means of the two samples follows a normal distribution with mean zero. Importantly, we assume that the distributions of the non-centrality parameters in the models can be approximated by the convenience priors described in Johnson (2005) or Yuan and Johnson (2006), so that the marginal distributions of the test statistics under the null and alternative models are also known up to a single scale parameter. We will illustrate this point in Section 2.1 based on F- and
2-statistics.
To describe the particular form of these distributions, let p(fi|
0) and p(fi|
1) denote the probability density functions of fi under the null H0i and alternative H1i models. Let J denote an m-dimensional vector of 0's and 1's for which Ji = 0 if H0i is true and Ji = 1 if H1i is true, and let f = (f1, ... , fm)'. Then the likelihood function for J and
based on f may be written as
|
| (1) |
0 and
1 are the scalar parameters that describe the distributions of the non-centrality parameters.
Before we specify the priors for parameters J,
0 and
1 in Section 2.2, we first introduce two concrete examples of the general models proposed above.
2.1 Multiple hypothesis testing based on F- and
2-statistics
We consider the problem of testing the validity of linear constraints on regression parameters. This setting is relevant to many bioinformatics problems in which the value of a treatment mean is of interest. In such cases, the null hypotheses typically correspond to setting a regression parameter equal to 0.
|
| (2) |
i, where W is an r x k matrix of rank k whose range space is contained in the range space of |
|
To make this example more concrete, suppose that the observed F-statistics represent values collected from a microarray experiment in which expression levels of multiple genes are recorded under two sample conditions. Let yijk denote the measurement of gene i under condition j at the k-th replicate, i = 1, ... , m, j = 0,1 and k = 1, ... , ni, and let E(yi 0k) = µi and E(yi1k) = µi +
i. Assuming W = (0, 1)', βi = (µi,
i) and
i = 0, we define the null hypothesis H0i:
i = 0. Following Johnson (2005), we assume under the alternative hypothesis that
|
| (3) |
i is assumed to come from a normal distribution with mean 0 and variance described by a scalar
1 > 0. For testing H0i against H1i, t-statistics may be defined according to |
|
1) follows a central F-distribution F1,2ni – 2, i.e.
|
| (4) |
The theoretical null distribution for the F-statistic is F1,2ni – 2. However, due to correlation among test statistics, we assume that under H0i the test statistics fi follows an empirical null described by another scaled F-distribution. Specifically, we assume that
|
| (5) |
0) > 0 is the scale parameter that describes the variance of the empirical null. When (1 +
0) = 1, the empirical null is the same as the theoretical null; when (1 +
0) > 1, the empirical null is overdispersed; when (1 +
0) < 1, the empirical null is underdispersed. Note that (5) can be interpreted as the distribution of fi under the null hypothesis
0 > 0 is a small scalar close to zero. This null assumes that the gene differential expression
i is drawn from a normal distribution with mean 0 and a small variance. Theoretically, we can show that the assumption in
0 > 0, the empirical null is always overdispersed than the theoretical null F1,2ni – 2, since (1 +
0) is always greater than one if
0 > 0. Efron (2007) has shown that for some practical data, one might have an empirical null with a smaller variance than that of the theoretical null. Because of this, we did not follow the assumption in
0) > 0 to allow for underdispersed empirical null distributions. Using standard results derived for F-statistics in the analysis of variance, similar calculations extend directly to cases in which gene expression levels are measured under more than two conditions. In addition, the above F-statistic model can be directly applied to two-sample pooled t-statistics; as we have demonstrated, one simply needs to square the t-statistics to obtain F-statistics.
When the linear model assumption in (2) is not valid, it may be more appropriate to base hypothesis tests on a non-parametric statistic. For example, in two sample comparisons the Mann–Whitney statistic might instead be used to test whether the medians of the two samples of observations are the same. Because the asymptotic null distribution of Mann–Whitney's U-statistic follows a normal distribution with fixed and known mean and variance, the square of the standardized U-statistic follows a
2-distribution on one degree of freedom.
Now, turning attention to
2-tests, let fi denote a statistic that has, say, a theoretical
distribution. For example, fi might represent the square of the standardized Mann–Whitney U-statistic. Following along lines similar to those proposed for the F-statistics, if we assume that fi arise from squared N(
i, 1) deviates, and that the
i are marginally distributed according to normal distributions with mean 0 and variance
1 under the alternative hypothesis, then it follows that the asymptotic marginal distributions of the test statistics can be expressed
|
| (6) |
|
| (7) |
0) > 0. Thus, p(fi|
0) and p(fi|
1) correspond to the densities of two scaled
2-distributions.
2.2 Prior and posterior
Based on the deduction in the previous section,
1 is a positive scalar that describes the dispersion of the testing parameter under the alternative hypothesis. In addition,
1 should be away from zero so that the alternative density p(fi|
1), a scaled F- or
2-distribution, is different from the theoretical null. Therefore, we assume that
1 follows an inverse gamma distribution ig(a1, b1) with mean b1/(a1 – 1). We assume that (1 +
0) follows a gamma distribution g(a0, b0) with mean a0b0. Therefore, we allow for both underdispersion [when (1 +
0) < 1] and overdispersion [when (1 +
0) > 1] in the empirical null. Finally, to reflect that a larger F- or
2-value is evidence favoring the alternative, we assume that the variance of alternative density is always larger than that of the empirical null, which leads to the restriction
0 <
1.
We assume that the components of J follow independent Bernoulli distributions with probability
= Pr(Ji = 1). To allow for more modeling flexibility, we specify a hyperprior distribution for
. Following Scott and Berger (2005), we assume that
follows a beta prior distribution, β(1,
+ 1), with mean 1/(1 +
). Denoting
as a best guess for
, Scott and Berger (2006) suggested setting
|
| (8) |
is set to 5.58.
Based on the densities p(f |
0) and p(f |
1), the priors for
0 and
1, and the priors for J and
, the joint posterior model on all model parameters may be expressed as
|
| (9) |
We develop a simple MCMC algorithm for generating samples from the joint posterior distribution, which is described in the Supplementary Materials.
| 3 BAYESIAN FALSE DISCOVERY RATE |
|---|
|
|
|---|
Using posterior samples of J,
,
0 and
1, we can estimate the posterior distribution on a variety of model-derived summary statistics. For example, the marginal posterior probability that the i-th alternative hypothesis is true can be estimated as the sample average of Ji values; we denote these values by ri, which are posterior estimates of Pr(Ji = 1|data).
In bioinformatics applications, it is common to apply a threshold value, say
, to values of ri to identify test statistics that have unusually large values. For example, genes with posterior probabilities of expression of ri >
may be considered candidates for biomarkers of diseases. Müller et al. (2004) and Müller et al. (2007) provide detailed discussions of how to determine optimal thresholds for various loss functions. For purposes of illustration, we describe how such an approach can be used to define a Bayesian analog of the FDR that is optimal in the sense that it minimizes the false non-discovery rate subject to maintain a fixed FDR. This approach was originally proposed by Newton et al. (2004), although the optimality property was not shown at that time.
To this end, we compute the posterior expected number of false discoveries as
where 1(·) is the indicator function. For a target FDR value
, the optimal cutoff
(
) is the largest value among all
such that
|
| (10) |
)/ S is the observed FDR. Computationally, this procedure can be implemented through several steps, which are given in the Supplementary Materials.
| 4 MODEL CHECKING |
|---|
|
|
|---|
The simplicity of the proposed model greatly reduces the computational burden associated with estimating posterior model probabilities. However, it is important to test whether the simplified assumptions that lead to these gains are warranted, or whether more sophisticated models are required. To answer this question, we propose a simple graphical tool to examine model fit. This graphical tool, a quantile–quantile plot (qq-plot), is based on a comparison of quantiles of posterior samples of test statistics with expected order statistics from the null (Ji = 0) or alternative (Ji = 1) hypotheses. The statistical theory related to the quantiles of test statistics can be found in Johnson (2007). The author proved that, for a pivotal quantity, its sampling distribution evaluated at a posterior draw of the model parameters is the same as the sampling distribution evaluated at the true values of the model parameters that give rise to the observed data. The pivotal quantity defined in Johnson (2007) is allowed to be a function of both data and parameters, which is different from the usual pivotal statistic defined as a function of data only. Therefore, we can apply the results in Johnson (2007) to the pivotal quantities fi/(1 +
0) and fi/(1 +
1), whose distributions are known, such as F or
2, under our proposed models. Specifically, if we draw a sample - Let
denote a value of the parameter vector (
0,
1, J) sampled at, say, iteration s in the MCMC procedure described in Section 3.
- For i = 1, ... ,m, assign test statistic fi to the null group if
; otherwise assign fi to the alternative group.
- Plot the sample quantiles of fi in the null group against the theoretical quantiles based on the distribution
.
- Plot the sample quantiles of fi in the alternative group against the theoretical quantiles based on the distribution
.
| 5 SIMULATION |
|---|
|
|
|---|
We present the results of two simulation studies in which
2- or F-statistics fi were generated from the proposed models.
In the first study, we set the true values of
= 0.20, (1 +
0) = 0.5 and
1 = 5, and generated Ji from a Bernoulli distribution with probability Pr(Ji = 1) =
. If Ji = 1, we subsequently generated fi from
; if Ji = 0, we generated fi from
. Using this scheme, we obtained m = 2000
2-statistics, in which 393 of them were generated from the alternative model
. Note that by setting (1 +
0) = 0.5 < 1, we assumed that the variance of the empirical null is smaller than that of the theoretical null.
In applying the proposed Bayesian method for the simulated
2-statistics, we used the following priors for
0 and
1. Specifically, we took a1 = 3 and b1 = 5, which led to an inverse gamma prior for
1 with mean 2.5 and standard deviation 2.5. We set a0 = 1 and b0 = 1, which led to a gamma prior for (1 +
0) with mean and standard deviation equal to 1. This gamma prior reflects our belief that the empirical null distribution is center at the theoretical null with some dispersion. Compared to the true values (1 +
0) = 0.5 and
1 = 5 used in this simulation study, the priors for
0 and
1 were not informative. To specify the prior of
, we tried the beta prior b(1, 5.58) suggested by Scott and Berger (2005).
Upon convergence of the Markov chain, we first computed the posterior estimates of the important model parameters. The posterior mean of
equaled 0.21 (compared to the true value 0.20) and the posterior means of (1 +
0) and
1 equaled 0.49 and 4.43 (compared to the true values 0.5 and 5). Figure 3 contains the qq-plots comparing the observed sample quantiles based on the simulated
2-statistics and the theoretic quantiles based on the scaled
2-distributions in (6) and (7). The qq-plots indicate a good fit of the model to the simulated data. As a further illustration, Figure 4 provides some posterior estimates and a model fitting summary. The top two plots present the density estimates of (1 +
0) and
1 based on the posterior samples from MCMC computation. The bottom two plots present the boxplots of the simulated test statistics and estimated posterior probabilities of alternative under the null model and the alternative model. In the left panel, the boxplots contain the simulated test statistics from the null model (with simulated Ji = 0) and from the alternative model (with simulated Ji = 1). In general, the test statistics under the alternative model are much larger than those under the null model. Shown in the right panel, the estimated posterior probabilities of alternative Pr(Ji = 1|data) under the alternative model are also larger than those under the null model in general. These findings further confirm that the proposed Bayesian model performs well for the simulated data.
|
|
To examine the performance of the proposed model and the model checking procedure when the data depart strongly from our model assumptions, we conducted a second simulation study. We generated 20 expression levels for each of 2000 genes. We assumed that 10 expression levels were measured under one condition and the other 10 were measured under the second condition. Assuming that 400 genes were differentially expressed, for each of these genes we generated 10 expressions from N(3, 1) and 10 from N(0, 1). For the remaining 1600 genes, we assumed that they were not differentially expressed and that all of their 20 expression levels followed N(0,1). We computed the F-statistic as the squared t-statistic for each gene. Apparently, the distribution of these F-statistics would not follow our proposed models in (4) and (5). We applied our method using the generated F-statistics. As expected, the model fit poorly as shown in Figure 5. The qq-plots show severe lack of fit under the alternative model. In practice, we would not know the true models that generated the data. Therefore, we would rely on the qq-plots to examine the goodness of fit based on our proposed models. So despite the poor fit of our models for this simulation, it is reassuring that the qq-plots informed us that the model did not fit well.
|
Based on the simulation results and our experiences in applying the proposed method to other practical data, we find that (1) the proposed model may not fit well when the model assumptions are violated, especially when the prior distributions are not properly specified, and (2) when there is a lack of fit, the proposed model checking procedure will exhibit a large departure from the 45° line. Usually, the lack of fit can be eliminated by tuning (often shrinking) the prior variance of (1 +
0). We demonstrate in the next section that the tuning is quite simple to perform and takes very little time due to the fast computation of the MCMC algorithm. | 6 EXAMPLES |
|---|
|
|
|---|
6.1 siRNA screening experiment
We consider an ongoing siRNA screening experiment conducted at M. D. Anderson Cancer Center. The purpose of this experiment was to identify kinases responsible for signaling defects that result in cancer-inducing pathways. In order to identify the target kinases that contribute to cancer initiation and progression, a kinase siRNA library was screened with HeLa cells by a fluorescence-based cell viability assay. The library contained pooled siRNAs targeting m = 779 protein kinases involved in multiple cellular pathways and was organized in 96-well plates. For siRNA transfection, 6000 HeLa cells were added to each well containing 50 nM siRNAs complexed with 0.15 l of lipid-based transfection reagent. Triplicate plates were performed. By comparing measurements from this assay to values under control siRNAs, we aimed to identify subsets of siRNAs that have a strong silencing effect, i.e. it significantly reduce cell viability.
The assay yielded three intensity measurements of cell viability under each of the investigational siRNAs and control siRNAs. The control siRNAs were not expected to have any effects on cell viability. We computed pooled two-sample t-statistics using observed intensities of the cell viability for each targeted siRNA and control. We then obtained the F-statistics by squaring the t-statistics. Since three values were recorded for each siRNA and control, the theoretical null was the F-distribution on 1 and 4 degrees. Note that we did not implement the
2 model because the asymptotic assumption would not be valid with only three replicates per siRNA.
We implemented the proposed Bayesian model using the same priors for
1 and
as those in the simulation study. We first used the gamma prior g(1,1) for (1 +
0), which was also used in the simulation. We found some lack of fit under this prior. Specifically, the estimated posterior distribution for (1 +
0) was very similar to that of (1 +
1), indicating some non-identifiability betweenthe two parameters. To solve this issue, we tried the gamma prior g(10, 0.1) for (1 +
0), with prior mean 1 and 0.4. We retained the priors for the remaining parameters and reimplemented our model. The model fit very well under the new priors as indicated by Figure 6. The posterior mean of (1 +
0) equaled 1.73, which was quite different from its prior mean. The posterior mean of (1 +
1) was 5.5 and the posterior mean of
equaled 0.61, which indicated that a large proportion of the F-statistics were considered drawn from the alternative model. To confirm this, we plotted in Figure 2 the two densities under the null and alternative models,
and
, where
and
represented posterior means. It can be seen that the histogram of the F-statistics agrees with the alternative density much better than it agrees with the null density.
|
We computed the posterior probability of the null hypothesis that there is no difference in cell viability between the target siRNA and the control, and ranked the target siRNA based on the ascending values of the posterior probabilities. Applying the Bayesian FDR analysis, we obtained 5, 38 and 260 siRNAs that significantly reduced cell viability when the FDR was estimated at the level of 8, 10 and 20% level. A list of top siRNAs is currently being investigated by the investigators.
6.2 Colon cancer data
We applied our method to a colon cancer data set described in Alon et al. (1999), who used Affymetrix arrays to compare the gene expression from 40 colon cancer samples against expression levels obtained from 22 normal colon samples. Following Alon et al., we selected a subset of 2000 genes with highest minimal intensity across samples for further analysis. We performed both
2-tests and F-tests comparing gene expression levels under cancer samples and normal samples. We first present the results of
2-tests below.
For each of the 2000 genes, we performed the Mann–Whitney test and converted the resulting U-statistics to
2-statistics. We used the same priors as those for the siRNA screening data. The posterior means of (1 +
0) and (1 +
1) were 1.04 and 3.75, respectively. Interestingly, McLachlan et al. found that for the same data, the theoretical null performed better than the empirical null under their proposed normal models. Here, we have the posterior mean
indicating that the empirical null
is also close to the theoretical null
distribution. Also similar to McLachlan et al. (2006), we found that a large number of genes were differentially expressed. For example, McLachlan et al. found 433 differentially expressed genes with an estimated FDR of 3%, while our Bayesian FDR procedure indicated 509 genes with differential expression at the same FDR value.
We next applied the F-statistic model by squaring the pooled t-statistics. The resulting theoretical null was the F-distribution on 1 and 60 degrees of freedom. We assumed the same prior densities as those in the
2-statistic model. Similar results were obtained as described above.
We performed model-checking for both the
2 model and the F model. The qq-plots, presented in the Supplementary Materials, indicate that both models fit well.
| 7 DISCUSSION |
|---|
|
|
|---|
We have proposed a Bayesian method for testing multiple hypotheses. The method is based on modeling the distribution of test statistics directly, and therefore simplifies the process of obtaining posterior model probabilities. The procedure provides an easily accessible method for researchers to perform Bayesian inference and obtain posterior probabilities that can be used in Bayesian FDR and related decision-theoretic selection procedures. In addition, we have drastically reduced the computational burden associated with the implementation of the Bayesian paradigm in this high-dimensional setting.
Another new component of the proposed method is the application of the model-checking procedure in Johnson (2007) to the problem of multiple testing. Using this procedure, one can know when the proposed models fit to the data and when they do not. This feature is lacking in most published works on multiple testing.
In our models, we used a gamma distribution for (1 +
0) and an inverse gamma distribution for
1. Since the tails of gamma and inverse gamma are roughly exp(– x) and exp(– 1/x), the gamma distribution is more appropriate for parameters that are expected to be close to zero, e.g.
0, and the inverse gamma distribution is more appropriate for parameters that are expected to be away from zero, e.g.
1.
In order to implement our proposed method for a given data set, prior specification is needed for three parameters, (1 +
0),
1 and
. We recommend using the beta prior b(1, 5.58) for
as suggested by Scott and Berger (2005). We use an inverse gamma prior ig(3, 5) for
1 with equal mean and standard deviation, which is considered to be reasonably vague (e.g. Carlin and Louis, p. 326). Attention needs to be paid to the prior specification for (1 +
0). We recommend centering the gamma prior of the (1 +
0) at 1 so that the mean of the empirical null and theoretical null is the same a priori. In the simulation studies, we set the prior variance of (1 +
0) to be 1 as well, resulting in a vague prior. The model fit quite well with no apparent lack of fit. In the two practical examples, the gamma prior g(1, 1) was too vague and did not fit the data well. We then reduced the prior variance while keeping the prior mean at 1, and the model fit well due to the smaller variance. In summary, with our recommended priors for
1 and
, one only needs to tune the gamma prior of (1 +
0) when using the proposed method. An appropriate model with a good fit can be achieved by keeping the prior mean of (1 +
0) at 1 and varying its prior variance. Due to the fast computation and the intuitive model-checking qq-plot, this can be easily performed in practice.
The proposed method produces a list of posterior probabilities of null hypotheses. These probabilities can then be used in conjunction with a variety of thresholding procedures. In our examples, we used these values to implement a Bayesian FDR procedure. As noted by Müller et al. (2007) for a general class of Bayesian testing procedures, our procedure for computing posterior model probabilities automatically accounts for multiplicity. That is, posterior probabilities of null hypotheses are automatically adjusted according to the number of estimated alternatives. Practically speaking, this means that we can threshold the posterior probabilities of the null hypotheses (1 – ri) directly and select genes with probabilities smaller than the specified threshold.
Future research in this direction will involve relaxing the distributional assumptions made regarding both null and alternative distributions, and extending this approach to the study of more decision-theoretic multiple testing criteria.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Chris Stoeckert
Received on September 24, 2007; revised on December 3, 2007; accepted on January 29, 2008
| REFERENCES |
|---|
|
|
|---|
Alon U, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl Acad. Sci, ( (1999) ) 96, : 6745–6750.
Berry D, Hohberg Y. Bayesian perspectives on multiple comparisons. J. Stat. Planning Inf, ( (1999) ) 82, : 215–277.[CrossRef].
Carlin B, Louis T. Bayes and Empirical Bayes Methods for Data Analysis. 2nd edn., ( (2000) ) New York: Chapman & Hall..
Efron B, et al. Empirical Bayes Analysis of a Microarray Experiment. J. Am. Stat. Assoc, ( (2001) ) 96, : 1151–1160.[CrossRef][ISI].
Efron B. Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis. J. Am. Stat. Assoc, ( (2004) ) 99, : 96–104.[CrossRef][ISI].
Efron B. Correlation and Large-Scale Simultaneous Significance Testing. J. Am. Stat. Assoc, ( (2007) ) 102, : 93–103.[CrossRef][ISI].
Hedenfalk I, et al. Gene-expression profiles in hereditary breast cancer. New Engl. J. Med, ( (2001) ) 344, : 539–548.
Johnson V. Bayes factors based on test statistics. J. R. Stat. Soc. B, ( (2005) ) 67, : 689–701.[CrossRef].
Johnson V. Bayesian Model Assessment Using Pivotal Quantities. Bayesian Anal, ( (2007) ) 2, : 1–15..
McLachlan G, et al. A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays. Bioinformatics, ( (2006) ) 22, : 1616–1622.
Müller P, et al. Optimal sample size for multiple testing: The case of gene expression micorarrays. J. Am. Stat. Assoc, ( (2004) ) 99, : 990–1001.[CrossRef][ISI].
Müller P, et al. FDR and Bayesian multiple comparisons rules. In: Bayesian Statistics, Vol. 8, —Bernardo J, et al, eds. ( (2007) ) Oxford: Oxford University Press..
Newton M, et al. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, ( (2004) ) 5, : 155–176.[Abstract].
Scott J, Berger J. An exploration of aspects of Bayesian multiple testing. J. Stat. Planning Inf, ( (2006) ) 136, : 2144–2162.[CrossRef].
van't Wout A, et al. Cellular gene expression upon human immunodeficiency virus type 1 infection of CD$+ T-Cell lines. J. Virol, ( (2003) ) 77, : 1392–1402.[CrossRef][ISI][Medline].
West M. Bayesian factor regression models in the "large p, small n" paradigm. In: Bayesian Statistics, Vol. 7, —Bernardo J, et al, eds. ( (2003) ) Oxford: Oxford University Press..
Yuan Y, Johnson V. Bayesian Hypothesis Tests Using Nonparametric Statistics, ( (2006) ) Department of Biostatistics and Applied Mathematics, UT MD Anderson Cancer Center, Working Paper Series. Working Paper 21..
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||






