Bioinformatics Advance Access originally published online on August 9, 2005
Bioinformatics 2005 21(19):3771-3777; doi:10.1093/bioinformatics/bti604
Two-stage designs for experiments with a large number of hypotheses
Section of Medical Statistics, Medical University of Vienna, Spitalgasse 23, A-1090 Vienna, Austria
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: When a large number of hypotheses are investigated, the false discovery rate (FDR) is commonly applied in gene expression analysis or gene association studies. Conventional single-stage designs may lack power due to low sample sizes for the individual hypotheses. We propose two-stage designs where the first stage is used to screen the promising hypotheses, which are further investigated at the second stage with an increased sample size. A multiple test procedure based on sequential individual P-values is proposed to control the FDR for the case of independent normal distributions with known variance.
Results: The power of optimal two-stage designs is substantially larger than the power of the corresponding single-stage design with equal costs. Extensions to the case of unknown variances and correlated test statistics are investigated by simulations. Moreover, it is shown that the simple multiple test procedure using first stage data for screening purposes and deriving the test decisions only from second stage data is a very powerful option.
Availability: An R-program is available at http://www.meduniwien.ac.at/medstat/research/fdr/application.R
Contact: Martin.Posch{at}meduniwien.ac.at
Supplementary information: Supplementary data for this paper is available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
In gene expression and gene association studies, typically a large number of hypothesis tests are performed, but only a small percentage is expected to show an effect. As there are many tests but only a small number of observations for each test, we are faced with serious multiplicity problems. A widely applied concept to deal with multiple testing situations is to control the family-wise type I error rate (FWE), the probability of at least one type I error among all hypotheses, i.e. of rejecting at least one true null hypothesis. However, in situations with a large number of hypotheses, the control of the FWE leads to conservative procedures with a low power to identify the few existing effects. A less conservative approach for the multiple testing problem is to control the false discovery rate (FDR), see e.g. Benjamini and Hochberg (1995). The FDR is the expected proportion of type I errors among the rejected hypotheses. A number of recent articles deal with multiple testing in classical single-stage designs (Dudoit et al., 2003; Reiner et al., 2003) that control the FWE or the FDR. Given a fixed overall sample size, Futschik and Posch (2005) showed that efficiency (defined as the expected number of detected effects) can be gained by randomly selecting a smaller number of hypotheses such that more observations for each hypothesis are available.
In extension of the single-stage design two types of two-stage designs have been proposed. In the first approach, stage wise sample sizes for each hypothesis are preplanned. However, the second stage data is collected only for a limited number of hypotheses for which the first stage data showed promising effects. Thus, the total number of observations (across stages and hypotheses) is random. Following this idea, Miller et al. (2001) advocated a two-stage design for gene expression experiments. They propose to use the first stage data only for the selection of hypotheses. To control the FWE in the second stage for the selected hypotheses a Bonferroni test is performed using only the second stage data. Satagopan and Elston (2003) improved this procedure by using group sequential methods to incorporate the first stage data in the final Bonferroni test. Both approaches are very conservative as they rely on Bonferroni adjusted critical values. Very recently an FDR-controlling two-stage design using the concept of FDR for selecting at the first stage and for confirmation at the second stage has been proposed by Benjamini and Yekutieli (2005).
In the second type of two-stage designs it is assumed that the overall number of observations (or more generally total costs) is fixed. A certain fraction of these observations is spent in the first stage. The remaining observations are then distributed among the hypotheses selected for the second stage. In this approach, the second stage sample size for each hypothesis is random. Satagopan et al. (2002) applied this idea in the context of gene association studies. In their procedure only a small prefixed number of hypotheses, which are determined by the smallest univariate P-values, is rejected in the final test. This procedure neither controls the FWE nor the FDR.
Extending these approaches, we propose two-stage designs controlling the FDR, where, based on first stage data, hypotheses are selected for the second stage. In contrast to Satagopan et al. (2002) we do not select a prefixed number of hypotheses but all hypotheses whose univariate first stage P-values lie below a certain boundary. We also assume that there is a fixed total sample size (or, more generally, fixed total costs), which is split between the two stages, and the second stage sample size is divided among the selected hypotheses. Thus, there is a trade off between the number of selected hypotheses and the sample size for each hypothesis. The final test decision is based on data from both stages. In Section 2, the definition of the FDR and an estimator based on P-values proposed in the literature are outlined. In Section 3, appropriate P-values are defined for the two-stage test procedure. Here it is assumed that the observations are independently normally distributed between hypotheses with common, known variance. In Section 4 optimal two-stage procedures controlling the FDR and maximizing the expected number of correct rejections are derived. It is assumed that under the alternative the test statistics follow independent and identical normal distributions. It turns out that such two-stage designs provide a substantial advantage with regard to the probability of correct rejections as compared with the single-stage test procedure. In Section 5 several extensions are investigated. First, optimal designs are considered, when sampling costs differ between stages. Simulations are reported when the procedure is applied in the situation of unknown variances that may differ between the hypotheses. A further extension investigated is that the test statistics under the alternative, instead of having a single mean, have means arising from a gamma distribution. Finally, the statistical properties are simulated for correlated test statistics. 
In Section 6, a simple procedure is considered, where the first stage is only used for selecting the promising hypotheses to be investigated at the second stage. The FDR is estimated for the set of second stage hypotheses based on the second stage sample only. Section 7 gives some concluding remarks.
| 2 ESTIMATING THE FDR |
|---|
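Consider a simultaneous test of $m_1$ null hypotheses $H_{0i}$, $i = 1, \ldots, m_1$. The FDR is defined as the expected fraction of erroneously rejected null hypotheses among all rejected null hypotheses. More formally, let $R$ denote the number of rejected null hypotheses and $V$ the number of erroneously rejected null hypotheses. Then the FDR is given by

$$\mathrm{FDR} = E\left(\frac{V}{R} \,\middle|\, R > 0\right) P(R > 0). \tag{1}$$

Assume that the $m_1$ identical null hypotheses are tested by their P-values $p_i$, $i = 1, \ldots, m_1$, at some level $\gamma$. Storey (2002) proposes an estimator for the resulting FDR. First, the fraction $\pi_0$ of true null hypotheses among all $m_1$ hypotheses is estimated by

$$\hat\pi_0(\lambda) = \frac{\#\{p_i > \lambda\}}{(1 - \lambda)\, m_1}, \tag{2}$$

where $0 \le \lambda < 1$ is a constant chosen a priori and $\#\{p_i > \lambda\}$ denotes the number of P-values exceeding $\lambda$. Note that increasing $\lambda$ reduces the bias of $\hat\pi_0(\lambda)$ at the cost of a higher variance. Now the estimator of the FDR is given by

$$\widehat{\mathrm{FDR}}_\lambda(\gamma) = \frac{\hat\pi_0(\lambda)\,\gamma}{\max\{\#\{p_i \le \gamma\},\, 1\}/m_1}. \tag{3}$$

The level $\gamma$ is determined such that $\widehat{\mathrm{FDR}}_\lambda(\gamma) \le \alpha$, where $\alpha$ is the targeted FDR, using the P-values observed in the sample.
| 3 TWO-STAGE DESIGNS |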
|---|
3.1 The test problem
We consider $m_1$ one-sided hypotheses for the mean of independent, normally distributed observations with known variance $\sigma^2$, assuming also independence across hypotheses. We test the hypotheses

$$H_{0i}: \mu_i = 0 \quad \text{versus} \quad H_{1i}: \mu_i > 0, \qquad i = 1, \ldots, m_1.$$
3.2 The test procedure
Assume there is an overall number $N$ of available observations. In the first stage a fraction $r$ of the $N$ observations is distributed equally (up to round-off errors) among the $m_1$ hypotheses, the first stage sample size per hypothesis being $n_1 = rN/m_1$. Let $z_i^{(1)}$ denote the standardized first stage mean of the observations for hypothesis $i$. Then the first stage P-values are given by $p_i^{(1)} = 1 - \Phi(z_i^{(1)})$, $i = 1, \ldots, m_1$, where $\Phi$ denotes the cumulative distribution function of the standard normal distribution. All null hypotheses $i$ for which $p_i^{(1)} \le \gamma_1$ are selected for the second stage. For all others, $H_{0i}$ is accepted. We denote the random number of selected hypotheses to be carried over to the second stage by $m_2$ and the set of selected null hypotheses by $i_1, \ldots, i_{m_2}$. At the second stage, the remaining $(1 - r)N$ observations are equally distributed among the selected $m_2$ hypotheses. Thus, the second stage sample size for each selected hypothesis is

$$n_2 = \frac{(1 - r)N}{m_2}.$$

Let $z_i$ denote the standardized mean of the pooled sample from both stages and $p_i = 1 - \Phi(z_i)$ the P-value from the pooled sample after the second stage. Then, for all $i \in \{i_1, \ldots, i_{m_2}\}$, $H_{0i}$ is rejected in the final test if $p_i \le \gamma_2$, for some constant $\gamma_2$.
In the following we show how to choose $\gamma_2$ to control the FDR at some specified value. To this end, we reformulate the test procedure in terms of an overall P-value for each of the sequential two-stage tests. The crucial point is that the independent increment structure of group sequential designs is preserved in our situation, where the second stage sample size $n_2$ is a random variable.
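The selection and allocation steps described above can be sketched in a few lines. The following is a minimal Python simulation under the stated model (independent normal observations, known variance); it is not the authors' R program. The function name `simulate_two_stage`, its arguments and the rounding of sample sizes to the lower integer are assumptions made for this illustration.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution (cdf, inv_cdf, pdf)


def simulate_two_stage(is_alt, N, r, gamma1, delta=1.0, sigma=1.0, seed=0):
    """Simulate first-stage selection and pooled P-values (illustrative only).

    is_alt : list of bool, True where the alternative mu = delta holds.
    Returns (n1, n2, selected, pooled_p); pooled_p maps each selected index
    to its one-sided P-value from the pooled sample.
    """
    rng = random.Random(seed)
    m1 = len(is_alt)
    n1 = max(1, int(r * N / m1))  # first-stage size, rounded down (assumption)

    # A standardized mean of n observations is N(sqrt(n) * mu / sigma, 1).
    means = [delta if alt else 0.0 for alt in is_alt]
    z1 = [rng.gauss(math.sqrt(n1) * mu / sigma, 1.0) for mu in means]

    # Select every hypothesis whose first-stage P-value is at most gamma1.
    selected = [i for i, z in enumerate(z1) if 1.0 - STD.cdf(z) <= gamma1]
    m2 = len(selected)
    if m2 == 0:
        return n1, 0, [], {}

    n2 = (N - m1 * n1) // m2  # split the remaining observations equally
    pooled_p = {}
    for i in selected:
        z2 = rng.gauss(math.sqrt(n2) * means[i] / sigma, 1.0)
        # Pooled z-statistic: stage-wise statistics weighted by sqrt(n).
        z = (math.sqrt(n1) * z1[i] + math.sqrt(n2) * z2) / math.sqrt(n1 + n2)
        pooled_p[i] = 1.0 - STD.cdf(z)
    return n1, n2, selected, pooled_p


# Hypothetical run: N = 40 000 observations, m1 = 5000, 1% true alternatives.
flags = [k < 50 for k in range(5000)]
n1, n2, selected, pooled_p = simulate_two_stage(flags, 40000, 0.675, 0.13)
print(n1, n2, len(selected))
```

The second stage sample size is computed only after the selection, which is what makes $n_2$ random; the total number of observations never exceeds $N$.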
3.3 A P-value for the two-stage design
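Let us first assume that $n_2$ is deterministic. Then the local level of the two-stage test is given by

$$\gamma_s(\gamma_2) = \int_{c_{\gamma_1}}^{\infty} \left[1 - \Phi\!\left(\frac{c_{\gamma_2}\sqrt{n_1 + n_2} - \sqrt{n_1}\, z}{\sqrt{n_2}}\right)\right] \varphi(z)\, dz, \tag{4}$$

where $c_\gamma$ denotes the $(1 - \gamma)$-quantile and $\varphi(z)$ the density of the standard normal distribution, respectively.
In the two-stage test introduced above, the second stage sample size is a random variable. However, the conditional distribution of $n_2$, given that the $i$-th hypothesis is selected (i.e. $p_i^{(1)} \le \gamma_1$), is independent of $z_i^{(1)}$. This follows from the assumption of independence of the observations across hypotheses. Hence, (4) gives also the level of the two-stage test if $n_2$ is a random variable, as in the above two-stage procedure.
Now, an overall P-value for the group sequential two-stage test based on a monotonic ordering of the sample space as proposed in Tsiatis et al. (1984) is given by

$$p_{si} = \begin{cases} p_i^{(1)} & \text{if } p_i^{(1)} > \gamma_1, \\ \gamma_s(p_i) & \text{if } p_i^{(1)} \le \gamma_1, \end{cases} \tag{5}$$

where $\gamma_s(p_i)$ is given by (4) with $c_{\gamma_2}$ replaced by the observed Z-statistic $z_i$ in the total sample. With (4), every critical region $p_i \le \gamma_2$ for the overall P-value in the total sample corresponds to a critical region $p_{si} \le \gamma_s(\gamma_2)$ for the sequential P-value and vice versa. Additionally, the P-value $p_{si}$ is uniformly distributed under $H_{0i}$. To see this, let $\gamma_s$, $0 \le \gamma_s \le 1$, be fixed and $\gamma_2$ be the solution of (4). Then,

$$P_{H_{0i}}(p_{si} \le \gamma_s) = P_{H_{0i}}\!\left(p_i^{(1)} \le \gamma_1,\; p_i \le \gamma_2\right) = \gamma_s.$$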
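The local level (4) and the sequential P-value (5) have no closed form but are cheap to evaluate numerically. The sketch below uses composite Simpson integration with the Python standard library only. The function names, the truncation of the upper integration limit and the number of grid points are assumptions of this illustration; the rule that a non-selected hypothesis keeps its first stage P-value follows the stage-wise ordering as read from the text, not the published R program.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def _simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def continuation_integral(z_final, n1, n2, gamma1):
    """Integral in (4): P(Z1 >= c_gamma1 and pooled Z >= z_final) under H0."""
    c1 = STD.inv_cdf(1.0 - gamma1)
    upper = max(c1, 0.0) + 10.0  # the normal density is negligible beyond this

    def integrand(z):
        arg = (z_final * math.sqrt(n1 + n2) - math.sqrt(n1) * z) / math.sqrt(n2)
        return (1.0 - STD.cdf(arg)) * STD.pdf(z)

    return _simpson(integrand, c1, upper)


def sequential_p_value(p1, z_pooled, n1, n2, gamma1):
    """Sequential P-value: the first-stage P-value if the hypothesis was not
    selected, otherwise the integral evaluated at the observed pooled z."""
    if p1 > gamma1:
        return p1
    return continuation_integral(z_pooled, n1, n2, gamma1)


# Local level for hypothetical values gamma1 = 0.13, gamma2 = 0.001,
# n1 = 5 and n2 = 20.
level = continuation_integral(STD.inv_cdf(1 - 0.001), 5, 20, 0.13)
print(level)
```

Two limiting cases serve as a sanity check: for a very small pooled statistic the integral approaches $\gamma_1$, and for $\gamma_1$ close to 1 (no selection) it approaches the ordinary one-sided P-value of the pooled sample.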
3.4 Control of the false discovery rate
For given $\gamma_1$ and $\gamma_2$ the FDR can be estimated with the estimator (3), where the $p_i$ are replaced by the sequential P-values, $p_{si}$. In the appendix in the supplementary data we show that the $p_{si}$ are independent across hypotheses such that the results of Storey (2002) on the consistency and conservativeness of the estimator of the FDR apply.
To control the FDR at a specified level $\alpha$, we rewrite the estimate (3) and set

$$\frac{\hat\pi_0(\lambda)\,\gamma_s}{\max\{\#\{p_{si} \le \gamma_s\},\, 1\}/m_1} = \alpha, \tag{6}$$

where $\gamma_s$ as function of $\gamma_2$ is given by (4) and $\pi_0$ is estimated by (2) with $p_i$ replaced by $p_{si}$. Now, (6) is solved for $\gamma_2$. If all hypotheses $i$ that have been selected for the second stage and for which $p_i \le \gamma_2$ are rejected, the FDR is controlled (at least asymptotically) at the specified level. An R-program (R Development Core Team, 2005) to apply the procedure to a dataset is available at http://www.meduniwien.ac.at/medstat/research/fdr/application.R
| 4 OPTIMAL TWO-STAGE PROCEDURES |
|---|
4.1 Asymptotically optimal designs
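Given an FDR $\alpha$, an initial number of hypotheses $m_1$ and an overall number of observations $N$, the two-stage procedure involves two design parameters: the futility bound $\gamma_1$ and the fraction of observations to be spent in the first stage $r$, which determines the first stage sample size $n_1 = rN/m_1$. Thus, for a specified alternative we can optimize these parameters with respect to the power, defined as the probability to reject a null hypothesis given the alternative holds.
Assume that for all alternative hypotheses the same alternative $\mu = \Delta > 0$ holds. Asymptotically, for a large number of hypotheses and up to round-off errors, $m_2 = m_1\left[\pi_0\gamma_1 + (1 - \pi_0)\left(1 - \Phi(c_{\gamma_1} - \Delta\sqrt{n_1}/\sigma)\right)\right]$ and $n_2 = (1 - r)N/m_2$. Thus, asymptotically the rejection boundary $\gamma_s$ for the P-values in the final analysis of the two-stage procedure is given by the solution of

$$\frac{\pi_0\,\gamma_s}{\pi_0\,\gamma_s + (1 - \pi_0)\,\bigl(1 - \beta(\gamma_s)\bigr)} = \alpha, \tag{7}$$

where the individual power is

$$1 - \beta(\gamma_s) = \int_{c_{\gamma_1}}^{\infty} \left[1 - \Phi\!\left(\frac{c_{\gamma_2}\sqrt{n_1 + n_2} - \sqrt{n_1}\, z}{\sqrt{n_2}} - \frac{\Delta\sqrt{n_2}}{\sigma}\right)\right] \varphi\!\left(z - \frac{\Delta\sqrt{n_1}}{\sigma}\right) dz \tag{8}$$

and $\gamma_2$ is the solution of (4). Here, $\Phi$ and $\varphi$ are the cumulative distribution and density function of the standard normal distribution; the shifts $\Delta\sqrt{n_1}/\sigma$ and $\Delta\sqrt{n_2}/\sigma$ are the means of the standardized stage-wise statistics when the observations have mean $\mu = \Delta$ and variance $\sigma^2$. Note that the individual power (8) is equal to the expected proportion of correct rejections in the set of alternative hypotheses. In the following, we refer to this quantity $\Pi_s = 1 - \beta(\gamma_s)$ as the power of the multiple test procedure.
Now, we optimize the objective function (8) in $\gamma_1$ and $r$, where $\gamma_s$ as a function of the targeted FDR $\alpha$ is implicitly defined by (7). It is easy to see that (8), (4) and (7), and consequently also the optimal $\gamma_1$ and $r$, depend on $N$, $m_1$, $\Delta$ and $\sigma$ only via $\Delta\sqrt{N/m_1}/\sigma$.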
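To make the optimization concrete, the following Python sketch evaluates the asymptotic power of a two-stage design for given $r$ and $\gamma_1$ and compares it with a single-stage design. It assumes the asymptotic FDR has the form $\pi_0 \cdot \text{level} / (\pi_0 \cdot \text{level} + (1 - \pi_0) \cdot \text{power})$ and finds the final critical value by bisection. All function names, the Simpson grid, the bisection bracket and the small search grid are choices made for this illustration; `theta` stands for $\Delta/\sigma$ and `obs_per_hyp` for $N/m_1$.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def upper_tail(x):
    """P(Z >= x) for a standard normal Z; erfc keeps far-tail accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def _simpson(f, a, b, n=400):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def joint_prob(c1, c2, n1, n2, theta):
    """P(Z1 >= c1 and pooled Z >= c2); theta = 0 gives the level."""
    t1, t2 = theta * math.sqrt(n1), theta * math.sqrt(n2)

    def integrand(z):
        arg = (c2 * math.sqrt(n1 + n2) - math.sqrt(n1) * z) / math.sqrt(n2)
        return upper_tail(arg - t2) * STD.pdf(z - t1)

    return _simpson(integrand, c1, max(c1, t1) + 10.0)


def two_stage_power(r, gamma1, obs_per_hyp, pi0, theta, alpha=0.05):
    """Asymptotic power of the two-stage design for given r and gamma1."""
    n1 = r * obs_per_hyp
    c1 = STD.inv_cdf(1.0 - gamma1)
    share = pi0 * gamma1 + (1 - pi0) * upper_tail(c1 - theta * math.sqrt(n1))
    n2 = (1.0 - r) * obs_per_hyp / share  # share is the asymptotic m2 / m1

    def fdr(c2):
        level = joint_prob(c1, c2, n1, n2, 0.0)
        power = joint_prob(c1, c2, n1, n2, theta)
        return pi0 * level / (pi0 * level + (1 - pi0) * power)

    lo, hi = -8.0, 8.0  # bracket for the final critical value (assumption)
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if fdr(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return joint_prob(c1, hi, n1, n2, theta)


def single_stage_power(obs_per_hyp, pi0, theta, alpha=0.05):
    """Asymptotic power when all observations are spent in one stage."""
    shift = theta * math.sqrt(obs_per_hyp)
    lo, hi = -8.0, 8.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        level, power = upper_tail(mid), upper_tail(mid - shift)
        if pi0 * level / (pi0 * level + (1 - pi0) * power) > alpha:
            lo = mid
        else:
            hi = mid
    return upper_tail(hi - shift)


# Coarse grid search for N = 8 m1, pi0 = 0.99, Delta / sigma = 1.
grid = [(two_stage_power(r, g, 8, 0.99, 1.0), r, g)
        for r in (0.6, 0.674, 0.75) for g in (0.10, 0.138, 0.18)]
print(max(grid))
print(single_stage_power(8, 0.99, 1.0))
```

A finer grid or a numerical optimizer would replace the coarse search in practice; the sketch only shows how the objective is evaluated.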
Table 1 shows the optimal $\gamma_1$, $r$ and the resulting power for several scenarios. For comparison also the power of the corresponding single-stage design (with $N/m_1$ observations per hypothesis) is given. With increasing $N$, the optimal $r$ and $\gamma_1$ increase slightly. Also for increasing proportions of true null hypotheses, $r$ increases, whereas the power $\Pi_s$ and the optimal $\gamma_1$ decrease. In the considered scenarios 10-15% of the $m_1$ hypotheses are selected for the second stage and about 2/3 of all available observations are used in the first stage. For example, for $\Delta/\sigma = 1$, $N = 8m_1$ and $\pi_0 = 0.99$, the optimal sample size in the first and second stage is $n_1 = 5.5$ and $n_2 = 19.2$, and the resulting number of selected hypotheses is $m_2 = 0.13m_1$. The optimal sample sizes are similar to those proposed in Satagopan et al. (2002), where the power to select a single (or predefined number of) true alternative(s) is maximized without controlling the type I error. The comparison of the two-stage power and the single-stage power shows very large advantages for the sequential design which slightly decrease with increasing $N$.
Since the optimal $r$ and $\gamma_1$ depend on $\Delta/\sigma$ and $\pi_0$, which are typically unknown, we investigated the impact of misspecifications in the planning phase. Table 2 shows the power of a two-stage design planned under the assumptions $N = 8m_1$, $\pi_0 = 0.99$ and $\Delta/\sigma = 1$ under different scenarios. Comparison of the power of the optimal two-stage designs in Table 2 shows that the optimum appears to be flat.
4.2 Simulations
To assess finite sample properties we performed a simulation study using the asymptotically optimal design parameters. We modified the optimal $r$ to get an integer first stage sample size $n_1$, rounded to the smaller integer. In the second stage, the value of $n_2$ is again rounded to the smaller integer not to exceed the predefined total $N$. The data were assumed to be independent normally distributed with known variance. The first section in Table 3 shows the results of the simulation study for $N = 40\,000$, $m_1 = 5000$, $\Delta/\sigma = 1$, $\pi_0 = 0.99$, a targeted FDR of $\alpha = 0.05$ and $\lambda = 0.5$. In all scenarios the procedure controls the FDR well and is even slightly conservative. The average power for the simulations for integer values of $n_1$ and $n_2$ is only marginally lower than the analytically optimal power based on non-integer sample sizes. Boxplots for the FDR from the simulations are given in Figure 1a to describe the sampling distribution of the actual FDR. The distribution is skewed with occasional quite large FDR values.
| 5 EXTENSIONS |
|---|
5.1 Different sampling costs at the two stages
If the sampling costs vary between the two stages, the total costs are given by $N = m_1 n_1 + C m_2 n_2$, for some constant $C$. Then for fixed total costs $N$, the second stage sample size for each hypothesis is given by $n_2 = (N - m_1 n_1)/(C m_2)$. Also for this more general setup optimal parameters can be derived as in Section 4. For increasing values of $C$, a smaller sample size is used for stage two. For example, for $N = 8m_1$, $\pi_0 = 0.99$, $\Delta/\sigma = 1$ and $C = 3$, the optimal power is 0.719 with the parameters $r = 0.737$ and $\gamma_1 = 0.041$ compared with a power of 0.859 and parameters $r = 0.674$ and $\gamma_1 = 0.138$ for $C = 1$. Thus, while the optimal $r$, which corresponds to the proportion of total costs to be spent at the first stage, is approximately 2/3 in both cases, the optimal design selects far fewer hypotheses for the second stage as $C$ increases. Hence, if, e.g. $C = 3$, the total sample size used at the second stage is about one third of the sample size in the scenario with equal sampling costs. The optimal second stage sample size for each selected hypothesis is approximately 2/3 of the sample size in the scenario with equal sampling costs.
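As a small worked illustration of the cost constraint $N = m_1 n_1 + C m_2 n_2$, the helper below solves it for $n_2$. The function name and the numbers in the example are hypothetical and not taken from the paper's tables.

```python
def second_stage_size(total_cost, m1, n1, m2, cost_ratio=1.0):
    """Per-hypothesis second-stage sample size n2 = (N - m1*n1) / (C*m2)."""
    remaining = total_cost - m1 * n1
    if remaining < 0 or m2 <= 0:
        raise ValueError("first stage exceeds the budget or nothing was selected")
    return remaining / (cost_ratio * m2)


# Hypothetical numbers: N = 8000 cost units, m1 = 1000, n1 = 5, m2 = 150.
print(second_stage_size(8000, 1000, 5, 150))        # equal costs, C = 1
print(second_stage_size(8000, 1000, 5, 150, 3.0))   # second stage 3x as costly
```

With $m_2$ held fixed, $n_2$ shrinks by the factor $C$; in the optimal designs discussed above $m_2$ shrinks as well, which is why the per-hypothesis sample size drops less sharply.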
5.2 The t-test
If the variance is unknown but the same for all hypotheses, the two-stage test for the known variance case is still valid because of the large sample size used for the common variance estimate. However, if $\sigma^2$ differs between the hypotheses, this approximation is questionable. Since the exact computation of group sequential P-values is numerically difficult we use an approximation based on the P-values of the t-test from the first stage and the pooled sample, denoted by $p_{ti}^{(1)}$ and $p_{ti}$, respectively. The level of the sequential t-test which rejects if $p_{ti}^{(1)} \le \gamma_1$ and $p_{ti} \le \gamma_2$ is then approximately given by (4) (Pocock, 1977). Thus, an approximate sequential P-value is given by

$$p_{si} = \begin{cases} p_{ti}^{(1)} & \text{if } p_{ti}^{(1)} > \gamma_1, \\ \gamma_s(p_{ti}) & \text{if } p_{ti}^{(1)} \le \gamma_1, \end{cases}$$

where $\gamma_s(\cdot)$ is the level function defined in (4). The boundary $\gamma_2$ leading to a specified FDR can be computed as in Section 3.4.
The performance of the approximations is assessed by simulations where the variances are estimated individually for each null hypothesis. The optimal parameters $\gamma_1$ and $r$ for the corresponding asymptotic known variance case are used in the simulations, again using sample sizes rounded to the lower integer. As can be seen from Table 3, the FDR is well controlled at the specified value and slightly larger than in the known variance case. As expected, the t-test has lower power than the Z-test. The boxplots in Figure 1b show a distribution very similar to the known variance case (Figure 1a).
5.3 Distributed alternatives
Up to this point we assumed that all alternatives have the same mean effect. Now we investigate the procedure under the assumption that, given the alternative holds, the mean effect $\Delta/\sigma$ is distributed according to a gamma distribution,

$$f(x) = \frac{x^{a-1} e^{-x/b}}{b^{a}\,\Gamma(a)}, \qquad x > 0,$$

with shape parameter $a$ and scale parameter $b$ chosen such that the mean is $ab = 1$. As shown in Table 3, the power is smaller than for identically distributed alternatives with the same mean effect size. The average FDR falls below the targeted 0.05. Figure 1 shows the distribution of the actual false discovery rates, which are very similar to the case of fixed alternatives.
5.4 Correlated test statistics
In many testing situations the test statistics are not independent across hypotheses. To investigate the influence of correlation, we assume an order among hypotheses and an autoregressive correlation structure. Hence, the correlation between hypotheses $i$ and $j$ is given by $\rho^{|i-j|}$, for some $\rho \in (0,1)$. The alternatives are randomly distributed among the sequence of hypotheses.
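Test statistics with this autoregressive correlation structure can be generated with a first-order recursion. The sketch below is one standard way to do so in Python; the function names and the empirical check are assumptions of this illustration, not a description of the simulation code used for Table 3.

```python
import math
import random


def ar1_z_statistics(means, rho, seed=0):
    """Draw unit-variance normal statistics whose correlation between
    positions i and j is rho ** abs(i - j); means[i] is the mean at i."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    noise = rng.gauss(0.0, 1.0)  # stationary start with variance 1
    out = [means[0] + noise]
    for mu in means[1:]:
        # The AR(1) recursion keeps the variance at 1 and multiplies the
        # correlation by rho at every step.
        noise = rho * noise + scale * rng.gauss(0.0, 1.0)
        out.append(mu + noise)
    return out


def lag1_correlation(x):
    """Empirical correlation between neighbouring entries of x."""
    n = len(x) - 1
    a, b = x[:-1], x[1:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


# All null hypotheses true, rho = 0.6: neighbouring statistics should show
# an empirical correlation close to 0.6.
z = ar1_z_statistics([0.0] * 50000, rho=0.6)
print(lag1_correlation(z))
```

Placing the alternatives at random positions, as described above, only changes the `means` argument.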
We assume that the variance is known and the hypotheses have identical marginal distributions under the alternative. It can be seen that the power hardly changes with increasing correlation (Table 3). For low correlation also the FDR is still controlled, which is in line with the result of Storey et al. (2004) on the asymptotic control of the FDR also under weak dependence. However, for correlations >0.6, the procedure becomes anti-conservative. The mean and median FDR are increasing in $\rho$. For very large correlation, $\rho = 0.98$, rejections of true null hypotheses are rare so that the median FDR is zero (Fig. 1). As expected, the variability of the actual FDR increases with increasing correlation. Only for large correlations does the distribution show a large variability and a mean above the targeted FDR.
5.5 The two-sided case
Two-sided tests can be constructed by applying two one-sided multiple tests simultaneously as has been proposed for group sequential designs (e.g. Jennison and Turnbull, 2000). For the two-stage approach, the upper one-sided sequential P-value is computed as in (5). The lower sequential P-value is calculated accordingly, integrating over the region $(-\infty, c_{1-\gamma_1}]$ and replacing the expression in the square brackets by $\Phi\!\left((z_i\sqrt{n_1 + n_2} - \sqrt{n_1}\, z)/\sqrt{n_2}\right)$.
Then one can simply combine the $2m_1$ one-sided hypotheses into a single set of null hypotheses, using the one-sided P-values, and proceed as defined in Section 3.2.
| 6 THE PILOT DESIGN: IGNORING THE FIRST STAGE DATA FOR THE TEST DECISION |
|---|
A simple alternative to the sequential two-stage design is to use the first stage only for the selection of the hypotheses to be continued to the second stage. Testing is performed only with the observations from the second stage. We apply the procedure defined in Section 2 to the second stage P-values $p_i^{(2)}$, aiming at an FDR of $\alpha$, and estimate the FDR by $\widehat{\mathrm{FDR}}_\lambda(\gamma)$ and $\hat\pi_0(\lambda)$. Note that $\hat\pi_0(\lambda)$ is an estimate of the proportion of true null hypotheses among the hypotheses tested in the second stage. Then, the power of the whole experiment $\Pi_p$ is the product of the first stage power, $1 - \beta_1 = 1 - \Phi(c_{\gamma_1} - \Delta\sqrt{n_1}/\sigma)$, and the second stage power, $1 - \beta_2 = 1 - \Phi(c_{\gamma_p} - \Delta\sqrt{n_2}/\sigma)$, since they are independent. Here, $p_i^{(2)} = 1 - \Phi(z_i^{(2)})$, $i = i_1, \ldots, i_{m_2}$, where $z_i^{(2)}$ denotes the standardized mean from the second stage data for the hypothesis $i$. The selection boundary $\gamma_1$ is specified in the planning phase, whereas the rejection boundary $\gamma_p$ of the second stage is chosen such that the FDR from the second stage equals 0.05. Thus, asymptotically

$$\frac{\pi_0\,\gamma_1\,\gamma_p}{\pi_0\,\gamma_1\,\gamma_p + (1 - \pi_0)(1 - \beta_1)(1 - \beta_2)} = \alpha = 0.05.$$
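Because the pilot design uses only second stage data for the test, its asymptotic power needs no numerical integration. The Python sketch below solves the asymptotic FDR condition for the second stage critical value by bisection and returns the product of the two stage-wise powers. The function names, the bisection bracket and the example parameters are assumptions of this illustration; `theta` stands for $\Delta/\sigma$ and `obs_per_hyp` for $N/m_1$.

```python
import math
from statistics import NormalDist


def upper_tail(x):
    """P(Z >= x) for a standard normal Z; erfc keeps far-tail accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pilot_power(r, gamma1, obs_per_hyp, pi0, theta, alpha=0.05):
    """Asymptotic power of the pilot design (screen, then test on stage 2)."""
    n1 = r * obs_per_hyp
    c1 = NormalDist().inv_cdf(1.0 - gamma1)
    power1 = upper_tail(c1 - theta * math.sqrt(n1))   # first-stage power
    share = pi0 * gamma1 + (1 - pi0) * power1          # asymptotic m2 / m1
    n2 = (1.0 - r) * obs_per_hyp / share
    shift2 = theta * math.sqrt(n2)

    def fdr(c):
        false_rej = pi0 * gamma1 * upper_tail(c)
        true_rej = (1 - pi0) * power1 * upper_tail(c - shift2)
        return false_rej / (false_rej + true_rej)

    lo, hi = -8.0, 8.0  # bracket for the second-stage critical value
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if fdr(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return power1 * upper_tail(hi - shift2)


# Hypothetical design: N = 8 m1, pi0 = 0.99, Delta / sigma = 1.
print(pilot_power(0.674, 0.138, 8, 0.99, 1.0))
```

The overall power can never exceed the first stage power, since a true alternative must survive the screening before it can be rejected.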
Finally, we investigate the robustness of the optimal two-stage and pilot study designs if the actual parameters deviate from the assumptions made in the planning phase. To demonstrate the differences, this time we look at the scenario with a smaller overall sample size, $N = 4m_1$. Now, for the choice of the value $r$, only two scenarios are possible for the two-stage design to get reasonable integer sample sizes for the first stage, $r = 0.5$ with $n_1 = 2$, and $r = 0.75$ with $n_1 = 3$. We choose $r = 0.75$ and $\gamma_1 = 0.1$, which is nearly optimal for $\pi_0 = 0.99$ and $\Delta/\sigma = 1.2$. We calculate the asymptotic power for the scenarios $\pi_0 = 0.98$, $\pi_0 = 0.99$ and $\pi_0 = 0.995$, and $\Delta/\sigma = 0.8$, $\Delta/\sigma = 1$ and $\Delta/\sigma = 1.2$. As shown in Table 5, the pilot study has a lower power than the two-stage study, since it does not use the first stage data in the final test statistics. The relative difference in power increases as $\Delta/\sigma$ deviates from the parameter value $\Delta/\sigma = 1.2$.
Obviously, the method of the pilot design can be directly applied to the unknown variance case by using the single-stage P-values of the t-test applied to the second stage data. Also the two-sided test can be easily treated by taking the corresponding two-sided second stage P-values.
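Storey et al. (2004) recommend applying a modified estimate of the FDR if the number of hypotheses to be tested is small. This might be the case in the pilot design if only a few hypotheses are selected for the second stage. For $m$ tested hypotheses with P-values $p_i$ (here $m = m_2$), the modified estimate is defined by

$$\widehat{\mathrm{FDR}}^{*}_\lambda(\gamma) = \begin{cases} \dfrac{\hat\pi_0^{*}(\lambda)\,\gamma}{\max\{\#\{p_i \le \gamma\},\, 1\}/m} & \text{if } \gamma \le \lambda, \\[2ex] 1 & \text{if } \gamma > \lambda, \end{cases} \tag{9}$$

where

$$\hat\pi_0^{*}(\lambda) = \frac{\#\{p_i > \lambda\} + 1}{(1 - \lambda)\, m}. \tag{10}$$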
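The modified estimate is straightforward to compute from a list of P-values. The Python sketch below follows the finite-sample estimator of Storey et al. (2004) as summarized above. Restricting the threshold search to the observed P-values is an assumption of this illustration, as are the function names and the toy data.

```python
def modified_pi0(p_values, lam=0.5):
    """Modified estimate of the proportion of true null hypotheses."""
    m = len(p_values)
    return (sum(p > lam for p in p_values) + 1) / ((1.0 - lam) * m)


def modified_fdr(p_values, gamma, lam=0.5):
    """Modified FDR estimate at rejection threshold gamma."""
    if gamma > lam:
        return 1.0
    m = len(p_values)
    rejected = max(sum(p <= gamma for p in p_values), 1)
    return modified_pi0(p_values, lam) * gamma * m / rejected


def largest_threshold(p_values, alpha=0.05, lam=0.5):
    """Largest observed P-value whose estimated FDR is at most alpha,
    or None if no threshold qualifies."""
    ok = [p for p in sorted(set(p_values))
          if modified_fdr(p_values, p, lam) <= alpha]
    return max(ok) if ok else None


# Toy second-stage P-values (made up for illustration).
toy = [0.001, 0.002, 0.6, 0.8]
print(modified_fdr(toy, 0.01))
print(largest_threshold(toy))
```

All hypotheses with P-values at or below the returned threshold would be rejected; with very few selected hypotheses the added 1 in the numerator of the null proportion estimate makes the procedure noticeably more conservative.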
| 7 CONCLUSIONS |
|---|
This manuscript deals with situations where a large number of hypotheses are investigated applying sample sizes that are constrained by costs. Such situations arise, e.g. in gene association studies, where a large number of markers are tested. Instead of distributing the sample sizes over the hypotheses in a single-stage design, a two-stage design is considered. In the first stage, promising hypotheses are selected for further investigation at the second stage. A multiple testing procedure based on data from both stages is proposed to control the FDR.
Assuming independently and normally distributed observations with known variance, we derive optimal designs in terms of power. The optimal designs depend on the total sample size, the number of hypotheses to be investigated, the a-priori assumption on the proportion of true null hypotheses and a common effect size among the alternatives. The two-stage procedure shows striking superiority in terms of power as compared with the corresponding single-stage design, where the total number of observations is equally distributed among the hypotheses. The performance of the procedure does not substantially decrease if optimal designs are used based on wrong a-priori assumptions on the proportion of true hypotheses and the effect size under the alternative. The procedure also controls the FDR if the effect sizes under the alternative are not the same for all hypotheses but distributed according to some probability distribution. Even if the test statistics are moderately correlated across hypotheses, the procedure provides satisfactory control of the FDR. Note that the sequential P-values can be used also for the control of the FWE rate applying Bonferroni adjusted critical boundaries. If observations across hypotheses are independent we proved that also the sequential P-values corresponding to the true null hypotheses are independent. Thus, all procedures based on independent univariate P-values can be applied.
Even the pilot design, where the first stage data is used only to select promising hypotheses which are tested at the second stage using only second stage data, leads to a strong improvement in power compared with the single-stage design. If, due to misspecifications in the planning phase, non-optimal designs are applied, the two-stage design appears to be more robust in terms of power than the pilot design. However, the pilot design is simpler and can be easily applied in all types of testing situations. Even different measurement procedures can be applied to the reduced set of hypotheses at the second stage.
Using two-stage designs will generally change the data structure of experiments: very few observations for very many questions, large sample sizes for few questions selected after screening. The tendency is to spend a major part of the observations for screening purposes. Distributing the remaining resources on the promising questions carried over to the second stage will nevertheless result in substantial samples at the second stage when the proportion of selected null hypotheses will be small. This is the reason why it is difficult to find real data in this area, where the two step procedure could be demonstrated for a suitable subsample. For the practical application, e.g. in gene expression experiments, first stage investigations may be performed by standard devices. Building up the devices for the reduced set of selected hypotheses at the second stage may be costly. It has been shown that optimal designs can be constructed also for the situation that the costs per observation differ between stages. Hence, in such situations it may be checked whether simple single-stage designs may be preferable considering the large second stage costs.
The methodology proposed in this paper can be easily extended to the case where early rejections are allowed in the interim analysis and to the case of procedures with more than two stages.
| Acknowledgments |
|---|
The authors thank Werner Brannath for his advice regarding the proofs in the appendix in the supplementary data, Franz Koenig for helpful comments and the referees for their constructive criticism. This work was supported by the Austrian FWF-Fund no. P15853.
Conflict of Interest: none declared.
Received on April 26, 2005; revised on July 1, 2005; accepted on July 28, 2005
| REFERENCES |
|---|
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289-300.
Benjamini, Y. and Yekutieli, D. (2005) Quantitative trait loci analysis using the false discovery rate. Genetics, To appear.
Dudoit, S., et al. (2003) Multiple hypothesis testing in microarray experiments. Stat. Sci., 18, 71-103.
Futschik, A. and Posch, M. (2005) On the optimum number of hypotheses to test when the number of observations is limited. Stat. Sinica, 15, 841-855.
Jennison, C. and Turnbull, B.W. (2000) Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, Boca Raton.
Miller, R.A., et al. (2001) Interpretation, design, and analysis of gene array expression experiments. J. Gerontol. A-Biol., 56, 52-57.
Pocock, S.J. (1977) Group sequential methods in the design and analysis of clinical trials. Biometrika, 64, 191-199.
R Development Core Team. (2005) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Reiner, A., et al. (2003) Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics, 19, 368-375.
Satagopan, J.M. and Elston, R.C. (2003) Optimal two-stage genotyping in population-based association studies. Genet. Epidemiol., 25, 149-157.
Satagopan, J.M., et al. (2002) Two-stage designs for gene-disease association studies. Biometrics, 58, 163-170.
Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Statist. Soc. B, 64, 479-498.
Storey, J.D., et al. (2004) Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Statist. Soc. B, 66, 187-205.
Tsiatis, A.A., et al. (1984) Exact confidence intervals following a group sequential test. Biometrics, 40, 797-804.