Bioinformatics Advance Access originally published online on April 27, 2006
Bioinformatics 2006 22(14):1730-1736; doi:10.1093/bioinformatics/btl161
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
How accurately can we control the FDR in analyzing microarray data?
1 Department of Biostatistics and Bioinformatics, Duke University NC 27710, USA
2 Institute of Statistics and Decision Sciences, Duke University NC 27705, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Summary: We want to evaluate the performance of two FDR-based multiple testing procedures by Benjamini and Hochberg (1995, J. R. Stat. Soc. Ser. B, 57, 289300) and Storey (2002, J. R. Stat. Soc. Ser. B, 64, 479498) in analyzing real microarray data. These procedures commonly require independence or weak dependence of the test statistics. However, expression levels of different genes from each array are usually correlated due to coexpressing genes and various sources of errors from experiment-specific and subject-specific conditions that are not adjusted for in data analysis. Because of high dimensionality of microarray data, it is usually impossible to check whether the weak dependence condition is met for a given dataset or not. We propose to generate a large number of test statistics from a simulation model which has asymptotically (in terms of the number of arrays) the same correlation structure as the test statistics that will be calculated from the given data and to investigate how accurately the FDR-based testing procedures control the FDR on the simulated data. Our approach is to directly check the performance of these procedures for a given dataset, rather than to check the weak dependency requirement. We illustrate the proposed method with real microarray datasets, one where the clinical endpoint is disease group and another where it is survival.
Contact: jung0005{at}mc.duke.edu
| 1 INTRODUCTION |
|---|
|
|
|---|
Microarrays are high throughput technology measuring the expression levels of a large number of genes simultaneously. Discovering the genes whose expression levels are associated with a clinical endpoint, such as disease type or survival, involves a serious multiple testing problem. Suppose that, for j = 1, ... , m, we want to test the null hypothesis
![]() |
![]() |
Some investigators consider controlling the probability of one false rejection out of thousands of genes to be too strict and advocate an FDR-based multiple testing instead. Suppose that there are m genes among which the null hypotheses, Hj, are true for m0 genes, called non-prognostic genes, and the alternative hypotheses,
, are true for m1(= m m0) genes, called prognostic genes. Based on the calculated p-values, we may reject or accept the null hypotheses. Let R denote the number of genes for which the null hypotheses are rejected (discovery), and R0 denote, among these R genes, the number of genes that the null hypotheses are true (false discovery). Then, the FDR is defined as the expected value of the proportion R0/R.
Benjamini and Hochberg (1995) propose a step-up procedure to control the FDR. They prove that this procedure conservatively controls the FDR if the test statistics for the m0 non-prognostic genes are independent. The conservativeness becomes more serious as the proportion of prognostic genes m1/m increases. Benjamini and Yekutieli (2001) loosen the independence assumption among m0 non-prognostic genes to a weak dependence assumption called positive regression.
Pointing out the conservativeness of the Benjamini and Hochberg's procedure, Storey (2002) proposes a procedure that is considered to be more accurately controlling the FDR when m is large as in most microarray data. Under the independence assumption on m genes, he shows that the FDR is approximated by
![]() |
. He later loosens the independence assumption for a weak dependence assumption among whole m genes (Storey, 2003) or among m0 non-prognostic genes (Storey and Tibshirani, 2001). Jung (2005) derives a sample size formula for the Storey's procedure under the weak dependence assumption among m genes. For an accurate control of the FDR, we need the joint distribution of (R0, R), which will be decided by the joint distribution of the test statistics, under the true effect size for each gene and the dependency among m test statistics. The aforementioned FDR-based procedures have been shown to be reliably controlling the FDR through simulations based on the independence or weak dependence assumptions. However, because of high-dimensionality of microarray data and complexity of the real correlation structure, it is almost impossible to check whether the weak dependence condition holds for a given dataset or not. As an approach to tackling this difficulty, we propose to generate a large number of random vectors from the asymptotic distribution of the test statistics, to apply an FDR-based procedure and to see how accurately the procedure controls the FDR on the simulated data.
When a new dataset is given, we calculate the effect size (mean and standard deviation in case of a two-sample t-test) of each gene. For a chosen m1 value within a range of interest, we identify the top m1 genes in terms of effect size. The simulation is modeled as follows. We specify the effect sizes of these m1 genes by the observed ones from the data. For the remaining m0 = m m1 genes, the effect sizes are set at 0. Lin (2005) proposes a simulation algorithm to generate the null distribution of correlated test statistics conditioning on given data. By modifying his algorithm, we generate a large number of test statistics under the observed effect sizes and correlation structure. For a nominal FDR level to be chosen for analysis, we apply an FDR-based multiple testing procedure to each simulation sample and count the numbers of false and total discoveries. The true FDR is estimated by the empirical average of their proportions through a large number of simulations. By comparing the empirical FDR with the nominal one, we can decide how accurate the data analysis will be. Although the simulated test statistics have the same covariances as those observed from the data, the simulation scheme does not require the preliminary estimation of the covariance matrix. Our method is suggested when a sufficient number of arrays are available. The proposed method is applied to the real datasets by Golub et al. (1999) and Beer et al. (2002).
| 2 FDR-BASED MULTIPLE TESTING PROCEDURES |
|---|
|
|
|---|
We assume that, for large number of arrays, the distribution of the test statistics is approximated by a multivariate normal distribution. Let p1, ... , pm denote the p-values for m genes that are calculated by a resampling method or the theoretical distribution of the test statistics. In this paper, we obtain them from the standard normal distribution based on the large sample approximation. Let p(1)


p(m) denote the ordered p-values for m genes, and H(j) for j = 1, ... , m the corresponding null hypotheses. Benjamini and Hochberg (1995) propose to reject H(j) for all j
J = max{j : p(j)
jq*/m}. They prove that this procedure controls the FDR below q* if the m0 non-prognostic genes are independent.
Suppose that we reject Hj if pj <
. Storey (2002), under independence or a weak dependence among m0 null genes, claims that
![]() |
with the error term ignored. Here, m1 op(m)
0 in probability as m
. Hence, with R replaced by the observed value r, we have
![]() | (1) |
, estimation of FDR(
) requires estimation of m0 only. Storey (2002) proposes to estimate m0 by
![]() |
away from 0, such as 0.5. By combining this estimator with (1), we obtain
![]() |

pj
. Jung (2005) shows that this formula is simplified to
in a two-sample testing case. This procedure is implemented by a computer package called SAM (Significance Analysis of Microarrays) (Tusher et al., 2001). This procedure has been shown to reliably control the FDR under independence (Storey, 2002) or a weak dependence (Storey, 2003), such as block compound symmetry, by simulations. Given a real dataset, however, it is difficult to check whether the weak dependence requirement holds or not. In this paper, we propose a direct approach for evaluation of the FDR-based multiple testing methods in real data analysis. This approach uses a simulation algorithm by Lin (2005) modified for estimation of the true FDR.
| 3 A SIMULATION-BASED METHOD |
|---|
|
|
|---|
In this section, we propose a simulation method to generate a large number of test statistics whose correlation structure, given the data, is asymptotically identical to that of the test statistics to be calculated from the given dataset. There are two versions of large sample approximations in this paper: one with respect to large m and the other with respect to large n. All asymptotic results here and after are with respect to large n.
3.1 Two-sample t-test case: when the clinical outcome is dichotomous
Let xkij denote the expression level of gene j(= 1, ... , m) from subject i(= 1, ... , nk) in disease group k(= 1,2). We assume that {(xki1, ... , xkim), 1
i
nk} are independent and identically distributed (IID) random vectors from a multivariate distribution with means (µk1, ... , µkm), variances
and correlation matrix
= (
jj')1
j,j'
m. Let
and
denote the sample means and pooled variances, respectively, estimated from the data. If the sample sizes are large, then the test statistics
![]() |
![]() |
.
We want to generate random vectors with asymptotically the same distribution as (W1, ... , Wm) using a simulation method. Let (
ki, 1
i
nk, k = 1,2) be IID N(0,1) random variables, which are independent of the dataset, and
with
![]() |
![]() |
![]() |
i
nk, k = 1,2}, each
is a weighted average of the independent normal random variables. It follows that, given the data,
is normal with means
![]() |
which are consistent estimators of
j for j = 1, ... , m and
. Hence, the conditional joint distribution of
given the data is asymptotically identical to the unconditional joint distribution of (W1, ... , Wm). See Appendix for a proof with general test statistics. Note that our simulation method requires calculation of sample means and variances from data, but not the correlation coefficients.
The above procedure is based on an equal variance-covariance assumption in two groups. However, some genes may possibly have unequal variances and covariances between the two groups even under Hj : µ1j = µ2j. In this case, the null distribution of the t-test statistics based on equal variance-covariance assumption may not be approximated by the standard normal distribution. If the equal variance-covariance assumption is questionable, we can use the same simulation method with
replaced by
in the denominators of Wj,
and
, where
are sample variances for group k(= 1,2). In this case, the effect sizes will be expressed as
![]() |
is the population variance for gene j(= 1, ... , m) in group k(= 1,2).
3.2 Cases for cox regression and general test statistics
A large family of test statistics for Hj :
j =
j0 versus Hj :
j
j0 can be written as (U1, ... , Um) = {U1(
10), ... , Um(
m0)}, where
![]() |
j0) can be expressed as a function of the data from subject i only, so that U1j, ... , Unj are independent. Let µij(
j) = E(Uij) and
. If E{Uj(
)} is a smooth function and E{Uj(
)} = 0 has a unique solution, then the solution
to Uj(
) = 0 is consistent to
j. Further, by the central limit theorem, (U1, ... , Um) is approximately normal with means µj(
j) and covariances
jj' that can be consistently estimated by
and |
|
. Let
and
.
For IID N(0,1) random variables
1, ... ,
n that are independent of the data, let
|
| (2) |
and
. By Appendix, the conditional joint distribution of
given the data is asymptotically identical to the unconditional joint distribution of (W1, ... , Wm). Note that the standardized effect sizes of test statistics
are
.
The two-sample t-test case discussed in Section 3.1 is a specific example of this formulation. As another example, we consider the cases to discover the genes associated with a survival endpoint. Let, for subject i(= 1, ... , n), Ti denote the time to an event, such as tumor recurrence or death, and (zi1, ... , zim) denote the gene expression data from m genes. Survival time may be censored due to loss to follow-up or study completion, so that we observe Xi = min(Ti, Ci) together with a censoring indicator
i = I(Ti
Ci), where Ci is the censoring time that is assumed to be independent of Ti given the gene expression data (zi1, ... , zim). A given dataset will be expressed as {(Xi,
i, zi1, ... , zim),1
i
n}. Let Yi(t) = I(Xi
t) and Ni(t) =
iI(Xi
t) be the at-risk and event processes for patient i, respectively, and let
and
.
Suppose that, for subject i, zij is related to the hazard rate by
![]() | (3) |
j0(t) is the unknown baseline hazard specific to gene j (Cox, 1972). The hypotheses are expressed as Hj :
j = 0 versus
. The partial MLE,
, solves the partial score function
, where
![]() |
![]() |
|
|
j0(t) =
j0(t)dt and
![]() |
and variance-covariances
jj' that can be consistently estimated by
and |
|
. Note that µj = 0 under Hj.
The regression model (3) may not hold for some genes. By Lin and Wei (1989), whether model (3) is valid or not, the score statistic Uj is a meaningful measure of association between zij and Ti, and the test statistic
is asymptotically N(0,1) under Hj. For robust testing against potential outliers in gene expression data, Jung et al. (2005) propose to use the rank of zij among gene j observations, z1j, ... , znj, as the covariate of model (3), rather than the raw expression level.
For IID N(0,1) random variables
1, ... ,
n which are independent of the data, let
![]() |
. Then, the conditional joint distribution of
given the data is asymptotically identical to the unconditional joint distribution of (W1, ... , Wm). | 4 NUMERICAL STUDIES |
|---|
|
|
|---|
In this section, we take real microarray data and demonstrate our simulation-based method discussed in the previous section to check how well the FDR-control procedures will work under the dependency embedded in a given dataset.
An accurate estimation of the FDR, to be discussed below, requires identification of the prognostic genes and their effect sizes in addition to the correlation structure among the genes. While the second part in the right-hand side of (2),
, is to approximate the correlation among m test statistics, the first part,
, is a function of their effect sizes. For an accurate calculation of the FDR, we need to know which genes are prognostic. Since none of the observed
would be exactly 0, we first have to specify m1 and identify m1 genes with large effect sizes as follows. Given a dataset, we first calculate
(
in two-sample t-test case) and sj. For a chosen m1, we identify the genes with top m1 effect sizes in absolute value. The effect sizes for these m1 genes will be set at the observed absolute values
, but those for the remaining m0 = m m1 genes will be set at
j = 0. By the same arguments as in Appendix, the change in effect sizes does not change the correlation structure of the test statistics. In order to simplify the procedure, we take the absolute values of the effect sizes and use one-sided tests. We observe similar results by using two-sided tests and the raw effect sizes.
Now, we generate a large number of sets (say, B = 4,000) of the test statistics
, where
|
|
denote the set of m test statistics from the b-th (b = 1, ... , B) simulation. Also, let
![]() |
is the 100(1
)-th percentile of N(0, 1)distribution. For a large B, the true FDR is approximated by
![]() |
=
* by solving
using the bisection method, and reject all genes with p-values smaller than
*. We call this procedure exact method since it exactly controls the empirical FDR from the simulations based on the true parameter values. We estimate the average true rejections and false rejections by
![]() |
![]() |
and
will be compared with those by the Storey's method, called SAM, and the method by Benjamini and Hochberg (1995), called BH, that are discussed in Section 2. The exact method uses a common critical value for m p-values like SAM, so that it can be used as a gold standard for SAM. Note that the exact method cannot be used in a real data analysis because we do not know which genes are prognostic. We can replace z1
with a resampling-based quantile. However, we use the theoretical standard normal quantile based on the asymptotic normality.
SAM and BH are applied to each simulated set of test statistics to control the FDR level at q*, calculate r0b and rb, and estimate the FDR level, at which these procedures are really controlling, as the average of r0b/rb,
![]() |
is expected to be close to q*.
4.1 An example for two-sample t-tests
An example dataset is taken from the golubTrain object in golubEsets package (version 1.0.1) in Bioconductor release 1.7. (Gentleman et al., 2004). Golub et al. (1999) explore m = 6810 genes extracted from bone marrow in 38 patients, of which n1 = 27 with acute lymphoblastic leukemia (ALL) and n2 = 11 with acute myeloid leukemia (AML), in order to identify the genes with potential clinical heterogeneity being differentially expressed in the two subclasses of leukemia. Genes useful to distinguish ALL from AML may provide insight into cancer pathogenesis and patient treatment. We conduct simulation studies for m1 = 20, 60, 100, 400, 1000 and 2500, and q* = 0.5, 0.4, 0.3, 0.2, 0.1, 0.05 and 0.01. Using
= 0.5, we obtain
(
) from the original data.
Table 1 reports the estimated FDR,
, number of true and false rejections,
and
, respectively, for SAM and BH methods. Only
and
are reported for the exact method since it controls the FDR accurately. BH is always more conservative than SAM, i.e.
. For example, when m1 = 20 and the nominal FDR is set at q* = 5% level, SAM and BH control the FDR at
and 8.09%, respectively. The ratio
increases in m1 as in independent case (Storey, 2002). With m1
400, SAM (BH) is conservative if q*
0.3(q*
0.2) and anti-conservative otherwise. SAM controls the FDR accurately when m1
100 and q*
0.3. So does BH when m1
400 and q*
0.2. However, the bias of SAM and BH in
is more serious with a smaller m1 value.
|
There exists a similar trend in
to that in
, i.e. with m1
400, both SAM and BH have smaller
than the exact method for q*
0.2 and larger
for q* < 0.2. SAM and BH have almost the same
for m1
400, but, with a larger m1, the former tends to have a larger
.
SAM always has higher false rejections,
, than the other two methods. The discrepancy is more noticeable with smaller m1. BH also has higher false rejections than the exact method when m1
400. With m1
1000, however, BH tends to have lower
than the exact method except for a small q* such as 1 or 5%.
Figure 1 reports the distribution of R0/R observed from the 4000 simulations for q* = 0.5 or 0.01 with m1 fixed at 20. When q* = 0.5, R0/R is highly distributed around 0 and 1. When q* = 0.01, R0/R has a large density around 0 and is widely distributed in the rest of the range. Note that the horizontal axes of Figure 1c and d are rescaled using a square root transformation to show the distribution of R0/R around q* = 0.01 better. At each q* level, the distributions of R0/R for SAM and BH look almost the same, so that it is difficult to observe any difference in the amount of conservativeness or anti-conservativeness between the two procedures. Furthermore, the distributions are widely spread over the range of [0, 1], so that the figures do not clearly show the location shift of the distributions from the nominal q*.
|
4.2 An example with survival data
Beer et al. (2002) generated expression profiles of m = 4966 genes to discover the genes that can predict disease progression. The data include n = 86 stage I or III lung cancer patients, of whom 24 patients have disease progressions. By controlling the FWER at 10% level, Jung et al. (2005) discovered two genes whose expression levels are significantly associated with the time to progression. In this section, we consider the same test statistics standardized by the standard errors as described in Section 3.2. Simulations are conducted at similar settings to those in the previous example.
The simulation results are reported in Table 2. We observe that both SAM and BH conservatively control the FDR in the simulation settings. BH is slightly more conservative than SAM, but they become similar as m1 decreases. SAM controls the FDR very accurately for q*
0.1 which is the range of interest in usual data analyses. SAM becomes more accurate with a larger m1. Using
= 0.5, we obtain
(
) from the original data.
|
In all the simulation settings, the exact method has the largest
and BH has the smallest
, although the difference is small. When controlling the FDR at q* = 10% level, the exact method has only about
of true rejections for m1
60. The other two methods have lower true rejection rates.
When m1
200, the exact method has the smallest
and SAM has the largest
among the three methods. The discrepancy in
among the three methods becomes smaller as m1 increases and q* decreases.
| 5 DISCUSSION |
|---|
|
|
|---|
Benjamini and Hochberg (1995) and Benjamini and Yekutieli (2001) prove that their approach conservatively controls the FDR under independence and weak dependence of m0 test statistics for which null hypotheses are true. However, from the first example of Section 4, we found that the BH method can be anti-conservative in a general correlation combined with a small number of differentially expressing genes, m1, and a small nominal FDR level, q*. Storey et al. (2004) also claim that SAM procedure conservatively controls the FDR under weak dependency. From the first example, it is shown that SAM can be anti-conservative too (with small m1 and q*).
If an FDR-based procedure does not control the FDR accurately in a real data analysis, there may be two potential reasons: (1) the test statistics are heavily correlated or (2) the null distribution of each test statistic cannot be approximated by the standard normal distribution because the sample size is not large enough for the normal approximation of the test statistics. In order to avoid issue (2) and focus our discussion on (1), we generated
s from N(0, 1) distribution so that the simulated test statistics have exactly normal distributions. As mentioned in Appendix, however, if n is large enough,
s can be generated from any distribution with mean 0 and variance 1, such as
.
From simulations not reported in this paper, we observed that SAM closely estimates E{R0(
)}E{R1(
)} =
m0 E{R1(
)} rather than the FDR, E{R0(
)/R(
)}.
As mentioned previously, our procedure is to check the performance of the existing multiple testing methods for a given dataset, so that it is different from the simulation-based multiple testing procedures used for data analysis. However, a similar simulation method can be used to derive a testing procedure too, see Lin (2005). Given
(0, 1), van der Laan et al. (2004) and Lehmann and Romano (2005) propose a conservative procedure to control the false discovery proportion, defined as P(R0/R >
), at a certain level. Our simulation method can be easily modified to evaluate the conservativeness of their procedure for a given dataset.
We have discussed our procedure in terms of microarray data, but it can be used for any high dimensional data involving multiple testing using dependent test statistics, e.g. proteomic data. The simulation programs are written in Fortran 77. Uniform random numbers were generated using RAN2 subroutine of Press et al. (1980).
| APPENDIX |
|---|
|
|
|---|
It suffices to show that the conditional joint distribution of
given the data,
, is asymptotically identical to the unconditional joint distribution of (U1, ... , Um). As discussed in Section 3.2, (U1, ... , Um) is asymptotically normal with means and covariances that can be consistently estimated by
and
for 1
j, j'
m, respectively.
Given the data,
and
are constants, so that
for j = 1, ... , m are weighted sums of IID N(0, 1) random variables,
1, ... ,
n. Hence,
is normal with means
|
|
|
|
In fact,
i's can be generated from any distribution with mean 0 and variance 1, such as
. In this case, the conditional joint distribution of
given
is approximated by the same distribution by the central limit theorem.
| Acknowledgments |
|---|
The authors want to thank the two reviewers for their valuable comments.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: David Rocke
Received on December 5, 2005; revised on April 11, 2006; accepted on April 23, 2006
| REFERENCES |
|---|
|
|
|---|
Anderson, P.K. and Gill, R.D. (1982) Cox's regression model for counting processes: a large sample study. Ann. Stat, . 10, 11001120.
Beer, D.G., et al. (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med, . 8, 816824[Web of Science][Medline].
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289300.
Benjamini, Y. and Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. Ann. Stat, . 29, 11651188[CrossRef].
Cox, D.R. (1972) Regression models and life-tables (with discussion). J. R. Stat. Soc. Ser. B, 34, 187220.
Gentleman, R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol, . 5, R80[CrossRef][Medline].
Golub, T.R., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531537
Huang, Y., Xu, H., Calian, V., Hsu, J.C. (2005) To permute or not permute. Technical Report 756. , OH Department of Statistics, Ohio State University.
Jung, S.-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics, 21, 30973104
Jung, S.-H., et al. (2005) A multiple testing procedure to associate gene expression levels with survival. Stat. Med, . 21, 30973104.
Lehmann, E.L. and Romano, J.P. (2005) Generalizations of the familywise error rate. Ann. Stat, . 33, 11381154[CrossRef].
Lin, D.Y. (2005) An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics, 21, 781787
Lin, D.Y. and Wei, L.J. (1989) The robust inference for the Cox proportional hazards model. J. Am. Stat. Assoc, . 84, 10741078[CrossRef][Web of Science].
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. Numerical Recipes: The Art of Scientific Computing, (1980) , NY Cambridge University Press.
Storey, J.D. (2002) A direct approach to false discovery rates under dependence. J. R. Stat. Soc. Ser. B, 64, 479498[CrossRef].
Storey, J.D. (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. Ann. Stat, . 31, 20132035[CrossRef].
Storey, J.D., et al. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Stat. Soc. Ser. B, 66, 187205[CrossRef].
Storey, J.D. and Tibshirani, R. (2001) Estimating false discovery rates under dependence, with applications to DNA microarrays. , CA Technical Report 200128 Department of Statistics, Stanford University.
Tusher, V., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 51165121
van der Laan, M.J., et al. (2004) Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives. Stat. Appl. Gene. Mol. Biol, . 3, 15.
Westfall, P.H. and Young, S.S. Resampling-based Multiple Testing: Examples and Methods for P-value Adjustment, (1993) , NY Wiley.
This article has been cited by other articles:
![]() |
E. C. Kaizer, C. L. Glaser, D. Chaussabel, J. Banchereau, V. Pascual, and P. C. White Gene Expression in Peripheral Blood Mononuclear Cells from Children with Diabetes J. Clin. Endocrinol. Metab., September 1, 2007; 92(9): 3705 - 3711. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. S. Mehta, S. O. Zakharkin, G. L. Gadbury, and D. B. Allison Epistemological issues in omics and high-dimensional biology: give the people what they want Physiol Genomics, December 13, 2006; 28(1): 24 - 32. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||



























