Bioinformatics Advance Access originally published online on December 20, 2005
Bioinformatics 2006 22(5):556-565; doi:10.1093/bioinformatics/btk013
Multidimensional local false discovery rate for microarray studies
1 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden
2 Dipartimento di Scienze Biomediche e Biotecnologie, Università degli Studi di Brescia 11 25123 Brescia, Italy
3 MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK
*To whom correspondence should be addressed.
ABSTRACT
Motivation: The false discovery rate (fdr) is a key tool for statistical assessment of differential expression (DE) in microarray studies. Overall control of the fdr alone, however, is not sufficient to address the problem of genes with small variance, which generally suffer from a disproportionately high rate of false positives. It is desirable to have an fdr-controlling procedure that automatically accounts for gene variability.
Methods: We generalize the local fdr as a function of multiple statistics, combining a common test statistic for assessing DE with its standard error information. We use a non-parametric mixture model for DE and non-DE genes to describe the observed multi-dimensional statistics, and estimate the distribution for non-DE genes via the permutation method. We demonstrate this fdr2d approach for simulated and real microarray data.
Results: The fdr2d allows objective assessment of DE as a function of gene variability. We also show that the fdr2d performs better than commonly used modified test statistics.
Availability: An R-package OCplus containing functions for computing fdr2d() and other operating characteristics of microarray data is available at http://www.meb.ki.se/~yudpaw
Contact: alexander.ploner{at}meb.ki.se
1 INTRODUCTION
The extreme multiplicity of genes in microarray data has generated a keen awareness of the problem of false discoveries. Consequently, the concept of false discovery rate (fdr, Benjamini and Hochberg, 1995) has seen rapid development and widespread application to microarray data (e.g. Storey and Tibshirani, 2003; Reiner et al., 2003; Pawitan et al., 2005a). In addition to the multiple comparison issue, it is also well known that conventionally significant genes with very small standard errors are more likely to be false discoveries (e.g. Tusher et al., 2001; Lönnstedt and Speed, 2002; Smyth, 2004). In current approaches, separate considerations, either ad hoc or model-based, are necessary to deal with genes with small standard errors, typically leading to a modified test statistic. In this paper, we treat both problems simultaneously by first generalizing the local fdr for multiple statistics, and then using this new concept to enhance the classical t-statistic in two-sample comparison problems.
We adopt a conventional mixture framework where a statistic Z is used to test the validity of the null hypothesis of no differential expression (DE) for each gene individually. Assuming that a proportion π0 of genes is truly non-DE, the density function f of the observed statistics is a mixture of the form

$$f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z), \qquad (1)$$

where f0 and f1 are the densities of Z for non-DE and DE genes, respectively.
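In this setting, the local false discovery rate, abbreviated as fdr to distinguish it from the global FDR as suggested by Benjamini and Hochberg (1995), is defined as

$$\mathrm{fdr}(z) = \pi_0 \frac{f_0(z)}{f(z)}. \qquad (2)$$

It can be interpreted as the expected proportion of false positives if genes with observed statistic Z = z are declared DE. Alternatively, it can be seen as the posterior probability of a gene being non-DE, given that Z = z.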
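To make Equations (1) and (2) concrete, here is a minimal Python sketch that evaluates the local fdr for a toy mixture. The normal components N(0, 1) for non-DE genes and N(3, 1) for DE genes, and π0 = 0.8, are illustrative assumptions. They are not estimates from any dataset in this paper.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, f0, f1):
    """Equation (2): pi0 * f0(z) / f(z), with f the mixture of Equation (1)."""
    f = pi0 * f0(z) + (1.0 - pi0) * f1(z)
    return pi0 * f0(z) / f


def null_pdf(z):
    """Hypothetical null density f0: N(0, 1)."""
    return normal_pdf(z)


def de_pdf(z):
    """Hypothetical non-null density f1: N(3, 1)."""
    return normal_pdf(z, mean=3.0)


for z in (0.0, 2.0, 3.0):
    # The same number is the posterior probability that a gene with Z = z is non-DE.
    print(z, round(local_fdr(z, 0.8, null_pdf, de_pdf), 4))
```

Because the local fdr is a ratio of densities, it is close to 1 near z = 0, where non-DE genes dominate. It drops sharply in the tail occupied by the DE genes.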
The key idea of this paper is that Equations (1) and (2) are immediately meaningful for a k-dimensional statistic Z = (Z1, ..., Zk). Specifically, in the two-dimensional (2D) case k = 2, we can use two different statistics Z1 and Z2 that capture different aspects of the information contained in the data. We will refer to

$$\mathrm{fdr2d}(z_1, z_2) = \pi_0 \frac{f_0(z_1, z_2)}{f(z_1, z_2)} \qquad (3)$$

as the 2D local fdr, or fdr2d, and to the usual univariate local fdr (2) as fdr1d.
We illustrate these concepts graphically for a simple simulated dataset of 10 000 genes and two groups of seven samples each, as described in detail in Section 2.4 (Scenario 1). This allows us to estimate all densities and fdrs involved directly from a second, much larger set of simulations (10^7 genes) from the same underlying model. We do this by keeping track of the known DE or non-DE status of each simulated gene and counting the respective numbers in small intervals. This avoids the technical smoothing issues addressed in Section 2.1. It also demonstrates the relevance of the fdr2d even in an idealized setting without gene-gene correlations or mean-variance relationships.
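When the DE status of every simulated gene is known, the true fdr2d in a grid cell is simply the fraction of non-DE genes among all genes falling into that cell. The following is a minimal sketch of that counting step. It assumes the two statistics have already been computed. `true_fdr2d` is a hypothetical helper, and the toy inputs are illustrative only.

```python
import numpy as np


def true_fdr2d(z1, z2, is_null, edges1, edges2):
    """Fraction of truly non-DE genes per 2D grid cell (NaN for empty cells).

    No permutation or smoothing is involved: with known status, the proportion
    of non-DE genes in a cell estimates pi0 * f0(z) / f(z) directly.
    """
    total, _, _ = np.histogram2d(z1, z2, bins=[edges1, edges2])
    nulls, _, _ = np.histogram2d(z1[is_null], z2[is_null], bins=[edges1, edges2])
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, nulls / total, np.nan)


edges = np.array([0.0, 1.0, 2.0])
fdr_cells = true_fdr2d(np.array([0.5, 0.5, 1.5, 1.5]),
                       np.array([0.5, 0.5, 0.5, 1.5]),
                       np.array([True, False, True, True]),
                       edges, edges)
print(fdr_cells)
```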
For the common problem of comparing gene expression between two groups, a standard test statistic underlying the univariate local fdr is the classical t-statistic

$$t_i = \frac{\bar{x}_{i1} - \bar{x}_{i2}}{\mathrm{se}_i}, \qquad \mathrm{se}_i = \sqrt{\frac{(n_1 - 1)\,s_{i1}^2 + (n_2 - 1)\,s_{i2}^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

where x̄ij, sij and nj are the gene-wise group mean, standard deviation and sample size for gene i and group j = 1, 2. We present the estimated densities f(z) and f0(z) of the t-statistics for the simulated data in Figure 1a, and the corresponding univariate local fdr in Figure 1b.
Various extensions to 2D are possible, but for simplicity and to facilitate smoothing in real data situations, we include the standard error information directly in the fdr assessment by choosing

$$Z_1 = t_i, \qquad Z_2 = \log \mathrm{se}_i.$$
0.05, whereas for genes with log-standard error < 1, an absolute t-statistic >5.5 is required to achieve the same fdr. Thus, without extra modelling, the fdr2d offers an objective way to discount information from genes with misleadingly small standard errors. To summarize our novel contributions, in this paper we introduce the concept of multi-dimensional local fdr, describe an estimation procedure for the 2D local fdr and demonstrate its application to simulated and real datasets; we show that without introducing any extra modeling or theoretical complexity, the fdr2d performs as well as or better than the previously suggested procedures in dealing with misleadingly small standard errors. Finally, we illustrate how the fdr2d can be transformed to make graphical representations like the volcano plot more objective.
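The following minimal NumPy sketch computes the pair of statistics used for the fdr2d: the t-statistic and the log of its standard error. It assumes the classical pooled-variance form of the standard error and natural logarithms. The function name is hypothetical.

```python
import numpy as np


def t_and_log_se(x1, x2):
    """Pooled two-sample t-statistic and log standard error for every gene.

    x1, x2: arrays of shape (genes, n1) and (genes, n2) of log-expression values.
    Returns (t, log_se), each of length `genes`.
    """
    n1, n2 = x1.shape[1], x2.shape[1]
    diff = x1.mean(axis=1) - x2.mean(axis=1)
    pooled_var = ((n1 - 1) * x1.var(axis=1, ddof=1)
                  + (n2 - 1) * x2.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return diff / se, np.log(se)


t, log_se = t_and_log_se(np.array([[1.0, 2.0, 3.0]]), np.array([[4.0, 5.0, 6.0]]))
print(t, log_se)
```

Keeping the log standard error as a separate coordinate, instead of folding it into the denominator only, is what lets the fdr2d discount genes with misleadingly small standard errors.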
2 MATERIALS AND METHODS
2.1 Estimating the local fdr
The non-parametric approach to fdr requires some smoothing operation for estimating the ratio of densities in Equation (2). We will start by presenting a feasible smoothing procedure in this section and go on to discuss the estimation of π0 in the next section. Because of the practical difficulties of smoothing in higher dimensions, we will limit ourselves to the case k = 2. To be specific, we will consider the problem of two-independent-group comparison; more complex designs may require some modifications in the permutation step, but the smoothing problem is conceptually similar.
In the first step, we can generate the null distribution of Z by the permutation method, since under the null hypothesis of no difference, the group labels can be permuted without changing the distribution of the data (cf. Efron et al., 2001). In order to allow general dependency between genes, each permutation is applied to all genes simultaneously. Let Z be the m × k matrix of observed Zs from m genes. Each permutation of group labels generates a new dataset and statistic matrix Z*. Let p be the number of permutations, so we have a series of matrices Z*_1, ..., Z*_p representing samples of Z under the null. In all of the examples in this paper we use p = 100 permutations.
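The permutation step can be sketched as follows. This is a minimal illustration that assumes a two-group design with a boolean group indicator. It uses the pooled t and the log standard error as the k = 2 statistics. The function names are hypothetical, and the data are random stand-ins. The default p = 100 matches the number of permutations used in the paper.

```python
import numpy as np


def t_logse(x1, x2):
    """Pooled t and log standard error, stacked into a (genes, 2) array."""
    n1, n2 = x1.shape[1], x2.shape[1]
    pooled = ((n1 - 1) * x1.var(axis=1, ddof=1)
              + (n2 - 1) * x2.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return np.column_stack([(x1.mean(axis=1) - x2.mean(axis=1)) / se, np.log(se)])


def permuted_statistics(x, in_group1, stat, p=100, seed=0):
    """Array of shape (p, genes, k) holding the statistics under permuted labels.

    Each permutation shuffles the labels once and applies that same shuffle to
    all genes, so the dependence between genes is preserved.
    """
    rng = np.random.default_rng(seed)
    null_stats = []
    for _ in range(p):
        labels = rng.permutation(in_group1)
        null_stats.append(stat(x[:, labels], x[:, ~labels]))
    return np.stack(null_stats)


rng = np.random.default_rng(42)
x = rng.normal(size=(200, 14))                  # hypothetical data: 200 genes, 7 + 7 arrays
in_group1 = np.array([True] * 7 + [False] * 7)
z_obs = t_logse(x[:, in_group1], x[:, ~in_group1])
z_star = permuted_statistics(x, in_group1, t_logse, p=100)
print(z_obs.shape, z_star.shape)
```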
In principle, we could estimate f(z) using non-parametric density smoothing of the observed Z, and similarly f0(z) using the permuted Z*_1, ..., Z*_p, then compute the fdr2d(z) by simple division. This approach, however, has two inherent problems: (1) because of the different amounts of data, different smoothing is required for f(z) and f0(z), and (2) optimal smoothing for the densities may not be optimal for the fdr. While these problems apply regardless of the dimension of the underlying statistic, we have found them to be more consequential in the 2D than in the 1D setting.
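We have therefore decided to implement a procedure that requires only a single smoothing of the ratio

$$r(z) = \frac{p\, f_0(z)}{f(z) + p\, f_0(z)},$$

from which the fdr2d follows as

$$\mathrm{fdr2d}(z) = \frac{\pi_0}{p}\,\frac{r(z)}{1 - r(z)}. \qquad (4)$$

The ratio r(z) is estimated in two steps: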
- Associate all the statistics generated under permutation with successes and the observed statistics with failures. Then r(z) is the proportion of successes as a function of z.
- Perform a non-parametric smoothing of the success/failure probability as a function of z.
Since the total number of points, (p + 1)m, is quite large, we pre-bin the data into small intervals and perform discrete smoothing of binomial data on the resulting grid. The smoothing is done using the mixed-model approach of Pawitan (2001, Section 18.10) as described below, and the resulting algorithm is fast despite the initial data size.
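The following is a minimal sketch of the binning step. Permuted statistics count as successes and observed statistics as failures, and both are tallied on a 2D grid with NumPy's `histogram2d`. The second helper converts a cell's success probability r into an fdr2d value. It follows Equation (4) read as (π0/p) · r/(1 − r), which holds when r = p·f0/(f + p·f0). Both function names are hypothetical. In the procedure described here, the raw proportions y/N would still be smoothed before this conversion.

```python
import numpy as np


def bin_success_failure(z_obs, z_perm, edges1, edges2):
    """Grid counts y (successes = permuted points) and N (all points).

    z_obs: (m, 2) observed statistics; z_perm: (p * m, 2) pooled permuted statistics.
    """
    succ, _, _ = np.histogram2d(z_perm[:, 0], z_perm[:, 1], bins=[edges1, edges2])
    fail, _, _ = np.histogram2d(z_obs[:, 0], z_obs[:, 1], bins=[edges1, edges2])
    return succ, succ + fail


def fdr_from_r(r, pi0, p):
    """Equation (4) as read here: fdr2d = (pi0 / p) * r / (1 - r)."""
    return pi0 * r / (p * (1.0 - r))


z_obs = np.array([[0.5, 0.5]])
z_perm = np.array([[0.5, 0.5], [1.5, 0.5]])
edges = np.array([0.0, 1.0, 2.0])
y, N = bin_success_failure(z_obs, z_perm, edges, edges)
print(y, N)
# Without DE genes, r = p / (p + 1) everywhere, which gives fdr2d = pi0 = 1.
print(fdr_from_r(100 / 101, 1.0, 100))
```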
Let yij be the number of successes in the (i, j) location of the grid and Nij the corresponding number of points. By construction, yij is binomial with size Nij and probability rij, where the latter is a discretized version of r(z). We assume a link function h() such that

$$h(r_{ij}) = \eta_{ij},$$

where ηij satisfies the smoothness condition below. Given a smoothing parameter λ, the smoothed estimate of rij is the maximizer of the penalized log-likelihood

$$\sum_{i,j} \left\{ y_{ij} \log r_{ij} + (N_{ij} - y_{ij}) \log(1 - r_{ij}) \right\} - \frac{\lambda}{2} \sum_{(i,j) \sim (k,l)} (\eta_{ij} - \eta_{kl})^2,$$

where (i, j) ∼ (k, l) means that (i, j) and (k, l) are primary neighbors in the 2D lattice. The estimate is computed using the iteratively weighted least-squares algorithm (Pawitan, 2001, Section 18.10), which is very stable in this case. A practical complication arises from the presence of empty cells at the edges of the distribution. Naive treatment of these cells as having no events leads to serious bias. We reduce the bias by imputing single events to the empty cells, using the fact that genes with large t-statistics are likely to be regulated, and genes with t-statistics close to zero are likely to fall under the null hypothesis. We investigated the logistic and identity link functions in detail, and found the former to be more biased. This is primarily because the weights implied by the logistic link function assign large influence to points in the center of the distribution. This creates a correspondingly large bias in areas with low densities, which are precisely the region of interest for fdr assessment.
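The following is our own minimal reading of the smoothing step, not the OCplus code. It uses the identity link h(r) = r and the penalty (λ/2) times the sum of squared differences between primary neighbors, and it maximizes the penalized binomial log-likelihood by Fisher scoring (IWLS). As a simplification, empty cells get zero weight and are filled in by the penalty. This replaces the single-event imputation described above. The function also returns the trace of the hat matrix as the effective number of parameters. All names are hypothetical.

```python
import numpy as np


def grid_penalty(nr, nc):
    """Matrix P with eta' P eta = sum over primary-neighbour pairs of (eta_a - eta_b)^2."""
    def path(k):
        d = np.diff(np.eye(k), axis=0)        # (k - 1) x k first-difference matrix
        return d.T @ d
    return np.kron(path(nr), np.eye(nc)) + np.kron(np.eye(nr), path(nc))


def smooth_identity_link(y, N, lam, n_iter=100, eps=1e-4, tol=1e-10):
    """Penalized binomial smoothing of y/N on a grid with the identity link.

    Maximizes sum{y log r + (N - y) log(1 - r)} - (lam / 2) * r' P r by Fisher
    scoring. Cells with N = 0 receive zero weight. Returns the smoothed grid and
    the trace of the hat matrix (effective number of parameters).
    """
    nr, nc = y.shape
    P = grid_penalty(nr, nc)
    yv = y.ravel().astype(float)
    Nv = N.ravel().astype(float)
    z = np.divide(yv, Nv, out=np.zeros_like(yv), where=Nv > 0)   # working response
    r = np.full(yv.shape, yv.sum() / Nv.sum())                   # flat starting value
    for _ in range(n_iter):
        rc = np.clip(r, eps, 1.0 - eps)
        w = Nv / (rc * (1.0 - rc))                                # Fisher weights
        A = np.diag(w) + lam * P
        r_new = np.linalg.solve(A, w * z)
        converged = np.max(np.abs(r_new - r)) < tol
        r = r_new
        if converged:
            break
    edf = float(np.trace(np.linalg.solve(A, np.diag(w))))
    return np.clip(r, eps, 1.0 - eps).reshape(nr, nc), edf


y = np.array([[10.0, 20.0], [30.0, 40.0]])
N = np.full((2, 2), 100.0)
r_raw, edf_raw = smooth_identity_link(y, N, lam=0.0)     # no smoothing: r = y / N
r_flat, edf_flat = smooth_identity_link(y, N, lam=1e6)   # heavy smoothing: nearly constant
print(r_raw, edf_raw)
print(r_flat, edf_flat)
```

With λ = 0 the fit reproduces the raw proportions and uses one parameter per cell. With a very large λ it collapses towards the pooled proportion and about one effective parameter.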
While it would be desirable to have an optimal and automatic smoothing parameter λ, we have found that this is quite tricky to achieve. Although theory provides optimal smoothing for r(z), we again face the problem that what is optimal for r(z) is not necessarily optimal for fdr2d. Instead, we can compute the effective number of parameters for the smooth (Pawitan, 2001, Section 18.10), which is limited by the number of bins in the grid. This allows the user to specify the desired degree of smoothness as a percentage, with lower values indicating stronger smoothing. The averaging property of fdr2d (Section 3.1.1) can be used to check whether the choice of smoothing parameter has been suitable; see Section 3.3.
To summarize, in practice we use the identity link function h(rij) = rij, and the output of the smoothing step is a smooth estimate of r(z), evaluated at discrete points (i, j). The fdr2d is then computed using (4) and a suitable estimate of π0. In our experience, a relatively coarse grid on the order of 20 × 20 will usually suffice. For such grids, we have found that relatively mild smoothing is generally sufficient, allowing 70-80% of the number of bins as the effective number of parameters of the smooth.
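The following sketch shows one way to turn a percentage of bins into a smoothing parameter. It bisects on log10(λ) until the trace of the hat matrix (W + λP)⁻¹W reaches the target. To keep the example short, it assumes that the IWLS weights w are held fixed and are all positive, for example after the single-event imputation of empty cells. In a full fit the weights change with the estimate. The trace is computed from the eigenvalues of W^(-1/2) P W^(-1/2). The helper names and search bounds are hypothetical.

```python
import numpy as np


def grid_penalty(nr, nc):
    """P with eta' P eta = sum of squared differences between primary neighbours."""
    def path(k):
        d = np.diff(np.eye(k), axis=0)
        return d.T @ d
    return np.kron(path(nr), np.eye(nc)) + np.kron(np.eye(nr), path(nc))


def edf_function(w, shape):
    """Return lam -> trace((W + lam P)^{-1} W) for fixed positive weights w."""
    s = 1.0 / np.sqrt(w)
    m = s[:, None] * grid_penalty(*shape) * s[None, :]    # W^{-1/2} P W^{-1/2}
    mu = np.clip(np.linalg.eigvalsh(m), 0.0, None)
    return lambda lam: float(np.sum(1.0 / (1.0 + lam * mu)))


def lambda_for_percentage(w, shape, percent=75.0, lo=-8.0, hi=10.0, steps=80):
    """Bisection on log10(lam) so that edf is about `percent`% of the number of bins."""
    edf = edf_function(w, shape)
    target = percent / 100.0 * w.size
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if edf(10.0 ** mid) > target:
            lo = mid        # fit still too flexible: move towards larger lam
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))


w = np.full(400, 500.0)                       # hypothetical weights on a 20 x 20 grid
lam = lambda_for_percentage(w, (20, 20), percent=75.0)
print(lam, edf_function(w, (20, 20))(lam))
```

The trace decreases monotonically in λ, which is why a simple bisection is enough.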
2.2 Estimating π0
The estimation of the proportion of non-DE genes is a pervasive problem when computing fdrs (e.g. Storey and Tibshirani, 2003; Dalmasso et al., 2005; Pawitan et al., 2005b). We can obtain a conservative estimate by observing that, as a (conditional) probability, the fdr2d is bounded above by 1, so

$$\pi_0 \frac{f_0(z)}{f(z)} \le 1 \quad \text{for all } z, \qquad \text{hence} \qquad \pi_0 \le \min_z \frac{f(z)}{f_0(z)},$$

and we can take this upper bound as an estimate of π0 (Efron et al., 2001). In practice, we do not recommend this approach for fdr2d. Even if we compute the bound only in the data-rich center of the distribution, to account for possible instability in our estimate of f(z)/f0(z) at the edges, we find that the bound is too dependent on the smoothing parameter.
Instead, we suggest that π0 be estimated from the fdr1d. In principle, any procedure to estimate π0 can be used. For example, this can be done via the upper-bound argument outlined above, which provides reasonably stable, though biased, estimates in the 1D case. A less biased estimate can be found using a mixture model for the t-statistics that we have described recently (Pawitan et al., 2005b). In this paper, we have used the upper-bound estimate for the simulated data, which is sufficiently accurate here, and the mixture-based estimates for the real datasets.
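The following is a minimal 1D sketch of the upper-bound estimate π0 ≤ min_z f(z)/f0(z), restricted to central bins. The densities are histogram estimates normalized by the total number of statistics, so restricting the bins does not rescale them. The simulated mixture is a hypothetical stand-in with true π0 = 0.8, and a plain normal sample stands in for the pooled permutation statistics.

```python
import numpy as np


def pi0_upper_bound(t_obs, t_perm, edges):
    """Estimate pi0 as the minimum of f(z) / f0(z) over the bins defined by `edges`."""
    widths = np.diff(edges)
    c_obs, _ = np.histogram(t_obs, bins=edges)
    c_perm, _ = np.histogram(t_perm, bins=edges)
    f = c_obs / (len(t_obs) * widths)           # observed density, total-count scaling
    f0 = c_perm / (len(t_perm) * widths)        # null density from permutations
    ok = f0 > 0
    return float(min(1.0, np.min(f[ok] / f0[ok])))


rng = np.random.default_rng(7)
t_obs = np.concatenate([rng.normal(0, 1, 8000),          # 8000 hypothetical non-DE genes
                        rng.normal(4, 1, 1000),          # DE genes, up-regulated
                        rng.normal(-4, 1, 1000)])        # DE genes, down-regulated
t_perm = rng.normal(0, 1, 100_000)                       # stand-in for permuted statistics
edges = np.linspace(-1.0, 1.0, 11)                       # data-rich centre only
est = pi0_upper_bound(t_obs, t_perm, edges)
print(est)
```

Taking the minimum over noisy bins biases the estimate slightly downward. Widening the bins or smoothing the ratio trades that bias against resolution.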
2.3 Monotonicity
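Conditional on the log standard error, we make the estimated fdr2d decrease with the absolute size of the observed test statistic by replacing it with a cumulative minimum:

$$\widetilde{\mathrm{fdr2d}}(z_1, z_2) = \begin{cases} \min_{0 \le u \le z_1} \mathrm{fdr2d}(u, z_2), & z_1 \ge 0, \\ \min_{z_1 \le u \le 0} \mathrm{fdr2d}(u, z_2), & z_1 < 0. \end{cases}$$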
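On a grid, the cumulative minimum can be applied row by row, one row per log-standard-error bin. It runs outward from t = 0 on each side. The following is a minimal NumPy sketch, assuming the grid columns are sorted by increasing t. The function name and toy values are hypothetical.

```python
import numpy as np


def make_monotone(fdr, t_centers):
    """Cumulative minimum of fdr2d moving away from t = 0, separately per log-se row.

    fdr: array (n_logse, n_t) on a grid; t_centers: increasing t values of the columns.
    """
    out = fdr.copy()
    pos = t_centers >= 0
    neg = ~pos
    # t >= 0: running minimum from the centre towards large positive t.
    out[:, pos] = np.minimum.accumulate(fdr[:, pos], axis=1)
    # t < 0: reverse so that the running minimum also starts next to t = 0.
    out[:, neg] = np.minimum.accumulate(fdr[:, neg][:, ::-1], axis=1)[:, ::-1]
    return out


fdr_grid = np.array([[0.2, 0.5, 0.9, 0.4, 0.6]])
t_centers = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
monotone = make_monotone(fdr_grid, t_centers)
print(monotone)
```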
2.4 Simulation scenarios
We use five different simulation scenarios to demonstrate properties of the fdr2d. We assume 10 000 genes per array with a proportion of truly non-DE genes π0 = 0.8 throughout, and compare two independent groups with n = 7 arrays per group. For Scenarios 1-4, we further assume that the log-expression values are normally distributed in each group. In the simplest case, Scenario 1, the mean difference for the DE genes is fixed at D = ±1 (with equal proportions), and the gene-wise variance for all genes is fixed at σ² = 1 (Pawitan et al., 2005a). In contrast, both mean differences and gene-wise variances are random in Scenarios 2-4, which are based on the model described in Smyth (2004): for each gene i, we simulate the variance from
$$\frac{1}{\sigma_i^2} \sim \frac{1}{d_0 s_0^2}\,\chi^2_{d_0}, \qquad (5)$$
and, for the DE genes, the mean difference from

$$D_i \mid \sigma_i^2 \sim N(0,\, v_0 \sigma_i^2),$$
and v0 = 2, and vary the degrees of freedom d0 to control the gene-wise variances:

- Scenario 2 has d0 = 1000, which results in very similar variances across genes,
- Scenario 3 has d0 = 12, which generates moderate variability in the variances,
- Scenario 4 has d0 = 2, which leads to strong variability of the variances and consequently very large t-statistics.
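The following sketch generates one dataset from the hierarchical model behind Scenarios 2-4. The variances are drawn as 1/σ² ~ χ²_d0/(d0·s0²), and each DE gene gets a mean difference from N(0, v0·σ²). Assumptions: s0² = 1 is a placeholder, because the value used in the paper is not stated in this text. Each gene is made DE independently with probability 1 − π0. The function name is hypothetical.

```python
import numpy as np


def simulate_scenario(d0, m=10_000, n=7, pi0=0.8, s0_sq=1.0, v0=2.0, seed=0):
    """One dataset from the hierarchical model of Scenarios 2-4.

    1/sigma_i^2 ~ chi2_{d0} / (d0 * s0_sq); DE genes get D_i ~ N(0, v0 * sigma_i^2).
    s0_sq = 1.0 is a placeholder value, not taken from the paper.
    Returns group matrices x1, x2 of shape (m, n) and a boolean DE indicator.
    """
    rng = np.random.default_rng(seed)
    sigma_sq = d0 * s0_sq / rng.chisquare(d0, size=m)
    de = rng.random(m) >= pi0                       # each gene DE with probability 1 - pi0
    diff = np.where(de, rng.normal(0.0, np.sqrt(v0 * sigma_sq)), 0.0)
    sd = np.sqrt(sigma_sq)[:, None]
    x1 = rng.normal(size=(m, n)) * sd + diff[:, None]
    x2 = rng.normal(size=(m, n)) * sd
    return x1, x2, de


for d0 in (1000, 12, 2):                            # Scenarios 2, 3 and 4
    _, x2, _ = simulate_scenario(d0, m=4000, seed=1)
    # Spread of log sample variances grows as d0 shrinks.
    print(d0, np.std(np.log(x2.var(axis=1, ddof=1))))
```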
Finally, in Scenario 5, we attempt to generate a realistic variance structure as present in real data. This is based on taking bootstrap samples from the BRCA dataset described in the following section. We take the original 3170 genes that were measured in two groups of n = 7 and n = 8 samples, and subtract the corresponding group mean from the log-expression values for each gene. The resulting 15 residuals per gene are used as the plug-in estimate for the gene-wise error distributions. We use the same bootstrap sample of the residuals for each gene, thereby preserving the correlations between the errors across genes. For genes randomly assigned to be DE, we simulate a log-fold change from N(1, 0.25) and add it to the observations in group 1. Hence the resulting dataset largely preserves the distribution of variances and correlations in the underlying dataset.
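The following is a minimal sketch of the residual bootstrap. A single set of column indices is drawn with replacement and used for every gene, which keeps the between-gene correlation of the errors. Assumptions: 2n = 14 columns are drawn from the residual matrix, and N(1, 0.25) is read as mean 1 and variance 0.25. The BRCA residuals themselves are not included here, so a tiled toy matrix stands in for them. The function name is hypothetical.

```python
import numpy as np


def bootstrap_scenario5(residuals, n=7, pi0=0.8, seed=0):
    """Residual-bootstrap dataset in the spirit of Scenario 5.

    residuals: (genes, n_res) matrix of group-mean-centred log-expression values.
    The same resampled columns are used for every gene. Log-fold changes of DE
    genes are drawn from N(1, 0.25), read here as mean 1 and variance 0.25.
    """
    rng = np.random.default_rng(seed)
    m, n_res = residuals.shape
    cols = rng.integers(0, n_res, size=2 * n)       # one draw shared by all genes
    x = residuals[:, cols]
    x1, x2 = x[:, :n].copy(), x[:, n:].copy()
    de = rng.random(m) >= pi0
    x1[de] += rng.normal(1.0, 0.5, size=int(de.sum()))[:, None]
    return x1, x2, de


res = np.tile(np.arange(15.0), (3, 1))              # toy stand-in for the residual matrix
x1, x2, de = bootstrap_scenario5(res, seed=3)
print(x2)                                           # identical rows: shared resampling
```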
2.5 Datasets
The BRCA dataset was collected from patients suffering from hereditary breast cancer who had mutations of either the BRCA1 gene (n = 7) or the BRCA2 gene (n = 8), as described in Hedenfalk et al. (2001). Expression was originally measured for 3226 genes. Following Storey and Tibshirani (2003), we removed 56 extremely volatile genes and analysed only the remaining 3170 genes. The mixture estimate of π0 for this dataset is 0.61. (Note: the estimate of π0 reported in Pawitan et al. (2005b) was based on an analysis on the raw scale, which is not as satisfactory as the log scale used in the current paper.)
We also applied our approach to the analysis of 240 cases of diffuse large B-cell lymphoma described in Rosenwald et al. (2002). These data were collected on a custom-made DNA microarray chip yielding measurements for 7399 probes. The average clinical follow-up for patients was 4.4 years, with 138 deaths occurring during this period. For this analysis, we ignored the censoring information and only compared 102 survivors with 138 non-survivors. The mixture estimate of π0 for this dataset is 0.59 (Pawitan et al., 2005b).
2.6 Modified t-statistics
Previous attempts to overcome the problem of small standard errors are based on an ad hoc fix, simply adding a constant a0 to the observed standard error:

$$\tilde{t}_i = \frac{\bar{x}_{i1} - \bar{x}_{i2}}{a_0 + \mathrm{se}_i}.$$
as a function of sei; for our computations, we used an R implementation of SAM based on the siggenes package by Holger Schwender.
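The following is a minimal sketch of the fudge-factor statistic. The constant added to each standard error is called a0 here. Choosing a0 as the 90th percentile of the observed standard errors is an assumption for illustration; it is the choice we associate with Efron et al. (2001). SAM tunes the constant by its own criterion, which is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def fudge_t(diff, se, quantile=0.9):
    """Modified t: mean difference divided by (a0 + se), with a0 a quantile of se."""
    a0 = np.quantile(se, quantile)
    return diff / (a0 + se)


diff = np.array([1.0, 1.0])
se = np.array([0.1, 1.0])
print(diff / se)            # standard t: the small-se gene looks ten times stronger
print(fudge_t(diff, se))    # modified t: the gap shrinks to less than a factor of two
```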
Alternatively, Smyth (2004) developed a hierarchical model for the gene-wise variance according to (5) and derived an empirical Bayes estimate of the gene-wise variance

$$\tilde{s}_i^2 = \frac{d_0 s_0^2 + d_1 s_i^2}{d_0 + d_1},$$

where d0 and s0² are unknown parameters and d1 is the degrees of freedom associated with the sample variance si². This variance estimate is used to get a moderated t:

$$\tilde{t}_i = \frac{\bar{x}_{i1} - \bar{x}_{i2}}{\tilde{s}_i \sqrt{1/n_1 + 1/n_2}},$$
as described by Smyth and implemented in his R package limma.

3 RESULTS
3.1 Simple properties of multidimensional fdr
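The global FDR is the average of the local fdr, a useful relationship for characterizing a collection of genes declared DE by local methods. Suppose R is a rejection region such that all genes with (multi-dimensional) statistics z ∈ R are called DE. The global FDR associated with genes in R is

$$\mathrm{FDR}(R) = \frac{\int_R \pi_0 f_0(z)\,dz}{\int_R f(z)\,dz} = \frac{\int_R \mathrm{fdr}(z)\, f(z)\,dz}{\int_R f(z)\,dz} = E\{\mathrm{fdr}(Z) \mid Z \in R\}.$$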
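In practice, this means the FDR of a set of called genes can be estimated by averaging their local fdr values. The following minimal sketch checks this on a hypothetical 1D mixture (π0 = 0.8, N(0, 1) versus N(3, 1)) where the local fdr is known exactly. It compares the average local fdr in the region z > 2.5 with the realized proportion of non-DE genes there. The names are hypothetical.

```python
import numpy as np


def global_fdr(local_fdr, in_region):
    """Estimated global FDR of a rejection region: the mean local fdr inside it."""
    return float(np.mean(local_fdr[in_region]))


def mixture_local_fdr(z, pi0=0.8, mu1=3.0):
    """Local fdr for a hypothetical N(0,1) / N(mu1,1) mixture.

    Both components have unit variance, so the 1/sqrt(2*pi) factors cancel.
    """
    f0 = np.exp(-0.5 * z ** 2)
    f1 = np.exp(-0.5 * (z - mu1) ** 2)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


rng = np.random.default_rng(3)
m = 200_000
is_null = rng.random(m) < 0.8
z = rng.normal(np.where(is_null, 0.0, 3.0))
region = z > 2.5                                    # rejection region R
est = global_fdr(mixture_local_fdr(z), region)      # average local fdr in R
realized = float(is_null[region].mean())            # actual false discovery proportion
print(round(est, 4), round(realized, 4))
```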
3.1.1 Multidimensional fdr averages to 1D fdr
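This first property is easy to demonstrate for a 2D vector (z1, z2):

$$\int \mathrm{fdr2d}(z_1, z_2)\, f(z_2 \mid z_1)\, dz_2 = \int \pi_0 \frac{f_0(z_1, z_2)}{f(z_1, z_2)}\,\frac{f(z_1, z_2)}{f(z_1)}\, dz_2 = \pi_0 \frac{f_0(z_1)}{f(z_1)} = \mathrm{fdr1d}(z_1).$$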
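The same identity holds exactly on a discrete grid, which is how it can serve as a check on a smoothed fdr2d. The following sketch uses random, hypothetical joint probability tables for non-DE and DE genes. It averages the cell-wise fdr2d over z2, weighted by f(z2 | z1), and compares the result with the fdr1d computed from the marginals.

```python
import numpy as np

rng = np.random.default_rng(1)
pi0 = 0.8
f0 = rng.random((6, 5))
f0 /= f0.sum()                                # null joint probabilities over (z1, z2) cells
f1 = rng.random((6, 5))
f1 /= f1.sum()                                # non-null joint probabilities
f = pi0 * f0 + (1 - pi0) * f1                 # mixture, Equation (1)

fdr2d = pi0 * f0 / f                          # cell-wise 2D local fdr
cond_z2 = f / f.sum(axis=1, keepdims=True)    # f(z2 | z1) within each z1 row
averaged = (fdr2d * cond_z2).sum(axis=1)      # average over z2 given z1
fdr1d = pi0 * f0.sum(axis=1) / f.sum(axis=1)  # 1D local fdr from the margins

print(np.allclose(averaged, fdr1d))           # True
```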
3.1.2 Multidimensional fdr is invariant under transformation
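This property is useful if we consider different but mathematically equivalent choices of test statistics. Given a certain fdr2d_z(z), for any one-to-one differentiable mapping u = g(z), the associated fdr is

$$\mathrm{fdr2d}_u(u) = \pi_0 \frac{f_{0u}(u)}{f_u(u)} = \pi_0 \frac{f_0(z)\,|J|}{f(z)\,|J|} = \mathrm{fdr2d}_z(z), \qquad z = g^{-1}(u),$$

where J is the Jacobian of the inverse mapping. Because the same Jacobian factor appears in the numerator and the denominator, it cancels.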
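While this is a pleasing quality in its own right, it has practical implications for the estimation and display of the fdr2d. Specifically, it allows us to use the most appropriate statistics for estimation and to transform the result into whatever scale is desired. For example, to display the fdr2d as a function of the observed fold changes, we can choose

$$u_1 = z_1 \exp(z_2), \qquad u_2 = z_2,$$

so that u1 = ti · sei is the observed mean difference of the log-expression values, and

$$\mathrm{fdr2d}_u(u_1, u_2) = \mathrm{fdr2d}_z\{u_1 \exp(-u_2),\, u_2\}.$$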
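The invariance can be checked numerically. The following sketch assumes z1 = t and z2 = log se (natural log), so that u1 = z1·exp(z2) is the mean difference. The 2D densities are hypothetical products of normals. It computes the fdr from the transformed densities, including the Jacobian exp(−u2) of the inverse map, and confirms that the result equals the original fdr2d at the mapped point.

```python
import numpy as np


def npdf(x, mu, sd):
    """Normal density."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


pi0 = 0.8


def f0_z(z1, z2):
    """Hypothetical null density in z = (t, log se)."""
    return npdf(z1, 0.0, 1.0) * npdf(z2, -1.0, 0.5)


def f1_z(z1, z2):
    """Hypothetical non-null density in z = (t, log se)."""
    return npdf(z1, 3.0, 1.0) * npdf(z2, -1.5, 0.5)


def fdr_z(z1, z2):
    f0 = f0_z(z1, z2)
    return pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1_z(z1, z2))


def to_z(u1, u2):
    """Inverse map: u1 = t * se (mean difference), u2 = log se."""
    return u1 * np.exp(-u2), u2


def fdr_u(u1, u2):
    """fdr computed from the transformed densities, Jacobian included."""
    z1, z2 = to_z(u1, u2)
    jac = np.exp(-u2)                         # |det dz/du| for the inverse map
    f0u = f0_z(z1, z2) * jac
    f1u = f1_z(z1, z2) * jac
    return pi0 * f0u / (pi0 * f0u + (1 - pi0) * f1u)


u1 = np.linspace(-1.0, 1.5, 6)
u2 = np.full(6, -1.2)
print(np.allclose(fdr_u(u1, u2), fdr_z(*to_z(u1, u2))))   # True
```

In practice this means the fdr2d can be estimated once on the (t, log se) grid and then re-plotted on the fold-change scale simply by mapping each point.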
3.2 Simulation study
Figure 2a shows the scatter plot of 10 000 t-statistics and standard errors for a dataset simulated from Scenario 1 in Section 2.4, overlaid with the fdr2d estimated from 10^7 genes simulated from the same model. For these extra simulations, we kept track of the known DE status of each simulated gene and counted the proportion of false positives in each grid cell. No permutation or smoothing is involved, so the plot shows the true fdr2d function. In contrast, Figure 2b shows the fdr2d estimated from the dataset of 10 000 genes using the method described in Section 2.1, without using extra simulations or extra knowledge of DE status. Comparison with Figure 2a shows that our proposed estimation method works as expected. Note also that even in this simple simulation setting, with no gene-gene correlations or mean-variance structures, the fdr2d isolines are still sloped.
We now compare the performance of fdr2d using the standard t-statistic with that of fdr1d based on the standard and various modified t-statistics described in Section 2.6: Efron's t (Efron et al., 2001), SAM (Tusher et al., 2001) and Smyth's moderated t (Smyth, 2004). For each of Scenarios 2-5 in Section 2.4, 100 datasets are generated, for a total of 10^6 simulated genes per scenario. For each fdr procedure, the genes are then ranked according to their true local fdr values under that procedure. These values are computed from the known DE status of each simulated gene, as above. To allow easy overall comparison, we compute the global FDR associated with the top-ranking genes and draw the resulting FDR curve as a function of the proportion of genes declared DE.
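The following is a minimal sketch of how such an FDR curve can be computed. Genes are sorted by a per-gene local fdr. For the top k genes, the proportion of truly non-DE genes gives the global FDR at the point where a proportion k/m of genes is declared DE. The function name and toy inputs are hypothetical, and `is_null` stands for the known simulation status.

```python
import numpy as np


def fdr_curve(score, is_null):
    """Global FDR among the top-ranked genes versus the proportion declared DE.

    score: per-gene local fdr (smaller means stronger evidence for DE).
    is_null: boolean array with the known non-DE status from the simulation.
    """
    order = np.argsort(score, kind="stable")
    false_pos = np.cumsum(is_null[order])
    k = np.arange(1, len(score) + 1)
    return k / len(score), false_pos / k


prop, fdr = fdr_curve(np.array([0.1, 0.3, 0.2, 0.9]),
                      np.array([False, True, False, True]))
print(prop, fdr)
```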
The resulting FDR plots are given in Figure 3. The fdr2d with standard t shows top performance under all four scenarios. The fdr1d based on Smyth's moderated t performs equally well for Scenarios 2-4 (somewhat unsurprisingly, as these reflect its underlying model assumptions). It does not do as well for Scenario 5, where the error distribution is bootstrapped from the BRCA data. The fdr1d based on standard t performs poorly overall and actually has the worst global FDR in Scenarios 2 and 3. The fdr1d based on Efron's t and SAM lies mostly somewhere in between, though with dramatic breakdowns for specific scenarios: Efron's t does very badly in Scenario 4, with highly variable gene variances, and SAM has clearly inferior FDR in the BRCA-based Scenario 5. In summary, the fdr2d performs as well as the optimal empirical Bayes test for models with known variance structure, and better when the variance structure is close to that of the real BRCA data.
3.3 Application to real data
Results for the BRCA and Lymphoma data are shown in Figure 4 and Table 1. Figure 4a and b compare the fdr1d estimates (solid lines) with averaged fdr2d estimates (dashed lines). In both cases, there is overall good agreement between the curves, though the agreement is better in the tails of the curves than in the center, where the averaged fdr2d falls somewhat short of the fdr1d. This indicates that while the overall degree of smoothing is quite appropriate, there is still a problem with oversmoothing for t-statistics close to zero, fortunately an area of less practical interest.
Figure 4c and d show scatterplots of t-statistics versus log-standard errors, overlaid with estimated fdr2d isolines. For both datasets, we find pronounced asymmetry in these plots. It appears not only in the amount of up- and down-regulation, which is already evident from the fdr1d plots, but also in the way that genes with small standard errors are relatively discounted. In the case of the BRCA data, the isoline for fdr2d = 0.05 is only very mildly sloped for positive t-statistics, leading to very moderate discounting of genes with log-standard errors <2. The isoline for the same level in the left half of the plot is strongly curved, putting clearly more weight on genes with log-standard errors >1. In the case of the Lymphoma data, the asymmetry is even more pronounced. Here, genes with large standard errors are discounted for negative t-statistics, and genes with small errors for positive t-statistics (e.g. following the 0.1 isoline). This seems to reflect a strong mean-variance relationship for this dataset, which is also evident in the lower center of the plot.
Table 1 summarizes the number of genes that were found to be regulated for a fixed fdr cutoff (0.05 for the BRCA dataset, which has a stronger signal, and 0.1 for the Lymphoma set). For the BRCA data, exactly the same number of genes are found using fdr1d and fdr2d. For the Lymphoma data, however, the fdr2d finds both more genes to be regulated and also more balance between up- and down-regulation than fdr1d. This is a real example where the use of standard error information increases the power to detect DE. It is also interesting to note that the asymmetry in discounting genes based on their standard error can actually serve to find a better balance between up- and down-regulation.
More extensive comparisons between different procedures are summarized in Figure 5. In these comparisons, we also use the various modified t-statistics in the fdr2d procedure. In all cases, the fdr2d with the standard t-statistics performs as well as or better than any other procedure. As we go from 1D to 2D information, the advantage of the modified over the standard t-statistics disappears. This is an appealing practical property, as it means that we do not need to develop any extra analysis on how to modify the t-statistic.
3.4 Tornado and volcano plots
In Figure 6 we show tornado and volcano plots for the BRCA and Lymphoma data. The isolines shown are the same as in Figure 4c and d, but were suitably transformed to the different scales. Both presentations show the same features that were discussed in the previous section, but as a function of the fold change instead of the t-statistic. Specifically, recall that the volcano plot is motivated by the same wish to discount significant genes with misleadingly small standard errors, i.e. genes with small fold change, yet significant P-value. The fdr2d allows us here to assess the plot objectively, and to define regions of interest that are more flexible than simple boxes and allow for asymmetry.
4 DISCUSSION
The generalization of the local fdr to multivariate statistics is conceptually simple, yet powerful enough to deal with two distinct problems in microarray data analysis: multiplicity and misleading over-precision. Both problems are well known, but we believe our approach is novel and potentially useful for most analysts. The technical problems in estimating the fdr, however, which are already present in the univariate case, become harder as the dimension of the test statistic increases. We have therefore limited our discussion to the 2D case, for which we have found the algorithm outlined in Section 2.1 to perform well.
Misleading over-precision is a subtle issue known in conditional inference (e.g. Lehmann, 1986, p. 558). Theoretically, this problem occurs even with a single test, although it is easier to explain with the confidence interval concept: short confidence intervals tend to have lower coverage than the stated confidence level (see Pawitan, 2001, Section 5.10 for a simple example). Previously, awareness of this problem and biological intuition have led to, for example, the volcano plot. The fdr2d provides a more flexible and objective control of the fdr for varying levels of information in the data.
We have applied the concept of fdr2d to the two-sample problem using a t-statistic. The classical t-statistic has been found to perform poorly on microarray expression data, especially for small datasets. Current solutions to this problem fall into two groups. Some are ad hoc, such as the volcano plot or the test statistics proposed by Tusher et al. (2001) and Efron et al. (2001). Others make rather specific model assumptions, leading to empirical Bayes procedures such as the test statistics proposed by Lönnstedt and Speed (2002) and Smyth (2004). In a specific dataset, the performance of the different procedures can vary considerably. These solutions protect against very small standard errors inflating the t-statistic by modifying its denominator. However, our simulation and data analysis suggest that ad hoc adjustments such as Efron's t can perform poorly if there is in fact substantial variation between gene variances, which is expected in real data.
It is noteworthy that, without introducing any theoretical complexity, the fdr2d achieves performance comparable to that of the optimal empirical Bayes method. The fdr2d gains its power from acknowledging the mean-variance structure found in expression datasets of all sizes. The t-statistic reduces the combined information contained in the mean difference and the standard error to a single ratio. In contrast, the fdr2d uses the full bivariate information, with the effect that genes whose t-statistics are driven by very small standard errors are discounted for fdr reasons. Furthermore, we found no additional gain in fdr2d from using modified t-statistics in its definition.
Various extensions of the fdr2d approach suggest themselves. For more than two groups of samples, we can split the classical F-statistic F = MST/MSE in the same manner as the t-statistic, so that, e.g.,

$$z_1 = F, \qquad z_2 = \log \mathrm{MSE}.$$
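For the multi-group extension, the following minimal sketch computes the one-way ANOVA F = MST/MSE and log MSE per gene. Taking z2 = log MSE as the analogue of the log standard error is an assumption about how the split "in the same manner as the t-statistic" would be done. The function name is hypothetical. For two groups, F equals the square of the pooled t.

```python
import numpy as np


def f_and_log_mse(groups):
    """One-way ANOVA F = MST/MSE and log MSE for every gene.

    groups: list of (genes, n_j) arrays, one per group.
    """
    k = len(groups)
    n = sum(g.shape[1] for g in groups)
    grand = np.concatenate(groups, axis=1).mean(axis=1)
    sst = sum(g.shape[1] * (g.mean(axis=1) - grand) ** 2 for g in groups)
    sse = sum(((g - g.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) for g in groups)
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return mst / mse, np.log(mse)


F, log_mse = f_and_log_mse([np.array([[1.0, 2.0, 3.0]]), np.array([[4.0, 5.0, 6.0]])])
print(F, log_mse)     # two groups: F equals the squared pooled t
```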
The two main problems that make the routine application of the fdr2d in data analysis difficult are the same as for the fdr1d, namely the smoothing of the ratio of densities and the estimation of π0. These issues, however, are the subject of much ongoing research, which we believe will make routine use of the fdr2d possible.
Acknowledgments
The work of A.P. and Y.P. has been supported by a grant from the Swedish Cancer Society (Cancerfonden).
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: John Quackenbush
Received on August 29, 2005; revised on December 14, 2005; accepted on December 15, 2005
REFERENCES
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289-300.
Dalmasso, C., et al. (2005) A simple procedure for estimating the false discovery rate. Bioinformatics, 21, 660-668.
Efron, B., et al. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc., 96, 1151-1160.
Hedenfalk, I., et al. (2001) Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med., 344, 539-548.
Lehmann, E.L. (1986) Testing Statistical Hypotheses. Wiley, New York.
Lönnstedt, I. and Speed, T. (2002) Replicated microarray data. Stat. Sinica, 12, 31-46.
Newton, M.A., et al. (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5, 155-176.
Pawitan, Y. (2001) In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press, Oxford, UK.
Pawitan, Y., et al. (2005a) False discovery rate, sensitivity and sample size for microarray studies. Bioinformatics, 21, 3017-3024.
Pawitan, Y., et al. (2005b) Bias in the estimation of false discovery rate in microarray studies. Bioinformatics, 21, 3865-3872.
Reiner, A., et al. (2003) Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics, 19, 368-375.
Rosenwald, A., et al. (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med., 346, 1937-1947.
Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol., 3, Article 3.
Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100, 9440-9445.
Tusher, V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116-5121. [Erratum (2001) Proc. Natl Acad. Sci. USA, 98, 10515.]
Wolfinger, R.D., et al. (2001) Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol., 8, 625-637.
Yang, Y.H., et al. (2005) Identifying differentially expressed genes from microarray experiments via statistic synthesis. Bioinformatics, 21, 1084-1093.