Bioinformatics Advance Access originally published online on October 28, 2004
Bioinformatics 2005 21(7):1084-1093; doi:10.1093/bioinformatics/bti108
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Identifying differentially expressed genes from microarray experiments via statistic synthesis


1Departments of Medicine, Center for Bioinformatics and Molecular Biostatistics, University of California San Francisco, CA 94143, USA
2Department of Biopharmaceutical Sciences, Center for Bioinformatics and Molecular Biostatistics, University of California San Francisco, CA 94143, USA
3Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California San Francisco, CA 94143, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
|
|
|---|
Motivation: A common objective of microarray experiments is the detection of differential gene expression between samples obtained under different conditions. The task of identifying differentially expressed genes consists of two aspects: ranking and selection. Numerous statistics have been proposed to rank genes in order of evidence for differential expression. However, no one statistic is universally optimal and there is seldom any basis or guidance that can direct toward a particular statistic of choice.
Results: Our new approach, which addresses both ranking and selection of differentially expressed genes, integrates differing statistics via a distance synthesis scheme. Using a set of (Affymetrix) spike-in datasets, in which differentially expressed genes are known, we demonstrate that our method compares favorably with the best individual statistics, while achieving robustness properties lacked by the individual statistics. We further evaluate performance on one other microarray study.
Availability: The approach is implemented in an R package called DEDS, which is available for download from the Bioconductor website (http://www.bioconductor.org/).
Contact: mark{at}biostat.ucsf.edu
| 1 INTRODUCTION |
|---|
|
|
|---|
Microarrays have become increasingly common in biological and medical research. They enable the simultaneous study of thousands of genes and afford unprecedented ability to provide gene expression information on a whole genome level. There are several types of microarray technology including spotted arrays (DeRisi et al., 1996) and high-density oligonucleotide chips (Lockhart et al., 1996). The reader is referred to Schena (2000) and Bowtell and Sambrook (2003) for a more detailed introduction to the biology and technology underlying microarrays.
Microarray experiments generate large and complex multivariate datasets which present numerous data analytic challenges. These include, firstly, adjusting for the many sources of variability arising from probe and target preparation, array fabrication, and imaging; secondly, devising methods that are geared to the novel data structuresthousands of inter-related variables (genes), small sample sizes (arrays), and little or no replication; and lastly, where needed, accommodating multiple testing concerns. These complexities call for robust and integrated analysis tools to enhance reliability and efficiency.
A very common data analytic goal is the identification of important genes amongst the many for which expression measures have been obtained. Importance typically corresponds to association with a response of interest. When we are contrasting expression between different groups or conditions (i.e. the response is polytomous), such important genes are said to be differentially expressed (DE). The task of identifying DE genes can be divided into two aspects: ranking and selection. Ranking requires specification of a statistic or measure which captures evidence for DE on a per gene basis. Selection requires specification of a procedure (e.g. stipulation of a critical value) for arbitrating what constitutes significant DE. Ranking is fundamental and, arguably, is easier than selection. The primary importance of ranking arises from the fact that only a limited number of genes can be biologically validated from follow-up experiments, these being\linebreak necessary to affirm DE in view of the present maturity of microarray measurements. The goal of this paper is to develop and illustrate a novel approach to ranking and selection that integrates differing measures of DE.
The paper is organized as follows. Section 2 presents a brief overview of some recently proposed methods for identifying DE genes in multiple-array experiments. Our proposed approach, differential expression via distance synthesis (DEDS), is presented in Section 3. We apply the method to two different datasets and present results in Section 4. Finally, Section 5 discusses our findings, extensions and open questions.
| 2 EXISTING METHODS FOR DETECTING DE GENES |
|---|
|
|
|---|
The importance of developing new data analytic techniques to effectively identify DE genes is illustrated by the appreciable effort and literature dedicated to this area. We will overview several customized approaches to both aspects of DE detection with an emphasis on ranking.
2.1 Ranking genes
For simplicity, and without loss of generality, we focus on a dichotomous response; i.e. deal with two-group comparisons. We designate the groups as treatment (T) and control (C). One may consider the question of ranking genes in terms of DE in a discriminant analysis framework, i.e. consider DE genes as features or variables that best separate groups T and C (Ghosh, 2003). However, such approaches are focused on class prediction, and the rankings of genes so obtained are strongly influenced by between-gene dependencies and feature-selection strategies, so that individual gene DE is at best a by-product. For this reason, we limit our discussion to more direct approaches which assess differences between distinct groups on a single-gene basis, and we focus on those methods that we subsequently apply and describe the classes of statistics that they represent.
For two-channel competitive hybridization experiments, we assume that the comparisons of log-ratios are all indirect; that is we have nT arrays in which samples from group T are hybridized against a reference sample R, giving nT log-ratios
; i = 1, ...,nT, and similarly we get nC log-ratios
; J = 1, ..., nC from group C. For Affymetrix oligonucleotide array experiments, we have nT chips with gene expression measures from group T and nC chips with gene expression measures from group C.
2.1.1 Fold changes and t-statistics
The simplest gene ranking method is based on the fold change (FC) (i.e. ratio) in expression means between the two groups. Thus, for each gene, we compute the difference between (log) means FC =
T
C, where
and similarly for
C. While use of FC is conceptually appealing, ranking genes based on fold change alone implicitly assigns equal variance to every gene.
In contrast, the t-statistic defined as
![]() |
2.1.2 Penalized statistics
The primary shortcoming of using t-statistics for ranking genes lies in the unstable variance estimates that arise when sample size is small. Such small sample sizes are common occurrences in microarray experiments due to high costs and/or resource (RNA) limitations. Relatedly, with tens of thousands of t-statistics (corresponding to tens of thousands of genes) there is frequently a number of large t-statistics driven by very small denominator standard deviations (sp's), even though their numerators, measuring expression differences
T
C, may also be small. To overcome these shortcomings a variety of approaches have been proposed to provide a more reliable variance estimate; they can be categorized into two groups. The first group consists of variance stabilizing functions, and the second group contains error fudge factors and Bayesian methods. The former seeks to decouple the meanvariance dependency by modeling the variance of the expression of a gene as a function of the mean expression of the gene (Rocke and Durbin, 2001; Huber et al., 2002; Jain et al., 2003). The latter regularizes the t-statistics by inflating their denominators:
![]() |
2.1.3 Mixture models
These Bayes and empirical Bayes approaches are arrived at via mixture models: it is postulated that we have a mixture of non-DE and DE genes with the latter group sometimes refined into up- and down-regulated components. Often mixing of these components is done at the level of one-dimensional, gene-specific summary statistics (Efron et al., 2001; Lee et al., 2000; Pan et al., 2002; Allison et al., 2002; Ghosh, 2004, http://bioinformatics.oupjournals.org/cgi/reprint/bth139v1). Newton et al., 2004 argue that efficiency and sensitivity gains may be realized by effecting mixing on the gene expression (mean) values themselves, rather than on derived summaries. They develop so-called semi-parametric hierarchical mixture models (SHMMs) that have a number of features. Firstly, they are flexible (non-parametric) where data are rich (lots of genes) and parametric where data paucity mandates assumption-making (few replicates per gene). Secondly, the provision of per gene posterior probabilities of DE allows a straightforward approach to calibration in terms of false discovery rates (FDRs) (see below). Arguably, the main limitation surrounding the SHMM approach is the adequacy of the parametric model. Recognizing this, Newton et al., 2004 provide graphical diagnostics (stratified QQ plots) for assessment of their chosen (gamma) model with their accompanying software.
2.1.4 Linear (mixed) models
In an effort to explicitly accommodate factors systematically impacting microarray gene expression values, a number of linear model/ANOVA based approaches have been advanced. Kerr et al., 2000 use fixed effect ANOVA models for logged intensities (single channel measurements) including terms for dye, array, treatment and gene main effects, as well as select interactions between these factors. By assuming a common variance across genes, a pooled (over genes) analysis is enabled, with large attendant increases in degrees of freedom for error variance estimation. However, there will likely be appreciable erosion of these degrees of freedom in order to accommodate interaction terms needed to recover the adjustments afforded by use of log-ratios in two channel settings. To overcome the (strong) common variance assumption, Jin et al. (2001) and Wolfinger et al. (2001) adopt mixed model ANOVA formulations, performing gene-by-gene analyses treating factors such as dye and array as random effects.
2.2 Ascribing significance
Subsequent to gene ranking comes selectionthe task of declaring which genes are significantly differentially expressed. Informal approaches include graphical examination of the ranking statistics via QQ plots, whereas more formal approaches involve testing suitably constructed (joint) null hypotheses of equal expression (non-DE). Such joint formulations are mandated by the multiple testing concerns that follow from evaluating thousands of genes. Two main approaches to this problem have emerged. One seeks to control family-wise type I error rates (Dudoit et al., 2002; Ge and Dudoit, 2002, http://lib.stat.cmu.edu/R/CRAN), using step-down Bonferroni correction Westfall and Young, 1993). The other (Efron et al., 2000; Tusher et al., 2001; Storey, 2002) extends the FDR ideas of Benjamini and Hochberg (1995). Detailed discussion and comparisons of competing approaches to multiple testing correction are deferred to Dudoit et al. (2003) and Storey et al. (2002).
| 3 DIFFERENTIAL EXPRESSION VIA DISTANCE SYNTHESIS |
|---|
|
|
|---|
It is immediately apparent from Section 2.1 that there are numerous statistics available for ranking genes as a prelude to arbitrating DE. Furthermore, selecting the best statistic or ordering statistics in terms of merit is problematic due to the complexity of the variability structure present in all microarray data. Performance characteristics (e.g. efficiency, robustness) of two-sample statistics depend on data attributes such as underlying distribution(s) and nature and extent of outliers and/or contamination. To date, there has been no delineation of such attributes for microarray data, making pre-selection of the theoretically best statistic problematic. Likewise, no comparisons across a sufficiently wide range of benchmark datasets have been undertaken. Hence, investigators must make rather arbitrary choices when deciding which ranking statistic to use. It is in part to eliminate some of this arbitrariness but primarily to borrow strength across related measures that we propose synthesizing DE ranking schemes.
A somewhat related approach is that of Pareto Fronts (PF) (Fleury et al., 2002, http://www.eecs.umich.edu/~hero/Preprints/eusipcogene.pdf). Briefly, by regarding the set of measures as defining a multivariate point cloud (points corresponding to a gene's vector of measures) and employing a standard (coordinate-wise) partial order, a set of non-dominated (Pareto-optimal) genes is identified. In our context these could include genes ranked (for DE) very highly by one measure but very lowly by another.
We now sketch our approach to detecting DEDS. We also begin with the multivariate point cloud with a point corresponding to a gene's vector of DE measures. Rather than employing partial order, we exploit the fact that all measures are attempting to capture DE and synthesize (reduce to a scalar) as follows. Firstly, we define an extreme point. Without loss of generality, assume that large values of all measures indicate DE. Then the extreme point is a vector of coordinates, each of which is the overall maximum of both observed and permuted values of that measure. Note that the extreme point may not correspond to any observed point. Next, the distance from all points to the extreme is computed. Intuitively, those points closest to the extreme are most likely to correspond to DE genes, but, we need to calibrate close. This is done by generating null referent distributions analogously to the methods Tibshirani et al., 2001 devised for calibrating gap statistics. Figure 1 provides a graphical illustration of the motivation behind DEDS. Specifically, panel (b) represents a common occurrence where the concordance between two DE measures (M1 and M2 in the plot) is relatively low [compared to panel (a)]. We hope by measuring the distance of spots to the extreme point in the direction of DE and therefore taking both measures into consideration, the rankings of the genes that show discord between measures (blue spots in Fig. 1b) will be lowered to a certain extent. The steps for calculating DEDS and selecting DE genes are detailed in Box 1.
|
| Box 1. DEDSpermutation-based algorithm Calculating DEDS
Assessing DE significance
|
| 4 ILLUSTRATIVE EXAMPLES |
|---|
|
|
|---|
We evaluate the performance of our proposed method, DEDS, on two diverse datasets, each featuring a number of sub-studies. Both two-channel spotted arrays and one-channel Affymetrix arrays are represented, as are situations with minimal and considerable DE anticipated. Furthermore, the inclusion of a spike-in experiment permits assessment in the rare setting where a gold standard (a set of known DE genes) is available.
4.1 Study 1: Affymetrix spike-in experiment
4.1.1 Data description
The spike-in experiment represents a portion of the data used by Affymetrix to develop their MAS 5.0 preprocessing algorithm. Here we utilize both MAS 5.0 and RMA (Irizarry et al., 2003) probe level summaries in order to showcase a robustness property of DEDS. The data features 14 human genes spiked-in at a series of 14 known concentrations (0, 22, 21, ..., 210 pM) according to a Latin square design including 12 612 null genes. Each row of the Latin square (given spike-in gene at a given concentration) was replicated (typically three times, two rows 12 times, 59 arrays in total). Further details are available at http://www.affymetrix.com/analysis/download_center2.affx. We analyze this dataset as 91 (
) pairs of two-sample comparisons corresponding to all pairwise comparisons of differing concentrations.
4.1.2 Results
The use of spike-in allows computation of receiver operating characteristic (ROC) curves since we know which genes are DE and which are null. For each two-sample comparison, we compute five different DE measures: FC, t-statistic, a moderated t-statistic that coincides with the B-statistic (Lönnstedt and Speed, 2001) SAM with the standard deviation penalty a taken as the median (over all genes) within gene standard deviation and our synthesized measure (DEDS) based on the first four statistics.
Thus, we obtain 91 ROC curves for each of the five measures. ROC curves are created by plotting the true positive rates versus false positive rates. The summary ROC curves as presented in Figure 2 are just averages of the 91 individual curves. The two panels of Figure 2 correspond to differing approaches for probe level summarization: panel (a) uses RMA while (b) uses MAS 5.0. Contrasting panels, the ensemble of ROC curves affirms the findings of the RMA developers (Irizarry et al., 2003) that RMA summaries improve accuracy without sacrificing precision, relative to MAS 5.0. However, more important from the present standpoint, is that even though for each approach to probe level summarization there is one measure that performs relatively poorly (t-statistics for RMA, FC for MAS 5.0), in both settings DEDS performed well. Thus, by utilizing several DE measures in the analysis of the spike-in experiment, DEDS is able to not only perform competitively with the best, but is also not adversely affected by the worst. And, of course, in practice we do not know a priori which measures will be suitable, as the spike-in data exemplifies.
|
Furthermore, two array groups among the 14 array groups contain 12 replicates. This subset of data were used to evaluate the ability of DEDS in determining the correct cutoffs for declaration of differential expression. We apply the permutation procedure described in Box 1 to simulate the DEDS reference distribution and thereby estimate the number of DE genes. Controlling the FDR at 0.01, 16 genes are found to be DE, 11 of which are among the 14 true DE genes. The top 20 DE genes have 13 of the 14 genes selected. The gene that is not detected is the most difficult one, since it is spiked in at the two lowest concentrations (0 and 0.25 pM). This represents an extremely low log-ratio/log-intensity for microarray-based detection. No other DE-statistic was able to detect this gene.
4.2 Study 2: Spt splicing (SPT) experiment
4.2.1 Data description
The SPT experiment consists of 22 two-channel spotted oligonucleotide splicing arrays (Clark et al., 2002). The primary goal of this experiment is to investigate the roles of eukaryotic chromatin elongation factors, Spt4Spt5, in splicing (Hartzog et al., 1998). We have analyzed a spt4 null mutation spt4
, and three partial loss-of-function spt5 mutations, spt5194, spt5242 and spt54. In addition, we include analysis of ceg1250, a temperature-sensitive mutation that causes rapid inactivation of the capping enzyme at non-permissive temperature (Fresco and Buratowski, 1996). Two independent mRNA samples are prepared for each mutant and a pair of dye-swap experiments are performed for each mRNA sample. Figure 3 shows a graphical representation of the actual experiment which includes four hybridizations between each mutant (mut) and wild-type (wt) and an additional two selfself hybridizations of the wild-type.
|
To distinguish between spliced and unspliced transcripts for intron-containing yeast genes, oligonucleotide probes on these arrays were designed to detect splice junctions (SJ), introns (Int) and second exons (Ex) of spliced genes. After normalization, array measurements are summarized into two indices that capture splicing alterations (Clark et al., 2002). The intron accumulation (IA) index assesses the change (relative to wt) of the intron probe signal in the mutants, normalized by the change in the corresponding exon. This is defined as
![]() |
![]() |
4.2.2 Models
To effectively analyze the splicing data, we apply five competing statistical models for evaluating DE and the synthesized measure DEDS. The nested experimental design of the splice mutant study shown in Figure 4 motivates the use of four different mixed ANOVA models described next. Furthermore, the limited amount of replication coupled with the small number of genes makes the SHMM (Newton et al., 2004) appealing for conceptual reasons. Thus, we apply five individual statistics separately and identically to the SJ and IA indices and subsequently investigate their fusion via DEDS.
|
Distinguished by including wild-type self-hybridizations or not, DE assessment can be pursued by either one-sample (four SJ or IA mutant indices per gene corresponding to the four replicate hybridizations) or two-sample (four SJ or IA mutant indices versus two SJ or IA wild-type indices) comparisons. A more detailed discussion of the difference between the direct (one-sample) and indirect (two-sample) comparisons can be found in (Speed and Yang, 2002). In addition to the above two approaches, we also consider another two approaches distinguished by allowing gene-specific variance heterogeneity or not. This latter case imposes the assumption that all genes exhibit a similar degree of variability and so can be jointly analyzed using a common estimate of error variance. This pooling dramatically increases error degrees of freedom (df). The former approach, on the contrary, does not impose the common variance assumption, allowing different variances for different genes. The resulting model is then fitted gene by gene. So, we have a 2 x 2 factorial of approaches, indicated by models IIV in Table 1).
|
In addition to these four models, we also employ Newton et al. (2004) SHMM (model V in Table 1; see Section 2). Further details concerning the fitting of the five different models and the comparisons are provided in Xiao et al. (unpublished data).
4.2.3 Results
Figure 5 displays a scatter plot matrix of log10(p), where p either corresponds to the model IIV unadjusted p-value for tests of DE or to the model V posterior probability for non-DE for mutants ceg1250 [panel (a)] and spt4
[panel (b)]. Note that by relating model IIV results to model V results we may seemingly be perpetuating the severe pedagogical problem of misinterpreting p-values as posterior probabilities (Berger et al., 1997). However, this is not the case. At no stage do we make probabilistic statements in terms of these quantities. Rather, they simply constitute a quantification of DE. The large number of DE genes anticipated for mutant ceg1250 is evident in Figure 5a (note the scales) and we observe high correlations between the five models. By contrast, minimal DE is anticipated for spt4
and results from the different models show this along with much lower agreement [panel (b)]. The fact that model V conforms more closely to the homoscedastic models (I and III) than to the heteroscedastic models (II and IV) is not surprising, since the SHMM utilizes information sharing between genes which is absent for the gene-specific heteroscedastic models.
|
Here is a situation where there is no clear advantages of one model over the others. Therefore, rather than trying to arbitrate between models and pick a single model on which to base DE rankings and declarations, or informally distilling sets of genes that are DE under two or more models, we employ DEDS as a robust means for synthesizing results and compare its performance with individual models.
As the sample sizes are too small (4 versus 2 for the two-sample statistics) to employ an effective permutation scheme, we elect to use p-values for the calculation of DEDS distances. The observed distances are then calibrated against expected values under the referent null distribution, which is simulated by drawing from marginal uniform
variates with correlation structure conforming to the observed data (Tibshirani et al., 2001). The algorithm is further motivated analogously to the mixture model approaches described in Section 2.1 but on the p-value scale, with non-DE corresponding to uniformity. Figure 6a and c shows histograms of the p-values from model I applied on ceg1-250 SJ indices and p-values from the Affymetrix spike-in experiment (see Section 4.1). The dashed lines are the frequency we would expect when all genes were null. As can be seen, there are many DE genes in (a) but minimal DE genes in (c). Further details of the algorithm are provided in Supplementary Data A.
|
Critical to ascribing DE is appropriate specification of cutoffs. Table 1 compares the numbers of genes achieving significance under FDRs of 0.01 and 0.05 for the five models and DEDS for mutant ceg1250. Immediately striking are the differences in numbers of genes declared DE by the different models. While this can be attributed in part to differing operating characteristics, it serves to showcase how sensitive results are to statistic choice.
Here, evaluating differences in DE determinations by the different models is problematic since we have no gold standard and, unlike the spike-in study, we do not know which genes are truly DE. However, as DEDS synthesizes over individual statistics, we believe its rankings of genes to be more robust than single measures. To examine this, we have defined and compared the characteristics of the top DE genes for ceg1250 SJ indices by DEDS and the five individual DE models. As mentioned, ceg1250 is known to profoundly affect splicing and accordingly we treat the top 133 genes from each model as significantly DE. While this number is somewhat arbitrary, the flavor of the results below is not overly sensitive to this specification. We then classify genes into three major groups. Group I consists of DE genes by DEDS. Group II contains non-DE genes by all six models. Group III is comprised of genes that are non-DE by DEDS but DE by one or more single measures and genes in these groups are further separated into different classes as illustrated in Figure 7a. To aid comparisons between genes of different groups, we display three-dimensional scatter plots between different models. Plotted on all axes are log10p of the corresponding models. Group I genes, represented by black spots, illustrate good concordance among DE models; whereas group III genes, represented as colored numbers, lie mainly off diagonal, indicating that such genes are ranked higher in one measure than the others. Thus, by ascribing high rankings to genes that exhibit agreement on DE among different measures and low rankings to genes that demonstrate discord among related measures, DEDS arguably provides a more robust gene ranking.
|
| 5 DISCUSSION |
|---|
|
|
|---|
In this paper, we have reviewed various statistical methods for the identification of DE genes in replicated microarray experiments. Additionally, we have advanced a novel method (DEDS) for this purpose. The DEDS algorithm synthesizes statistics or methods that estimate the same quantity of interest. The underlying principle behind DEDS is that genes that are highly ranked by different measures are more likely to be truly differentially expressed than genes that rank highly on a single measure.
Consider three widely used measures as an example: FC, t and SAM. The major limitations surrounding FC and t are the equal denominator and small denominator problems, respectively. Concerns surrounding SAM include criteria for, and accuracy of, estimates for the penalty parameter a. Another problem is that with tens of thousands of statistics calculated, there is frequently a set of genes possessing large statistics for one measure only, these arising by chance and/or because of shortcomings associated with the measures. Such genes are likely to be false positives. The advantage offered by DEDS over single measures is that, by combining over measures, such false positives are ranked lowly and become true negatives. Therefore, the set of DE genes obtained via DEDS tends to be robust against limitations associated with individual measures.
The intuition behind DEDS simply draws on the concept of intersection, i.e. it attempts to select genes that are ranked highly on all measures. However, there is a clear distinction between DEDS and a simple intersection of results from individual measures, which treats the measures as independent. To further illustrate such differences, we use an additional dataset of 6320 genes for which, due to the nature of the comparison groups, we anticipate substantial DE. The data derives from a study of cardiomyopathy in transgenic mice as influenced by overexpression of a G protein-coupled receptor, Ro1 (details are provided in Redfern et al., 2000, while details on fitting of the various DE measures are available in Supplementary Data B). Figure 8 displays a volcano plot between two measures: FC and t-statistics for the Ro1 data. The categorization of genes here is similar to Figure 7. Black points are the top 905 genes selected by DEDS, and the gray ones are those failing to be among the extremes (top 905) of any measures. The horizontal and vertical reference lines are the cutoff values estimated using t-statistic (|t|
2.98) and FC (|FC|
0.66), respectively. Classes 1 (red), 2 (cyan) and 3 (blue) are genes that are extreme in single measures, FC, t or SAM, only and their occurrences are possibly due to limitations associated with each measure. A simple intersection treats all measures as independent; thus, it will select only the genes from areas A and B. On the contrary, DEDS extends the intersection idea by considering all DE measures simultaneously, implicitly taking their correlation into account, thereby giving rise to a less-stringent criteria and possibly a lower false negative rate.
|
Natural questions that arise in using DEDS are choices of component DE measures and the distance metric. For the former, we have found that synthesizing t, FC and a penalized statistic such as SAM, gives good performance for all datasets we have analyzed using DEDS. To investigate whether including highly correlated measures increases variation and so erode the efficiency of DEDS, we applied DEDS on the Affymetrix spike-in RMA data by synthesizing two, three or all four measures from t, FC, SAM and moderated t. We have observed stable performance of DEDS as the obtained ROC curves for all combinations were largely overlapping. With regard to choice of distance metric, we recommend using a distance scaled according to variation of the component statistics so that DEDS results are not dominated by a single measure. In addition, even though we have demonstrated good performance of DEDS in the assessment of DE, it should be noted that sufficient replication is still the key in obtaining power and stability in analysis, and DEDS is not a panacea for poorly replicated experiments.
Finally, the DEDS method proposed in this paper is implemented in an R package (Ihaka and Gentleman, 1996) DEDS, which may be downloaded from the Bioconductor website (http://www.bioconductor.org/).
| Acknowledgments |
|---|
We are grateful to Professor Grant Hartzog and Todd Burkin at the University of California, Santa Cruz, who have provided us the microarray data of the SPT experiment and have participated in many valuable discussions related to the analysis of the SPT data. We also thank the anonymous reviewers whose comments improved the quality of the paper. Y.X. wishes to acknowledge Professor C. Anthony Hunt at the University of California, San Francisco for funding support.
| Footnotes |
|---|
These two authors contributed equally to this work.
Received on May 21, 2004; revised on July 2, 2004; accepted on July 7, 2004
| REFERENCES |
|---|
|
|
|---|
Allison, D.B., Gadbury, G.L., Heo, M., Fernández, J.R., Lee, C.-K., Prolla, T.A., Weindruch, R. (2002) A mixture model approach for the analysis of microarray gene expression data. Comput. Statist. Data Anal., 39, 120.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289300.
Berger, J.O., Boukai, B., Wang, Y. (1997) Unified frequentist and Bayesian testing of a precise hypothesis. Statist. Sci., 12, 133148[CrossRef].
(Eds.). DNA Microarrays: A Molecular Cloning Manual, (2003) , Cold Spring, NY Cold Spring Harbor Press.
Clark, T.A., Sugnet, C.W., Ares, M. (2002) Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science, 296, , pp. 907910
DeRisi, J., Penland, L., Brown, P.O., Bittner, M.L., Meltzer, P.S., Ray, M., Chen, Y., Su, Y.A., Trent, J.M. (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet., 14, 457460[CrossRef][Web of Science][Medline].
Dudoit, S., Shaffer, J.P., Boldrick, J.C. (2003) Multiple hypothesis testing in microarray experiments. Statist. Sci., 18, 71103[CrossRef].
Dudoit, S., Yang, Y.H., Callow, M.J., Speed, T.P. (2002) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statist. Sinica, 12, 111139.
Technical Report. Efron, B., Tibshirani, R., Goss, V., Chu, G. (2000) Microarrays and their use in a comparative experiment. , Stanford Department of Statistics, Stanford University.
Efron, E., Tibshirani, R., Storey, J., Tusher, V. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Statist. Assoc., 96, 11511160[CrossRef][Web of Science].
Fleury, G., Hero, A., Yoshida, S., Carter, T., Barlow, C., Swaroop, A. (2002) Pareto analysis for gene filtering in microarray experiments. European Signal Processing Conference .
Fresco, L.D. and Buratowski, S. (1996) Conditional mutants of the yeast mRNA capping enzyme show that the cap enhances, but is not required for, mRNA splicing. RNA, 2, , pp. 584596[Abstract].
Ge, Y. and Dudoit, S. (2002) Multiple testing procedures. R package.
Ghosh, D. (2003) Penalized discriminant methods for the classification of tumors from gene expression data. Biometrics, 59, 9921000[CrossRef][Web of Science][Medline].
Ghosh, D. (2004) Mixture models for assessing differential expression in complex tissues using microarray data. Bioinformatics, Epub ahead of print.
Hartzog, G.A., Wada, T., Handa, H., Winston, F. (1998) Evidence that spt4, spt5, and spt6 control transcription elongation by RNA polymerase ii in Saccharomyces cerevisiae. Genes and Dev., 12, 357369
Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A., Vingron, M. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 1, 19.
Ihaka, R. and Gentleman, R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Statist., 5, 299314[CrossRef].
Irizarry, R.A., Bolstad, B.M., Collin, F., Cope, L., Hobbs, B., Speed, T.P. (2003) Summaries of Affymetrix genechip probe level data. Nucleic Acids Res., 31, e15
Jain, N., Thatte, J., Braciale, T., Ley, K., O'Connell, M., Lee, J.K. (2003) Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics, 19, 19451951
Jin, W., Riley, R.M., Wolfinger, R.D., White, K.P., Passador-Gurgel, G., Gibson, G. (2001) The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat. Genet., 29, 389395[CrossRef][Web of Science][Medline].
Kerr, M.K., Martin, M., Churchill, G.A. (2000) Analysis of variance for gene expression microarray data. J. Comput. Biol., 7, 819837[CrossRef][Web of Science][Medline].
Lee, M.-L.T., Kuo, F.C., Whitmore, G.A., Sklar, J. (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl Acad. Sci. USA, 97, 98349839
Lockhart, D.J., Dong, H.L., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol., 14, 16751680[CrossRef][Web of Science][Medline].
Lönnstedt, I. and Speed, T.P. (2001) Replicated microarray data. Statist. Sinica, 12, 3146.
Newton, M.A., Kendziorski, C.M., Richmond, C.S., Blattner, F.R., Tsui, K.W. (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol., 8, 3752[CrossRef][Web of Science][Medline].
Newton, M.A., Noueiry, A., Sarkar, D., Ahlquist, P. (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method gene expression profiles. Biostatistics, 5, 155176[Abstract].
Pan, W., Lin, J., Le, C. (2002) How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol., 3, 0022.10022.10.
Redfern, C.H., Degtyarev, M.Y., Kwa, A.T., Salomonis, N., Cotte, N., Nanevicz, T., Fidelman, N., Desai, K., Vranizan, K., Lee, E.K., et al. (2000) Conditional expression of a gi-coupled receptor causes ventricular conduction delay and a lethal cardiomyopathy. Proc. Natl Acad. Sci. USA, 97, 48264831
Rocke, D.M. and Durbin, B. (2001) A model for measurement error for gene expression arrays. J. Comput. Biol., 8, 557570[CrossRef][Web of Science][Medline].
(Ed.). Microarray Biochip Technology, (2000) , NATICK, MA Eaton.
Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statist. Appl. Genet. Mol. Biol., 3, Article 3, http://www.bepress.com.sagmb.
Speed, T.P. and Yang, Y.H. (2002) Direct and indirect hybridizations for cDNA microarray experiments. Sankhya Indian J. Statist. A, 64, , pp. 706720.
Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Statist. Soc. B, 64, 479498[CrossRef].
Technical Report 623. Storey, J.D., Taylor, J.E., Siegmund, D. (2002) A unified estimation approach to false discovery rates. , Berkeley, CA Department of Statistics, University of California.
Tibshirani, R., Walther, G., Hastie, T. (2001) Estimating the number of clusters in a dataset via the gap statistic. J. R. Statist. Soc. B-Statist. Methodol., 63, 411423[CrossRef].
Tusher, V., Tibshirani, R., Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116
Westfall, P.H. and Young, S.S. Resampling-based Multiple Testing: Examples and Methods for p-value Adjustment, (1993) , New York John Wiley and Sons.
Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., Paules, R.S. (2001) Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol., 8, , pp. 625638[CrossRef][Web of Science][Medline].
This article has been cited by other articles:
![]() |
Y. Saeys, I. Inza, and P. Larranaga A review of feature selection techniques in bioinformatics Bioinformatics, October 1, 2007; 23(19): 2507 - 2517. [Abstract] [Full Text] [PDF] |
||||
![]() |
X. Liu, M. Milo, N. D Lawrence, and M. Rattray Probe-level measurement error improves accuracy in detecting differential gene expression Bioinformatics, September 1, 2006; 22(17): 2107 - 2113. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Ploner, S. Calza, A. Gusnanto, and Y. Pawitan Multidimensional local false discovery rate for microarray studies Bioinformatics, March 1, 2006; 22(5): 556 - 565. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||



and store the results (see below). Obtain the corresponding coordinate-wise maximum as above:
.
d(2)
.
satisfy:
, j = 1, ..., N. Compute the median number of falsely called genes among the B sets of permutations.








