Skip Navigation


Bioinformatics Advance Access originally published online on October 28, 2004
Bioinformatics 2005 21(7):1084-1093; doi:10.1093/bioinformatics/bti108
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/7/1084    most recent
bti108v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Yang, Y. H.
Right arrow Articles by Segal, M. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Yang, Y. H.
Right arrow Articles by Segal, M. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Identifying differentially expressed genes from microarray experiments via statistic synthesis

Yee Hwa Yang 1,{dagger}, Yuanyuan Xiao 2,{dagger} and Mark R. Segal 3,*

1Departments of Medicine, Center for Bioinformatics and Molecular Biostatistics, University of California San Francisco, CA 94143, USA
2Department of Biopharmaceutical Sciences, Center for Bioinformatics and Molecular Biostatistics, University of California San Francisco, CA 94143, USA
3Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California San Francisco, CA 94143, USA

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 EXISTING METHODS FOR...
 3 DIFFERENTIAL EXPRESSION VIA...
 4 ILLUSTRATIVE EXAMPLES
 5 DISCUSSION
 REFERENCES
 

Motivation: A common objective of microarray experiments is the detection of differential gene expression between samples obtained under different conditions. The task of identifying differentially expressed genes consists of two aspects: ranking and selection. Numerous statistics have been proposed to rank genes in order of evidence for differential expression. However, no one statistic is universally optimal and there is seldom any basis or guidance that can direct toward a particular statistic of choice.

Results: Our new approach, which addresses both ranking and selection of differentially expressed genes, integrates differing statistics via a distance synthesis scheme. Using a set of (Affymetrix) spike-in datasets, in which differentially expressed genes are known, we demonstrate that our method compares favorably with the best individual statistics, while achieving robustness properties lacked by the individual statistics. We further evaluate performance on one other microarray study.

Availability: The approach is implemented in an R package called DEDS, which is available for download from the Bioconductor website (http://www.bioconductor.org/).

Contact: mark{at}biostat.ucsf.edu


    1 INTRODUCTION
 TOP
 Abstract
 1 INTRODUCTION
 2 EXISTING METHODS FOR...
 3 DIFFERENTIAL EXPRESSION VIA...
 4 ILLUSTRATIVE EXAMPLES
 5 DISCUSSION
 REFERENCES
 
Microarrays have become increasingly common in biological and medical research. They enable the simultaneous study of thousands of genes and afford unprecedented ability to provide gene expression information on a whole genome level. There are several types of microarray technology including spotted arrays (DeRisi et al., 1996) and high-density oligonucleotide chips (Lockhart et al., 1996). The reader is referred to Schena (2000) and Bowtell and Sambrook (2003) for a more detailed introduction to the biology and technology underlying microarrays.

Microarray experiments generate large and complex multivariate datasets which present numerous data analytic challenges. These include, firstly, adjusting for the many sources of variability arising from probe and target preparation, array fabrication, and imaging; secondly, devising methods that are geared to the novel data structures—thousands of inter-related variables (genes), small sample sizes (arrays), and little or no replication; and lastly, where needed, accommodating multiple testing concerns. These complexities call for robust and integrated analysis tools to enhance reliability and efficiency.

A very common data analytic goal is the identification of important genes amongst the many for which expression measures have been obtained. Importance typically corresponds to association with a response of interest. When we are contrasting expression between different groups or conditions (i.e. the response is polytomous), such important genes are said to be ‘differentially expressed’ (DE). The task of identifying DE genes can be divided into two aspects: ranking and selection. Ranking requires specification of a statistic or measure which captures evidence for DE on a per gene basis. Selection requires specification of a procedure (e.g. stipulation of a critical value) for arbitrating what constitutes ‘significant’ DE. Ranking is fundamental and, arguably, is easier than selection. The primary importance of ranking arises from the fact that only a limited number of genes can be biologically validated from follow-up experiments, these being\linebreak necessary to affirm DE in view of the present maturity of microarray measurements. The goal of this paper is to develop and illustrate a novel approach to ranking and selection that integrates differing measures of DE.

The paper is organized as follows. Section 2 presents a brief overview of some recently proposed methods for identifying DE genes in multiple-array experiments. Our proposed approach, differential expression via distance synthesis (DEDS), is presented in Section 3. We apply the method to two different datasets and present results in Section 4. Finally, Section 5 discusses our findings, extensions and open questions.


    2 EXISTING METHODS FOR DETECTING DE GENES
 TOP
 Abstract
 1 INTRODUCTION
 2 EXISTING METHODS FOR...
 3 DIFFERENTIAL EXPRESSION VIA...
 4 ILLUSTRATIVE EXAMPLES
 5 DISCUSSION
 REFERENCES
 
The importance of developing new data analytic techniques to effectively identify DE genes is illustrated by the appreciable effort and literature dedicated to this area. We will overview several customized approaches to both aspects of DE detection with an emphasis on ranking.

2.1 Ranking genes
For simplicity, and without loss of generality, we focus on a dichotomous response; i.e. deal with two-group comparisons. We designate the groups as treatment (T) and control (C). One may consider the question of ranking genes in terms of DE in a discriminant analysis framework, i.e. consider DE genes as ‘features’ or ‘variables’ that best separate groups T and C (Ghosh, 2003). However, such approaches are focused on class prediction, and the rankings of genes so obtained are strongly influenced by between-gene dependencies and feature-selection strategies, so that individual gene DE is at best a by-product. For this reason, we limit our discussion to more direct approaches which assess differences between distinct groups on a single-gene basis, and we focus on those methods that we subsequently apply and describe the classes of statistics that they represent.

For two-channel competitive hybridization experiments, we assume that the comparisons of log-ratios are all indirect; that is we have nT arrays in which samples from group T are hybridized against a reference sample R, giving nT log-ratios ; i = 1, ...,nT, and similarly we get nC log-ratios ; J = 1, ..., nC from group C. For Affymetrix oligonucleotide array experiments, we have nT chips with gene expression measures from group T and nC chips with gene expression measures from group C.

2.1.1 Fold changes and t-statistics
The simplest gene ranking method is based on the fold change (FC) (i.e. ratio) in expression means between the two groups. Thus, for each gene, we compute the difference between (log) means FC = TC, where and similarly for C. While use of FC is conceptually appealing, ranking genes based on fold change alone implicitly assigns equal variance to every gene.

In contrast, the t-statistic defined as

where sp is the pooled standard deviation, takes into account differing gene-specific variation across arrays. Use of t-statistics is also widespread in assessing DE (Dudoit et al., 2002). However, as indicated next, some care is needed in accommodating variation.

2.1.2 Penalized statistics
The primary shortcoming of using t-statistics for ranking genes lies in the unstable variance estimates that arise when sample size is small. Such small sample sizes are common occurrences in microarray experiments due to high costs and/or resource (RNA) limitations. Relatedly, with tens of thousands of t-statistics (corresponding to tens of thousands of genes) there is frequently a number of large t-statistics driven by very small denominator standard deviations (sp's), even though their numerators, measuring expression differences TC, may also be small. To overcome these shortcomings a variety of approaches have been proposed to provide a more reliable variance estimate; they can be categorized into two groups. The first group consists of variance stabilizing functions, and the second group contains error fudge factors and Bayesian methods. The former seeks to decouple the mean–variance dependency by modeling the variance of the expression of a gene as a function of the mean expression of the gene (Rocke and Durbin, 2001; Huber et al., 2002; Jain et al., 2003). The latter regularizes the t-statistics by inflating their denominators:

There are differing ways of motivating such penalization and, accordingly, of estimating the penalty parameter a. What unifies these approaches is the (sometimes implicit) desire to utilize between-gene information rather than relying solely on individual (within)-gene information as afforded by t-statistics. What distinguishes them is the formalism invoked in applying information sharing. At one end are somewhat ad hoc methods that avoid modeling between-gene relationships such as significance analysis of microarrays (SAM) (Tusher et al., 2001). At the other are Bayesian approaches (Newton et al., 2001) with empirical Bayes methods along the way [B-statistics; Lönnstedt and Speed (2001)]. Smyth (2004) extends and resets the hierarchical Bayesian model of Lönnstedt and Speed (2001) in the context of general linear models, and uses the moderated (penalized) t- (or F-) statistics for inference about contrasts of biological interest.

2.1.3 Mixture models
These Bayes and empirical Bayes approaches are arrived at via mixture models: it is postulated that we have a mixture of non-DE and DE genes with the latter group sometimes refined into up- and down-regulated components. Often mixing of these components is done at the level of one-dimensional, gene-specific summary statistics (Efron et al., 2001; Lee et al., 2000; Pan et al., 2002; Allison et al., 2002; Ghosh, 2004, http://bioinformatics.oupjournals.org/cgi/reprint/bth139v1). Newton et al., 2004 argue that efficiency and sensitivity gains may be realized by effecting mixing on the gene expression (mean) values themselves, rather than on derived summaries. They develop so-called semi-parametric hierarchical mixture models (SHMMs) that have a number of features. Firstly, they are flexible (non-parametric) where data are rich (lots of genes) and parametric where data paucity mandates assumption-making (few replicates per gene). Secondly, the provision of per gene posterior probabilities of DE allows a straightforward approach to calibration in terms of false discovery rates (FDRs) (see below). Arguably, the main limitation surrounding the SHMM approach is the adequacy of the parametric model. Recognizing this, Newton et al., 2004 provide graphical diagnostics (stratified QQ plots) for assessment of their chosen (gamma) model with their accompanying software.

2.1.4 Linear (mixed) models
In an effort to explicitly accommodate factors systematically impacting microarray gene expression values, a number of linear model/ANOVA based approaches have been advanced. Kerr et al., 2000 use fixed effect ANOVA models for logged intensities (single channel measurements) including terms for dye, array, treatment and gene main effects, as well as select interactions between these factors. By assuming a common variance across genes, a pooled (over genes) analysis is enabled, with large attendant increases in degrees of freedom for error variance estimation. However, there will likely be appreciable erosion of these degrees of freedom in order to accommodate interaction terms needed to ‘recover’ the adjustments afforded by use of log-ratios in two channel settings. To overcome the (strong) common variance assumption, Jin et al. (2001) and Wolfinger et al. (2001) adopt mixed model ANOVA formulations, performing gene-by-gene analyses treating factors such as dye and array as random effects.

2.2 Ascribing significance
Subsequent to gene ranking comes selection—the task of declaring which genes are significantly differentially expressed. Informal approaches include graphical examination of the ranking statistics via QQ plots, whereas more formal approaches involve testing suitably constructed (joint) null hypotheses of equal expression (non-DE). Such joint formulations are mandated by the multiple testing concerns that follow from evaluating thousands of genes. Two main approaches to this problem have emerged. One seeks to control family-wise type I error rates (Dudoit et al., 2002; Ge and Dudoit, 2002, http://lib.stat.cmu.edu/R/CRAN), using step-down Bonferroni correction Westfall and Young, 1993). The other (Efron et al., 2000; Tusher et al., 2001; Storey, 2002) extends the FDR ideas of Benjamini and Hochberg (1995). Detailed discussion and comparisons of competing approaches to multiple testing correction are deferred to Dudoit et al. (2003) and Storey et al. (2002).


    3 DIFFERENTIAL EXPRESSION VIA DISTANCE SYNTHESIS
 TOP
 Abstract
 1 INTRODUCTION
 2 EXISTING METHODS FOR...
 3 DIFFERENTIAL EXPRESSION VIA...
 4 ILLUSTRATIVE EXAMPLES
 5 DISCUSSION
 REFERENCES
 
It is immediately apparent from Section 2.1 that there are numerous statistics available for ranking genes as a prelude to arbitrating DE. Furthermore, selecting the best statistic or ordering statistics in terms of merit is problematic due to the complexity of the variability structure present in all microarray data. Performance characteristics (e.g. efficiency, robustness) of two-sample statistics depend on data attributes such as underlying distribution(s) and nature and extent of outliers and/or contamination. To date, there has been no delineation of such attributes for microarray data, making pre-selection of the theoretically ‘best’ statistic problematic. Likewise, no comparisons across a sufficiently wide range of benchmark datasets have been undertaken. Hence, investigators must make rather arbitrary choices when deciding which ranking statistic to use. It is in part to eliminate some of this arbitrariness but primarily to borrow strength across related measures that we propose synthesizing DE ranking schemes.

A somewhat related approach is that of Pareto Fronts (PF) (Fleury et al., 2002, http://www.eecs.umich.edu/~hero/Preprints/eusipcogene.pdf). Briefly, by regarding the set of measures as defining a multivariate point cloud (points corresponding to a gene's vector of measures) and employing a standard (coordinate-wise) partial order, a set of ‘non-dominated’ (Pareto-optimal) genes is identified. In our context these could include genes ranked (for DE) very highly by one measure but very lowly by another.

We now sketch our approach to detecting DEDS. We also begin with the multivariate point cloud with a point corresponding to a gene's vector of DE measures. Rather than employing partial order, we exploit the fact that all measures are attempting to capture DE and synthesize (reduce to a scalar) as follows. Firstly, we define an ‘extreme point’. Without loss of generality, assume that large values of all measures indicate DE. Then the extreme point is a vector of coordinates, each of which is the overall maximum of both observed and permuted values of that measure. Note that the extreme point may not correspond to any observed point. Next, the distance from all points to the extreme is computed. Intuitively, those points closest to the extreme are most likely to correspond to DE genes, but, we need to calibrate ‘close’. This is done by generating null referent distributions analogously to the methods Tibshirani et al., 2001 devised for calibrating gap statistics. Figure 1 provides a graphical illustration of the motivation behind DEDS. Specifically, panel (b) represents a common occurrence where the concordance between two DE measures (M1 and M2 in the plot) is relatively low [compared to panel (a)]. We hope by measuring the distance of spots to the ‘extreme point’ in the direction of DE and therefore taking both measures into consideration, the rankings of the genes that show discord between measures (blue spots in Fig. 1b) will be lowered to a certain extent. The steps for calculating DEDS and selecting DE genes are detailed in Box 1.



View larger version (16K):
[in this window]
[in a new window]
 
Fig. 1 A graphical representation of the motivation behind DEDS. Plotted are the multivariate point cloud with a point corresponding to a gene's vector of two related DE measures, M1 and M2. E is the "extreme origin" in the direction of DE. Two scenarios are presented: when M1 and M2 show strong (panel (a)) or weak (panel (b)) concordance. Black spots are genes that are ranked highly by both measures and gray spots are genes that are ranked highly by one measure but lowly but the other.

 

Box 1. DEDS—permutation-based algorithm

Calculating DEDS

  1. Apply j = 1, ..., J appropriate (DE measuring) statistics tj to each of i = 1, ..., N genes in the target dataset yielding values tij. Without loss of generality, assume larger values indicate increased DE. Let the (observed) coordinate-wise extreme point be E0 = (maxi(ti1), ..., maxi(tiJ)).
  2. Locate the overall (observed, permutation) extreme point E:
    1. Obtain b = 1, ..., B permuted datasets by randomly assigning nT arrays to class ‘T’ and nC arrays to class ‘C’. For each permuted dataset recalculate the J DE statistics for each of the N genes yielding and store the results (see below). Obtain the corresponding coordinate-wise maximum as above: .
    2. Obtain the coordinate-wise permutation extreme point Ep by maximizing over the B permutations: Ep = (maxb(Eb1), ..., maxb(EbJ)).
    3. Obtain E as the overall maximum: E = max(Ep, E0).

  3. Calculate a distance d from each gene to E. For example, one choice for a scaled distance is

    where MAD is the median absolute deviation from the median. Order the distances: d(1) ≤ d(2) ≤ ... ≤ d(N).

Assessing DE significance

  1. Using the stored statistics for each permuted dataset b, analogously compute distances for each gene. Order the distances: .
  2. For the i-th gene as ordered by the original d, define ‘falsely called genes’ for each of the B sets of permutations as those genes, whose satisfy: , j = 1, ..., N. Compute the median number of falsely called genes among the B sets of permutations.
  3. The q-value that controls FDR for the i-th ordered gene is computed as the median of the number of falsely called genes divided by the number of genes called significant (i).

 


    4 ILLUSTRATIVE EXAMPLES
 TOP
 Abstract
 1 INTRODUCTION
 2 EXISTING METHODS FOR...
 3 DIFFERENTIAL EXPRESSION VIA...
 4 ILLUSTRATIVE EXAMPLES
 5 DISCUSSION
 REFERENCES
 
We evaluate the performance of our proposed method, DEDS, on two diverse datasets, each featuring a number of sub-studies. Both two-channel spotted arrays and one-channel Affymetrix arrays are represented, as are situations with minimal and considerable DE anticipated. Furthermore, the inclusion of a spike-in experiment permits assessment in the rare setting where a gold standard (a set of known DE genes) is available.

4.1 Study 1: Affymetrix spike-in experiment
4.1.1 Data description
The spike-in experiment represents a portion of the data used by Affymetrix to develop their MAS 5.0 preprocessing algorithm. Here we utilize both MAS 5.0 and RMA (Irizarry et al., 2003) probe level summaries in order to showcase a robustness property of DEDS. The data features 14 human genes spiked-in at a series of 14 known concentrations (0, 2–2, 2–1, ..., 210 pM) according to a Latin square design including 12 612 null genes. Each ‘row’ of the Latin square (given spike-in gene at a given concentration) was replicated (typically three times, two rows 12 times, 59 arrays in total). Further details are available at http://www.affymetrix.com/analysis/download_center2.affx. We analyze this dataset as 91 () pairs of two-sample comparisons corresponding to all pairwise comparisons of differing concentrations.

4.1.2 Results
The use of spike-in allows computation of receiver operating characteristic (ROC) curves since we know which genes are DE and which are null. For each two-sample comparison, we compute five different DE measures: FC, t-statistic, a moderated t-statistic that coincides with the B-statistic (Lönnstedt and Speed, 2001) SAM with the standard deviation penalty a taken as the median (over all genes) within gene standard deviation and our synthesized measure (DEDS) based on the first four statistics.

Thus, we obtain 91 ROC curves for each of the five measures. ROC curves are created by plotting the true positive rates versus false positive rates. The summary ROC curves as presented in Figure 2 are just averages of the 91 individual curves. The two panels of Figure 2 correspond to differing approaches for probe level summarization: panel (a) uses RMA while (b) uses MAS 5.0. Contrasting panels, the ensemble of ROC curves affirms the findings of the RMA developers (Irizarry et al., 2003) that RMA summaries improve accuracy without sacrificing precision, relative to MAS 5.0. However, more important from the present standpoint, is that even though for each approach to probe level summarization there is one measure that performs relatively poorly (t-statistics for RMA, FC for MAS 5.0), in both settings DEDS performed well. Thus, by utilizing several DE measures in the analysis of the spike-in experiment, DEDS is able to not only perform competitively with the best, but is also not adversely affected by the worst. And, of course, in practice we do not know a priori which measures will be suitable, as the spike-in data exemplifies.



View larger version (13K):
[in this window]
[in a new window]
 
Fig. 2 Comparisons of different DE measures using ROC curve for Affymetrix spike-in experiment. (a) Use RMA probe summary. (b) Use MAS 5.0 probe summary.

 
Furthermore, two array groups among the 14 array groups contain 12 replicates. This subset of data were used to evaluate the ability of DEDS in determining the correct cutoffs for declaration of differential expression. We apply the permutation procedure described in Box 1 to simulate the DEDS reference distribution and thereby estimate the number of DE genes. Controlling the FDR at 0.01, 16 genes are found to be DE, 11 of which are among the 14 true DE genes. The top 20 DE genes have 13 of the 14 genes selected. The gene that is not detected is the most ‘difficult’ one, since it is spiked in at the two lowest concentrations (0 and 0.25 pM). This represents an extremely low log-ratio/log-intensity for microarray-based detection. No other DE-statistic was able to detect this gene.

4.2 Study 2: Spt splicing (SPT) experiment
4.2.1 Data description
The SPT experiment consists of 22 two-channel spotted oligonucleotide splicing arrays (Clark et al., 2002). The primary goal of this experiment is to investigate the roles of eukaryotic chromatin elongation factors, Spt4–Spt5, in splicing (Hartzog et al., 1998). We have analyzed a spt4 null mutation spt4{Delta}, and three partial loss-of-function spt5 mutations, spt5–194, spt5–242 and spt5–4. In addition, we include analysis of ceg1–250, a temperature-sensitive mutation that causes rapid inactivation of the capping enzyme at non-permissive temperature (Fresco and Buratowski, 1996). Two independent mRNA samples are prepared for each mutant and a pair of dye-swap experiments are performed for each mRNA sample. Figure 3 shows a graphical representation of the actual experiment which includes four hybridizations between each mutant (mut) and wild-type (wt) and an additional two self–self hybridizations of the wild-type.



View larger version (11K):
[in this window]
[in a new window]
 
Fig. 3 Study design of the SPT experiment. In this representation, vertices correspond to target mRNA samples and edges to hybridizations between two samples. By convention, we place the green-labeled sample at the tail and the red-labeled sample at the head of the arrow.

 
To distinguish between spliced and unspliced transcripts for intron-containing yeast genes, oligonucleotide probes on these arrays were designed to detect splice junctions (SJ), introns (Int) and second exons (Ex) of spliced genes. After normalization, array measurements are summarized into two indices that capture splicing alterations (Clark et al., 2002). The intron accumulation (IA) index assesses the change (relative to wt) of the intron probe signal in the mutants, normalized by the change in the corresponding exon. This is defined as

Similarly, the splice junction (SJ) index, defined as

measures a similarly normalized gain or loss of the splice junction probe signal in the mutants. A total of 254 SJ indices and 263 IA indices are formed.

4.2.2 Models
To effectively analyze the splicing data, we apply five competing statistical models for evaluating DE and the synthesized measure DEDS. The nested experimental design of the splice mutant study shown in Figure 4 motivates the use of four different mixed ANOVA models described next. Furthermore, the limited amount of replication coupled with the small number of genes makes the SHMM (Newton et al., 2004) appealing for conceptual reasons. Thus, we apply five individual statistics separately and identically to the SJ and IA indices and subsequently investigate their fusion via DEDS.



View larger version (12K):
[in this window]
[in a new window]
 
Fig. 4 Nested study design of the SPT experiment. Each mutant (M) has two samples (S) and each sample is hybridized to two arrays (A). Therefore, A is nested in S and S is in turn nested in M.

 
Distinguished by including wild-type self-hybridizations or not, DE assessment can be pursued by either one-sample (four SJ or IA mutant indices per gene corresponding to the four replicate hybridizations) or two-sample (four SJ or IA mutant indices versus two SJ or IA wild-type indices) comparisons. A more detailed discussion of the difference between the direct (one-sample) and indirect (two-sample) comparisons can be found in (Speed and Yang, 2002). In addition to the above two approaches, we also consider another two approaches distinguished by allowing gene-specific variance heterogeneity or not. This latter case imposes the assumption that all genes exhibit a similar degree of variability and so can be jointly analyzed using a common estimate of error variance. This pooling dramatically increases error degrees of freedom (df). The former approach, on the contrary, does not impose the common variance assumption, allowing different variances for different genes. The resulting model is then fitted gene by gene. So, we have a 2 x 2 factorial of approaches, indicated by models I–IV in Table 1).


View this table:
[in this window]
[in a new window]
 
Table 1 A summary of the five competing DE models and the estimated number of DE genes from each mode

 
In addition to these four models, we also employ Newton et al. (2004) SHMM (model V in Table 1; see Section 2). Further details concerning the fitting of the five different models and the comparisons are provided in Xiao et al. (unpublished data).

4.2.3 Results
Figure 5 displays a scatter plot matrix of –log10(p), where p either corresponds to the model I–IV unadjusted p-value for tests of DE or to the model V posterior probability for non-DE for mutants ceg1–250 [panel (a)] and spt4{Delta} [panel (b)]. Note that by relating model I–IV results to model V results we may seemingly be perpetuating the ‘severe pedagogical problem of misinterpreting p-values as posterior probabilities’ (Berger et al., 1997). However, this is not the case. At no stage do we make probabilistic statements in terms of these quantities. Rather, they simply constitute a quantification of DE. The large number of DE genes anticipated for mutant ceg1–250 is evident in Figure 5a (note the scales) and we observe high correlations between the five models. By contrast, minimal DE is anticipated for spt4{Delta} and results from the different models show this along with much lower agreement [panel (b)]. The fact that model V conforms more closely to the homoscedastic models (I and III) than to the heteroscedastic models (II and IV) is not surprising, since the SHMM utilizes information sharing between genes which is absent for the gene-specific heteroscedastic models.



View larger version (20K):
[in this window]
[in a new window]
 
Fig. 5 Pairwise scatter plots for the five models from a subset of the SPT experiment: (a) mutant ceg1–250 and (b) mutant spt4{Delta}. The correlation coefficients of –log10p between corresponding models are shown in the lower-triangle grids.

 
Here is a situation where there is no clear advantages of one model over the others. Therefore, rather than trying to arbitrate between models and pick a single model on which to base DE rankings and declarations, or informally distilling sets of genes that are DE under two or more models, we employ DEDS as a robust means for synthesizing results and compare its performance with individual models.

As the sample sizes are too small (4 versus 2 for the two-sample statistics) to employ an effective permutation scheme, we elect to use p-values for the calculation of DEDS distances. The observed distances are then calibrated against expected values under the referent null distribution, which is simulated by drawing from marginal uniform variates with correlation structure conforming to the observed data (Tibshirani et al., 2001). The algorithm is further motivated analogously to the mixture model approaches described in Section 2.1 but on the p-value scale, with non-DE corresponding to uniformity. Figure 6a and c shows histograms of the p-values from model I applied on ceg1-250 SJ indices and p-values from the Affymetrix spike-in experiment (see Section 4.1). The dashed lines are the frequency we would expect when all genes were null. As can be seen, there are many DE genes in (a) but minimal DE genes in (c). Further details of the algorithm are provided in Supplementary Data A.



View larger version (21K):
[in this window]
[in a new window]
 
Fig. 6 (a) Histograms and (b) QQ plot of p-values from the mutant ceg1–250 in the SPT experiment. (c) Histogram and (d) QQ plot of p-values from the Affymetrix spike-in experiment where all but 14 genes remain unchanged. When no gene is expected to be differentially expression, the distribution of p-values is uniform, shown as dashed line in all panels.

 
Critical to ascribing DE is appropriate specification of cutoffs. Table 1 compares the numbers of genes achieving significance under FDRs of 0.01 and 0.05 for the five models and DEDS for mutant ceg1–250. Immediately striking are the differences in numbers of genes declared DE by the different models. While this can be attributed in part to differing operating characteristics, it serves to showcase how sensitive results are to statistic choice.

Here, evaluating differences in DE determinations by the different models is problematic since we have no gold standard and, unlike the spike-in study, we do not know which genes are truly DE. However, as DEDS synthesizes over individual statistics, we believe its rankings of genes to be more ‘robust’ than single measures. To examine this, we have defined and compared the characteristics of the top DE genes for ceg1–250 SJ indices by DEDS and the five individual DE models. As mentioned, ceg1–250 is known to profoundly affect splicing and accordingly we treat the top 133 genes from each model as ‘significantly’ DE. While this number is somewhat arbitrary, the flavor of the results below is not overly sensitive to this specification. We then classify genes into three major groups. Group I consists of DE genes by DEDS. Group II contains non-DE genes by all six models. Group III is comprised of genes that are non-DE by DEDS but DE by one or more single measures and genes in these groups are further separated into different classes as illustrated in Figure 7a. To aid comparisons between genes of different groups, we display three-dimensional scatter plots between different models. Plotted on all axes are –log10p of the corresponding models. Group I genes, represented by black spots, illustrate good concordance among DE models; whereas group III genes, represented as colored numbers, lie mainly off diagonal, indicating that such genes are ranked higher in one measure than the others. Thus, by ascribing high rankings to genes that exhibit agreement on DE among different measures and low rankings to genes that demonstrate discord among related measures, DEDS arguably provides a more robust gene ranking.



View larger version (25K):
[in this window]
[in a new window]
 
Fig. 7 Relationship among differing DE models for the mutant ceg1-250 (SJ indices) in the SPT experiment. Black spots are DE genes by DEDS and gray spots are non-DE genes by all measures including DEDS. Numbers represent non-DE genes by DEDS but are DE genes by one or more of the five models. (a) Coding for the numbers in (b) and (c): 1 and 0 denote DE and non-DE respectively; (b) Scatter plot of –log10p values from DE models I, III and IV; (c) Scatter plot of –log10p values from DE models II, III and IV.

 

    5 DISCUSSION
 TOP
 Abstract
 1 INTRODUCTION
 2 EXISTING METHODS FOR...
 3 DIFFERENTIAL EXPRESSION VIA...
 4 ILLUSTRATIVE EXAMPLES
 5 DISCUSSION
 REFERENCES
 
In this paper, we have reviewed various statistical methods for the identification of DE genes in replicated microarray experiments. Additionally, we have advanced a novel method (DEDS) for this purpose. The DEDS algorithm synthesizes statistics or methods that estimate the same quantity of interest. The underlying principle behind DEDS is that genes that are highly ranked by different measures are more likely to be truly differentially expressed than genes that rank highly on a single measure.

Consider three widely used measures as an example: FC, t and SAM. The major limitations surrounding FC and t are the ‘equal denominator’ and ‘small denominator’ problems, respectively. Concerns surrounding SAM include criteria for, and accuracy of, estimates for the penalty parameter a. Another problem is that with tens of thousands of statistics calculated, there is frequently a set of genes possessing large statistics for one measure only, these arising by chance and/or because of shortcomings associated with the measures. Such genes are likely to be ‘false positives’. The advantage offered by DEDS over single measures is that, by combining over measures, such false positives are ranked lowly and become ‘true negatives’. Therefore, the set of DE genes obtained via DEDS tends to be robust against limitations associated with individual measures.

The intuition behind DEDS simply draws on the concept of intersection, i.e. it attempts to select genes that are ranked highly on all measures. However, there is a clear distinction between DEDS and a simple intersection of results from individual measures, which treats the measures as independent. To further illustrate such differences, we use an additional dataset of 6320 genes for which, due to the nature of the comparison groups, we anticipate substantial DE. The data derives from a study of cardiomyopathy in transgenic mice as influenced by overexpression of a G protein-coupled receptor, Ro1 (details are provided in Redfern et al., 2000, while details on fitting of the various DE measures are available in Supplementary Data B). Figure 8 displays a volcano plot between two measures: FC and t-statistics for the Ro1 data. The categorization of genes here is similar to Figure 7. Black points are the top 905 genes selected by DEDS, and the gray ones are those failing to be among the extremes (top 905) of any measures. The horizontal and vertical reference lines are the cutoff values estimated using t-statistic (|t| ≥ 2.98) and FC (|FC| ≥ 0.66), respectively. Classes 1 (red), 2 (cyan) and 3 (blue) are genes that are extreme in single measures, FC, t or SAM, only and their occurrences are possibly due to limitations associated with each measure. A simple intersection treats all measures as independent; thus, it will select only the genes from areas A and B. On the contrary, DEDS extends the intersection idea by considering all DE measures simultaneously, implicitly taking their correlation into account, thereby giving rise to a less-stringent criteria and possibly a lower false negative rate.



View larger version (20K):
[in this window]
[in a new window]
 
Fig. 8 Volcano plot between FC and |t| for the Ro1 experiment (C versus T). Black spots are DE genes by DEDS and gray spots are non-DE genes by all measures including DEDS. Colored numbers represent non-DE genes by DEDS but are DE genes by one or more of the three DE measures. Coding for the colored numbers is provided in the table; 1 and 0 denote DE and non-DE respectively.

 
Natural questions that arise in using DEDS are choices of component DE measures and the distance metric. For the former, we have found that synthesizing t, FC and a penalized statistic such as SAM, gives good performance for all datasets we have analyzed using DEDS. To investigate whether including highly correlated measures increases variation and so erode the efficiency of DEDS, we applied DEDS on the Affymetrix spike-in RMA data by synthesizing two, three or all four measures from t, FC, SAM and moderated t. We have observed stable performance of DEDS as the obtained ROC curves for all combinations were largely overlapping. With regard to choice of distance metric, we recommend using a distance scaled according to variation of the component statistics so that DEDS results are not dominated by a single measure. In addition, even though we have demonstrated good performance of DEDS in the assessment of DE, it should be noted that sufficient replication is still the key in obtaining power and stability in analysis, and DEDS is not a panacea for poorly replicated experiments.

Finally, the DEDS method proposed in this paper is implemented in an R package (Ihaka and Gentleman, 1996) DEDS, which may be downloaded from the Bioconductor website (http://www.bioconductor.org/).


    Acknowledgments
 
We are grateful to Professor Grant Hartzog and Todd Burkin at the University of California, Santa Cruz, who have provided us the microarray data of the SPT experiment and have participated in many valuable discussions related to the analysis of the SPT data. We also thank the anonymous reviewers whose comments improved the quality of the paper. Y.X. wishes to acknowledge Professor C. Anthony Hunt at the University of California, San Francisco for funding support.


    Footnotes
 
{dagger}These two authors contributed equally to this work. Back

Received on May 21, 2004; revised on July 2, 2004; accepted on July 7, 2004

    REFERENCES
 TOP
 Abstract
 1 INTRODUCTION
 2 EXISTING METHODS FOR...
 3 DIFFERENTIAL EXPRESSION VIA...
 4 ILLUSTRATIVE EXAMPLES
 5 DISCUSSION
 REFERENCES
 

    Allison, D.B., Gadbury, G.L., Heo, M., Fernández, J.R., Lee, C.-K., Prolla, T.A., Weindruch, R. (2002) A mixture model approach for the analysis of microarray gene expression data. Comput. Statist. Data Anal., 39, 1–20.

    Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.

    Berger, J.O., Boukai, B., Wang, Y. (1997) Unified frequentist and Bayesian testing of a precise hypothesis. Statist. Sci., 12, 133–148[CrossRef].

    (Eds.). DNA Microarrays: A Molecular Cloning Manual, (2003) , Cold Spring, NY Cold Spring Harbor Press.

    Clark, T.A., Sugnet, C.W., Ares, M. (2002) Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science, 296, , pp. 907–910[Abstract/Free Full Text].

    DeRisi, J., Penland, L., Brown, P.O., Bittner, M.L., Meltzer, P.S., Ray, M., Chen, Y., Su, Y.A., Trent, J.M. (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet., 14, 457–460[CrossRef][Web of Science][Medline].

    Dudoit, S., Shaffer, J.P., Boldrick, J.C. (2003) Multiple hypothesis testing in microarray experiments. Statist. Sci., 18, 71–103[CrossRef].

    Dudoit, S., Yang, Y.H., Callow, M.J., Speed, T.P. (2002) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statist. Sinica, 12, 111–139.

    Technical Report. Efron, B., Tibshirani, R., Goss, V., Chu, G. (2000) Microarrays and their use in a comparative experiment. , Stanford Department of Statistics, Stanford University.

    Efron, E., Tibshirani, R., Storey, J., Tusher, V. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Statist. Assoc., 96, 1151–1160[CrossRef][Web of Science].

    Fleury, G., Hero, A., Yoshida, S., Carter, T., Barlow, C., Swaroop, A. (2002) Pareto analysis for gene filtering in microarray experiments. European Signal Processing Conference .

    Fresco, L.D. and Buratowski, S. (1996) Conditional mutants of the yeast mRNA capping enzyme show that the cap enhances, but is not required for, mRNA splicing. RNA, 2, , pp. 584–596[Abstract].

    Ge, Y. and Dudoit, S. (2002) Multiple testing procedures. R package.

    Ghosh, D. (2003) Penalized discriminant methods for the classification of tumors from gene expression data. Biometrics, 59, 992–1000[CrossRef][Web of Science][Medline].

    Ghosh, D. (2004) Mixture models for assessing differential expression in complex tissues using microarray data. Bioinformatics, Epub ahead of print.

    Hartzog, G.A., Wada, T., Handa, H., Winston, F. (1998) Evidence that spt4, spt5, and spt6 control transcription elongation by RNA polymerase ii in Saccharomyces cerevisiae. Genes and Dev., 12, 357–369[Abstract/Free Full Text].

    Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A., Vingron, M. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 1, 1–9.

    Ihaka, R. and Gentleman, R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Statist., 5, 299–314[CrossRef].

    Irizarry, R.A., Bolstad, B.M., Collin, F., Cope, L., Hobbs, B., Speed, T.P. (2003) Summaries of Affymetrix genechip probe level data. Nucleic Acids Res., 31, e15[Abstract/Free Full Text].

    Jain, N., Thatte, J., Braciale, T., Ley, K., O'Connell, M., Lee, J.K. (2003) Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics, 19, 1945–1951[Abstract/Free Full Text].

    Jin, W., Riley, R.M., Wolfinger, R.D., White, K.P., Passador-Gurgel, G., Gibson, G. (2001) The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat. Genet., 29, 389–395[CrossRef][Web of Science][Medline].

    Kerr, M.K., Martin, M., Churchill, G.A. (2000) Analysis of variance for gene expression microarray data. J. Comput. Biol., 7, 819–837[CrossRef][Web of Science][Medline].

    Lee, M.-L.T., Kuo, F.C., Whitmore, G.A., Sklar, J. (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl Acad. Sci. USA, 97, 9834–9839[Abstract/Free Full Text].

    Lockhart, D.J., Dong, H.L., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol., 14, 1675–1680[CrossRef][Web of Science][Medline].

    Lönnstedt, I. and Speed, T.P. (2001) Replicated microarray data. Statist. Sinica, 12, 31–46.

    Newton, M.A., Kendziorski, C.M., Richmond, C.S., Blattner, F.R., Tsui, K.W. (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol., 8, 37–52[CrossRef][Web of Science][Medline].

    Newton, M.A., Noueiry, A., Sarkar, D., Ahlquist, P. (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method gene expression profiles. Biostatistics, 5, 155–176[Abstract].

    Pan, W., Lin, J., Le, C. (2002) How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol., 3, 0022.1–0022.10.

    Redfern, C.H., Degtyarev, M.Y., Kwa, A.T., Salomonis, N., Cotte, N., Nanevicz, T., Fidelman, N., Desai, K., Vranizan, K., Lee, E.K., et al. (2000) Conditional expression of a gi-coupled receptor causes ventricular conduction delay and a lethal cardiomyopathy. Proc. Natl Acad. Sci. USA, 97, 4826–4831[Abstract/Free Full Text].

    Rocke, D.M. and Durbin, B. (2001) A model for measurement error for gene expression arrays. J. Comput. Biol., 8, 557–570[CrossRef][Web of Science][Medline].

    (Ed.). Microarray Biochip Technology, (2000) , NATICK, MA Eaton.

    Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statist. Appl. Genet. Mol. Biol., 3, Article 3, http://www.bepress.com.sagmb.

    Speed, T.P. and Yang, Y.H. (2002) Direct and indirect hybridizations for cDNA microarray experiments. Sankhya Indian J. Statist. A, 64, , pp. 706–720.

    Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Statist. Soc. B, 64, 479–498[CrossRef].

    Technical Report 623. Storey, J.D., Taylor, J.E., Siegmund, D. (2002) A unified estimation approach to false discovery rates. , Berkeley, CA Department of Statistics, University of California.

    Tibshirani, R., Walther, G., Hastie, T. (2001) Estimating the number of clusters in a dataset via the gap statistic. J. R. Statist. Soc. B-Statist. Methodol., 63, 411–423[CrossRef].

    Tusher, V., Tibshirani, R., Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116[Abstract/Free Full Text].

    Westfall, P.H. and Young, S.S. Resampling-based Multiple Testing: Examples and Methods for p-value Adjustment, (1993) , New York John Wiley and Sons.

    Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., Paules, R.S. (2001) Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol., 8, , pp. 625–638[CrossRef][Web of Science][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
Y. Saeys, I. Inza, and P. Larranaga
A review of feature selection techniques in bioinformatics
Bioinformatics, October 1, 2007; 23(19): 2507 - 2517.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
X. Liu, M. Milo, N. D Lawrence, and M. Rattray
Probe-level measurement error improves accuracy in detecting differential gene expression
Bioinformatics, September 1, 2006; 22(17): 2107 - 2113.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Ploner, S. Calza, A. Gusnanto, and Y. Pawitan
Multidimensional local false discovery rate for microarray studies
Bioinformatics, March 1, 2006; 22(5): 556 - 565.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/7/1084    most recent
bti108v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (13)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Yang, Y. H.
Right arrow Articles by Segal, M. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Yang, Y. H.
Right arrow Articles by Segal, M. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?