Bioinformatics Advance Access originally published online on March 3, 2008
Bioinformatics 2008 24(8):1056-1062; doi:10.1093/bioinformatics/btn053
Robustified MANOVA with applications in detecting differentially expressed genes from oligonucleotide arrays
1Department of Statistics, East China Normal University, Shanghai 200241, China and 2Department of Statistics, University of California Riverside, Riverside 92521, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Oligonucleotide arrays such as Affymetrix GeneChips use multiple probes, or a probe set, to measure the abundance of mRNA of every gene of interest. Some analysis methods attempt to summarize the multiple observations into one single score before conducting further analysis such as detecting differentially expressed genes (DEG), clustering and classification. However, there is a risk of losing a significant amount of information and consequently reaching inaccurate or even incorrect conclusions during this data reduction.
Results: We developed a novel statistical method called robustified multivariate analysis of variance (MANOVA) based on the traditional MANOVA model and permutation test to detect DEG for both one-way and two-way cases. It can be extended to detect some special patterns of gene expression through profile analysis across k (
2) populations. The method utilizes probe-level data and requires no assumptions about the distribution of the dataset. We also propose a method of estimating the null distribution using quantile normalization in contrast to the pooling method (Section 3.1). Monte Carlo simulation and real data analysis are conducted to demonstrate the performance of the proposed method comparing with the pooling method and the usual Analysis of Variance (ANOVA) test based on the summarized scores. It is found that the new method successfully detects DEG under desired false discovery rate and is more powerful than the competing method especially when the number of groups is small.
Availability: The package of robustified MANOVA can be downloaded from http://faculty.ucr.edu/~xpcui/software
Contact: xinping.cui{at}ucr.edu; jxu{at}stat.ecnu.edu.cn
| 1 INTRODUCTION |
|---|
|
|
|---|
High-density oligonucleotide arrays are revolutionizing biological research by making it possible to simultaneously study the expression of thousands of genes in one experiment. In this article, we focus on the Affymetrix GeneChip platform (Lockhart et al., 1996), which uses multiple (11–16) probes, or a probe set, to assay the expression level of every gene of interest.
Recently, some methods have been developed to summarize the multiple observations from one probe set into one single score, and then use it to do further statistical analysis such as detecting differentially expressed genes (DEG), clustering and classification. Some representative methods include MAS 5 (Hubbel et al., 2002), RMA (Irizarry et al., 2003a, b), dChip (Li and Wong, 2001), GCRMA (Wu et al., 2004) and PLIER (Affymetrix Tech Note, 2005) among others. Carefully designed experiments have been carried out to examine the performance of these different summarization methods (Choe et al., 2005; Cope et al., 2004). When different summarization methods were applied to the same datasets, the results of the statistical analysis showed marked differences (Choe et al., 2005). These studies conclude that none of the summarization method is superior to the others under all circumstances. The summarization process, by its nature, runs the risk of losing a significant amount of information (by compressing a set of measurements into one number) and may lead to inaccurate and incorrect conclusions (Dallas et al., 2005; Efron et al., 2001). We reason that a method which maintains all the probe-level information might achieve superior performance.
One primary goal of a microarray-based expression analysis experiment is to detect differentially expressed genes. The traditional statistical methods for this kind of analysis have been well established [e.g., Analysis of Variance (ANOVA), Multivariate Analysis of Variance (MANOVA) models] under the assumptions that the observations are normally distributed and the number of replicates is greater than the number of unknown parameters (Anderson, 1984). However, both assumptions, especially the latter, are usually violated in the microarray studies as one will typically have a handful of replicates but thousands of genes at one time.
We present here a new method called robustified MANOVA to detect DEG for both one-way and two-way cases (Section 3). Our method utilizes probe-level data directly and requires no assumptions about the distribution of the dataset. It is based on some new test statistics and a new method to estimate its null distribution through permutation and quantile normalization under multiple testing. The method was tested on various kinds of simulated datasets of different models, sample sizes, correlation structures, etc. and on some benchmark datasets as well. The results consistently show that the proposed method is superior to the competing methods in terms of false discovery rate (FDR) and power (Section 4). Some useful discussions and extension are given in Section 5.
| 2 MATERIALS AND EXPERIMENTAL DESIGN |
|---|
|
|
|---|
2.1 Background
Affymetrix GeneChips are designed to measure every gene with 11–16 probe pairs. Each probe pair consists of a perfect match (PM) probe matching 25-base long subsequence of a gene and a mismatch (MM) probe that is identical to the PM probe except that the middle (13th) base is converted to its complement according to WatsonCrick base pairing. The MM probe is designed to distinguish background noise from signal. In this article, we will use only PM values for analysis. The incorporation of MM probes will be discussed in Section 5.4.
2.2 Data
2.2.1 Spike-in data
The first data used in this article is publicly available from a Latin-square spike-in study performed on Affymetrix Hgu133a GeneChip. It provides a benchmark to evaluate different analysis tools since the true expression levels of the spike-in transcripts are known in advance. We briefly describe one of them in Table 1, where 30 probe sets (from about 27 000 probe sets and excluding 12 Affymetrix control probe sets) are spiked in with different amounts of synthesized mRNA. The concentration of mRNA are given in consecutive fold changes (i.e. 21, 22, ..., 210). The groups are defined based on different combinations of the spike-in amounts (Table 1). And every group has three replicated samples. We refer readers to Cope et al. (2004) or www.affymetrix.com/support/datasets.affx for a complete description.
|
2.2.2 Estrogen data
The second data is partly from an experiment performed on Affymetrix Hgu95av2 GeneChip studying the estrogen effect over time on cells from an estrogen receptor positive (ER+) human breast cancer cell line (MCF–7). Two factors (estrogen and time) both have two levels (estrogen: absent or present, time: 10 h or 48 h). For every combination of factors, two samples were collected. The dataset is available from http://bioconductor.org/packages/2.0/data/experiment. See more details in Scholten et al. (2004).
| 3 METHODS |
|---|
|
|
|---|
In this section, we present the robustified MANOVA method for both one-way and two-way analysis.
3.1 One-way robustified MANOVA
Let
(
= 1, ..., nj) be independent identically distributed (i.i.d.) samples from p-dimensional multivariate population
with mean vectors
and covariance
(j = 1, ..., k). For example, in the microarray study, k represents the number of groups and p represents the number of probes. The general framework of one-way MANOVA tests the hypothesis
|
| (1) |
statistic (Johnson and Wichern, 2002), we first define a robust version of within-group error matrix and between-group error matrix, respectively, as follows,
|
| (2) |
|
| (3) |
The general distribution of G is unknown. Neither are the distributions of
. However in many cases we can assume the populations of
differ only by a location shift, i.e.
are of the same family of distribution. Then we can estimate the null distribution through permutation method (Good, 1994), which is an effective technique in non-parametric testing. The procedure is outlined as follows.
- (S1) [Relabelling] From a pooled sample of size n (dropping all group labels), rearrange the labels to form k groups with sizes n1, ..., nk. Compute the test statistic for the relabelled groups.
- (S2) [Repetition] Repeat S1 for M times of different relabelling, and denote the statistics as {G(s), s = 1, ..., M}. In case M is very large, a fixed number M*, say 1000, is used instead. See more discussion of M in Section 5.2.
- (S3) [P-value] Determine the empirical P-value for the original statistic by comparing it to the permutation distribution, obtaining

- (S2) [Repetition] Repeat S1 for M times of different relabelling, and denote the statistics as {G(s), s = 1, ..., M}. In case M is very large, a fixed number M*, say 1000, is used instead. See more discussion of M in Section 5.2.
The above are the typical steps of a permutation test for one subject or one gene in our data analysis. When testing the hypothesis for tens of thousands of genes simultaneously, i.e.
|
| (4) |
(S3*) [Normalization and P-value] Denote the distribution of the test statistics for all genes under the sth permutation by
. Apply quantile normalization to all M such distributions to obtain a normalized distribution, denoted by
. More specifically, let
be the sorted
and let
be the sorted
, then
. The empirical P-value for the original Gg is determined by
|
|
3.2 Two-way robustified MANOVA
Let
(
= 1, ..., nij) be i.i.d. samples from multivariate population
with mean vectors
and common covariance
(i = 1, ..., a, j = 1, ..., b), where a and b are, respectively, the number of levels of factor A and factor B. The two-way MANOVA models the mean vector by
|
|
|
|
We modify the error matrices of the traditional MANOVA analysis and define the robustified within-group error matrix, between-group error matrix and interaction error matrix, respectively, as follows.
|
| (5) |
j nij, n·j =
i nij, |
| (6) |
| 4 RESULTS |
|---|
|
|
|---|
4.1 One-way robustified MANOVA
We first examine the performance of the proposed method through simulated data. We choose three models by letting the components of
2 variable with degrees of freedom 4 (
Secondly, we compare the power performance between the quantile normalization method and the pooling method. Out of N (10 000) total subjects, we add
equal 3 SD of the component of
to 1% (100) subjects of the first group, i.e. replacing
by
for subject g = 1, ..., [N/100]. (For the three considered models,
is about 3, 1 and 8, respectively.) The results are given in Table 2. It is seen that FDR is well controlled (near 0) by both methods however the quantile normalization method is more powerful and has a high sensitivity of detecting contaminated subjects especially for the two or three group cases. We also conducted the similar simulations for the unbalanced cases and found the similar results.
|
Thirdly, we apply our method on the spike-in benchmark data. There are 273 DEG (including 30 true DEG) detected significant with FDR at 0.001. This seemingly high false positive rate can be explained as we take a closer look at the expressions of the DEG that are not in the list of true DEG (e.g. Fig. 1a). We found that our method actually detected genes having one or more probes differentially expressed due to probable cross hybridization. For example, see two DEG in Figure 1b and c. If we define the test statistic to be
|
|
|
Lastly, using the same spike-in data, we compare the performance of robusitified MANOVA with the traditional ANOVA model based on the RMA summarization scores. For the sake of simplicity, we show the results of one test of the k-group test (4) with k = 3, ..., 14, respectively, in Table 3. More explicitly, for k = 3, we test the hypothesis
|
4.2 Two-way robustified MANOVA
Analogous to the one-way analysis, we first study the performance of the two-way robustified MANOVA on simulated data from the same three models. We generate the data in a simple 2 x 2 factorial design with four replicates in each combination of factors (with p = 10 and N = 10 000). The results show that FDR under null hypotheses is well achieved. And for the power performance, first we contaminate 1% of subjects by adding
equal 3 SD of the component of
to the first level of both factors. And our method detected the contaminated subjects for both factors at desired level. While the same testing statistics with pooling method yield a very low power and a low detection rate (e.g. only 0.1% of contaminated subject). Next, we apply our method on the Estrogen data. At FDR 0.05, our method found all (12 387) genes are differentially expressed between 10 h and 48 h and detected 145 estrogen significant genes, which is consistent to the observations in Scholtens et al. (2004). Among the detected, there are four matches (with gene symbols TFF1, CCND1, CTSD, GREB1, respectively) to some primary ES target genes in the literature. For instance, the expression difference between estrogen levels of present and absent in both time levels are very clear as shown by gene CCND1 in the left panel of Figure 2. While other two well known estrogen genes of symbols CYP1B1 and ERBB2 are not detected in our analysis, as their expression are indistinguishable between ES present and ES absent in both time levels (the right panel of Figure 2).
|
In contrast, we also used the factDesign package proposed by Scholtens et al. (2004) and the limma package proposed by Smyth (2004, 2005) to the estrogen data (packages available at www.bioconductor.org/packages/2.1/bioc/). Both methods were implemented on the normalized/summarized data preprocessed by RMA algorithm (Irizarry et al., 2003a, b). Setting FDR at 0.05, factDesign detected 17 estrogen significant genes among which eight are contained in our DEG list, while limma detected 733 estrogen significant genes which contain all 145 genes detected by our method. As we visually went through the expressions of the genes that were declared significant by factDesign alone (nine genes) or by limma alone (588 genes), we found many genes whose expressions are very similar between estrogen present and estrogen absent such as genes 34771_at (detected by factDesign) and 160031_at (detected by limma) in Figure 3. As a matter of fact, the average p-value obtained from the robustified MANOVA method in S3* for the 145 genes is less than 0.0001 and the average p-value for the 588 genes is 0.5667. Meanwhile, the average p-value obtained from limma (after FDR adjustment) for the 145 genes is 0.004386 and the average p-value for the 588 genes is 0.01886. Moreover, among the 588 genes, some were found to be estrogen down-regulated at estrogen present, which obviously contradicted to the fact. This implies that these two competing methods can actually result in inflated FDR than what is expected, though the limma package shows a larger power. But in many expensive biological study, it is crucial to control the false positive. In conclusion, given the primary goal of controlling the type–I error or FDR, the proposed method utilizes the full raw data directly and is able to achieve a much better precision, which benefits subsequent analysis.
|
| 5 DISCUSSION |
|---|
|
|
|---|
5.1 Assumptions
The only assumption needed for a single test (1) of one subject/gene is the common covariance of expressions across all k groups. This is based on the rationale that the correlation structure among probes is inherited from the relative locations on the original gene sequence and hence remains unchanged in different groups or conditions. As for the case of multiple testing, the homogeneity of covariance of all subject/genes may not be true. We further assume that under the null hypothesis the statistics {Gg, g = 1, ..., N} or
|
|
|
5.2 Number of distinct permutations
To relabel a pooled sample of size n and form k groups with groups sizes n1, ..., nk, there are totally
|
|
5.3 Null distribution and correlated genes
It is known that some genes are correlated. This questions the validity of the assumption of independence for the estimation of the null distribution required by some parametric approaches. For example, one may assume that the test statistics are independent and identically distributed across all genes. However, our quantile normalization of the permutation distributions takes account of all genes, thereby preserving all possible relationships among genes and making no assumption of independence of gene expressions.
5.4 MM probes
Obviously the Affymetrix mismatch probes do not miss all the signals, as has been reported elsewhere (Hubbell et al., 2002). Strategies for the incorporation of information provided by MM probes are beyond the scope of this article. However, the data here indicate that the PM probes provide adequate basis for detecting DEG with robustified MANOVA.
5.5 Extension
As a direct application of one-way MANOVA model, one can perform profile analysis to detect certain patterns of the mean vectors or gene expression profiles that are characterized by the lines connecting the points (1, µ1), (2, µ2), ..., (p, µp) (Johnson and Wichern, 2002). The patterns of interest are (1) parallel across k groups (PR), (2) parallel and of same level across k groups (LE), and (3) parallel and flat across k groups (FL), as shown in Figure 5, and the corresponding hypotheses are formulated as
|
| (7) |
|
|
We carry out the profile analysis on the benmark dataset. The results consist of two parts. First, the expression of most genes are found to be parallel and of the same level across all 14 groups as expected. Second, we found 79 genes including all 30 spike-in genes at FDR 0.001 whose expressions are not parallel. This in turn substantiates the observation that expressions of 11 probes respond differently to the same amount of mRNA concentration, therefore it is more reasonable to treat them as a vector rather than to summarize them into a single score.
Finally, it is noted that although the detection method is motivated for arrays with multiple probes for one gene, it works for arrays with a single probe for each gene as well. We will evaluate its performance in a separate work.
| 6 CONCLUSION |
|---|
|
|
|---|
A new statistical method, called robustified MANOVA, has been developed to detect DEG at the probe level of oligonucleotide arrays to avoid potential information loss during data reduction. Our method is robust and requires minimum assumptions of the distribution of the data. We have shown our method consistently outperforms other competing methods based on summarization scores in terms of power and FDR under multiple testing. Implementation of our algorithm will result in more precise discovery of DEG, which benefits subsequent analysis such as clustering and classification.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
The research of J.X. is supported by University of California Riverside grants IC A0179519900 and AES–CE RSAP A01869 [GenBank] and partly supported by Science and Technology Commission of Shanghai Municipality Pujiang Project 07PJ14037. The research of X.C. is supported by NSF DBI 0646024.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Trey Ideker
Received on December 3, 2007; revised on December 3, 2007; accepted on February 4, 2008
| REFERENCES |
|---|
|
|
|---|
Affymetrix (2005). Guide to Probe Logarithmic Intensity Error (PLIER) Estimation. Affymetrix Whitepaper (2005) Available: http://www.affymetrix.com/support/technical/technotes/plier_technote.pdf.
Anderson TW. An Introduction to Multivariate Analysis. (1984) 2nd. New York: John Wiley & Sons.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (1995) 57:289–300.
Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. (2001) 29:1165–1188.[CrossRef]
Choe S, et al. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol. (2005) 6:R16.[CrossRef][Medline]
Cope LM, et al. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics (2004) 20:323–331.
Dallas PB, et al. Gene expression levels assessed by oligonucleotide microarray analysis and quantitative real-time RT-PCR – how well do they correlate? BMC Genomics (2005) 6:59.[CrossRef][Medline]
Dempster AP. A high dimensional two sample significance test. Ann. Math. Statist. (1958) 29:995–1010.[CrossRef]
Efron B, et al. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. (2001) 96:1151–1160.[CrossRef][Web of Science]
Gao X. Construction of null statistics in permutation-based multiple testing for multi-factorial microarray experiments. Bioinformatics (2006) 22:1486–1494.
Good P. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. (1994) New York: Springer-Verlag.
Hubbell E, et al. Robust estimators for expression analysis. Bioinformatics (2002) 18:1585–1592.
Irizarry RA, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (2003a) 4:249–264.[Abstract]
Irizarry RA, et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res (2003b) 31:e15.
Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. (2002) 5th. New York: Prentice Hall.
Li C, Wong W. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl Acad. Sci. USA (2001) 98:31–36.
Lockhart DJ, et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. (1996) 14:1675–1680.[CrossRef][Web of Science][Medline]
Pan W. On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics (2003) 19:1333–1340.
Scholtens D, et al. Analyzing factorial designed microarray experiments. J. Multivar. Anal. (2004) 90:19–43.[CrossRef]
Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. (2004) 3(No.1). Article 3.
Smyth GK. Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor—Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W, eds. (2005) New York: Springer.
Srivastava M, Fujikoshi Y. Multivariate analysis of variance with fewer observations than the dimension. J. Multivar. Anal. (2006) 97:1927–1940.[CrossRef]
Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.
Tusher VG, et al. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA (2001) 96:5116–5121.
Wu Z, et al. A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. (2004) 99:909–917.[CrossRef][Web of Science]
Xie Y, et al. A note on using permutation based false discoveray rate estimate to compare different analysis methods for microarray data. Bioinformatics (2005) 21:4280–4288.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||






