Skip Navigation


Bioinformatics Advance Access originally published online on March 3, 2008
Bioinformatics 2008 24(8):1056-1062; doi:10.1093/bioinformatics/btn053
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/8/1056    most recent
btn053v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Xu, J.
Right arrow Articles by Cui, X.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Xu, J.
Right arrow Articles by Cui, X.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Robustified MANOVA with applications in detecting differentially expressed genes from oligonucleotide arrays

Jin Xu 1 and Xinping Cui 2,*

1Department of Statistics, East China Normal University, Shanghai 200241, China and 2Department of Statistics, University of California Riverside, Riverside 92521, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Oligonucleotide arrays such as Affymetrix GeneChips use multiple probes, or a probe set, to measure the abundance of mRNA of every gene of interest. Some analysis methods attempt to summarize the multiple observations into one single score before conducting further analysis such as detecting differentially expressed genes (DEG), clustering and classification. However, there is a risk of losing a significant amount of information and consequently reaching inaccurate or even incorrect conclusions during this data reduction.

Results: We developed a novel statistical method called robustified multivariate analysis of variance (MANOVA) based on the traditional MANOVA model and permutation test to detect DEG for both one-way and two-way cases. It can be extended to detect some special patterns of gene expression through profile analysis across k (≥2) populations. The method utilizes probe-level data and requires no assumptions about the distribution of the dataset. We also propose a method of estimating the null distribution using quantile normalization in contrast to the ‘pooling’ method (Section 3.1). Monte Carlo simulation and real data analysis are conducted to demonstrate the performance of the proposed method comparing with the ‘pooling’ method and the usual Analysis of Variance (ANOVA) test based on the summarized scores. It is found that the new method successfully detects DEG under desired false discovery rate and is more powerful than the competing method especially when the number of groups is small.

Availability: The package of robustified MANOVA can be downloaded from http://faculty.ucr.edu/~xpcui/software

Contact: xinping.cui{at}ucr.edu; jxu{at}stat.ecnu.edu.cn


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
High-density oligonucleotide arrays are revolutionizing biological research by making it possible to simultaneously study the expression of thousands of genes in one experiment. In this article, we focus on the Affymetrix GeneChip platform (Lockhart et al., 1996), which uses multiple (11–16) probes, or a probe set, to assay the expression level of every gene of interest.

Recently, some methods have been developed to summarize the multiple observations from one probe set into one single score, and then use it to do further statistical analysis such as detecting differentially expressed genes (DEG), clustering and classification. Some representative methods include MAS 5 (Hubbel et al., 2002), RMA (Irizarry et al., 2003a, b), dChip (Li and Wong, 2001), GCRMA (Wu et al., 2004) and PLIER (Affymetrix Tech Note, 2005) among others. Carefully designed experiments have been carried out to examine the performance of these different summarization methods (Choe et al., 2005; Cope et al., 2004). When different summarization methods were applied to the same datasets, the results of the statistical analysis showed marked differences (Choe et al., 2005). These studies conclude that none of the summarization method is superior to the others under all circumstances. The summarization process, by its nature, runs the risk of losing a significant amount of information (by compressing a set of measurements into one number) and may lead to inaccurate and incorrect conclusions (Dallas et al., 2005; Efron et al., 2001). We reason that a method which maintains all the probe-level information might achieve superior performance.

One primary goal of a microarray-based expression analysis experiment is to detect differentially expressed genes. The traditional statistical methods for this kind of analysis have been well established [e.g., Analysis of Variance (ANOVA), Multivariate Analysis of Variance (MANOVA) models] under the assumptions that the observations are normally distributed and the number of replicates is greater than the number of unknown parameters (Anderson, 1984). However, both assumptions, especially the latter, are usually violated in the microarray studies as one will typically have a handful of replicates but thousands of genes at one time.

We present here a new method called robustified MANOVA to detect DEG for both one-way and two-way cases (Section 3). Our method utilizes probe-level data directly and requires no assumptions about the distribution of the dataset. It is based on some new test statistics and a new method to estimate its null distribution through permutation and quantile normalization under multiple testing. The method was tested on various kinds of simulated datasets of different models, sample sizes, correlation structures, etc. and on some benchmark datasets as well. The results consistently show that the proposed method is superior to the competing methods in terms of false discovery rate (FDR) and power (Section 4). Some useful discussions and extension are given in Section 5.


    2 MATERIALS AND EXPERIMENTAL DESIGN
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
2.1 Background
Affymetrix GeneChips are designed to measure every gene with 11–16 probe pairs. Each probe pair consists of a perfect match (PM) probe matching 25-base long subsequence of a gene and a mismatch (MM) probe that is identical to the PM probe except that the middle (13th) base is converted to its complement according to WatsonCrick base pairing. The MM probe is designed to distinguish background noise from signal. In this article, we will use only PM values for analysis. The incorporation of MM probes will be discussed in Section 5.4.

2.2 Data
2.2.1 Spike-in data
The first data used in this article is publicly available from a Latin-square spike-in study performed on Affymetrix Hgu133a GeneChip. It provides a benchmark to evaluate different analysis tools since the true expression levels of the spike-in transcripts are known in advance. We briefly describe one of them in Table 1, where 30 probe sets (from about 27 000 probe sets and excluding 12 Affymetrix control probe sets) are spiked in with different amounts of synthesized mRNA. The concentration of mRNA are given in consecutive fold changes (i.e. 21, 22, ..., 210). The groups are defined based on different combinations of the spike-in amounts (Table 1). And every group has three replicated samples. We refer readers to Cope et al. (2004) or www.affymetrix.com/support/datasets.affx for a complete description.


View this table:
[in this window]
[in a new window]

 
Table 1. A brief description of the spike-in design performed on Affymetrix Hgu133a GeneChips

 
2.2.2 Estrogen data
The second data is partly from an experiment performed on Affymetrix Hgu95av2 GeneChip studying the estrogen effect over time on cells from an estrogen receptor positive (ER+) human breast cancer cell line (MCF–7). Two factors (estrogen and time) both have two levels (estrogen: absent or present, time: 10 h or 48 h). For every combination of factors, two samples were collected. The dataset is available from http://bioconductor.org/packages/2.0/data/experiment. See more details in Scholten et al. (2004).


    3 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
In this section, we present the robustified MANOVA method for both one-way and two-way analysis.

3.1 One-way robustified MANOVA
Let Formula ({alpha} = 1, ..., nj) be independent identically distributed (i.i.d.) samples from p-dimensional multivariate population Formula with mean vectors Formula and covariance {Sigma} (j = 1, ..., k). For example, in the microarray study, k represents the number of groups and p represents the number of probes. The general framework of one-way MANOVA tests the hypothesis


Formula 1

(1)
Analogous to the construction of Wilk's {Lambda} statistic (Johnson and Wichern, 2002), we first define a robust version of within-group error matrix and between-group error matrix, respectively, as follows,


Formula 2

(2)
where Formula , Formula Formula Formula . Note that the operator median is used in contrast to mean in the traditional normal theory and it is taken component-wise over the indices. It has to be pointed out that it is not true in general that Formula is equal to Formula due to the non-linearity property of median. The motivation of using median average is to retain the accuracy of the estimation in situations when only small number of replications are available and may contain various kinds of outliers, as often happens in microarray research. However, we are not able to formally adopt Wilk's Formula with the robustified Formula and Formula as they are both singular (not full rank) when k is less than p. So to compromise, we propose a new test statistic


Formula 3

(3)
in the spirit of F-statistic of one-way ANOVA model. It is noted that for G the two types of variation/error are simply suppressed to the sum of component variation/error, ignoring all correlation structure of the original random vectors, or equivalently treating all components as if they were uncorrelated. It is another version of Dempster test (Dempster, 1958) whose asymptotic distribution (as p goes to infinity) has been studied by Srivastava and Fujikoshi (2006). Here we consider p is fixed. (See more discussion in Section 5.1.)

The general distribution of G is unknown. Neither are the distributions of Formula . However in many cases we can assume the populations of Formula differ only by a location shift, i.e. Formula are of the same family of distribution. Then we can estimate the null distribution through permutation method (Good, 1994), which is an effective technique in non-parametric testing. The procedure is outlined as follows.

(S1) [Relabelling] From a pooled sample of size n (dropping all group labels), rearrange the labels to form k groups with sizes n1, ..., nk. Compute the test statistic for the relabelled groups.
(S2) [Repetition] Repeat S1 for M times of different relabelling, and denote the statistics as {G(s), s = 1, ..., M}. In case M is very large, a fixed number M*, say 1000, is used instead. See more discussion of M in Section 5.2.
(S3) [P-value] Determine the empirical P-value for the original statistic by comparing it to the permutation distribution, obtaining


Formula


The above are the typical steps of a permutation test for one subject or one gene in our data analysis. When testing the hypothesis for tens of thousands of genes simultaneously, i.e.


Formula 4

(4)
corrections have to be made to account for multiplicity and possible correlations among genes. For instance, in a two group comparison study with triplicates in each group, the total distinct permutation statistics for every gene is 10 (explained in Section 5.2), if we still use (S3) to determine the P-value, the error rate of rejection would be in a unit of 10%. It will greatly effect the precision of overall error rate measured, e.g. by FDR. Hence in the multiple testing scenario with a large number of tests/genes N, say 10 000, additional assumption has been made, such as the test statistics are independent or weakly dependent (Gao, 2006; Pan, 2003; Storey and Tibshirani, 2003). Under this condition, the statistics from different permutation are pooled together to form a null distribution. However, it has been reported to have possible power loss for this ‘pooling’ method (Xie et al., 2005). Here we propose a different approach to estimate the null distribution by applying quantile normalization to the distributions obtained from permutation. It is obtained by modifying (S3) of the previous procedure as follows. (See also Irizarry et al. (2003a); Tusher et al. (2001).)

(S3*) [Normalization and P-value] Denote the distribution of the test statistics for all genes under the sth permutation by Formula . Apply quantile normalization to all M such distributions to obtain a normalized distribution, denoted by Formula . More specifically, let Formula be the sorted Formula and let Formula be the sorted Formula , then Formula . The empirical P-value for the original Gg is determined by


Formula

At last, we use FDR criterion (Benjamini and Hochberg, 1995) to select the DEG based on the P-values obtained in (S3*). We show the performance of this new method against the pooling method in Section 4.

3.2 Two-way robustified MANOVA
Let Formula ({alpha} = 1, ..., nij) be i.i.d. samples from multivariate population Formula with mean vectors Formula and common covariance {Sigma} (i = 1, ..., a, j = 1, ..., b), where a and b are, respectively, the number of levels of factor A and factor B. The two-way MANOVA models the mean vector by


Formula

where Formula , Formula , Formula , Formula stand, respectively, for the common effect, the ith level effect of factor A, the jth level effect of factor B and the (i, j)th interaction effect. The hypothesis of interest is


Formula

We modify the error matrices of the traditional MANOVA analysis and define the robustified within-group error matrix, between-group error matrix and interaction error matrix, respectively, as follows.


Formula 5

(5)
where ni· = {sum}j nij, n·j = {sum}i nij, Formula , Formula Formula , Formula , Formula . Similar to the construction in previous one-way analysis, we propose three statistics for testing effects of factor A, factor B and their interaction, respectively, as follows,


Formula 6

(6)
The significance level under a single test or under multiple tests framework can be obtained through the permutation method as illustrated in the one-way analysis. The only differences lie on the modifications that (1) labels are rearranged within all a x b groups and (2) three permutation distributions are generated for Ga, Gb and Gab. And consequently the multiplicity correction for the overall significance is carried out for each factor separately.


    4 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
4.1 One-way robustified MANOVA
We first examine the performance of the proposed method through simulated data. We choose three models by letting the components of Formula be independent and identically distributed standard normal variable (N(0, 1)), uniform variable (U(0, 1)) and {chi}2 variable with degrees of freedom 4 (Formula ), respectively. Set the dimension p equal 10, the number of replicates for each group equal 4 (n1 = ... = nk = 4) and the total subjects N equal 10 000. And let group number k vary from 2 to 6. It is found that for all models FDR under null hypotheses is well achieved for both the proposed method which uses quantile normalization and the pooling method. Both methods literarily detect none significant subject at FDR level 0.01, as expected. And both methods perform equally well for cases of different dimensions of the random vector and unbalanced sample sizes (of k groups). Next we study the impact of correlation to the FDR. We impose correlation structure to about 30% of the simulated data simply by replacing Formula by Formula for g = 1, ..., [N/3], where [N/3] is the largest integer less than N/3. The proposed method also detects nearly none significant subjects, which is as good as the simulated data without correlation structure. [For mathematical rigor, see Benjamini and Yekutieli, 2001].

Secondly, we compare the power performance between the quantile normalization method and the pooling method. Out of N (10 000) total subjects, we add {delta} equal 3 SD of the component of Formula to 1% (100) subjects of the first group, i.e. replacing Formula by Formula for subject g = 1, ..., [N/100]. (For the three considered models, {delta} is about 3, 1 and 8, respectively.) The results are given in Table 2. It is seen that FDR is well controlled (near 0) by both methods however the quantile normalization method is more powerful and has a high sensitivity of detecting contaminated subjects especially for the two or three group cases. We also conducted the similar simulations for the unbalanced cases and found the similar results.


View this table:
[in this window]
[in a new window]

 
Table 2. Comparison of power and FDR between the quantile normalization method and the pooling method given the nominal FDR at 0.01

 
Thirdly, we apply our method on the spike-in benchmark data. There are 273 DEG (including 30 true DEG) detected significant with FDR at 0.001. This seemingly high false positive rate can be explained as we take a closer look at the expressions of the DEG that are not in the list of true DEG (e.g. Fig. 1a). We found that our method actually detected genes having one or more probes differentially expressed due to probable cross hybridization. For example, see two DEG in Figure 1b and c. If we define the test statistic to be


Formula

where median average is taken over the diagonal elements of the error matrices to measure the overall variation, then the cross hybridization effect on single probe or few probes will be eliminated, and we can expect an improved performance of detecting DEG with majority changes over all probes like the spike-in genes. In fact, it turns out that statistic Formula detected 51 DEG (including all 30 true DEG) at the same FDR level. Again, non-spike-in probe sets with expression profile similar as ‘207777_s_at’ are detected, so the FDR is actually well achieved if we take these genes into account. It is worth noting that the freedom of choosing a most suitable test statistic to the problem at hand is just another advantage of permutation test. Nevertheless, both statistics G and Formula show high sensitivity in that the top 30 DEG with the largest statistics actually capture 28 known spike-in genes.


Figure 1
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Group expression levels in median average for (a) a spike-in gene, (b) and (c) two non-spike-in genes detected due to possible cross hybridization effect. Group labels are added along with the expressions.

 
Lastly, using the same spike-in data, we compare the performance of robusitified MANOVA with the traditional ANOVA model based on the RMA summarization scores. For the sake of simplicity, we show the results of one test of the k-group test (4) with k = 3, ..., 14, respectively, in Table 3. More explicitly, for k = 3, we test the hypothesis Formula , for g = 1, ..., N. It is seen that the robustified MANOVA method perform much better in term of FDR (of overall 17.0%) than the ANOVA method based on the summarized scores (overall FDR 59.8%), especially when the group number increases. It is most likely due to the fact that summarization methods lose more and more information as the number of chips increases.


View this table:
[in this window]
[in a new window]

 
Table 3. Comparison of power and FDR between robustified MANOVA and ANOVA with RMA summarization given nominal FDR at 0.001

 
4.2 Two-way robustified MANOVA
Analogous to the one-way analysis, we first study the performance of the two-way robustified MANOVA on simulated data from the same three models. We generate the data in a simple 2 x 2 factorial design with four replicates in each combination of factors (with p = 10 and N = 10 000). The results show that FDR under null hypotheses is well achieved. And for the power performance, first we contaminate 1% of subjects by adding {delta} equal 3 SD of the component of Formula to the first level of the first factor, the proposed method precisely detected the contaminated subjects of the first factor at desired FDR level 0.01 through the corresponding test statistic Ga in (6), meanwhile detected none significant subjects in the second factor and interaction. Second, we contaminate 1% of subjects by adding {delta} to the first level of both factors. And our method detected the contaminated subjects for both factors at desired level. While the same testing statistics with pooling method yield a very low power and a low detection rate (e.g. only 0.1% of contaminated subject).

Next, we apply our method on the Estrogen data. At FDR 0.05, our method found all (12 387) genes are differentially expressed between 10 h and 48 h and detected 145 estrogen significant genes, which is consistent to the observations in Scholtens et al. (2004). Among the detected, there are four matches (with gene symbols ‘TFF1’, ‘CCND1’, ‘CTSD’, ‘GREB1’, respectively) to some primary ES target genes in the literature. For instance, the expression difference between estrogen levels of ‘present’ and ‘absent’ in both time levels are very clear as shown by gene ‘CCND1’ in the left panel of Figure 2. While other two well known estrogen genes of symbols ‘CYP1B1’ and ‘ERBB2’ are not detected in our analysis, as their expression are indistinguishable between ES present and ES absent in both time levels (the right panel of Figure 2).


Figure 2
View larger version (10K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Expression of gene ‘38418_at (CCND1)’ and gene ‘33218_at (ERBB2)’ at two time levels.

 
In contrast, we also used the factDesign package proposed by Scholtens et al. (2004) and the limma package proposed by Smyth (2004, 2005) to the estrogen data (packages available at www.bioconductor.org/packages/2.1/bioc/). Both methods were implemented on the normalized/summarized data preprocessed by RMA algorithm (Irizarry et al., 2003a, b). Setting FDR at 0.05, factDesign detected 17 estrogen significant genes among which eight are contained in our DEG list, while limma detected 733 estrogen significant genes which contain all 145 genes detected by our method. As we visually went through the expressions of the genes that were declared significant by factDesign alone (nine genes) or by limma alone (588 genes), we found many genes whose expressions are very similar between estrogen present and estrogen absent such as genes ‘34771_at’ (detected by factDesign) and ‘160031_at’ (detected by limma) in Figure 3. As a matter of fact, the average p-value obtained from the robustified MANOVA method in S3* for the 145 genes is less than 0.0001 and the average p-value for the 588 genes is 0.5667. Meanwhile, the average p-value obtained from limma (after FDR adjustment) for the 145 genes is 0.004386 and the average p-value for the 588 genes is 0.01886. Moreover, among the 588 genes, some were found to be estrogen down-regulated at estrogen present, which obviously contradicted to the fact. This implies that these two competing methods can actually result in inflated FDR than what is expected, though the limma package shows a larger power. But in many expensive biological study, it is crucial to control the false positive. In conclusion, given the primary goal of controlling the type–I error or FDR, the proposed method utilizes the full raw data directly and is able to achieve a much better precision, which benefits subsequent analysis.


Figure 3
View larger version (9K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Expressions of genes ‘34771_at’ and ‘160031_at’ at two time levels.

 

    5 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
5.1 Assumptions
The only assumption needed for a single test (1) of one subject/gene is the common covariance of expressions across all k groups. This is based on the rationale that the correlation structure among probes is inherited from the relative locations on the original gene sequence and hence remains unchanged in different groups or conditions. As for the case of multiple testing, the homogeneity of covariance of all subject/genes may not be true. We further assume that under the null hypothesis the statistics {Gg, g = 1, ..., N} or Formula are identically but not necessarily independently distributed, so that we can apply quantile normalization to estimate the null distribution. However, it is not a restricted assumption because the effect of heterogenous covariance is minimized by the following two facts, (1) G (or Formula ) is invariant to a scaler multiplier, i.e.


Formula

[In fact, G with the usual W and B is invariant under a larger group of transformation (Srivastava and Fujikoshi, 2006).] (2) the operations trace and ratio are robust to off-diagonal elements of the covariance and eventually pick up the information of overall between-group variation with respect to within-group variation. We demonstrate the performance of the quantile normalization estimation for the distribution of G in the simulated one-way analysis in Figure 4, where the distribution of the original statistics {Gg, g = 1, ..., N} is given in Figure 4a and the estimated null distribution Formula is given in Figure 4b. We can see through the Q-Q plot Figure 4d that the majority of their quantiles matched very well along a straight line as reflected as well by their distributions in histograms (Figure 4b and c), and that 1% contaminated subjects stand out clearly in the upper right tail. (Here we set N = 1000.) Similar behavior of performance can be seen for the other two statistics testing the second factor and the interaction (Figures not shown).


Figure 4
View larger version (13K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. (a) histogram of original statistics {Gg, g = 1, ..., N}, (b) histogram of estimated null distribution Figure 4, (c) histogram of original statistics {Gg} in the same range as Figure 4 and (d) Q-Q plot of {Gg} vs Figure 4.

 
5.2 Number of distinct permutations
To relabel a pooled sample of size n and form k groups with groups sizes n1, ..., nk, there are totally Formula possible outcomes of combinations. However, it needs to be adjusted according to the test statistic in order to get the number of distinct permutations because if the statistic is invariant to the permutation of groups with same size, like G or Ga, Gb, Gab in our case, many combinations of partition/grouping shall result in identical statistic. For instance, given k = 3 and n1 = n2 = n3 = 3, partitions {g1 = {1, 2, 3}, g2 = {4, 5, 6}, g3 = {7, 8, 9}} and {g1 = {4, 5, 6}, g2 = {1, 2, 3}, g3 = {7, 8, 9}} lead to the same statistic. The repetition factor can be expressed by


Formula

Therefore, the final number of distinct permutations with respect to the statistic is M = M0/A. For the example above, M = 9!/3!3!3!/3! = 280. This explains that for a simple two-group comparison with triplicates in each group, the distinct number of permutations is actually as few as M = 6!/3!3!/2! = 10. It is a point that has been sometimes ignored.

5.3 Null distribution and correlated genes
It is known that some genes are correlated. This questions the validity of the assumption of independence for the estimation of the null distribution required by some parametric approaches. For example, one may assume that the test statistics are independent and identically distributed across all genes. However, our quantile normalization of the permutation distributions takes account of all genes, thereby preserving all possible relationships among genes and making no assumption of independence of gene expressions.

5.4 MM probes
Obviously the Affymetrix mismatch probes do not miss all the signals, as has been reported elsewhere (Hubbell et al., 2002). Strategies for the incorporation of information provided by MM probes are beyond the scope of this article. However, the data here indicate that the PM probes provide adequate basis for detecting DEG with robustified MANOVA.

5.5 Extension
As a direct application of one-way MANOVA model, one can perform profile analysis to detect certain patterns of the mean vectors or gene expression profiles that are characterized by the lines connecting the points (1, µ1), (2, µ2), ..., (p, µp) (Johnson and Wichern, 2002). The patterns of interest are (1) parallel across k groups (PR), (2) parallel and of same level across k groups (LE), and (3) parallel and flat across k groups (FL), as shown in Figure 5, and the corresponding hypotheses are formulated as


Formula 7

(7)
where Formula and C(p – 1) x p is a contrast matrix of full row rank such that C1 = 0. The traditional methods under normal theory does not work well because of the singularity of the error matrix mentioned before and lack of normality. However, based on the previous section, we only need to modify the non-parametric testing method through some suitable transformation given in Table 4. For example, the null hypothesis for parallelism Formula is equivalent to Formula with respect to Formula , thus reduces to a standard one-way MANOVA problem in (1), and the rest of the procedure follows exactly the same as in Section 3.1.


Figure 5
View larger version (3K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Three patterns of profile.

 

View this table:
[in this window]
[in a new window]

 
Table 4 Transformations made for profile analysis

 
We carry out the profile analysis on the benmark dataset. The results consist of two parts. First, the expression of most genes are found to be parallel and of the same level across all 14 groups as expected. Second, we found 79 genes including all 30 spike-in genes at FDR 0.001 whose expressions are not parallel. This in turn substantiates the observation that expressions of 11 probes respond differently to the same amount of mRNA concentration, therefore it is more reasonable to treat them as a vector rather than to summarize them into a single score.

Finally, it is noted that although the detection method is motivated for arrays with multiple probes for one gene, it works for arrays with a single probe for each gene as well. We will evaluate its performance in a separate work.


    6 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
A new statistical method, called robustified MANOVA, has been developed to detect DEG at the probe level of oligonucleotide arrays to avoid potential information loss during data reduction. Our method is robust and requires minimum assumptions of the distribution of the data. We have shown our method consistently outperforms other competing methods based on summarization scores in terms of power and FDR under multiple testing. Implementation of our algorithm will result in more precise discovery of DEG, which benefits subsequent analysis such as clustering and classification.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
The research of J.X. is supported by University of California Riverside grants IC A0179519900 and AES–CE RSAP A01869 [GenBank] and partly supported by Science and Technology Commission of Shanghai Municipality Pujiang Project 07PJ14037. The research of X.C. is supported by NSF DBI 0646024.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Trey Ideker

Received on December 3, 2007; revised on December 3, 2007; accepted on February 4, 2008

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND EXPERIMENTAL...
 3 METHODS
 4 RESULTS
 5 DISCUSSION
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Affymetrix (2005). Guide to Probe Logarithmic Intensity Error (PLIER) Estimation. Affymetrix Whitepaper (2005) Available: http://www.affymetrix.com/support/technical/technotes/plier_technote.pdf.

    Anderson TW. An Introduction to Multivariate Analysis. (1984) 2nd. New York: John Wiley & Sons.

    Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (1995) 57:289–300.

    Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. (2001) 29:1165–1188.[CrossRef]

    Choe S, et al. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol. (2005) 6:R16.[CrossRef][Medline]

    Cope LM, et al. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics (2004) 20:323–331.[Abstract/Free Full Text]

    Dallas PB, et al. Gene expression levels assessed by oligonucleotide microarray analysis and quantitative real-time RT-PCR – how well do they correlate? BMC Genomics (2005) 6:59.[CrossRef][Medline]

    Dempster AP. A high dimensional two sample significance test. Ann. Math. Statist. (1958) 29:995–1010.[CrossRef]

    Efron B, et al. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. (2001) 96:1151–1160.[CrossRef][Web of Science]

    Gao X. Construction of null statistics in permutation-based multiple testing for multi-factorial microarray experiments. Bioinformatics (2006) 22:1486–1494.[Abstract/Free Full Text]

    Good P. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. (1994) New York: Springer-Verlag.

    Hubbell E, et al. Robust estimators for expression analysis. Bioinformatics (2002) 18:1585–1592.[Abstract/Free Full Text]

    Irizarry RA, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (2003a) 4:249–264.[Abstract]

    Irizarry RA, et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res (2003b) 31:e15.[Abstract/Free Full Text]

    Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. (2002) 5th. New York: Prentice Hall.

    Li C, Wong W. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl Acad. Sci. USA (2001) 98:31–36.[Abstract/Free Full Text]

    Lockhart DJ, et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. (1996) 14:1675–1680.[CrossRef][Web of Science][Medline]

    Pan W. On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics (2003) 19:1333–1340.[Abstract/Free Full Text]

    Scholtens D, et al. Analyzing factorial designed microarray experiments. J. Multivar. Anal. (2004) 90:19–43.[CrossRef]

    Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. (2004) 3(No.1). Article 3.

    Smyth GK. Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor—Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W, eds. (2005) New York: Springer.

    Srivastava M, Fujikoshi Y. Multivariate analysis of variance with fewer observations than the dimension. J. Multivar. Anal. (2006) 97:1927–1940.[CrossRef]

    Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.[Abstract/Free Full Text]

    Tusher VG, et al. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA (2001) 96:5116–5121.

    Wu Z, et al. A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. (2004) 99:909–917.[CrossRef][Web of Science]

    Xie Y, et al. A note on using permutation based false discoveray rate estimate to compare different analysis methods for microarray data. Bioinformatics (2005) 21:4280–4288.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/8/1056    most recent
btn053v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Xu, J.
Right arrow Articles by Cui, X.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Xu, J.
Right arrow Articles by Cui, X.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?