Bioinformatics Advance Access originally published online on March 7, 2007
Bioinformatics 2007 23(10):1217-1224; doi:10.1093/bioinformatics/btm081
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Pooling mRNA in microarray experiments and its effect on power
1Department of Statistics, 2Department of Animal Science and 3Center for Integrated Animal Genomics, Iowa State University, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Microarrays can simultaneously measure the expression levels of many genes and are widely applied to study complex biological problems at the genetic level. To contain costs, instead of obtaining a microarray on each individual, mRNA from several subjects can be first pooled and then measured with a single array. mRNA pooling is also necessary when there is not enough mRNA from each subject. Several studies have investigated the impact of pooling mRNA on inferences about gene expression, but have typically modeled the process of pooling as if it occurred in some transformed scale. This assumption is unrealistic.
Results: We propose modeling the gene expression levels in a pool as a weighted average of mRNA expression of all individuals in the pool on the original measurement scale, where the weights correspond to individual sample contributions to the pool. Based on these improved statistical models, we develop the appropriate F statistics to test for differentially expressed genes. We present formulae to calculate the power of various statistical tests under different strategies for pooling mRNA and compare resulting power estimates to those that would be obtained by following the approach proposed by Kendziorski et al. (2003). We find that the Kendziorski estimate tends to exceed true power and that the estimate we propose, while somewhat conservative, is less biased. We argue that it is possible to design a study that includes mRNA pooling at a significantly reduced cost but with little loss of information.
Contact: alicia{at}iastate.edu
| 1 INTRODUCTION |
|---|
|
|
|---|
Microarray experiments are widely used to measure the expression levels of tens of thousands of genes simultaneously under different experimental conditions or during different time periods. One of the major interests in microarray experiments is to identify genes which are differentially expressed between conditions or time periods, and enable a deeper knowledge of complex biological problems at the genetic level. However, the unit cost of microarrays continues to be high; even for a moderate number of subjects, cost can be significant. One option to control costs is to pool the mRNA of a group of individuals and then run microarrays on the pools rather than on each individual. Pooling mRNA may also be required when there is not enough mRNA from each subject to hybridize individual microarrays.
The effect and efficiency of pooling mRNA in microarray experiments have been investigated by several researchers. Kendziorski et al. (2003) showed that pooling is most advantageous when biological variability (variability across subjects) in expression level is larger than technical variability (variability introduced in the experimental process). They also derived a formula for the total number of arrays and individuals required in an experiment involving mRNA pools to obtain gene expression estimates and confidence intervals comparable to those that would be obtained when analyzing individual arrays. They concluded that by increasing the total number of individuals in the experiment, it was possible to maintain precision of estimates by pooling, while decreasing the total number of arrays. Shih et al. (2004) also discussed the impact of pooling mRNA on the power of statistical tests. They derived an expression to carry out power calculations for a given number of arrays and individuals. Further, they used expression data obtained from mice to check the adequacy of the assumption that mRNA expression levels in the pool are close to the average expression levels of individuals in the pool. They showed that the assumption does not hold, especially when the signals are high.
Both Kendziorski et al. (2003) and Shih et al. (2004) derived their results on the transformed scale, that is, after the data were normalized and signal intensity was transformed. Thus, both studies assumed that the mRNA expression in the pool is the average expression of individual samples, and applied the assumption on the transformed scale. This assumption is, however, not realistic in a biological sense because in the laboratory, the mRNA is extracted from samples and then mixed to form an mRNA pool. Therefore, pooling occurs on the original scale and an assumption that holds on the transformed scale may not hold before transformation.
In this article, we address the issue of testing for differences in gene expression across treatments. We assume that the expression level in a pool is approximately equal to the average expression of individuals in the pool on the original scale. More precisely, we assume that the expression level in a pool is a weighted average expression of individual samples on the original scale, where weights correspond to the proportional contribution of each individual to the pool. By including the weights in the average, we account for the fact that in the process of combining individual mRNA samples, the mixing proportions may not be identical and thus, that the pool may contain more mRNA from some individuals than from others.
Under the assumptions above, we use a single gene as an example and derive expressions to calculate power under different treatment effect sizes, number of mRNA pools, number of individuals per mRNA pool and number of repeated measurements per pool. We wish to understand how much power is lost by pooling mRNA. We also wish to find efficient experimental designs for pooling mRNA samples, while keeping costs down.
| 2 SYSTEMS AND METHODS |
|---|
|
|
|---|
2.1 Notation and model
Microarray gene expression measurements tend to be right skewed and thus not normally distributed. Therefore, data are usually transformed and normalized before statistical analysis. The most common transformation is the log transformation (Geller et al., 2003). The transformed data can then be modeled as a linear function of treatment effects and one or more normally distributed random effects (Han et al., 2004; Lu 2004; Shih et al., 2004). The sources of variation in a microarray experiment are multiple and can be generally classified into two groups: biological variation and technical variation (Chen et al., 2004; Kendziorski et al., 2003). Biological variation is subject-to-subject variation in gene expression and is due to subject-specific genetic or environmental factors. Technical variation arises from the errors that can potentially be introduced at each of multiple steps in a microarray experiment. These include RNA sample preparation, microarray construction, hybridization and washing procedures, and signal detection methods. Here, we focus on the two major categories: biological variation and technical variation.
The expression levels of tens of thousands of genes are measured simultaneously in a microarray experiment. For simplicity of notation, a single gene is considered in the following derivation and analysis. The true gene expression level of a gene on the jth individual in the ith treatment is denoted mij and can be modeled on the log scale as
|
| (1) |
ij is biological error which is assumed to be independently, identically distributed as |
| (2) |
ij is technical error which is also assumed to be independently, identically distributed as
Suppose now that the mRNA from n subjects from the same treatment is combined to form a pool. We use
to denote true expression for a gene in the ith treatment group and jth pool. We assume that the true expression level of a gene in the mRNA pool is a weighted average of the true expression levels of the gene in all individuals in the same mRNA pool (denoted mij1, ..., mijn) so that
|
| (3) |
|
|
|
| (4) |
ijl is technical error as defined earlier and l = 1,2,... R. Here, R is the number of replicated array measurements per pool and R*P = A, where A is the total number of arrays per treatment group. P = A if each mRNA pool is measured only once. Note that model (4) for the transformed observed mRNA level in a pool is similar to model (2) formulated for observed mRNA in an individual array on the transformed scale.
2.2 Expectation and variance of ![]()
The distribution of
is analytically intractable, but simulations and goodness-of-fit testing show that it can be adequately approximated by a normal distribution (see additional discussion of this point in Section 4). We will use
and
to denote the mean and variance of this normal distribution. We can then write
|
| (5) |
|
| (6) |
ij is assumed to be independent and identically distributed
We are interested in testing for differences in gene expression across treatments of the form µ i–µ j. In this subsection, we will derive expressions for
and
and show that
, so that the tests of interest can be conducted using data from pools.
To derive an expression for
, we expand
using a Taylor series to obtain
|
| (7) |
|
| (8) |
|
| (9) |
|
| (10) |
|
|
|
| (11) |
To obtain an approximation for
, we use the Delta method to obtain
|
| (12) |
|
| (13) |
ij in model (6) is an error term attributable to biological variation in expression level and to the additional variability that is introduced when pooling mRNA samples.
2.3 Power in a design that includes pooling mRNA
One interesting finding is that
(see Expression 11). Therefore, the hypothesis for testing
in the design that includes pooling is equivalent to the hypothesis for testing µ 1 = µ 2 = ... = µ T in a design that involves individual microarrays. The corresponding F-test statistic for the design with pooling is given by
|
| (14) |
The null hypothesis of no treatment differences is rejected at significance level
if the F statistic is larger than FT–1, T*(P–1),
, where Fdf1,df2,
is the (1–
)*100th percentile of a central F distribution with df1,df2 degrees of freedom.
If the type I error is controlled at level
, power of the test is given by
|
| (15) |
2, where
|
| (16) |
For a more general test of hypothesis for a linear combination of the means Cµ p = d, where
with C a known matrix of constants with full rank r, power is calculated as
|
| (17) |
|
| (18) |
For notational simplicity, we have assumed balance in the number of pools per treatment (P), the number of individuals per pool (n) and the number of arrays per pool (R). The F statistic and power formulas presented in subsection can be generalized in a straightforward manner to account for differing numbers of pools per treatment. Differing numbers of individuals per pool and/or differing numbers of arrays per pool, however, create problems with the standard assumption of homogeneity of variance. We have shown that the variance of
is non-trivial function of the number of individuals per pool (n), the biological variance (
), variance associated with varying contributions of samples to each pool (
), additional technical variance (
), and the number of arrays per pool (R). If n and/or R are not constant for all pools, the variance of
will not be the same for all values of i and j, and the tests and power calculations presented in this subsection will not be valid. Furthermore, if n and/or R vary across pools, the resulting heterogeneity of variance cannot be easily dealt with by weighting the observations because optimal weights will depend on unknown variance components
,
, and
. Thus, varying numbers of individuals per pool and/or varying numbers of arrays per pool should be avoided.
| 3 DISCUSSION |
|---|
|
|
|---|
3.1 Comparing estimates of power
The power approximations presented in the previous section are based on an assumption of normality and Delta-method approximations. To estimate the impact of these approximations, we simulated data and calculated power numerically and using the analytical expressions derived in Section 2.3. We also compared power from simulation with power calculated analytically under the Kendziorski model (Kendziorski et al., 2003).
We simulated individual data under two different scenarios. For the first scenario, we fixed the number of treatment groups at three (T = 3) and the number of individuals per treatment group at 100 (N = 100). The mean expression difference between adjacent treatment groups on the log scale (µ 1–µ 2 = µ 2–µ 3) was assumed to be 0.2, 0.3, 0.4 or 0.5. Biological and technical variances were fixed at 0.75 and 0.25 (
), respectively. Pooled data under our model were simulated as a weighted average of five or three individuals (weight variation was
) on the original scale. Therefore, n = 5 and we considered 20 pools per treatment (P = 20). For the second scenario, we simulated less individuals and less pools with N = 15, n = 3 and P = 5 while keeping all the other parameters the same. For each scenario, a one-way ANOVA model was fitted to the simulated pooled data to test whether the mean expression level was different across treatment groups. We compared the power of the tests at
= 0.05. Results are presented in Table 1. Power calculated by simulation was based on 10 000 replicates of the experiment. The entries in the column labeled Analytical power were calculated under two different models: the proposed model (Expression 6) and the Kendziorski model (Kendziorski et al., 2003).
|
In both scenarios, the predicted power as computed using the approach proposed by Kendziorski et al. (2003) appears to be overly optimistic in that it consistently exceeds power calculated from simulation. This may be because their approach does not account for the additional variance introduced in the pooling step and because they assume that mRNA samples can be pooled on the log scale directly. If we set
3.2 The effect of repeated measurements on power
For a given set of experimental conditions, biological, technical and weight variation in the pooled data are often fixed. Therefore, the power of the test for a given set of conditions depends on the number of pools, the number of repeated measurements per pool and the number of individuals per pool. Consider the following example: suppose that there are three treatment groups (T = 3) and 100 individuals per treatment (N = 100), and let the mean expression difference between any two adjacent treatment groups on the log scale be 0.5 (
), which represents a 1.65-fold difference on the original scale. Suppose that total variation is equal to 1, biological variance is three times as large as technical variance (
and
), and technical SD in the pooling step is 5% of the standardized mean (
). Then, for a fixed number of arrays per treatment (A = 5,10,15,20), the effect of obtaining repeated measurements on each pool on power is shown in Figure 1. We computed power analytically using Expressions (15) and (16) with any R and P values that match the equation R*P = A. Note, however, that R and P will always have integer values in an actual experiment. Power decreases as the number of repeated measurements per sample increases for fixed numbers of individuals and arrays. Therefore, when the number of subjects is fixed and the number of arrays is limited, a more efficient strategy is to create multiple pools and measure each once rather than to create fewer pools and measure each multiple times. This is consistent with findings in Kendziorski et al. (2003). In the remainder, we assume that each pool is measured once (R = 1,P = A).
|
3.3 The effect of the number of mRNA pools on power
Figure 2 shows power that is computed using Expressions (15) and (16) when different numbers of pools are created under various mean expression differences between adjacent treatments (
|
3.4 The effect of biological, technical and weight variability on power
From the results presented in Section 2.3, we know that power depends on
|
The additional technical variation introduced in the pooling step does not appear to affect power much (Fig. 4), even if the pooling technical variance is rather high (
|
Samples of mRNA from individuals are sometimes pooled in microarray experiments, either because the biological material available from each individual is not sufficient to array or to keep costs down. It is to be expected that statistical tests to detect differences in mean gene expression levels across treatments will be affected when they are based on pools of mRNA samples rather than on individual samples, since some information is bound to be lost. In particular, the power of F-tests in the usual ANOVA models is expected to decrease when the experimental design involves pooling of individual samples.
Several authors have investigated the statistical properties of F-tests based on pooled mRNA samples (Kendziorski et al., 2003 and Shih et al., 2004). One limitation in these studies is that the statistical models adopted imply that the mRNA samples are pooled on the log scale, which is unrealistic. We investigated the power of F-tests in ANOVA models when mRNA samples are pooled, but extended the models so that the pooling process is carried out on the original scale. In our formulation, mRNA pools are weighted averages of individual mRNA samples and consider the measurement error that is introduced when pooling potentially different amounts of mRNA from individuals into a pool. We argue that when pooling is assumed to occur on the log scale, the power of the tests is overestimated and propose an approach to calculate power under the more realistic scenario of pooling on the original scale.
It is not possible to derive an analytical expression for the distribution of pooled gene expression on the log scale. Therefore, we assume that gene expression on the log scale is normally distributed. To check this assumption, we conducted simulation studies and found that, at least for the range common to microarray data, the normality assumption appears to be reasonable. Our focus is on deriving expressions to calculate the power of F-tests to detect mean gene expression difference across treatments in designs that involve pooling. Because the F-test is robust to modest departures from normality (Mendes and Pala, 2004), we anticipate that assuming a normal model for the gene log-expression values will not have a noticeable effect on our results. We show that the power estimated using the approach we propose here is conservative in that it tends to slightly underestimate true power; therefore, true power is at least as high as the estimates resulting from implementing the method we propose.
As might be expected, the power of the tests depends not only on the size of the treatment effect but also on the total number of individuals and pools, the number of pools per treatment, the number of replicated measurements obtained for each pool and the magnitude of biological and technical variability. For the technical variability, we distinguish the usual variance introduced in the various steps of microarray experiments and the variability that is introduced during the pooling process, resulting from the possibly differential contributions of individual samples to the pool.
We used simulated gene expression data to compare the power of F-tests that can be achieved when analyzing individual mRNA samples and under various pooling designs. We computed power analytically and also via simulation, and compared results to those that would be obtained by implementing the approach proposed by Kendziorski et al. (2003). We found that given a fixed number of individuals and arrays, power tends to be higher when a larger number of pools is measured once than when replicate measurements are obtained on a smaller number of pools. This holds for all values of the biological, technical and pooling variabilities considered in our study. Not surprisingly, we also found that power of tests based on individual samples is always higher than power based on pooled samples. For large enough effect sizes, however, it is possible to design an experiment that involves pooling mRNA samples that almost achieves the power that would be obtained when arraying individual samples, but at a fraction of the cost. Thus, our results suggest that under some conditions, pooling mRNA samples in microarray experiments can be a good strategy if cost is a consideration.
One of the important features of our model is that it attempts to mimic the pooling process as it happens in the lab. We assume that the mRNA pool is a weighted average of expression levels from individual mRNA samples, where the weights are random variables with mean 1/n. Because the log is a non-linear transformation, the log of this weighted average will be different from a weighted average of log-transformed individual expression levels. Based on (10), we have
|
|
2,
One other interesting finding is that after log transformation and assuming of normality, the expected mean expression difference in a design that involves pooling is the same as in a design without pooling, i.e.
. Thus, a test for the hypothesis that
is equivalent to the test µ i = µ j. This property might not hold under other transformations, however.
We have focused on power calculation under designs that pool or do not pool mRNA when testing expression differences for a single gene. In microarray experiments, tests involve tens of thousands of genes and the biological variation may differ from gene to gene. Therefore, pool designs required to reach a certain power may be different between genes due to differences in biological variation across genes. Thus, finding a single efficient design for pooling mRNA which results in the desired power for all the genes in the microarray experiment might be a challenge.
Jung (2005) proposed a microarray sample size calculation method for the two-group comparison problem that allows researchers to determine the sample size necessary to identify approximately r1 differentially expressed genes while controlling the false discovery rate (FDR) at a desired level f. This basic concept is easily extended to the T-treatment case and designs that involve pooling. Borrowing notation and concepts from Jung (2005), we have that the individual significance level necessary for identification of approximately r1 differentially expressed genes while controlling FDR at level f is
|
| (19) |
*, we seek values of n and P such that
|
| (20) |
g(
*) denotes the power of the
* level test for gene g that can be approximated using (15) and (16). Based on the results of our simulation study in Section 3.1, the use of our power approximation in (22) should suggest sample sizes that are at least as large as necessary to identify the desired number of differentially expressed genes. On the other hand, we would expect the method of Kendziorski et al. (2003) to recommend sample sizes that are smaller than those actually needed to achieve the specified level of discovery. As in any power and sample size calculation, users are required to provide the values of unknown parameters like m0 and—separately for each gene—the variance and mean parameters necessary for computing (15) and (16). While it is conceivable to estimate such parameters from pilot microarray experiments, a discussion of such strategies is beyond the scope of this article. As a starting point in the absence of pilot data, researchers may wish to assume that effect sizes and variance components are identical for all differentially expressed genes, in which case (22) can be simplified as discussed by Jung (2005).
Whether to pool individuals and how to pool them to minimize the loss of information are important issues in microarray experiments. For a fixed total number of individuals and arrays, a design that includes mRNA pools always leads to smaller power than a design in which each array corresponds to an individual. Under some conditions, however, the loss of power is small, and it is possible to find a low-cost design which almost achieves the power that can be obtained when arraying each individual.
| APPENDIX |
|---|
|
|
|---|
Expression (8) is derived as follows:
|
|
|
|
|
|
x and g'(µ x) denotes the vector of partial derivatives of g evaluated at µx. If we let |
|
|
|
| ACKNOWLEDGEMENT |
|---|
|
|
|---|
This study was funded by the Center for Integrated Animal Genomics at Iowa State University.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Charlie Hodgman
Received on October 2, 2006; revised on February 26, 2007; accepted on February 28, 2007
| REFERENCES |
|---|
|
|
|---|
Chen J. Analysis of variance components in gene expression data. Bioinformatics (2004) 20:1436–1446.
Geller S, et al. Transformation and normalization of oligonucleotide microarray data. Bioinformatics (2003) 19:1817–1823.
Han E, et al. Reproducibility, sources of variability, pooling, and sample size: important considerations for the design of high-density oligonucleotide array experiments. J. Gerontol. Biol. Sci (2004) 4:306–315.
Jung S-H. Sample size for FDR control in microarray data analysis. Bioinformatics (2005) 21:3097–3104.
Kendziorski CM, et al. The efficiency of pooling mRNA in microarray experiments. Biostatistics (2003) 4:465–477.[Abstract]
Kendziorski CM, et al. On the utility of pooling biological samples in microarray experiments. Proc. Natl Acad. Sci. USA (2005) 102:4252–4257.
Lu C. Improving the scaling normalization for high-density oligonucleotide GeneChip expression microarrays. BMC Bioinformatics (2004) 5:103–108.[CrossRef][Medline]
Mendes M, Pala L. Evaluation of four tests when normality and homogeneity of variance assumptions are violated. J. App. Sci (2004) 4:38–42.
Shih JH. Effects of pooling mRNA in microarray class comparisons. Bioinformatics (2004) 20:3318–3325.
This article has been cited by other articles:
![]() |
L. E. McNamara, M. J. Dalby, M. O. Riehle, and R. Burchmore Fluorescence two-dimensional difference gel electrophoresis for biomaterial applications J R Soc Interface, July 1, 2009; (2009) rsif.2009.0177.focusv1. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||












