Bioinformatics Advance Access originally published online on April 7, 2005
Bioinformatics 2005 21(12):2803-2804; doi:10.1093/bioinformatics/bti428
Comment on "Evaluation of the gene-specific dye bias in cDNA microarray experiments"
Biometric Research Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
*To whom correspondence should be addressed.
In their paper in Bioinformatics, Martin-Magniette et al. (2005) recommend complete dye-swap¹ designs for both direct and indirect dual label microarray experiments. These recommendations contradict our previous recommendations (Dobbin et al., 2003) for designing experiments, where we suggested minimizing or eliminating the use of dye-swap arrays. We show here that the recommendations of Martin-Magniette et al. are fundamentally flawed, and that in most realistic situations performing extensive dye-swap arrays results in a poor experimental design.
The key error made by these authors is that they focus on over-simplified situations in which only two RNA samples are being compared. There are two problems with this approach. First, if the goal is really just to compare gene expression in two RNA samples, then obviously the best design will be to place aliquots from both samples together on each array and label each sample with each dye half the time. So there really is no design question. The second and more serious problem with this approach, however, is that comparing gene expression in two RNA samples is almost never the goal of a microarray experiment. The goal is almost always to draw conclusions that are applicable beyond the particular RNA samples being studied, and this requires independent replication (Simon et al., 2002). Without independent experimental replication, either independent biological samples or independent replications of the entire experiment, depending on the context, one cannot make statistical inferences that apply beyond the RNA samples used. For example, in an experiment to evaluate the effect of different conditions on cell line gene expression, one must perform independent replicates of the experiment, in which multiple, different cell line cultures are grown under each condition. Similarly, one cannot draw valid conclusions about differential expression in two populations of mice from an experiment that involves just two mice. One needs multiple independent mice from each population to capture the biological variation in the populations.
When multiple independent replicates from different conditions or populations are used in an experiment, the equation Martin-Magniette et al. have derived, based on the model of Kerr et al. (2002), is no longer valid. Their model equation² for the log-ratios expresses $Z_{ig}$, the normalized log-ratio for gene $g$ on array $i$, as the sum of the variety effect $(VG)_{1g} - (VG)_{2g}$, the gene-specific dye bias and the error term $F_{ig}$. The reason the model is not valid is that it contains a single term, variety, which represents both a sample and a condition or population. But samples are different from conditions or populations, so terms need to be added to the model to distinguish between the two, as indicated in Dobbin and Simon (2002). When such terms are added to the model, so that samples are conceptually separated from conditions or populations, the impact of taking multiple subsamples from the same batch of RNA (technical replication) becomes different from the impact of performing biologically independent replicates of the experiment. Without introducing additional terms into the model, technical replication is indistinguishable from biologically independent replication. If we let variety represent condition or population, then a term for sample effects needs to be added to the model. Let $S(v)$ indicate a sample from condition or population $v$. Then the model of Martin-Magniette et al. (2005) needs to be changed to:
[Equation not reproduced in this copy] (1)
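As a rough sketch only, a model of the kind described here, with a sample-within-condition term added to the variety, dye bias and error terms, might be written as follows. The symbol $(DG)$ for the gene-specific dye bias and the subscripting of the $(SG)$ sample terms are assumptions made for illustration and are not taken from the original equation.

```latex
% Hypothetical notation: (DG) for dye bias and (SG)_{s(v)g} for the effect of
% sample s drawn from condition or population v are assumed, not quoted.
\begin{equation*}
Z_{ig} = (VG)_{1g} - (VG)_{2g}
       + (SG)_{s(1)g} - (SG)_{s(2)g}
       + (DG)_{1g} - (DG)_{2g}
       + F_{ig}
\end{equation*}
```

Under this reading, technical replicates of one RNA batch share the same $(SG)$ values, whereas biologically independent replicates draw new ones, which is the distinction the text relies on.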
Martin-Magniette et al. (2005) recommend dye swapping every array in a reference³ design. For class comparison experiments, there are situations in which a reference design may be reasonable, although we have shown that balanced block designs,⁴ which do not use a reference, are more efficient (Dobbin and Simon, 2002). The motivation Martin-Magniette et al. present for recommending that reference designs always dye-swap every array is the existence of a three-way sample-by-dye-by-gene interaction, which they hypothesize exists based on a previous study by Dombkowski et al. (2004). In the over-simplified model Martin-Magniette et al. use, with just two samples and no distinction between samples on the one hand and conditions or populations on the other, the three-way interaction term introduces bias into comparisons in a reference design. This bias arises because the three-way interaction term can be viewed as sample-specific dye bias; since the model fails to distinguish between samples and conditions/populations, bias related to the sample automatically becomes bias related to the condition/population. But for class comparison experiments, which allow for statistical inference beyond the particular samples studied, and which require a more sophisticated model like the one in Equation (1), dye-swap arrays are not required to remove the bias. Indeed, as we will show, a complete dye-swap reference design is clearly inferior to a reference design in this situation.
For a fixed number of arrays, a complete dye-swap reference design involves half as many independent samples as a reference design. So, dye-swapping every array in a reference design halves the effective sample size. Is such a radical reduction in sample size justified by the existence of the three-way interaction term? The answer is no. To see this, add the three-way interaction terms, $(DGS)_{dgs}$, to the model of Equation (1), and let condition $v = 0$ represent the reference sample on each array,
[Equation not reproduced in this copy] (2)
so that $E[Z_{g(1)0} - Z_{g(2)0}] = (VG)_{1g} - (VG)_{2g} + (DGS)_{dg(1)} - (DGS)_{dg(2)}$, where $(DGS)_{dg(v)}$ is the average of the interaction effects over samples from condition or population $v$. (Note that the individual SG sample effects will cancel out of the expected value, so we have omitted them.) If a random effects model is used for the three-way interaction, then $(DGS)_{dgs} \sim N(\mu, \sigma^2)$, $E[Z_{g(1)0} - Z_{g(2)0}] = (VG)_{1g} - (VG)_{2g}$, and the reference design yields unbiased estimates of the class difference. Alternatively, if fixed effects are used for the interaction term, then under the usual model constraints, required for model identifiability, $\sum_{s \in v} (DGS)_{dgs} - (DGS)_{dg0} = 0$ for $v = 1, 2$, yielding $E[Z_{g(1)0} - Z_{g(2)0}] = (VG)_{1g} - (VG)_{2g}$, and the reference design estimates are unbiased. So, under both a fixed-effects and a random-effects model for the interaction term, the reference design yields unbiased estimates of the class distinction without any dye-swaps. Moreover, the reference design will be more efficient than the complete dye-swap reference design. Intuitively, the reason for the improved efficiency of the reference design is that it allows twice as many samples to be used in the same number of arrays. A more detailed proof of the efficiency advantage appears in Dobbin et al. (2003). In conclusion, a reference design provides unbiased and more efficient estimates of differential gene expression than a complete dye-swap reference design for class comparison experiments.
Now we turn to designs that do not involve a reference sample. In this case also, Martin-Magniette et al. (2005) recommend a dye-swap design, but it is unclear whether by this they mean a complete dye-swap design, which dye swaps every array, or not. The motivation for recommending dye-swapping arrays in this case is somewhat different from that in the reference design case, but it is still based on the same flawed model. Their motivation is to remove the two-way dye-by-gene interaction, which we have shown can be done without dye-swapping arrays (Dobbin et al., 2003). When one properly distinguishes between samples and conditions/populations, as in our Equation (1), one finds that dye-swapping is much less efficient than independent replication of the experiment with the labeling reversed (such as in a balanced block design). And, using arguments analogous to the reference design situation above, even if sample-by-gene-by-dye interaction terms are present, dye-swapping individual arrays is not necessary to remove the bias from the class comparisons. So, systematically dye-swapping individual arrays in a non-reference design is inadvisable when the goal is class comparison.
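The reference-design comparison made earlier in this section can be illustrated with a small Monte Carlo sketch for a single gene. Everything in it is an assumption chosen for illustration rather than the authors' analysis: the function names, the parameter values, the use of normally distributed sample, interaction and error terms, and the simplification that log-ratios are expressed as class sample minus reference with independent interaction draws on the forward and swapped arrays.

```python
import random
import statistics


def class_mean(rng, n_samples, effect, swap, dye_bias, mu_inter,
               sd_sample, sd_inter, sd_err):
    """Mean log-ratio (class sample minus reference) over one class."""
    values = []
    for _ in range(n_samples):
        sample = rng.gauss(0.0, sd_sample)  # biological sample effect
        # Forward array: class sample on dye 1, reference on dye 2.
        fwd = (effect + sample + dye_bias
               + rng.gauss(mu_inter, sd_inter)   # sample-by-dye interaction
               + rng.gauss(0.0, sd_err))         # array-level error
        if swap:
            # Second array from the same RNA with the dyes reversed: the dye
            # bias changes sign, while the sample effect is shared.
            bwd = (effect + sample - dye_bias
                   + rng.gauss(mu_inter, sd_inter)
                   + rng.gauss(0.0, sd_err))
            values.append((fwd + bwd) / 2.0)
        else:
            values.append(fwd)
    return sum(values) / len(values)


def simulate(n_arrays=16, n_reps=5000, delta=1.0, dye_bias=0.3,
             mu_inter=0.2, sd_sample=1.0, sd_inter=0.5, sd_err=0.5, seed=0):
    """Bias and variance of the class-difference estimate for both designs."""
    rng = random.Random(seed)
    noise = dict(dye_bias=dye_bias, mu_inter=mu_inter, sd_sample=sd_sample,
                 sd_inter=sd_inter, sd_err=sd_err)
    # Reference design: one array per sample. Complete dye-swap reference
    # design: two arrays per sample, so half as many samples per class.
    designs = (("reference", False, n_arrays // 2),
               ("dye_swap", True, n_arrays // 4))
    results = {}
    for name, swap, n_per_class in designs:
        estimates = [
            class_mean(rng, n_per_class, delta, swap, **noise)
            - class_mean(rng, n_per_class, 0.0, swap, **noise)
            for _ in range(n_reps)
        ]
        results[name] = {
            "bias": statistics.mean(estimates) - delta,
            "var": statistics.pvariance(estimates),
        }
    return results


if __name__ == "__main__":
    for design, stats in simulate().items():
        print(design, stats)
```

With these illustrative settings and 16 arrays, the variance of the class-difference estimate works out analytically to 2(1 + 0.25 + 0.25)/8 = 0.375 for the reference design and 2(1 + 0.25)/4 = 0.625 for the complete dye-swap reference design, and both estimates are unbiased, so the simulated values should land close to those figures.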
Finally, while we have shown that neither the existence of interactions between gene and dye, nor the existence of interactions between gene and dye and sample, justifies systematically dye-swapping individual arrays, one might wonder if interactions between gene and dye and population/condition would change the situation. These interaction terms would appear as $(DGV)_{dgv} - (DGV)_{dg0}$ in Equation (2). Such an interaction term has not to our knowledge been empirically evaluated. But, for the sake of argument, suppose it did exist. In the case of non-reference designs for class comparison, the bias would cancel out of comparisons between the populations/conditions in a balanced block design, so this design would remain optimal. No dye-swaps would be required. Hence, even under this fairly unlikely scenario, dye-swapping is not a good idea.
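The cancellation in a balanced block design can be sketched for a single gene as follows. This is an illustration under stated assumptions, not a derivation taken from the original: the gene index is suppressed, $(DG)$ is an assumed symbol for the dye-by-gene term, the superscripts A and B are hypothetical labels for the two labeling orientations, and the interaction is assumed to satisfy sum-to-zero constraints over each of its two indices (dye, then class).

```latex
% Log-ratio = dye 1 channel minus dye 2 channel.
% Type A arrays: class 1 on dye 1, class 2 on dye 2.
% Type B arrays: class 2 on dye 1, class 1 on dye 2.
\begin{align*}
E[Z^{A}] &= (VG)_{1} - (VG)_{2} + (DG)_{1} - (DG)_{2} + (DGV)_{11} - (DGV)_{22},\\
E[Z^{B}] &= (VG)_{2} - (VG)_{1} + (DG)_{1} - (DG)_{2} + (DGV)_{12} - (DGV)_{21}.
\end{align*}
% Sum-to-zero constraints give
% (DGV)_{11} = -(DGV)_{21} = (DGV)_{22} = -(DGV)_{12},
% so both (DGV) differences vanish, and the (DG) terms cancel in the contrast:
\begin{equation*}
\tfrac{1}{2}\, E\!\left[\bar{Z}^{A} - \bar{Z}^{B}\right] = (VG)_{1} - (VG)_{2}.
\end{equation*}
```

Each biologically independent sample appears on only one array in this sketch, so the contrast is free of both the dye bias and the assumed dye-by-class interaction without any array being dye-swapped.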
In conclusion, the findings of Martin-Magniette et al. (2005) must be carefully interpreted within their very limited context, and in practice dye-swap arrays should be used sparingly if at all, particularly in class comparison experiments.
Footnotes
1. An individual array is dye-swapped when, for each of the original batches of RNA which were tagged with Cy3 and Cy5, RNA is drawn from the same two batches and labeled in the opposite way as on the original microarray, and the two labeled samples are hybridized to a second array. When every array in an experiment is dye-swapped, this is called a complete dye-swap design.
2. Here we follow the notation of Martin-Magniette et al. (2005). A simpler reformulation of the model is presented in the supplemental material.
3. A dual-label reference design experiment is an experiment that includes the same reference sample on each array, tagged with the same dye.
4. A balanced block design for two classes pairs a sample from one class with a sample from the other class on each array, balancing the labels used for each class but using each biologically independent sample only once. Balanced block designs generalize to multiple classes, and have a long history in statistical literature (see, for example, Cochran and Cox, 1992).
Received on February 16, 2005; revised on April 1, 2005; accepted on April 1, 2005
REFERENCES
Cochran, W.G. and Cox, G.M. (1992) Experimental Designs, 2nd edn. John Wiley and Sons, New York, NY.
Dobbin, K. and Simon, R. (2002) Comparison of microarray designs for class comparison and class discovery. Bioinformatics, 18, 1438-1445.
Dobbin, K., et al. (2003) Statistical design of reverse dye microarrays. Bioinformatics, 19, 803-810.
Dombkowski, A.A., et al. (2004) Gene-specific dye bias in microarray reference designs. FEBS Lett., 560, 120-124.
Kerr, M.K., et al. (2002) Statistical analysis of a gene expression microarray experiment with replication. Statist. Sinica, 12, 203-217.
Martin-Magniette, M., et al. (2005) Evaluation of the gene-specific dye bias in cDNA microarray experiments. Bioinformatics, 21, 1995-2000.
Simon, R., et al. (2002) Design of studies using DNA microarrays. Genet. Epidemiol., 23, 21-36.