Comment on 'Evaluation of the gene-specific dye bias in cDNA microarray experiments'
In their paper in Bioinformatics, Martin-Magniette et al. (2005) recommend complete dye-swap¹ designs for both direct and indirect dual-label microarray experiments. These recommendations contradict our previous recommendations (Dobbin et al., 2003) for designing experiments, in which we suggested minimizing or eliminating the use of dye-swap arrays. We show here that the recommendations of Martin-Magniette et al. are fundamentally flawed, and that in most realistic situations performing
... dye-swap arrays results in a poor experimental design.

The key error made by these authors is that they focus on oversimplified situations in which only two RNA samples are being compared. There are two problems with this approach. First, if the goal is really just to compare gene expression in two RNA samples, then obviously the best design is to place aliquots from both samples together on each array and label each sample with each dye half the time. So there really is no design question. The second and more serious problem with this approach, however, is that comparing gene expression in two RNA samples is almost never the goal of a microarray experiment. The goal is almost always to draw conclusions that are applicable beyond the particular RNA samples being studied, and this requires independent replication. Without independent experimental replication (either independent biological samples or independent replications of the entire experiment, depending on the context), one cannot make statistical inferences that apply beyond the RNA samples used. For example, in an experiment to evaluate the effect of different conditions on cell line gene expression, one must perform independent replicates of the experiment, in which multiple, different cell line cultures are grown up under each condition. Similarly, one cannot draw valid conclusions about differential expression in two populations of mice from an experiment that involves just two mice. One needs multiple independent mice from each population to capture the biological variation in the populations.

When multiple independent replicates from different conditions or populations are used in an experiment, the equation Martin-Magniette et al. have derived, based on the model of Kerr et al. (2002), is no longer valid. The specific model equation² for the log-ratios is

$$Z_{ig} = (VG)_{1g} - (VG)_{2g} + (DG)_{1g} - (DG)_{2g} + F_{ig},$$

where $Z_{ig}$ is the normalized log-ratio for gene $g$ on array $i$, $(VG)_{1g} - (VG)_{2g}$ is the 'variety' effect, $(DG)_{1g} - (DG)_{2g}$ is the gene-specific dye bias and $F_{ig}$ is the error term. The reason the model is not valid is that it contains a single term, 'variety,' which represents both a sample and a condition or population. But samples are different from conditions or populations, so terms need to be added to the model to distinguish between the two, as indicated in Dobbin and Simon (2002). When such terms are added to the model, so that samples are conceptually separated from conditions or populations, the impact of taking multiple subsamples from the same batch of RNA (technical replication) becomes different from the impact of performing biologically independent replicates of the experiment. Without introducing additional terms into the model, technical replication is indistinguishable from biologically independent replication.

If we let 'variety' represent condition or population, then a term for sample effects needs to be added to the model. Let $S(v)$ indicate a sample from condition or population $v$. Then the model of Martin-Magniette et al. (2005) needs to be changed to:

(1)

The goal of the experiment is still to make inferences about the $(VG)_{1g} - (VG)_{2g}$ term, which represents differential expression between the classes of samples. But the model change is critical, because it distinguishes between different levels of replication and results in different conclusions about the optimal experimental design.

¹ An individual array is dye-swapped when, for each of the original batches of RNA which were tagged with Cy3 and Cy5, RNA is drawn from the same two batches and labeled in the opposite way as on the original microarray, and the two labeled samples are hybridized to a second array. When every array in an experiment is dye-swapped, this is called a complete dye-swap design.
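The practical consequence of separating samples from classes can be illustrated with a minimal sketch. It is not the model labeled (1) above. It assumes a simplified per-gene variance-components form in which each biological sample contributes variance `sigma_b2`, each array contributes technical error variance `sigma_e2`, and the gene-specific dye bias is a fixed offset whose sign flips with dye orientation. All function names, parameter names and numerical values are hypothetical. Under these assumptions, and for a fixed number of arrays, the sketch compares a complete dye-swap design (each pair of samples hybridized twice) with a design that uses a new independent pair of samples on every array and alternates the dye assignment.

```python
import random
import statistics


def var_complete_dye_swap(n_arrays, sigma_b2, sigma_e2):
    """Variance of the estimated class difference when each sample pair is
    hybridized on two arrays (forward and dye-swapped). Assumes n_arrays is even."""
    n_pairs = n_arrays // 2
    # Averaging the two arrays of a pair halves the technical variance only.
    return (2 * sigma_b2 + sigma_e2 / 2) / n_pairs


def var_balanced_single_use(n_arrays, sigma_b2, sigma_e2):
    """Variance when every array carries a new independent sample pair and
    half the arrays use each dye orientation. Assumes n_arrays is even."""
    return (2 * sigma_b2 + sigma_e2) / n_arrays


def simulate(design, n_arrays, sigma_b2, sigma_e2,
             delta=1.0, dye_bias=0.5, n_trials=20000, seed=0):
    """Monte Carlo check: returns (mean, variance) of the estimate, which is
    the average of log-ratios oriented as class 1 minus class 2."""
    rng = random.Random(seed)
    sd_b, sd_e = sigma_b2 ** 0.5, sigma_e2 ** 0.5
    estimates = []
    for _ in range(n_trials):
        log_ratios = []
        if design == "complete_dye_swap":
            for _ in range(n_arrays // 2):
                # One biological pair, shared by the forward and swapped array.
                pair = rng.gauss(0, sd_b) - rng.gauss(0, sd_b)
                log_ratios.append(delta + dye_bias + pair + rng.gauss(0, sd_e))
                log_ratios.append(delta - dye_bias + pair + rng.gauss(0, sd_e))
        else:
            for i in range(n_arrays):
                # A fresh biological pair on every array; dye orientation alternates.
                pair = rng.gauss(0, sd_b) - rng.gauss(0, sd_b)
                sign = 1 if i % 2 == 0 else -1
                log_ratios.append(delta + sign * dye_bias + pair + rng.gauss(0, sd_e))
        estimates.append(statistics.fmean(log_ratios))
    return statistics.fmean(estimates), statistics.pvariance(estimates)


if __name__ == "__main__":
    for name, formula in [("complete_dye_swap", var_complete_dye_swap),
                          ("balanced_single_use", var_balanced_single_use)]:
        mean, var = simulate(name, 8, 1.0, 1.0)
        print(name, "theory:", formula(8, 1.0, 1.0), "simulated:", round(var, 3))
```

With eight arrays and both variance components set to 1, the closed-form variance is 0.625 for the complete dye-swap design and 0.375 for the design with eight independent pairs. In this sketch the two designs coincide only when the biological variance is zero, which is the two-RNA-sample situation discussed above; in both designs the dye bias cancels from the averaged estimate.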
Conclusions about differential expression from the model with sample terms also apply beyond the individual RNA samples used in the experiment, whereas conclusions based on the model of Kerr et al. (2002) do not (they apply only to the RNA samples used). Experiments with independent replicates from different classes (populations or conditions) are commonly called class comparison experiments (Simon et al., 2002).

Martin-Magniette et al. (2005) recommend dye swapping every array in a reference³ design. For class comparison experiments, there are situations in which a reference design may be reasonable,

² Here we follow the notation of Martin-Magniette et al. (2005). A simpler reformulation of the model is presented in the supplemental material.

³ A dual-label reference design experiment is an experiment that includes the same reference sample on each array, tagged with the same dye.