Bioinformatics Advance Access originally published online on August 12, 2007
Bioinformatics 2007 23(20):2686-2691; doi:10.1093/bioinformatics/btm399
Expression ratio evaluation in two-colour microarray experiments is significantly improved by correcting image misalignment
1Centre de Génétique Moléculaire, CNRS UPR2167 and Gif/Orsay DNA MicroArray Platform (GODMAP), 91190 Gif-sur-Yvette, 2Université Pierre et Marie Curie - Paris 6, 75005 Paris, 3Université Paris-Sud - Paris 11, 91405 Orsay, France and 4Department of ARTEMIS, INT, GET, Evry, F-91000
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Two-colour microarrays are widely used to perform transcriptome analysis. In most cases, the red and green images resulting from the scan of a microarray slide are slightly shifted with respect to each other. To increase the robustness of the measurement of the fluorescent emission intensities, multiple acquisitions with the same or different PMT gains can be used. In these cases, a systematic correction of image shift is required.
Results: To accurately detect this shift, we first developed an approach using cross-correlation. Second, we evaluated the most appropriate interpolation method to be used to derive the registered image. Then, we quantified the effects of image shifts on spot quality, using two different quality estimators. Finally, we measured the benefits associated with a systematic image registration. In this study, we demonstrate that registering the two images prior to data extraction provides a more reliable estimate of the two-colour ratio and thus increases the accuracy of measurements of variations in gene expression.
Availability: http://bioinfome.cgm.cnrs-gif.fr/
Contact: tang{at}cgm.cnrs-gif.fr
| 1 INTRODUCTION |
|---|
Transcriptome analysis is now routinely performed using two-colour microarrays. In this technology, first developed by P. Brown (Schena et al., 1995; Shalon et al., 1996), cDNA targets are obtained for each condition to be studied, by reverse transcription of extracted mRNA. Differential analysis of the transcriptomes is conducted by taking equal amounts of cDNA targets from each condition, labelling them with different fluorophores (usually Cy5 and Cy3 dyes, which fluoresce at red and green wavelengths, respectively) and then hybridizing them competitively on the same slide. The determination of the levels of expression of all the genes is done in parallel by measuring and comparing, for each spot, the levels of the intensities of the fluorescence emission at the appropriate wavelengths (red and green in this case). Therefore, microarray experiments strongly rely on the quality of the data extraction process, i.e. the image acquisition and analysis steps (Ahmed et al., 2004). Image segmentation is used to determine, for each spot, the area or set of pixels that are related to the foreground signal whereas the remaining neighbouring ones are usually considered as background (Yang et al., 2001). Ratio analysis, associated with quality scores and weights (Novikov and Barillot, 2005), is commonly employed to determine expression differences between two samples.
However, several other approaches have been studied in order to increase the robustness of the measurement. Among them, the method using multiple acquisitions of fluorescence emission intensities with the same or different photo-multiplier tube (PMT) gains has shown interesting results (Garcia de la Nava et al., 2004; Khondoker et al., 2006; Romualdi et al., 2003). Image processing in this case requires registration between the red and green images but also among the multiple scans. In most cases, registration is necessary because the red and green images are slightly shifted with respect to each other, owing to slight optical misalignments or mechanical drifts. Thus, the contours of any spot in the two images are no longer perfectly superimposed. The same is true for multiple acquisitions.
In the present study, we evaluated the impact of image misalignment on the accuracy of the measurement of the expression ratio, particularly with respect to the homogeneity of the spots. Further, to correct the effects of misalignment, we propose a simple and efficient methodology based on cross-correlation registration.
| 2 METHODS |
|---|
2.1 Microarrays and scanning of images
In order to assess a broad range of experimental datasets, several types of slides were evaluated as well as different types of experiments. The slides correspond to competitive hybridizations of Cy5 and Cy3 labelled cDNA targets, in the context of various research projects underway on the Gif/Orsay DNA Microarray Platform (GODMAP). A total of 123 slides were processed. They can be classified into five different groups according to their characteristics (see Table 1).
Two different scanners, an Axon Genepix 4000B two-laser scanner and a Tecan LS400 Reloaded four-laser scanner, were used to scan the slides and to obtain the pairs of red and green images. Images were acquired by measuring the red and green emission fluorescence intensities following excitation of the fluorophores either at the same time (Axon 4000B) or sequentially (Tecan LS400 Reloaded). Most of the images were acquired with a resolution of 10 µm, whereas a few slides were scanned at both 5 and 10 µm resolution in order to provide a further comparison. A total of 179 scans were done with the PMTs set either automatically or manually to balance the distributions of the red and green intensities and to optimize the dynamics of image quantification.
Slides from groups 1 to 3 were scanned only once at 10 µm resolution with an Axon scanner, whereas for the two other groups, different scanning conditions were used. Note that for group 4, scans with the Tecan scanner at 10 µm were repeated three times, yielding 24 scans for the 8 slides.
2.2 Determination of image shift
Determining the relative translation (shift) between a pair of red and green images can be done by cross-correlation (Barnea and Silverman, 1972; Pratt, 1974), which gives a measure of the similarity of the two images. Since the major features (arrays of spots) present in the two images are geometrically equivalent, cross-correlation is sufficient to give robust values (Brown, 1992; Zitova and Flusser, 2003).
The cross-correlation is defined as
|
| (1) |
In analogy with the convolution theorem, the cross-correlation satisfies
|
| (2) |
|
| (3) |
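As a sketch of the relations referred to here, written in standard notation: the symbols $r$ and $g$ (red and green images) and $\mathcal{F}$ (two-dimensional Fourier transform) are assumptions made for illustration and may differ from the authors' own notation. For real-valued images, the cross-correlation and its Fourier-domain form can be written as:

```latex
\begin{align}
(r \star g)(\Delta x, \Delta y) &= \sum_{x}\sum_{y} r(x, y)\, g(x + \Delta x,\; y + \Delta y) \\
\mathcal{F}\{r \star g\} &= \overline{\mathcal{F}\{r\}} \cdot \mathcal{F}\{g\} \\
r \star g &= \mathcal{F}^{-1}\!\left\{ \overline{\mathcal{F}\{r\}} \cdot \mathcal{F}\{g\} \right\}
\end{align}
```

The overline denotes complex conjugation. The last line is what makes the computation fast in practice: two forward transforms, one pointwise product and one inverse transform replace the double sum over all candidate shifts.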
This yields a cross-correlation image (referred to below as the correlation map) from which the maximum of correlation (usually close to the correlation map's origin) is determined. The position of this maximum relative to the cross-correlation origin (centre of the map) constitutes an accurate measure of the translation (Δx, Δy) between the two images. In practice, the cross-correlation is computed using the central square region of each image. These square regions have a size of 2048 by 2048 pixels when possible; otherwise a size of 1024 by 1024 pixels is used. For instance, for the same slide scanned at both 5 and 10 µm resolution, a square region of 2048 by 2048 pixels is evaluated for the images at 5 µm, whereas one of 1024 by 1024 pixels is used at 10 µm, so that the computation covers the same physical area and the shifts can be compared.
Computed in this way, the shift is determined only to the nearest whole pixel. In order to enhance the precision, i.e. to estimate the shift to better than 1 pixel, we introduced a second step in which the sub-image around the maximum is zoomed by a factor of 2, 4 or 8. This provides a precision in the estimation of the shift of 1/2, 1/4 or 1/8 pixel. The zoom is achieved by computing the Fourier transform of this sub-image, extending this Fourier transform with null values to get a 2-, 4- or 8-fold larger complex image, and then transforming back to real space, thus yielding a cross-correlation (sub-)image zoomed 2, 4 or 8 times. Thus, a one-pixel translation in this zoomed correlation image corresponds to a 1/2, 1/4 or 1/8 pixel translation of the original image, respectively (see Fig. 1). This method was validated and is now routinely used to search the correlation maxima in electron microscopic images (Henderson et al., 1986).
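The two-step procedure described above (integer maximum of the correlation map, then Fourier zero-padding of the sub-image around it) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the 16-pixel sub-image and the synthetic Gaussian spot are all assumptions made for the example.

```python
import numpy as np

def correlation_map(red, green):
    """Circular cross-correlation of two equally sized images, computed via FFT.

    The zero-shift origin is moved to the centre of the returned map.
    """
    spectrum = np.conj(np.fft.fft2(red)) * np.fft.fft2(green)
    return np.fft.fftshift(np.real(np.fft.ifft2(spectrum)))

def fourier_zoom(sub, factor):
    """Zoom a square, even-sized image by zero-padding its Fourier transform."""
    n = sub.shape[0]
    pad = (n * factor - n) // 2
    spectrum = np.fft.fftshift(np.fft.fft2(sub))
    padded = np.pad(spectrum, pad, mode="constant")
    # The factor**2 rescaling keeps the intensities comparable to the input.
    return np.real(np.fft.ifft2(np.fft.ifftshift(padded))) * factor ** 2

def estimate_shift(red, green, factor=8, half=8):
    """Estimate the (dx, dy) translation of green relative to red, to 1/factor pixel."""
    cmap = correlation_map(red, green)
    py, px = np.unravel_index(np.argmax(cmap), cmap.shape)
    # Sub-image around the integer maximum (assumed not to touch the map border).
    sub = cmap[py - half:py + half, px - half:px + half]
    zoomed = fourier_zoom(sub, factor)
    zy, zx = np.unravel_index(np.argmax(zoomed), zoomed.shape)
    cy, cx = cmap.shape[0] // 2, cmap.shape[1] // 2
    dx = (px - cx) + (zx - half * factor) / factor
    dy = (py - cy) + (zy - half * factor) / factor
    return dx, dy

def gaussian_spot(size, cx, cy, sigma=1.5):
    """Synthetic test spot (hypothetical data, not a real scan)."""
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

red = gaussian_spot(64, 30.0, 30.0)
green = gaussian_spot(64, 32.25, 28.75)  # displaced by (+2.25, -1.25) pixels
dx, dy = estimate_shift(red, green)
```

With `factor=8` the estimate is quantized to 1/8 pixel, matching the finest zoom mentioned in the text. A real scan would use the central 1024 or 2048 pixel square rather than a 64 pixel test image.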
In addition, a lack of precision in the introduction of the slide into the scanner can generate a rotation of the images. However, this rotation will be the same for both the red and the green images resulting from the same acquisition and, if the slide is not removed between two acquisitions, it will be the same for multiple acquisitions. For this reason, this work is focused only on translational shifts.
2.3 Image shift correction
The registered (translation-corrected) image is computed from the original shifted image by applying the inverse translation (–Δx, –Δy). The image is calculated using the bilinear interpolation method (Lehmann et al., 1999; Thévenaz et al., 2000), the formula for which is given by
|
| (4) |
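As an illustration of bilinear interpolation applied to a translation, the sketch below weights the four pixels surrounding each source position by their fractional distances. The function name `shift_bilinear` and the edge handling (clamping to the nearest border pixel) are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def shift_bilinear(image, dx, dy):
    """Translate an image by (dx, dy) pixels using bilinear interpolation.

    Output pixel (x, y) takes the value of the input at (x - dx, y - dy).
    Source positions falling outside the image are clamped to the border.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    sx = np.clip(xs - dx, 0, w - 1)
    sy = np.clip(ys - dy, 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    a = sx - x0  # fractional offset along x
    b = sy - y0  # fractional offset along y
    return ((1 - a) * (1 - b) * image[y0, x0]
            + a * (1 - b) * image[y0, x1]
            + (1 - a) * b * image[y1, x0]
            + a * b * image[y1, x1])

# Registration applies the inverse of the measured shift, e.g.
# registered = shift_bilinear(red_image, -measured_dx, -measured_dy)
row = np.array([[0.0, 10.0, 20.0, 30.0]])
half_pixel = shift_bilinear(row, 0.5, 0.0)
```

Because every output value is a convex combination of four input pixels, the result can never fall below the minimum or exceed the maximum of the original image.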
2.4 Quality estimators
In order to evaluate, for each spot, the benefit of registering the images, we use two quality estimators. The first one evaluates the local similarity between the red and the green images, while the second one gives a measure of the homogeneity of the spot.
2.4.1 Local correlation between images
The first estimator is the coefficient of determination (Novikov and Barillot, 2005) (square of the local correlation). It is computed considering, in each image, a region of interest containing a given spot and its neighbourhood. The region of interest is defined as a window the width of which (respectively height) is the column (respectively row) spacing between the spots, given by the spotting data. It is centred on the spot. The formula used is the following:
|
| (5) |
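A minimal sketch of this estimator, assuming it is the square of the ordinary Pearson correlation between the red and green pixel values of the window. The function name and the nested-list window representation are hypothetical choices for the example.

```python
from statistics import mean

def local_determination(red_window, green_window):
    """Square of the Pearson correlation between two equally sized pixel windows."""
    r = [v for row in red_window for v in row]
    g = [v for row in green_window for v in row]
    mr, mg = mean(r), mean(g)
    cov = sum((a - mr) * (b - mg) for a, b in zip(r, g))
    var_r = sum((a - mr) ** 2 for a in r)
    var_g = sum((b - mg) ** 2 for b in g)
    if var_r == 0 or var_g == 0:
        return 0.0  # a perfectly flat window carries no correlation information
    return cov * cov / (var_r * var_g)

proportional = local_determination([[1, 2], [3, 4]], [[2, 4], [6, 8]])
unrelated = local_determination([[1, 2], [3, 4]], [[1, -1], [-1, 1]])
```

Two windows that differ only by a constant gain give a value of 1, whereas windows whose pixel patterns are unrelated give a value near 0.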
2.4.2 Coefficient of variation
The second estimator is the locally reduced SD of the signal within a spot, also called the coefficient of variation (Cv). It provides a measure of the homogeneity of each spot and is defined as follows:
|
| (6) |
A Cv value close to 0 corresponds to a regular (homogeneous) spot (the red and green image signals are correlated). Conversely, a large Cv value indicates an irregular (heterogeneous) spot (appearing as a green/red mosaic spot in the composite RGB image).
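The behaviour described above can be illustrated with a short sketch. It assumes that Cv is the standard deviation of the pixel-to-pixel red/green ratios within the spot divided by their mean, which is consistent with the description in the text but is not guaranteed to match Equation (6) exactly. The function name and sample values are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(red_pixels, green_pixels):
    """SD of the pixel-to-pixel red/green ratios divided by their mean.

    Assumption: this is one plausible reading of Cv, not the published formula.
    Pixels with a zero green value are skipped to avoid division by zero.
    """
    ratios = [r / g for r, g in zip(red_pixels, green_pixels) if g > 0]
    return pstdev(ratios) / mean(ratios)

homogeneous = coefficient_of_variation([100, 200, 300], [100, 200, 300])
mosaic = coefficient_of_variation([100, 200, 100, 200], [200, 100, 200, 100])
```

For the homogeneous spot every ratio is 1, so Cv is 0. For the mosaic spot the ratios alternate between 0.5 and 2, giving a Cv of 0.6.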
| 3 RESULTS |
|---|
3.1 Interpolation method
To apply a translation (–Δx, –Δy) to an image, where Δx and Δy can be real values, an interpolation method is required. To determine which interpolation method is the most appropriate to shift microarray images, we tested several interpolation schemes. As microarray images are contrasted images with rather sharp transitions between signal and background regions, interpolation schemes like bicubic or quadratic interpolations were found to be inappropriate because they tend to introduce negative (or zero) values at the boundaries of the spots. Conversely, bilinear interpolation does not exhibit such a drawback, as it yields only values that are bounded within the original image's pixel values. In order to estimate the effect of bilinear interpolation on microarray images, we performed the following test. For a given image, we first translated it 1/8 of a pixel forward along x, then translated it backward by the same amount and, finally, computed the correlation with the original image. We did the same for forward and backward moves of 1/4, 1/2 and 1 pixel. We repeated the same test on the different types of images scanned at 5 or 10 µm resolutions. The greatest effect corresponds to a decrease of <1.5% in the correlation between the original image and the image obtained after two opposite translations. Thus, we can estimate that interpolation has a very limited effect on the image content (the maximum effect is up to 1% of signal variation on average). Therefore, and despite this slight smoothing effect, we have used bilinear interpolation throughout this study.
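The forward-and-back test can be sketched as follows for translations along x. The test image (a single sharp-edged disc on a flat background) and the function names are hypothetical, so the correlation values it produces are not those reported for the real scans.

```python
import numpy as np

def shift_x_bilinear(image, dx):
    """Translate along x by dx pixels with linear interpolation (edges clamped)."""
    w = image.shape[1]
    src = np.clip(np.arange(w) - dx, 0, w - 1)
    x0 = np.floor(src).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    a = src - x0  # fractional part of the source position
    return (1 - a) * image[:, x0] + a * image[:, x1]

def round_trip(image, dx):
    """Shift by +dx, then by -dx, and correlate the result with the original."""
    back = shift_x_bilinear(shift_x_bilinear(image, dx), -dx)
    corr = np.corrcoef(image.ravel(), back.ravel())[0, 1]
    return corr, back

# Hypothetical test image: disc of signal 1000 on a background of 100.
yy, xx = np.mgrid[0:32, 0:32]
image = np.where((xx - 16) ** 2 + (yy - 16) ** 2 <= 36, 1000.0, 100.0)

results = {dx: round_trip(image, dx) for dx in (0.125, 0.25, 0.5, 1.0)}
```

A whole-pixel round trip reproduces the image exactly, while fractional shifts smooth the spot edges slightly. In every case the interpolated values stay within the original intensity range, which is the property that motivated the choice of bilinear interpolation.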
3.2 Measuring translational shifts between images
We computed the translational shift between the red and green images over our whole dataset. The distributions (histograms) of the measured Δx and Δy shifts are shown in Figure 2. Note that the shifts between the images are not constant for a given scanner nor for a given resolution, but rather vary randomly from 0 to more than 10 µm in some cases. This can be observed in each image group, whatever the resolution or the scanner used.
For all the slides scanned at 5 µm resolution, a shift in at least one direction was detected. Of the 104 slides scanned at 10 µm with the Axon scanner, 60 image pairs presented a shift in the x and y directions, 41 presented a shift in only one direction and only 3 were already correctly registered. For the Tecan scanner, all image pairs (10 µm) presented a shift in the x and y directions.
3.3 Measuring the effect of a shift
We investigated the effects of translational misalignment between images on the quality of the extracted signal for each spot. Since we observed that we cannot control or predict the shift between the red and green images in a real scanning situation, we began by studying the impact of translation by using a single image as a reference for the two images. In this case, since the two images are identical, if they are not shifted, the value of the coefficient of variation, Cv, would be equal to 0 (and the coefficient of local determination, R², equal to 1 for each spot). When applying shifts, we observed that R² decreases whereas Cv increases, as expected. Consequently, registering images tends to improve the homogeneity of the spots.
We then used slides on which the same sample, labelled with both fluorophores, had been hybridized to study the distributions of the quality factors before and after registration (Fig. 3, group 1), since the ratios are known. The process clearly moves the distribution of the coefficient of local determination towards 1, reducing the number of coefficients with low values. This global increase of the determination coefficients demonstrates that the regions of interest studied (each spot and its neighbourhood) are more similar between channels after registration than before and, thus, that the registration has been successful. Similarly, the distribution of Cv moves towards 0, showing more homogeneous ratios inside the majority of spots. Finally, the conclusion is similar when considering several other pairs of red and green images listed in Table 1 (Fig. 3).
Although the registration process is conducted on a global level, it is important to notice that it is efficient enough to locally ensure registration at the level of each spot. However, in some cases, the local coefficient of determination is not improved by the registration process. A survey of the corresponding spots shows that these spots have specific textures, contents or have barely detectable signals that make the two local images different enough, one compared to the other, to be only loosely correlated.
3.4 Signal homogeneity and intensity
To investigate why the registration process does not benefit all the spots in the same way, we attempted to characterize the effects of registration upon spots according to the logarithm of their median fluorescence signal A defined as the median of pixel to pixel geometric means of the red and green intensities:
|
| (7) |
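Written out with the pixel notation $red_{x,y}$ and $green_{x,y}$ used elsewhere in the text, the definition above can be sketched as follows. The base-2 logarithm is an assumption, suggested by the later remark that $A = 9$ corresponds to an intensity of $512 = 2^9$.

```latex
A = \log_2\!\left( \operatorname*{median}_{(x,y)\,\in\,\text{spot}} \sqrt{red_{x,y} \cdot green_{x,y}} \right)
```

For example, a spot whose median pixel-wise geometric mean is 1024 has $A = 10$.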
To study how A influences the variation of Cv with increasing shifts, we proceeded as follows. For each pair of red and green images, we first registered the red image with respect to the green one using the correlation procedure previously described.
Then, we introduced successive increasing shifts between the two images and evaluated for different intervals of A values (less than 8, 8–9, 9–10, etc.), the corresponding variations of Cv.
- As a rule, as shown in Figure 4, for each series of images, increasing the image shift increases the coefficient of variation Cv (variance of the pixel-to-pixel ratios). Conversely, correcting the image shift would decrease Cv.
- Second, the effects of the registration vary according to the value A of the spot. When A was low, e.g. lower than 9 (corresponding to a mean intensity lower than 512), we noted that Cv does not significantly decrease when the shift is reduced, whereas for values of A higher than 9, Cv decreases as the shift decreases.
- Finally, the values of Cv are lower for spots with high A values than for those with low A values.
3.5 Impact of registration on the identification of differentially expressed genes
To determine the impact of the registration on the differentially expressed genes, we used a reference design experiment, belonging to the image set 3. In this experiment, two conditions were tested against a common reference using six slides (three biological repetitions for each condition). There were three replicates of each gene present on each slide. The whole statistical analysis was done using the MAnGO software (Marisa et al., 2007).
First, we used the same methodology as described in Section 3.4 to generate increasing shifts up to 2 pixels from registered images. For each shift, we computed the list of the differentially expressed genes and we compared it to the one obtained with the registered images. As the shift increases, we noticed that the lists diverge increasingly.
We then computed an inter-slide analysis before and after registration. We used the Bonferroni–Hochberg method to adjust the P-values. A cut-off of 1.4 was chosen for the fold-change, and the alpha error risk was set to 0.05. Two major differences showing the importance of registration were noticed:
- Three more genes (82 against 79) were found differentially expressed after registration of the six image pairs.
- When considering the adjusted P-value of each differentially expressed gene, a smaller value was found for more than 70% (58 of 82) of the cases after registration. On average, the P-value was reduced by one third, which increases the overall confidence that the genes are true positive differentially expressed genes.
| 4 DISCUSSION AND CONCLUSION |
|---|
First, this study shows that, in most cases, a shift exists between the red and green images after scanning microarray slides using two-colour laser scanners. The shift evaluation method that we developed using correlation was shown to be efficient and fast and, thus, can be performed at the same time the images are loaded into the image analysis software without a noticeable computing load. Image shift correction using bilinear interpolation was found to be satisfactory since the measured distortion (image smoothing) between the original image and a forward and back translated image was found to be negligible (<1.5%). Furthermore, these bilinear interpolation smoothing effects are largely compensated by the fact that the method is fast and it avoids negative values.
Our results also show that addressing the misregistration problem at the image level is efficient for correcting the shift at the spot level. This avoids any drawbacks that could occur with local approaches at the spot level, when for instance, a spot is absent or spurious in at least one of the two images. Furthermore, our study demonstrates that sub-pixel registration is required to optimally increase the accuracy of the evaluation of the expression ratio. Indeed, shift effects have been shown to be significant even for sub-pixel translations. Therefore, it is useful to measure and correct carefully the shift between the images.
Low- and high-intensity spots were found to behave differently. For bright spots, the two-colour images are quite similar after competitive hybridization. As there have been enough targets hybridized to result in the same ratio on each pixel of the spot, red and green image signals are correlated; whereas for weak spots, an unpaired hybridization process seems to be dominant yielding mosaic-like images. The two-colour image signals are no longer correlated as is the case for the bright spots. Therefore, a global correction of the image translation would increase the quality of intense spots, while for low-intensity spots, no benefit should be expected. Indeed, in some cases, even a decrease was occasionally observed that was not imputable to registration but solely to the mosaic-like nature of those spots. Furthermore, we observed that registering images prior to differential analysis may increase the homogeneity of responses amongst replicated spots and also reduce the P-values associated with differentially expressed genes.
One question remains: why are the effects of misalignment so important for Cv? Indeed, Cv variations were found to increase linearly with the shift (up to shifts of 1 pixel) and in most cases those variations were far from negligible. In principle, however, for a perfectly flat spot (where red_{x,y} and green_{x,y} are constant over the spot's surface), the Cv variations should be null (as the pixel-to-pixel ratio remains constant over the spot). Therefore, it must be emphasized that the main reason why this registration is crucial is that the spots are definitely heterogeneous with respect to deposition of material or in reactivity and, therefore, image alignment will reduce the pixel-to-pixel ratio fluctuations. We ran several simulations inducing shifts on model spots containing increasing levels of heterogeneity (data not shown) and we were able to reproduce the Cv variations observed on real spots, particularly the linear dependency of Cv on small shifts.
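The argument about flat versus heterogeneous spots can be illustrated with a toy one-dimensional simulation. Everything in this sketch is an assumption made for illustration (the sinusoidal heterogeneity, the profile length, and the reading of Cv as SD over mean of the pixel-to-pixel ratios); it is not the simulation the authors ran.

```python
import math
from statistics import mean, pstdev

def shift_1d(profile, d):
    """Shift a 1-D intensity profile by d pixels with linear interpolation."""
    n = len(profile)
    out = []
    for i in range(n):
        s = min(max(i - d, 0), n - 1)  # clamp source position to the profile
        i0 = int(math.floor(s))
        i1 = min(i0 + 1, n - 1)
        a = s - i0
        out.append((1 - a) * profile[i0] + a * profile[i1])
    return out

def cv_of_ratio(red, green):
    """SD over mean of the pixel-to-pixel ratios (assumed form of Cv)."""
    ratios = [r / g for r, g in zip(red, green)]
    return pstdev(ratios) / mean(ratios)

flat = [1000.0] * 20
# Hypothetical heterogeneous spot: 10% intensity modulation across the spot.
rough = [1000.0 + 100.0 * math.sin(1.3 * i) for i in range(20)]

shifts = (0.0, 0.25, 0.5, 1.0)
cv_flat = [cv_of_ratio(flat, shift_1d(flat, d)) for d in shifts]
cv_rough = [cv_of_ratio(rough, shift_1d(rough, d)) for d in shifts]
```

In this toy model the flat spot keeps a Cv of 0 whatever the shift, whereas the Cv of the heterogeneous spot grows with the shift, in line with the explanation given in the text.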
In conclusion, our study clearly demonstrates that, because of the heterogeneous nature of the spots, images should be registered prior to analysis and that such a registration process significantly reduces the variability of the pixel to pixel ratio measurement (Cv) and increases the reliability associated with differentially expressed genes. Since the spots are more homogeneous, the median of ratios is closer to the ratio of median values. The latter is used by most biologists, and therefore, the registration confers increased reliability on the measured values, yielding a stronger prediction of the variations in expression between conditions.
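The relationship between the median of ratios and the ratio of medians can be seen on a small numerical example. The pixel values below are hypothetical and chosen only to contrast a homogeneous spot with a mosaic-like one.

```python
from statistics import median

def median_of_ratios(red, green):
    """Median of the pixel-to-pixel red/green ratios."""
    return median([r / g for r, g in zip(red, green)])

def ratio_of_medians(red, green):
    """Ratio of the median red intensity to the median green intensity."""
    return median(red) / median(green)

# Homogeneous spot: every pixel has the same red/green ratio of 2.
homog_red, homog_green = [200, 400, 600], [100, 200, 300]
# Mosaic-like spot: the ratio varies strongly from pixel to pixel.
mosaic_red, mosaic_green = [100, 300, 900], [400, 100, 300]

homog = (median_of_ratios(homog_red, homog_green),
         ratio_of_medians(homog_red, homog_green))
mosaic = (median_of_ratios(mosaic_red, mosaic_green),
          ratio_of_medians(mosaic_red, mosaic_green))
```

For the homogeneous spot both estimators give 2. For the mosaic-like spot they diverge (3 versus 1), which is why improved homogeneity after registration brings the two estimators closer together.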
| ACKNOWLEDGEMENTS |
|---|
The authors thank all the members of the GODMAP platform and the Bioinfome team of the Centre de Génétique Moléculaire, especially Dr Nancie Reymond for their help and remarks.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: David Rocke
Received on June 13, 2007; revised on July 26, 2007; accepted on August 2, 2007
| REFERENCES |
|---|
Ahmed AA, et al. Microarray segmentation methods significantly influence data precision. Nucleic Acids Res (2004) 32:e50.
Barnea DI, Silverman HF. A class of algorithms for fast digital image registration. IEEE Trans. Comput (1972) 21:179–186.
Brown LG. A survey of image registration techniques. ACM Comput. Surv (1992) 24:326–376.
Garcia de la Nava J, et al. Saturation and quantization reduction in microarray experiments using two scans at different sensitivities. Stat. Appl. Genet. Mol. Biol (2004) 3. Article11.
Henderson R, et al. Structure of purple membrane from Halobacterium halobium: recording, measurement and evaluation of electron micrographs at 3.5 Å resolution. Ultramicroscopy (1986) 19:147–178.
Khondoker MR, et al. Statistical estimation of gene expression using multiple laser scans of microarrays. Bioinformatics (2006) 22:215–219.
Lehmann TM, et al. Survey: interpolation methods in medical image processing. IEEE Trans. Med. Imaging (1999) 18:1049–1075.
Marisa L, et al. MAnGO: an interactive R-based tool for two-colour microarray analysis. Bioinformatics (2007) doi:10.1093/bioinformatics/btm321.
Novikov E, Barillot E. An algorithm for automatic evaluation of the spot quality in two-color DNA microarray experiment. BMC Bioinformatics (2005) 6:293.
Pratt WK. Correlation techniques of image registration. IEEE Trans. Aerosp. Electron. Syst (1974) AES-10:353–358.
Romualdi C, et al. Improved detection of differentially expressed genes in microarray experiments through multiple scanning and image integration. Nucleic Acids Res (2003) 31:e149.
Schena M, et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science (1995) 270:467–470.
Shalon D, et al. A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res (1996) 6:639–645.
Thévenaz P, et al. Interpolation revisited. IEEE Trans. Med. Imaging (2000) 19:739–758.
Yang YH, et al. Analysis of cDNA microarray images. Brief. Bioinformatics (2001) 2:341–349.
Zitova B, Flusser J. Image registration methods: a survey. Image Vis. Comput (2003) 21:977–1000.