

Bioinformatics Advance Access originally published online on October 10, 2006
Bioinformatics 2006 22(23):2910-2917; doi:10.1093/bioinformatics/btl502

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Evaluating the performance of microarray segmentation algorithms

Antti Lehmussola *, Pekka Ruusuvuori and Olli Yli-Harja

Institute of Signal Processing, Tampere University of Technology, PO Box 553, 33101 Tampere, Finland

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: Although numerous algorithms have been developed for microarray segmentation, extensive comparisons between them have received far less attention. In this study, we evaluate the performance of nine microarray segmentation algorithms. By using both simulated and real microarray experiments, we address the challenges in performance evaluation that arise from the lack of ground-truth information. Simulated experiments allow us to analyze segmentation accuracy at the single-pixel level, as is commonly done in traditional image processing studies. With real experiments, we indirectly measure segmentation performance, identify significant differences between the algorithms, and study the characteristics of the resulting gene expression data.

Results: Overall, our results show clear differences between the algorithms. They demonstrate how segmentation performance depends on image quality, which algorithms operate at significantly different performance levels, and how the choice of segmentation algorithm affects the identification of differentially expressed genes.

Availability: Supplementary results and the microarray images used in this study are available at the companion web site http://www.cs.tut.fi/sgn/csb/spotseg/

Contact: antti.lehmussola@tut.fi


    1 INTRODUCTION
Massively parallel measurement of gene expression levels using microarray technology provides insight into various cellular processes. Since the introduction of microarray technology (Schena et al., 1995), microarrays have been used in a wide variety of biomedical disciplines. Nevertheless, it is well known that various sources of variability degrade the quality of microarray experiments, compromising, e.g., the reproducibility of the obtained results (Marshall, 2004; Tan et al., 2003). For this reason, recent years have seen improvements in many aspects of both the core technology and the analysis of experiments. In the analysis of microarray experiments, one continuously developed stage that affects the quality of the resulting data is the analysis of microarray images (Zhang et al., 2004).

Image analysis allows automated quantification of gene expression levels from the scanned images. The analysis pipeline can be divided into three stages. First, spots are preliminarily located in the images by gridding. Second, using the gridding information, each microarray spot is individually segmented into regions of two classes: foreground and background. Finally, the true gene expression levels are estimated from the areas determined by segmentation. Image analysis would be rather trivial if the background were free of noise and artifacts and all spots had a circular shape, similar size and homogeneous intensity. In practice, the fairly common low quality of microarrays makes the analysis more challenging.

In this study, we explore relative performance differences between various microarray segmentation algorithms and evaluate how the methods affect the resulting data. Earlier, segmentation inaccuracy was regarded as a minor error component compared with the effect of the background estimation method (Yang et al., 2002). More recently, however, segmentation methods have been shown to significantly influence microarray data precision (Ahmed et al., 2004). Although new microarray segmentation algorithms are actively introduced, extensive comparisons between different methods have received less attention. For example, Bajcsy (2006) gives a broad overview of different methods, but without any quantitative comparison between them. Yang et al. (2002) and Ahmed et al. (2004) compare segmentation methods quantitatively, but both studies consider only four algorithms. For this study, we selected a more extensive set of algorithms, encompassing a wide variety of methods published for microarray segmentation in recent years.

Measuring the performance of microarray segmentation algorithms is challenging. Although the most natural way to evaluate segmentation algorithms would be to measure the segmentation error at the pixel level, it is practically impossible to obtain ground-truth information from real microarray experiments. To overcome this problem, we use simulated microarray images, which allow the pixel-level accuracy of the segmentation results to be measured. In addition, we use replicated microarray experiments to calculate indirect measures of segmentation performance and to study the characteristics of the resulting data. Supplementary results are available at the companion web page http://www.cs.tut.fi/sgn/csb/spotseg/.


    2 ALGORITHMS
Nine segmentation algorithms with divergent approaches were selected for the study. The set covers a wide range of available methods for microarray segmentation, from traditional algorithms to more recent ones. To keep the analysis informative, we did not aim to include every existing method. Some methods were excluded because of prohibitively high computational complexity, which would have limited the amount of test data that could be used [e.g. (Gottardo et al., 2006)]. The nine selected algorithms are listed in Table 1.


Table 1 Summary of the algorithms used in this study

 
The fixed circle algorithm is probably one of the first segmentation methods used in microarray studies. It is available, e.g., in the software tool ScanAnalyze (Eisen, 1999; http://rana.lbl.gov/EisenSoftware.htm) and is provided as an option in most microarray image analysis software (Yang et al., 2001). The algorithm relies on a simplistic assumption: all microarray spots are circular with a constant radius. After gridding, a circular mask of fixed radius is placed on each spot location; everything inside the mask is considered spot foreground and everything else background.
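As a minimal sketch of the fixed circle idea, the function below builds a boolean foreground mask for one spot. It assumes the spot image is a grid of rows and columns and that gridding supplies the spot center in pixel coordinates; the function name and the inclusive `<=` boundary are illustrative choices, not taken from ScanAnalyze.

```python
import math

def fixed_circle_mask(shape, center, radius):
    """Foreground mask for one spot: True inside a circle of fixed radius.

    shape  -- (rows, cols) of the spot image
    center -- (row, col) spot location, assumed to come from gridding
    radius -- the same radius is used for every spot on the array
    """
    rows, cols = shape
    cy, cx = center
    return [[math.hypot(r - cy, c - cx) <= radius for c in range(cols)]
            for r in range(rows)]

# Everything inside the circle is foreground, everything else background.
mask = fixed_circle_mask((5, 5), (2, 2), 1)
foreground_pixels = sum(sum(row) for row in mask)  # 5: the center and its 4 neighbors
```

Because the radius never adapts, spots that are larger, smaller or non-circular are mis-segmented regardless of their intensities.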

The adaptive circle algorithm adds flexibility to the traditional fixed circle. It likewise assumes that all spots are circular, but the radius of each spot is estimated separately. Some software packages allow users to adjust the radius of each spot manually; given the large number of spots on a microarray, such an approach is extremely laborious and time consuming. An automated version of the adaptive circle is available in the Dapple software (Buhler et al., 2000), where the radius of each spot is estimated using edge detection. First, the outline of each spot is enhanced using the second-difference approximation of the Laplacian. Then, the radius of the circle that best matches the enhanced edges is identified with matched filtering.
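The two automated steps can be illustrated with the simplified sketch below: a 4-neighbor second-difference Laplacian enhances the spot outline, and a one-pixel-wide ring template for each candidate radius is matched against the absolute Laplacian response. This is a hypothetical reconstruction for illustration only; Dapple's actual filters, candidate radii and scoring may differ, and the spot center is assumed to be known from gridding.

```python
import math

def laplacian(img):
    """4-neighbor second-difference Laplacian; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = (img[r - 1][c] + img[r + 1][c] + img[r][c - 1]
                         + img[r][c + 1] - 4 * img[r][c])
    return out

def estimate_radius(img, center, radii):
    """Return the candidate radius whose ring best matches |Laplacian|.

    The score is the mean absolute Laplacian on a ring of width ~1 pixel,
    i.e. a simple matched filter with a ring-shaped template (assumption).
    """
    edges = laplacian(img)
    cy, cx = center
    best_radius, best_score = None, -1.0
    for radius in radii:
        ring = [(r, c) for r in range(len(img)) for c in range(len(img[0]))
                if abs(math.hypot(r - cy, c - cx) - radius) < 0.5]
        if not ring:
            continue
        score = sum(abs(edges[r][c]) for r, c in ring) / len(ring)
        if score > best_score:
            best_radius, best_score = radius, score
    return best_radius

# Synthetic spot of radius 5 on a dim background.
spot = [[100.0 if math.hypot(r - 10, c - 10) <= 5 else 10.0 for c in range(21)]
        for r in range(21)]
r_hat = estimate_radius(spot, (10, 10), range(2, 9))
```

Weak spot edges produce a small Laplacian response, so the ring score becomes dominated by noise; this is one way matched filtering can fail on dim spots.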

The seeded region growing algorithm (Adams and Bischof, 1994) was first used for microarray segmentation in the software Spot (Yang et al., 2002). The algorithm segments each spot by iteratively growing separate regions from a set of predefined seed points, which provide starting points for the segmentation. In each iteration, the algorithm adds to the segmented regions the neighboring pixels whose intensities are most similar to those regions. The aim is to make the final segmented regions as homogeneous as possible given the connectivity constraint. Finally, the region originating from the foreground seeds is considered the spot foreground, and the region originating from the background seeds the background.
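A compact sketch of seeded region growing is given below. It follows the usual Adams and Bischof formulation, in which a candidate pixel's priority is the absolute difference between its intensity and the mean of the adjacent region, and uses a priority queue so the most similar pixel is always added next. For brevity, priorities are computed when a pixel is queued and are not refreshed as region means change, a common simplification. The label values (1 for foreground, 2 for background) and the seed placement are assumptions for illustration.

```python
import heapq

def seeded_region_growing(img, seeds):
    """Grow labelled regions from seed pixels using 4-connectivity.

    img   -- 2-D list of intensities for one spot
    seeds -- dict: label (non-zero int) -> list of (row, col) seed pixels
    Returns a 2-D list of labels (0 would mean 'never reached').
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    total, count, heap = {}, {}, []

    def push_neighbors(r, c, lab):
        mean = total[lab] / count[lab]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] == 0:
                # Priority: how far the candidate lies from the region mean.
                heapq.heappush(heap, (abs(img[rr][cc] - mean), rr, cc, lab))

    for lab, pts in seeds.items():
        total[lab], count[lab] = 0.0, 0
        for r, c in pts:
            labels[r][c] = lab
            total[lab] += img[r][c]
            count[lab] += 1
    for lab, pts in seeds.items():
        for r, c in pts:
            push_neighbors(r, c, lab)

    while heap:
        _, r, c, lab = heapq.heappop(heap)
        if labels[r][c]:
            continue  # already labelled by an earlier, more similar entry
        labels[r][c] = lab
        total[lab] += img[r][c]
        count[lab] += 1
        push_neighbors(r, c, lab)
    return labels

FG, BG = 1, 2
spot = [[100.0 if (r - 4) ** 2 + (c - 4) ** 2 <= 4 else 10.0 for c in range(9)]
        for r in range(9)]
lab = seeded_region_growing(spot, {FG: [(4, 4)], BG: [(0, 0), (0, 8), (8, 0), (8, 8)]})
```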

A segmentation algorithm based on the statistical Mann–Whitney test was introduced by Chen et al. (1997). The Mann–Whitney test is a non-parametric test for assessing the statistical significance of the difference between two distributions, and the algorithm uses it to iteratively determine the threshold between foreground and background. First, a circular target mask is selected that encloses all possible foreground pixels and separates them from the known background. Second, a random sample of background pixels is compared, using the Mann–Whitney test, with a selected number of the lowest-intensity pixels inside the target mask. If the difference between the two sets is not significant, the algorithm discards a predetermined number of pixels from the target area and selects new pixels from it. The iteration terminates when the two sets differ significantly. The pixels remaining inside the target mask are then taken as the spot foreground.
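The iteration can be sketched as follows. This is an illustrative reconstruction, not Chen et al.'s implementation: it works on intensities only (pixel coordinates are omitted), uses the normal approximation to the Mann–Whitney U statistic without tie correction, and the parameter values `n`, `step` and `alpha` are hypothetical. Because the background pixels are sampled at random, different `seed` values can give different foreground sets for the same spot.

```python
import math
import random

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_p(x, y):
    """One-sided p-value (normal approximation) for H1: y tends to exceed x."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u_y = sum(ranks[n1:]) - n2 * (n2 + 1) / 2
    z = (u_y - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))

def mann_whitney_segment(target, background, n=8, step=1, alpha=0.001, seed=None):
    """Return the target-mask intensities kept as foreground ([] if none)."""
    rng = random.Random(seed)
    sample = rng.sample(background, n)   # random background pixels
    remaining = sorted(target)           # candidate foreground, dimmest first
    while len(remaining) >= n:
        if mann_whitney_p(sample, remaining[:n]) < alpha:
            return remaining             # dimmest n now differ significantly
        remaining = remaining[step:]     # discard the dimmest pixels and retry
    return []
```

Usage: `mann_whitney_segment(target_intensities, background_intensities, seed=1)`, where both arguments are lists of pixel intensities taken from inside and outside the target mask.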

The k-means segmentation algorithm is based on traditional k-means clustering and was first applied to microarray images by Bozinov and Rahnenführer (2002). The segmentation uses information from both channels simultaneously: for each spatial location, the intensities from both channels are combined into one feature vector. Since the goal is to divide the image into foreground and background, the number of cluster centers k is set to two. The pixels with the minimum and maximum intensities are selected as the initial cluster centers. All data points are then assigned to the nearest cluster center according to Euclidean distance, and new cluster centers are calculated. The assignment and update steps are repeated until the cluster centers no longer change.
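The procedure above can be sketched as a two-cluster k-means on two-channel feature vectors. How "minimum and maximum intensity" is defined for a two-channel pixel is not stated, so the sketch assumes it is the sum of the two channel intensities; the function name and the iteration cap are also illustrative.

```python
import math

def kmeans_segment(ch1, ch2, max_iter=100):
    """Two-cluster k-means on (channel 1, channel 2) pixel vectors.

    Returns one label per pixel: 0 = background, 1 = foreground.
    Assumption: a pixel's 'intensity' for initialization is ch1 + ch2.
    """
    points = list(zip(ch1, ch2))
    centers = [min(points, key=sum), max(points, key=sum)]
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                  for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[k])
        if new_centers == centers:
            break  # centers unchanged: converged
        centers = new_centers
    return labels

labels = kmeans_segment([10, 12, 11, 200, 210, 205], [8, 9, 10, 150, 160, 155])
# labels == [0, 0, 0, 1, 1, 1]
```

Because only intensities are used, a bright artifact anywhere in the spot window is pulled into the foreground cluster, which is consistent with the sensitivity to high-intensity artifacts discussed in the conclusions.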

The hybrid k-means algorithm (Rahnenführer and Bozinov, 2004) is an extended version of the original k-means segmentation approach (Bozinov and Rahnenführer, 2002). It uses repeated clustering to increase the number of foreground pixels: as long as the minimum number of foreground pixels is not reached, the remaining background pixels are clustered into two groups, and the group with higher intensities is assigned to the foreground. In addition, the number of outlier pixels in the segmentation result is reduced with mask matching. A bivalence (binary) mask representing the average segmented spot shape is generated, and pixels assigned to the foreground in the original segmentation result but to the background in the mask are deleted, and vice versa.
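The repeated-clustering step can be sketched on one-dimensional intensities as below; the mask-matching step is omitted. The minimum foreground size, the use of a single intensity per pixel, and the function names are assumptions for illustration.

```python
def two_means_1d(values, max_iter=100):
    """Split intensities into (low, high) groups with 1-D two-cluster k-means."""
    lo, hi = min(values), max(values)
    for _ in range(max_iter):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return low, high

def hybrid_grow(foreground, background, min_foreground):
    """Move the brighter half of the background into the foreground until
    the foreground has at least min_foreground pixels (hypothetical limit)."""
    fg, bg = list(foreground), list(background)
    while len(fg) < min_foreground and len(bg) > 1:
        low, high = two_means_1d(bg)
        if not high:
            break  # background cannot be split further
        fg += high
        bg = low
    return fg, bg

fg, bg = hybrid_grow([200, 210], [10, 11, 12, 60, 62, 64, 30, 31], min_foreground=5)
# fg now also contains 60, 62 and 64; bg keeps 10, 11, 12, 30 and 31
```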

Markov random field (MRF) modeling for microarray spot segmentation was introduced by Demirkaya et al. (2005). The method models spot foreground and background intensities with exponential distributions. In addition to intensity information, it takes spatial information into account by modeling the labels of neighboring pixels with an MRF. An initial classification into spot foreground and background serves as the starting point for the segmentation, and this initial segmentation affects the final MRF result.
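To illustrate how an exponential intensity model and an MRF prior interact, the sketch below refines an initial labelling with iterated conditional modes (ICM) and a Potts-style penalty `beta` for each 4-neighbor with a different label. ICM, the penalty form and the value of `beta` are assumptions chosen for brevity; Demirkaya et al. may estimate parameters and optimize the labelling differently. Intensities are assumed non-negative.

```python
import math

def mrf_icm(img, init, beta=1.0, n_iter=10):
    """Refine a binary labelling (0 = background, 1 = foreground) with ICM.

    Energy of label k at a pixel = -log Exp(x | mean_k)
        + beta * (number of 4-neighbors whose label differs from k).
    Class means are re-estimated from the current labels at each sweep.
    """
    h, w = len(img), len(img[0])
    lab = [row[:] for row in init]
    for _ in range(n_iter):
        means = []
        for k in (0, 1):
            vals = [img[r][c] for r in range(h) for c in range(w) if lab[r][c] == k]
            means.append(max(sum(vals) / len(vals), 1e-9) if vals else 1.0)
        changed = False
        for r in range(h):
            for c in range(w):
                energies = []
                for k in (0, 1):
                    # Exponential density with mean m: -log p(x) = log m + x / m
                    e = math.log(means[k]) + img[r][c] / means[k]
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w and lab[rr][cc] != k:
                            e += beta
                    energies.append(e)
                best = 0 if energies[0] <= energies[1] else 1
                if best != lab[r][c]:
                    lab[r][c] = best
                    changed = True
        if not changed:
            break
    return lab
```

Because the class means come from the current labels, a poor initial classification biases the first sweep, which mirrors the dependence on the initial segmentation noted above.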

The model-based segmentation algorithm (Li et al., 2005) is a two-step method for spot segmentation: model-based clustering of pixel values, followed by spatial extraction of connected components. Model-based clustering forms an initial segmentation into at most three clusters of similar intensity values: the background, the spot with background or artifact, and the spot foreground. The clustering relies on Gaussian mixture models, and the number of clusters is determined from the data using the Bayesian Information Criterion (BIC). Connected component analysis is then used to remove small disconnected clusters, assumed to be artifacts, from the spot foreground pixels. Although the algorithm provides the spot foreground as a separate cluster, we used both the foreground cluster and the spot-with-artifact cluster as the foreground. Otherwise, the model-based segmentation algorithm would mostly have produced missing values and would, as a result, have been excluded from the performance evaluation.
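The spatial step can be sketched as removal of small connected components from a binary foreground mask. The 4-connectivity and the size threshold `min_size` are hypothetical choices for illustration; Li et al. may use a different connectivity or removal rule.

```python
from collections import deque

def remove_small_components(mask, min_size):
    """Keep only 4-connected foreground components with >= min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_size:
                for y, x in component:
                    out[y][x] = True
    return out

mask = [[r < 3 and c < 3 for c in range(6)] for r in range(6)]
mask[5][5] = True                      # an isolated 'artifact' pixel
cleaned = remove_small_components(mask, min_size=2)
```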

In this study, we refer to the segmentation method included in the Matarray toolbox (Wang et al., 2001) for Matlab (Mathworks Inc, MA) as the Matarray method. The algorithm combines spatial and intensity information in segmentation. As in the Mann–Whitney method, a circular target mask is first used to separate all possible foreground pixels from the known background. Pixels inside the target mask with intensity larger than $\mu_{\mathrm{bkg}} + 2\sigma_{\mathrm{bkg}}$, where $\mu_{\mathrm{bkg}}$ and $\sigma_{\mathrm{bkg}}$ denote the local background mean and standard deviation, are considered putative foreground pixels and are used to calculate new spot centroids. Thereafter, the foreground and background pixels of each spot are iteratively redefined. A satisfactory result is normally obtained after two iterations (Wang et al., 2001).
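One thresholding step of this description can be sketched as follows: compute the local background statistics, keep target-mask pixels above $\mu_{\mathrm{bkg}} + 2\sigma_{\mathrm{bkg}}$, and take their centroid as the new spot center. The use of the sample standard deviation and the data layout are assumptions; the Matarray toolbox itself is Matlab code and is not reproduced here.

```python
import statistics

def matarray_putative_foreground(target, local_background):
    """One thresholding step in the spirit of the Matarray description.

    target           -- dict (row, col) -> intensity inside the target mask
    local_background -- list of local background intensities
    Returns (threshold, putative foreground coordinates, centroid or None).
    """
    mu = statistics.mean(local_background)
    sigma = statistics.stdev(local_background)  # sample SD: an assumption
    threshold = mu + 2 * sigma
    fg = [rc for rc, value in target.items() if value > threshold]
    if not fg:
        return threshold, fg, None
    centroid = (sum(r for r, _ in fg) / len(fg), sum(c for _, c in fg) / len(fg))
    return threshold, fg, centroid

thr, fg, centroid = matarray_putative_foreground(
    {(1, 1): 50, (1, 2): 60, (2, 1): 12, (2, 2): 55}, [8, 10, 12, 10])
```

In the iterative scheme, the new centroid would re-center the target mask before the next round of thresholding.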


    3 PERFORMANCE EVALUATION
Evaluating the performance of microarray segmentation algorithms is not straightforward. In traditional image processing, image segmentation and the evaluation of segmentation performance are fundamental problems. Segmentation performance is commonly assessed using expert-segmented test images and suitable error metrics (Zhang, 1996; Sankur and Sezgin, 2004; Jiang et al., 2006). Unfortunately, the high-throughput nature of microarrays makes such an approach unrealistic for this study. A single microarray may contain tens of thousands of spots, making any manual segmentation of test images extremely laborious and error-prone. For this reason, we use as test images a set of synthetic microarray images produced by a microarray simulator (Nykter et al., 2006), for which the necessary ground-truth information is available; that is, each pixel is pre-assigned to either the foreground or the background class.

Although real microarray experiments do not allow direct measurement of segmentation performance, indirect measures make it possible to compare different algorithms. Moreover, it is important to study how the choice of segmentation algorithm affects the characteristics of the resulting data. In this study, we use the properties of replicated experiments to compare segmentation algorithms.

3.1 Simulated experiments
Perhaps the most natural way to evaluate segmentation algorithms is to measure the segmentation error at the pixel level. Hence, we used a microarray simulator (Nykter et al., 2006), designed to produce synthetic microarray data and images with realistic characteristics, to generate an extensive set of test images. For each image in the test set, the ground truth is known; that is, the correct segmentation result is known at the single-pixel level. We simulated two sets of images, 50 good quality images and 50 low quality images, where each simulated microarray consisted of 1000 spots. Overall, we therefore had 50 000 spots from good quality microarrays and the same number from low quality microarrays. The good quality images had low variability in spot sizes and shapes, and a reasonably low noise level. For the low quality images, more irregularities were introduced in spot shapes, and size variability was increased by allowing the spot diameter to vary more than for good quality spots. In addition, both background and foreground noise levels were set significantly higher for the low quality spots. Examples of both image types are available at the companion web page.

To study the pixel-level accuracy of segmentation, we selected two traditional measures: the probability of error and the discrepancy distance (Zhang, 1996). These two measures assess segmentation accuracy from different perspectives. The probability of error counts only the mis-segmented pixels, whereas the discrepancy distance weights each mis-segmented pixel by its spatial distance from the nearest correct segmentation result. In spot segmentation, only one object, the spot, is discriminated from the background. Therefore, the performance measures need not be as complex as, e.g., in the segmentation of natural scenes. First, the probability of error is defined as

$$P(\mathrm{err}) = P(F)\,P(B|F) + P(B)\,P(F|B) \qquad (1)$$
where P(B|F) is the probability of misclassifying foreground as background, P(F|B) the probability of misclassifying background as foreground, and P(F) and P(B) the a priori probabilities of foreground and background in the spots. Second, the discrepancy distance is defined as

$$D = \frac{\sqrt{\sum_{i=1}^{N} d^{2}(i)}}{A} \qquad (2)$$
where N is the number of mis-segmented pixels, d(i) the Euclidean distance from the i-th mis-segmented pixel to the nearest pixel that actually belongs to the class it was assigned to, and A the number of pixels in the image. All simulated images were segmented with the algorithms introduced in Section 2, and the probability of error and the discrepancy distance were measured individually for each segmented spot. The experiment resulted in 50 000 values for both error measures and both image types.
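Both measures can be computed for one spot from a ground-truth mask and a segmentation mask, as in the brute-force sketch below (the function name and mask layout are illustrative). When P(F) and P(B) are estimated from the ground truth of the spot image, Equation (1) reduces to the fraction of mis-segmented pixels N/A: with 4 foreground pixels in a 16-pixel image and one background pixel labelled as foreground, P(err) = (12/16)(1/12) = 1/16.

```python
import math

def segmentation_errors(truth, pred):
    """Probability of error (Eq. 1) and discrepancy distance (Eq. 2) for one spot.

    truth, pred -- 2-D boolean masks (True = foreground) of the same size.
    Priors P(F), P(B) are estimated from the ground truth (assumption).
    """
    h, w = len(truth), len(truth[0])
    area = h * w
    fg = [(r, c) for r in range(h) for c in range(w) if truth[r][c]]
    bg = [(r, c) for r in range(h) for c in range(w) if not truth[r][c]]
    f_as_b = sum(1 for r, c in fg if not pred[r][c])
    b_as_f = sum(1 for r, c in bg if pred[r][c])
    p_b_given_f = f_as_b / len(fg) if fg else 0.0
    p_f_given_b = b_as_f / len(bg) if bg else 0.0
    p_err = (len(fg) / area) * p_b_given_f + (len(bg) / area) * p_f_given_b

    squared = 0.0
    for r in range(h):
        for c in range(w):
            if bool(pred[r][c]) != bool(truth[r][c]):
                # Nearest pixel that truly belongs to the class assigned by pred.
                targets = fg if pred[r][c] else bg
                if targets:
                    squared += min(math.hypot(r - y, c - x) for y, x in targets) ** 2
    return p_err, math.sqrt(squared) / area

truth = [[1 <= r <= 2 and 1 <= c <= 2 for c in range(4)] for r in range(4)]
pred = [row[:] for row in truth]
pred[0][0] = True  # one background pixel segmented as foreground
p_err, disc = segmentation_errors(truth, pred)
```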

3.2 Real experiments
Using real microarray images to evaluate segmentation algorithms poses some challenges. In practice, it is not possible to obtain direct ground-truth information about the images; for example, we cannot know the exact gene expression levels or the pixel-level segmentation results for tens of thousands of spots. Therefore, we measure the performance of the algorithms indirectly and explore how different algorithms affect the resulting data.

The dataset used in this study consists of replicated experiments. Altogether, five replicate hybridizations from one experiment were performed using custom-printed cDNA microarray slides from the same print batch. Briefly, labeling, hybridization and washing were done as follows. Total RNA (50 µg) from the MDA-361 and UACC-812 breast cancer cell lines was labeled with Cy3-dUTP and Cy5-dUTP (Amersham, Piscataway, NJ). The custom-printed cDNA microarrays comprised 11 520 clones from the Incyte Genomics IRAL cDNA library and 1136 clones from the Research Genetics library (Incyte, Palo Alto, CA). Microarrays were printed on poly-L-lysine coated slides using an Omnigrid arrayer (GeneMachines/A1 Biotech) and scanned with a confocal laser scanner (Agilent Technologies, Palo Alto, CA). Replicate experiments are normally used to reduce experimental variation (Dudoit et al., 2002). Because of the replication, each spot should have the same gene expression ratio in all replicated experiments, and the correlations between replicated experiments should therefore be maximal.

All available images were analyzed in the same way. First, all scanned images were gridded using the commercial DeArray software (Scanalytics, Fairfax, VA), and the same gridding information was used for all segmentation algorithms. Although the effect of gridding accuracy on the resulting microarray data is not evaluated in this study, providing the same gridding information to all segmentation algorithms is essential for an unbiased comparison. Second, the images were segmented with each algorithm introduced in Section 2. The treatment of ‘empty’ spots divided the algorithms into two categories: some methods always attempt to segment some pixels into the spot, whereas others allow spots to be totally absent if necessary. Since expression ratios are ambiguous for empty spots, such spots have to be excluded from the analysis. However, if an algorithm excludes all low-intensity spots and segments only good quality spots with high intensity, the performance evaluation risks being biased. Therefore, for further analysis, we selected only spots for which a segmentation result was obtained with all algorithms in all five replicates. Overall, we discarded 1904 spots from each replicate experiment, leaving 10 572 spots for further analysis. The expression level of each included spot was then estimated by subtracting the morphological background estimate (Yang et al., 2002) from the median intensity of the spot. Finally, the resulting data were normalized with the LOWESS method (Cleveland, 1979).
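The spot filtering and quantification steps can be sketched as follows. The background estimate is taken as an input (the morphological estimator itself is not implemented), LOWESS normalization is not shown, and reporting the ratio as log2(Cy5/Cy3) with non-positive corrected intensities treated as missing are assumptions for illustration; all function names are hypothetical.

```python
import math
import statistics

def spot_expression(foreground, background_estimate):
    """Median foreground intensity minus a background estimate (one channel)."""
    return statistics.median(foreground) - background_estimate

def log2_ratio(fg_cy5, bg_cy5, fg_cy3, bg_cy3):
    """log2(Cy5/Cy3) of background-corrected expression; None if not positive."""
    red = spot_expression(fg_cy5, bg_cy5)
    green = spot_expression(fg_cy3, bg_cy3)
    return math.log2(red / green) if red > 0 and green > 0 else None

def spots_segmented_everywhere(nonempty):
    """nonempty: dict (algorithm, replicate) -> set of spot ids with a
    non-empty segmentation. Returns the spots usable for every combination."""
    sets = list(nonempty.values())
    return set.intersection(*sets) if sets else set()

kept = spots_segmented_everywhere({
    ("k-means", 1): {1, 2, 3},
    ("fixed circle", 1): {2, 3},
    ("k-means", 2): {2, 3, 4},
})
# kept == {2, 3}
```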


    4 RESULTS
Microarray simulation allowed us to measure segmentation accuracy at the single-pixel level. The two accuracy measures, the probability of error and the discrepancy distance, obtained by analyzing the image set described in Section 3.1 are summarized in Table 2. The table contains the median probability of error and the median discrepancy distance for both good and low quality images. The results indicate that the k-means algorithm gave nearly error-free segmentation for the good quality images, whereas the Mann–Whitney algorithm produced clearly more erroneous segmentation for the same images. A more detailed view of the distributions of the error measures for both image types and each algorithm is available on the companion web page.


Table 2 The table summarizes the median probability of error and the median discrepancy distance for each algorithm and both types of simulated images

 
The interaction plot in Figure 1 displays segmentation accuracy, represented by the median discrepancy distance, as the dependent variable and image quality as the independent variable. The plot demonstrates how segmentation accuracy is affected by image quality. A similar interaction plot with the probability of error as the dependent variable can be found at the companion web page. As expected, most algorithms perform better on good quality images than on low quality images, the MRF algorithm being the only clear exception. This inverse behaviour arose because the algorithm slightly oversegmented the good quality images in our test set; although the oversegmentation was not visually substantial, it created enough error to produce the inverse effect. The largest degradation in segmentation accuracy was measured for the adaptive circle and Matarray algorithms. This degradation resulted from the increased background noise level, which clearly impaired the ability of both algorithms to discriminate the spot foreground from the background. In contrast, the segmentation accuracy of the fixed circle algorithm did not degrade with image quality, demonstrating that the noise level is not critical to its performance. However, had the variability of spot sizes been increased further in the simulation, the performance of the fixed circle algorithm would naturally have decreased.


Figure 1
Fig. 1 The interaction plot shows how the discrepancy distance of each algorithm changes as the quality of the analyzed images degrades. The largest decrease in segmentation accuracy occurs with the adaptive circle and Matarray algorithms. The fixed circle algorithm preserves the same segmentation accuracy despite the lower image quality. A similar interaction plot for the probability of error is presented at the companion web page.

 
We next turn to the data extracted from the real microarray experiments. First, the relative similarities between the intensity levels extracted from all spots by the different algorithms were studied using hierarchical clustering. The resulting dendrogram is presented in Figure 2. The two algorithms most closely resembling each other are k-means and hybrid k-means. This suggests that, with our test images, the bivalence mask of hybrid k-means did not substantially modify the original k-means segmentation result. Interestingly, the Matarray and model-based segmentation algorithms also produce very similar results, as can also be seen from the results with the low quality simulated images.
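The text does not state the distance measure or linkage used for the dendrogram, so the sketch below assumes 1 minus the Pearson correlation between the algorithms' intensity vectors and average linkage, purely to illustrate how such a dendrogram can be built. The profiles and algorithm names in the usage example are made up.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles):
    """Agglomerative clustering of algorithms by their intensity profiles.

    profiles -- dict algorithm name -> list of spot intensities
    Returns the merges in order as (cluster_a, cluster_b, distance),
    using 1 - Pearson correlation and average linkage (assumptions).
    """
    clusters = {name: [name] for name in profiles}

    def dist(a, b):
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(1 - pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        names = list(clusters)
        d, a, b = min((dist(x, y), x, y)
                      for i, x in enumerate(names) for y in names[i + 1:])
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges

merges = average_linkage({"A": [1, 2, 3, 4], "B": [1.1, 2.0, 3.1, 4.0], "C": [4, 1, 3, 2]})
# The first merge joins the two most similar profiles, A and B.
```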


Figure 2
Fig. 2 Hierarchical clustering of the expression intensities from the replicated experiments reveals similarities between the algorithms. The most similar results are produced by the k-means and hybrid k-means algorithms. The intensities differing most from the others result from the adaptive circle and Markov random field algorithms.

 
The correlation between replicated experiments should be maximal. For each algorithm, we calculated the correlations between the expression ratios of all possible pairs of the analyzed experiments (ten correlation values per algorithm). The pairwise correlations are presented as boxplots in Figure 3A. Most algorithms show reasonably high correlation between the replicates. The highest correlations were obtained with the k-means and hybrid k-means algorithms, whereas the lowest correlations resulted from the Mann–Whitney and adaptive circle algorithms. The variance of the results is lowest for seeded region growing and considerably larger for the adaptive circle, model-based segmentation and Matarray algorithms. Since the correlation coefficient measures only the linear relationship between the replicates, not whether their values are the same, we studied the expression levels in more detail by measuring the pairwise mean absolute error (MAE) between the replicates. Boxplots summarizing the MAE values are presented in Figure 3B. These results clearly support those obtained with correlations. The boxplots demonstrate that the choice of segmentation algorithm has a clear impact on the results.
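The per-algorithm statistics behind the boxplots can be sketched as follows: every pair of replicate expression-ratio vectors yields one correlation and one MAE, so five replicates give C(5, 2) = 10 values of each. The Pearson coefficient is assumed, since the text does not name the correlation type.

```python
import math
from itertools import combinations

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_replicate_stats(replicates):
    """Correlation and mean absolute error for every pair of replicates.

    replicates -- list of equal-length expression-ratio vectors, one per replicate
    """
    corrs, maes = [], []
    for a, b in combinations(replicates, 2):
        corrs.append(pearson(a, b))
        maes.append(sum(abs(u - v) for u, v in zip(a, b)) / len(a))
    return corrs, maes
```

The two measures can disagree: the vectors [1, 2, 3] and [2, 4, 6] are perfectly correlated, yet their MAE is 2.0, which is why MAE is reported alongside correlation.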


Figure 3
Fig. 3 Boxplots illustrating the pairwise (A) correlations and (B) mean absolute errors between all replicates. Higher correlation and lower MAE values can be considered as indications of higher segmentation performance.

 
We used statistical testing to explore significant performance differences between the algorithms. First, we compared all pairwise correlations of the replicated experiments obtained with the different segmentation algorithms globally, using the nonparametric Kruskal–Wallis test. The test indicated a significant difference (p ≈ 0) between the performances of the segmentation algorithms. As Figure 3 illustrates, this result is unsurprising, since a clear dispersion between the results can be observed. We then used multiple pairwise comparisons to investigate which algorithms differ significantly from each other. The multiple testing was carried out using the Wilcoxon rank-sum test with Bonferroni-corrected alpha values. Figure 4 displays the P-values for all pairs of algorithms and the results of hypothesis testing with α = 0.05. As Figure 4 demonstrates, not all algorithms differ significantly from each other. For example, the four algorithms with the highest correlations, namely k-means, hybrid k-means, model-based segmentation and fixed circle, produce results with no significant differences between them. The algorithms diverging most notably from all others are Mann–Whitney and adaptive circle.
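The global test and the Bonferroni correction can be sketched as below. The Kruskal–Wallis H statistic is computed without tie correction. With nine algorithms, H has 8 degrees of freedom, and for even degrees of freedom the chi-square upper tail has the closed form $e^{-x/2}\sum_{j=0}^{m-1}(x/2)^j/j!$ with $m = \mathrm{df}/2$, which the sketch uses instead of a statistics library. Nine algorithms give 36 pairwise comparisons, so each pair is tested at 0.05/36. The pairwise Wilcoxon rank-sum test itself is equivalent to the Mann–Whitney test and is not repeated here; all function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    acc, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        acc += r * r / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * acc - 3 * (n + 1)

def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; closed form valid for even df only."""
    term = total = 1.0
    for j in range(1, df // 2):
        term *= (x / 2) / j
        total += term
    return math.exp(-x / 2) * total

def bonferroni_alpha(alpha, n_algorithms):
    """Per-comparison level for all pairwise comparisons between algorithms."""
    return alpha / (n_algorithms * (n_algorithms - 1) // 2)
```

Usage: with `groups` holding the ten replicate correlations of each of the nine algorithms, `chi2_sf_even_df(kruskal_wallis_h(groups), 8)` gives the global P-value, and each pairwise P-value is compared with `bonferroni_alpha(0.05, 9)`.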


Figure 4
Fig. 4 Identifying significant differences between the correlations of replicate experiments obtained with different algorithms. (A) P-values for multiple comparisons using the Wilcoxon rank-sum test. (B) Results of multiple hypothesis testing with Bonferroni-corrected alpha values (α = 0.05). White rectangles denote pairs of algorithms with no significant difference, and black rectangles pairs with a significant difference in results.

 
The previous results have demonstrated how segmentation performance varies with the algorithm used. It is therefore reasonable to assume that the choice of algorithm also affects the characteristics of the resulting gene expression data. One of the main goals of microarray data analysis is finding differentially expressed genes. We used volcano plots, shown in Figure 5, to demonstrate the effect of the segmentation algorithm on the detection of differential gene expression. The volcano plots, which display the P-value of each gene against its median fold change, show clear differences between the algorithms. For each algorithm, differentially expressed genes are defined as genes with a two-fold change in expression and a low P-value (≤ 0.001) across the replicates (separated by lines in Figure 5). The number of genes identified as differentially expressed varies greatly between the algorithms, the two extremes being the Mann–Whitney and k-means algorithms. Although the true number of differentially expressed genes is not known, the results indicate that the ability to detect differential expression depends heavily on the segmentation algorithm applied.
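The selection rule can be sketched as a simple filter. The statistical test that produced the per-gene P-values is not specified in the text, so P-values are taken as input; the sketch also assumes that expression ratios are given on the log2 scale and that "two-fold change" means an absolute median log2 ratio of at least 1 in either direction.

```python
import math
import statistics

def differentially_expressed(log2_ratios, p_values, fold=2.0, p_max=0.001):
    """Genes passing the volcano-plot thresholds.

    log2_ratios -- dict gene -> list of log2 expression ratios, one per replicate
    p_values    -- dict gene -> P-value across replicates (test not specified)
    """
    threshold = math.log2(fold)
    return [gene for gene, ratios in log2_ratios.items()
            if abs(statistics.median(ratios)) >= threshold and p_values[gene] <= p_max]

ratios = {"g1": [1.2, 1.1, 1.3, 1.0, 1.5],
          "g2": [0.2] * 5,
          "g3": [-2.0, -2.1, -1.9, -2.0, -2.0]}
p = {"g1": 1e-4, "g2": 1e-5, "g3": 0.01}
hits = differentially_expressed(ratios, p)
# g1 passes both thresholds; g2 fails the fold change; g3 fails the P-value
```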


Figure 5
Fig. 5 Volcano plots illustrating the characteristics of the resulting data display the P-value of each gene against its median fold change. When differentially expressed genes are identified as genes with a two-fold change in expression and a low P-value (≤ 0.001) across the replicates, clear differences can be found between the algorithms.

 

    5 CONCLUSION
We have studied the performance of nine microarray segmentation algorithms using simulated and real microarray experiments. The synthetic images allowed us to measure the pixel-level segmentation accuracy of each algorithm, as is commonly done in traditional image processing. For example, we used this information to demonstrate which algorithms are more sensitive to degrading image quality. The real microarray experiments served as examples of how the segmentation process affects the resulting gene expression data. With these data, we showed significant differences in segmentation performance and illustrated clear dissimilarities in the characteristics of the data. In addition, the results for the real images were in accordance with those for the simulated images, further confirming that the simulated experiments gave indicative results about segmentation performance. Overall, our results enable a broad comparison between the presented algorithms and provide possible guidelines for the development of novel microarray segmentation algorithms.

Some algorithms clearly stand out in the results. Although no single algorithm was clearly superior, in light of our results the best-performing algorithms were the k-means and the hybrid k-means. Both of these algorithms segmented the simulated images accurately and provided high correlations and low MAE values for the replicated experiments. Their good performance can be explained by their effective detection of low-intensity spots and spots with abnormal shapes. The results of these two algorithms would likely have diverged more if the images had contained more high-intensity artifacts. The simple fixed circle method also deserves attention. Although it is fundamentally incapable of handling shape variation at the single-spot level, the algorithm proved robust: it performed equally well on both simulated image sets and provided good results in the real experiments. Based on our results, the fixed circle method could be beneficial when the noise level is high. The k-means and fixed circle algorithms represent an interesting trade-off for microarray segmentation: the k-means algorithm, relying solely on intensity information, can identify spots of any shape but is extremely sensitive to high-intensity artifacts, whereas the fixed circle algorithm, constrained by the circularity assumption, is insensitive to all errors except shape variation. Finally, other distinctive results were obtained with the adaptive circle and Mann–Whitney algorithms. On our test images, both algorithms gave rather unstable results. The weak performance of the adaptive circle algorithm probably stems from the failure of matched filtering to detect weak spot edges.
A possible explanation for the poor performance of the Mann–Whitney algorithm is that its segmentation results contain a random component, as explained in Section 2; that is, repeated segmentations of the same image did not always give the same result. This is undesirable, because one of the main motivations for automated image analysis is that the results for the same image are always consistent, regardless of when or where the analysis is performed (cf. images analyzed by humans). In addition, the reproducibility of microarray experiments has raised considerable concern. It is therefore essential that the gene expression levels obtained from the same scanned experiment are also reproducible.
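The trade-off between the k-means and fixed circle approaches discussed above can be illustrated on a toy 7-by-7 spot. The Python sketch below is a deliberately simplified illustration, not the implementations evaluated in this study: it assumes a plain two-cluster, one-dimensional k-means on pixel intensities (initialized at the intensity extremes) and a fixed circle of known radius, and all images, intensities and helper names (`disc`, `render`, `kmeans_2class`, `kmeans_mask`, `mismatches`) are hypothetical. It compares three cases: a clean circular spot, a spot with an irregular bulge, and a circular spot with a single very bright background pixel.

```python
def disc(size, radius):
    """Boolean mask of a disc of the given radius centred in a size x size grid."""
    c = size // 2
    return [[(y - c) ** 2 + (x - c) ** 2 <= radius ** 2 for x in range(size)]
            for y in range(size)]


def render(mask, spot=100, background=10):
    """Turn a boolean mask into a toy intensity image."""
    return [[spot if m else background for m in row] for row in mask]


def kmeans_2class(values, iterations=20):
    """Plain 1-D k-means with two clusters; True marks the bright cluster."""
    lo, hi = min(values), max(values)  # initialise centroids at the extremes
    labels = [False] * len(values)
    for _ in range(iterations):
        labels = [abs(v - hi) < abs(v - lo) for v in values]
        dark_vals = [v for v, b in zip(values, labels) if not b]
        bright_vals = [v for v, b in zip(values, labels) if b]
        new_lo = sum(dark_vals) / len(dark_vals) if dark_vals else lo
        new_hi = sum(bright_vals) / len(bright_vals) if bright_vals else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stable: converged
        lo, hi = new_lo, new_hi
    return labels


def kmeans_mask(img):
    """Segment an image by clustering all pixel intensities into two groups."""
    width = len(img[0])
    labels = kmeans_2class([v for row in img for v in row])
    return [labels[i:i + width] for i in range(0, len(labels), width)]


def mismatches(mask, truth):
    """Number of pixels whose foreground/background label is wrong."""
    return sum(m != t for mrow, trow in zip(mask, truth) for m, t in zip(mrow, trow))


SIZE, RADIUS = 7, 2
circle = disc(SIZE, RADIUS)  # fixed circle: the same shape is used for every spot

clean_truth = disc(SIZE, RADIUS)            # case 1: perfectly circular spot
bulge_truth = [row[:] for row in clean_truth]
for y in (2, 3, 4):                          # case 2: irregular bulge on the right
    bulge_truth[y][6] = True
artifact_img = render(clean_truth)           # case 3: circular spot + bright artifact
artifact_img[0][6] = 250

results = {
    "clean": (mismatches(kmeans_mask(render(clean_truth)), clean_truth),
              mismatches(circle, clean_truth)),
    "bulge": (mismatches(kmeans_mask(render(bulge_truth)), bulge_truth),
              mismatches(circle, bulge_truth)),
    "artifact": (mismatches(kmeans_mask(artifact_img), clean_truth),
                 mismatches(circle, clean_truth)),
}
for case, (km_err, fc_err) in results.items():
    print(f"{case:<8} k-means errors: {km_err:2d}   fixed circle errors: {fc_err:2d}")
```

In this toy setting both methods are exact on the clean spot. The fixed circle misses the three bulge pixels, while k-means follows the irregular shape. With the artifact, k-means assigns the single bright pixel to its own cluster and loses all 13 spot pixels, whereas the fixed circle is unaffected. With a weaker artifact or a different initialization the k-means outcome could differ, so this only mirrors the qualitative behavior described above.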

Generally, our results emphasize the importance of segmentation in microarray experiments. Although significant performance differences were not found between all pairs of algorithms, clear distinctions were still identified. The choice of segmentation algorithm should therefore be regarded as an important factor and made carefully in microarray-based studies. The selection of differentially expressed genes is a good example of how the data obtained with different algorithms can diverge: our results demonstrated clear differences in the data characteristics, especially in the number of genes identified as differentially expressed. In the worst case, such differences may even lead to false biological conclusions.

Finally, even though the general quality of microarrays is likely to improve in the future, the differences between segmentation algorithms should be taken into account in any further analysis of microarray data. When carrying out an experiment, one should at the very least report the algorithm used for segmentation, as suggested by the MIAME standard (Brazma et al., 2001). However, it is very difficult to correct errors originating from faulty segmentation when only information about the algorithm used is available and the original images cannot be re-analyzed. Although the current MIAME standard treats preservation of the original images as optional, we agree with Ahmed et al. (2004) that the original images should always be made publicly available.


    Acknowledgments
 
We would like to thank Dr Sampsa Hautaniemi for insightful discussions about the topic. In addition, we would like to thank Dr Outi Monni and Tuula Airaksinen for providing the microarray experiments for this study. This work was supported by the National Technology Agency of Finland and the Academy of Finland, project No. 213462 (Finnish Centre of Excellence program (2006–2011)), and partially supported by Tampere Graduate School in Information Science and Engineering (TISE).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Joaquin Dopazo

Received on July 21, 2006; revised on September 13, 2006; accepted on September 30, 2006

    REFERENCES
 

    Adams, R. and Bischof, L. (1994) Seeded region growing. IEEE T. Pattern Anal., 16, 641–647.

    Ahmed, A.A., et al. (2004) Microarray segmentation methods significantly influence data precision. Nucleic Acids Res., 32, e50.

    Bajcsy, P. (2006) An overview of DNA microarray grid alignment and foreground separation approaches. EURASIP J. Appl. Si. Pr., 1–13.

    Bozinov, D. and Rahnenführer, J. (2002) Unsupervised technique for robust target separation and analysis of DNA microarray spots through adaptive pixel clustering. Bioinformatics, 18, 747–756.

    Brazma, A., et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet., 29, 365–371.

    Buhler, J., Ideker, T. and Haynor, D. (2000) Dapple: improved techniques for finding spots on DNA microarrays. UWCSE Tech Report UWTR 2000-08-05, Department of Computer Science and Engineering, University of Washington, Seattle, WA, August 2000.

    Chen, Y., et al. (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Opt., 2, 364–374.

    Cleveland, W.S. (1979) Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc., 74, 829–836.

    Demirkaya, O., et al. (2005) Segmentation of cDNA microarray spots using Markov random field modeling. Bioinformatics, 21, 2994–3000.

    Dudoit, S., et al. (2002) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sinica, 12, 111–139.

    Eisen, M.B. (1999) Scanalyze.

    Gottardo, R., et al. (2006) Probabilistic segmentation and intensity estimation for microarray images. Biostatistics, 7, 85–99.

    Jiang, X., et al. (2006) Distance measures for image segmentation evaluation. EURASIP J. Appl. Si. Pr., 1–10.

    Li, Q., et al. (2005) Donuts, scratches and blanks: robust model-based segmentation of microarray images. Bioinformatics, 21, 2875–2882.

    Marshall, E. (2004) Getting the noise out of gene arrays. Science, 306, 630–631.

    Nykter, M., et al. (2006) Simulation of microarray data with realistic characteristics. BMC Bioinformatics, 7.

    Rahnenführer, J. and Bozinov, D. (2004) Hybrid clustering for microarray image analysis combining intensity and shape features. BMC Bioinformatics, 5, 1–11.

    Sankur, B. and Sezgin, M. (2004) A survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging, 13, 146–165.

    Schena, M., et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467–470.

    Tan, P.K., et al. (2003) Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res., 31, 5676–5684.

    Wang, X., et al. (2001) Quantitative quality control in microarray image processing and data acquisition. Nucleic Acids Res., 29, E75.

    Yang, Y.H., et al. (2002) Comparison of methods for image analysis on cDNA microarray data. J. Comput. Graph. Stat., 11, 108–136.

    Yang, Y.H., et al. (2001) Analysis of cDNA microarray images. Brief. Bioinform., 2, 341–349.

    Zhang, W., Shmulevich, I. and Astola, J. (2004) Microarray Quality Control. John Wiley & Sons, Inc., Hoboken, New Jersey.

    Zhang, Y.J. (1996) A survey on evaluation methods for image segmentation. Pattern Recognit., 29, 1335–1346.





