Bioinformatics Advance Access originally published online on July 24, 2006
Bioinformatics 2006 22(18):2305-2307; doi:10.1093/bioinformatics/btl367
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

arrayQCplot: software for checking the quality of microarray data

Eun-Kyung Lee, Sung-Gon Yi and Taesung Park*

Department of Statistics, Seoul National University, Seoul, Korea

*To whom correspondence should be addressed.


    ABSTRACT

Summary: arrayQCplot is a software package for the exploratory analysis of microarray data. It focuses on quality control and generates newly developed plots for quality and reproducibility checks. It is developed in R and provides a user-friendly graphical interface for graphics and statistical analysis. Novice users will therefore find arrayQCplot an easy-to-use tool for checking the quality of their data with a simple mouse click.

Availability: arrayQCplot software is available from Bioconductor at http://www.bioconductor.org. A more detailed manual is available at http://bibs.snu.ac.kr/software/arrayQCplot

Contact: tspark{at}stats.snu.ac.kr


    INTRODUCTION
The initial stage of microarray data analysis is usually exploratory. Data quality should be checked before any high-level follow-up analysis. Several approaches have been proposed for checking quality and reproducibility in microarray experiments (Buness et al., 2005; Garrett-Mayer et al., 2004; King et al., 2005; Park et al., 2005). However, current microarray software packages such as arrayMagic (Buness et al., 2005) and MergeMaid (Garrett-Mayer et al., 2004) provide only a few functions for calculating statistics and drawing plots to check quality.

arrayQCplot has been developed for the exploratory analysis of microarray data, and it focuses on checking the data quality by generating a variety of plots. Further, it includes newly developed plots as well as commonly used ones. These new plots provide useful information on the quality and reproducibility of microarray data. Users can easily evaluate their data using these plots. In addition, the software also includes a couple of statistical testing procedures for checking data quality.

The advantage of arrayQCplot is its user-friendly graphical interface and its flexibility when performing exploratory analysis by repeating the same analysis for a subset of a dataset. arrayQCplot is implemented using the statistical software R (Ihaka and Gentleman, 1996). Although several contributed packages based on R exist, it is not easy for a novice user to use them. In comparison, arrayQCplot provides a user-friendly interface that runs on R (Fig. 1). Therefore, novice users can use arrayQCplot to check their data easily and interactively by a simple mouse click. Further, this interface will enable users to select chips of interest for their analysis.


Fig. 1 Main GUI of arrayQCplot. The software includes a menu for data input/output and all functions. It also has windows that display information on experiments, genes and test results.

 
Figure 1 shows several features of arrayQCplot that are accessible through its user interface. The main GUI includes a menu bar comprising various menus and three different sections. Chip List provides a list of chips and their treatment conditions. The chips of interest for drawing plots and checking quality can be selected from this list. Gene List displays a list of genes, and the results of the tests are shown in the Result window. The File menu in the menu bar allows microarray data to be loaded easily.


    QUALITY PLOTS
arrayQCplot provides various basic plots for exploring microarray data. For one-channel data, these include the MvA plot, the boxplot and the scatter plot matrix for the selected chips. When replicates of each treatment level are available, arrayQCplot provides two new types of plots, the chip-wise correlation plot and the summary correlation plot, to visually check for reproducibility (Fig. 2a and b). Both plots are based on the correlations between pairs of chips.
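As an illustration of the MvA plot mentioned above, the following minimal Python sketch computes the plotted quantities under the conventional definition: M is the difference of the log2 intensities of two chips and A is their average. The function name and the intensity values are hypothetical, and the definition is assumed to match the one used by arrayQCplot rather than taken from its source code.

```python
import math

def mva_values(chip_a, chip_b):
    """Return one (M, A) pair per gene for two chips of positive intensities.

    M = log2(a) - log2(b)        (log ratio)
    A = (log2(a) + log2(b)) / 2  (mean log intensity)
    """
    pairs = []
    for a, b in zip(chip_a, chip_b):
        log_a, log_b = math.log2(a), math.log2(b)
        pairs.append((log_a - log_b, (log_a + log_b) / 2))
    return pairs

# Hypothetical intensities for four genes on two replicate chips.
chip_1 = [8.0, 16.0, 4.0, 32.0]
chip_2 = [4.0, 16.0, 8.0, 32.0]

for m, a in mva_values(chip_1, chip_2):
    print(f"M = {m:+.2f}, A = {a:.2f}")
```

For two well-behaved replicates, the M values would be expected to scatter around zero over the whole range of A.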


Fig. 2 Correlation plots. (a) Chip-wise correlation plot. Within-group correlations for most of the chips are between 0.6 and 0.8. Chips 23–27 show different patterns (in this dataset, these are replicates from the same treatment). In particular, the within-group correlations for chip 25 are near 0, which indicates low quality. (b) Summary correlation plot. Most of the chips (circles), except for five (chips 23–27), show high reproducibility: their correlations within treatments are higher than those between treatments. Chip 25 is the outlier in this plot.

 
Let $r^{W}_{ij}$ be the correlations of the j-th chip with the other chips in the i-th treatment group (within-group correlations). Further, let $r^{B}_{ij}$ be the correlations of the j-th chip with the chips in different treatment groups (between-group correlations). The chip-wise correlation plots show the distribution of the within-group correlations for each chip, as shown in Figure 2a. If one chip is more reproducible than the others, its within-group correlations should be relatively larger than those of the other chips.

The summary correlation plot uses two summary correlation coefficients, $\bar{r}^{W}_{ij}$ and $\bar{r}^{B}_{ij}$, which are the averages of all the components of $r^{W}_{ij}$ and $r^{B}_{ij}$, respectively. In Figure 2b, each chip is represented by a dot. The reference line is provided for checking the reproducibility, particularly in the case of specificity. The chips corresponding to the dots that lie in the triangular region above the reference line exhibit a high specificity. On the other hand, the dots lying in the lower triangular region correspond to chips with a low specificity. If a large number of chips exist in this low specificity region, it is difficult to detect the differences between the treatments.
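The two summary coefficients described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Pearson correlation is assumed as the correlation measure, the reference line is assumed to be the line on which the mean within-group and mean between-group correlations are equal, and the function names, chip names and expression values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def summary_correlations(chips, groups):
    """Map each chip to (mean within-group, mean between-group) correlation.

    chips:  chip name -> list of expression values (same gene order)
    groups: chip name -> treatment label
    Assumes at least two chips per treatment and at least two treatments.
    """
    summary = {}
    for j, values in chips.items():
        within = [pearson(values, chips[k]) for k in chips
                  if k != j and groups[k] == groups[j]]
        between = [pearson(values, chips[k]) for k in chips
                   if groups[k] != groups[j]]
        summary[j] = (sum(within) / len(within), sum(between) / len(between))
    return summary

# Hypothetical data: two treatments (A and B) with two replicates each.
chips = {
    "a1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "b1": [5.0, 3.0, 4.0, 1.0, 2.0],
    "b2": [5.2, 2.9, 4.1, 1.2, 1.8],
}
groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

for name, (mean_within, mean_between) in summary_correlations(chips, groups).items():
    # Assumed rule: a chip whose within-group mean does not exceed its
    # between-group mean falls in the low-specificity region.
    flag = "low specificity" if mean_within <= mean_between else "ok"
    print(f"{name}: within = {mean_within:.3f}, between = {mean_between:.3f} ({flag})")
```

Plotting one coefficient against the other for every chip gives one dot per chip, as in the summary correlation plot.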

For two-channel microarray data, arrayQCplot provides two additional quality control plots: the within-slide correlation plot and the diagnostic plot (Park et al., 2005). In the within-slide correlation plot (Fig. 3a), each chip is represented by a line. If there is a strong positive linear relationship between the two original channels, the lines tend to be flat. On the other hand, if there is a non-linear pattern in the two channels, the lines tend to show steep slopes. In the diagnostic plot (Fig. 3b), the dots lying in the lower left corner represent the chips with a strong linear pattern, while those in the upper left corner represent the ones with a weak linear pattern. These plots are particularly useful for detecting outlying chips whose patterns differ from those of the others.


Fig. 3 Quality control plots. (a) Within-slide correlation plot. Each line represents a chip (slide). The y-axis is the correlation coefficient and the x-axis is a tuning parameter, gamma. (b) Diagnostic plot. Each dot represents a chip. The y-axis is the range of the correlation coefficients and the x-axis is 1 minus the mean of the correlation coefficients. The result is similar to that in the correlation plot. From (a) and (b), we can conclude that chip 13 is clearly different from the other chips.
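The diagnostic plot coordinates described in the caption can be illustrated with a short Python sketch. It assumes that each chip's within-slide correlations have already been computed over a grid of gamma values; how the correlation depends on gamma is defined in Park et al. (2005) and is not reproduced here. The function name and all correlation values are hypothetical.

```python
def diagnostic_point(correlations):
    """Return (x, y) for one chip in the diagnostic plot.

    x = 1 - mean of the correlation coefficients
    y = range (max - min) of the correlation coefficients
    """
    mean_r = sum(correlations) / len(correlations)
    return 1 - mean_r, max(correlations) - min(correlations)

# Hypothetical within-slide correlations over three gamma values.
per_chip = {
    "flat line (strong linear pattern)": [0.95, 0.94, 0.95],
    "steep line (non-linear pattern)": [0.90, 0.60, 0.30],
}

for label, values in per_chip.items():
    x, y = diagnostic_point(values)
    print(f"{label}: x = {x:.3f}, y = {y:.3f}")
```

A chip whose line is flat and high lands near the origin, while a chip with a steep line moves away from it on both axes.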

 
In summary, arrayQCplot provides quality control plots such as the chip-wise correlation plot and summary correlation plot for checking reproducibility, and the within-slide correlation plot and diagnostic plot for identifying outlying chips.

arrayQCplot also provides a couple of tests for checking quality. In the correlation-based test, we apply the one-sided Kolmogorov–Smirnov test and the Wilcoxon rank sum test for comparing the within-group correlations between all pairs of treatment combinations. These tests check the reproducibility of a treatment in comparison to that of the other treatments. Another approach, the intensity-based test, provides a tool for checking the gene-wise data quality using the actual intensity values for each gene.
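The correlation-based comparison can be sketched as follows for one pair of treatments. This is an assumption-laden illustration, not the arrayQCplot implementation: it uses the large-sample approximation exp(-2 D^2 nm/(n+m)) for the one-sided Kolmogorov–Smirnov P-value and a normal approximation without tie or continuity correction for the Wilcoxon rank sum test. The function names and correlation values are hypothetical.

```python
import math

def ks_one_sided(x, y):
    """One-sided two-sample Kolmogorov-Smirnov statistic D+ and approximate P-value.

    D+ is the largest amount by which the empirical CDF of x exceeds that of y,
    so a large D+ suggests that the values in x tend to be smaller than those in y.
    """
    n, m = len(x), len(y)
    d_plus = max(
        sum(v <= t for v in x) / n - sum(v <= t for v in y) / m
        for t in sorted(set(x) | set(y))
    )
    d_plus = max(d_plus, 0.0)
    p_value = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_value

def rank_sum_one_sided(x, y):
    """Rank sum test in its Mann-Whitney form, alternative: x tends to be smaller than y.

    U counts the (x, y) pairs with x < y (ties count one half); the P-value
    uses a normal approximation without tie or continuity correction.
    """
    n, m = len(x), len(y)
    u = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)
    z = (u - n * m / 2) / math.sqrt(n * m * (n + m + 1) / 12)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return u, p_value

# Hypothetical within-group correlations for two treatments.
suspect = [0.05, 0.10, 0.02, 0.15]
reference = [0.70, 0.65, 0.80, 0.75]

d_plus, p_ks = ks_one_sided(suspect, reference)
u, p_w = rank_sum_one_sided(suspect, reference)
print(f"KS: D+ = {d_plus:.2f}, approximate P = {p_ks:.4f}")
print(f"Rank sum: U = {u:.1f}, approximate P = {p_w:.4f}")
```

Small P-values from both tests would suggest that the first treatment is less reproducible than the second. With samples as small as in this example, exact P-values would be preferable to the approximations used here.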


    CONCLUSION
Most microarray data comprise thousands of genes from many chips (experiments). In order to check the quality of microarray data, analysts need to use the same procedures repeatedly for various combinations of genes and chips. This requires the use of graphical tools that are flexible and easy to use. arrayQCplot is designed and adapted to microarray data in order to provide informative plots in an easy and convenient manner. It provides newly developed plots as well as well-known plots for checking the quality and reproducibility through a user-friendly graphical interface. arrayQCplot is a useful tool in the first stage of microarray data analysis.


    IMPLEMENTATION
This software runs on R with a few R packages (RGtk, gtkDevice and LPE) for graphics and data analysis. Furthermore, it integrates into the Bioconductor project (Gentleman et al., 2004).


    Acknowledgments
 
This work was supported by the National Research Laboratory Program of Korea Science and Engineering Foundation (M10500000126).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on April 18, 2006; revised on June 13, 2006; accepted on July 2, 2006

    REFERENCES
    Bolstad, B.M., et al. (2003) A comparison of normalization methods for high density oligonucleotide array based on variance and bias. Bioinformatics, 19, 185–193.

    Buness, A., et al. (2005) arrayMagic: two-color cDNA microarray quality control and preprocessing. Bioinformatics, 21, 554–556.

    Garrett-Mayer, E., Parmigiani, G., Zhong, X., Cope, L., Gabrielson, E. (2004) Cross-study validation and combined analysis of gene expression microarray data. Working Papers, Year 2004, Paper 65, Department of Biostatistics, Johns Hopkins University, MD, USA.

    Gentleman, R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.

    Ihaka, R. and Gentleman, R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Stat., 5, 299–314.

    Jain, N., et al. (2003) Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics, 19, 1945–1951.

    King, C., et al. (2005) Reliability and reproducibility of gene expression measurements using amplified RNA from laser-microdissected primary breast tissue with oligonucleotide arrays. J. Mol. Diagn., 7, 57–64.

    Park, T., et al. (2005) Diagnostic plots for detecting outlying slides in a cDNA microarray experiment. BioTechniques, 38, 463–471.

