Bioinformatics Advance Access originally published online on July 24, 2006
Bioinformatics 2006 22(18):2305-2307; doi:10.1093/bioinformatics/btl367
arrayQCplot: software for checking the quality of microarray data
Department of Statistics, Seoul National University, Seoul, Korea
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: arrayQCplot is software for the exploratory analysis of microarray data. It focuses on quality control and generates newly developed plots for quality and reproducibility checks. It is developed in R and provides a user-friendly graphical interface for graphics and statistical analysis. Novice users will therefore find arrayQCplot easy to use for checking the quality of their data with a simple mouse click.
Availability: arrayQCplot software is available from Bioconductor at http://www.bioconductor.org. A more detailed manual is available at http://bibs.snu.ac.kr/software/arrayQCplot
Contact: tspark{at}stats.snu.ac.kr
| INTRODUCTION |
|---|
The initial stage of microarray data analysis is usually exploratory. Data quality should be checked before a high-level follow-up analysis. There have been several approaches for checking quality and reproducibility in microarray experiments (Buness et al., 2005; Garrett-Mayer et al., 2004; King et al., 2005; Park et al., 2005). However, current microarray software packages such as arrayMagic (Buness et al., 2005) and MergeMaid (Garrett-Mayer et al., 2004) provide only a couple of functions for calculating statistics and drawing plots to check quality.
arrayQCplot has been developed for the exploratory analysis of microarray data, and it focuses on checking data quality by generating a variety of plots. It includes newly developed plots as well as commonly used ones. These new plots provide useful information on the quality and reproducibility of microarray data, and users can easily evaluate their data using them. The software also includes a couple of statistical testing procedures for checking data quality.
The advantages of arrayQCplot are its user-friendly graphical interface and its flexibility in exploratory analysis, where the same analysis can be repeated for a subset of a dataset. arrayQCplot is implemented using the statistical software R (Ihaka and Gentleman, 1996). Although several contributed packages based on R exist, it is not easy for a novice user to use them. In comparison, arrayQCplot provides a user-friendly interface that runs on R (Fig. 1). Novice users can therefore use arrayQCplot to check their data easily and interactively with a simple mouse click. Further, this interface enables users to select chips of interest for their analysis.
Figure 1 shows several features of arrayQCplot that are accessible through its user interface. The main GUI of arrayQCplot includes a menu bar comprising various menus and three different sections. Chip List provides a list of chips and their treatment conditions. The chips of interest for drawing plots and checking quality can be selected from this list. Gene List displays a list of genes, and the results of the test are shown in the Result window. The File menu in the menu bar enables microarray data to be loaded easily.
| QUALITY PLOTS |
|---|
arrayQCplot provides various basic plots for exploring microarray data. For one-channel data, these include the MvA plot, boxplot and scatter plot matrix for the selected chips. When replicates of each treatment level are available, arrayQCplot provides two new types of plots, the chip-wise correlation plot and the summary correlation plot, for visually checking reproducibility (Fig. 2a and b). Both plots are based on the correlations between pairs of chips.
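As an illustration of the MvA plot mentioned above, the following minimal Python sketch computes the M (log-ratio) and A (average log-intensity) values for a pair of chips using the usual definition, M = log2(x) - log2(y) and A = (log2(x) + log2(y)) / 2. The function name `mva_values` and the toy intensities are hypothetical and are not taken from arrayQCplot itself.

```python
import math


def mva_values(intensity_x, intensity_y):
    """Return (M, A) lists for two chips measured on the same genes.

    Hypothetical helper, not part of arrayQCplot. Intensities must be
    positive and given in the same gene order for both chips.
    """
    if len(intensity_x) != len(intensity_y):
        raise ValueError("both chips must contain the same number of genes")
    m_values, a_values = [], []
    for x, y in zip(intensity_x, intensity_y):
        log_x, log_y = math.log2(x), math.log2(y)
        m_values.append(log_x - log_y)        # M: log-ratio
        a_values.append((log_x + log_y) / 2)  # A: average log-intensity
    return m_values, a_values


# Toy intensities (made up for illustration only).
chip_1 = [1024.0, 512.0, 256.0, 64.0]
chip_2 = [512.0, 512.0, 1024.0, 64.0]
m, a = mva_values(chip_1, chip_2)
print(m)  # [1.0, 0.0, -2.0, 0.0]
print(a)  # [9.5, 9.0, 9.0, 6.0]
```

Plotting M against A for each pair of selected chips gives the MvA plot; points scattered far from M = 0 indicate genes whose intensities differ strongly between the two chips.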
Let $\mathbf{r}^{W}_{ij}$ be the correlations of the j-th chip with the other chips in the i-th treatment group (within-group correlations). Further, let $\mathbf{r}^{B}_{ij}$ be the correlations of the j-th chip with the chips in different treatment groups (between-group correlations). The chip-wise correlation plots show the distribution of the within-group correlations for each chip, as shown in Figure 2a. If one chip is more reproducible than the others, its within-group correlations should be relatively larger than those of the other chips.
The summary correlation plot uses two summary correlation coefficients, $\bar{r}^{W}_{ij}$ and $\bar{r}^{B}_{ij}$, which are the averages of all the components of $\mathbf{r}^{W}_{ij}$ and $\mathbf{r}^{B}_{ij}$, respectively. In Figure 2b, each chip is represented by a dot. The reference line is provided for checking reproducibility, particularly specificity. The chips corresponding to the dots that lie in the triangular region above the reference line exhibit a high specificity. On the other hand, the dots lying in the lower triangular region correspond to chips with a low specificity. If a large number of chips fall in this low-specificity region, it is difficult to detect differences between the treatments.
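The within-group and between-group correlations described above, and their averages, can be sketched in a few lines of Python. This is a minimal illustration and not the arrayQCplot implementation: the function names and toy data are hypothetical, Pearson correlation is assumed, and the reference line is assumed to be the identity line with the within-group average on the vertical axis.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def chipwise_correlations(chips, groups):
    """Map each chip to (within-group, between-group) correlation lists.

    chips:  dict of chip name -> expression values (same gene order)
    groups: dict of chip name -> treatment label
    """
    result = {}
    for name, values in chips.items():
        within, between = [], []
        for other, other_values in chips.items():
            if other == name:
                continue
            r = pearson(values, other_values)
            if groups[other] == groups[name]:
                within.append(r)
            else:
                between.append(r)
        result[name] = (within, between)
    return result


def summary_correlations(chipwise):
    """Average the within- and between-group correlations of each chip."""
    return {
        name: (sum(w) / len(w), sum(b) / len(b))
        for name, (w, b) in chipwise.items()
    }


# Toy data (made up): two treatments with two replicate chips each.
chips = {
    "A1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "A2": [1.1, 2.0, 2.9, 4.2, 5.0],
    "B1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "B2": [5.2, 3.9, 3.0, 2.1, 0.9],
}
groups = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

chipwise = chipwise_correlations(chips, groups)
summary = summary_correlations(chipwise)

# Assumption: a chip lies "above the reference line" when its average
# within-group correlation exceeds its average between-group correlation.
high_specificity = {name: w > b for name, (w, b) in summary.items()}
```

The lists in `chipwise` correspond to what the chip-wise correlation plot displays per chip, and the pairs in `summary` correspond to the dot coordinates in the summary correlation plot.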
For two-channel microarray data, arrayQCplot provides two additional quality control plots: the within-slide correlation plot and the diagnostic plot (Park et al., 2005). In the within-slide correlation plot (Fig. 3a), each chip is represented by a line. If there is a strong positive linear relationship between the two original channels, the lines tend to be flat. On the other hand, if there is a non-linear pattern in the two channels, the lines tend to show steep slopes. In the diagnostic plot (Fig. 3b), the dots lying in the lower left corner represent the chips with a strong linear pattern, while those in the upper left corner represent the ones with a weak linear pattern. These plots are particularly useful for detecting outlying chips whose patterns differ from those of the others.
In summary, arrayQCplot provides quality control plots such as the chip-wise correlation plot and summary correlation plot for checking reproducibility, and the within-slide correlation plot and diagnostic plot for identifying outlying chips.
arrayQCplot also provides a couple of tests for checking quality. In the correlation-based test, the one-sided Kolmogorov–Smirnov test and the Wilcoxon rank sum test are applied to compare the within-group correlations between all pairs of treatment combinations. These tests check the reproducibility of a treatment in comparison to that of the other treatments. Another approach, the intensity-based test, provides a tool for checking the gene-wise data quality using the actual intensity values for each gene.
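To make the correlation-based test concrete, the sketch below compares the within-group correlations of two treatments with a one-sided two-sample Kolmogorov–Smirnov statistic and a one-sided Wilcoxon rank sum test. It is a plain-Python illustration under stated assumptions, not the code used by arrayQCplot: the function names and toy correlations are hypothetical, both p-values use large-sample approximations (without tie or continuity corrections), and the direction of the alternative (treatment x is less reproducible than treatment y) is chosen for illustration only.

```python
import math


def ks_one_sided(sample_x, sample_y):
    """One-sided two-sample KS test that sample_x tends to be smaller.

    Returns (D_plus, p) where D_plus = max_t [F_x(t) - F_y(t)] and
    p is the asymptotic approximation exp(-2 * m * D_plus**2),
    with m = n_x * n_y / (n_x + n_y).
    """
    n_x, n_y = len(sample_x), len(sample_y)
    d_plus = 0.0
    for t in sorted(set(sample_x) | set(sample_y)):
        cdf_x = sum(v <= t for v in sample_x) / n_x
        cdf_y = sum(v <= t for v in sample_y) / n_y
        d_plus = max(d_plus, cdf_x - cdf_y)
    m = n_x * n_y / (n_x + n_y)
    return d_plus, math.exp(-2.0 * m * d_plus ** 2)


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_less(sample_x, sample_y):
    """One-sided Wilcoxon rank sum test that sample_x tends to be smaller.

    Returns (W, p) where W is the rank sum of sample_x in the pooled
    sample and p = P(Z <= z) under the normal approximation.
    """
    n_x, n_y = len(sample_x), len(sample_y)
    ranks = midranks(list(sample_x) + list(sample_y))
    w = sum(ranks[:n_x])
    mean_w = n_x * (n_x + n_y + 1) / 2
    sd_w = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (w - mean_w) / sd_w
    return w, 0.5 * math.erfc(-z / math.sqrt(2))


# Toy within-group correlations (made up) for two treatments.
within_a = [0.70, 0.72, 0.75, 0.78, 0.80]
within_b = [0.90, 0.92, 0.93, 0.95, 0.97]

d_plus, ks_p = ks_one_sided(within_a, within_b)
w, rank_sum_p = rank_sum_less(within_a, within_b)
```

With these toy values every correlation in treatment A is below every correlation in treatment B, so both tests give small p-values, suggesting that treatment A is less reproducible. For the small sample sizes typical of replicate chips, exact p-values from a statistics library would be preferable to these approximations.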
| CONCLUSION |
|---|
Most microarray data comprise thousands of genes from many chips (experiments). In order to check the quality of microarray data, analysts need to use the same procedures repeatedly for various combinations of genes and chips. This requires the use of graphical tools that are flexible and easy to use. arrayQCplot is designed and adapted to microarray data in order to provide informative plots in an easy and convenient manner. It provides newly developed plots as well as well-known plots for checking the quality and reproducibility through a user-friendly graphical interface. arrayQCplot is a useful tool in the first stage of microarray data analysis.
| IMPLEMENTATION |
|---|
This software runs on R with a few R packages (RGtk, gtkDevice and LPE) for graphics and data analysis. Furthermore, it is integrated into the Bioconductor project (Gentleman et al., 2004).
| Acknowledgments |
|---|
This work was supported by the National Research Laboratory Program of the Korea Science and Engineering Foundation (M10500000126).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on April 18, 2006; revised on June 13, 2006; accepted on July 2, 2006
| REFERENCES |
|---|
Bolstad, B.M., et al. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185–193.
Buness, A., et al. (2005) arrayMagic: two-color cDNA microarray quality control and preprocessing. Bioinformatics, 21, 554–556.
Garrett-Mayer, E., Parmigiani, G., Zhong, X., Cope, L., Gabrielson, E. (2004) Cross-study validation and combined analysis of gene expression microarray data. Working Papers, Year 2004, Paper 65, Department of Biostatistics, Johns Hopkins University, MD, USA.
Gentleman, R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.
Ihaka, R. and Gentleman, R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Stat., 5, 299–314.
Jain, N., et al. (2003) Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics, 19, 1945–1951.
King, C., et al. (2005) Reliability and reproducibility of gene expression measurements using amplified RNA from laser-microdissected primary breast tissue with oligonucleotide arrays. J. Mol. Diagn., 7, 57–64.
Park, T., et al. (2005) Diagnostic plots for detecting outlying slides in a cDNA microarray experiment. BioTechniques, 38, 463–471.