Bioinformatics Advance Access originally published online on December 23, 2008
Bioinformatics 2009 25(3):415-416; doi:10.1093/bioinformatics/btn647

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

arrayQualityMetrics—a bioconductor package for quality assessment of microarray data

Audrey Kauffmann 1,*, Robert Gentleman 2 and Wolfgang Huber 1

1EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and 2Computational Biology - FHCRC, 1100 Fairview Avenue North, Seattle, WA 98109, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: The assessment of data quality is a major concern in microarray analysis. arrayQualityMetrics is a Bioconductor package that provides a report with diagnostic plots for one- or two-colour microarray data. The quality metrics assess reproducibility, identify apparent outlier arrays and compute measures of signal-to-noise ratio. The tool handles most current microarray technologies and is amenable to use in automated analysis pipelines or for automatic report generation, as well as for use by individuals. The diagnosis of quality remains, in principle, a context-dependent judgement, but our tool provides powerful, automated, objective and comprehensive instruments on which to base a decision.

Availability: arrayQualityMetrics is a free and open source package, under the LGPL license, available from the Bioconductor project at www.bioconductor.org. A user's guide and examples are provided with the package. Some examples of HTML reports generated by arrayQualityMetrics can be found at http://www.microarray-quality.org

Contact: audrey{at}ebi.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
As microarray data quality can be affected at each step of the microarray experiment processing (Schuchhardt et al., 2000), quality assessment is an integral part of the analysis. There are freely available tools allowing quality assessment for a specific microarray type, such as Affymetrix (Parman and Halling, 2005), Illumina (Dunning et al., 2007) and two-colour cDNA arrays (Buness et al., 2005). Other free tools are designed to identify a particular problem, such as spot quality (Li et al., 2005) or hybridization quality (Petri et al., 2004). Some tools perform outlier detection from previously computed quality metrics (Freue et al., 2007), or offer interactive quality plots (Lee et al., 2006). We developed a Bioconductor (Gentleman et al., 2004) package, arrayQualityMetrics, with the aim of providing a comprehensive tool that works on all expression arrays and platforms and produces a self-contained report which can be web-delivered. The Supplementary table compares its functionality and scope with those of other Bioconductor packages concerned with quality assessment or outlier detection.


    2 DESCRIPTION
Input: to perform an analysis using the arrayQualityMetrics package, one needs to provide the matrix of microarray intensities and optionally, information about the samples and the probes in a Bioconductor object of class AffyBatch, ExpressionSet, NChannelSet or BeadLevelList. These classes are widely used and well documented. The manner of calling the arrayQualityMetrics function to create a report is the same for all of these classes, and it can be applied to raw array intensities as well as to normalized data. Applied to raw intensities, the quality metrics can help with monitoring experimental procedures and with the choice of normalization procedure; application to the normalized data is more relevant for assessing the utility of the data in downstream analyses.

Individual array quality: the MA-plot allows the evaluation of the dependence between the intensity levels and the distribution of the ratios (Fig. 1a) (Dudoit et al., 2002). For two-colour arrays, a probe's M-value is the log-ratio of the two intensities and the A-value is the mean of their logarithms. In the case of one colour arrays, the M-value is computed by dividing the intensity by the median intensity of the same probe across all arrays. A false colour representation of each array's spatial distribution of feature intensities (Fig. 1b) helps in identifying spatial effects that may be caused by, for example, gradients in the hybridization chamber, air bubbles or printing problems.
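
The M- and A-value definitions above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not code from the arrayQualityMetrics package; the function names and toy intensities are hypothetical, and taking the base-2 logarithm of the one-colour ratio is an assumption made so that both cases are on the same scale.

```python
import math
from statistics import median


def ma_two_colour(red, green):
    """Return (M, A) lists for one two-colour array.

    M is the log2-ratio of the two channel intensities and A is the
    mean of their log2 values, following the definitions in the text.
    """
    m_values = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a_values = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m_values, a_values


def m_one_colour(arrays):
    """Return M-values for one-colour data.

    `arrays` is a list of arrays, each a list of probe intensities in the
    same probe order. Each intensity is divided by the median intensity of
    the same probe across all arrays; the log2 of that ratio is taken
    (an assumption of this sketch).
    """
    n_probes = len(arrays[0])
    probe_medians = [median(a[i] for a in arrays) for i in range(n_probes)]
    return [
        [math.log2(a[i] / probe_medians[i]) for i in range(n_probes)]
        for a in arrays
    ]


# Toy intensities (hypothetical values, for illustration only).
m, a = ma_two_colour(red=[400.0, 1600.0], green=[100.0, 1600.0])
m_single = m_one_colour([[100.0, 200.0], [200.0, 200.0], [400.0, 200.0]])
```

In the two-colour toy data the first probe is four times brighter in the red channel, so its M-value is close to 2, while the second probe has equal intensities and an M-value of 0.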

Homogeneity between arrays: to assess the homogeneity between the arrays, boxplots of the log2 intensities and density estimate plots (Fig. 1c) are presented.

Between array comparison: Figure 1d shows a heatmap of between array distances, computed as the mean absolute difference of the M-value for each pair of arrays:

$$d_{xy} = \operatorname{mean}_i \left| M_{xi} - M_{yi} \right| \qquad (1)$$

where $M_{xi}$ is the M-value of the i-th probe on the x-th array.

Consider the decomposition of $M_{xi}$:

$$M_{xi} = z_i + \beta_{xi} + \varepsilon_{xi} \qquad (2)$$

where $z_i$ is the probe effect for probe i (the same across all arrays), $\varepsilon_{xi}$ are i.i.d. random variables with mean zero and $\beta_{xi}$ is a sparse matrix representing differential expression effects. Under these assumptions, all values $d_{xy}$ are approximately the same and deviations from this can be used to identify outlier arrays. The dendrogram can serve to check if the experiments cluster in accordance with the sample classes.
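
To make the between array distance concrete, the following minimal sketch computes the mean absolute difference of M-values for every pair of arrays and sums each row of the resulting matrix. It is an illustration in Python using only the standard library, not the package's implementation; the function name and the toy M-values are hypothetical.

```python
from statistics import mean


def between_array_distances(m_values):
    """Return the matrix d[x][y] = mean over probes i of |M[x][i] - M[y][i]|.

    `m_values` is a list of arrays, each a list of M-values in the same
    probe order.
    """
    n = len(m_values)
    return [
        [mean(abs(p - q) for p, q in zip(m_values[x], m_values[y]))
         for y in range(n)]
        for x in range(n)
    ]


# Toy M-values (hypothetical): arrays 0 to 2 agree closely, array 3 does not.
m_values = [
    [0.0, 0.1, -0.1, 0.0],
    [0.1, 0.0, 0.0, -0.1],
    [0.0, 0.0, 0.1, 0.1],
    [1.0, -0.9, 1.1, -1.0],
]
d = between_array_distances(m_values)

# An array that is far from all others has a large row sum.
row_sums = [sum(row) for row in d]
most_distant = max(range(len(row_sums)), key=row_sums.__getitem__)
```

With these toy values the distances among the first three arrays are all small and similar, whereas every distance involving the last array is large, so its row sum stands out.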

Affymetrix specific plots: four Affymetrix-specific metrics are evaluated if the input object is an AffyBatch. The RNA degradation plot from the affy package (Gautier et al., 2004), the relative log expression (RLE) boxplots and the normalized unscaled standard error (NUSE) boxplots from the affyPLM package (Brettschneider et al., 2007) and the QC stat plot from the simpleaffy package (Wilson and Miller, 2005) are presented.


Fig. 1. (a) MA-plot for an Agilent microarray. The M-values are not centered on zero, meaning that there is a dependency between the intensities and the log-ratio. (b) Spatial distribution of the background of the green channel for an Illumina chip. There is an abnormal distribution of high intensities at the top border of the array. (c) Density plot of the log-intensities of an Affymetrix set of arrays (E-GEOD-349 ArrayExpress set). The density of one of the arrays is shifted on the x-axis. (d) Heatmap of the ArrayExpress Affymetrix data set E-GEOD-1571. Array 18 is an outlier.

 
Scores: to guide the interpretation of the report, we have included the computation of numeric scores associated with the plots. Outliers are detected on the MA-plot, spatial distributions of the features’ intensities, boxplot, heatmap, RLE and NUSE. The mean of the absolute value of M is computed for each array and those that lie beyond the extremes of the boxplot's whiskers are considered as possible outlier arrays. The same approach, i.e. using the whiskers of the boxplot, is applied to the following: the mean and interquartile range (IQR) from the boxplots and NUSE, the sums of the rows of the distance matrix, and the relative amplitude of low versus high frequency components of the Fourier transform. In the case of the RLE plot, any array with a median RLE higher than 0.1 is considered an outlier.
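
The two scoring rules described above can be sketched as follows. This is a hedged illustration in Python using only the standard library, not the package's code. The text does not state how the whiskers are computed, so the sketch assumes the common boxplot convention of 1.5 times the IQR beyond the quartiles, and it uses interpolated quartiles rather than hinges; the function names and toy scores are hypothetical.

```python
from statistics import quantiles


def whisker_outliers(scores, coef=1.5):
    """Return indices of scores lying beyond boxplot-style whisker limits.

    Assumption: the limits are Q1 - coef * IQR and Q3 + coef * IQR with
    coef = 1.5; the source text does not give the exact rule.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - coef * iqr, q3 + coef * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


def rle_outliers(median_rle, threshold=0.1):
    """Return indices of arrays whose median RLE exceeds the threshold.

    The default threshold of 0.1 is the value given in the text.
    """
    return [i for i, v in enumerate(median_rle) if v > threshold]


# Toy per-array scores (hypothetical), e.g. the mean of |M| for each array.
mean_abs_m = [0.21, 0.19, 0.20, 0.22, 0.18, 0.95]
flagged = whisker_outliers(mean_abs_m)

# Toy median RLE values per array (hypothetical).
median_rle = [0.01, -0.02, 0.15, 0.00]
flagged_rle = rle_outliers(median_rle)
```

The same `whisker_outliers` helper could be applied to any of the per-array summaries listed above, such as the row sums of the distance matrix, since the rule only needs one number per array.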

Report: the metrics are rendered as figures with legends in a detailed report and the scores are used to provide a summary table. Examples of reports are provided at http://www.microarray-quality.org/quality_metrics.html.


    3 CONCLUSION
arrayQualityMetrics supports the quality assessment of many types of microarrays in R. After preparation of the data, a single command line is used to create the report. The main benefits of arrayQualityMetrics are its simplicity of use, the ability to have the same report for different types of platforms, and the opportunity for users or developers to extend it for their needs. This tool can be used for individual data analyses or in routine data production pipelines, to provide fast uniform reporting.


    Acknowledgments
 
We would like to thank the developers of the R and Bioconductor packages that we are using, especially Ben Bolstad, Mark Dunning, Crispin Miller, Gregoire Pau and Deepayan Sarkar.

Funding: EU FP6 (EMERALD, Project no. LSHG-CT-2006-037686 to A.K.). National Institutes of Health (P41HG004059 to R.G.).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: David Rocke

Received on October 1, 2008; revised on December 2, 2008; accepted on December 11, 2008

    References

    Brettschneider J, et al. Quality assessment for short oligonucleotide arrays. arXiv:0710.0178v2 (2007).

    Buness A. arrayMagic: two-colour cDNA microarray quality control and preprocessing. Bioinformatics (2005) 21:554–556.

    Dudoit S. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sinica (2002) 12:111–139.

    Dunning MJ. beadarray: R classes and methods for Illumina bead-based data. Bioinformatics (2007) 23:2183–2184.

    Freue GVC, et al. MDQC: a new quality assessment method for microarrays based on quality control reports. Bioinformatics (2007) 23:3162–3169.

    Gautier L. affy – analysis of Affymetrix GeneChip data at the probe level. Bioinformatics (2004) 20:307–315.

    Gentleman RC. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. (2004) 5:R80.

    Lee E-K, et al. arrayQCplot: software for checking the quality of microarray data. Bioinformatics (2006) 22:2305–2307.

    Li Q. Donuts, scratches and blanks: robust model-based segmentation of microarray images. Bioinformatics (2005) 21:2875–2882.

    Parman C, Halling C. affyQCReport: QC Report Generation for affyBatch objects. (2005) R package version 1.17.0.

    Petri A. Array-a-lizer: a serial DNA microarray quality analyzer. BMC Bioinformatics (2004) 5:12.

    Schuchhardt J. Normalization strategies for cDNA microarrays. Nucleic Acids Res. (2000) 28:E47.

    Wilson CL, Miller CJ. Simpleaffy: a bioconductor package for Affymetrix quality control and data analysis. Bioinformatics (2005) 21:3683–3685.

