Bioinformatics Advance Access originally published online on July 10, 2007
Bioinformatics 2007 23(18):2493-2494; doi:10.1093/bioinformatics/btm357

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

RefPlus: an R package extending the RMA Algorithm

Chris Harbron 1,*, Kai-Ming Chang 2 and Marie C. South 3

1Statistical Sciences, AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, UK, 2Department of Research, Koo Foundation Sun Yat-Sen Cancer Center, 125 Lihder Road, Taipei 112, Taiwan and 3Cancer Discovery Medicine, AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, UK

*To whom correspondence should be addressed.


    ABSTRACT

Summary: RMA has become a widely used methodology to pre-process Affymetrix gene expression microarrays. A limitation of RMA is that the calculated probeset intensities change when a set of microarrays is re-pre-processed after the inclusion of additional microarrays into the analysis set. Here we report the availability of the RefPlus package containing functions to perform the Extrapolation Strategy and Extrapolation Averaging algorithms which address these issues.

Availability: The software is implemented in the R language and can be downloaded from the Bioconductor project website (http://www.bioconductor.org).

Contact: Chris.Harbron{at}AstraZeneca.Com

Supplementary information: Further details of the workings and evaluation of these functions are given in the documentation available on the Bioconductor website.


    1 INTRODUCTION
It is often necessary to analyse microarray data at one or more interim stages throughout the course of a study. Multiple-microarray pre-processing algorithms for Affymetrix microarrays such as RMA (Irizarry et al., 2003) have the undesirable property that the probeset intensities change when microarrays are re-pre-processed due to the inclusion of additional microarrays. A similar situation can occur when developing and applying prediction or classification models using microarrays. Any new sample that is to be predicted by the model must first be pre-processed, and pre-processing this sample together with the training set used to develop the model will change the probeset intensities of the training microarrays and hence the parameters of the fitted model.
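The effect can be reproduced on a toy example. The sketch below is illustrative only: it applies plain quantile normalization (one of the RMA steps described in Section 2.1) to made-up log2 values, first for two arrays and then again after a hypothetical third array has been added. The function name and the data are assumptions and are not taken from RefPlus.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays to their shared mean quantiles."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Target quantiles: mean of the k-th smallest value across all arrays.
    target = [sum(s[k] for s in sorted_arrays) / len(arrays)
              for k in range(n_probes)]
    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity within this array.
        order = sorted(range(n_probes), key=lambda idx: a[idx])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        normalized.append(out)
    return normalized


# Hypothetical probe intensities for two arrays already in the study.
array_a = [7.0, 9.0, 8.0, 10.0]
array_b = [6.0, 8.5, 7.5, 11.0]
# A hypothetical array that arrives after the interim analysis.
array_c = [9.0, 12.0, 10.0, 13.0]

interim = quantile_normalize([array_a, array_b])
final = quantile_normalize([array_a, array_b, array_c])

# The normalized values reported for array_a differ between the two analyses.
print(interim[0])
print(final[0])
```

Because the target quantiles are averaged over every array in the analysis set, adding array_c shifts the values assigned to array_a even though its raw data are unchanged.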

An extension to RMA, the Extrapolation Strategy, provides a solution to these problems. This method was independently developed by Goldstein (2006) and also by Katz et al. (2006) as refRMA. It avoids having to re-pre-process already pre-processed microarrays when new arrays are added to the data set, but still maintains many of the desirable properties of RMA. RMA is applied to a reference set of microarrays, storing the parameters of the RMA fit. To process additional microarrays, these parameters are directly applied, without any re-estimation, to the new microarrays leaving the gene expression measurements of the reference microarrays unchanged. A similar strategy has also been considered in the PLIER algorithm (Affymetrix, 2005), where a model file fitted by a set of microarrays can be stored and used later.

The use of the RMA algorithm for processing large numbers of microarrays can be limited by available computer memory. One approach is to apply the Extrapolation Strategy, using a subset of microarrays as the reference set and processing the remaining microarrays using the parameters calculated from this reference set. Alternatively, the Extrapolation Averaging algorithm (Goldstein, 2006) gives an improved approximation to RMA by averaging multiple Extrapolation Strategy results over different reference sets.


    2 ALGORITHMS
2.1 RMA
RMA consists of three steps:

  1. Background correction: probe-level data for each microarray are background corrected independently using a probabilistic model.
  2. Quantile normalization: the background corrected probe-level data on each microarray are normalized to a common set of quantiles, derived from background corrected data from all microarrays.
  3. Expression calculation: estimated separately for each probeset using median polish on the linear model:


$$\log_2(N_{ij}) = I_i + P_j + \varepsilon_{ij} \qquad (1)$$

where $I_i$ is the logarithmic intensity for the ith microarray, $N_{ij}$ is the background corrected and normalized intensity of the jth probe of the ith microarray, $P_j$ is the effect of the jth probe in the probeset and $\varepsilon_{ij}$ is an error term.

For further details on the RMA algorithm refer to Irizarry et al. (2003).
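As an illustration of step 3, the following is a minimal sketch of Tukey's median polish applied to a single probeset, assuming the intensities have already been background corrected, normalized and log2 transformed. The function name, the convergence settings and the toy matrix are assumptions made for this example; it is not the code used by RMA implementations such as those in Bioconductor.

```python
from statistics import median


def median_polish(log_matrix, max_iter=10, tol=1e-6):
    """Fit an additive array + probe model by median polish.

    log_matrix[i][j] is the log2 intensity of probe j on array i.
    Returns (array_levels, probe_effects); array_levels[i] includes the
    overall level and so plays the role of I_i in model (1).
    """
    n_rows, n_cols = len(log_matrix), len(log_matrix[0])
    resid = [row[:] for row in log_matrix]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep the row (array) medians out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            for j in range(n_cols):
                resid[i][j] -= row_med[i]
        # Move the median of the probe effects into the overall term.
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep the column (probe) medians out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Move the median of the array effects into the overall term.
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return [overall + r for r in row_eff], col_eff


# Toy probeset: three arrays (rows) by three probes (columns), exactly additive.
toy = [[7.0, 8.0, 9.0],
       [8.0, 9.0, 10.0],
       [9.0, 10.0, 11.0]]
array_levels, probe_effects = median_polish(toy)
print(array_levels, probe_effects)
```

For this exactly additive toy matrix the fit recovers array levels of 8, 9 and 10 and probe effects of -1, 0 and 1. In this sketch the probe effects are centred to have median zero.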

2.2 Extrapolation strategy
The extrapolation strategy divides the set of microarrays into two distinct sets: the reference set, which is used to generate the reference parameters for future processing, and the future set, which comprises all other microarrays and is processed subsequently. The extrapolation strategy consists of four steps:

  1. RMA: RMA is applied to the reference set to obtain the probeset intensities of the reference set microarrays. The reference quantiles and reference probe effects are stored.
  2. Background correction: as in RMA, applied to the future set.
  3. Normalization: the background corrected probe level data from the future microarrays are quantile normalized to the reference quantiles.
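  4. Expression calculation: the probeset intensities of the future microarrays are estimated using model (1) assuming that the probe effects of the future microarrays are the same as the probe effects of the reference set. The estimated logarithmic intensity $I_f$ of a probeset on a future array is:

$$I_f = \operatorname{median}_j \left( \log_2(N_{fj}) - P_j \right) \qquad (2)$$

where $N_{fj}$ is the background corrected and normalized intensity of the jth probe on the future array and $P_j$ is the reference probe effect of the jth probe.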
Figure 1 compares the relationships between variables for RMA and the extrapolation strategy.


Fig. 1. Graphical representations of the RMA and extrapolation strategy algorithms. In both RMA and the extrapolation strategy, the calculated probeset intensities depend on both the normalizing quantiles and the probe effects; given these terms, the calculated probeset intensities of all microarrays are conditionally independent of each other. In RMA, these terms, and therefore the calculated probeset intensities, depend on all of the other microarrays. In the extrapolation strategy, the calculated probeset intensities depend only on the probe intensities from that microarray and on the reference quantiles and reference probe effects, which are calculated only from the reference set of microarrays.
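Steps 3 and 4 of the extrapolation strategy in Section 2.2 can be sketched for a single future array as follows. This is an illustrative sketch under stated assumptions: the stored reference quantiles and probe effects are made-up values, the quantiles are held on the log2 scale for brevity, background correction is taken as already done, and the summary is assumed to be the median over probes of the log2 intensity minus the stored probe effect (the value a median polish sweep gives when the probe effects are held fixed). The function names are hypothetical and do not correspond to the RefPlus API.

```python
from statistics import median


def normalize_to_reference(intensities, reference_quantiles):
    """Replace each probe value by the reference quantile of the same rank."""
    order = sorted(range(len(intensities)), key=lambda idx: intensities[idx])
    out = [0.0] * len(intensities)
    for rank, idx in enumerate(order):
        out[idx] = reference_quantiles[rank]
    return out


def summarize_probeset(normalized, probe_ids, probe_effects):
    """Median over the probeset of (normalized log2 value - probe effect)."""
    return median(normalized[j] - p for j, p in zip(probe_ids, probe_effects))


# Hypothetical parameters stored after fitting RMA to a reference set.
reference_quantiles = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
probesets = {"ps1": [0, 1, 2], "ps2": [3, 4, 5]}
reference_probe_effects = {"ps1": [-1.0, 0.0, 1.0], "ps2": [-0.5, 0.0, 0.5]}

# Hypothetical background corrected log2 intensities of one future array.
future_array = [8.1, 7.2, 9.4, 12.5, 10.3, 11.0]

normalized = normalize_to_reference(future_array, reference_quantiles)
expression = {
    name: summarize_probeset(normalized, ids, reference_probe_effects[name])
    for name, ids in probesets.items()
}
print(normalized)
print(expression)
```

Nothing in this calculation depends on any other future array, nor does it alter the reference arrays, which is the property illustrated in Figure 1.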

 
2.3 Extrapolation averaging
Extrapolation averaging consists of repeated application of the extrapolation strategy using different reference sets and can be described in four steps:
  1. Randomly select n microarrays as a reference set; the remainder of the microarrays form the future set. Here n is the maximum number of microarrays that can be processed in one batch by RMA within the available computer memory.
  2. Apply the extrapolation strategy to this reference and future set.
  3. Repeat steps 1 and 2 several times.
  4. Calculate the probeset intensities as an average on the log2 scale of the gene expression profiles calculated in steps 1–3.

Any additional microarrays can be pre-processed by using the extrapolation strategy to calculate a gene expression profile based on the saved parameters from all of the reference sets and averaging these gene expression profiles across the reference sets.
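The four steps of extrapolation averaging can be sketched end to end on toy data. The code below is a simplified illustration, not the RefPlus implementation: background correction is omitted, quantile normalization is applied directly to made-up log2 values, the median polish runs for a fixed number of sweeps, and all function names, parameters and data are assumptions introduced for this example.

```python
import random
from statistics import mean, median


def to_quantiles(values, quantiles):
    """Replace each value by the reference quantile of the same rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    out = [0.0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = quantiles[rank]
    return out


def median_polish(matrix, n_iter=10):
    """Additive row + column fit; returns (row_levels, column_effects)."""
    resid = [row[:] for row in matrix]
    rows = [0.0] * len(matrix)
    cols = [0.0] * len(matrix[0])
    for _ in range(n_iter):
        for i, r in enumerate(resid):
            m = median(r)
            rows[i] += m
            resid[i] = [x - m for x in r]
        for j in range(len(cols)):
            m = median(r[j] for r in resid)
            cols[j] += m
            for r in resid:
                r[j] -= m
    shift = median(cols)  # centre probe effects at median zero
    return [r + shift for r in rows], [c - shift for c in cols]


def extrapolation_strategy(arrays, probesets, ref_ids):
    """Fit on the reference arrays, then apply fixed parameters to the rest."""
    ref_sorted = [sorted(arrays[i]) for i in ref_ids]
    quantiles = [mean(s[k] for s in ref_sorted) for k in range(len(arrays[0]))]
    normalized = [to_quantiles(a, quantiles) for a in arrays]
    expression = {}
    for name, probe_ids in probesets.items():
        ref_rows = [[normalized[i][j] for j in probe_ids] for i in ref_ids]
        levels, probe_effects = median_polish(ref_rows)
        values = dict(zip(ref_ids, levels))
        for i in range(len(arrays)):
            if i not in values:  # future array: probe effects held fixed
                values[i] = median(normalized[i][j] - p
                                   for j, p in zip(probe_ids, probe_effects))
        expression[name] = [values[i] for i in range(len(arrays))]
    return expression


def extrapolation_averaging(arrays, probesets, n_ref, n_repeats, seed=0):
    """Average extrapolation strategy results over random reference sets."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_repeats):
        ref_ids = rng.sample(range(len(arrays)), n_ref)
        runs.append(extrapolation_strategy(arrays, probesets, ref_ids))
    return {name: [mean(run[name][i] for run in runs)
                   for i in range(len(arrays))]
            for name in probesets}


# Hypothetical log2 intensities: five arrays, six probes, two probesets.
arrays = [
    [7.0, 8.0, 9.0, 10.0, 10.5, 11.0],
    [7.4, 8.3, 9.1, 11.9, 12.4, 13.0],
    [6.8, 7.9, 9.2, 9.6, 10.1, 10.9],
    [7.1, 8.4, 8.8, 12.2, 12.6, 13.1],
    [6.9, 8.1, 9.3, 10.2, 10.4, 11.2],
]
probesets = {"ps1": [0, 1, 2], "ps2": [3, 4, 5]}
averaged = extrapolation_averaging(arrays, probesets, n_ref=3, n_repeats=4, seed=1)
print(averaged)
```

To pre-process further arrays later, as described above, a fuller version would also store the quantiles and probe effects of each reference set rather than discarding them after every repeat.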


    3 CONCLUSIONS
The RMA algorithm has been found to have good performance characteristics in the pre-processing of Affymetrix gene expression data (Irizarry et al., 2006). A limitation of RMA is that the probeset intensities change when the analysis set of microarrays changes. This can be an issue when a study is analysed at interim stages, as the processed data for the same samples will vary between analyses. This property also makes the application of predictive models difficult, as additional microarrays need to be pre-processed to apply the model, but without changing the model parameters. In addition, for large sets of microarrays, available computer memory can limit the use of RMA. The extrapolation strategy and extrapolation averaging algorithms implemented in the R package RefPlus provide an easily applied solution to these issues. An evaluation using the data from Bhattacharjee et al. (2001), showing that the extrapolation strategy and extrapolation averaging algorithms provide a close approximation to RMA even in challenging situations, can be found along with the R package on the Bioconductor website.


    ACKNOWLEDGEMENTS
The authors would like to acknowledge colleagues within AstraZeneca who provided valuable suggestions and comments, and thank the authors of Bhattacharjee et al. (2001) who permitted the use of their microarray data.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: David Rocke

Received on May 25, 2007; revised on June 25, 2007; accepted on July 4, 2007

    REFERENCES

    Affymetrix (2005) Guide to Probe Logarithmic Intensity Error (PLIER) Estimation.

    Bhattacharjee A, et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. USA, 98, 13790–13795.

    Goldstein DR (2006) Partition resampling and extrapolation averaging: approximation methods for quantifying gene expression in large numbers of short oligonucleotide arrays. Bioinformatics, 22, 2364–2372.

    Irizarry RA, et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249–264.

    Irizarry RA, et al. (2006) Comparison of Affymetrix GeneChip expression measures. Bioinformatics, 22, 789–794.

    Katz S, et al. (2006) A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database. BMC Bioinformatics, 7, 464.

