Bioinformatics Vol. 19 no. 14 2003
Pages 1817-1823
© 2003 Oxford University Press
Transformation and normalization of oligonucleotide microarray data
1 Department of Mathematics, Texas A&M University, College Station, TX 77843-3368, USA, 2 Department of Pathology, 3 Department of Biological Chemistry, School of Medicine, and 4 Department of Applied Science and Division of Biostatistics, University of California, Davis, CA 95616, USA
Received on July 23, 2002; revised on February 17, 2003; accepted on April 9, 2003
Motivation: Most methods of analyzing microarray data or doing power calculations have an underlying assumption of constant variance across all levels of gene expression. The most common transformation, the logarithm, results in data that have constant variance at high levels but not at low levels. Rocke and Durbin showed that data from spotted arrays fit a two-component model, and Durbin, Hardin, Hawkins and Rocke; Huber et al.; and Munson provided a transformation that stabilizes the variance as well as symmetrizes and normalizes the error structure. We wish to evaluate the applicability of this transformation to the error structure of GeneChip microarrays.
Results: We demonstrate in an example study a simple way to use the two-component model of Rocke and Durbin and the data transformation of Durbin, Hardin, Hawkins and Rocke; Huber et al.; and Munson on Affymetrix GeneChip data. In addition, we provide a method for normalizing Affymetrix GeneChips simultaneously with the determination of the transformation, producing a data set without chip or slide effects but with constant variance and symmetric errors. This transformation/normalization process can be thought of as a machine calibration, in that it requires a few biologically constant replicates of one sample to determine the constant needed to specify the transformation and to normalize. It is hypothesized that this constant needs to be found only once for a given technology in a lab, perhaps with periodic updates. It does not require extensive replication in each study. Furthermore, the variance of the transformed pilot data can be used to do power calculations using standard power analysis programs.
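As an illustration of the variance-stabilizing idea described above, the following is a minimal sketch, not the authors' SPLUS or C code. It assumes the two-component model in its commonly cited form, y = alpha + mu*exp(eta) + eps, and the generalized-log transformation h(y) = ln(z + sqrt(z^2 + c)) with z = y - alpha and c = var(eps)/var(eta); neither formula is spelled out in this abstract. The function names and all parameter values (alpha = 50, sd_eta = 0.2, sd_eps = 20) are hypothetical and chosen only for the demonstration.

```python
import math
import random
import statistics


def glog(y, alpha, c):
    """Generalized-log transform ln(z + sqrt(z^2 + c)), with z = y - alpha.

    Assumed form; alpha is the background level and c the transformation
    constant. The argument of the log is positive for any z when c > 0.
    """
    z = y - alpha
    return math.log(z + math.sqrt(z * z + c))


def simulate(mu, alpha, sd_eta, sd_eps, n, rng):
    """Draw n intensities from the assumed model y = alpha + mu*exp(eta) + eps."""
    return [
        alpha + mu * math.exp(rng.gauss(0.0, sd_eta)) + rng.gauss(0.0, sd_eps)
        for _ in range(n)
    ]


def glog_sd_by_level(levels, alpha=50.0, sd_eta=0.2, sd_eps=20.0, n=5000, seed=0):
    """Return {mu: standard deviation of glog-transformed replicates}."""
    rng = random.Random(seed)
    c = (sd_eps / sd_eta) ** 2  # assumed constant: var(eps) / var(eta)
    result = {}
    for mu in levels:
        y = simulate(mu, alpha, sd_eta, sd_eps, n, rng)
        result[mu] = statistics.stdev([glog(v, alpha, c) for v in y])
    return result


if __name__ == "__main__":
    # Hypothetical expression levels: unexpressed, low, and high.
    for mu, sd in glog_sd_by_level([0.0, 100.0, 10000.0]).items():
        print(f"mu = {mu:>8.1f}   SD of glog(y) = {sd:.3f}")
```

Under these assumptions, the standard deviation of the transformed values should come out close to sd_eta at every expression level. A plain logarithm could not be applied to the same simulated data at all, because background-corrected intensities near zero expression are often negative.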
Availability: SPLUS code for the transformation/normalization for four replicates is available from the first author upon request. A program written in C is available from the last author.
Contact: geller{at}math.tamu.edu
* To whom correspondence should be addressed.