Bioinformatics Advance Access originally published online on January 19, 2007
Bioinformatics 2007 23(11):1339-1347; doi:10.1093/bioinformatics/btm002
Modeling nonlinearity in dilution design microarray data
1Department of Mathematical Sciences, 2Department of Computer Science, 3Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, TX 75083-0688, 4Department of Statistics, Iowa State University, Ames, IA 50011 and 5Microarray Core Facility, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Dilution designs (mixed-tissue RNA) have been used by some researchers to evaluate the performance of multiple microarray platforms. Current microarray data analysis approaches assume that the quantified signal intensities are linearly related to the expression of the corresponding genes in the sample. However, there are sources of nonlinearity in microarray expression measurements. Studying this nonlinearity in the expression of RNA mixtures provides a new way to analyze gene expression data, and we argue that the nonlinearity can reveal novel information for microarray data analysis. We therefore proposed a statistical model, called the proportion model, which is based on linear regression analysis. To approximately quantify the nonlinearity in the dilution design, a new calibration, the beta ratio (BR), was derived from the proportion model. Furthermore, a new adjusted fold change (adj-FC) was proposed to predict the true FC in the absence of nonlinearity, in particular for large FCs.
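The linear assumption described above can be illustrated with a minimal sketch. It is not the authors' proportion model or their beta ratio, whose definitions are not given in this abstract; it only shows what "linear in the mixing proportion" would mean for a single gene. The function names, the mixing proportions and the intensity values are all hypothetical.

```python
def expected_linear_mixture(signal_a, signal_b, proportion_a):
    """Signal expected under the linear assumption for a mixture
    that contains proportion_a of sample A and the rest of sample B."""
    return proportion_a * signal_a + (1.0 - proportion_a) * signal_b


def fit_line(xs, ys):
    """Ordinary least squares fit of ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical intensities of one gene at four mixing proportions of sample A
# (illustrative values only, not taken from the paper's dataset).
proportions = [0.0, 0.25, 0.75, 1.0]
observed = [100.0, 180.0, 420.0, 500.0]

# Values the two mixtures would show if the signal were exactly linear.
linear_25 = expected_linear_mixture(observed[-1], observed[0], 0.25)
linear_75 = expected_linear_mixture(observed[-1], observed[0], 0.75)

# Regress the observed signal on the proportion and inspect the residuals;
# a systematic pattern in them hints at nonlinearity for this gene.
intercept, slope = fit_line(proportions, observed)
residuals = [y - (intercept + slope * p) for p, y in zip(proportions, observed)]
```

In this toy example the observed mixture intensities (180 and 420) differ from the values implied by the pure samples (200 and 400), which is the kind of departure from linearity that the paper sets out to quantify.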
Results: We applied our method to one microarray dilution dataset. The experimental results indicated that, to some extent, the significant genes show global biases compared with the linear assumption. Further analysis of the highly expressed genes with significant nonlinearity revealed some promising results; for example, a poison effect was discovered for some genes in RNA mixtures. The adj-FCs of the genes with a poison effect indicate that, besides signal noise and technical variation, nonlinearity can also be caused by inherent features of the genes. Moreover, when the percentage of overlapping genes (POG) was used as a cross-platform consistency measure, adj-FC outperformed simple fold change in showing that the Affymetrix and Illumina platforms are consistent.
Availability: The R code that implements all described methods, together with some Supplementary material, is freely available from http://www.utdallas.edu/~ying.liu/BetaRatio.htm
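As a rough illustration of the POG measure mentioned above, the following sketch uses one common formulation: the share of genes that appear in the top-N lists of both platforms. The exact definition used in the paper may differ (for example, it may also require agreement in the direction of change), and the function name and gene identifiers are hypothetical.

```python
def percentage_of_overlapping_genes(ranked_a, ranked_b, top_n):
    """Percentage of genes shared by the top_n entries of two ranked lists.

    ranked_a and ranked_b are gene identifiers ordered from most to least
    significant on each platform (e.g. by fold change or adj-FC).
    """
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    top_a = set(ranked_a[:top_n])
    top_b = set(ranked_b[:top_n])
    return 100.0 * len(top_a & top_b) / top_n


# Hypothetical rankings of the same five genes on two platforms.
platform_1 = ["g1", "g2", "g3", "g4", "g5"]
platform_2 = ["g2", "g1", "g5", "g3", "g4"]

pog_top3 = percentage_of_overlapping_genes(platform_1, platform_2, 3)
pog_top5 = percentage_of_overlapping_genes(platform_1, platform_2, 5)
```

Under this formulation, a ranking statistic that yields a higher POG across platforms at the same list size would be read as giving more consistent gene lists.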
Contact: ying.liu{at}utdallas.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Received on September 25, 2006; revised on January 2, 2007; accepted on January 8, 2007