Bioinformatics Advance Access published online on January 19, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm002
Modeling Nonlinearity in Dilution Design Microarray Data
a Department of Mathematical Sciences, b Department of Computer Science, c Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, TX 75083-0688, USA; d Department of Statistics, Iowa State University, Ames, IA 50011, USA; e Microarray Core Facility, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
*To whom correspondence should be addressed. Ying Liu, E-mail: ying.liu{at}utdallas.edu
Abstract
Motivation: The dilution design (mixed-tissue RNA) has been used by some researchers to evaluate the performance of multiple microarray platforms. Current microarray data analysis approaches assume that the quantified signal intensities are linearly related to the expression of the corresponding genes in the sample. However, there are sources of nonlinearity in microarray expression measurements (Ramdas et al., 2001). Studying this nonlinearity in the expression of RNA mixtures provides a new way to analyze gene expression data, and we argue that the nonlinearity can reveal novel information for microarray data analysis. We therefore proposed a statistical model, called the proportion model, which is based on linear regression analysis. To approximately quantify the nonlinearity in the dilution design, a new calibration, the Beta Ratio (BR), was derived from the proportion model. Furthermore, a new adjusted fold change (adj-FC) was proposed to predict the true FC in the absence of nonlinearity, in particular for large FC.
Results: We applied our method to the microarray dilution data set used by Barnes et al. (2005). The experimental results indicated that, to some extent, the significant genes show global biases compared with the linear assumption. Further analysis of the highly expressed genes with significant nonlinearity revealed some promising results; e.g. a "poison" effect was discovered for some genes in RNA mixtures. The adj-FCs of the genes with the "poison" effect indicate that the nonlinearity can also be caused by inherent features of the genes, in addition to signal noise and technical variation. Moreover, when the Percentage of Overlapping Genes (POG) was used as a cross-platform consistency measure (Shi et al., 2005), adj-FC outperformed simple fold change in showing that the Affymetrix and Illumina platforms are consistent.
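As an illustration of the POG measure mentioned above, the following is a minimal sketch in Python, not the authors' R implementation. It assumes a simplified definition: each platform's genes are ranked by absolute log fold change, the top n are selected from each, and POG is the fraction of genes common to both lists. The function names (`top_genes`, `pog`) and the toy fold-change values are hypothetical and serve only to show the arithmetic.

```python
def top_genes(fold_changes, n):
    """Return the set of n gene IDs with the largest absolute log fold change."""
    ranked = sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)
    return set(ranked[:n])


def pog(fc_a, fc_b, n):
    """Fraction of genes shared by the top-n lists of two platforms.

    Assumption: only genes measured on both platforms are ranked, and the
    direction of change is ignored (a simplification for illustration).
    """
    shared = sorted(set(fc_a) & set(fc_b))
    if n <= 0 or n > len(shared):
        raise ValueError("n must be between 1 and the number of shared genes")
    a = {g: fc_a[g] for g in shared}
    b = {g: fc_b[g] for g in shared}
    return len(top_genes(a, n) & top_genes(b, n)) / n


# Hypothetical log2 fold changes for five genes on two platforms.
platform_a = {"g1": 3.0, "g2": -2.5, "g3": 0.2, "g4": 1.1, "g5": -0.1}
platform_b = {"g1": 2.7, "g2": -0.3, "g3": 0.1, "g4": 1.9, "g5": -2.2}

print(pog(platform_a, platform_b, 2))  # top-2 lists share only g1
print(pog(platform_a, platform_b, 3))  # top-3 lists share g1 and g4
```

Under this definition, replacing the simple fold change with another ranking statistic (such as an adjusted fold change) only changes the values passed in; the overlap calculation itself is unchanged.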
Availability: The R code that implements all described methods, and some supplementary data, are freely available from http://www.utdallas.edu/~ying.liu/BetaRatio.htm
Received on September 25, 2006; revised on January 2, 2007; accepted on January 8, 2007