GenXHC: a probabilistic generative model for cross-hybridization compensation in high-density genome-wide microarray data
1Probabilistic and Statistical Inference Group, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4
2Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada M5G 1L6
*To whom correspondence should be addressed.
Motivation: Microarray designs containing millions to hundreds of millions of probes that tile entire genomes are currently being released. Within the next 2 months, our group will release a microarray data set containing over 12 000 000 microarray measurements taken from 37 mouse tissues. A problem that will become increasingly significant in the upcoming era of genome-wide exon-tiling microarray experiments is the removal of cross-hybridization noise. We present a probabilistic generative model for cross-hybridization in microarray data and a corresponding variational learning method for cross-hybridization compensation, GenXHC, that reduces cross-hybridization noise by taking into account multiple sources for each mRNA expression level measurement, as well as prior knowledge of hybridization similarities between the nucleotide sequences of microarray probes and their target cDNAs.
Results: The algorithm is applied to a subset of an exon-resolution genome-wide Agilent microarray data set for chromosome 16 of Mus musculus and is found to produce statistically significant reductions in cross-hybridization noise. The denoised data is found to produce enrichment in multiple gene ontology biological process (GOBP) functional groups. The algorithm is found to outperform robust multi-array analysis, another method for cross-hybridization compensation.
Contact: jim{at}psi.toronto.edu
Received on January 15, 2005; accepted on March 27, 2005