Bioinformatics Advance Access originally published online on June 6, 2007
Bioinformatics 2007 23(16):2088-2095; doi:10.1093/bioinformatics/btm306
Characterization of mismatch and high-signal intensity probes associated with Affymetrix genechips
1SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702, 2Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, 3Microarray Core Facility, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA and 4Current address: Shanghai Institute of Materia Medica, Chinese Academy of Science, Shanghai, 201203, P.R. China
*To whom correspondence should be addressed.
Abstract
Motivation: For Affymetrix microarray platforms, gene expression is determined by computing the difference in signal intensities between perfect match (PM) and mismatch (MM) probesets. Although the use of PM is not controversial, MM probesets have been associated with variance and, ultimately, with inaccurate gene expression calls. A principal focus of this study was to investigate the nature of the MM signal intensities and demonstrate their contribution to the experimental results.
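The PM minus MM differencing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the intensity values are hypothetical, and a plain per-probe subtraction is a simplification, not the algorithm used by the vendor software.

```python
def pm_minus_mm(pm, mm):
    """Per-probe-pair difference PM - MM (hypothetical helper, not a vendor API)."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM must contain the same number of probes")
    return [p - m for p, m in zip(pm, mm)]


def fraction_mm_exceeds_pm(pm, mm):
    """Fraction of probe pairs in which the MM intensity is above the PM intensity."""
    diffs = pm_minus_mm(pm, mm)
    if not diffs:
        return 0.0
    return sum(1 for d in diffs if d < 0) / len(diffs)


# Made-up intensities for five probe pairs (illustrative only, not study data).
pm = [820.0, 150.0, 400.0, 95.0, 610.0]
mm = [210.0, 310.0, 120.0, 90.0, 700.0]

diffs = pm_minus_mm(pm, mm)
frac = fraction_mm_exceeds_pm(pm, mm)
print(diffs)
print(frac)
```

A negative difference marks a probe pair where MM > PM, which is the case the study examines.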
Results: While most MM intensities were likely associated with random noise, a subset of 20% (99 485) of the MM probes displayed higher signal intensities than the corresponding PM probes (MM > PM) in a non-random fashion; 13 440 of these probes demonstrated exceptionally high outlier intensities. About 15 938 PM probes also demonstrated exceptionally high outlier intensities consistently across all hybridizations. About 92% of the MM > PM probes had either a dThymidine (dT) or a dCytidine (dC) at the 13th position of the probe sequence. MM and PM probes displaying extremely high outlier intensities were dC-rich and had low dA content at the other nucleotide positions along the 25mer probe sequence. Differentially expressed genes identified using GeneChip Operating Software (GCOS) or modified PM-only methods were also examined. Of the candidate genes identified by the PM-only method, 157 were designated by GCOS as absent across all datasets, and many others contained probes with MM > PM signal intensities. Our data suggest that subtracting MM intensity from the PM signal can be a major source of error in the analysis, leading to fewer potentially biologically important candidate genes.
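A tally of the base at the 13th position, as reported above for the MM > PM probes, might look like the following minimal sketch. The helper names and the example sequences are hypothetical, and the sketch assumes each probe is supplied as a 25-character DNA string.

```python
from collections import Counter

MIDDLE_INDEX = 12  # 13th base of a 25mer, zero-based
PROBE_LENGTH = 25


def middle_base_counts(sequences):
    """Count which base occupies the 13th position of each 25mer (hypothetical helper)."""
    counts = Counter()
    for seq in sequences:
        if len(seq) != PROBE_LENGTH:
            raise ValueError("expected a 25mer, got length %d" % len(seq))
        counts[seq[MIDDLE_INDEX].upper()] += 1
    return counts


def pyrimidine_fraction(sequences):
    """Fraction of probes with C or T at the 13th position."""
    counts = middle_base_counts(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C"] + counts["T"]) / total


# Made-up 25mers built so that the middle base is easy to see (not real probes).
probes = [
    "A" * 12 + "T" + "G" * 12,
    "C" * 12 + "C" + "A" * 12,
    "G" * 12 + "A" + "T" * 12,
    "T" * 12 + "C" + "C" * 12,
]

counts = middle_base_counts(probes)
frac_ct = pyrimidine_fraction(probes)
print(counts)
print(frac_ct)
```

The same counting approach could be applied to every position of the probe to examine overall base composition, such as the dC and dA content mentioned for the outlier probes.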
Contact: wangyong{at}mail.nih.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Chris Stoeckert
Received on January 18, 2007; revised on May 30, 2007; accepted on June 1, 2007