Bioinformatics Advance Access published online on June 6, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm306
Characterization of Mismatch and High-signal Intensity Probes Associated with Affymetrix Genechips
1SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702, 2Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, 20892, 3Microarray Core Facility, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, 4Current address: Shanghai Institute of Materia Medica, Chinese Academy of Science, Shanghai, 201203, P.R. China
*To whom correspondence should be addressed. Dr. Yonghong Wang, E-mail: wangyong{at}mail.nih.gov
Abstract
Motivation: On Affymetrix microarray platforms, gene expression is determined by computing the difference in signal intensities between Perfect Match (PM) and Mismatch (MM) probes. Although the use of PM probes is not controversial, MM probes have been associated with increased variance and, ultimately, inaccurate gene expression calls. A principal focus of this study was to investigate the nature of MM signal intensities and to demonstrate their contribution to experimental results.
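The PM minus MM comparison described above can be illustrated with a minimal sketch. It assumes plain Python lists of raw intensities for matched probe pairs; the function names and the intensity values are hypothetical, and the plain subtraction is a simplification, not the GCOS signal algorithm, which handles MM>PM pairs differently.

```python
from typing import List, Sequence


def pm_mm_differences(pm: Sequence[float], mm: Sequence[float]) -> List[float]:
    """Return PM - MM for each probe pair (simplified; not the GCOS signal algorithm)."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM intensity vectors must be the same length")
    return [p - m for p, m in zip(pm, mm)]


def mm_greater_than_pm(pm: Sequence[float], mm: Sequence[float]) -> List[int]:
    """Return indices of probe pairs whose MM intensity exceeds the PM intensity."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM intensity vectors must be the same length")
    return [i for i, (p, m) in enumerate(zip(pm, mm)) if m > p]


# Hypothetical intensities for three probe pairs.
pm = [1200.0, 350.0, 800.0]
mm = [300.0, 410.0, 790.0]
print(pm_mm_differences(pm, mm))   # [900.0, -60.0, 10.0]
print(mm_greater_than_pm(pm, mm))  # [1]
```

A negative difference, as for the second pair, is the MM>PM case examined in this study: subtracting such an MM value removes more than the background it is meant to estimate.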
Results: While most MM intensities were likely associated with random noise, a subset of approximately 20% (99,485) of the MM probes displayed higher signal intensities than their corresponding PM probes (MM>PM) in a non-random fashion; 13,440 of these probes showed exceptionally high "outlier" intensities. In addition, 15,938 PM probes showed exceptionally high outlier intensities consistently across all hybridizations. About 92% of the MM>PM probes had either a deoxythymidine (dT) or a deoxycytidine (dC) at the 13th position of the probe sequence. MM and PM probes with extremely high outlier intensities were rich in dC and low in dA at the other nucleotide positions along the 25-mer probe sequence. Differentially expressed genes identified using GeneChip Operating Software (GCOS) or modified PM-only methods were also examined. Of the candidate genes identified by the PM-only method, 157 were called absent by GCOS across all data sets, and many others contained probes with MM>PM signal intensities. Our data suggest that subtracting MM intensity from the PM signal can be a major source of error in the analysis, leading to fewer potentially biologically important candidate genes.
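The sequence features reported above (the base at position 13 and the nucleotide content at the remaining positions of the 25-mer) can be computed with a short sketch. It relies on the standard Affymetrix design, in which an MM probe is identical to its PM partner except that the 13th base is replaced by its complement. The helper names and the example sequence are hypothetical, and this is not the authors' analysis pipeline.

```python
from collections import Counter
from typing import Dict

PROBE_LENGTH = 25
MIDDLE_INDEX = 12  # 13th position, 0-based
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def _validate(seq: str) -> str:
    """Uppercase a probe sequence and check it is a 25-mer of A/C/G/T."""
    seq = seq.upper()
    if len(seq) != PROBE_LENGTH or set(seq) - set("ACGT"):
        raise ValueError("expected a 25-mer containing only A, C, G, T")
    return seq


def middle_base(seq: str) -> str:
    """Return the base at the 13th position of a 25-mer probe."""
    return _validate(seq)[MIDDLE_INDEX]


def mismatch_probe(pm_seq: str) -> str:
    """Build an MM sequence by complementing the 13th base of a PM probe."""
    seq = _validate(pm_seq)
    mid = COMPLEMENT[seq[MIDDLE_INDEX]]
    return seq[:MIDDLE_INDEX] + mid + seq[MIDDLE_INDEX + 1:]


def flanking_composition(seq: str) -> Dict[str, int]:
    """Count A/C/G/T at every position except the 13th."""
    seq = _validate(seq)
    counts = Counter(seq[:MIDDLE_INDEX] + seq[MIDDLE_INDEX + 1:])
    return {base: counts.get(base, 0) for base in "ACGT"}


# Hypothetical 25-mer PM probe sequence.
probe = "ACGT" * 6 + "A"
print(middle_base(probe))           # A
print(mismatch_probe(probe))        # ACGTACGTACGTTCGTACGTACGTA
print(flanking_composition(probe))  # {'A': 6, 'C': 6, 'G': 6, 'T': 6}
```

Counting the flanking positions separately from the 13th base keeps the two kinds of feature distinct: the middle base differs between PM and MM by construction, whereas the flanking composition is shared by both probes of a pair.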
Associate Editor: Dr. Chris Stoeckert
Received on January 18, 2007; revised on May 30, 2007; accepted on June 1, 2007