Bioinformatics Advance Access originally published online on June 17, 2009
Bioinformatics 2009 25(17):2149-2156; doi:10.1093/bioinformatics/btp371
A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6
1 Department of Statistics, University of California, Berkeley, USA, 2 Bioinformatics Core Facility, Swiss Institute of Bioinformatics, Lausanne, Switzerland and 3 Bioinformatics Division, Walter & Eliza Hall Institute of Medical Research, Parkville, Australia
* To whom correspondence should be addressed.
Abstract
Motivation: High-resolution copy-number (CN) analysis has gained much attention in recent years, not only for identifying CN aberrations associated with a particular phenotype but also for identifying CN polymorphisms. For such studies to be successful and cost-effective, the statistical methods must be optimized. We propose CRMA v2, a single-array preprocessing method for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is needed only at the last step, when relative CNs are calculated.
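Only the final step of such a pipeline needs data from other arrays. A minimal sketch of that step follows. It assumes the common convention that the relative CN at a locus equals the assumed reference ploidy times the ratio of the array's total-CN signal to a reference signal, with the reference taken as the per-locus median over a pool of reference arrays. The function name `relative_copy_numbers` and its `ploidy` parameter are hypothetical illustrations, not part of the aroma.affymetrix API.

```python
import numpy as np


def relative_copy_numbers(theta, theta_ref_arrays, ploidy=2.0):
    """Hypothetical helper: relative CN C_j = ploidy * theta_j / theta_Rj.

    theta            -- 1-D array, total-CN signals of one array (one per locus),
                        assumed to come from single-array preprocessing.
    theta_ref_arrays -- 2-D array (n_reference_arrays x n_loci) of total-CN
                        signals; the reference is their per-locus median.
    ploidy           -- CN assumed for the reference (2 for a diploid pool).
    """
    theta = np.asarray(theta, dtype=float)
    # Robust per-locus reference signal across the reference pool.
    theta_ref = np.median(np.asarray(theta_ref_arrays, dtype=float), axis=0)
    return ploidy * theta / theta_ref


# Usage with toy numbers: three loci, three reference arrays.
refs = [[100, 100, 100], [90, 110, 100], [110, 90, 100]]
print(relative_copy_numbers([100, 150, 50], refs))
```

Because each array's `theta` is estimated on its own, adding a new array to a study leaves the earlier arrays' estimates unchanged; only a change to the reference pool would alter the ratios.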
Results: As with our method for earlier generations of arrays, this one controls for allelic crosstalk, probe affinities and PCR fragment-length effects. It also corrects for probe-sequence effects and for the co-hybridization of fragments digested by multiple enzymes, which takes place on the latest chips. We compare our method with Affymetrix's CN5 method and the dChip method by assessing how well each differentiates between various CN states at full resolution and at various amounts of smoothing. Although CRMA v2 is a single-array method, we observe that it performs as well as or better than the alternative methods, which use data from all arrays in their preprocessing. This shows that online analysis is possible in large-scale projects where additional arrays are introduced over time.
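One simple way to quantify how well two CN states are separated, at full resolution and after smoothing, is sketched below. It is not the paper's evaluation code. It assumes that smoothing means averaging non-overlapping bins of consecutive loci and that separation is scored by the area under the ROC curve (AUC). The helper names `bin_average` and `auc` are illustrative, and the data are hypothetical simulated CN=1 and CN=2 loci, not results from any of the compared methods.

```python
import numpy as np


def bin_average(x, width):
    """Smooth by averaging non-overlapping bins of `width` consecutive loci.

    Trailing loci that do not fill a complete bin are dropped.
    """
    x = np.asarray(x, dtype=float)
    n = (len(x) // width) * width
    return x[:n].reshape(-1, width).mean(axis=1)


def auc(neg, pos):
    """ROC AUC as P(pos > neg) + 0.5 * P(tie), computed over all pairs."""
    neg = np.asarray(neg, dtype=float)[:, None]
    pos = np.asarray(pos, dtype=float)[None, :]
    return float(np.mean(pos > neg) + 0.5 * np.mean(pos == neg))


# Hypothetical simulated relative CNs for loci in two known CN states.
rng = np.random.default_rng(1)
cn1 = 1.0 + rng.normal(0.0, 0.6, 2000)
cn2 = 2.0 + rng.normal(0.0, 0.6, 2000)

for width in (1, 4, 16):
    score = auc(bin_average(cn1, width), bin_average(cn2, width))
    print(f"bin width {width:2d}: AUC = {score:.3f}")
```

Averaging over wider bins reduces noise, so the AUC typically rises with the bin width, at the cost of spatial resolution.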
Availability: A bounded-memory implementation that can process any number of arrays is available in the open source R package aroma.affymetrix.
Contact: hb{at}stat.berkeley.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: John Quackenbush
Received on November 4, 2008; revised on June 1, 2009; accepted on June 11, 2009