Bioinformatics Advance Access originally published online on January 19, 2008
Bioinformatics 2008 24(6):751-758; doi:10.1093/bioinformatics/btn003
A segmental maximum a posteriori approach to genome-wide copy number profiling
1 The Linnaeus Centre for Bioinformatics, Uppsala University, 751 24 Uppsala, Sweden; 2 Department of Genetics, University of Alabama at Birmingham, Birmingham AL 35294-0024, USA; 3 Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University; 4 Department of Surgical Sciences, Uppsala University Hospital, 751 85 Uppsala, Sweden; and 5 Interdisciplinary Center for Mathematical and Computational Modelling, Warsaw University, 02-106 Warsaw, Poland
*To whom correspondence should be addressed.
Abstract
Motivation: Copy number profiling methods aim to assign DNA copy numbers to chromosomal regions using measurements from microarray-based comparative genomic hybridizations. Among the methods proposed for this purpose, Hidden Markov Model (HMM)-based approaches seem promising because the model naturally captures DNA copy number transitions. Current discrete-index HMM-based approaches do not, however, take into account heterogeneous information on the genomic overlap between clones. Moreover, most existing methods are restricted to chromosome-wise analysis.
Results: We introduce a novel Segmental Maximum A Posteriori approach, SMAP, for DNA copy number profiling. Our method is based on discrete-index Hidden Markov Modeling and incorporates genomic distance and overlap between clones. We exploit a priori information through user-controllable parameterization that enables the identification of copy number deviations of various lengths and amplitudes. The model parameters may be inferred at a genome-wide scale, which avoids the overfitting that often results from chromosome-wise model inference. We report superior performance of SMAP on synthetic data compared with two recent methods. When applied to our new experimental data, SMAP readily recognizes known genetic aberrations, including both large-scale regions with aberrant DNA copy number and changes affecting only single features on the array. We highlight the differences between the predictions of SMAP and those of the compared methods and show that SMAP accurately determines copy number changes and benefits from taking overlap into account.
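As a rough illustration of the general idea only, the sketch below decodes the most probable state path of a small Gaussian-emission HMM in which the probability of keeping the same copy number state decays with the genomic distance between consecutive clones. It is not the SMAP implementation or its parameterization: the function name `viterbi_distance`, the exponential decay form, and the `scale` parameter are all assumptions made for this example, and no parameter estimation is performed.

```python
import math

def viterbi_distance(obs, positions, means, sd, scale):
    """Most probable state path for a toy distance-aware HMM.

    obs       -- log-ratio measurements, one per clone
    positions -- genomic position of each clone (same unit as `scale`)
    means     -- emission mean of each hidden copy number state
    sd        -- shared emission standard deviation (assumed known)
    scale     -- hypothetical decay length: larger values make state
                 changes between nearby clones less likely
    """
    n = len(means)

    def log_emit(x, k):
        # Gaussian log-density of observation x under state k.
        z = (x - means[k]) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

    def log_trans(i, j, dist):
        # Assumed form: the chain "remembers" its state with probability
        # exp(-dist / scale); otherwise it redraws a state uniformly.
        keep = math.exp(-dist / scale)
        p = (1.0 - keep) / n + (keep if i == j else 0.0)
        return math.log(p) if p > 0.0 else float("-inf")

    # Uniform initial distribution over states.
    score = [math.log(1.0 / n) + log_emit(obs[0], k) for k in range(n)]
    back = []
    for t in range(1, len(obs)):
        dist = positions[t] - positions[t - 1]
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n),
                         key=lambda i: score[i] + log_trans(i, j, dist))
            new_score.append(score[best_i] + log_trans(best_i, j, dist)
                             + log_emit(obs[t], j))
            ptr.append(best_i)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Toy data: states 0, 1, 2 stand for loss, normal and gain.
ratios = [0.0, 0.05, -0.1, 0.9, 1.1, 1.0, 0.02, -0.03]
clone_pos = [0, 1, 2, 3, 4, 5, 6, 7]
print(viterbi_distance(ratios, clone_pos, [-1.0, 0.0, 1.0], 0.2, 5.0))
```

In this toy model, two clones at distance zero can never be assigned different states, which is a crude stand-in for the way overlap information might constrain transitions.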
Availability: SMAP is available from Bioconductor and within the Linnaeus Centre for Bioinformatics Data Warehouse.
Contact: Jan.Komorowski{at}lcb.uu.se
Supplementary information: Supplementary data are available at http://www.lcb.uu.se/papers/r_andersson/SMAP/
Associate Editor: Alex Bateman
Received on December 19, 2007; revised on December 19, 2007; accepted on January 2, 2008