Bioinformatics Advance Access published online on January 18, 2008
Bioinformatics, doi:10.1093/bioinformatics/btm601
Sparse representation and Bayesian detection of genome copy number alterations from microarray data
a Signal and Image Processing Institute, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, EEB 400, 3740 McClintock Ave, Los Angeles, CA 90089-2564, USA.
b Division of Hematology-Oncology, Childrens Hospital Los Angeles, Department of Pediatrics, Keck School of Medicine, University of Southern California.
*To whom correspondence should be addressed. Dr. Shahab Asgharzadeh, E-mail: shahab{at}chla.usc.edu; Roger Pique-Regi, E-mail: rpique{at}ieee.org
Abstract
Motivation: Genomic instability in cancer leads to abnormal genome copy number alterations (CNA) that are associated with the development and behavior of tumors. Advances in microarray technology have increased the resolution at which DNA copy number changes (amplifications or deletions) can be detected across the genome. However, the larger number of measured signals and the accompanying noise from the array probes make it challenging to identify the breakpoints that define CNA both accurately and quickly. This paper proposes a novel detection technique that uses piece-wise constant (PWC) vectors to represent genome copy number and sparse Bayesian learning (SBL) to detect CNA breakpoints.
Methods: First, a compact linear algebra representation for the genome copy number is developed from normalized probe intensities. Second, SBL is applied and optimized to infer the locations where copy number changes occur. Third, a backward elimination (BE) procedure is used to rank the inferred breakpoints, and a cut-off point in this ranking can be adjusted efficiently to control the false discovery rate (FDR).
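The first and third steps can be illustrated with a minimal sketch. It is not the GADA implementation: the unnormalized step basis, the t-like score, the noise estimate, and every function name (`step_basis`, `estimate_sigma`, `backward_elimination`) are assumptions made for illustration, and the SBL step is replaced by the crude choice of treating every probe position as a candidate breakpoint.

```python
import numpy as np

def step_basis(n):
    """Hypothetical PWC basis: column c is 0 up to probe c and 1 afterwards,
    so x = x[0] + F @ w, where w holds the jump size at each breakpoint."""
    return np.tril(np.ones((n, n - 1)), k=-1)

def estimate_sigma(y):
    """Robust noise estimate from first differences (an assumption, not
    taken from the paper); valid when breakpoints are rare."""
    return np.median(np.abs(np.diff(y))) / (0.6745 * np.sqrt(2))

def backward_elimination(y, breaks, sigma, t_min):
    """Repeatedly drop the weakest breakpoint until all scores reach t_min.
    A breakpoint b means the level changes between probes b and b + 1."""
    y = np.asarray(y, dtype=float)
    breaks = sorted(breaks)
    while breaks:
        bounds = [0] + [b + 1 for b in breaks] + [len(y)]
        segs = [(hi - lo, y[lo:hi].mean())
                for lo, hi in zip(bounds[:-1], bounds[1:])]
        # Score each breakpoint by the scaled difference of adjacent means.
        scores = [abs(m1 - m2) / (sigma * np.sqrt(1 / n1 + 1 / n2))
                  for (n1, m1), (n2, m2) in zip(segs[:-1], segs[1:])]
        weakest = int(np.argmin(scores))
        if scores[weakest] >= t_min:
            break
        del breaks[weakest]
    return breaks

# Simulated log-ratio profile with one amplified region (illustrative only).
rng = np.random.default_rng(0)
truth = np.concatenate([np.zeros(60), np.ones(40), np.zeros(60)])
y = truth + rng.normal(scale=0.2, size=truth.size)

candidates = list(range(y.size - 1))  # stand-in for the SBL output
found = backward_elimination(y, candidates, estimate_sigma(y), t_min=6.0)
print(found)
```

In this sketch, raising `t_min` removes more breakpoints and so trades sensitivity for fewer false discoveries, which is the role the cut-off plays in the BE procedure described above.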
Results: The performance of our algorithm is evaluated using simulated and real genome datasets and compared with existing techniques. Among the methods compared, our approach achieves the highest accuracy and lowest FDR while improving computational speed by several orders of magnitude. The proposed algorithm has been developed into a stand-alone software application (GADA, Genome Alteration Detection Algorithm).
Availability: http://biron.usc.edu/~piquereg/GADA
Contact: rpique{at}ieee.org; shahab{at}chla.usc.edu
Associate Editor: Dr. Chris Stoeckert
Received on May 8, 2007; revised on October 26, 2007; accepted on November 30, 2007