Bioinformatics Advance Access originally published online on January 18, 2007
Bioinformatics 2007 23(6):657-663; doi:10.1093/bioinformatics/btl646
A faster circular binary segmentation algorithm for the analysis of array CGH data
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm.
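To make the quadratic cost concrete, the following is a minimal Python sketch of a full permutation test for a maximal t-statistic over all arcs of a circular sequence. It is an illustration under stated assumptions, not the DNAcopy implementation: the function names, the pooled-variance form of the t-statistic, the number of permutations and the add-one P-value convention are all choices made for this example.

```python
import numpy as np


def max_arc_t_statistic(x):
    """Maximal |t| over all arcs of a circular sequence (illustrative sketch).

    Each arc (markers i+1..j) is compared with its complement using a
    pooled-variance two-sample t-statistic. All O(N^2) arcs are enumerated.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    # Partial sums S_0..S_n of the centered data (S_0 = 0, S_n ~ 0).
    s = np.concatenate(([0.0], np.cumsum(centered)))
    sst = np.sum(centered ** 2)  # total sum of squares

    # All index pairs 0 <= i < j <= n; this is the quadratic step.
    i, j = np.triu_indices(n + 1, k=1)
    k = j - i                    # arc length
    keep = k < n                 # drop the arc that covers the whole circle
    i, j, k = i[keep], j[keep], k[keep]

    arc_sum = s[j] - s[i]
    # Between-group sum of squares for arc versus complement.
    z2 = arc_sum ** 2 * n / (k * (n - k))
    # Guard against a zero (or slightly negative) within-group sum of squares.
    within = np.maximum(sst - z2, np.finfo(float).tiny)
    t = np.sqrt(z2 * (n - 2) / within)
    return t.max()


def permutation_p_value(x, n_perm=200, seed=0):
    """Permutation P-value for the maximal statistic (assumed add-one form)."""
    rng = np.random.default_rng(seed)
    observed = max_arc_t_statistic(x)
    exceed = sum(
        max_arc_t_statistic(rng.permutation(x)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated log-ratios with one gained region in the middle.
    data = np.concatenate(
        [rng.normal(0, 1, 20), rng.normal(3, 1, 10), rng.normal(0, 1, 20)]
    )
    print(permutation_p_value(data))
```

Each call to `max_arc_t_statistic` visits every pair of indices, and the permutation loop repeats that work once per permutation, so the total cost in this sketch grows as the number of permutations times N^2. That is the burden the abstract describes for arrays with tens of thousands of markers.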
Results: We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data.
Availability: An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
Contact: venkatre{at}mskcc.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Chris Stoeckert
Received on June 6, 2006; revised on December 12, 2006; accepted on December 18, 2006