Bioinformatics Advance Access originally published online on May 19, 2005
Bioinformatics 2005 21(15):3308-3311; doi:10.1093/bioinformatics/bti500
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

CGHAnalyzer: a stand-alone software package for cancer genome analysis using array-based DNA copy number data

Adam A. Margolin 1, Joel Greshock 1, Tara L. Naylor 1, Yael Mosse 2, John M. Maris 2, Graham Bignell 3, Alexander I. Saeed 4, John Quackenbush 4 and Barbara L. Weber 1,*

1Abramson Family Cancer Research Institute, University of Pennsylvania, 514 BRB II/III, 421 Curie Blvd, Philadelphia, PA 19104, USA
2The Children's Hospital of Philadelphia, Philadelphia, PA, USA
3Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
4The Institute for Genomic Research, Rockville, MD, USA

*To whom correspondence should be addressed.


    Abstract

Summary: This synopsis provides an overview of array-based comparative genomic hybridization data display, abstraction and analysis using CGHAnalyzer, a software suite designed specifically for this purpose. CGHAnalyzer can be used to simultaneously load copy number data from multiple platforms, query and describe large, heterogeneous datasets and export results. Additionally, CGHAnalyzer employs a host of algorithms for microarray analysis, including hierarchical clustering and class differentiation.

Availability: CGHAnalyzer, the accompanying manual, documentation and sample data are available for download at http://acgh.afcri.upenn.edu. This is a Java-based application built in the framework of the TIGR MeV that can run on Microsoft Windows, Macintosh OSX and a variety of Unix-based platforms. It requires the installation of the free Java Runtime Environment 1.4.1 (or more recent) (http://www.java.sun.com).

Contact: weberb{at}mail.med.upenn.edu


    INTRODUCTION
Copy number assays
Genome copy number is widely regarded as an important aspect of identifying etiological factors in a range of human diseases (Weber, 2002). Complete and partial non-diploid genomes resulting from cytogenetic alterations have been implicated in the diagnosis of congenital disorders (Milunsky and Huang, 2003) and serve as predictors of clinical outcome in many cancer types (Look et al., 1991). Recently, various widely accessible microarray-based genome copy number measurement assays, including array-based comparative genomic hybridization (aCGH) (Snijders et al., 2001), dual-channel oligonucleotide assays (Lucito et al., 2003) and single-channel ‘SNP chips’ (Affymetrix, Inc.) (Bignell et al., 2004), have provided superior resolution to metaphase CGH using high-throughput platforms.

Most freely available microarray analysis packages are specifically designed to analyze gene expression data (reviewed in Dudoit et al., 2003). Their great utility lies in their ability to load multiple experiments and to provide the descriptive and analytical algorithms necessary to cope with a rapidly growing body of data; however, they are not designed to optimize data extraction and analysis from whole genome copy number datasets. Currently available software designed specifically for copy number assays is limited to single-experiment visualization (Chi et al., 2004), data abstraction from a single platform (Awad, 2004; http://dahlia.stanford.edu:8080/caryoscope/index.html), or copy number breakpoint analysis in a single sample (Eilers and de Menezes, 2005). CGHAnalyzer is a freely available, open source software suite that is designed specifically for analysis of multiple-experiment copy number data. CGHAnalyzer can display detailed copy number profiles for multiple experiments, query large datasets for minimal common regions of aberration, integrate other genomic features with copy number data (e.g. known/predicted genes), conduct higher order analyses such as hierarchical/k-means clustering and perform statistical tests to identify regions that are differentially altered between classes of samples. This software can apply these operations to large numbers of experiments from various platforms simultaneously and display them in a range of customizable interactive views. These views include hotlinks to the Ensembl and UCSC Genome Browser databases.

The primary advantage of CGHAnalyzer over existing packages is the degree of flexibility of visualization and analysis. Estimating copy number changes in a single experiment or a set of experiments can be done with one of several methods provided by the software or one that may be imported from other packages (Myers et al., 2004). Unlike currently available software, CGHAnalyzer employs a generic genome coordinate-based framework that offers probe-independent analyses. This provides the capability of side-by-side analysis of experiments from many platforms and increases the utility of aCGH data in public repositories. Further, CGHAnalyzer's full algorithm set is designed specifically for microarray analysis and presents a wide range of analytical capabilities.


    PROGRAM OVERVIEW
Loading data
To highlight the capabilities of CGHAnalyzer, we obtained six publicly available datasets for demonstrating the steps in configuring, processing and analyzing data (Table 1). These datasets were chosen to (1) compare data from multiple array platforms, (2) query and annotate large datasets and (3) apply commonly used microarray algorithms to copy number datasets. Although more detailed annotations can be incorporated, the minimal information required to use CGHAnalyzer is (1) a probe set with an associated copy number metric and (2) a genome location for each probe. CGHAnalyzer uses a UCSC Human Genome Server-based coordinate system (http://genome.ucsc.edu) that can be configured to retrieve data from a database; however, datasets can also be loaded from a number of standard file formats. Normalized input can include that from a peripheral application employing global normalization procedures (available at http://acgh.afcri.upenn.edu) or output from any number of published protocols (Kim et al., 2005).
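The two minimal inputs named above, a copy number metric and a genome location per probe, can be illustrated with a short Python sketch. The tab-delimited layout, the column names (`name`, `chrom`, `start`, `end`, `log2_ratio`) and the `Probe` class are hypothetical choices made for this illustration; they are not CGHAnalyzer's actual file format or data model.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One array probe: a genome location plus a copy number metric."""
    name: str
    chrom: str
    start: int          # genome coordinate of the probe start (bp)
    end: int            # genome coordinate of the probe end (bp)
    log2_ratio: float   # assumed metric: log2 test/reference ratio


def read_probes(handle):
    """Parse tab-delimited rows with a header line (hypothetical layout)."""
    reader = csv.DictReader(handle, delimiter="\t")
    probes = [
        Probe(row["name"], row["chrom"], int(row["start"]),
              int(row["end"]), float(row["log2_ratio"]))
        for row in reader
    ]
    # Sort by genome position so later steps can walk along each chromosome.
    return sorted(probes, key=lambda p: (p.chrom, p.start))


# Two made-up probes, deliberately listed out of genome order.
example = io.StringIO(
    "name\tchrom\tstart\tend\tlog2_ratio\n"
    "probe_b\tchr10\t5000000\t5150000\t-0.62\n"
    "probe_a\tchr10\t1000000\t1160000\t0.04\n"
)
probes = read_probes(example)
```

Because every record carries its own genome coordinates, probes from different platforms can be placed on the same axis without sharing probe identifiers.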


Table 1 Data were collected from a series of publicly available human genome copy number datasets from several platforms

 
Visualizing data
After data are loaded into CGHAnalyzer, a graphical interface prompts users for the essential data required for display and analysis. Copy number designations for each probe are then made using a user-selected method: either a standard ratio threshold or a P-value derived from a series of reference experiments. The primary graphical window used for all analyses is a series of ideograms that depict the color-coded copy number status for every probe of each sample in an ordered column, a view standard for multiple-sample genome copy number visualization (Fig. 1A). To analyze the copy number of regions not directly covered by an experiment probe set, and for cross-platform analysis where few, if any, probes are common to all samples, CGHAnalyzer uses a copy number assignment algorithm that estimates the copy number status for sequences between probes, allowing estimates for all regions of the entire genome. Aberrant regions of any sample are defined as the sequence extending from an aberrant probe to a neighboring probe of differing copy number. Consequently, cross-experiment analysis has no probe dependence; assays of any platform can be simultaneously scrutinized without the requirement that probes be shared between them.
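The region definition above can be sketched in a few lines of Python. This is one possible reading of the rule, not CGHAnalyzer's implementation: probes are called with a symmetric log2 ratio threshold (the value 0.3 is an arbitrary assumption), consecutive probes with the same call are grouped, and each aberrant group is extended outward to the edges of the flanking probes that have a different call. The function names are hypothetical.

```python
def call_state(log2_ratio, threshold=0.3):
    """Classify one probe with a symmetric ratio threshold (assumed value)."""
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "normal"


def aberrant_regions(probes, threshold=0.3):
    """Return (start, end, state) regions for one chromosome.

    probes: list of (start, end, log2_ratio) tuples sorted by start.
    """
    calls = [call_state(ratio, threshold) for _, _, ratio in probes]
    regions = []
    i = 0
    while i < len(probes):
        # Find the last probe j of the run that shares probe i's call.
        j = i
        while j + 1 < len(probes) and calls[j + 1] == calls[i]:
            j += 1
        if calls[i] != "normal":
            # Extend to the neighboring probes of differing copy number;
            # at a chromosome end, stop at the run's own probe boundary.
            left = probes[i - 1][1] if i > 0 else probes[i][0]
            right = probes[j + 1][0] if j + 1 < len(probes) else probes[j][1]
            regions.append((left, right, calls[i]))
        i = j + 1
    return regions


# Toy coordinates and ratios, invented for illustration only.
example = [
    (0, 100, 0.0),
    (1000, 1100, 0.5),
    (2000, 2100, 0.6),
    (3000, 3100, 0.1),
    (4000, 4100, -0.8),
]
regions = aberrant_regions(example)
```

The output is a list of genome intervals rather than probe identifiers, which is what allows samples from different platforms to be compared directly.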



Fig. 1 Display windows allow the simultaneous visualization of many samples and provide mouse-driven direct access to raw data. (A) An estimated copy number comparison of chromosome 10 for the breast cancer cell line SKBR3 shows a high degree of concordance across four platforms. Regions of identified losses (red, seen on left) and gains (green, seen on right) are shown in separate views along chromosome 10. (B) Scatter plot of the test to reference ratio along chromosome 10 for the breast cancer cell line SKBR3 across four platforms, identifying a localized single copy loss spanning ~4 Mb at 10p12.1 (region A) and a larger region of gain (~50–90 Mb; region B). The distribution of probes between assays can be used to infer a minimal region of aberration in both cases. (C) The five tumor samples appear to have a more highly localized region of amplification than do eight neuroblastoma cell lines. The MYCN amplicon is highlighted in yellow.

 
Copy number assay data for breast cancer cell line SKBR3 from four platforms (Table 1) demonstrate the utility of this protocol. Little probe homology exists between these platforms. Furthermore, the mean direct sequence coverage of one platform by another is 11.3% (σ = 8.6, n = 12), with a maximum of 25.0% of Platform 2 covered by Platform 1. Despite the lack of coverage similarity, full genome copy number estimations of SKBR3 were highly concordant between platforms, with all confirming the large regions of 10q gain (~55–85 Mb) and 10q loss (~90–120 Mb), previously described by non-array-based analysis (Davidson et al., 2000) (Fig. 1A and B). Additionally, all displayed regions are hot-linked directly to both the UCSC Genome Browser and Ensembl.
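A coverage figure of this kind can be computed as the fraction of bases in one platform's probes that also fall inside another platform's probes. The Python sketch below shows one way to do so with half-open intervals on a single chromosome; it is an assumption about how such a number might be derived, not the authors' procedure, and the function names are hypothetical. Coverage is not symmetric, so four platforms yield 4 × 3 = 12 ordered pairs, which would be consistent with the n = 12 reported above.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(target, other):
    """Fraction of bases in `target` probes that also lie in `other` probes."""
    t, o = merge(target), merge(other)
    total = sum(end - start for start, end in t)
    shared = 0
    i = j = 0
    # Sweep both sorted, merged lists once, adding up the overlaps.
    while i < len(t) and j < len(o):
        lo = max(t[i][0], o[j][0])
        hi = min(t[i][1], o[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval finishes first.
        if t[i][1] < o[j][1]:
            i += 1
        else:
            j += 1
    return shared / total if total else 0.0


# Toy probe sets, invented for illustration only.
platform_x = [(0, 100), (200, 300)]
platform_y = [(50, 120)]
x_by_y = covered_fraction(platform_x, platform_y)
y_by_x = covered_fraction(platform_y, platform_x)
```

In this toy case a quarter of `platform_x` is covered by `platform_y`, while most of `platform_y` is covered by `platform_x`, which illustrates why each ordered pair is counted separately.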

Querying data
A detailed knowledge of the most frequent aberrant regions can provide important insight into the underlying disease. Thus, the identification and annotation of these regions is a fundamental operation in genome copy number data analysis. CGHAnalyzer offers a sophisticated query interface for identifying aberrant entities in a dataset. Users can query for the most commonly aberrant probes, genes or customized features. For example, amplification of MYCN (2p24.3), a known clinical prognostic indicator in neuroblastoma (Look et al., 1991), occurs with variable amplicon size (Reiter and Brodeur, 1996). In this example, nine primary neuroblastoma tumors were compared to ten neuroblastoma tumor-derived cell lines (Table 1, Platforms 1 and 4, respectively). The minimal common region of amplification for all samples with an MYCN amplicon (Fig. 1C), obtained from converting all platform data units into genome regions, is a 600 kb region of copy number gain (16.0–16.6 Mb).
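Once every sample's amplicon is expressed as a genome region, a minimal common region reduces to an interval intersection: the largest start and the smallest end across samples. The Python sketch below illustrates that idea under the simplifying assumption of one amplified interval per sample on the same chromosome; the function name and the toy coordinates are hypothetical and do not reproduce the neuroblastoma data.

```python
def minimal_common_region(regions):
    """Intersect one (start, end) amplified interval per sample.

    Returns the shared interval, or None if the samples do not all overlap.
    """
    if not regions:
        return None
    start = max(s for s, _ in regions)
    end = min(e for _, e in regions)
    return (start, end) if start < end else None


# Toy amplicons from three imaginary samples (arbitrary units).
amplicons = [(100, 900), (400, 1200), (300, 700)]
common = minimal_common_region(amplicons)

# A sample whose amplicon lies elsewhere removes the common region.
disjoint = minimal_common_region(amplicons + [(1500, 1800)])
```

Samples with several amplified segments would need a per-segment intersection, which this sketch leaves out.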

Analyzing data
All of the algorithms for data mining implemented in TIGR's TM4 software (Saeed et al., 2003) have been adapted to handle higher order copy number data analysis in CGHAnalyzer. For example, the copy number status of selected gene sets may be meaningful classifiers for specific cancer types. Comparing the copy number status of 512 cancer-related genes (Futreal et al., 2004) among 15 breast tumors, 9 neuroblastoma tumors and 10 neuroblastoma cell lines [Platforms 1 and 3 (breast tumors); 1 and 4 (neuroblastoma tumors and cell lines)] using the T-test algorithm in CGHAnalyzer, which uses a conservative step-down max-T adjustment for multiple comparisons (Dudoit et al., 2003), identified a collection of genes important for class differentiation. Although both of these tumor types have a high rate of copy number variation, of the 42 genes deleted in >25% of either group, only deletion of PPP2R1B, a protein phosphatase at 11q22–24 that is lost in many human cancers (Wang et al., 1998), differed significantly between tumor groups (P=0.002). A deletion of this gene occurred in 76% of breast tumors and 19% of neuroblastoma tumors. A much greater number of gains (126 genes gained in >25% of samples) differentiated groups (n=14, P ≤ 0.005), and as expected, gains of MYCN (2p24.3) in neuroblastoma and PTK2 (8q24) in breast tumors distinguished the groups.
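For readers unfamiliar with the step-down max-T adjustment, the following Python sketch shows the general permutation procedure described by Dudoit et al. (2003). It is a generic illustration, not CGHAnalyzer's code: it assumes a Welch t statistic, two classes with at least two samples each, and full enumeration of label assignments, which is practical only for small sample sizes. The function names and the toy data are hypothetical.

```python
import itertools
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic; 0 or +/-inf if both variances are zero."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def maxt_adjusted_pvalues(data, labels):
    """Step-down max-T adjusted P-values.

    data: {feature: [value per sample]}; labels: 0/1 class per sample.
    """
    features = list(data)
    n = len(labels)
    n_one = sum(labels)

    def abs_stats(group_one):
        stats = {}
        for f in features:
            a = [data[f][i] for i in range(n) if i in group_one]
            b = [data[f][i] for i in range(n) if i not in group_one]
            stats[f] = abs(welch_t(a, b))
        return stats

    observed = abs_stats({i for i in range(n) if labels[i] == 1})
    # Order features from most to least extreme observed statistic.
    order = sorted(features, key=lambda f: observed[f], reverse=True)

    counts = [0] * len(order)
    total = 0
    for combo in itertools.combinations(range(n), n_one):
        perm = abs_stats(set(combo))
        running_max = 0.0
        # Successive maxima, walking from the least extreme feature upward.
        for k in range(len(order) - 1, -1, -1):
            running_max = max(running_max, perm[order[k]])
            if running_max >= observed[order[k]]:
                counts[k] += 1
        total += 1

    # Enforce monotonicity so P-values never decrease down the ordering.
    adjusted = {}
    previous = 0.0
    for k, f in enumerate(order):
        previous = max(previous, counts[k] / total)
        adjusted[f] = previous
    return adjusted


# Toy data: gene_a separates the two classes, gene_b does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
data = {
    "gene_a": [1.0, 1.2, 0.9, 1.1, 0.0, 0.1, -0.1, 0.05],
    "gene_b": [0.1, -0.2, 0.0, 0.3, 0.2, -0.1, 0.1, 0.0],
}
adjusted = maxt_adjusted_pvalues(data, labels)
```

Because each feature is compared against the maximum statistic over the remaining features in every relabeling, the adjustment controls the family-wise error rate, which is why it is described as conservative. For larger datasets the full enumeration would normally be replaced by a random sample of relabelings.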


    CONCLUSIONS
CGHAnalyzer is a free, flexible utility for genome copy number analysis. It increases the accessibility of high-resolution copy number data by applying an assay-independent data structure. For copy number analyses, it provides elementary visualization tools and advanced data-mining algorithms with an interactive, user-friendly interface, functions that are important for studying cytogenetic disorders and cancer genomes.

Conflict of Interest: Barbara L. Weber declares that she is a full time employee of GlaxoSmithKline and will use this software for research sponsored by and internal to this company.

Received on March 14, 2005; revised on May 2, 2005; accepted on May 13, 2005

    REFERENCES

    Awad, I.A.B. (2004) Caryoscope. Stanford, CA: Stanford University.

    Beheshti, B., et al. (2003) Chromosomal localization of DNA amplifications in neuroblastoma tumors using cDNA microarray comparative genomic hybridization. Neoplasia, 5, 53–62.

    Bignell, G.R., et al. (2004) High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res., 14, 287–295.

    Chi, B., et al. (2004) SeeGH—a software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics, 5, 13.

    Davidson, J.M., et al. (2000) Molecular cytogenetic analysis of breast cancer cell lines. Br. J. Cancer, 83, 1309–1317.

    Dudoit, S., et al. (2003) Open source software for the analysis of microarray data. Biotechniques, Suppl., 45–51.

    Dudoit, S., et al. (2003) Multiple hypothesis testing in microarray experiments. Stat. Sci., 18, 71–103.

    Eilers, P.H. and de Menezes, R.X. (2005) Quantile smoothing of array CGH data. Bioinformatics, 21, 1146–1153.

    Futreal, P.A., et al. (2004) A census of human cancer genes. Nat. Rev. Cancer, 4, 177–183.

    Greshock, J., et al. (2004) 1-Mb resolution array-based comparative genomic hybridization using a BAC clone set optimized for cancer gene analysis. Genome Res., 14, 179–187.

    Kim, S.Y., et al. (2005) ArrayCyGHt: a web application for analysis and visualization of arrayCGH data. Bioinformatics, 21, 2554–2555.

    Look, A.T., et al. (1991) Clinical relevance of tumor cell ploidy and N-myc gene amplification in childhood neuroblastoma: a Pediatric Oncology Group study. J. Clin. Oncol., 9, 581–591.

    Lucito, R., et al. (2003) Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy number variation. Genome Res., 13, 2291–2305.

    Milunsky, J.M. and Huang, X.L. (2003) Unmasking Kabuki syndrome: chromosome 8p22–8p23.1 duplication revealed by comparative genomic hybridization and BAC-FISH. Clin. Genet., 64, 509–516.

    Myers, C.L., et al. (2004) Accurate detection of aneuploidies in array CGH and gene expression microarray data. Bioinformatics, 20, 3533–3543.

    Pollack, J.R., et al. (2002) Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc. Natl Acad. Sci. USA, 99, 12963–12968.

    Reiter, J.L. and Brodeur, G.M. (1996) High-resolution mapping of a 130-kb core region of the MYCN amplicon in neuroblastomas. Genomics, 32, 97–103.

    Saeed, A.I., et al. (2003) TM4: a free, open-source system for microarray data management and analysis. Biotechniques, 34, 374–378.

    Snijders, A.M., et al. (2001) Assembly of microarrays for genome-wide measurement of DNA copy number. Nat. Genet., 29, 263–264.

    Wang, S.S., et al. (1998) Alterations of the PPP2R1B gene in human lung and colon cancer. Science, 282, 284–287.

    Weber, B.L. (2002) Cancer genomics. Cancer Cell, 1, 37–47.

