

Bioinformatics Advance Access originally published online on May 6, 2005
Bioinformatics 2005 21(14):3193-3194; doi:10.1093/bioinformatics/bti489

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

CGHMultiArray: exact P-values for multi-array comparative genomic hybridization data

Mark A. van de Wiel 1,3,*, Serge J. Smeets 2, Ruud H. Brakenhoff 2 and Bauke Ylstra 3

1Department of Mathematics and Computer Science, Technische Universiteit Eindhoven PO Box 513, 5600 MB, Eindhoven, The Netherlands
2Department of Otolaryngology—Head and Neck Surgery, VU University Medical Center Amsterdam, The Netherlands
3Microarray Core Facility, VU University Medical Center Amsterdam, The Netherlands

*To whom correspondence should be addressed.


    Abstract

Summary: We compute P-values, based on the Wilcoxon test with ties, to compare two conditions with array comparative genomic hybridization data, and we provide a simple interface to export and plot these P-values.

Availability: CGHMultiArray is freely available at http://www.win.tue.nl/~markvdw/CGHMultiArray.html

Contact: m.a.v.d.wiel{at}tue.nl

Supplementary information: Programs, the manual and supplementary information are available on the website.

Array comparative genomic hybridization (array CGH) is applied to the detection of genomic abnormalities in cancer and inheritable DNA copy number aberrations that cause genetic disorders. It is a high-resolution, high-throughput technique that allows for genome-wide measurement of chromosomal DNA copy number changes and determination of the associated breakpoints along the chromosomes (Oostlander et al., 2004).

Software such as aCGHsmooth (Jong et al., 2004) and similar programs (Olshen et al., 2004) enables the visualization and identification of aberrant chromosomal regions for each individual separately. CGH-Miner (Wang et al., 2005) has additional features to summarize alteration information over groups. We developed CGHMultiArray to integrate array CGH data over individuals by computing P-values per clone and visualizing these to find generic patterns. The program deals with the most common situation: the comparison of two conditions.

When choosing a suitable statistic to measure generic DNA copy number changes among individuals, we have to consider the nature of array CGH data. Although technical errors may disperse the measurements somewhat, the underlying data represent discrete levels of genetic aberration. The normal DNA copy number of mammalian clones is two: one copy from each of the paternal and maternal chromosomes. In particular diseases, such as cancer, changes in the DNA copy number with respect to the ‘normal’ value may occur as a ‘deletion’ (at least one copy is lost) or a ‘gain’ (at least one additional copy is present). These non-normal levels may be further detailed, e.g. by including ‘amplification’, which is a high level of copy number gain.

The granularity of (discretized) CGH data makes the t-statistic, or variations thereof, unsuitable. Moreover, the discrete levels possess a natural ordering, which rules out the Fisher exact test. The Wilcoxon test makes explicit use of both features. However, the data naturally contain many ties, i.e. sets of equal observations. The distribution of the Wilcoxon statistic, and consequently the P-values, depends on the tie structure (Hájek et al., 1999). Hence, it has to be re-computed for each new case.

Define the Wilcoxon statistic W as the sum of the mid-ranks assigned to the smaller sample. The observed value of W is denoted by w. The two-sided P-value is then defined by 2P(W ≤ w) if w ≤ E(W), and 2P(W ≥ w) otherwise, where probabilities and expected values are computed under the null hypothesis of equally likely permutations of the mid-ranks. Since the number of tests is of the order of thousands, one needs a fast calculation method. Moreover, asymptotic theory is often not applicable, because the number of biological replicates per condition is small and the presence of ties worsens the accuracy of asymptotic approximations. For example, when both sample sizes equal eight, cases with asymptotic P-value approximations in the range 0.0001–0.05 correspond to exact (true) P-values that are 2–3 times larger, so using the approximations more than doubles the number of false calls.
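
The definition above can be illustrated with a brute-force sketch in Python. It is not the method used by CGHMultiArray: it simply enumerates every way of assigning the pooled mid-ranks to the smaller sample, which is only feasible for small sample sizes. The function names are hypothetical, and capping the doubled tail probability at 1 is an added convention that the text does not state.

```python
from fractions import Fraction
from itertools import combinations


def midranks(values):
    """Return the mid-rank of each value (tied values share the average rank)."""
    order = sorted(values)
    rank_of = {}
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1
        # 1-based positions i+1 .. j are tied; their average is (i + 1 + j) / 2.
        rank_of[order[i]] = Fraction(i + 1 + j, 2)
        i = j
    return [rank_of[v] for v in values]


def exact_wilcoxon_p(sample_a, sample_b):
    """Two-sided exact P-value by enumerating all splits of the pooled mid-ranks."""
    small, large = sorted((sample_a, sample_b), key=len)
    ranks = midranks(list(small) + list(large))
    m = len(small)
    w = sum(ranks[:m])                                # observed statistic
    expected = Fraction(m, len(ranks)) * sum(ranks)   # E(W) under the null
    sums = [sum(c) for c in combinations(ranks, m)]   # all equally likely splits
    if w <= expected:
        tail = sum(1 for s in sums if s <= w)
    else:
        tail = sum(1 for s in sums if s >= w)
    # Assumption: cap at 1, since doubling a tail can exceed 1 when w is near E(W).
    return min(Fraction(1), 2 * Fraction(tail, len(sums)))
```

For example, `exact_wilcoxon_p([-1, -1, 0], [0, 1, 1, 1])` evaluates the statistic on discretized levels with heavy ties. The cost grows with the binomial coefficient of the sample sizes, which is why a faster exact method is needed for thousands of clones.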

Therefore, a fast algorithm to compute exact P-values is needed. The relevance of such algorithms to solve bioinformatics problems was recently shown by Bejerano et al. (2004). We developed the split-up algorithm (van de Wiel, 2001), which suits the requirements well: it is fast, exact and deals with ties. The algorithm represents the probability distribution of the test statistic under the null hypothesis (‘no change between two conditions’) as a generating function. Baglivo et al. (1996) showed that generating functions are powerful tools to represent null distributions of discrete test statistics. We used the generating function introduced by Streitberg and Röhmel (1986). This generating function is a polynomial in product form; expanding it reveals the entire null distribution, but full expansion may be time-consuming. The split-up algorithm splits the product into two parts and requires the expansion of only these two smaller parts, which is several orders of magnitude faster than full expansion. These two results are then efficiently combined to compute the P-value. CGHMultiArray is written in Mathematica (Wolfram, 1999). The basic algorithm is also available as R code and as an executable. To make the algorithm easily accessible, we provide a web implementation too.
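
The idea of splitting the product can be sketched in Python as follows. This is a simplified illustration, not the published split-up algorithm or the CGHMultiArray code. It assumes the generating function has the form of a product of factors (1 + x^r y) over the pooled ranks r, where the power of y tracks the subset size and the power of x the rank sum. It also assumes the mid-ranks have been doubled so that all exponents are integers. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def expand(ranks):
    """Expand prod_i (1 + x**r_i * y) into {(subset size, rank sum): count}."""
    poly = {(0, 0): 1}
    for r in ranks:
        nxt = defaultdict(int)
        for (k, s), c in poly.items():
            nxt[(k, s)] += c          # factor term 1: rank r not selected
            nxt[(k + 1, s + r)] += c  # factor term x**r * y: rank r selected
        poly = dict(nxt)
    return poly


def lower_tail(ranks, m, w):
    """P(W <= w) for the sum of m of the ranks, combining two half expansions."""
    half = len(ranks) // 2
    left, right = expand(ranks[:half]), expand(ranks[half:])
    # Group the right half by subset size so only matching sizes are paired.
    by_size = defaultdict(list)
    for (k, s), c in right.items():
        by_size[k].append((s, c))
    favourable = 0
    for (k, s), c in left.items():
        for s2, c2 in by_size.get(m - k, ()):
            if s + s2 <= w:
                favourable += c * c2
    return Fraction(favourable, comb(len(ranks), m))
```

With doubled mid-ranks `[3, 3, 7, 7, 12, 12, 12]`, a smaller sample of size 3 and a doubled observed statistic of 13, `lower_tail` returns the one-sided probability without ever forming the full expansion. The pairing loop here is deliberately naive; the gain comes from expanding two short products instead of one long one.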

The website provides a tool to convert smoothed log2-ratios from other software to input data for CGHMultiArray. First, it transforms observed array CGH log2 values to discretized data: ‘1’ for gains, ‘–1’ for losses and ‘0’ for normals. It allows for the introduction of extra levels for amplifications or double deletions. Next, it counts occurrences of all levels for both conditions.
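
A minimal Python sketch of this conversion step is given below. The cut-off values of ±0.2, the function names and the column order (loss, normal, gain) are illustrative assumptions; the text does not specify the thresholds used by the website tool.

```python
from collections import Counter

LEVELS = (-1, 0, 1)  # loss, normal, gain (assumed column order)


def discretize(log2_ratio, loss_cut=-0.2, gain_cut=0.2):
    """Map a smoothed log2-ratio to -1, 0 or 1 (cut-offs are illustrative only)."""
    if log2_ratio <= loss_cut:
        return -1
    if log2_ratio >= gain_cut:
        return 1
    return 0


def count_levels(log2_ratios):
    """Count how many samples fall in each level, in the order of LEVELS."""
    counts = Counter(discretize(v) for v in log2_ratios)
    return [counts[level] for level in LEVELS]


def clone_row(control, treatment):
    """Counts for one clone: control levels followed by treatment levels."""
    return count_levels(control) + count_levels(treatment)
```

Calling `clone_row` with the log2-ratios of one clone across the control and treatment samples yields the six counts for that clone when three levels are used. Extra levels for amplifications or double deletions would need additional cut-offs and entries in `LEVELS`.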

The input for CGHMultiArray then consists of a simple text file, the rows of which represent clones in the chromosomal order. When ℓ is the number of levels used (three in this example), the first (second) ℓ columns represent counts of the number of control (treatment) samples attaining level j, j = 1, ..., ℓ. Optional columns may provide name, chromosomal information and base pair position information, which allows for separate P-value plots by chromosome. The algorithm stores all count configurations, so P-value computations for clones with the same configuration are not repeated. More details are provided in the manual. Note that when one would like to use only ℓ = 2 levels, e.g. for comparing gains, the Wilcoxon statistic with tie correction is equivalent to the hypergeometric statistic.
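
The reuse of stored count configurations and the two-level special case can be sketched together in Python. The sketch assumes whitespace-separated rows whose first four fields are the counts (control level 1, control level 2, treatment level 1, treatment level 2), with any optional columns following them; this layout and the function names are assumptions, and the manual should be consulted for the actual file format. The P-value follows the two-sided rule stated earlier, applied to the hypergeometric count, with an added cap at 1.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def two_level_p(config):
    """Two-sided P-value for a (control_0, control_1, treat_0, treat_1) tuple."""
    c0, c1, t0, t1 = config
    n_control, n_gain, total = c0 + c1, c1 + t1, c0 + c1 + t0 + t1

    def pmf(x):
        # Hypergeometric probability of x gains among the control samples.
        return Fraction(comb(n_gain, x) * comb(total - n_gain, n_control - x),
                        comb(total, n_control))

    lo, hi = max(0, n_control - (total - n_gain)), min(n_control, n_gain)
    expected = Fraction(n_control * n_gain, total)
    if c1 <= expected:
        tail = sum(pmf(x) for x in range(lo, c1 + 1))
    else:
        tail = sum(pmf(x) for x in range(c1, hi + 1))
    return min(Fraction(1), 2 * tail)  # assumption: cap the doubled tail at 1


def p_values(rows):
    """Parse count rows; clones with identical counts are served from the cache."""
    return [two_level_p(tuple(int(f) for f in row.split()[:4])) for row in rows]
```

Because many clones in a region share the same counts, caching on the count tuple avoids most of the repeated work; `two_level_p.cache_info()` reports how often a stored result was reused.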

We have used CGHMultiArray to analyze the genomic changes of two groups of 12 head and neck tumors. Illustrative data shown here (also available on the website) are identical to the real data except for permutations of chromosomes. CGHMultiArray generates an exportable list of univariate P-values, a DNA view of these (Fig. 1) and, optionally, a view by chromosome (data not shown). These views are useful to identify regions with unusually many differential aberrations. One may wish to apply a multiple testing correction to the P-values afterwards, such as the Benjamini and Yekutieli (2001) FDR rule. In this case, one may want to focus on a limited number of clones or consider chromosomal regions instead of separate clones. An implementation for the latter option is provided on the website.
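
A correction of this kind could be applied to the exported P-values along the following lines. This Python sketch implements the standard step-up adjustment associated with Benjamini and Yekutieli (2001); it is independent of CGHMultiArray, and the function name is hypothetical.

```python
def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted P-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic-sum factor that makes the rule valid under dependency.
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is the
    # minimum over all equal or larger ranks (this enforces monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * harmonic / rank)
        adjusted[idx] = running_min
    return adjusted
```

Clones whose adjusted P-value falls below the chosen FDR level would then be called differential. Restricting the input to fewer clones or to regions, as suggested above, reduces the number of tests and hence the severity of the adjustment.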



Fig. 1 DNA view of P-values by position for an illustrative dataset.

 


    Acknowledgments
 
Part of this research has been funded by the Dutch BSIK/BRICKS project. We thank Marko Boon, Kyung In Kim and Marcel van Verk for their software support.

Received on March 22, 2005; revised on May 3, 2005; accepted on May 4, 2005

    REFERENCES

    Baglivo, J., et al. (1996) Permutation distributions via generating functions, with applications to sensitivity analysis of discrete data. J. Am. Stat. Assoc., 91, 1037–1046.

    Bejerano, G., et al. (2004) Efficient exact P-value computation for small sample, sparse, and surprising categorical data. J. Comput. Biol., 11, 867–886.

    Benjamini, Y. and Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. Ann. Stat., 29, 1165–1188.

    Hájek, J., Sidák, Z. and Sen, P.K. (1999) The Theory of Rank Tests, 2nd edn. Academic Press, London.

    Jong, K., et al. (2004) Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics, 20, 3636–3637.

    Olshen, A.B., et al. (2004) Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5, 557–572.

    Oostlander, A.E., et al. (2004) Microarray-based comparative genomic hybridization and its applications in human genetics. Clin. Genet., 66, 488–495.

    Streitberg, B. and Röhmel, J. (1986) Exact distributions for permutation and rank tests: an introduction to some recently published algorithms. Stat. Softw. Newslett., 12, 10–17.

    van de Wiel, M.A. (2001) The split-up algorithm: a fast symbolic method for computing P-values of rank statistics. Comput. Stat., 16, 519–538.

    Wang, P., et al. (2005) A method for calling gains and losses in array CGH data. Biostatistics, 6, 45–58.

    Wolfram, S. (1999) The Mathematica Book, 4th edn. Cambridge University Press, Cambridge.





