

Bioinformatics Advance Access originally published online on November 30, 2004
Bioinformatics 2005 21(9):2112-2113; doi:10.1093/bioinformatics/bti183

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

MACAT—microarray chromosome analysis tool

Joern Toedling, Sebastian Schmeier, Matthias Heinig, Benjamin Georgi and Stefan Roepcke*

Freie Universitaet Berlin, Bioinformatics programme and Max Planck Institute for Molecular Genetics, Ihnestr. 73, D-14195 Berlin, Germany

*To whom correspondence should be addressed.


    Abstract

Summary: By linking differential gene expression to the chromosomal localization of genes, one can investigate microarray data for characteristic patterns of expression phenomena involving sizeable parts of specific chromosomes. We have implemented a statistical approach for identifying significantly differentially expressed chromosome regions. We demonstrate the applicability of the approach on a publicly available data set on acute lymphocytic leukemia.

Availability: The R-package MACAT can be obtained from http://www.compdiag.molgen.mpg.de/software/macat.shtml

Contact: roepcke@molgen.mpg.de

Supplementary information: http://www.compdiag.molgen.mpg.de/software/macat.shtml

Microarray data analysts have defined tumor subtypes by specific gene expression profiles, consisting of genes that show differential expression between subtypes (Yeoh et al., 2002). However, tumor subtypes have also been characterized by phenomena involving large chromosomal regions. For instance, Christiansen et al. (2004) report on a subtype of acute myeloid leukemia, showing mutations in the AML1 gene on chromosome 21 along with deletions or loss of chromosome arm 7q. A natural approach to bridging the gap between these two paradigms is to link scoring for differential gene expression to the chromosomal localization of genes. Tumor subtypes can be defined by differential expression patterns affecting sizeable regions of certain chromosomes.

To assist in the identification of significantly differentially expressed chromosome regions, we provide an implementation of a statistical approach. MACAT is written in the R statistical programming language and is part of the developmental branch of the popular Bioconductor project (Gentleman et al., 2004). We assume normalized expression data, which can be provided as a matrix or expression set in R or as a delimited text file. In a preprocessing step, the expression data are integrated with gene location data into one common data format. To date, this step has only been implemented for commercial Affymetrix® oligonucleotide microarrays.

For each gene, we compute a statistic denoting the degree of differential expression between two groups of samples. This statistic is the regularized t-score introduced in Tusher et al. (2001). In essence, it is Student's t-statistic augmented by a fudge factor s0 in the denominator, which prevents a high statistic for genes with a low variance. We set s0 to the median over all gene standard deviations, analogous to Tibshirani et al. (2002).
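The score described above can be sketched in a few lines. This is a minimal illustration in Python rather than MACAT's R code, and it rests on assumptions: the "gene standard deviation" is taken to be the pooled standard error of the difference in group means, and the function names `pooled_standard_error` and `regularized_t_scores` are hypothetical.

```python
import math
import statistics


def pooled_standard_error(group_a, group_b):
    """Pooled standard error of the difference in means (two-sample t-test form)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    sum_sq = (sum((x - mean_a) ** 2 for x in group_a)
              + sum((x - mean_b) ** 2 for x in group_b))
    return math.sqrt((1.0 / n_a + 1.0 / n_b) * sum_sq / (n_a + n_b - 2))


def regularized_t_scores(expr, labels):
    """Regularized t-score per gene.

    expr   -- list of genes, each a list of expression values (one per sample)
    labels -- 0/1 class label per sample
    Assumption: s0 is the median of the per-gene pooled standard errors.
    """
    diffs, errors = [], []
    for gene in expr:
        a = [x for x, lab in zip(gene, labels) if lab == 1]
        b = [x for x, lab in zip(gene, labels) if lab == 0]
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
        errors.append(pooled_standard_error(a, b))
    s0 = statistics.median(errors)  # fudge factor shared by all genes
    return [d / (s + s0) for d, s in zip(diffs, errors)]
```

Because s0 is added to every denominator, a gene whose own standard error is close to zero can no longer reach an arbitrarily large score from a small mean difference.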

The distribution of measured genes is not uniform over the length of the chromosome. Since we want to evaluate differential expression over the whole chromosome, we interpolate the statistic for positions between measured genes. This interpolation, however, does not aim to assign statistics to non-coding regions, but to provide a smooth estimate of differential expression over large chromosomal regions.

The following kernel functions are used for interpolation:

  • k-nearest neighbor: For every chromosomal coordinate, compute the average over the k nearest genes.
  • Radial basis function (rbf): For every coordinate, compute the average over all genes, with weights that decrease with distance from the coordinate.
  • Base-pair distance: Similar to k-nearest neighbor, but the average is taken over all genes within a certain radius of the coordinate.
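The three kernels can be sketched as follows. This is an illustrative Python sketch, not the package's implementation: the function names, the Gaussian form of the radial basis weights, and the default parameter values (`k`, `gamma`, `radius`) are assumptions chosen for the example.

```python
import math


def knn_smooth(position, gene_positions, scores, k=3):
    """Average the scores of the k genes closest to the coordinate."""
    order = sorted(range(len(scores)),
                   key=lambda i: abs(gene_positions[i] - position))
    nearest = order[:k]
    return sum(scores[i] for i in nearest) / len(nearest)


def rbf_smooth(position, gene_positions, scores, gamma=1e-12):
    """Weighted average over all genes; weights decay with squared distance.

    Assumption: Gaussian weights exp(-gamma * d^2). For very large gamma and
    distant genes all weights can underflow to zero, which this sketch does
    not guard against.
    """
    weights = [math.exp(-gamma * (p - position) ** 2) for p in gene_positions]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def basepair_smooth(position, gene_positions, scores, radius=1_000_000):
    """Average the scores of all genes within `radius` base pairs."""
    inside = [s for p, s in zip(gene_positions, scores)
              if abs(p - position) <= radius]
    # No gene within the radius: there is nothing to average.
    return sum(inside) / len(inside) if inside else float("nan")
```

Evaluating one of these functions on a regular grid of coordinates along a chromosome gives the smoothed score curve; larger `k`, smaller `gamma`, or a larger `radius` each produce a smoother curve.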

The free parameters of the kernels determine the degree of smoothing. By default, optimal parameter settings are estimated from the data by cross-validation (for details see the package's vignette).
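One way such a parameter search could look is leave-one-out cross-validation: each gene's score is predicted from the remaining genes, and the setting with the lowest mean squared error is kept. The Python sketch below does this for the k of a k-nearest-neighbor average. The scheme, the error measure, and the names `knn_predict` and `choose_k_by_loo` are assumptions made for illustration; the procedure actually used by the package is described in its vignette.

```python
def knn_predict(position, gene_positions, scores, k):
    """Average the scores of the k genes closest to the coordinate."""
    order = sorted(range(len(scores)),
                   key=lambda i: abs(gene_positions[i] - position))
    nearest = order[:k]
    return sum(scores[i] for i in nearest) / len(nearest)


def choose_k_by_loo(gene_positions, scores, candidates):
    """Return (best_k, mean squared leave-one-out error) over the candidates."""
    best_k, best_err = None, float("inf")
    n = len(scores)
    for k in candidates:
        err = 0.0
        for i in range(n):
            # Hold out gene i and predict its score from all other genes.
            rest_pos = gene_positions[:i] + gene_positions[i + 1:]
            rest_scores = scores[:i] + scores[i + 1:]
            pred = knn_predict(gene_positions[i], rest_pos, rest_scores, k)
            err += (pred - scores[i]) ** 2
        err /= n
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err
```

The same loop applies to the other kernels by swapping the prediction function and the list of candidate parameter values.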

To judge the significance of differential expression, we investigate random permutations of the class labels. To obtain a reliable simulation of the empirical distribution, we suggest using B ≥ 1000 permutations. For each permutation, the regularized t-statistic is computed for each gene. Thus, for each gene we obtain B permutation statistics and consequently an empirical p-value, which denotes the proportion of permutation statistics that are greater than or equal to the gene's actual statistic based on the true class labels. The permutation statistics also provide upper and lower significance borders, which are smoothed using the same kernel function as for the original statistics.
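The label-permutation procedure can be sketched as below. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, and a plain difference of group means (`mean_difference`) stands in for the regularized t-statistic to keep the example short; any per-gene scoring function with the same signature can be passed as `score_fn`.

```python
import random
import statistics


def mean_difference(gene, labels):
    """Stand-in score: mean of class 1 minus mean of class 0."""
    a = [x for x, lab in zip(gene, labels) if lab == 1]
    b = [x for x, lab in zip(gene, labels) if lab == 0]
    return statistics.fmean(a) - statistics.fmean(b)


def empirical_p_values(expr, labels, score_fn=mean_difference,
                       n_perm=1000, seed=0):
    """Proportion of permutation scores >= the observed score, per gene."""
    rng = random.Random(seed)
    observed = [score_fn(gene, labels) for gene in expr]
    counts = [0] * len(expr)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of samples to classes
        for i, gene in enumerate(expr):
            if score_fn(gene, shuffled) >= observed[i]:
                counts[i] += 1
    return [c / n_perm for c in counts]
```

As written, the p-value is one-sided and tests for higher expression in class 1; testing for lower expression would count permutation scores less than or equal to the observed score instead.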

Optionally, to judge the significance of differential expression over chromosomal regions, one can instead investigate permutations of the ordering of genes on chromosomes.

Meaningful and concise visualization facilitates a better understanding of both the data and the statistical analysis. MACAT includes functions for plotting expression levels and statistical scores versus base-pair coordinate on the chromosome. Regions showing significant differential expression are highlighted in these plots. One can also generate HTML pages, which contain additional information on genes located within the highlighted chromosomal regions (Fig. 1). For each gene, a comprehensive annotation, a LocusLink ID with a hyperlink to the NCBI website, and the empirical p-value are provided. In addition, MACAT includes functions for writing gene expression levels and statistics into text files, which can be used with other programs for further analyses.



Fig. 1 Excerpt of a generated HTML page for the T- versus B-lymphocyte ALL analysis.

 
As an example, we present the results of an analysis of T- versus B-lymphocyte ALL within the publicly available data set described in Yeoh et al. (2002). A region on the p-arm of chromosome 6 was identified as significantly under-expressed (Fig. 1). Among the genes within that region are the MHC class II genes, which are known to be expressed by B-lymphocytes, but not by T-lymphocytes. Since these genes are distributed over a large part of the p-arm of chromosome 6, it is reasonable to assume that all genes in this region are transcribed at significantly lower levels in T-lymphocytes than in B-lymphocytes. This indicates that chromosomal regions highlighted by our method are indeed biologically meaningful.

The method we have described can detect significant differential expression for chromosomal regions. However, the reason for the differential expression, be it a mutation, translocation, hypermethylation, loss of heterozygosity, or another event, remains to be investigated.

Received on October 4, 2004; revised on November 15, 2004; accepted on November 24, 2004

    REFERENCES
 

    Christiansen, D.H., Andersen, M.K., Pedersen-Bjergaard, J. (2004) Mutations of AML1 are common in therapy-related myelodysplasia following therapy with alkylating agents and are significantly associated with deletion or loss of chromosome arm 7q and with subsequent leukemic transformation. Blood, 104, 1474–1481.

    Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., et al. (2004) Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol., 5, R80.

    Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl Acad. Sci. USA, 99, 6567–6572.

    Tusher, V., Tibshirani, R., Chu, G. (2001) Significance analysis of microarrays applied to ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.

    Yeoh, E.J., Ross, M.E., Shurtleff, S.A., Williams, W.K., Patel, D., Mahfouz, R., Behm, F.G., Raimondi, S.C., Relling, M.V., Patel, A., et al. (2002) Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1, 133–143.




This article has been cited by other articles:

S. Bicciato, R. Spinelli, M. Zampieri, E. Mangano, F. Ferrari, L. Beltrame, I. Cifola, C. Peano, A. Solari, and C. Battaglia
A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets
Nucleic Acids Res., August 1, 2009; 37(15): 5057-5070.

K. De Preter, R. Barriot, F. Speleman, J. Vandesompele, and Y. Moreau
Positional gene enrichment analysis of gene sets for high-resolution identification of overrepresented chromosomal regions
Nucleic Acids Res., April 1, 2008; 36(7): e43.

A. Buness, R. Kuner, M. Ruschhaupt, A. Poustka, H. Sultmann, and A. Tresch
Identification of aberrant chromosomal regions from gene expression microarray studies applied to human breast cancer
Bioinformatics, September 1, 2007; 23(17): 2273-2280.

T.-P. Yang, T.-Y. Chang, C.-H. Lin, M.-T. Hsu, and H.-W. Wang
ArrayFusion: a web application for multi-dimensional analysis of CGH, SNP and microarray data
Bioinformatics, November 1, 2006; 22(21): 2697-2698.

A. Callegaro, D. Basso, and S. Bicciato
A locally adaptive statistical procedure (LAP) to identify differentially expressed chromosomal regions
Bioinformatics, November 1, 2006; 22(21): 2658-2666.

J. Blake, C. Schwager, M. Kapushesky, and A. Brazma
ChroCoLoc: an application for calculating the probability of co-localization of microarray gene expression
Bioinformatics, March 15, 2006; 22(6): 765-767.

