Bioinformatics Advance Access originally published online on September 11, 2008
Bioinformatics 2008 24(22):2586-2591; doi:10.1093/bioinformatics/btn465
Gene set enrichment analysis using linear models and diagnostics
¹Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109-1024, ²Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322 and ³Rosetta Inpharmatics LLC, 401 Terry Avenue N, Seattle, WA 98109, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Gene-set enrichment analysis (GSEA) can be greatly enhanced by linear model (regression) diagnostic techniques. Diagnostics can be used to identify outlying or influential samples, and also to evaluate model fit and explore model expansion.
Results: We demonstrate this methodology on an adult acute lymphoblastic leukemia (ALL) dataset, using GSEA based on chromosome-band mapping of genes. Individual residuals, grouped or aggregated by chromosomal loci, indicate problematic samples and potential data-entry errors, and help identify hyperdiploidy as a factor playing a key role in expression for this dataset. Subsequent analysis pinpoints suspected DNA copy number abnormalities of specific samples and chromosomes (most prevalent are chromosomes X, 21 and 14), and also reveals significant expression differences between the hyperdiploid and diploid groups on other chromosomes (most prominently 19, 22, 3 and 13)—differences which are apparently not associated with copy number.
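The idea of aggregating per-gene residuals by gene set can be illustrated with a minimal sketch. This is not the GSEAlm API (GSEAlm is an R package), and it is not presented as the authors' exact procedure: the function names, the synthetic data, the scaling by the square root of the set size, and the simple residual standardization are all assumptions made for illustration. Each gene gets the same ordinary least-squares model; the standardized residuals are then combined within each gene set, separately for each sample, so that a sample whose genes in one set are jointly shifted stands out.

```python
import numpy as np

def per_gene_residuals(expr, design):
    """Fit the same OLS model to every gene; return standardized residuals.

    expr:   (n_genes, n_samples) expression matrix
    design: (n_samples, n_covariates) full-rank design matrix with an intercept
    """
    # Solve design @ beta = expr.T for all genes in one call.
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = (expr.T - design @ beta).T            # (n_genes, n_samples)
    dof = design.shape[0] - design.shape[1]       # residual degrees of freedom
    sd = np.sqrt((resid ** 2).sum(axis=1) / dof)  # per-gene residual SD
    return resid / sd[:, None]

def aggregate_by_set(resid, membership):
    """Sum residuals within each gene set, per sample, scaled by sqrt(set size).

    membership: (n_sets, n_genes) 0/1 incidence matrix (hypothetical gene sets)
    Returns an (n_sets, n_samples) matrix of aggregated residuals.
    """
    sizes = membership.sum(axis=1, keepdims=True)
    return (membership @ resid) / np.sqrt(sizes)

# Synthetic illustration (all values are made up, not from the ALL dataset).
rng = np.random.default_rng(0)
n_genes, n_samples = 60, 12
expr = rng.normal(size=(n_genes, n_samples))

# Three hypothetical "chromosome band" sets of 20 genes each.
membership = np.zeros((3, n_genes))
for s in range(3):
    membership[s, 20 * s:20 * (s + 1)] = 1.0

# Mimic a copy-number gain: sample 3 is shifted upward for every gene in set 0.
expr[:20, 3] += 3.0

# Design: intercept plus a two-group indicator (first six vs last six samples).
group = np.repeat([0.0, 1.0], n_samples // 2)
design = np.column_stack([np.ones(n_samples), group])

resid = per_gene_residuals(expr, design)
scores = aggregate_by_set(resid, membership)

# Report, for each set, the sample with the largest absolute aggregated residual.
for s in range(scores.shape[0]):
    worst = int(np.argmax(np.abs(scores[s])))
    print(f"set {s}: most extreme sample = {worst}, score = {scores[s, worst]:.2f}")
```

In this sketch a large aggregated residual for one sample and one set plays the role of the diagnostic described above: it points to a sample the fitted model does not explain within that group of genes, which may warrant checking for data-entry errors or a missing covariate.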
Availability: Software for the statistical tools demonstrated in this article is available as Bioconductor package GSEAlm.
Contact: assaf.oron{at}gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: John Quackenbush
Received on June 27, 2008; revised on July 29, 2008; accepted on August 26, 2008