

Bioinformatics Advance Access originally published online on September 27, 2005
Bioinformatics 2005 21(21):4069-4070; doi:10.1093/bioinformatics/bti663
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oxfordjournals.org

A friendly statistics package for microarray analysis

P. Sykacek 1,2,*, R. A. Furlong 2 and G. Micklem 1,3

1Department of Genetics, University of Cambridge UK
2Department of Pathology, University of Cambridge UK
3Cambridge Computational Biology Institute at the Department of Applied Mathematics and Theoretical Physics, University of Cambridge UK

*To whom correspondence should be addressed.


    Abstract

Summary: The friendly statistics package for microarray analysis (FSPMA) is a tool that aims to bridge the gap between ease of use and powerful analysis. FSPMA is a platform-independent R package that allows efficient exploration of microarray data without the need for computer programming. Analysis is based on a mixed model ANOVA library (YASMA), which was extended to allow more flexible comparisons and to provide other useful operations such as k nearest neighbour imputation and spike-based normalization. Processing is controlled by a definition file that specifies all the steps necessary to derive analysis results from quantified microarray data. In addition to providing analysis without programming, the definition file also serves as exact documentation of all the analysis steps.

Availability: The library is available under GPL 2 license and, together with additional information, provided at http://www.ccbi.cam.ac.uk/software/psyk/software.html#fspma

Contact: peter{at}sykacek.net


    INTRODUCTION
The number of analysis packages for microarray data is vast, and yet one is still faced with the problem of how best to analyse any particular dataset. Easy-to-use tools are appealing, but many are available only commercially. More elaborate packages such as LIMMA (Smyth et al., 2003) or YASMA (Wernisch et al., 2003) require programming skills and are thus out of reach for non-specialists. The friendly statistics package for microarray analysis (FSPMA) aims to bridge the gap between ease of use and powerful analysis. It is a set of R scripts based on YASMA that makes it possible to explore data efficiently without computer programming. The entire process is controlled by a definition file that specifies all steps needed to generate analysis results from microarray data. The analysis is centred around an existing tool for mixed model ANOVA (analysis of variance) for balanced experiments. Mixed model ANOVA was chosen because it allows correct treatment of nested effects, whose observations would otherwise be regarded as independent, identically distributed samples. We thus obtain more realistic P-values in the ANOVA table and in subsequent tests. The library introduced here provides some useful extensions of the original YASMA package. To allow for more general comparisons, gene ranking is based on contrasts. We provide a k nearest neighbour (knn)-based method to impute missing values, as well as spike-based normalization, which can equally well be used with ‘housekeeping genes’. The tool operates on quantified single- and two-channel microarray data, whether normalized or not, as long as the experiment is a balanced reference design. In addition to providing analysis without programming, the definition file serves as exact documentation of all analysis steps, which is important in its own right.


    DATA LOADING
Analysis requirements in a microarray laboratory can be rather diverse. Experiments are typically done with single- or two-colour arrays, and sometimes the data have already been preprocessed, e.g. by conversion to log ratios or by application of a favourite normalization method. To obtain a generic solution, these various data sources have to be standardized. This is done by providing default values for unavailable channels, a boolean dye-swap indicator for each file and a flag indicating whether the data are log transformed. Headers are ignored and data columns are identified by their column names, so that column order is unimportant and files with heterogeneous structures can be combined in one analysis run.


    IMPUTING AND NORMALIZATION
To accommodate poor-quality flagged spots or missing information, the library provides an implementation of knn imputation (Troyanskaya et al., 2001). Alternatively, all such spots can be handled by removing their corresponding genes. For normalization, the library uses YASMA's functionality to remove within-slide location and scale, or to remove the amplitude-dependent mean by subtracting a loess fit. In addition, FSPMA allows normalization based on RNAs of known concentration, spiked into the RNA samples, where the spike residual log ratio (i.e. the difference between actual and theoretical log ratio of spike concentration) is used to normalize the data. Options for spike-based normalization include removing a global mean or a loess fit, and/or adjusting the variance of each slide to the global variance across all slides. The loess fit can be based on spot position (spatial effects), subgrid number (pin effect) or spot intensity, as well as interactions of the above.
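Two of these operations can be sketched in a few lines of Python. The sketch is illustrative only and is not FSPMA's implementation: `knn_impute` uses a plain unweighted mean of the k nearest genes (a simplification of the published knn method), and `spike_normalise` covers only the simplest option, removal of the global mean of the spike residuals. Both function names are hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries of a genes x arrays matrix from the k nearest genes.

    Distance is the root mean squared difference over the arrays on which
    both genes are observed. The input matrix is left unchanged.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                # A neighbour must be another gene that is observed on array j.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def spike_normalise(log_ratios, spike_observed, spike_expected):
    """Subtract the mean spike residual of one slide from all its log ratios.

    The spike residual is the observed minus the theoretical log ratio.
    """
    residuals = [obs - exp for obs, exp in zip(spike_observed, spike_expected)]
    offset = sum(residuals) / len(residuals)
    return [value - offset for value in log_ratios]


expression = [
    [1.0, 2.0, None],   # gene with a flagged spot on the third array
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 9.0],
]
print(knn_impute(expression, k=2)[0])
print(spike_normalise([1.5, 0.5], [1.5, -0.5], [1.0, -1.0]))
```

A loess-based variant would replace the single `offset` with a fitted curve over spot intensity or position; that is left out here to keep the sketch dependency-free.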


    ANOVA AND CONTRAST BASED RANKING
We chose YASMA (Wernisch et al., 2003) for ANOVA calculations, because it implements a mixed effects ANOVA including the ‘gene’ effect. FSPMA requires that all ‘non-gene’ effects of an experiment are specified. Each effect is either random or fixed. Random effects are variables where the experiment does not contain instances of all possible levels (e.g. biological replicate). Fixed effects are those variables where all possible levels are a part of the experiment, or other levels are not of interest (e.g. ‘time point’ in a longitudinal study). The description of an experiment is automatically converted to an ANOVA model equation, where each effect is considered hierarchically and modelled as an interaction term with the previous grouping. As an example, gene, G, within slide replicate, r, technical replicate, s, and time point, t, where gene and time point are fixed effects and technical replicate and within slide replication are random effects, will result in the ANOVA model equation

$$ y_{G,t,s,r} = \mu_G + \alpha_{G,t} + B_{G,t,s} + \varepsilon_{G,t,s,r} $$

We use $y_{G,t,s,r}$ as the expression value, $\mu_G$ as the gene-specific global mean and $\mu_G + \alpha_{G,t}$ as the mean of each gene–time interaction. Variable $B_{G,t,s}$ is a Gaussian random variable that represents interactions of gene and time with the random effect ‘technical replicate’. Finally, $\varepsilon_{G,t,s,r}$ is the residual. Such equations are used to calculate the ANOVA table and variance components using the functions provided by YASMA.
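To make the role of the random effect concrete, the following Python sketch computes the ANOVA quantities for a single gene in a balanced design with time point as a fixed effect and technical replicate nested within time point as a random effect. It uses the standard method-of-moments formulas for balanced nested designs and is not the YASMA or FSPMA code; the function name `nested_anova` and the toy data are assumptions.

```python
def _mean(values):
    return sum(values) / len(values)


def nested_anova(data):
    """Balanced nested ANOVA for one gene.

    data[t][s][r] is the expression value at time point t, technical
    replicate s and within-slide replicate r.
    """
    n_t, n_s, n_r = len(data), len(data[0]), len(data[0][0])
    cell = [[_mean(data[t][s]) for s in range(n_s)] for t in range(n_t)]
    time = [_mean(cell[t]) for t in range(n_t)]
    grand = _mean(time)

    ss_time = n_s * n_r * sum((m - grand) ** 2 for m in time)
    ss_tech = n_r * sum((cell[t][s] - time[t]) ** 2
                        for t in range(n_t) for s in range(n_s))
    ss_res = sum((y - cell[t][s]) ** 2
                 for t in range(n_t) for s in range(n_s) for y in data[t][s])

    ms_time = ss_time / (n_t - 1)
    ms_tech = ss_tech / (n_t * (n_s - 1))
    ms_res = ss_res / (n_t * n_s * (n_r - 1))

    return {
        # The fixed effect is tested against the nested random effect,
        # not against the within-slide residual.
        "F_time": ms_time / ms_tech,
        "df": (n_t - 1, n_t * (n_s - 1)),
        # Method-of-moments variance components, truncated at zero.
        "var_tech": max(0.0, (ms_tech - ms_res) / n_r),
        "var_res": ms_res,
    }


# Toy data: 2 time points x 2 technical replicates x 2 within-slide replicates.
toy = [[[1.0, 3.0], [5.0, 7.0]], [[9.0, 11.0], [13.0, 15.0]]]
print(nested_anova(toy))
```

Using the technical-replicate mean square as the denominator is what prevents within-slide replicates from being counted as independent samples, which would otherwise make the P-values too optimistic.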

If the ANOVA table allows rejection of the null hypothesis for the fixed effect of interest (e.g. time), the user may further assess the differences between groups. To this end, the library allows for general contrasts, so that evaluations beyond pairwise comparisons are possible. We illustrate this in Table 1 using a longitudinal study of mammary gland development (Clarkson et al., 2004): the first column shows the time points of the experiment; the second column illustrates a contrast for pairwise comparisons between the last lactation day and involution onset; the third column is a more general contrast that tests for significant differences between groups of time points and is here indicative of causes of type II apoptosis.
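The difference between a pairwise and a more general contrast can be shown with a small Python sketch. The four group means, the error mean square and the function name `contrast_test` are hypothetical and do not reproduce the entries of Table 1; the sketch assumes a balanced design in which `ms_error` is the mean square of the random effect nested beneath the fixed effect and `n_per_group` is the number of observations per level.

```python
import math


def contrast_test(group_means, contrast, ms_error, n_per_group):
    """Return (estimate, t statistic) for a linear contrast of group means."""
    if len(group_means) != len(contrast):
        raise ValueError("one coefficient per group is required")
    if abs(sum(contrast)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(contrast, group_means))
    std_error = math.sqrt(ms_error * sum(c * c for c in contrast) / n_per_group)
    return estimate, estimate / std_error


# Hypothetical mean log ratios of one gene at four time points.
means = [1.0, 1.0, 3.0, 5.0]

# Pairwise comparison of the last two time points.
print(contrast_test(means, [0.0, 0.0, -1.0, 1.0], ms_error=4.0, n_per_group=8))

# General contrast: average of the late group minus average of the early group.
print(contrast_test(means, [-0.5, -0.5, 0.5, 0.5], ms_error=4.0, n_per_group=8))
```

Because any coefficient vector that sums to zero is admissible, the same function covers pairwise tests and comparisons between groups of time points.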


Table 1 Time course of mammary gland development, gene expression on day of lactation and hours (h) into involution

 
In addition, a gene-based ANOVA rank list can be produced. This ranks genes by the P-values of an F-statistic that tests the null hypothesis that all levels of the corresponding effect have the same mean. The total number of comparisons within a definition file is used to adjust P-values for multiple testing. For each comparison, an ordered gene list is written to a separate tab-delimited file.
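A rank list of this kind could be produced along the following lines. The text does not state which adjustment FSPMA applies, so the Bonferroni-style multiplication by the number of comparisons used below is an assumption, as are the function name `write_rank_list`, the column headings and the gene names.

```python
import csv
import io


def write_rank_list(p_values, n_comparisons, handle):
    """Write genes ordered by P-value to a tab-delimited stream.

    The adjustment (P-value times number of comparisons, capped at 1) is an
    assumed Bonferroni-style correction, not necessarily FSPMA's method.
    Returns the ranked (gene, p, adjusted p) tuples.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["rank", "gene", "p_value", "adjusted_p"])
    rows = []
    for rank, (gene, p) in enumerate(ranked, start=1):
        adjusted = min(1.0, p * n_comparisons)
        writer.writerow([rank, gene, p, adjusted])
        rows.append((gene, p, adjusted))
    return rows


# Hypothetical P-values for one comparison out of four in a definition file.
p_by_gene = {"geneA": 0.125, "geneB": 0.015625, "geneC": 0.5}
buffer = io.StringIO()
rows = write_rank_list(p_by_gene, n_comparisons=4, handle=buffer)
print(buffer.getvalue())
```

Writing to any file-like object keeps the function usable both for one file per comparison and for in-memory checks.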


    DISCUSSION
FSPMA-based analysis of microarray experiments is accessible to non-programmers with a basic understanding of ANOVA, random and fixed effects and contrasts; such users are supported by FSPMA's extensive consistency checks of definition files. For the expert, FSPMA allows efficient analysis of balanced reference designs by providing pre-defined definition files. In non-standard situations that go beyond what is possible with mixed effects ANOVA, the library can still serve as a front end for data loading and normalization.


    Acknowledgments
 
The authors are grateful to the reviewers of this paper for their suggestions on how to improve the package. This work was funded by the BBSRC's Exploiting Genomics initiative under ref. 8/EGH16106, ‘Shared Genetic Pathways in Cell Number Control’.

Conflict of Interest: none declared.

Received on February 21, 2005; revised on July 18, 2005; accepted on September 5, 2005

    REFERENCES

    Clarkson, R.W.E., et al. (2004) Gene expression profiling of mammary gland development reveals putative roles for death receptors and immune mediators in post-lactational regression. Breast Cancer Res., 6, R92–R109.

    Smyth, G.K., et al. (2003) Statistical issues in microarray data analysis. In Brownstein, M.J. and Khodursky, A.B. (Eds.), Functional Genomics: Methods and Protocols. Humana Press, Totowa, NJ, pp. 111–136.

    Troyanskaya, G.O., et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520–525.

    Wernisch, L., et al. (2003) Analysis of whole-genome microarray replicates using mixed models. Bioinformatics, 19, 53–61.

