Bioinformatics Advance Access originally published online on February 2, 2006
Bioinformatics 2006 22(7):897-899; doi:10.1093/bioinformatics/btl025
affylmGUI: a graphical user interface for linear modeling of single channel microarray data
Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Pde, Parkville 3050, Australia
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: affylmGUI is a graphical user interface (GUI) to an integrated workflow for Affymetrix microarray data. The user is able to proceed from raw data (CEL files) to QC and pre-processing, and eventually to analysis of differential expression using linear models with empirical Bayes smoothing. Output of the analysis (tables and figures) can be exported to an HTML report. The GUI provides user-friendly access to state-of-the-art methods embodied in the Bioconductor software repository.
Availability: affylmGUI is an R package freely available from http://www.bioconductor.org. It requires R version 1.9.0 or later and Tcl/Tk 8.3 or later, and has been successfully tested on Windows 2000, Windows XP, Linux (RedHat and Fedora distributions) and Mac OS X with X11. Further documentation is available at http://bioinf.wehi.edu.au/affylmGUI
Contact: keith{at}wehi.edu.au
| 1 BACKGROUND |
|---|
The Bioconductor project (Gentleman et al., 2004) is an enormous repository of academic software for the analysis of genomic data, especially microarray data. The use of cutting-edge methodology implemented in Bioconductor packages can greatly improve the power and consistency of experimental results (Irizarry et al., 2005; Kooperberg et al., 2005). Yet the command-line computing environment (R Development Core Team, 2005) used for Bioconductor is very challenging for users without programming experience. There is a pressing need for a menu-driven or graphical user interface (GUI) for the Bioconductor packages to allow biologists to access the methodology without becoming R programmers. Wettenhall and Smyth (2004) earlier described a GUI software package, limmaGUI, for the analysis of two-color spotted microarrays. Here we describe a GUI for the analysis of Affymetrix GeneChip data (http://www.affymetrix.com). Although the overall design and philosophy of the new package are similar to those of limmaGUI, Affymetrix data require different analysis tools.
affylmGUI provides an interface to the affy, gcrma, affyPLM and limma packages of Bioconductor. The software is itself implemented as an R package and uses the interface to Tcl/Tk provided by the R package tcltk. The package enables users to pre-process and visualize their data and generate lists of putatively differentially expressed genes (Gierer et al., 2005). Users have a choice of several state-of-the-art pre-processing methods for Affymetrix CEL files and advanced statistical methods for assessing differential expression. The package provides powerful statistical methods for dealing with small sample sizes and with complex experiments involving many different RNA sources. The package is therefore most useful in those experimental situations which are the most challenging.
| 2 SESSION CONTROL |
|---|
The analysis session is controlled via a main window. A session begins by prompting the user to specify a targets file. The targets file is a tab-delimited text file specifying the Affymetrix CEL files to be analyzed and the source of RNA hybridized to each chip. A valid targets file features a Name column, containing a unique identifier for each array; a Filename column, specifying the corresponding CEL file; and a Target column, which indicates the different RNA sources and thereby specifies which arrays are replicates. A session can be saved at any time and reloaded at a later point. An option is available to export an HTML report featuring diagnostic plots, summary plots and lists of differentially expressed genes.
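As an illustration, a targets file for a small two-group experiment might look like the sketch below. The array names, CEL file names and RNA sources are invented for the example, and the file is parsed here with base R rather than through the GUI; the exact column capitalization expected by the package should be checked against its documentation.

```r
# Hypothetical targets file contents (tab-delimited); all names are invented.
targets_text <- paste(
  "Name\tFilename\tTarget",
  "WT1\twt_rep1.CEL\tWildType",
  "WT2\twt_rep2.CEL\tWildType",
  "Mut1\tmut_rep1.CEL\tMutant",
  "Mut2\tmut_rep2.CEL\tMutant",
  sep = "\n"
)

# Parse the tab-delimited text exactly as a file on disk would be parsed
targets <- read.delim(text = targets_text, stringsAsFactors = FALSE)

# Basic validity checks mirroring the description above
required <- c("Name", "Filename", "Target")
stopifnot(all(required %in% colnames(targets)))  # all three columns present
stopifnot(!anyDuplicated(targets$Name))          # array identifiers are unique

# Arrays sharing a Target value are treated as replicates
replicates_per_target <- table(targets$Target)
print(replicates_per_target)
```

In this sketch each RNA source has two replicate arrays, which is what the Target column is meant to convey.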
| 3 PRE-PROCESSING AND QUALITY ASSESSMENT |
|---|
After the targets file has been read, the expression data are read from the CEL files using the affy package. At this point, the user may produce several diagnostic plots, including MA-plots of the perfect match probes and histograms of the raw intensities.
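For readers unfamiliar with MA-plots, the quantities plotted can be sketched in a few lines of base R. The intensities below are simulated, not read from CEL files, and the variable names are chosen for this example only.

```r
set.seed(1)

# Simulated raw perfect-match intensities for two arrays (not real data)
pm1 <- 2^rnorm(1000, mean = 8, sd = 1.5)
pm2 <- pm1 * 2^rnorm(1000, mean = 0.2, sd = 0.3)

# M is the log-ratio between the arrays, A is the average log-intensity
M <- log2(pm2) - log2(pm1)
A <- (log2(pm1) + log2(pm2)) / 2

# MA-plot: systematic departures of M from zero suggest a need for normalization
plot(A, M, pch = ".", main = "MA-plot (simulated PM intensities)")
abline(h = 0, col = "grey")

# Histogram of raw log2 intensities for one array
hist(log2(pm1), breaks = 50, main = "Raw log2 intensities", xlab = "log2 intensity")
```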
Quality assessment is followed by background correction, normalization and summarization of the probe-level data into probe-set expression values. These three steps are accomplished by one of three algorithms, namely Robust Multi-Array Analysis (RMA) (Irizarry et al., 2003; Gautier et al., 2004), GCRMA (Wu et al., 2004) or Robust Probe Level Models (RPLM) (Bolstad, 2005). GCRMA differs from RMA only in the background correction step, using probe sequence information to help estimate the background. This gives more accurate fold changes at the expense of marginally lower precision. RPLM differs from RMA only in the summarization step, using robust M-estimators rather than median polish to summarize the probe-level measurements. If this option is selected, additional quality assessment can be performed by plotting false-color images of the weights from the robust regression to look for spatial artifacts or for whole chips that are outliers (Bolstad et al., 2005). If a chip is of very poor quality, the user may wish to omit it; this requires the creation of a new targets file which does not contain the aberrant chip.
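The median polish summarization mentioned above can be illustrated with the base R function medpolish(). This is a sketch of the summarization idea only, on simulated log2 intensities for a single probe set; it omits background correction and normalization and is not the implementation used by the affy package.

```r
set.seed(2)

# Simulated log2 PM intensities for one probe set:
# 6 probes (rows) x 4 arrays (columns), built from additive probe and array effects
probe_effect <- c(-1.0, -0.5, 0, 0.3, 0.6, 0.6)
array_level  <- c(8.0, 8.1, 9.0, 9.2)
y <- outer(probe_effect, array_level, "+") + rnorm(24, sd = 0.05)
y[3, 2] <- y[3, 2] + 3   # one outlying probe measurement on array 2

# Median polish fits overall + row (probe) + column (array) effects robustly
fit <- medpolish(y, trace.iter = FALSE)
expr_medpolish <- fit$overall + fit$col   # one expression value per array

# A plain average over probes, for comparison; it is pulled up by the outlier
expr_mean <- colMeans(y)

print(round(rbind(median_polish = expr_medpolish, mean = expr_mean), 2))
```

Comparing the difference between arrays 1 and 2 under the two summaries shows why a robust summary is preferred: the single outlying probe inflates the plain average for array 2 but has little effect on the median polish estimate.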
| 4 DIFFERENTIAL EXPRESSION |
|---|
After probe-set expression summaries are obtained, the user can proceed to differential expression. The approach taken by the limma package is to analyze the differential expression in terms of linear models (Smyth, 2005). This approach has many advantages as it allows very general experiments to be analyzed in a unified framework, including factorial, saturated or loop designs and time course experiments, but it requires some mathematical sophistication. It requires the user to specify two matrices: the design matrix, which provides a representation of the RNA targets that have been hybridized to the arrays, and the contrast matrix, which defines which comparisons between the RNA targets are of interest to the experimenter. affylmGUI greatly eases this process by largely automating the formation of the two matrices. The design matrix is constructed without user intervention. A set of dialogs helps the user to define a set of comparisons of interest from which the contrast matrix is constructed (Fig. 1). This could be as simple as a comparison between two groups (e.g. mutant versus wild-type), or something more complicated such as an interaction effect in a factorial design or contrasts in a time course experiment. In simple situations the comparisons are easily selectable using drop-down menus. In more complex situations the contrast matrix is specified using the Advanced option in the contrast definition dialog.
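To make the two matrices concrete, the sketch below builds them by hand for a hypothetical two-group experiment. It assumes a group-means parameterization (one design column per RNA source), which is a common choice but is not confirmed here as the exact form affylmGUI generates; the array and group names are invented.

```r
# Hypothetical targets for a two-group experiment (names invented)
targets <- data.frame(
  Name   = c("WT1", "WT2", "Mut1", "Mut2"),
  Target = c("WildType", "WildType", "Mutant", "Mutant")
)

# Design matrix: one column per RNA source, one row per array
f <- factor(targets$Target, levels = unique(targets$Target))
design <- model.matrix(~ 0 + f)
colnames(design) <- levels(f)
rownames(design) <- targets$Name

# Contrast matrix: one column per comparison of interest (here Mutant - WildType)
contrast_matrix <- matrix(
  c(-1, 1), ncol = 1,
  dimnames = list(colnames(design), "Mutant-WildType")
)

print(design)
print(contrast_matrix)

# If limma is installed, the same contrast can be written symbolically
if (requireNamespace("limma", quietly = TRUE)) {
  contrast_limma <- limma::makeContrasts(Mutant - WildType, levels = design)
  print(contrast_limma)
  # With a genes x arrays matrix of expression values (hypothetical name: exprs_matrix),
  # a typical limma fit would chain lmFit(), contrasts.fit() and eBayes():
  # fit <- limma::eBayes(limma::contrasts.fit(limma::lmFit(exprs_matrix, design),
  #                                           contrast_limma))
}
```

More complex designs only change the number of columns in each matrix: additional RNA sources add design columns, and additional comparisons add contrast columns.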
A number of statistics for differential expression are provided. For each contrast, affylmGUI returns the log2-fold change, the moderated t-statistic, the P-value and the posterior log-odds of differential expression. The moderated t-statistic is similar to an ordinary t-statistic but with standard errors shrunk towards a common value using empirical Bayes methods (Lönnstedt and Speed, 2002; Smyth, 2004). This provides more stable inference and is particularly effective when the number of replicates is low (Kooperberg et al., 2005). The log-odds of differential expression, or B-statistic, is a Bayesian measure which is essentially equivalent to the moderated t-statistic for ranking purposes. When two or more comparisons have been made, moderated F-statistics are also computed.
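The effect of the shrinkage can be sketched in base R for a simple two-group comparison. The data are simulated, and the prior degrees of freedom and prior variance (d_0 and s2_0 below) are fixed at assumed values purely for illustration; limma estimates both from the data across all genes.

```r
set.seed(3)
n_genes <- 200
n_rep   <- 3

# Simulated log2 expression values, genes x replicates, for two groups
wt  <- matrix(rnorm(n_genes * n_rep, mean = 8, sd = 0.4), n_genes)
mut <- matrix(rnorm(n_genes * n_rep, mean = 8, sd = 0.4), n_genes)

log_fc <- rowMeans(mut) - rowMeans(wt)                  # log2-fold change
d_g    <- 2 * (n_rep - 1)                               # residual df per gene
s2_g   <- (apply(wt, 1, var) + apply(mut, 1, var)) / 2  # pooled gene-wise variance

# Assumed prior values for illustration only (limma estimates these)
d_0  <- 4
s2_0 <- 0.16

# Posterior variance: weighted average of prior and gene-wise variances
s2_tilde <- (d_0 * s2_0 + d_g * s2_g) / (d_0 + d_g)

v <- 2 / n_rep                                # unscaled variance of the group difference
t_ordinary  <- log_fc / sqrt(s2_g * v)
t_moderated <- log_fc / sqrt(s2_tilde * v)

# The moderated t-statistic gains the prior degrees of freedom
p_moderated <- 2 * pt(-abs(t_moderated), df = d_0 + d_g)

summary(abs(t_ordinary))
summary(abs(t_moderated))
```

Genes whose sample variance happens to be very small no longer produce extreme t-statistics, which is the practical benefit when replicates are few.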
The differential expression results can be presented in tables or plots. The estimated fold changes may be displayed in MA-plots. The user can display (or export) a table of the top genes for each contrast, ranked in order of log2-fold change, moderated t-statistic, P-value (adjusted for multiple testing, using one of six different methods) or B-statistic. If there are multiple comparisons for each gene, Venn diagrams and heat diagrams can also be generated.
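Adjustment of P-values for multiple testing can be illustrated with the base R function p.adjust(). The P-values below are invented, and the two methods shown are examples only; the text does not specify which six methods affylmGUI offers.

```r
# Hypothetical raw P-values for six genes
p_raw <- c(0.0002, 0.004, 0.012, 0.03, 0.2, 0.6)

adjusted <- data.frame(
  raw  = p_raw,
  holm = p.adjust(p_raw, method = "holm"),  # controls the family-wise error rate
  BH   = p.adjust(p_raw, method = "BH")     # controls the false discovery rate
)
print(adjusted)

# Methods available in base R
print(p.adjust.methods)
```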
| 5 NON-STANDARD ANALYSES |
|---|
If analysis methods or plots other than those available in affylmGUI are desired, a command-line window is provided which allows the user to interact directly with R. More adventurous users have in this way complete flexibility to access the full power of the underlying packages.
Arbitrary R code can be executed. For example, the user may want to view a list of all objects in the workspace with the function ls(). Doing this after reading in the CEL files would reveal that there is an object called RawAffyData. Suppose a user wishes to use the Affymetrix MAS5 algorithm for pre-processing rather than the algorithms in the affylmGUI menus. This could be achieved by passing RawAffyData to the expresso() function with parameters indicating that MAS5 is desired. Storing the result of this procedure in an object called NormalizedAffyData will ensure that it is recognized by affylmGUI for the purposes of creating plots and fitting linear models.
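The MAS5 example just described might be typed into the command-line window roughly as follows. This is a sketch: the expresso() arguments are the MAS5-style settings of the affy package, the helper name run_mas5 is introduced here for illustration, and the guarded call only runs when RawAffyData already exists in the workspace (that is, after CEL files have been read in an affylmGUI session).

```r
# Helper (hypothetical name) wrapping expresso() with MAS5-style settings
run_mas5 <- function(raw) {
  affy::expresso(raw,
                 bgcorrect.method = "mas",   # MAS5 background correction
                 normalize        = FALSE,   # no probe-level normalization
                 pmcorrect.method = "mas",   # ideal-mismatch PM correction
                 summary.method   = "mas")   # Tukey biweight summary
}

# List the objects in the workspace; RawAffyData should appear after CEL files are read
ls()

# Only attempt the computation inside a session where the raw data exist
if (exists("RawAffyData") && requireNamespace("affy", quietly = TRUE)) {
  NormalizedAffyData <- run_mas5(RawAffyData)
}
```

Two caveats apply. The affy function mas5() additionally rescales the summarized values to a target intensity, which this sketch omits. MAS5-style values are also usually reported on the unlogged scale, so a log2 transformation may be needed before linear modeling; how affylmGUI treats such values is not stated in the text.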
Any code that is used frequently can be saved and incorporated into the pull-down menus.
| Acknowledgments |
|---|
This research was supported by an NHMRC Transitional Institute Grant. The authors thank Terry Speed for discussions and inspiration and many affylmGUI users for bug reports. Funding to pay the Open Access publication charges was provided by an NHMRC Transitional Grant awarded to the Walter and Eliza Hall Institute for Medical Research.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: David Rocke
Received on November 17, 2005; revised on January 23, 2006; accepted on January 23, 2006
| REFERENCES |
|---|
Bolstad, B. (2005) affyPLM: Methods for fitting probe-level models. R package version 1.6.0.
Bolstad, B.M., Collin, F., Brettschneider, J., Simpson, K., Cope, L., Irizarry, R.A., Speed, T.P. (2005) Quality Assessment of Affymetrix GeneChip Data. In Gentleman, R., Carey, V., Huber, W., Irizarry, R., Dudoit, S. (Eds.). Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Springer, NY, pp. 33–47.
Gentleman, R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.
Gautier, L., et al. (2004) affy: analysis of Affymetrix GeneChip data at the probe-level. Bioinformatics, 20, 307–315.
Gierer, P., et al. (2005) Gene expression profile and synovial microcirculation at early stages of collagen-induced arthritis. Arthritis Res. Ther., 7, R868–R876.
Irizarry, R.A., et al. (2003) Exploration, normalization and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249–264.
Irizarry, R.A., et al. (2005) Multiple-laboratory comparison of microarray platforms [Erratum (2005) Nat. Methods, 2, 477.]. Nat. Methods, 2, 15.
Kooperberg, C., et al. (2005) Significance testing for small microarray experiments. Stat. Med., 24, 2281–2298.
Lönnstedt, I. and Speed, T.P. (2002) Replicated microarray data. Stat. Sinica, 12, 31–46.
R Development Core Team (2005) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol., 3, Article 3.
Smyth, G.K. (2005) Limma: linear models for microarray data. In Gentleman, R., Carey, V., Dudoit, S., Irizarry, R., Huber, W. (Eds.). Bioinformatics and Computational Biology Solutions using R and Bioconductor, Springer, NY, pp. 397–420.
Wettenhall, J.M. and Smyth, G.K. (2004) limmaGUI: a graphical user interface for linear modeling of microarray data. Bioinformatics, 20, 3705–3706.
Wu, Z., et al. (2004) A model based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc., 99, 909–917.