Bioinformatics Advance Access originally published online on December 15, 2005
Bioinformatics 2006 22(4):507-508; doi:10.1093/bioinformatics/btk005
EDGE: extraction and analysis of differential gene expression
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: EDGE (Extraction of Differential Gene Expression) is an open source, point-and-click software program for the significance analysis of DNA microarray experiments. EDGE can perform both standard and time course differential expression analysis. The functions are based on newly developed statistical theory and methods. This document introduces the EDGE software package.
Availability: EDGE is freely available for non-commercial users. EDGE can be downloaded for Windows, Macintosh and Linux/UNIX from http://faculty.washington.edu/jstorey/edge
Contact: jtleek{at}u.washington.edu
| 1 INTRODUCTION |
|---|
DNA microarrays have become a standard tool for identifying and characterizing gene expression variation across differing biological conditions. A variety of software packages are available for the significance analysis of microarray experiments. Many of these packages are closed source, difficult to use or available for only one operating system. Most are unable to analyze data from time course microarray experiments. EDGE is a user-friendly software package that includes functions for missing data imputation, data transformation and visualization, eigengene/eigenarray analysis, hierarchical clustering, differential expression analysis (static and time course) and automatic internet-based NCBI queries of user-chosen genes. EDGE can be used to analyze microarray data across all platforms, although interpretation of the results may depend on the experimental design. The EDGE interface is multithreaded, and reports real-time updates of the time remaining in lengthy calculations. Many of these calculations are performed through C++ extensions for R that dramatically reduce computation time. Differential expression analyses in EDGE are based on newly developed statistical methodology, including the Optimal Discovery Procedure for static differential expression (Storey, 2005, http://www.bepress.com/uwbiostat/paper259). EDGE is open source and is available for Windows, Macintosh and Linux/UNIX operating systems.
| 2 EDGE |
|---|
EDGE runs on top of the statistical software package R (R Development Core Team, 2005, http://www.R-project.org). Detailed downloading and installation instructions are available from the EDGE website. At the beginning of each EDGE session, the main menu should appear as in Figure 1. The first step in an EDGE analysis is to load the pre-normalized expression data and covariate files using the Load/Save Expression Data and Covariates menu. (The covariate file contains information about the experimental design, such as the biological group from which each array comes.) If the expression matrix has missing values, they can be imputed using the KNN imputation algorithm from the Impute Missing Data menu (Troyanskaya et al., 2001). After loading expression data and covariate information, the covariates can be checked for accuracy using the View Covariates menu. It is also possible to center, scale or log transform the expression values using the Transform Data menu.
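To make the imputation step concrete, the following is a minimal Python sketch of nearest-neighbour imputation on a genes-by-arrays matrix. It is not EDGE's implementation (EDGE runs on R with C++ extensions): the function name `knn_impute`, the use of `None` for missing values, the restriction of neighbours to complete genes, and the plain (unweighted) mean are all simplifying assumptions made for illustration.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries in a genes x arrays matrix from the k nearest genes.

    Hypothetical helper for illustration; neighbours are limited to genes
    with no missing values, and the imputed value is their plain mean.
    """
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        neighbours = []
        for m, other in enumerate(matrix):
            if m == i or any(v is None for v in other):
                continue
            # Euclidean distance over the arrays observed for gene i.
            dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))
            neighbours.append((dist, m))
        neighbours.sort()
        nearest = [m for _, m in neighbours[:k]]
        if not nearest:
            continue  # no complete gene available; leave the gaps in place
        for j in missing:
            imputed[i][j] = sum(matrix[m][j] for m in nearest) / len(nearest)
    return imputed


expression = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [5.0, 6.0, 7.0],
    [1.0, None, 3.1],  # one missing measurement
]
filled = knn_impute(expression, k=2)
```

In this toy matrix the last gene is closest to the first two genes, so its missing value is replaced by the mean of their values on the same array.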
Several tools for visual exploratory analysis are included in the EDGE interface. Boxplots and eigengenes (Alter et al., 2000) can be displayed for each array, or stratified by a covariate using the Display Boxplots option and Display Eigengenes and Eigenarrays options, respectively. EDGE also allows the user to plot clusters of genes with similar expression patterns (Eisen et al., 1998) from the Display Hierarchical Clustering menu. Clustering can be performed on the entire set of genes, or only on the significant genes from a differential expression analysis. A variety of plotting options are available for visualizing the clusters.
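As a rough illustration of what an eigengene is, the sketch below estimates the leading eigengene of a genes-by-arrays matrix, i.e. the first right singular vector in the sense of Alter et al. (2000), using power iteration. This is an assumption-laden toy, not how EDGE computes its display: the function name `first_eigengene`, the random positive starting vector and the fixed iteration count are hypothetical choices, and a full singular value decomposition from a numerical library would normally be used instead.

```python
import math
import random


def first_eigengene(matrix, iterations=200, seed=0):
    """Approximate the leading right singular vector of a genes x arrays matrix.

    Hypothetical helper for illustration. The result has unit length and one
    entry per array; its overall sign is arbitrary in general.
    """
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    rng = random.Random(seed)
    # Positive random start, so it is unlikely to be orthogonal to the answer.
    v = [rng.random() + 0.5 for _ in range(n_arrays)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # u = A v (one value per gene), then w = A^T u (one value per array).
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        w = [sum(matrix[g][j] * u[g] for g in range(n_genes))
             for j in range(n_arrays)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # matrix is zero along this direction; keep the last v
        v = [x / norm for x in w]
    return v


# Every gene here follows the same pattern across arrays, up to scale.
pattern_matrix = [
    [1.0, 2.0, 2.0],
    [2.0, 4.0, 4.0],
    [-1.0, -2.0, -2.0],
]
eigengene = first_eigengene(pattern_matrix)
```

Because every row of the toy matrix is a multiple of the same pattern, the leading eigengene is simply that pattern rescaled to unit length.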
The Identify Differentially Expressed Genes menu allows users to set options for performing both static and time course differential expression analyses. For a static analysis, the user should select a class variable indicating the biological group assignment, or the option None (within class differential expression) to identify differentially expressed genes in a single biological sample. In the static setting, significance calculations are based on the Optimal Discovery Procedure (Storey, 2005), which estimates the optimal rule for identifying differentially expressed genes (Storey et al., 2005a, http://www.bepress.com/uwbiostat/paper260). For time course data, the user can perform either a between class analysis by selecting a variable distinguishing biological groups, or a within class analysis by selecting None (within class differential expression) for the class variable. A between class analysis assesses the evidence for a difference in expression over time between two or more biological groups, while a within class analysis looks for any differential expression over time within a single group. The user must specify a covariate for the time points, and if necessary, should also specify a covariate corresponding to which individuals were sampled. EDGE implements statistical methodology specifically designed for time course experiments (Storey et al., 2005b).
For either type of analysis, the user should specify the number of permutations to be used in the significance calculations and, in some cases, set a seed for reproducible results. For time course analyses, the user can also specify the type of spline used in fitting the longitudinal model, the dimension of the basis for the spline model and whether to include the baseline expression level in the time course analysis. If the baseline level is included, EDGE will not only identify genes showing different patterns of expression over time, but will also identify genes with different baseline levels of expression.
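The role of the permutation count and the seed can be sketched with a generic two-group permutation test for a single gene. This is not the Optimal Discovery Procedure or the time course method that EDGE implements; the statistic (absolute difference in group means), the function name `permutation_p_value` and its parameters are assumptions chosen only to show why fixing a seed makes the result reproducible.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_permutations=1000, seed=None):
    """Permutation p-value for the absolute difference in two group means.

    Hypothetical helper for illustration; labels must be 0 or 1 and both
    groups must be non-empty.
    """
    rng = random.Random(seed)  # a fixed seed gives reproducible shuffles

    def statistic(current_labels):
        group_a = [v for v, g in zip(values, current_labels) if g == 0]
        group_b = [v for v, g in zip(values, current_labels) if g == 1]
        return abs(mean(group_a) - mean(group_b))

    observed = statistic(labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # reassign arrays to groups at random
        if statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)


gene_values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
group_labels = [0, 0, 0, 1, 1, 1]
p_first = permutation_p_value(gene_values, group_labels, seed=1)
p_second = permutation_p_value(gene_values, group_labels, seed=1)
```

Calling the function twice with the same seed returns the same p-value, while more permutations reduce the Monte Carlo error of the estimate at the cost of run time.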
Once the appropriate options have been selected and the user clicks GO, the expression analysis is performed and the Differential Expression Results menu is displayed. A significance measure is assigned to each gene via the Q-value methodology (Storey and Tibshirani, 2003). The user can select a Q- or P-value cutoff to display the genes that meet that significance threshold. For advanced users, optional Q-value arguments can also be adjusted. The user can plot a histogram of the P-values from all significance tests, create a Q-plot, or cluster significant genes based on similarities in their expression profiles. If the EDGE session is being performed on a computer with internet access, the user can select a significant gene in the results window, and access NCBI information for that gene name. Results of differential expression analyses can be saved for further analysis or reporting.
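The Q-value calculation can be sketched as follows. This is a simplified reading of the Storey and Tibshirani (2003) approach, not EDGE's code: the proportion of true nulls is estimated at a single fixed tuning value `lam=0.5` rather than chosen by smoothing over a range of values, and the function name `q_values` is hypothetical.

```python
def q_values(p_values, lam=0.5):
    """Convert p-values to q-values using a fixed-lambda estimate of pi0.

    Hypothetical helper for illustration. pi0 is the estimated proportion
    of genes that are not differentially expressed.
    """
    m = len(p_values)
    # p-values above lam are assumed to come mostly from null genes.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


example_p = [0.01, 0.02, 0.03, 0.6, 0.9]
example_q = q_values(example_p)
significant = [i for i, q in enumerate(example_q) if q <= 0.05]
```

Selecting genes with a Q-value at or below a cutoff, as in the last line, mirrors the thresholding step described above; with very few p-values above `lam` the fixed-lambda estimate of pi0 can be unstable, which is one reason the published method smooths over several tuning values.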
| 3 RESULTS |
|---|
Figure 2 shows the results of a differential expression analysis on a subset of 3170 genes on 15 arrays from the Hedenfalk et al. (2001) study. The analysis compared expression levels for BRCA1 and BRCA2 tumors. EDGE shows substantial improvements over five leading methodologies.
| Acknowledgments |
|---|
This software development was supported in part by NIH grant R01 HG002913-01.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: John Quackenbush
Received on October 18, 2005; revised on December 10, 2005; accepted on December 11, 2005
| REFERENCES |
|---|
Alter, O., et al. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl Acad. Sci. USA, 97, 10101-10106.
Cui, X., et al. (2005) Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics, 6, 59-75.
Dudoit, S., et al. (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc., 97, 77-87.
Efron, B., et al. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc., 96, 1151-1160.
Eisen, M.B., et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863-14868.
Hedenfalk, I., et al. (2001) Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med., 344, 539-548.
Lonnstedt, I. and Speed, T. (2002) Replicated microarray data. Stat. Sinica, 12, 31-46.
R Development Core Team. (2005) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Storey, J.D. (2005) The optimal discovery procedure: a new approach to simultaneous significance testing. UW Biostatistics Working Paper Series, Working Paper 259.
Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genome-wide studies. Proc. Natl Acad. Sci. USA, 100, 9440-9445.
Storey, J.D., Dai, J.Y. and Leek, J.T. (2005a) The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. UW Biostatistics Working Paper Series, Working Paper 260.
Storey, J.D., et al. (2005b) Significance analysis of time course microarray experiments. Proc. Natl Acad. Sci. USA, 102, 12837-12842.
Troyanskaya, O., et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520-525.