Bioinformatics Advance Access originally published online on August 7, 2006
Bioinformatics 2006 22(20):2500-2506; doi:10.1093/bioinformatics/btl424
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Group testing for pathway analysis improves comparability of different microarray datasets
1 Theoretical Bioinformatics, German Cancer Research Center 69120 Heidelberg, Germany
2 Medical Research Center, University Hospital Mannheim 68167 Mannheim, Germany
3 Cellular and Molecular Pathology, German Cancer Research Center 69120 Heidelberg, Germany
*To whom correspondence should be addressed.
ABSTRACT
Motivation: The wide use of DNA microarrays for investigating the cell transcriptome has triggered the development of numerous methods for processing microarray data. It has also led to a growing number of microarray studies that examine the same biological conditions. However, comparisons made at the level of gene lists obtained by different statistical methods or from different datasets rarely converge. We aimed to examine such discrepancies at the level of apparently affected groups of biologically related genes, e.g. metabolic or signalling pathways. This can be achieved by group testing procedures, e.g. over-representation analysis, functional class scoring (FCS) or global tests.
Results: Three public prostate cancer datasets obtained with the same microarray platform (HGU95A/HGU95Av2) were analyzed. Each dataset was normalized by either variance stabilizing normalization (vsn) or mixed model normalization (MMN). Significance analysis of microarrays was then applied to the vsn-normalized data, and mixed model analysis was applied to the MMN-normalized data. For multiple testing adjustment, the false discovery rate was calculated, and the threshold was set to 0.05. Gene lists from the same method applied to different datasets overlapped by between 42 and 52%. In contrast, lists from different methods applied to the same dataset had between 63 and 85% of genes in common. The six gene lists obtained by the two statistical methods applied to the three datasets were then subjected to group testing by Fisher's exact test. Group testing by GSEA and by the global test was also applied to the three datasets. Fisher's exact test, followed by the global test, showed more consistent results than GSEA with respect to the concordance between analyses of gene lists obtained by different methods and from different datasets. Nevertheless, all group testing methods identified pathways that have already been described as involved in the pathogenesis of prostate cancer. Moreover, pathways identified recurrently in these analyses are more likely to be reliable than those from a single analysis of a single dataset.
Contact: b.brors{at}dkfz.de
Supplementary Information: Supplementary Figure 1 and Supplementary Tables 1–4 are available at Bioinformatics online.
1 INTRODUCTION
DNA microarrays have dramatically accelerated the gene expression profiling of cells and have therefore become an important tool in many biomedical applications. In cancer research, microarray experiments are frequently performed for tumour classification and for the identification of marker genes (van't Veer et al., 2002; Golub et al., 1999; Giltnane and Rimm, 2004). Different microarray studies may examine the same biological conditions, e.g. differential expression between tumour and normal samples. When their results are compared, the lists of differentially expressed genes overlap only slightly (Ein-Dor et al., 2005). Moreover, the numerous statistical methods used to process microarray data also yield dissimilar lists of differentially expressed genes (Allison et al., 2006).
A recent addition to the repertoire of analysis methods is group testing. Group testing asks whether predefined lists of genes are significantly changed as a group in a microarray dataset (Curtis et al., 2005). Such lists contain genes that belong to, e.g., the same metabolic pathway, cellular function or cellular component. Three approaches for this task are:
(1) over-representation analysis (ORA), in which the predefined lists are tested for categories that are represented among the differentially expressed genes more often than expected by chance (Draghici et al., 2003);
(2) functional class scoring (FCS; Pavlidis et al., 2004; Mootha et al., 2003), in which genes are ranked by the correlation between their expression and the given phenotype; and
(3) the global test, which looks for associations between gene expression in predefined gene sets and a target variable (Goeman et al., 2004).
Until now, results from different datasets and statistical methods have been compared only at the level of lists of differentially expressed genes. The goal of this study is to extend the comparison of different statistical methods and datasets to the level of affected pathways. A second goal is to compare the output of three recent group testing methods used for pathway analysis. Three public prostate cancer datasets were subjected to two distinct statistical analyses of differential expression: Significance Analysis of Microarrays (SAM; Tusher et al., 2001) and Mixed Model Analysis (MMA; Hsieh et al., 2003; Chu et al., 2002). The six gene lists obtained from these analyses were then subjected to ORA by Fisher's exact test (Draghici et al., 2003). The normalized data of the three datasets were subjected to FCS by GSEA (Mootha et al., 2003; Sweet-Cordero et al., 2005) and to the global test (Goeman et al., 2004).
2 METHODS
2.1 Public dataset collection and pre-processing
Data for this study were downloaded from public websites (Table 1). They were pre-processed with software packages from the R project (R Development Core Team, 2004; Ihaka and Gentleman, 1996), Bioconductor (Gentleman et al., 2004) and SAS Microarray 1.3 (SAS Institute Inc, 2004). In all cases, raw data were imported from CEL files. Normalization was carried out with the vsn (Huber et al., 2002) or MMN (Hsieh et al., 2003; Chu et al., 2002) algorithm. Default parameters were used, as implemented in the Bioconductor package vsn 1.5.0 or in SAS Microarray 1.3, respectively. Data were log2-transformed. Vsn-normalized data were summarized by the medianpolish method (Tukey, 1977). Fold changes for vsn-normalized expression values were calculated with the robust estimator of Huber et al. (2003). For MMN data, probes had to be summarized only to calculate fold changes; this was done by taking the median of each probe set. Fold changes were then obtained by dividing the two group averages.
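The medianpolish summarization step can be sketched as follows. This is a minimal pure-Python illustration of Tukey's median polish for one probe set. It assumes a probes × arrays matrix of log2 intensities and follows the additive fit of R's `medpolish` in spirit. It is not the Bioconductor code used in this study, and the function names, iteration limit and tolerance are our own assumptions.

```python
from statistics import median


def median_polish(x, max_iter=10, tol=1e-6):
    """Tukey's median polish of a probes x arrays matrix (list of row lists).

    Returns (overall, row_eff, col_eff, resid) with the additive decomposition
    x[i][j] == overall + row_eff[i] + col_eff[j] + resid[i][j].
    """
    n_rows, n_cols = len(x), len(x[0])
    resid = [list(map(float, row)) for row in x]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals into the row effects.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        delta = median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Sweep column (array) medians out of the residuals into the column effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        delta = median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        # Stop once a full sweep no longer moves any median.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


def summarize_probeset(x):
    """One expression value per array: overall effect plus array (column) effect."""
    overall, _, col_eff, _ = median_polish(x)
    return [overall + c for c in col_eff]


probes = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [4.0, 5.0, 6.0]]
print(summarize_probeset(probes))  # -> [1.5, 2.5, 3.5]
```

The row effects absorb probe affinities, so the summarized value for each array depends only on the array effect. For a full dataset, the same function would be applied to every probe set.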
2.2 Statistical analysis
Differentially expressed genes were identified with two methods. SAM (Tusher et al., 2001), as implemented in the Bioconductor package siggenes 1.2.11, was applied to the vsn-normalized data. MMA (Hsieh et al., 2003; Chu et al., 2002), as implemented in SAS Microarray 1.3, was applied to the MMN-normalized data. In all three datasets, the same two classes, tumour and normal, were compared. For multiple testing adjustment, the false discovery rate (FDR) was calculated. We used the algorithm of Storey and Tibshirani (2003) for SAM and that of Benjamini and Hochberg (1995) for MMA. A threshold of 0.05 was used.
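For the MMA lists, the Benjamini and Hochberg (1995) adjustment can be sketched in a few lines. This is a minimal illustration of the standard step-up procedure, not the SAS Microarray implementation, and the function names are hypothetical. The Storey and Tibshirani (2003) q-value used for SAM, which additionally estimates the proportion of true null hypotheses, is not shown.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the adjusted values
    # are monotone: q_(k) = min over l >= k of p_(l) * m / l, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_indices(pvalues, fdr=0.05):
    """Indices of hypotheses called significant at the given FDR threshold."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= fdr]


pvals = [0.001, 0.012, 0.02, 0.3]
print(significant_indices(pvals))  # -> [0, 1, 2]
```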
2.3 Group testing for pathway analysis
To identify pathways that are likely to be affected by differential expression, three approaches were used:
(1) an ORA approach using Fisher's exact test, as described by Draghici et al. (2003);
(2) an FCS approach using a modified GSEA, as described by Sweet-Cordero et al. (2005); and
(3) the global test approach described by Goeman et al. (2004), as implemented in the Bioconductor package globaltest 3.0.4.
We used a total of 227 pathway lists. Of these, 132 were generated from the KEGG database (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.ad.jp/kegg/pathway.html) using the Bioconductor annotation package hgu95av2 1.8.4. The remaining 95 pathways were compiled manually (M. Kenzelmann).
2.4 Fisher's exact test (Draghici et al., 2003)
We consider N genes on the microarray that are annotated with a single gene symbol (replicates were averaged by calculating the mean). Each gene is either significantly differentially expressed (S) or not (F), and either belongs to a predefined pathway list (P) or not (NP); see Table 2. We use the same letters for the numbers of genes in these categories. Let x denote the number of pathway genes that are significantly differentially expressed. If we randomly pick P genes, the probability that exactly i of them are in S is given by the hypergeometric distribution. Summing these probabilities over i = 0, 1, ..., x − 1 gives the probability of having fewer than x genes in S. The p-value is the complement of this sum:

$$ p = 1 - \sum_{i=0}^{x-1} \frac{\binom{S}{i}\binom{N-S}{P-i}}{\binom{N}{P}} \qquad (1) $$

This is a one-sided test in which small p-values correspond to over-represented lists of genes.
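Equation (1) is the upper tail of a hypergeometric distribution and can be evaluated exactly with integer binomial coefficients. The sketch below is a minimal illustration using the notation above (N, S, P and x), and the function name is hypothetical. The tail is summed directly from x upwards rather than computed as 1 minus the lower sum. This gives the same value but avoids cancellation for very small p-values.

```python
from math import comb


def ora_pvalue(N, S, P, x):
    """One-sided over-representation p-value, Pr(X >= x), of equation (1).

    N: annotated genes on the array, S: differentially expressed genes,
    P: genes in the pathway list, x: pathway genes that are differentially
    expressed. X follows a hypergeometric distribution under random picking.
    """
    total = comb(N, P)
    # Summing Pr(X = i) for i = x .. min(S, P) equals 1 - sum_{i < x} Pr(X = i).
    upper = sum(comb(S, i) * comb(N - S, P - i) for i in range(x, min(S, P) + 1))
    return upper / total


# Hypothetical numbers: 1000 genes, 100 significant, a 20-gene pathway with 8 hits.
print(ora_pvalue(1000, 100, 20, 8))
```

With SciPy available, `scipy.stats.hypergeom.sf(x - 1, N, S, P)` should give the same value.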
Khatri and Draghici (2005) reviewed similar current tools for group testing at the level of Gene Ontology (GO) terms.
2.5 GSEA (Mootha et al., 2003; Sweet-Cordero et al., 2005)
An earlier version of this approach, also called gene set enrichment analysis (GSEA), was described by Lamb et al. (2003) and Mootha et al. (2003). Sweet-Cordero et al. (2005) extended the procedure to multiple gene sets as well as multiple datasets. Subramanian et al. (2005) refined the GSEA methodology for broader applicability across different kinds of datasets. We use the basic GSEA procedure as described by Sweet-Cordero et al. (2005), with phenotype permutation but no gene permutation. It is based on a maximum deviation statistic of two empirical distribution functions. First, the genes are ranked by the signal-to-noise ratio (SNR), which measures their correlation with the phenotype of interest. In our case, the phenotype is tumour versus normal tissue. The SNR is defined as

$$ \mathrm{SNR} = \frac{|\mu_C - \mu_T|}{\sigma_C + \sigma_T}, $$

where $\mu_C$ and $\mu_T$ are the mean expression values, and $\sigma_C$ and $\sigma_T$ the standard deviations, of the control group C and the test group T, respectively. Second, an enrichment score ES is calculated for each gene set as follows. If L = {g_1, ..., g_N} is the list of ranked genes, the two empirical cumulative distribution functions are defined as
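$$ P_{\mathrm{hit}}(i) = \frac{N_{\mathrm{hit}}(i)}{N_G}, \qquad P_{\mathrm{miss}}(i) = \frac{N_{\mathrm{miss}}(i)}{N - N_G} \qquad (2) $$

Here $N_G$ is the number of genes in the gene set G. $N_{\mathrm{hit}}(i) = |\{ j \le i : g_j \in G \}|$ is the number of genes ranked at or above the ith gene that are in the gene set (hits). $N_{\mathrm{miss}}(i) = |\{ j \le i : g_j \notin G \}|$ is the number of genes ranked at or above the ith gene that are not in the gene set (misses).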
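The enrichment score ES is the maximum difference between $P_{\mathrm{hit}}$ and $P_{\mathrm{miss}}$, i.e.

$$ ES = \max_{1 \le i \le N} \left( P_{\mathrm{hit}}(i) - P_{\mathrm{miss}}(i) \right) \qquad (3) $$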
Finally, we calculate a nominal p-value to estimate the statistical significance of the enrichment score ES. To this end, the class labels are permuted 999 times to produce reshuffled datasets $D(\pi)$. For each $D(\pi)$, the ranked list is recomputed from the SNR. The enrichment score of each gene set, $ES(\pi)$, is then calculated for the new ranked list. The nominal p-value of a gene set is determined by

$$ p = \frac{|\{\pi : ES(\pi) \ge ES\}|}{999} \qquad (4) $$
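Equations (2) to (4) can be sketched in pure Python as follows. This is a minimal illustration under our own assumptions, not the authors' code:

- Expression values are given as a dictionary from gene name to a list of per-sample values.
- Labels are 'C' (control) or 'T' (test).
- The running sum is unweighted.
- Ties in the SNR ranking keep dictionary order.
- A fixed random seed is used for reproducibility.

The weighted statistic of Subramanian et al. (2005) is not implemented.

```python
import random
from statistics import mean, stdev


def snr(values, labels):
    """Signal-to-noise ratio |mu_C - mu_T| / (sigma_C + sigma_T) for one gene."""
    c = [v for v, lab in zip(values, labels) if lab == "C"]
    t = [v for v, lab in zip(values, labels) if lab == "T"]
    return abs(mean(c) - mean(t)) / (stdev(c) + stdev(t))


def rank_genes(expr, labels):
    """Gene names sorted by decreasing SNR (ties keep dictionary order)."""
    return sorted(expr, key=lambda g: snr(expr[g], labels), reverse=True)


def enrichment_score(ranked, gene_set):
    """ES = max_i (P_hit(i) - P_miss(i)), equations (2) and (3).

    Assumes the gene set contains at least one, but not all, ranked genes.
    """
    n_g = sum(1 for g in ranked if g in gene_set)
    n_miss_total = len(ranked) - n_g
    hits = misses = 0
    best = float("-inf")
    for g in ranked:
        if g in gene_set:
            hits += 1
        else:
            misses += 1
        best = max(best, hits / n_g - misses / n_miss_total)
    return best


def gsea_pvalue(expr, labels, gene_set, n_perm=999, seed=0):
    """Observed ES and nominal p-value from class-label permutations, equation (4)."""
    observed = enrichment_score(rank_genes(expr, labels), gene_set)
    rng = random.Random(seed)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # reshuffled dataset D(pi): same data, permuted labels
        if enrichment_score(rank_genes(expr, perm), gene_set) >= observed:
            exceed += 1
    return observed, exceed / n_perm


# Toy data: g1 and g2 separate the classes, g3 and g4 do not.
expr = {
    "g1": [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],
    "g2": [2.0, 2.1, 1.9, 6.0, 6.1, 5.9],
    "g3": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "g4": [3.0, 1.0, 2.0, 2.0, 3.0, 1.0],
}
labels = ["C", "C", "C", "T", "T", "T"]
es, p = gsea_pvalue(expr, labels, {"g1", "g2"}, n_perm=99)
print(es, p)
```

The toy call uses 99 permutations for speed; the procedure described above uses 999.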
2.6 Global test (Goeman et al., 2004)
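This test investigates whether samples with similar clinical outcomes tend to have similar gene expression patterns. For a significant result, it suffices that many genes in the group are correlated with the outcome; the genes need not have similar expression patterns to one another. The model used is

$$ E(Y_i \mid \beta) = h^{-1}\Big(\alpha + \sum_{j=1}^{m} x_{ij}\beta_j\Big) \qquad (5) $$

In this model:

- $Y_i$ is the clinical outcome of sample i.
- h is the link function of a generalized linear model.
- $\alpha$ is an intercept.
- $x_{ij}$ is an element of the n × m data matrix X for sample i and gene j.
- $\beta_j$ is the regression coefficient for gene j (j = 1, ..., m).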
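The score statistic Q for the model in equation (5), together with a permutation p-value, can be sketched in pure Python as follows. Q is given below as equation (6). This is a minimal illustration under our own assumptions, not the globaltest package:

- X is an n × m list of lists (samples × genes).
- Each gene is centred before R is formed.
- μ is estimated by the sample mean of Y, and μ2 by its population variance, which equals μ(1 − μ) for a 0/1 outcome.
- The p-value is the fraction of outcome permutations whose Q is at least as large as the observed value.

```python
import random
from statistics import mean, pvariance


def global_test_q(X, y):
    """Q = (y - mu)' R (y - mu) / mu2 with R = (1/m) sum_j X_j X_j'."""
    n, m = len(X), len(X[0])
    # Centre every gene (column) so R describes covariance between samples.
    col_means = [mean(X[i][j] for i in range(n)) for j in range(m)]
    mu = mean(y)
    r = [yi - mu for yi in y]
    mu2 = pvariance(y)  # second central moment of y
    # (y - mu)' X_j X_j' (y - mu) = (X_j' (y - mu))^2, summed over genes j.
    total = 0.0
    for j in range(m):
        s = sum((X[i][j] - col_means[j]) * r[i] for i in range(n))
        total += s * s
    return total / (m * mu2)


def global_test_pvalue(X, y, n_perm=999, seed=0):
    """Observed Q and the fraction of permuted outcomes with Q >= observed."""
    observed = global_test_q(X, y)
    rng = random.Random(seed)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if global_test_q(X, y_perm) >= observed:
            exceed += 1
    return observed, exceed / n_perm


X = [[1.0, 0.2], [2.0, -0.1], [3.0, 0.3], [4.0, 0.0]]  # 4 samples x 2 genes
y = [0, 0, 1, 1]  # e.g. normal = 0, tumour = 1
print(global_test_pvalue(X, y, n_perm=200))
```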
To test whether gene expression has a predictive effect on the clinical outcome, we test the null hypothesis that all regression coefficients are zero, $H_0: \beta_1 = \beta_2 = \dots = \beta_m = 0$. The statistic Q used for testing $H_0$ in the model of equation (5) can be written as

$$ Q = \frac{(Y - \mu)^{\top} R\,(Y - \mu)}{\mu_2}, \qquad R = \frac{1}{m}\sum_{j=1}^{m} X_j X_j^{\top} \qquad (6) $$

The terms in equation (6) are defined as follows:

- $\mu$ is the expectation of Y under $H_0$.
- $\mu_2$ is the second central moment of Y under $H_0$.
- R is the covariance matrix of the gene expression patterns between the samples.
- $X_j$ (j = 1, ..., m) is the n × 1 vector of expression values of gene j.
- $(Y - \mu)(Y - \mu)^{\top}$ is the covariance matrix of the clinical outcomes of the samples.

3 RESULTS
Three public datasets (Table 1) were used to compare different statistical methods applied to these datasets at the level of apparently affected pathways. All three studies used an Affymetrix microarray platform, HGU95A (Welsh and Ernst) or HGU95Av2 (Singh). All consisted of the same two sample classes, normal prostate and prostate cancer. Each dataset was normalized by either vsn or MMN. SAM was then applied to the vsn-normalized data and MMA to the MMN-normalized data. Prostate tumour and normal prostate tissue were compared to detect genes that are significantly differentially expressed between these conditions. For multiple testing adjustment, the FDR was calculated using the algorithm of Storey and Tibshirani for SAM and that of Benjamini and Hochberg for MMA. A threshold of 0.05 was used to call differentially expressed genes. All differentially expressed genes, with their significance values and fold changes, are listed in Supplementary Table 1.

The numbers of these genes, and their overlaps with those obtained from the other statistical method and from the other datasets, are shown as Venn diagrams (Fig. 1). Figure 1a and b show the numbers of genes obtained by SAM and MMA, respectively, applied to the three datasets. The intersections of all three datasets contain 146 (SAM) and 132 (MMA) genes. These represent only 52% and 48%, respectively, of the smallest sets of significantly differentially expressed genes, which come from the data of Ernst. Next, we compared the overlaps between SAM and MMA genes within each dataset (Fig. 1c–e). The dataset of Welsh shows the highest proportion of common genes (84%), while the datasets of Singh and Ernst show 63% and 65%, respectively. Finally, the genes common to all three datasets under SAM were compared with those under MMA (Fig. 1f). Here, the overall overlap of 76 genes represents just 42% of the smallest set of significantly differentially expressed genes (dataset of Ernst).
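The percentages above relate the size of the common intersection to the size of the smallest gene list being compared. The helper below is a minimal sketch of that calculation. The function name and the toy probe-set identifiers are hypothetical.

```python
def overlap_fraction(*gene_lists):
    """Return (size of the common intersection, its fraction of the smallest list)."""
    sets = [set(genes) for genes in gene_lists]
    common = set.intersection(*sets)
    smallest = min(len(s) for s in sets)
    return len(common), len(common) / smallest


# Toy example with hypothetical probe-set identifiers for three datasets.
welsh = ["p1", "p2", "p3", "p4", "p5"]
singh = ["p2", "p3", "p4", "p6"]
ernst = ["p2", "p4", "p7"]
print(overlap_fraction(welsh, singh, ernst))  # -> (2, 0.6666666666666666)
```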
Although the overlaps between different methods or between different datasets appear rather small, the extent of these discrepancies is better assessed with concordance plots (Fig. 2, Supplementary Fig. 1). The x-axis of such a plot is the ranked list of all examined genes according to one analysis. The y-axis shows how many of the genes found significant by another analysis lie at or above each list position. The area under the curve (AUC) reflects the extent of concordance between the two analyses being compared. A steeply rising curve indicates that the genes on the y-axis agree well with the ranking of the second analysis on the x-axis (e.g. Fig. 2i). In other words, the significant genes of the first analysis also receive high ranks in the second analysis. In contrast, a curve close to the 45° diagonal, from the lower left to the upper right corner, indicates no similarity between the results of the two analyses (Fig. 2h).

The first six plots of both Figure 2 and Supplementary Figure 1 compare analyses of the three datasets performed with the same statistical method (a–c: SAM; d–f: MMA). The dataset of Ernst shows the highest concordance of differentially expressed genes (large AUCs in Fig. 2c and f). The Welsh and Singh datasets give concordance plots that are close to the 45° diagonal (Fig. 2a, b, d and e). Their differentially expressed genes therefore show low concordance along the ranked genes of the other datasets. In contrast, the concordance between the two statistical methods applied to the same dataset was higher (Fig. 2g–i, Supplementary Fig. 1g–i). SAM and MMA showed very good concordance for the dataset of Ernst. There, all significantly differentially expressed genes found with one method were among the 3000 top-ranked genes of the other method. For the dataset of Welsh, more than 90% of the significantly differentially expressed genes found with one method were among the 4000 top-ranked genes of the other method. The dataset of Singh, however, showed much lower concordance, as reflected by the short and shallow initial phase of its concordance curve (Fig. 2h).
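A concordance curve and its AUC can be computed as sketched below. This is a minimal illustration under our own assumptions. For each position k, the curve counts how many genes significant in the first analysis are among the k top-ranked genes of the second analysis. The AUC is scaled to lie between 0 and 1, so a curve along the 45° diagonal approaches 0.5 for long lists. The text does not state which normalization was used, and the function names are hypothetical.

```python
def concordance_curve(significant_a, ranking_b):
    """curve[k-1] = number of genes significant in A among the top k genes of B."""
    sig = set(significant_a)
    curve, count = [], 0
    for gene in ranking_b:
        if gene in sig:
            count += 1
        curve.append(count)
    return curve


def normalized_auc(curve, n_significant):
    """Area under the step curve divided by its maximum (N * n_significant)."""
    return sum(curve) / (len(curve) * n_significant)


ranking = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
top = concordance_curve(["g1", "g2"], ranking)  # significant genes ranked first
low = concordance_curve(["g7", "g8"], ranking)  # significant genes ranked last
print(normalized_auc(top, 2), normalized_auc(low, 2))  # -> 0.9375 0.1875
```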
As the next step, three group testing approaches for pathway analysis were applied. Fisher's exact test was applied to the six lists of significantly differentially expressed genes from the three datasets and two statistical methods. GSEA and the global test were applied to the three vsn-normalized datasets. For Fisher's exact test and GSEA, a p-value threshold of 0.05 was set to identify significantly regulated pathways. The global test assigned a p-value below 0.05 to almost all examined pathways. We therefore took the 20 top-ranked pathways of each dataset for further investigation. The affected pathways found by each pathway analysis method, and their p-values, are presented in Supplementary Tables 2–4. Supplementary Table 2 presents the results obtained by Fisher's exact test. For each dataset, it pairs the pathways obtained from the SAM gene list with those obtained from the MMA gene list. Supplementary Tables 3 and 4 present the results obtained by GSEA and the global test, respectively, for each dataset. The numbers of affected gene groups and their overlaps for each group testing method are presented in Table 3.
Figure 3 summarizes the occurrences of the significantly regulated pathways found with Fisher's exact test (Fig. 3a), GSEA (Fig. 3b) and the global test (Fig. 3c). Six Fisher's exact test analyses were performed, on the lists of significantly differentially expressed genes found with either SAM or MMA for the three public datasets of Welsh, Singh and Ernst. By contrast, GSEA and the global test were each applied three times, directly to the normalized data of these datasets. With Fisher's exact test (Fig. 3a), one pathway, androgen and prostate cancer, was found to be significantly regulated in all six analyses. Two pathways, ribosome and glutathione metabolism, were significantly affected in five analyses. A further six pathways were affected in four analyses, four in three analyses, 17 in two analyses and 17 in only one analysis. GSEA (Fig. 3b) yielded no pathway that was affected in more than one analysis; in total, its three analyses yielded 23 pathways. With the global test, eight pathways were obtained in two analyses and 44 pathways in only one analysis.
Finally, Fig. 4 presents the coincidence of affected pathways between the three group testing methods (Fisher's exact test, GSEA and the global test). For this purpose, we used the pathways that were significantly regulated in at least four of the six Fisher's exact test analyses. For GSEA and the global test, we used the pathways that were significant in at least two of their three analyses. Three pathways, androgen and prostate cancer, pyrimidine metabolism and nucleotide metabolism, were found to be affected by two group testing methods. Eleven pathways were found by only one method.
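The selection rule used for Fig. 4 can be expressed as a small tally: at least four of six Fisher's exact test analyses, or at least two of three GSEA or global test analyses. The sketch below uses hypothetical pathway names rather than the study's results, and the function names are our own.

```python
from collections import Counter


def recurrent_pathways(analyses, min_count):
    """Pathways reported as significant in at least `min_count` analyses."""
    counts = Counter(p for result in analyses for p in set(result))
    return {p for p, c in counts.items() if c >= min_count}


def methods_per_pathway(selected):
    """Number of group testing methods that selected each pathway."""
    return Counter(p for pathways in selected.values() for p in pathways)


# Hypothetical significant pathways per analysis (not the study's results).
fisher = [{"pw1", "pw2"}, {"pw1"}, {"pw1", "pw3"}, {"pw1", "pw2"}, {"pw2"}, {"pw1", "pw2"}]
gsea = [{"pw5"}, {"pw6"}, set()]
global_test = [{"pw1", "pw4"}, {"pw1", "pw4"}, {"pw7"}]

selected = {
    "Fisher": recurrent_pathways(fisher, 4),
    "GSEA": recurrent_pathways(gsea, 2),
    "global test": recurrent_pathways(global_test, 2),
}
print(methods_per_pathway(selected))
```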
4 DISCUSSION
In this study, we examined the discrepancies between different statistical methods and between different datasets at the level of affected pathways. We also compared the output of three recent group testing methods used for pathway analysis. Data from three public prostate cancer datasets were pre-processed by vsn or MMN and then subjected to SAM or MMA, respectively. The FDR was calculated for multiple testing adjustment, and the threshold for calling differentially expressed genes was set to 0.05. Three group testing approaches for pathway analysis, Fisher's exact test, GSEA and the global test, were applied to the three public datasets.
We compared the overlaps of the genes obtained from the two statistical methods and the three datasets. We conclude that both different statistical methods and different datasets examining the same biological condition (in our case prostate cancer) lead to substantial discrepancies. The significantly differentially expressed genes differed more between datasets than between statistical methods. We confirmed this observation by examining how a list of differentially expressed genes from one analysis is distributed along the ranked list of all examined genes from another analysis. In general, genes obtained from the dataset of Ernst showed a higher concordance because of the smaller number of differentially expressed genes found in this dataset. These genes are expected to rank highly among prostate cancer genes. In contrast, genes obtained from the dataset of Singh showed the lowest concordance.
Ein-Dor et al. (2005) have also pointed out the small overlap between gene sets obtained from different datasets. In their study, a single dataset was analyzed by a single method. The resulting set of genes was shown to be strongly influenced by the subset of patients used for gene selection. One explanation is that there are many differentially expressed genes, and many of them are highly correlated. Which of them are chosen as the top-ranked genes is then more or less arbitrary and depends on the analysis method or on the set of samples from which the genes were inferred. One would expect, however, these discrepancies to be less pronounced when the genes are mapped to biological pathways.
Among the three group testing methods (Fig. 3), Fisher's exact test, followed by the global test, showed the highest overlap between affected pathways inferred from different datasets. The androgen and prostate cancer pathway was apparently affected in all six Fisher's exact test analyses and in two global test analyses, which supports the validity of these results. GSEA gave no overlaps, which can be partly explained by the smaller number of apparently affected pathways identified by this method. However, most of the pathways it identified were also ranked highly by the other two methods. As expected, group testing results differed more between datasets than between gene lists obtained by different statistical methods (Fig. 3a).
Examining the coincidence of affected pathways between the three group testing methods (Fisher's exact test, GSEA and the global test; Fig. 4), we observed that the methods gave largely divergent results. Three pathways, androgen and prostate cancer, pyrimidine metabolism and nucleotide metabolism, were found to be affected by two methods (Fisher's exact test and the global test). As mentioned above, GSEA gave no overlaps at all.
Many of the pathways found to be affected by differential expression are already known to be involved in prostate cancer pathogenesis:

- The androgen and prostate cancer pathway has a clear connection to prostate cancer, as it was compiled manually from literature on this disease (M. Kenzelmann).
- Pathways of nucleotide, amino acid and carbohydrate metabolism, as well as pathways such as ribosome, are characteristic of rapidly proliferating tumour cells.
- Glutathione metabolism plays an important role in defense against reactive oxygen species, xenobiotics and heavy metals (Mendoza-Cózatl et al., 2005). In addition, glutathione S-transferase pi (GSTP1) is a characteristically down-regulated marker gene in prostate cancer (Nakayama et al., 2004).
- Gap junction proteins (connexins) reflect the increased cell communication in cancer. This communication is involved in processes such as apoptosis, differentiation and tissue homeostasis, and in the activation of the calcium and MAPK signalling pathways (http://www.genome.ad.jp/kegg/pathway.html).
- Hypoxia can be related to the increased intracellular redox state of prostate cancer cells. This redox state is associated with the high oxidizing power of the fatty acid synthesis (FAS) pathway and leads to the expression of hypoxia-regulated genes (Hochachka et al., 2002).
- An increased intracellular redox state also increases the expression of the intrinsic prion protein (PrPc), suggesting that PrPc may participate in antioxidative defense (Sauer et al., 1999). This would explain the frequent occurrence of the prion disease pathway in our results.
Even a pathway as remote as cholera infection yields some interesting results. It contains genes of adenylate cyclase signalling, phospholipase C and other factors that are also altered in tumour cells.
5 CONCLUSION
In conclusion, group testing applied to different datasets yields interesting common results, diminishing the large discrepancies observed in direct comparisons of lists of differentially expressed genes obtained not only from different datasets, but also by different statistical methods. Moreover, the multiple microarray analyses performed in this study result in discriminative pathway regulation signatures that are found and validated by different laboratories and microarray analysis methods. Pathways obtained by these analyses are likely to be more robust than those generated by a single analysis on a single dataset.
The three group testing methods used in this study differed in their results. Fisher's exact test showed the most consistent results with respect to the concordance between analyses of gene lists obtained by different methods from different datasets. The global test gave consistent results between analyses of different datasets to a lesser extent, while GSEA showed no overlap between results from different datasets. All group testing methods identified pathways that had already been described as involved in the pathogenesis of prostate cancer.
Acknowledgments
The authors acknowledge financial support by the BMBF (BioFuture; 0311880A) and the National Genome Research Network (01 GR 0450). T.M. receives a stipend from the DFG Graduiertenkolleg 886. Funding to pay the Open Access publication charges was provided by the German Federal Ministry of Research and Education (grant 01 GR 0450).
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: David Rocke
Received on March 8, 2006; revised on July 22, 2006; accepted on July 28, 2006
REFERENCES
Allison, D.B., et al. (2006) Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet., 7, 55–65.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.
Chu, T.M., et al. (2002) A systematic statistical linear modeling approach to oligonucleotide array experiments. Math. Biosci., 176, 35–51.
Curtis, R.K., et al. (2005) Pathways to the analysis of microarray data. Trends Biotechnol., 23, 429–435.
Draghici, S., et al. (2003) Global functional profiling of gene expression. Genomics, 81, 98–104.
Ein-Dor, L., et al. (2005) Outcome signature genes in breast cancer: is there a unique set? Bioinformatics, 21, 171–178.
Ernst, T., et al. (2002) Decrease and gain of gene expression are equally discriminatory markers for prostate carcinoma: a gene expression analysis on total and microdissected prostate tissue. Am. J. Pathol., 160, 2169–2180.
Gentleman, R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.
Giltnane, J.M. and Rimm, D.L. (2004) Technology insight: Identification of biomarkers with tissue microarray technology. Nat. Clin. Pract. Oncol., 1, 104–111.
Goeman, J.J., et al. (2004) A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20, 93–99.
Golub, T.R., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.
Hochachka, P.W., et al. (2002) Going malignant: the hypoxia-cancer connection in the prostate. Bioessays, 24, 749–757.
Hsieh, W.-P., et al. (2003) Mixed-model reanalysis of primate data suggests tissue and species biases in oligonucleotide-based gene expression profiles. Genetics, 165, 747–757.
Huber, W., et al. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18 (Suppl. 1), S96–S104.
Huber, W., et al. (2003) Parameter estimation for the calibration and variance stabilization of microarray data. Stat. Appl. Genet. Mol. Biol., 2, 3.
Ihaka, R. and Gentleman, R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Stat., 5, 299–314.
Khatri, P. and Draghici, S. (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21, 3587–3595.
Lamb, J., et al. (2003) A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer. Cell, 114, 323–334.
Mendoza-Cózatl, D., et al. (2005) Sulfur assimilation and glutathione metabolism under cadmium stress in yeast, protists and plants. FEMS Microbiol. Rev., 29, 653–671.
Mootha, V.K., et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet., 34, 267–273.
Nakayama, M., et al. (2004) GSTP1 CpG island hypermethylation as a molecular biomarker for prostate cancer. J. Cell Biochem., 91, 540–552.
Pavlidis, P., et al. (2004) Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex. Neurochem. Res., 29, 1213–1222.
R Development Core Team (2004) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
SAS Institute Inc. (2004) SAS Scientific Discovery Solutions Supplement: SAS Microarray 1.3. SAS Institute Inc., Cary, NC.
Sauer, H., et al. (1999) Redox-regulation of intrinsic prion expression in multicellular prostate tumor spheroids. Free Radic. Biol. Med., 27, 1276–1283.
Singh, D., et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209.
Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100, 9440–9445.
Subramanian, A., et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA, 102, 15545–15550.
Sweet-Cordero, A., et al. (2005) An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. Nat. Genet., 37, 48–55.
Tukey, J.W. (1977) Exploratory Data Analysis. Addison-Wesley, Reading, MA.
Tusher, V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.
van't Veer, L.J., et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.
Welsh, J.B., et al. (2001) Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res., 61, 5974–5978.