

Bioinformatics Advance Access originally published online on August 7, 2006
Bioinformatics 2006 22(20):2500-2506; doi:10.1093/bioinformatics/btl424

© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Group testing for pathway analysis improves comparability of different microarray datasets

Theodora Manoli 1,2,3, Norbert Gretz 2, Hermann-Josef Gröne 3, Marc Kenzelmann 3, Roland Eils 1 and Benedikt Brors 1,*

1 Theoretical Bioinformatics, German Cancer Research Center, 69120 Heidelberg, Germany
2 Medical Research Center, University Hospital Mannheim, 68167 Mannheim, Germany
3 Cellular and Molecular Pathology, German Cancer Research Center, 69120 Heidelberg, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSION
 REFERENCES
 

Motivation: The wide use of DNA microarrays for investigating the cell transcriptome has triggered the invention of numerous methods for processing microarray data and has led to a growing number of microarray studies that examine the same biological conditions. However, gene lists obtained by different statistical methods or from different datasets hardly converge. We aimed to examine such discrepancies at the level of apparently affected groups of biologically related genes, e.g. metabolic or signalling pathways. This can be achieved by group testing procedures, e.g. over-representation analysis, functional class scoring (FCS) or global tests.

Results: Three public prostate cancer datasets obtained with the same microarray platform (HGU95A/HGU95Av2) were analyzed. Each dataset was normalized by either variance stabilizing normalization (vsn) or mixed model normalization (MMN). Statistical analysis of microarrays was then applied to the vsn-normalized data and mixed model analysis to the MMN-normalized data. For multiple testing adjustment the false discovery rate was calculated and the threshold was set to 0.05. Gene lists from the same method applied to different datasets showed overlaps between 42 and 52%, while lists from different methods applied to the same dataset had between 63 and 85% of genes in common. The six gene lists obtained by applying the two statistical methods to the three datasets were then subjected to group testing by Fisher's exact test. Group testing by GSEA and by the global test was applied to the three datasets as well. Fisher's exact test, followed by the global test, showed more consistent results than GSEA with respect to the concordance between analyses on gene lists obtained by different methods and from different datasets. However, all group testing methods identified pathways that had already been described as involved in the pathogenesis of prostate cancer. Moreover, pathways recurrently identified in these analyses are more likely to be reliable than those from a single analysis on a single dataset.

Contact: b.brors{at}dkfz.de

Supplementary Information: Supplementary Figure 1 and Supplementary Tables 1–4 are available at Bioinformatics online.


    1 INTRODUCTION
 
DNA microarrays have dramatically accelerated gene expression profiling of cells and have therefore become an important tool in many biomedical applications. In cancer research, microarray experiments are frequently performed for tumour classification and identification of marker genes (van't Veer et al., 2002; Golub et al., 1999; Giltnane and Rimm, 2004). When results obtained by different microarray studies examining the same biological conditions, e.g. differential expression between tumour and normal samples, are compared, the lists of differentially expressed genes hardly overlap (Ein-Dor et al., 2005). Moreover, numerous statistical methods are in use for processing microarray data, and they too result in dissimilar lists of differentially expressed genes (Allison et al., 2006).

A recent addition to the repertoire of analysis methods is group testing, i.e. testing whether predefined lists of genes that belong to, e.g. a metabolic pathway, the same cellular function or cellular component are significantly changed as a group in a microarray dataset (Curtis et al., 2005). Three approaches for this task are: (1) an over-representation analysis (ORA), where the genes in the predefined lists are analyzed to see which categories are represented more than expected by chance (Draghici et al., 2003); (2) a functional class scoring (FCS; Pavlidis et al., 2004; Mootha et al., 2003), where the genes are ranked based on the correlation between their expression and the given phenotype and (3) a global test looking for associations between gene expression in predefined gene sets and a target variable (Goeman et al., 2004).

Until now, results coming from different datasets and statistical methods have only been compared on the level of lists of differentially expressed genes. The goal of this study is to extend the comparison of different statistical methods and datasets to the level of affected pathways, and to compare the output of three recent group testing methods used for pathway analysis. Three public prostate cancer datasets were subjected to two distinct statistical analyses of differential expression, the Statistical Analysis of Microarrays (SAM; Tusher et al., 2001) and Mixed Model Analysis (MMA; Hsieh et al., 2003; Chu et al., 2002). The six lists of genes obtained from these analyses were then subjected to ORA by Fisher's exact test (Draghici et al., 2003). Normalized data of the three datasets were subjected to FCS by GSEA (Mootha et al., 2003; Sweet-Cordero et al., 2005) and to global test (Goeman et al., 2004).


    2 METHODS
 
2.1 Public dataset collection and pre-processing
Data for this study were downloaded from public websites (Table 1) and were pre-processed by software packages included in the R-project (R Development Core Team, 2004; Ihaka and Gentleman, 1996), Bioconductor (Gentleman et al., 2004), and SAS Microarray 1.3 (SAS Institute Inc, 2004). In all cases raw data were imported from CEL files. Normalization was carried out using the vsn (Huber et al., 2002) or MMN (Hsieh et al., 2003; Chu et al., 2002) algorithms with default parameters as implemented in the Bioconductor package vsn 1.5.0 or SAS Microarray 1.3, respectively. Data were two-base log-transformed. Vsn-normalized data were summarized by the medianpolish method (Tukey, 1977). To calculate fold changes for vsn-normalized expression values the robust estimator Formula was used (Huber et al., 2003). For MMN data, summarization of probes was only necessary to calculate fold changes; this was done by calculating the median of each probe-set. Fold changes were then obtained by dividing the two averages.


 
Table 1 Key characteristics of the microarray data used in this study

 
2.2 Statistical analysis
For the identification of differentially expressed genes, SAM (Tusher et al., 2001), as implemented in the siggenes 1.2.11 Bioconductor package, was applied to vsn-normalized data, and MMA (Hsieh et al., 2003; Chu et al., 2002), as implemented in SAS Microarray 1.3, was applied to MMN-normalized data. In all three datasets the same two classes, tumour versus normal, were compared. For multiple testing adjustment, the false discovery rate (FDR) was calculated, using the algorithm of Storey and Tibshirani (2003) for SAM and the algorithm of Benjamini and Hochberg (1995) for MMA. A threshold of 0.05 was used.
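The FDR adjustment used for the MMA results can be illustrated with a short sketch of the Benjamini and Hochberg (1995) step-up procedure. This is not the SAS or siggenes implementation; the function names `benjamini_hochberg` and `significant_genes` and the toy p-values are hypothetical, and the Storey and Tibshirani (2003) q-value used for SAM is not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(p_by_gene, threshold=0.05):
    """Genes whose adjusted p-value does not exceed the threshold (assumed 0.05)."""
    genes = list(p_by_gene)
    adjusted = benjamini_hochberg([p_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q <= threshold}


if __name__ == "__main__":
    toy = {"geneA": 0.001, "geneB": 0.04, "geneC": 0.5}  # hypothetical p-values
    print(significant_genes(toy))
```

Whether the cut-off is applied as "less than" or "less than or equal to" 0.05 is not stated in the text; the sketch assumes the latter.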

2.3 Group testing for pathway analysis
To identify pathways that are likely to be affected by differential expression, three approaches were used: an ORA approach using Fisher's exact test as described by Draghici et al. (2003), an FCS approach using a modified GSEA as described by Sweet-Cordero et al. (2005), and the global test approach as described by Goeman et al. (2004) and implemented in the Bioconductor package globaltest 3.0.4. We used a total of 227 pathway lists, of which 132 were generated from the KEGG database (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.ad.jp/kegg/pathway.html) using the Bioconductor annotation package hgu95av2 1.8.4. The remaining 95 pathways were compiled manually (M. Kenzelmann).

2.4 Fisher's exact test (Draghici et al., 2003)
We consider that there are N single-symbol-annotated genes on the microarray (replicates were averaged by calculating the mean), which are either significantly differentially expressed (S) or not (F), and either belong to a pre-defined pathway list (P) or not (NP), see Table 2. If we randomly pick P genes, we would like to estimate the probability of having exactly α genes in S. The p-value of having α genes or fewer in S can be calculated by summing the probabilities of a random list of P genes having 1, 2, ... , α genes in S:

Formula 1(1)


 
Table 2 Gene categorization for group testing with Fisher's exact test approach

 
This is a one-sided test in which the P values correspond to over-represented lists of genes.
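The over-representation calculation can be sketched with the hypergeometric distribution that underlies the one-sided Fisher's exact test. This is a minimal illustration, not the authors' code: the function names and the toy numbers are hypothetical, and the sketch assumes the upper tail (the observed number of significant pathway genes or more), which is the usual convention when small p-values are meant to flag over-represented lists.

```python
from math import comb


def hypergeom_pmf(k, n_total, n_sig, n_pathway):
    """P(exactly k significant genes among n_pathway genes drawn at random)."""
    return (comb(n_sig, k) * comb(n_total - n_sig, n_pathway - k)
            / comb(n_total, n_pathway))


def ora_p_value(n_total, n_sig, n_pathway, n_overlap):
    """Upper-tail p-value: n_overlap or more significant genes in the pathway.

    n_total   -- annotated genes on the array (N in the text)
    n_sig     -- significantly differentially expressed genes (S)
    n_pathway -- genes in the pathway list (P)
    n_overlap -- significant genes that are also in the pathway
    """
    upper = min(n_sig, n_pathway)
    return sum(hypergeom_pmf(k, n_total, n_sig, n_pathway)
               for k in range(n_overlap, upper + 1))


if __name__ == "__main__":
    # Hypothetical counts: 10 genes, 4 significant, pathway of 3, 2 in common.
    print(ora_p_value(10, 4, 3, 2))
```

A pathway would then be called affected when this p-value falls below the chosen threshold (0.05 in this study).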

A review of similar current tools for group testing at the level of Gene Ontology (GO) terms was given by Khatri and Draghici (2005).

2.5 GSEA (Mootha et al., 2003; Sweet-Cordero et al., 2005)
An earlier version of this approach, also called gene set enrichment analysis (GSEA), has previously been described by Lamb et al. (2003) and Mootha et al. (2003). This procedure was extended by Sweet-Cordero et al. (2005) to address the case of multiple gene sets as well as multiple datasets. A refinement of the GSEA methodology with broader applicability across several kinds of datasets has been given by Subramanian et al. (2005). We use the basic GSEA procedure as described by Sweet-Cordero et al. (2005), applying a phenotype permutation but no gene permutation. This is based on a maximum deviation statistic of two empirical distribution functions. First, the genes are ranked using the SNR (signal-to-noise ratio) with respect to their correlation with the phenotype of interest, in our case the comparison of tumour tissue versus normal tissue. The SNR is defined as SNR = |µC – µT|/(σC + σT), where µC, µT, σC and σT are the mean expression values and the standard deviations of the control group C and the test group T, respectively. Second, an enrichment measure ES is calculated and assigned to each gene set as follows. If L = {g1, ... , gN} are the ranked genes, the two empirical cumulative distribution functions are defined as

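$$P_{\mathrm{hit}}(G, D, i) = \frac{|\{g_j \in G : j \le i\}|}{N_H}, \qquad P_{\mathrm{miss}}(G, D, i) = \frac{|\{g_j \notin G : j \le i\}|}{N - N_H} \tag{2}$$
where G is the gene set that is to be tested, D the dataset under investigation, $N_H$ the number of genes in the gene set, $|\{g_j \in G : j \le i\}|$ the number of genes (cardinality) ranked at or above the ith gene that are in the gene set (‘hits’) and $|\{g_j \notin G : j \le i\}|$ the number of genes (cardinality) ranked at or above the ith gene that are not in the gene set (‘misses’).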

The enrichment score ES is the maximum difference between Phit and Pmiss, i.e.

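$$ES(G, D) = \max_{1 \le i \le N} \left[ P_{\mathrm{hit}}(G, D, i) - P_{\mathrm{miss}}(G, D, i) \right] \tag{3}$$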

Finally, we calculate a nominal p-value to estimate the statistical significance of the enrichment score ES. This is done by permuting the class labels 999 times to produce reshuffled datasets D(π). Then we re-compute ranked lists by calculating the SNR of D(π). After that we calculate the enrichment score of each gene set for the new ranked lists, ES(G, D(π)). The nominal p-value of a gene set is determined by:

Formula 4(4)
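The GSEA steps described above (SNR ranking, running-sum enrichment score, phenotype permutation) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of Sweet-Cordero et al. (2005): the function names and the toy data are hypothetical, the miss fraction is normalized by N minus the gene set size, and the nominal p-value is taken as (count + 1)/(permutations + 1), one common convention that the text does not confirm. The sketch also assumes each class has at least two samples and a non-zero standard deviation.

```python
import random
from statistics import mean, stdev


def snr(values, labels):
    """|mean_C - mean_T| / (sd_C + sd_T) for one gene; labels are 'C' or 'T'."""
    c = [v for v, lab in zip(values, labels) if lab == "C"]
    t = [v for v, lab in zip(values, labels) if lab == "T"]
    return abs(mean(c) - mean(t)) / (stdev(c) + stdev(t))


def enrichment_score(ranked_genes, gene_set):
    """Maximum of P_hit - P_miss along the ranked gene list."""
    n = len(ranked_genes)
    n_hit = sum(g in gene_set for g in ranked_genes)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hits = misses = 0
    best = 0.0  # the difference is 0 at the end of the list, so the max is >= 0
    for gene in ranked_genes:
        if gene in gene_set:
            hits += 1
        else:
            misses += 1
        best = max(best, hits / n_hit - misses / (n - n_hit))
    return best


def gsea_p_value(expr, labels, gene_set, n_perm=999, seed=0):
    """expr maps gene name -> list of expression values, one per sample."""
    def es_for(current_labels):
        ranked = sorted(expr, key=lambda g: snr(expr[g], current_labels),
                        reverse=True)
        return enrichment_score(ranked, gene_set)

    observed = es_for(labels)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        permuted = list(labels)
        rng.shuffle(permuted)  # phenotype permutation; genes are not permuted
        if es_for(permuted) >= observed:
            count += 1
    # Assumed convention: the observed labelling counts as one permutation.
    return observed, (count + 1) / (n_perm + 1)
```

With 999 permutations, the smallest p-value this convention can return is 0.001.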

2.6 Global test (Goeman et al., 2004)
This test investigates whether samples with similar clinical outcomes tend to have similar gene expression patterns. For a significant result, it suffices that many genes in the group are correlated with the outcome; they need not have similar expression patterns. The model used is

$$E(Y_i \mid \beta) = h^{-1}\left(a + \sum_{j=1}^{m} x_{ij}\,\beta_j\right) \tag{5}$$
where $Y_i$ is the outcome of sample i, h a link function (e.g. the logit function), a an intercept, $x_{ij}$ an element of the n x m data matrix of samples i and genes j, and $\beta_j$ the regression coefficient for gene j (j = 1, ... , m).

To test whether there is a predictive effect of the gene expressions on the clinical outcome, the null hypothesis that all regression coefficients are zero is tested, H0 : β1 = β2 = ... = βm = 0. The statistic Q used for testing H0 in equation (5) can be written as:

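$$Q = \frac{(Y - \mu)^{\mathsf{T}} R\,(Y - \mu)}{\mu_2} = \sum_{k=1}^{n} \sum_{l=1}^{n} R_{kl} Z_{kl} \tag{6}$$
where $\mu = h^{-1}(a)$ is the expectation of Y under H0, $\mu_2$ is the second central moment of Y under H0, $R = \frac{1}{m} \sum_{i=1}^{m} X_i X_i^{\mathsf{T}}$ the covariance matrix of the gene-expression patterns between the samples, where $X_i$ (i = 1, ... , m) is the n x 1 vector of the gene expressions of gene i, and $Z = (Y - \mu)(Y - \mu)^{\mathsf{T}} / \mu_2$ the covariance matrix of the clinical outcomes of the samples.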
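As a numerical illustration, the sketch below evaluates a statistic of the quadratic form given by Goeman et al. (2004), Q = (Y − µ)ᵀ R (Y − µ)/µ2 with R = XXᵀ/m, for a binary outcome. It is not the globaltest package: the function name `global_test_q` and the toy data are hypothetical, µ is simply estimated by the sample mean of Y, and µ2 is taken as µ(1 − µ), which assumes a Bernoulli outcome with a logit link. No p-value is computed.

```python
def global_test_q(x, y):
    """Q statistic for a gene group.

    x -- list of n rows (samples), each a list of m gene expression values
    y -- list of n binary outcomes (0 or 1)
    """
    n, m = len(x), len(x[0])
    mu = sum(y) / n            # estimate of E[Y] under H0 (assumption)
    mu2 = mu * (1.0 - mu)      # second central moment of a Bernoulli outcome
    resid = [yi - mu for yi in y]
    # (Y - mu)' R (Y - mu) with R = X X' / m equals the average over genes
    # of the squared inner product between a gene's profile and the residuals.
    total = 0.0
    for j in range(m):
        s = sum(x[i][j] * resid[i] for i in range(n))
        total += s * s
    return total / (m * mu2)


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples, 2 genes; gene 0 tracks the outcome.
    x = [[1, 0], [0, 1], [1, 1], [0, 0]]
    y = [1, 0, 1, 0]
    print(global_test_q(x, y))
```

Written this way, Q is large when many genes in the group covary with the outcome, regardless of whether those genes resemble each other.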


    3 RESULTS
 
Three public datasets (Table 1) were used to compare different statistical methods at the level of apparently affected pathways. All three studies used the HGU95A (Welsh and Ernst) or HGU95Av2 (Singh) microarray platform from Affymetrix and comprised the same two sample classes, normal prostate and prostate cancer. Each dataset was normalized by either vsn or MMN. Then, SAM was applied to the vsn-normalized data and MMA to the MMN-normalized data. Comparisons were made between prostate tumour and normal prostate tissue to detect genes that are significantly differentially expressed under these conditions. For multiple testing adjustment the FDR was calculated using the algorithm of Storey and Tibshirani for SAM and the algorithm of Benjamini and Hochberg for MMA. A threshold of 0.05 was used to assign differentially expressed genes. All differentially expressed genes with their significance values and fold changes are listed in Supplementary Table 1. The numbers of these genes and their overlap with those obtained from other statistical methods and datasets are shown by Venn diagrams (Fig. 1). Figure 1a and b show the numbers of genes obtained by SAM or MMA applied to the three datasets. The intersections of all three datasets contain 146 or 132 genes, respectively, which represent only 52 or 48% of the smallest sets of significantly differentially expressed genes, which are from the data of Ernst. Comparing the overlaps between SAM and MMA genes within each of the three datasets (Fig. 1c–e), the dataset of Welsh shows the highest rate of common genes, 84%, while the datasets of Singh and Ernst show rates of 63 and 65%, respectively. Finally, the differentially expressed genes common to SAM and MMA were compared across datasets (Fig. 1f). Here, the overall overlap of 76 genes represents just 42% of the smallest set of significantly differentially expressed genes (dataset of Ernst).
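The overlap percentages quoted here are intersections expressed relative to the smallest gene list. A minimal sketch of that calculation follows; the function name `overlap_of_smallest` and the toy gene names are hypothetical and do not reproduce the study's lists.

```python
def overlap_of_smallest(*gene_lists):
    """Return (size of the common intersection, fraction of the smallest list)."""
    sets = [set(genes) for genes in gene_lists]
    common = set.intersection(*sets)
    smallest = min(len(s) for s in sets)
    return len(common), len(common) / smallest


if __name__ == "__main__":
    # Hypothetical gene lists from three analyses.
    a = {"g1", "g2", "g3"}
    b = {"g2", "g3", "g4", "g5"}
    c = {"g3", "g2", "g9"}
    n_common, fraction = overlap_of_smallest(a, b, c)
    print(n_common, round(100 * fraction), "%")
```

Normalizing by the smallest list gives the most favourable percentage, since the intersection can never exceed that list's size.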


 
Fig. 1 Venn diagrams representing the numbers of significantly differentially expressed genes and the overlaps of sets obtained from two different statistical methods and three datasets. The upper magenta and cyan diagrams show the overlaps of SAM and MMA genes between the three datasets, respectively. The middle Venn diagrams show the overlaps between SAM and MMA genes for the three datasets separately. The bottom diagram shows the overlaps between the intersections of the middle row.

 
Although the overlaps between different methods or between different datasets appear to be rather small, they are better compared by concordance plots, which allow us to examine how large these discrepancies really are (Fig. 2, Supplementary Fig. 1). These plots show the number of significantly differentially expressed genes found by one analysis (y-axis) that are recovered along the ranked list of all examined genes according to another analysis (x-axis). The area under the curve (AUC) determines the extent of concordance between the two analyses being compared. Hence, curves that rise steeply indicate a high concordance of the genes presented on the y-axis with the second analysis presented on the x-axis (e.g. Fig. 2i), i.e. the significant genes of the first analysis also receive high ranks in the second analysis. In contrast, curves tending to the 45° diagonal line from the lower left corner to the upper right corner of the plot denote no similarity between the results of the two analyses (Fig. 2h). The first six plots of both Figure 2 and Supplementary Figure 1 are comparisons between analyses performed on the three datasets using the same statistical method (a–c: SAM and d–f: MMA). The dataset of Ernst shows the highest concordance of differentially expressed genes (large AUCs in Fig. 2c and f). The Welsh and Singh datasets give concordance plots that are close to the 45° diagonal line, which means that the differentially expressed genes of these datasets show a low concordance along the ranked genes of other datasets (Fig. 2a, b, d and e). In contrast, the concordance between the two statistical methods for the same dataset (Fig. 2g–i, Supplementary Fig. 1g–i) was higher. SAM and MMA showed very good concordance rates for the dataset of Ernst, where all significantly differentially expressed genes found with one method were among the top 3000 ranked genes with the other method. For the dataset of Welsh, > 90% of the significantly differentially expressed genes found with one method were among the top 4000 ranked genes with the other method. The dataset of Singh, however, showed much lower concordance. This is reflected in the short and not steeply ascending first phase of the concordance curve (Fig. 2h).
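The construction of a concordance curve can be sketched as a cumulative count of one analysis's significant genes along the other analysis's ranking. The sketch below is illustrative only; the function names and toy genes are hypothetical, and the AUC is normalized to the range 0 to 1 by an assumed convention (mean curve height divided by the final count), which the text does not specify.

```python
def concordance_curve(significant, ranked):
    """Cumulative number of `significant` genes met while walking down `ranked`."""
    sig = set(significant)
    curve, count = [], 0
    for gene in ranked:
        if gene in sig:
            count += 1
        curve.append(count)
    return curve


def normalized_auc(curve):
    """Mean curve height relative to its final value (assumed normalization)."""
    if not curve or curve[-1] == 0:
        raise ValueError("curve must recover at least one significant gene")
    return sum(curve) / (len(curve) * curve[-1])


if __name__ == "__main__":
    significant = {"a", "b"}                 # hypothetical list from analysis 1
    good_rank = ["a", "b", "c", "d"]         # analysis 2 ranks them first
    poor_rank = ["c", "d", "a", "b"]         # analysis 2 ranks them last
    print(normalized_auc(concordance_curve(significant, good_rank)))
    print(normalized_auc(concordance_curve(significant, poor_rank)))
```

A steeply rising curve gives a value near 1, while a curve along the diagonal gives a value near 0.5.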


 
Fig. 2 Concordance plots presenting the numbers of common genes between significantly differentially expressed genes found by one analysis (y-axis) along the ranked list of all examined genes according to another analysis (x-axis). (a–c) SAM analyses of different datasets are compared; (d–f) MMA analyses of different datasets; (g–i) SAM and MMA analyses of same datasets.

 
As the next step, three group testing approaches for pathway analysis were applied. Fisher's exact test was applied to the six lists of significantly differentially expressed genes coming from three datasets and two statistical methods. GSEA and the global test were applied to the three vsn-normalized datasets. For Fisher's exact test and GSEA, a p-value threshold of 0.05 was set to identify significantly regulated pathways. Because the global test assigned a p-value of < 0.05 to almost all examined pathways, we took the 20 top-ranked pathways for further investigation. The affected pathways found by each pathway analysis method and their p-values are presented in Supplementary Tables 2–4. Supplementary Table 2 presents the results obtained by Fisher's exact test, comparing side by side the outputs of the pathway analyses applied to the SAM and MMA gene lists for the same dataset. Supplementary Tables 3 and 4 present the results obtained for each dataset by GSEA and the global test, respectively. The numbers of affected gene groups and their overlaps for each group testing method are presented in Table 3.


 
Table 3 Numbers of affected pathways found with three statistical approaches for group testing (Fisher's exact test, GSEA and global test), obtained from the three datasets of Welsh, Singh and Ernst

 
Figure 3 summarizes the occurrences of the significantly regulated pathways found with Fisher's exact test (Fig. 3a), GSEA (Fig. 3b) and the global test (Fig. 3c). In total, six Fisher's exact test analyses were applied to the lists of significantly differentially expressed genes found with either SAM or MMA for the three public datasets of Welsh, Singh and Ernst, whereas GSEA and the global test were each applied three times, directly to the normalized data of these datasets. By Fisher's exact testing (Fig. 3a) one pathway, ‘androgen and prostate cancer’, was found to be significantly regulated in all six analyses. Two pathways, ‘ribosome’ and ‘glutathione metabolism’, were found to be significantly affected in five analyses, six pathways in four analyses, four pathways in three analyses, 17 pathways in two analyses and 17 pathways in only one analysis. GSEA (Fig. 3b) gave no pathway that was affected in more than one analysis; in total, 23 pathways were obtained across its analyses. By global testing, eight pathways were obtained by two analyses, while 44 pathways were obtained by only one analysis.


 
Fig. 3 Coincidence of affected pathways found with (a) Fisher's exact test, (b) GSEA or (c) the global test from six different lists of genes. Each coloured box designates that the corresponding pathway is significantly regulated in the corresponding list, coming from a specific dataset (Welsh, Singh or Ernst) and, in the case of Fisher's exact test, a specific statistical analysis (SAM or MMA). Significantly regulated pathways in 6 lists of genes are shown in magenta; significantly regulated pathways in 5, 4, 3, 2 or 1 lists of genes are shown in red, blue, yellow or grey, respectively. The numbers in parentheses following each pathway name denote the KEGG-ID (http://www.genome.ad.jp/kegg/pathway.html) for pathways coming from the KEGG pathway database. Pathways without a number were compiled and curated manually (M. Kenzelmann).

 
Finally, the coincidence of affected pathways between the three group testing methods, Fisher's exact test, GSEA and global test, is presented (Fig. 4). For this purpose, we used pathways found to be significantly regulated in at least four of the six Fisher's exact test analyses, or in at least two of the three GSEA or global test analyses. Three pathways, ‘androgen and prostate cancer’, ‘pyrimidine metabolism’ and ‘nucleotide metabolism’, were found to be affected by two group testing methods. Eleven pathways were found by only one method.
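The recurrence counts behind Figures 3 and 4 amount to tallying, for each pathway, the number of analyses in which it was called significant and then keeping those that reach a minimum count. A minimal sketch follows; the function names, the analysis labels and the pathway names in the example are hypothetical placeholders, not results from this study.

```python
from collections import Counter


def pathway_recurrence(results):
    """Count in how many analyses each pathway is significant.

    results maps an analysis label to the collection of pathways that the
    analysis called significant.
    """
    counts = Counter()
    for pathways in results.values():
        counts.update(set(pathways))  # each analysis counts a pathway once
    return counts


def recurrent_pathways(results, min_analyses):
    """Pathways significant in at least `min_analyses` analyses, sorted by name."""
    counts = pathway_recurrence(results)
    return sorted(p for p, c in counts.items() if c >= min_analyses)


if __name__ == "__main__":
    # Hypothetical output of three analyses of one group testing method.
    toy = {
        "dataset1": {"pathway_A", "pathway_B"},
        "dataset2": {"pathway_A", "pathway_C"},
        "dataset3": {"pathway_A", "pathway_B", "pathway_D"},
    }
    print(recurrent_pathways(toy, min_analyses=2))
```

For the selection described above, `min_analyses` would be 4 for the six Fisher's exact test analyses and 2 for the three GSEA or global test analyses.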


 
Fig. 4 Coincidence of affected pathways across the three different pathway analysis methods, Fisher's exact test, GSEA and global test. Pathways found to be affected in at least four of six Fisher's exact test analyses, or two of three GSEA or global test analyses, are presented. Yellow and grey boxes refer to significantly regulated pathways found with two or one pathway analysis methods, respectively. The numbers in parentheses following each pathway name denote the KEGG-ID (http://www.genome.ad.jp/kegg/pathway).

 

    4 DISCUSSION
 
In this study we examined the discrepancies of different statistical methods and datasets on the level of affected pathways, and compared the output of three recent group testing methods used for pathway analysis. Data of three public prostate cancer datasets were pre-processed by vsn or MMN, and were then subjected to SAM or MMA, respectively. The FDR was calculated for multiple testing adjustment, and the threshold was set to 0.05 to assign differentially expressed genes. Three approaches of group testing for pathway analysis, Fisher's exact test, GSEA and global test, were applied to the three public datasets.

Comparing the overlaps of genes obtained from the two statistical methods and three datasets, we conclude that both different statistical methods and different datasets examining the same biological condition (in our case prostate cancer) lead to significant discrepancies. Different datasets showed higher dissimilarities in the obtained significantly differentially expressed genes than different statistical methods did. This observation was confirmed by investigating the concordance of a list of differentially expressed genes found by one analysis along the ranked list of all examined genes in another analysis. In general, genes obtained from the dataset of Ernst showed a higher concordance because of the smaller numbers of differentially expressed genes found in this dataset. These genes are expected to be the most highly ranked among prostate cancer genes. Genes obtained from the dataset of Singh showed, in contrast, the worst concordance.

The problem of having a small overlap between gene sets coming from different datasets has also been pointed out by Ein-Dor et al. (2005). In that study a single dataset was analyzed by a single method, and it was shown that the resulting set of genes is strongly influenced by the subset of patients used for gene selection. An explanation would be that there is a high number of differentially expressed genes, many of which are highly correlated. Which of them are chosen as the top-ranked genes is more or less arbitrary and depends on the analysis method or the set of samples from which the genes were inferred. One would expect, however, that these discrepancies are less pronounced when the genes are mapped to biological pathways.

Comparing the results of the three group testing methods (Fig. 3), Fisher's exact test, followed by the global test, showed the highest overlap between affected pathways inferred from different datasets. ‘Androgen and prostate cancer’ was the pathway found to be apparently affected in all six Fisher's exact test analyses and in two global test analyses, which supports the validity of these results. GSEA gave no overlaps, which can be partly explained by the smaller numbers of apparently affected pathways obtained by this method. However, most of the pathways it identified were also highly ranked by the other two methods. As expected, results of group testing applied to different datasets were more discrepant than results of group testing applied to lists of differentially expressed genes obtained by different statistical methods (Fig. 3a).

Examining the coincidence of affected pathways between the three group testing methods, Fisher's exact test, GSEA and global test (Fig. 4), we observed that different methods gave mainly diverging results. Three pathways, ‘androgen and prostate cancer’, ‘pyrimidine metabolism’ and ‘nucleotide metabolism’, were found to be affected by two methods (Fisher's exact test and global test). GSEA gave, as mentioned above, no overlaps at all.

Many of the pathways found to be affected by differential expression are already known to be involved in prostate cancer pathogenesis. ‘Androgen and prostate cancer’ has a clear connection to prostate cancer, as it was compiled manually from literature information on this disease (M. Kenzelmann). Pathways from the groups of nucleotide metabolism, amino acid metabolism and carbohydrate metabolism, as well as pathways like ‘ribosome’, are characteristic of fast proliferating tumour cells. ‘Glutathione metabolism’ plays an important role in defense against reactive oxygen species, xenobiotics and heavy metals (Mendoza-Cózatl et al., 2005), while glutathione S-transferase pi (GSTP1) is a characteristically down-regulated marker gene in prostate cancer (Nakayama et al., 2004). ‘Gap junction proteins connexins’ marks the increased cell communication in cancer for processes like apoptosis, differentiation and tissue homeostasis, and for activation of calcium and MAPK signalling pathways (http://www.genome.ad.jp/kegg/pathway.html). ‘Hypoxia’ can be related to the increased intracellular redox state of prostate cancer cells, associated with the high oxidizing power of the fatty acid synthesis (FAS) pathway, that yields expression of hypoxia-regulated genes (Hochachka et al., 2002). An increased intracellular redox state also yields an increase in the expression of the intrinsic prion protein (PrPc), suggesting the possible participation of PrPc in antioxidative defense (Sauer et al., 1999) and explaining the high occurrence of ‘prion disease’ in our results.

Even a pathway as remote as ‘Cholera-infection’ yields some interesting results, as it contains genes of the adenylate cyclase signaling, phospholipase C and other factors that are also changed in tumour cells.


    5 CONCLUSION
 
In conclusion, group testing applied to different datasets yields interesting common results, diminishing the large discrepancies observed in direct comparisons of lists of differentially expressed genes obtained not only from different datasets, but also by different statistical methods. Moreover, the multiple microarray analyses performed in this study result in discriminative pathway regulation signatures that are found and validated by different laboratories and microarray analysis methods. Pathways obtained by these analyses are likely to be more robust than those generated by a single analysis on a single dataset.

The three group testing methods used in this study differed in their results. Fisher's exact test showed the most consistent results with respect to the concordance between analyses on gene lists obtained by different methods from different datasets. The global test showed consistent results between analyses applied to different datasets to a lesser extent, while GSEA showed no overlaps between results coming from different datasets. All group testing methods gave pathways that had already been described as involved in the pathogenesis of prostate cancer.


    Acknowledgments
 
The authors acknowledge financial support by the BMBF (BioFuture; 0311880A) and the National Genome Research Network (01 GR 0450). T.M. receives a stipend from the DFG Graduiertenkolleg 886. Funding to pay the Open Access publication charges was provided by the German Federal Ministry of Research and Education (grant 01 GR 0450).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: David Rocke

Received on March 8, 2006; revised on July 22, 2006; accepted on July 28, 2006

    REFERENCES
 

    Allison, D.B., et al. (2006) Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet., 7, 55–65.

    Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.

    Chu, T.M., et al. (2002) A systematic statistical linear modeling approach to oligonucleotide array experiments. Math. Biosci., 176, 35–51.

    Curtis, R.K., et al. (2005) Pathways to the analysis of microarray data. Trends Biotechnol., 23, 429–435.

    Draghici, S., et al. (2003) Global functional profiling of gene expression. Genomics, 81, 98–104.

    Ein-Dor, L., et al. (2005) Outcome signature genes in breast cancer: is there a unique set? Bioinformatics, 21, 171–178.

    Ernst, T., et al. (2002) Decrease and gain of gene expression are equally discriminatory markers for prostate carcinoma: a gene expression analysis on total and microdissected prostate tissue. Am. J. Pathol., 160, 2169–2180.

    Gentleman, R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.

    Giltnane, J.M. and Rimm, D.L. (2004) Technology insight: Identification of biomarkers with tissue microarray technology. Nat. Clin. Pract. Oncol., 1, 104–111.

    Goeman, J.J., et al. (2004) A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20, 93–99.

    Golub, T.R., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.

    Hochachka, P.W., et al. (2002) Going malignant: the hypoxia-cancer connection in the prostate. Bioessays, 24, 749–757.

    Hsieh, W.-P., et al. (2003) Mixed-model reanalysis of primate data suggests tissue and species biases in oligonucleotide-based gene expression profiles. Genetics, 165, 747–757.

    Huber, W., et al. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18, Suppl. 1, S96–S104[Abstract].

    Huber, W., et al. (2003) Parameter estimation for the calibration and variance stabilization of microarray data. Stat. Appl. Genet. Mol. Biol, . 2, 3.

    Ihaka, R. and Gentleman, R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Stat, . 5, 299–314[CrossRef].

    Khatri, P. and Draghici, S. (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21, 3587–3595[Abstract/Free Full Text].

    Lamb, J., et al. (2003) A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer. Cell, 114, 323–334[CrossRef][Web of Science][Medline].

    Mendoza-Cózatl, D., et al. (2005) Sulfur assimilation and glutathione metabolism under cadmium stress in yeast, protists and plants. FEMS Microbiol. Rev, . 29, 653–671[CrossRef][Web of Science][Medline].

    Mootha, V.K., et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet, . 34, 267–273[CrossRef][Web of Science][Medline].

    Nakayama, M., et al. (2004) GSTP1 CpG island hypermethylation as a molecular biomarker for prostate cancer. J. Cell Biochem, . 91, 540–552[CrossRef][Web of Science][Medline].

    Pavlidis, P., et al. (2004) Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex. Neurochem. Res, . 29, 1213–1222[CrossRef][Web of Science][Medline].

    R Development Core Team (2004). R: A language and environment for statistical computing, (2004) , Austria ISBN 3-900051-07-0 R Foundation for Statistical Computing Vienna.

    SAS Institute Inc (2004). SAS Scientific Discovery Solutions Supplement: SAS Microarray 1.3, (2004) , NC Cary.

    Sauer, H., et al. (1999) Redox-regulation of intrinsic prion expression in multicellular prostate tumor spheroids. Free Radic. Biol. Med, . 27, 1276–1283[CrossRef][Web of Science][Medline].

    Singh, D., et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209[CrossRef][Web of Science][Medline].

    Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100, 9440–9445[Abstract/Free Full Text].

    Subramanian, A., et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA, 102, 15545–15550[Abstract/Free Full Text].

    Sweet-Cordero, A., et al. (2005) An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. Nat. Genet, . 37, 48–55[Web of Science][Medline].

    Tukey, J.W. Exploratory Data Analysis, (1977) Addison-Wesley, Reading, MA.

    Tusher, V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121[Abstract/Free Full Text].

    van't Veer, L.J., et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536[CrossRef][Medline].

    Welsh, J.B., et al. (2001) Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res, . 61, 5974–5978[Abstract/Free Full Text].

