Bioinformatics Advance Access originally published online on January 18, 2008
Bioinformatics 2008 24(3):374-382; doi:10.1093/bioinformatics/btm620
A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments

1Department of Biostatistics, Division of Information Sciences, City of Hope National Medical Center, Beckman Research Institute, 1500 Duarte Rd, Duarte, CA 91010, USA and 2Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands
*To whom correspondence should be addressed.
Abstract
Motivation: The proliferation of public data repositories creates a need for meta-analysis methods to efficiently evaluate, integrate and validate related datasets produced by independent groups. A t-based approach has been proposed to integrate effect sizes from multiple studies by modeling both intra- and between-study variation. Recently, a non-parametric rank product method, which is derived from biological reasoning about fold-change criteria, has been applied to directly combine multiple datasets into one meta study. Fisher's Inverse χ² method, which depends only on the P-values from individual analyses of each dataset, has been used in a couple of medical studies. While these methods address the question from different angles, it is not clear how they compare with each other.
Results: We comparatively evaluate three methods: t-based hierarchical modeling, rank products and Fisher's Inverse χ² test with P-values from either the t-based or the rank product method. A simulation study shows that the rank product method, in general, has higher sensitivity and selectivity than the t-based method in both individual and meta-analysis, especially in the setting of small sample size and/or large between-study variation. Not surprisingly, the performance of Fisher's χ² method depends heavily on the method used in the individual analysis. Application to real datasets demonstrates that meta-analysis achieves more reliable identification than an individual analysis, and that rank products are more robust in gene ranking, which leads to much higher reproducibility among independent studies. Although t-based meta-analysis greatly improves on the individual analysis, it suffers from a potentially large number of false positives when P-values serve as the threshold. We conclude that careful meta-analysis is a powerful tool for integrating multiple array studies.
Contact: fxhong{at}jimmy.harvard.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Present address: Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, 44 Binney Street, Boston, MA 02115, USA.
Received on June 8, 2007; revised on December 4, 2007; accepted on December 8, 2007
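
Illustrative note: the abstract states that Fisher's Inverse χ² method depends only on the P-values from the individual analyses. The sketch below shows the textbook form of that combination for a single gene, assuming the k per-study P-values are independent: the statistic X = -2 Σ ln p_i follows a χ² distribution with 2k degrees of freedom under the null hypothesis. The function name `fisher_combined_p` and the example P-values are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's inverse chi-square method.

    Returns (statistic, combined_p). Illustrative sketch only.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    k = len(p_values)
    # Test statistic: X = -2 * sum(ln p_i), chi-square with 2k df under the null.
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k the upper tail has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical P-values for one gene from three independent studies.
stat, p_meta = fisher_combined_p([0.04, 0.10, 0.02])
print(f"X = {stat:.3f}, combined P = {p_meta:.4g}")
```

Because the combined value is a function of the input P-values alone, any bias in the per-study test (t-based or rank product) carries straight through to the meta-analysis, which is consistent with the dependence noted in the Results.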