

Bioinformatics Advance Access first published online on February 2, 2005
This version published online on February 4, 2005

Bioinformatics, doi:10.1093/bioinformatics/bti301
All versions of this article: 21/9/1987 (most recent), bti301v2, bti301v1

© The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
Received April 23, 2004
Revised December 9, 2004
Accepted January 27, 2005

Article

Empirical Bayes screening (EBS) of many p-values with applications to microarray studies

Susmita Datta 1* and Somnath Datta 2

1 Department of Mathematics and Statistics, Georgia State University, Atlanta, GA 30303, USA; Department of Biology, Georgia State University, Atlanta, GA 30303, USA
2 Department of Statistics, University of Georgia, Athens, GA 30602, USA

* To whom correspondence should be addressed.
Susmita Datta, E-mail: sdatta{at}mathstat.gsu.edu


Abstract

Motivation: Statistical tests for detecting differentially expressed genes produce a large collection of p-values, one for each gene comparison. Without further adjustment, these p-values may yield a large number of false positives simply because the number of genes tested is huge, and following up on false positives wastes laboratory resources. To account for multiple hypotheses, the p-values are typically adjusted using a single-step or a step-down method in order to control an overall error rate (the so-called familywise error rate). In many applications, this strategy is overly conservative and flags too few genes.
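To make the contrast between single-step and step-down adjustment concrete, the following is a minimal sketch using two standard procedures, Bonferroni (single-step) and Holm (step-down). The abstract does not name the specific procedures the authors use, so these choices, the function names, and the toy p-values are assumptions made for illustration only.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Single-step rule: compare every p-value to the same cutoff alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down rule: walk through the p-values from smallest to
    largest, relaxing the cutoff at each step, and stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # At step `rank` (0-based), m - rank hypotheses are still in play.
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


# Toy p-values, invented for illustration (not from the paper).
pvals = [0.001, 0.012, 0.013, 0.04, 0.5]
print(bonferroni_reject(pvals))  # [True, False, False, False, False]
print(holm_reject(pvals))        # [True, True, True, False, False]
```

Both rules control the familywise error rate at level alpha; the step-down version rejects at least as many hypotheses as the single-step version, yet with thousands of genes both can still flag very few.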

Results: In this paper we introduce a novel empirical Bayes screening (EBS) technique that inspects a large number of p-values in an effort to detect additional positive cases. In effect, each case borrows strength from an overall picture of the alternative hypotheses computed from all the p-values, while the entire procedure is calibrated by a step-down method so that the familywise error rate under the complete null hypothesis is still controlled. We show that EBS has substantially higher sensitivity than the standard step-down approach to multiple comparisons, at the cost of a modest increase in the false discovery rate (FDR). The EBS procedure also compares favorably with existing FDR control procedures for multiple testing. It is particularly useful in situations where it is important to identify all potentially positive cases, which can then be subjected to further confirmatory testing to eliminate the false positives. We illustrate the screening procedure using a data set on human colorectal cancer, where EBS detected additional genes related to colon cancer that were missed by other methods.
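The trade-off described above, higher sensitivity in exchange for more false discoveries, can be made explicit with a small sketch. It assumes the true status of each gene is known, as in a simulation study; the function name and the toy labels are hypothetical and do not come from the paper. Note that the FDR is the expected value of the false discovery proportion computed here, so a single data set gives only one realization of it.

```python
def sensitivity_and_fdp(flagged, truly_positive):
    """Return (sensitivity, false discovery proportion) for one set of decisions.

    flagged[i]        -- True if gene i was declared positive by the procedure
    truly_positive[i] -- True if gene i is genuinely differentially expressed
    """
    tp = sum(1 for f, t in zip(flagged, truly_positive) if f and t)
    fp = sum(1 for f, t in zip(flagged, truly_positive) if f and not t)
    n_positive = sum(1 for t in truly_positive if t)
    n_flagged = tp + fp
    sensitivity = tp / n_positive if n_positive else float("nan")
    # By convention the proportion is 0 when nothing is flagged.
    fdp = fp / n_flagged if n_flagged else 0.0
    return sensitivity, fdp


# Toy labels, invented for illustration: 4 true positives among 8 genes.
truth = [True, True, True, True, False, False, False, False]
conservative = [True, False, False, False, False, False, False, False]
liberal = [True, True, True, False, True, False, False, False]

print(sensitivity_and_fdp(conservative, truth))  # (0.25, 0.0)
print(sensitivity_and_fdp(liberal, truth))       # (0.75, 0.25)
```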

This novel empirical Bayes procedure has the following advantages over our earlier proposed empirical Bayes adjustments: (i) it offers automatic screening of the p-values that a user may obtain from a univariate (i.e., gene-by-gene) analysis package, making it extremely easy for a nonstatistician to use; (ii) because it applies to the p-values, the tests do not have to be t-tests; in particular, they could be F-tests, which may arise in certain ANOVA formulations with expression data, or even nonparametric tests; (iii) the empirical Bayes adjustment uses nonparametric function estimation techniques to estimate the marginal density of the transformed p-values rather than a parametric model for the prior distribution, and is therefore robust against model misspecification.
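Point (iii) can be illustrated with a minimal sketch of nonparametric density estimation on transformed p-values. The abstract does not state which transformation or estimator the authors use, so everything specific here is an assumption: the probit transform z = Φ⁻¹(1 − p), the Gaussian kernel, the rule-of-thumb bandwidth, and all function names. It is not the authors' EBS code.

```python
from statistics import NormalDist, stdev

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def probit_scores(pvalues, eps=1e-12):
    """Map p-values to z-scores via z = Phi^{-1}(1 - p) (assumed transform).

    p-values are clipped away from 0 and 1 so the inverse CDF stays finite.
    """
    return [_STD_NORMAL.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps))
            for p in pvalues]


def gaussian_kde(sample, bandwidth=None):
    """Return a function x -> kernel density estimate built from `sample`.

    Uses a Gaussian kernel; if no bandwidth is given, falls back to the
    common rule of thumb 1.06 * sd * n^(-1/5) (an assumed default).
    """
    n = len(sample)
    if bandwidth is None:
        bandwidth = 1.06 * stdev(sample) * n ** (-1 / 5)

    def density(x):
        total = sum(_STD_NORMAL.pdf((x - s) / bandwidth) for s in sample)
        return total / (n * bandwidth)

    return density


# Toy p-values, invented for illustration (not from the paper).
pvals = [0.001, 0.01, 0.2, 0.5, 0.8]
z = probit_scores(pvals)
f_hat = gaussian_kde(z)
print(f_hat(0.0))  # estimated marginal density at z = 0
```

Because the estimate is built from all the transformed p-values at once, no parametric form for the prior has to be specified, which is the robustness property the authors describe.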

Availability: R code for EBS is available from the authors upon request.




This article has been cited by other articles:


Y. Pawitan, K. R. K. Murthy, S. Michiels, and A. Ploner
Bias in the estimation of false discovery rate in microarray studies
Bioinformatics, October 15, 2005; 21(20): 3865 - 3872.


