Bioinformatics Advance Access first published online on April 19, 2005
This version published online on April 21, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti448
1 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
* To whom correspondence should be addressed.
Motivation: In microarray studies, most researchers are keenly aware of the potentially high rate of false positives and the need to control it. One key statistical shift is the move away from the well-known p-value to the false discovery rate (FDR). Perhaps less attention has been paid to the sensitivity or the associated false negative rate (FNR). The purpose of this paper is to explain in simple terms why the shift from p-value to FDR for statistical assessment of microarray data is necessary, to elucidate the determining factors of FDR and, for a two-sample comparative study, to discuss its control via sample size at the design stage.
Results: We use a mixture model, involving differentially expressed (DE) and non-DE genes, that captures the most common problem of finding DE genes. Factors determining FDR are (1) the proportion of truly differentially expressed genes, (2) the distribution of the true differences, (3) measurement variability and (4) sample size. Many current small microarray studies are plagued with large FDR, but controlling FDR alone can lead to unacceptably large FNR. In evaluating the design of a microarray study, sensitivity or FNR curves should be computed routinely together with FDR curves. Under certain assumptions, the FDR and FNR curves coincide, thus simplifying the choice of sample size for controlling the FDR and FNR jointly.
Availability: The R package OCplus for computing FDR, sensitivity curves and sample size is freely available at http://www.meb.ki.se/~yudpaw.
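The four determining factors named in the abstract can be illustrated with a small numerical sketch. The code below is not the OCplus package and does not reproduce the paper's calculations; it is a simplified illustration that assumes a two-sided z-test with known common standard deviation, a single fixed true difference for all DE genes (a point mass standing in for the distribution of true differences), and FNR taken as 1 minus sensitivity. The function names and parameters (`p0`, `effect`, `sigma`, `n_per_group`, `crit`) are hypothetical.

```python
from math import erf, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def fdr_and_sensitivity(p0, effect, sigma, n_per_group, crit):
    """FDR and sensitivity under a two-component mixture model.

    Assumptions (illustrative only, not taken from the paper):
      p0          proportion of non-DE genes
      effect      true mean difference, identical for every DE gene
      sigma       known common standard deviation (measurement variability)
      n_per_group samples in each of the two groups
      crit        a gene is declared DE when |z| > crit
    """
    # Mean of the two-sample z statistic for a DE gene.
    shift = (effect / sigma) * sqrt(n_per_group / 2.0)
    # Probability that a non-DE gene is declared DE (false positive rate).
    alpha = 2.0 * (1.0 - norm_cdf(crit))
    # Probability that a DE gene is declared DE (sensitivity).
    sensitivity = (1.0 - norm_cdf(crit - shift)) + norm_cdf(-crit - shift)
    # Expected share of false positives among all genes declared DE.
    declared = p0 * alpha + (1.0 - p0) * sensitivity
    fdr = p0 * alpha / declared
    return fdr, sensitivity


if __name__ == "__main__":
    # Vary the sample size while holding the other three factors fixed.
    for n in (5, 10, 20, 40):
        fdr, sens = fdr_and_sensitivity(0.95, 1.0, 1.0, n, 1.96)
        print(f"n={n:3d}  FDR={fdr:.3f}  sensitivity={sens:.3f}  FNR={1 - sens:.3f}")
```

Under these assumptions, a conventional cutoff of 1.96 with 5 samples per group and 5% DE genes gives an FDR well above one half, which is the kind of situation the abstract describes for small studies. Increasing the sample size lowers the FDR and raises the sensitivity together.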
Received January 10, 2005
Revised March 23, 2005
Accepted April 10, 2005
False Discovery Rate, Sensitivity and Sample Size for Microarray Studies
2 Unit of Biostatistics and Epidemiology, Institute Gustave Roussy, Villejuif, France; Unit of Functional Genomics, Institute Gustave Roussy, Villejuif, France
3 Unit of Functional Genomics, Institute Gustave Roussy, Villejuif, France
4 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; Medical Research Council, Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom
Yudi Pawitan, E-mail: yudi.pawitan{at}meb.ki.se
The title of the paper has been updated in this version.