Bioinformatics Advance Access originally published online on May 6, 2005
Bioinformatics 2005 21(14):3183-3184; doi:10.1093/bioinformatics/bti480

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

SDMinP: a program to control the family wise error rate using step-down minP adjusted P-values

M. Obreiter 1, C. Fischer 2, J. Chang-Claude 1 and L. Beckmann 1,*

1 German Cancer Research Center DKFZ, Department of Clinical Epidemiology, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
2 Institute of Human Genetics, University of Heidelberg, Germany

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 FEATURES
 3 IMPLEMENTATION
 REFERENCES
 

Summary: SDMinP is an easy-to-use program for fast calculation of empirical and adjusted P-values for correlated and uncorrelated hypotheses in multiple testing experiments. It is based on the Free Step-Down Resampling Method for controlling the family wise error rate, and implements a variation of an efficient algorithm, which reduces the originally required re-sampling effort considerably and makes the method computationally feasible. The program is independent of the underlying test statistic and works with provided observed and permutation test statistics.

Availability: http://www.dkfz.de/SDMinP/software.html

Contact: L.beckmann{at}dkfz.de


    1 INTRODUCTION
 
Multiple testing, a scenario in which more than one individual hypothesis is tested simultaneously, requires control of the multiple type I error rate. One definition of the multiple type I error rate is the family wise error rate (FWER), which denotes the probability of obtaining at least one false significant test result within the set of tested hypotheses. The FWER increases with the number of hypotheses and therefore has to be controlled by adjusting the (raw) P-value of the observed test statistic of each individual hypothesis, which yields a corresponding adjusted P-value.
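To illustrate how quickly the FWER grows, the following minimal sketch computes it for the simplest case, in which all null hypotheses are true and the tests are independent. Both conditions are assumptions made here for illustration only; the function name is hypothetical and not part of SDMinP.

```python
def fwer_independent(m, alpha=0.05):
    """FWER for m independent tests of true null hypotheses,
    each carried out at the unadjusted level alpha (illustrative assumption)."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 10, 100):
    print(m, round(fwer_independent(m), 3))
```

Under these assumptions the FWER is 0.05 for a single test, about 0.40 for 10 tests and about 0.99 for 100 tests.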

Common methods for controlling the FWER are the Bonferroni correction and more refined variations of it. Applying the Bonferroni correction, which is a single-step procedure, to a multiple testing scenario with correlated hypotheses (e.g. association analysis of multiple markers and gene–gene interaction) leads to conservative results. Thus, multi-step procedures were developed to achieve higher power for correlated tests while controlling the FWER.
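For comparison, the single-step Bonferroni adjustment can be sketched in a few lines. Each raw P-value is multiplied by the number of hypotheses and capped at 1, regardless of any correlation between the tests. The function name is illustrative and not taken from SDMinP.

```python
def bonferroni(raw_p):
    """Single-step Bonferroni adjustment: multiply each raw P-value by the
    number of hypotheses m and cap the result at 1."""
    m = len(raw_p)
    return [min(1.0, m * p) for p in raw_p]


print(bonferroni([0.01, 0.04, 0.5]))
```

Because the factor m ignores the dependence structure, the adjusted P-values are larger than necessary when the hypotheses are strongly correlated.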

Westfall and Young (1993) proposed the Free Step-Down Resampling Method, a multi-step procedure for controlling the FWER. This method uses the joint null distribution of P-values, obtained by re-sampling under the global null hypothesis (i.e. under the assumption that all individual null hypotheses are true), to obtain step-down minP adjusted P-values. However, the determination of the joint null distribution of P-values leads to almost infeasible re-sampling effort if the distribution of the test statistics is unknown. In this case, P-values have to be determined empirically by additional re-sampling and permutation steps under the global null hypothesis.
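The step-down minP adjustment itself can be sketched as follows, assuming that a matrix of P-values drawn under the global null hypothesis is already available (one row per re-sampling replicate, one column per hypothesis). This is a plain illustration of the step-down principle with hypothetical names, not the optimized algorithm implemented in SDMinP.

```python
def step_down_minp(raw_p, null_p):
    """Step-down minP adjusted P-values.

    raw_p  : list of m observed raw P-values
    null_p : list of B rows, each holding m P-values obtained under the
             global null hypothesis (assumed to be given)
    """
    m = len(raw_p)
    B = len(null_p)
    # Order hypotheses from the most to the least significant.
    order = sorted(range(m), key=lambda j: raw_p[j])
    counts = [0] * m
    for row in null_p:
        running_min = 1.0
        # Successive minima, starting with the least significant hypothesis.
        for k in range(m - 1, -1, -1):
            j = order[k]
            running_min = min(running_min, row[j])
            if running_min <= raw_p[j]:
                counts[k] += 1
    # Enforce monotonicity along the ordering of the raw P-values.
    adjusted = [0.0] * m
    previous = 0.0
    for k in range(m):
        previous = max(previous, counts[k] / B)
        adjusted[order[k]] = previous
    return adjusted


raw = [0.02, 0.5]
null = [[0.5, 0.9], [0.01, 0.3], [0.8, 0.6], [0.4, 0.01]]
print(step_down_minp(raw, null))  # [0.5, 0.5]
```

The expensive part in practice is obtaining `null_p` when the distribution of the test statistics is unknown, which is the re-sampling effort discussed above.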

Ge et al. (2003) and Becker and Knapp (2004) improved the original method by reducing the re-sampling effort, which made it feasible and attractive. Ge et al. (2003) offer the R-package multtest (available at www.bioconductor.org), which is especially applicable for microarray data analysis. It can only be used with the standard test statistics that the package provides. Based on their approaches, we developed SDMinP for fast calculation of step-down minP adjusted P-values, which is independent of a particular test statistic. Beckmann et al. (2004) demonstrated the gain in statistical power when applying this adjustment compared with other methods for controlling the FWER.

Ge et al. (2003) reduced the re-sampling complexity, known as ‘double-permutation’, considerably and hence lessened the computational effort. The joint null distribution of P-values is calculated on the basis of only one set of permutation test statistics. Furthermore, they proposed an efficient algorithm for the implementation of the method.

Becker and Knapp (2004) presented an approach that further reduced the re-sampling effort. Here, the calculation of the raw P-values of the observed test statistics and the joint null distribution of P-values are based on the same single set of permutation test statistics. They presented two optional formulae for obtaining the global P-value. The first formula determines the global P-value as the smallest adjusted P-value of the individual hypotheses. The second formula is appropriate for relatively small numbers of permutation replicates. It takes into account the discreteness of the P-value distribution and also considers the distribution of the second smallest raw P-values.
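The idea of re-using a single set of permutation test statistics can be sketched for one hypothesis as follows. The sketch assumes a right-sided test and estimates each P-value as the fraction of permutation statistics that are at least as large as the value in question. This is one common convention chosen for illustration; the exact formulae of Ge et al. (2003) and Becker and Knapp (2004) may differ from it in detail. All names are hypothetical.

```python
from bisect import bisect_left


def empirical_p(value, sorted_null):
    """Fraction of null statistics that are >= value (right-sided)."""
    B = len(sorted_null)
    # bisect_left returns the number of null statistics strictly below value.
    return (B - bisect_left(sorted_null, value)) / B


def p_values_from_one_permutation_set(observed, perms):
    """Return the raw P-value of the observed statistic and the P-value of
    every permutation replicate, all based on the same set of statistics."""
    sorted_null = sorted(perms)
    raw = empirical_p(observed, sorted_null)
    null_ps = [empirical_p(t, sorted_null) for t in perms]
    return raw, null_ps


raw, null_ps = p_values_from_one_permutation_set(2.5, [1.0, 3.0, 2.0, 0.5])
print(raw)      # 0.25
print(null_ps)  # [0.75, 0.25, 0.5, 1.0]
```

Collecting `null_ps` over all hypotheses, replicate by replicate, gives an estimate of the joint null distribution of P-values without any additional permutation step.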

SDMinP incorporates the suggestions of Ge et al. (2003) and Becker and Knapp (2004). It calculates step-down minP adjusted P-values and, depending on the input format (see Input data and format in Features section), empirical raw P-values. The global P-value is determined as well, and either of the two formulae presented by Becker and Knapp (2004) can be applied. The program is easy to use and works with provided observed and permutation test statistics. This makes it appealing for non-standard test statistics whose distributions are unknown and for which P-values have to be estimated empirically by permutation under the global null hypothesis.


    2 FEATURES
 
SDMinP is available for free. Detailed documentation and example files are included in the download package.

Program type and configuration: SDMinP is a command line tool, which is controlled by a configuration file with a small number of parameters. It is possible to set calculation and performance parameters and to determine the logging granularity.

Algorithms: SDMinP supports two optional approaches for the calculation of permutation-based raw (unadjusted) P-values: that of Ge et al. (2003) and that of Becker and Knapp (2004). The formulae differ slightly. For a discussion of the choice of formula, see Becker and Knapp (2004) and the program documentation.

The global P-value is obtained either by taking the smallest adjusted P-value of the hypotheses or by using the improved formula presented by Becker and Knapp (2004), which includes the distribution of the second smallest raw P-values.

Input data and format: Input data are provided via a flat text file. Each line of the input file contains the information of one hypothesis. The required data per hypothesis consist of one unique identifier, the pre-calculated raw P-value (if available, otherwise the placeholder ‘NA’ has to be set), the observed test statistic and a user-defined number of calculated permutation test statistics. If the placeholder is given instead of the raw P-value, SDMinP calculates the empirical raw P-value on the basis of the provided test statistics, as proposed by Becker and Knapp (2004).
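A reader for one such line might look like the sketch below. It assumes that the fields are separated by whitespace, which the text above does not specify; the actual delimiter and any further details are given in the program documentation. The function name is hypothetical.

```python
def parse_hypothesis_line(line):
    """Parse one hypothesis record: identifier, raw P-value or 'NA',
    observed test statistic, then the permutation test statistics.
    Whitespace-separated fields are an assumption of this sketch."""
    fields = line.split()
    identifier = fields[0]
    # 'NA' marks a raw P-value that still has to be estimated empirically.
    raw_p = None if fields[1] == "NA" else float(fields[1])
    observed = float(fields[2])
    permutations = [float(x) for x in fields[3:]]
    return identifier, raw_p, observed, permutations


print(parse_hypothesis_line("snp1 NA 3.2 0.5 1.1 4.0"))
```

A record with `raw_p` equal to `None` would then be passed on to the empirical P-value calculation.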

The input file can be on the order of megabytes or gigabytes for large numbers of hypotheses and permutation test statistics. The performance problem of handling such files has been solved; see the Implementation section.

Statistical test: The test character, i.e. whether it is left-, right- or two-sided, can be specified in the configuration file.
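How the sidedness of a test enters an empirical P-value can be sketched as follows. The labels "left", "right" and "two" are hypothetical and are not the keywords of the SDMinP configuration file. Treating the two-sided case via the absolute value is an assumption that fits statistics whose null distribution is symmetric around zero.

```python
def extremeness(stat, side):
    """Map a test statistic to a score for which larger means more extreme."""
    if side == "right":
        return stat
    if side == "left":
        return -stat
    if side == "two":
        return abs(stat)  # assumes a null distribution symmetric around zero
    raise ValueError("side must be 'left', 'right' or 'two'")


def empirical_p_sided(observed, perms, side):
    """Fraction of permutation statistics at least as extreme as the observed one."""
    obs = extremeness(observed, side)
    hits = sum(1 for t in perms if extremeness(t, side) >= obs)
    return hits / len(perms)


perms = [-3.0, -1.0, 0.5, 2.5]
print(empirical_p_sided(-2.0, perms, "right"))  # 0.75
print(empirical_p_sided(-2.0, perms, "left"))   # 0.25
print(empirical_p_sided(-2.0, perms, "two"))    # 0.5
```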

Logging mechanism: Each single calculation step can be logged by enabling the respective log mechanism. This feature is useful for following up the computational process of small datasets. For larger datasets, it slows down the performance and does not give easily readable information owing to the large amount of data.

Results: The results are stored in a result file consisting of the observed test statistic and the raw and adjusted P-values per hypothesis, together with the global P-value. Optionally, an additional text file containing the results in an ‘R’-readable format (R Development Core Team, 2004; http://www.R-project.org) can be created.


    3 IMPLEMENTATION
 
The program is written in Python 2.3.5 (available at www.python.org) and runs in a Windows as well as in a Unix environment. The results of performance tests are presented in Table 1. One challenge was to deal with the data input file, which can be considerably large and has to be parsed frequently. We solved this performance problem by splitting the input file into smaller parts, which can be browsed faster.
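The text does not describe how the input file is split. One simple possibility, shown here purely as an assumption and written for a current Python version rather than Python 2.3.5, is to cut the file into parts with a fixed number of lines, so that each part holds a manageable number of hypotheses. The function and file names are hypothetical.

```python
import os


def split_file(path, lines_per_part, out_dir):
    """Split a text file into consecutive parts of at most lines_per_part
    lines each and return the paths of the parts in order."""
    part_paths = []
    part = None
    with open(path) as source:
        for index, line in enumerate(source):
            if index % lines_per_part == 0:
                # Start a new part file.
                if part is not None:
                    part.close()
                part_path = os.path.join(out_dir, "part_%04d.txt" % len(part_paths))
                part_paths.append(part_path)
                part = open(part_path, "w")
            part.write(line)
    if part is not None:
        part.close()
    return part_paths
```

Since each line holds one hypothesis, a split by line count keeps every record intact, and a given hypothesis can later be located by reading only the part that contains it.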


Table 1 Results of the performance test

 


    Acknowledgments
 
We thank Drs Tim Becker and Michael Knapp for advice and comments on the statistical methods. This work was supported by a Deutsche Forschungsgemeinschaft grant (CH117/3-1 for LB) and by the Federal Ministry of Education, Science, Research and Technology (NGFN-2 PGE-S19T05 and PGE-S30T09 for MO).

Received on March 16, 2005; revised on April 20, 2005; accepted on April 28, 2005

    REFERENCES
 

    Becker, T. and Knapp, M. (2004) A powerful strategy to account for multiple testing in the context of haplotype analysis. Am. J. Hum. Genet., 75, 561–570.

    Beckmann, L., et al. (2004) Analysis of multiple error rates in haplotype-based association studies. Proceedings of the 54th Annual Meeting on Abstracts of The American Society of Human Genetics.

    Ge, Y., et al. (2003) Resampling-based multiple testing for microarray data analysis. Test, 12, 1–77.

    R Development Core Team. (2004) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

    Westfall, P.H. and Young, S.S. (1993) Resampling-based Multiple Testing: Examples and Methods for P-value Adjustment. John Wiley & Sons, New York.

