Bioinformatics Advance Access originally published online on April 4, 2006
Bioinformatics 2006 22(11):1408-1409; doi:10.1093/bioinformatics/btl126
HTS-Corrector: software for the statistical analysis and correction of experimental high-throughput screening data
1 Département d'informatique, Université du Québec à Montreal, C.P. 8888, succ. Centre-Ville, Montreal, QC, H3C 3P8, Canada
2 McGill University and Genome Quebec Innovation Centre, 740 Dr Penfield, Montreal, QC, H3A 1A4, Canada
*To whom correspondence should be addressed.
ABSTRACT
Motivation: High-throughput screening (HTS) plays a central role in modern drug discovery, allowing for testing of >100 000 compounds per screen. The aim of our work was to develop and implement methods for minimizing the impact of systematic error in the analysis of HTS data. To the best of our knowledge, two new data correction methods included in HTS-Corrector are not available in any existing commercial software or freeware.
Results: This paper describes HTS-Corrector, a software application for the analysis of HTS data, detection and visualization of systematic error, and corresponding correction of HTS signals. Three new methods for the statistical analysis and correction of raw HTS data are included in HTS-Corrector: background evaluation, well correction and hit-sigma distribution procedures intended to minimize the impact of systematic errors. We discuss the main features of HTS-Corrector and demonstrate the benefits of the algorithms.
Availability: The Microsoft Windows version and a detailed description of the software are freely available at the following URL: http://www.labunix.uqam.ca/~makarenv/hts.html
Contact: makarenkov.vladimir{at}uqam.ca
1 INTRODUCTION
High-throughput screening (HTS) technology enables rapid screening of large compound collections against putative drug targets. A typical HTS operation in the pharmaceutical industry processes >100 000 compounds per screen and generates 50 million data points per year (Heuer et al., 2002). HTS operates on samples of microliter volume arranged in two-dimensional plates, which may contain 96 (8 × 12), 384 (16 × 24) or 1536 (32 × 48) samples. Hits can be defined as positive signals corresponding to biologically or chemically active compounds. Quality control is an essential part of correct hit selection in HTS: random and systematic error can cause measured signals to be either underestimated (false negatives) or overestimated (false positives). Various methods for quality control, systematic error correction, random error estimation and statistical testing of HTS data have been proposed in the scientific literature (Zhang et al., 2000; Brideau et al., 2003; Gunter et al., 2003; Malo et al., 2006).
We describe HTS-Corrector, new software designed for the statistical analysis and visualization of HTS data. The methods included in HTS-Corrector are based on the statistical analysis of signal variation within the plates of an assay. HTS-Corrector displays the results of the correction analysis as plate maps, tables and bar charts (Fig. 1). Details of the algorithms are provided elsewhere (Kevorkov and Makarenkov, 2005; Gagarin et al., 2006).
2 MAIN FEATURES
HTS-Corrector offers the following options for the statistical analysis and correction of HTS data.
2.1 Evaluation of the background surface
Ideally, across-plate means for inactive samples should be unrelated to the well location. In practice, however, systematic errors generate reproducible local artifacts and smooth global drifts on the background surface, creating biased measurements which depend on well location within the plates. One option in HTS-Corrector is to first normalize all plates using the Z-score transformation, which standardizes measurements within plates to have a mean of zero and a standard deviation of one. Mean Z-score values are then calculated for each well position across all plates within the screen.
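A minimal Python/NumPy sketch of the normalization step just described, assuming the screen is held as an array of shape (plates, rows, columns). The function names `zscore_plates` and `mean_zscore_map` are hypothetical, not HTS-Corrector's interface, and the sketch averages over all wells rather than over inactive samples only, on the assumption that hits are rare enough not to distort the means.

```python
import numpy as np


def zscore_plates(data: np.ndarray) -> np.ndarray:
    """Standardize every plate to mean 0 and standard deviation 1.

    `data` has shape (n_plates, n_rows, n_cols).
    """
    mean = data.mean(axis=(1, 2), keepdims=True)
    sd = data.std(axis=(1, 2), ddof=1, keepdims=True)
    return (data - mean) / sd


def mean_zscore_map(data: np.ndarray) -> np.ndarray:
    """Average each well position's Z-score over all plates of the screen."""
    return zscore_plates(data).mean(axis=0)


# Usage on synthetic data: 50 plates of 8 x 10 wells.
rng = np.random.default_rng(0)
plates = rng.normal(loc=100.0, scale=10.0, size=(50, 8, 10))
background = mean_zscore_map(plates)
print(background.shape)  # (8, 10)
```

The resulting map has one value per well position; in an error-free screen it should be close to zero everywhere.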
A trend-surface analysis procedure fits a fourth-degree polynomial to the mean Z-score values by least squares to reveal general trends and local effects. This global surface-fitting method generates a two-dimensional surface that depicts the main trends of the evaluated background. However, because of their statistical properties, we recommend using high-order polynomials only to visualize general trends of the background surface, not for data correction (Hastie et al., 2001). The Remove background option enables the user to subtract the evaluated background from the original dataset, minimizing the impact of systematic errors (Kevorkov and Makarenkov, 2005).
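The trend-surface fit can be sketched as an ordinary least-squares fit of a bivariate polynomial of total degree 4 in the row and column coordinates. This is an illustrative reconstruction, not HTS-Corrector's code: the names are hypothetical, rescaling the coordinates to [-1, 1] is our choice for numerical stability, and `remove_background` subtracts whatever per-well background map the caller supplies (for example the mean Z-score map), since the exact surface the software subtracts is not specified here.

```python
import numpy as np


def trend_surface(z_map: np.ndarray, degree: int = 4) -> np.ndarray:
    """Least-squares fit of a 2-D polynomial of total degree <= `degree`."""
    n_rows, n_cols = z_map.shape
    r, c = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    # Rescale well coordinates to [-1, 1] to keep the design matrix well conditioned.
    x = 2.0 * r.ravel() / (n_rows - 1) - 1.0
    y = 2.0 * c.ravel() / (n_cols - 1) - 1.0
    # One column per monomial x**i * y**j with i + j <= degree (15 columns for degree 4).
    design = np.column_stack(
        [x**i * y**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    )
    coef, *_ = np.linalg.lstsq(design, z_map.ravel(), rcond=None)
    return (design @ coef).reshape(n_rows, n_cols)


def remove_background(z_plates: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Subtract a per-well background map (n_rows, n_cols) from every plate."""
    return z_plates - background[np.newaxis, :, :]


# A smooth synthetic background of degree 3 is recovered exactly by the degree-4 fit.
rows, cols = np.meshgrid(np.arange(8), np.arange(10), indexing="ij")
smooth = 0.02 * rows**2 - 0.05 * cols + 0.001 * rows * cols**2
fitted = trend_surface(smooth)
print(np.allclose(fitted, smooth))  # True
```

Any polynomial of total degree at most 4 lies in the span of the 15 design columns, which is why the synthetic surface is reproduced exactly; real mean Z-score maps also contain noise, which the fit smooths out.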
2.2 Well normalization procedure
The Well correction option first normalizes the experimental values within plates (using Z-score standardization) and then analyzes the array of values in each well position across the whole HTS assay. For each well, the method computes an approximation curve, either a straight line or a second-degree polynomial, and subtracts it from the plate-normalized experimental values. The resulting values are then normalized along each well (Gagarin et al., 2006).
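The three well-correction steps (plate Z-scoring, per-well polynomial detrending, per-well normalization) can be sketched as follows, assuming plates are ordered by processing time so that the plate index serves as the x-axis of each well's fit. The function names are hypothetical; `np.polyfit` is used because it accepts a 2-D `y` and therefore fits every well position in one call.

```python
import numpy as np


def zscore(values: np.ndarray, axis) -> np.ndarray:
    """Standardize `values` to mean 0 and SD 1 along `axis`."""
    mean = values.mean(axis=axis, keepdims=True)
    sd = values.std(axis=axis, ddof=1, keepdims=True)
    return (values - mean) / sd


def well_correction(data: np.ndarray, degree: int = 1) -> np.ndarray:
    """Well correction sketch for data of shape (n_plates, n_rows, n_cols).

    degree=1 fits a straight line, degree=2 a second-degree polynomial.
    """
    n_plates = data.shape[0]
    z = zscore(data, axis=(1, 2))              # 1) normalize within each plate
    wells = z.reshape(n_plates, -1)            # one column per well position
    t = np.arange(n_plates, dtype=float)       # plate order along the assay
    coef = np.polyfit(t, wells, degree)        # 2) fit every well's curve at once
    trend = np.vander(t, degree + 1) @ coef    # evaluate the fitted curves
    corrected = zscore(wells - trend, axis=0)  # 3) normalize along each well
    return corrected.reshape(data.shape)


rng = np.random.default_rng(1)
n = 200
# Synthetic assay: pure noise plus a drift along the plate sequence in one well.
data = rng.normal(size=(n, 8, 10))
data[:, 0, 0] += np.linspace(0.0, 3.0, n)
corrected = well_correction(data, degree=1)
print(corrected.shape)  # (200, 8, 10)
```

`np.vander` returns powers in decreasing order, matching the coefficient order returned by `np.polyfit`. After correction, each well's values have mean 0, SD 1 and no remaining linear trend along the assay.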
2.3 Hit-sigma distribution method
The Hit-sigma distribution method allows the user to set well-specific standard deviation thresholds with respect to a user-defined mean hit rate. This approach offers advantages over the more traditional procedure of using a fixed threshold for all wells (e.g. 3 SD), which is not optimal in the presence of location-specific systematic errors.
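The text does not state how the well-specific thresholds are derived from the mean hit rate. One plausible reading, sketched here purely as an assumption, chooses for each well a multiplier k (in that well's own standard deviations) such that roughly the target fraction of plates falls below mean − k × SD in that well. Hits are assumed to be low values, as in inhibition assays; the function names are hypothetical.

```python
import numpy as np


def hit_sigma_thresholds(z: np.ndarray, hit_rate: float) -> np.ndarray:
    """Per-well multipliers k such that about `hit_rate` of plates fall below mean - k*SD.

    `z` holds plate-normalized values with shape (n_plates, n_rows, n_cols).
    """
    wells = z.reshape(z.shape[0], -1)
    mean = wells.mean(axis=0)
    sd = wells.std(axis=0, ddof=1)
    cutoff = np.quantile(wells, hit_rate, axis=0)  # well-specific lower quantile
    return ((mean - cutoff) / sd).reshape(z.shape[1:])


def select_hits(z: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Flag values below their well's mean minus k SDs (hits assumed to be low values)."""
    mean = z.mean(axis=0)
    sd = z.std(axis=0, ddof=1)
    return z < mean - k * sd


rng = np.random.default_rng(3)
z = rng.normal(size=(1000, 8, 10))
z[:, :, 0] -= 0.5  # a column biased downwards by systematic error
k = hit_sigma_thresholds(z, hit_rate=0.01)
hits = select_hits(z, k)
```

Because each well is judged against its own mean and SD, the downward bias in the first column does not inflate its hit count, which is the kind of location-specific effect a fixed 3 SD rule applied to whole plates would miss.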
2.4 K-means plate partitioning of large assays
Systematic errors generate repeatable local artifacts and smooth global drifts, which become more noticeable when a mean assay background is computed. For small HTS assays, the plate background patterns should not vary substantially from plate to plate. For large assays, however, the plate patterns may change from batch to batch or shift over time. HTS-Corrector includes a k-means partitioning procedure (MacQueen, 1967) to find the breaks between batches, allowing the user to separate the initial dataset into several subsets with similar plate patterns.
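Plate partitioning can be sketched by treating each plate's Z-score map as a vector and clustering the plates. This sketch uses Lloyd's batch iteration rather than MacQueen's original online update, with farthest-point initialization; both choices, and the hypothetical `batch_breaks` helper that reports where consecutive plates change cluster, are our assumptions. Plain k-means does not force clusters to be contiguous runs of plates, so the reported breaks should be inspected rather than taken at face value.

```python
import numpy as np


def kmeans_plates(z: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster plates by their flattened Z-score maps; returns one label per plate."""
    x = z.reshape(z.shape[0], -1)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random plate, then repeatedly
    # add the plate farthest from all centers chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: each center moves to the mean of its plates (kept if empty).
        new_centers = np.array(
            [x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def batch_breaks(labels: np.ndarray) -> np.ndarray:
    """Indices of plates whose cluster label differs from the previous plate's."""
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1


rng = np.random.default_rng(4)
pattern = np.where(np.arange(10) < 5, 1.0, -1.0) * np.ones((8, 1))  # left/right contrast
plates = np.concatenate([
    pattern + rng.normal(scale=0.3, size=(30, 8, 10)),   # batch 1
    -pattern + rng.normal(scale=0.3, size=(30, 8, 10)),  # batch 2: pattern flips
])
labels = kmeans_plates(plates, k=2)
print(batch_breaks(labels))  # [30]
```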
3 BENEFITS OF THE PROPOSED APPROACHES
We use simulated data to illustrate the proposed correction procedures. The dataset contains 1250 plates with wells arranged in 8 × 10 matrices, imitating the parameters of the McMaster assay proposed as a benchmark for the McMaster data mining and docking competition (http://hts.mcmaster.ca/HTSDataMiningCompetition.htm) and examined in detail in Gagarin et al. (2006).
We first examined the output of various hit identification methods on null simulated data [N(0, 1)]. Using a 3σ hit detection threshold, 119, 117 and 104 false positives were found in the uncorrected, the HTS-Corrector background-removed and the well-corrected data, respectively. Second, we randomly added 1% of hits to the null dataset. Hit values were randomly chosen to lie between 3.5σ and 4.5σ from µ, where µ and σ denote the within-plate mean and standard deviation, respectively. Systematic error, perturbed by random noise, was then added to this dataset according to a constant c, which ranged from 0 through 0.5σ in 0.1σ increments, producing six noisy datasets. The values 4c, 3c, 2c, 1c, 0, 0, 1c, 2c, 3c and 4c were added to the 1st, 2nd, ..., 10th columns, respectively. Finally, normally distributed random error [N(0, c/4)] was added independently to each well of each plate. For each value of c, hits were selected from the uncorrected, background-removed and well-corrected datasets using the 3σ threshold. The true positive rate was noticeably higher for both corrected datasets, with the background subtraction procedure slightly outperforming the well correction method (Fig. 2a). The false positive rates of the corrected data were always lower than those of the uncorrected data, and here the well correction method generally outperformed the background subtraction procedure (Fig. 2b).
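The simulation protocol above can be approximated as follows. Several details are assumptions not fixed by the text: hits are placed below the mean (µ = 0, σ = 1 for the null data) and replace the null value of the chosen wells, c/4 is treated as a standard deviation, the column offsets are taken exactly as printed, and the correction shown is only the simplest background-removal variant (subtracting each well's mean Z-score). The printed rates are not expected to reproduce Fig. 2 exactly.

```python
import numpy as np

N_PLATES, N_ROWS, N_COLS = 1250, 8, 10
# Column offsets in multiples of c, taken exactly as listed in the text.
COLUMN_PATTERN = np.array([4, 3, 2, 1, 0, 0, 1, 2, 3, 4], dtype=float)


def simulate(c: float, rng: np.random.Generator):
    """Null N(0, 1) screen with 1% hits, column-wise systematic error and extra noise."""
    data = rng.normal(0.0, 1.0, size=(N_PLATES, N_ROWS, N_COLS))
    is_hit = np.zeros(data.size, dtype=bool)
    is_hit[rng.choice(data.size, size=data.size // 100, replace=False)] = True
    is_hit = is_hit.reshape(data.shape)
    # Assumption: hits are low values, 3.5 to 4.5 SDs below the mean (mu = 0, sigma = 1).
    data[is_hit] = -rng.uniform(3.5, 4.5, size=int(is_hit.sum()))
    data += c * COLUMN_PATTERN                       # same offset for every row and plate
    data += rng.normal(0.0, c / 4, size=data.shape)  # assumption: c/4 is an SD
    return data, is_hit


def detect_hits(data: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag values more than k within-plate SDs below the within-plate mean."""
    mean = data.mean(axis=(1, 2), keepdims=True)
    sd = data.std(axis=(1, 2), ddof=1, keepdims=True)
    return data < mean - k * sd


def remove_well_means(data: np.ndarray) -> np.ndarray:
    """Simplest background removal: plate Z-scores minus each well's mean Z-score."""
    mean = data.mean(axis=(1, 2), keepdims=True)
    sd = data.std(axis=(1, 2), ddof=1, keepdims=True)
    z = (data - mean) / sd
    return z - z.mean(axis=0, keepdims=True)


def rates(selected: np.ndarray, is_hit: np.ndarray):
    """True positive rate and false positive rate of a hit selection."""
    tpr = (selected & is_hit).sum() / is_hit.sum()
    fpr = (selected & ~is_hit).sum() / (~is_hit).sum()
    return tpr, fpr


rng = np.random.default_rng(2006)
for c in np.round(np.arange(6) * 0.1, 1):  # c = 0.0, 0.1, ..., 0.5 (sigma = 1)
    data, is_hit = simulate(c, rng)
    raw_tpr, raw_fpr = rates(detect_hits(data), is_hit)
    cor_tpr, cor_fpr = rates(detect_hits(remove_well_means(data)), is_hit)
    print(f"c={c:.1f}  uncorrected TPR={raw_tpr:.3f} FPR={raw_fpr:.5f}  "
          f"corrected TPR={cor_tpr:.3f} FPR={cor_fpr:.5f}")
```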
Acknowledgments
The authors thank Genome Quebec for funding this project. The authors also thank two anonymous referees and Associate Editor Jonathan Wren for their helpful comments.
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: Jonathan Wren
Received on February 9, 2006; revised on March 20, 2006; accepted on March 28, 2006
REFERENCES
Brideau, C., et al. (2003) Improved statistical methods for hit selection in high-throughput screening. J. Biomol. Screen., 8, 634–647.
Gagarin, A., Kevorkov, D., Makarenkov, V. and Zentilli, P. (2006) Comparison of two methods for detecting and correcting systematic error in HTS data. Proceedings of IFCS 2006, Ljubljana. Springer-Verlag (in press).
Gunter, B., et al. (2003) Statistical and graphical methods for quality control determination of HTS data. J. Biomol. Screen., 8, 624–633.
Hastie, T., Tibshirani, R. and Friedman, J. (2001) The Elements of Statistical Learning. Springer-Verlag.
Heuer, C., Haenel, T. and Prause, B. (2002) A novel approach for quality control and correction of HTS data based on artificial intelligence. Pharmaceutical Discovery and Development Report.
Kevorkov, D. and Makarenkov, V. (2005) Statistical analysis of systematic errors in high-throughput screening. J. Biomol. Screen., 10, 557–567.
MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press.
Malo, N., et al. (2006) Statistical practice in high-throughput screening data analysis. Nat. Biotechnol., 24, 167–175.
Zhang, J.H., et al. (2000) Confirmation of primary active substances from HTS of chemical and biological populations: a statistical approach and practical considerations. J. Comb. Chem., 2, 258–265.


