

Bioinformatics Advance Access originally published online on April 4, 2006
Bioinformatics 2006 22(11):1408-1409; doi:10.1093/bioinformatics/btl126

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

HTS-Corrector: software for the statistical analysis and correction of experimental high-throughput screening data

Vladimir Makarenkov 1,*, Dmytro Kevorkov 1, Pablo Zentilli 1, Andrei Gagarin 1, Nathalie Malo 2 and Robert Nadon 2

1 Departement d'informatique, Université du Québec à Montreal, C.P. 8888, succ. Centre-Ville, Montreal, QC, H3C 3P8, Canada
2 McGill University and Genome Quebec Innovation Centre, 740 Dr Penfield, Montreal, QC, H3A 1A4, Canada

*To whom correspondence should be addressed.


    ABSTRACT
 

Motivation: High-throughput screening (HTS) plays a central role in modern drug discovery, allowing for testing of >100 000 compounds per screen. The aim of our work was to develop and implement methods for minimizing the impact of systematic error in the analysis of HTS data. To the best of our knowledge, two new data correction methods included in HTS-Corrector are not available in any existing commercial software or freeware.

Results: This paper describes HTS-Corrector, a software application for the analysis of HTS data, detection and visualization of systematic error, and corresponding correction of HTS signals. Three new methods for the statistical analysis and correction of raw HTS data are included in HTS-Corrector: background evaluation, well correction and hit-sigma distribution procedures intended to minimize the impact of systematic errors. We discuss the main features of HTS-Corrector and demonstrate the benefits of the algorithms.

Availability: The Microsoft Windows version and a detailed description of the software are freely available at the following URL: http://www.labunix.uqam.ca/~makarenv/hts.html

Contact: makarenkov.vladimir@uqam.ca


    1 INTRODUCTION
 
High-throughput screening (HTS) technology provides rapid screening of large compound collections against putative drug targets. A typical HTS operation in the pharmaceutical industry consists of processing >100 000 compounds per screen, generating ~50 million data points per year (Heuer et al., 2002). HTS operates with samples in microliter volumes that are arranged in two-dimensional plates. HTS plates may contain 96 (8 × 12), 384 (16 × 24) or 1536 (32 × 48) samples. Hits can be defined as positive signals corresponding to biologically or chemically active compounds.

Quality control is an essential part of correct hit selection in HTS. Random and systematic error can induce either underestimation (false negatives) or overestimation (false positives) of measured signals. Various methods for quality control, systematic error correction, random error estimation and statistical testing of HTS data have been proposed in the scientific literature (Zhang et al., 2000; Brideau et al., 2003; Gunter et al., 2003; Malo et al., 2006).

We describe new software, HTS-Corrector, designed for the statistical analysis and visualization of HTS data. The methods included in HTS-Corrector are based on the statistical analysis of signal variation within the plates of an assay. HTS-Corrector displays the results of the correction analysis in the form of plate maps, tables and bar charts (Fig. 1). Details of the algorithms are provided elsewhere (Kevorkov and Makarenkov, 2005; Gagarin et al., 2006).


Fig. 1 Screenshot of HTS-Corrector showing a background map, data table and bar chart characterizing an evaluated background.

 

    2 MAIN FEATURES
 
HTS-Corrector offers the following options for the statistical analysis and correction of HTS data.

2.1 Evaluation of the background surface
Ideally, across-plate means for inactive samples should be unrelated to the well location. In practice, however, systematic errors generate reproducible local artifacts and smooth global drifts on the background surface, creating biased measurements which depend on well location within the plates. One option in HTS-Corrector is to first normalize all plates using the Z-score transformation, which standardizes measurements within plates to have a mean of zero and a standard deviation of one. Mean Z-score values are then calculated for each well position across all plates within the screen.
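The two steps described above can be sketched in a few lines of NumPy. This is only an illustration, not the HTS-Corrector implementation: the function names `zscore_plates` and `mean_background`, the array layout (plates, rows, columns) and the use of the sample standard deviation (`ddof=1`) are assumptions made for the example.

```python
import numpy as np

def zscore_plates(plates):
    """Standardize each plate to mean 0 and standard deviation 1.

    plates: array of shape (n_plates, n_rows, n_cols) with raw measurements.
    The sample standard deviation (ddof=1) is an assumption of this sketch.
    """
    plates = np.asarray(plates, dtype=float)
    mean = plates.mean(axis=(1, 2), keepdims=True)
    std = plates.std(axis=(1, 2), ddof=1, keepdims=True)
    return (plates - mean) / std

def mean_background(z):
    """Average the Z-scores of each well position across all plates."""
    return z.mean(axis=0)

# Synthetic raw data: 50 plates in the 96-well (8 x 12) format
rng = np.random.default_rng(0)
raw = rng.normal(loc=100.0, scale=15.0, size=(50, 8, 12))
z = zscore_plates(raw)
background = mean_background(z)  # one value per well position
```

For data without systematic error, the entries of `background` should scatter around zero with no visible spatial pattern.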

A trend-surface analysis procedure uses a fourth-degree polynomial least-squares function to discover general trends and local effects on the mean Z-score values. This global surface-fitting method generates a two-dimensional surface which depicts the main trends of the evaluated background. However, because of the statistical properties of high-order polynomials, we recommend using them only for a better visualization of general trends of the background surfaces, and not for data correction (Hastie et al., 2001). The ‘Remove background’ option enables the user to subtract the evaluated background from the original dataset, minimizing the impact of systematic errors (Kevorkov and Makarenkov, 2005).
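A minimal sketch of such a trend-surface fit is given below, assuming the background is available as a rows × columns array of mean Z-scores. The helper names (`scaled_grid`, `design_matrix`, `fit_trend_surface`, `remove_background`) and the rescaling of well coordinates to [-1, 1] are choices made for this example and are not taken from HTS-Corrector. In line with the recommendation above, `remove_background` subtracts the evaluated per-well background itself, while the polynomial surface is computed only for inspection.

```python
import numpy as np

def scaled_grid(n_rows, n_cols):
    """Row and column coordinates rescaled to [-1, 1] for numerical stability."""
    return np.meshgrid(np.linspace(-1.0, 1.0, n_rows),
                       np.linspace(-1.0, 1.0, n_cols), indexing="ij")

def design_matrix(r, c, degree=4):
    """Columns are the monomials r**i * c**j with i + j <= degree."""
    return np.column_stack([r**i * c**j
                            for i in range(degree + 1)
                            for j in range(degree + 1 - i)])

def fit_trend_surface(background, degree=4):
    """Least-squares polynomial surface fitted to a per-well background map."""
    r, c = scaled_grid(*background.shape)
    X = design_matrix(r.ravel(), c.ravel(), degree)
    coef, *_ = np.linalg.lstsq(X, background.ravel(), rcond=None)
    return (X @ coef).reshape(background.shape)

def remove_background(z, background):
    """Subtract the evaluated background (one value per well) from every plate."""
    return z - background

# Synthetic background for a 384-well (16 x 24) plate: smooth drift plus noise
rng = np.random.default_rng(1)
r, c = scaled_grid(16, 24)
drift = 0.4 * r - 0.3 * c**2 + 0.2 * r * c
background = drift + rng.normal(0.0, 0.05, size=drift.shape)
surface = fit_trend_surface(background)  # smooth view of the general trend
```

A fourth-degree surface in two variables has 15 coefficients, so it can follow smooth global drifts but not sharp well-level artifacts.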

2.2 Well normalization procedure
The ‘Well correction’ option first normalizes the experimental values within plates (using ‘Z-score standardization’) and then analyzes the array of values of each well along the whole HTS assay. For each well, the method computes an approximation curve, using either a straight line or a second-degree polynomial, and subtracts it from the plate-normalized experimental values. In addition, normalization along each well is carried out (Gagarin et al., 2006).
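The procedure can be illustrated as follows, under stated assumptions: the plates are already Z-score normalized, the plate index (rescaled to [-1, 1]) is the predictor of the approximation curve, and the final normalization along each well is again a Z-score. The function name `well_correction` is hypothetical, and the sketch does not reproduce the exact algorithm of Gagarin et al. (2006).

```python
import numpy as np

def well_correction(z, degree=1):
    """Remove a per-well trend along the assay, then normalize each well.

    z: plate-normalized values of shape (n_plates, n_rows, n_cols).
    degree: 1 for a straight line, 2 for a second-degree polynomial.
    """
    if degree not in (1, 2):
        raise ValueError("degree must be 1 or 2")
    n_plates = z.shape[0]
    series = z.reshape(n_plates, -1)           # one column per well position
    t = np.linspace(-1.0, 1.0, n_plates)       # plate order, rescaled
    V = np.vander(t, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(V, series, rcond=None)
    residual = series - V @ coef               # subtract the fitted curve
    # Z-score normalization along each well (assumed form of the last step)
    residual = (residual - residual.mean(axis=0)) / residual.std(axis=0, ddof=1)
    return residual.reshape(z.shape)

# Synthetic example: one well position drifts upward over 200 plates
rng = np.random.default_rng(2)
z = rng.normal(size=(200, 8, 12))
z[:, 0, 0] += np.linspace(0.0, 3.0, 200)
corrected = well_correction(z, degree=1)
```

After the correction, the values of every well position have mean 0 and standard deviation 1 across plates, and the linear drift in well (0, 0) is gone.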

2.3 Hit-sigma distribution method
The ‘Hit-sigma distribution’ method allows the user to set well-specific standard deviation thresholds with respect to a user-defined mean hit rate. This approach offers advantages over the more traditional procedure of using a fixed threshold for all wells (e.g. 3 SD), which is not optimal in the presence of location-specific systematic errors.

2.4 K-means plate partitioning of large assays
Systematic errors generate repeatable local artifacts and smooth global drifts which become more noticeable when computing a mean assay background. For small HTS assays, the plate background patterns should not vary substantially from plate to plate. However, the plate patterns for large assays may change from batch to batch or shift over time. HTS-Corrector includes a k-means partitioning procedure (MacQueen, 1967) to find the breaks between batches. This procedure allows the user to separate the initial dataset into several subsets with similar plate patterns.
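The idea can be sketched with a small k-means routine that treats each plate as a vector of its well values. Several details below are assumptions of this example rather than documented behaviour of HTS-Corrector: the function name `kmeans_plates`, the use of flattened plate-normalized values as features, the deterministic farthest-point initialization, and the reading of batch breaks as changes of cluster label between consecutive plates.

```python
import numpy as np

def kmeans_plates(z, k, n_iter=100):
    """Partition plates into k groups with similar well patterns.

    z: array of shape (n_plates, n_rows, n_cols). Returns one label per plate.
    """
    X = z.reshape(z.shape[0], -1)
    # Farthest-point initialization (a choice of this sketch): start from the
    # first plate, then repeatedly add the plate farthest from chosen centers.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    labels = np.full(X.shape[0], -1)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # assignments are stable
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):                   # keep old center if cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

# Two synthetic batches of 30 plates (8 x 10) with different background patterns
rng = np.random.default_rng(3)
row_pattern = np.linspace(-1.0, 1.0, 8)[:, None] * np.ones((8, 10))
col_pattern = np.ones((8, 10)) * np.linspace(-1.0, 1.0, 10)[None, :]
batch_a = row_pattern + rng.normal(0.0, 0.2, size=(30, 8, 10))
batch_b = col_pattern + rng.normal(0.0, 0.2, size=(30, 8, 10))
z = np.concatenate([batch_a, batch_b])

labels = kmeans_plates(z, k=2)
breaks = np.flatnonzero(np.diff(labels)) + 1   # plate indices where the label changes
```

Each resulting subset can then be corrected separately, using a background evaluated only from the plates of that subset.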


    3 BENEFITS OF THE PROPOSED APPROACHES
 
We use simulated data to illustrate the proposed correction procedures. The dataset contains 1250 ‘plates’ with ‘wells’ arranged in 8 x 10 matrices (imitating the parameters of the McMaster assay proposed as a benchmark for the McMaster data mining and docking competition and examined in detail in Gagarin et al., 2006) (http://hts.mcmaster.ca/HTSDataMiningCompetition.htm).

We first examined the output of various hit identification methods on null simulated data [~N(0, 1)]. Using a 3σ hit detection threshold, 119, 117 and 104 false positives were found in the uncorrected, and in the HTS-Corrector ‘background removed’ and ‘well correction’ data, respectively. Second, we randomly added 1% of hits to the null dataset. Hit values were randomly chosen from the range [µ – 3.5σ; µ – 4.5σ], where µ and σ denote the within-plate mean and standard deviation, respectively. Systematic errors affected by random noise were added to this dataset according to a constant c, which ranged from 0σ through 0.5σ (in 0.1σ increments), to produce six ‘noisy’ datasets. The values 4c, 3c, 2c, 1c, 0, 0, –1c, –2c, –3c and –4c were added to the 1st, 2nd, ... , 10th columns, respectively. Finally, normally distributed random error, [~N(0, c/4)], was added independently to each well of each plate. For each value of c, hits were selected from the uncorrected, background-removed and well-corrected datasets using the 3σ threshold. The true positive rate was noticeably higher for both corrected datasets, while the background subtraction procedure slightly outperformed the well correction method (Fig. 2a). The false positive rates of the corrected data were always lower than that of the uncorrected data, although the well correction method generally outperformed the background subtraction procedure (Fig. 2b).
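The simulation design can be reproduced approximately with the sketch below, which generates the six datasets and evaluates only the uncorrected case. Several points are assumptions of this example: hit depths are drawn uniformly from the stated range, N(0, c/4) is read as a normal distribution with standard deviation c/4, a well is called a hit when its value falls below the within-plate mean minus 3 standard deviations, and the names `simulate` and `hit_rates` are hypothetical. Because the random numbers differ, the counts will not match those reported above.

```python
import numpy as np

def simulate(c, n_plates=1250, hit_fraction=0.01, seed=0):
    """Simulated 8 x 10 plates with 1% hits and column-wise systematic error."""
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=(n_plates, 8, 10))   # null data ~N(0, 1)
    mu = data.mean(axis=(1, 2), keepdims=True)
    sd = data.std(axis=(1, 2), ddof=1, keepdims=True)
    # Mark 1% of all wells as hits, placed 3.5 to 4.5 SD below the plate mean
    is_hit = np.zeros(data.shape, dtype=bool)
    n_hits = int(round(hit_fraction * data.size))
    is_hit.flat[rng.choice(data.size, size=n_hits, replace=False)] = True
    depth = rng.uniform(3.5, 4.5, size=data.shape)
    data = np.where(is_hit, mu - depth * sd, data)
    # Column offsets 4c, 3c, 2c, 1c, 0, 0, -1c, -2c, -3c, -4c
    data = data + c * np.array([4, 3, 2, 1, 0, 0, -1, -2, -3, -4], dtype=float)
    # Independent random error per well; c/4 is assumed to be a standard deviation
    data = data + rng.normal(0.0, c / 4.0, size=data.shape)
    return data, is_hit

def hit_rates(data, is_hit, k=3.0):
    """True and false positive rates for the threshold mean - k * SD per plate."""
    mu = data.mean(axis=(1, 2), keepdims=True)
    sd = data.std(axis=(1, 2), ddof=1, keepdims=True)
    called = data < mu - k * sd
    tpr = (called & is_hit).sum() / is_hit.sum()
    fpr = (called & ~is_hit).sum() / (~is_hit).sum()
    return tpr, fpr

# Uncorrected hit detection for the six values of c
rates = {c: hit_rates(*simulate(c)) for c in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]}
for c, (tpr, fpr) in rates.items():
    print(f"c = {c:.1f}: true positive rate {tpr:.3f}, false positive rate {fpr:.5f}")
```

Applying a correction method to `data` before calling `hit_rates` gives the corresponding corrected curves for comparison.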


Fig. 2 True (a) and false positive (b) hit detection rates for random data with added systematic error obtained from uncorrected (open square), background subtracted (open circle) and well corrected (open triangle) methods.

 


    Acknowledgments
 
The authors thank Genome Quebec for funding this project. The authors also thank two anonymous referees and Associate Editor Jonathan Wren for their helpful comments.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Jonathan Wren

Received on February 9, 2006; revised on March 20, 2006; accepted on March 28, 2006

    REFERENCES
 

    Brideau, C., et al. (2003) Improved statistical methods for hit selection in high-throughput screening. J. Biomol. Screen., 8, 634–647.

    Gagarin, A., Kevorkov, D., Makarenkov, V., Zentilli, P. (2006) Comparison of two methods for detecting and correcting systematic error in HTS data. Proceedings of IFCS 2006, Ljubljana, Springer-Verlag (in press).

    Gunter, B., et al. (2003) Statistical and graphical methods for quality control determination of HTS data. J. Biomol. Screen., 8, 624–633.

    Hastie, T., Tibshirani, R., Friedman, J. (2001) The Elements of Statistical Learning. Springer-Verlag.

    Heuer, C., Haenel, T., Prause, B. (2002) A novel approach for quality control and correction of HTS data based on artificial intelligence. Pharmaceutical Discovery and Development Report.

    Kevorkov, D. and Makarenkov, V. (2005) Statistical analysis of systematic errors in high-throughput screening. J. Biomol. Screen., 10, 557–567.

    MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press.

    Malo, N., et al. (2006) Statistical practice in high-throughput screening data analysis. Nat. Biotechnol., 24, 167–175.

    Zhang, J.H., et al. (2000) Confirmation of primary active substances from HTS of chemical and biological populations: a statistical approach and practical considerations. J. Comb. Chem., 2, 258–265.





