Skip Navigation


Bioinformatics Advance Access originally published online on April 26, 2007
Bioinformatics 2007 23(13):1648-1657; doi:10.1093/bioinformatics/btm145
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary data
Right arrow All Versions of this Article:
23/13/1648    most recent
btm145v2
btm145v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (9)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Makarenkov, V.
Right arrow Articles by Nadon, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Makarenkov, V.
Right arrow Articles by Nadon, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

An efficient method for the detection and elimination of systematic error in high-throughput screening

Vladimir Makarenkov 1,*, Pablo Zentilli 1, Dmytro Kevorkov 1, Andrei Gagarin 1, Nathalie Malo 2,3 and Robert Nadon 2,4

1Department d’informatique, Université du Québec à Montreal, C.P.8888, s. Centre Ville, Montreal, QC, Canada, H3C 3P8, 2McGill University and Genome Quebec Innovation Centre, 740 Dr. Penfield Ave., Montreal, QC, Canada, H3A 1A4, 3Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, 1020 Pine Av. West, Montreal, QC, Canada, H3A 1A4 and 4Department of Human Genetics, McGill University, 1205 Dr. Penfield Ave., N5/13, Montreal, QC, Canada, H3A 1B1

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: High-throughput screening (HTS) is an early-stage process in drug discovery which allows thousands of chemical compounds to be tested in a single study. We report a method for correcting HTS data prior to the hit selection process (i.e. selection of active compounds). The proposed correction minimizes the impact of systematic errors which may affect the hit selection in HTS. The introduced method, called a well correction, proceeds by correcting the distribution of measurements within wells of a given HTS assay. We use simulated and experimental data to illustrate the advantages of the new method compared to other widely-used methods of data correction and hit selection in HTS.

Contact: makarenkov.vladimir{at}uqam.ca

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
High-throughput screening (HTS) is a modern technology used for the identification of pharmacologically active compounds (i.e. hits). In screening laboratories, testing more than 100 000 compounds a day has become routine. Automated mass screening for pharmacologically active compounds is now widely distributed. It serves for the identification of chemical compounds as starting points for optimization (primary screening), for the determination of activity, specificity, physiological and toxicological properties of large libraries (secondary screening) and for the verification of structure-activity hypotheses in focused libraries (tertiary screening) (Heyse, 2002). The lack of standardized data validation and quality assurance processes has been recognized as one of the major hurdles for successful implementing high-throughput experimental technologies (Kaul, 2005). Therefore, automated quality assessment and data correction systems need to be applied to biochemical data in order to recognize and eliminate experimental artefacts that might confound with important biological or chemical effects. The description of several methods for quality control and correction of HTS data can be found in Brideau et al. (2003), Gagarin et al. (2006a), Gunter et al. (2003), Heuer et al. (2003), Heyse (2002), Kevorkov and Makarenkov (2005), Makarenkov et al. (2006), Malo et al. (2006) and Zhang et al. (1999, 2000).

Various sources of systematic errors can affect experimental HTS data, and thus introduce a bias into the hit selection process (Heuer et al., 2003), including:

  • Systematic errors caused by aging, reagent evaporation or cell decay which can be recognized as smooth trends in the plate means/medians.
  • Errors in liquid handling and malfunction of pipettes which can generate localized deviations from expected values.
  • Variation in incubation time, time drift in measuring different wells or different plates and reader effects which can be recognized as smooth attenuations of measurements over an assay.

Random errors produce noise that cause minor variation of the hit distribution surface. Systematic errors generate repeatable local artifacts and smooth global drifts, which become more noticeable when computing a hit distribution surface (Kevorkov and Makarenkov, 2005). Often systematic errors create border, row or columns effects, resulting in the measurements in certain rows or columns that are systematically over or underestimated (Brideau et al., 2003). This article introduces a new method allowing one to minimize the impact of systematic error on the hit selection process. We propose to examine the hit distribution of raw data and fit the data variation within each well to correct the data at hand. The comparison of the new method to other data correction techniques used in HTS is described in the Simulations section. The latter section is followed by an application example, where we carried out the identification of active compounds in the raw and well-corrected HTS assay generated at McMaster University.


    2 MATERIALS AND METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
2.1 Experimental data
In this article, we examine an experimental data set generated at the HTS Laboratory of McMaster University. This test assay was proposed as a benchmark for the McMaster Data mining and docking competition (http://hts.mcmaster.ca/Downloads/82BFBEB4-F2A4-4934-B6A8-804CAD8E25A0.html; Elowe et al., 2005). It consists of a screen of compounds that inhibits the Escherichia coli dihydrofolate reductase. Each compound was screened twice: two copies of 625 plates were run through the screening machines. This gives 1250 plates in total, each having wells arranged in eight rows and 12 columns (the columns 1 and 12 containing controls were not considered in this study). The assay conditions reported in Elowe et al. (2005) were the following: assays were carried out at 25°C and performed in duplicate. Each 200 µl reaction mixture contained 40 µM NADPH, 30 µM DHF, 5 nM DHFR, 50 mM Tris (pH 7.5), 0.01% (w/v) Triton and 10 mM ß-mercaptoethanol. Test compounds from the screening library were added to the reaction before initiation by enzyme and at a final concentration of 10 µM. The Supplementary Materials section contains more detail on the screening method and plate layout (Fig. 1S) for this assay.

2.2 Data preprocessing and correction in HTS
The analysis of experimental HTS data requires preprocessing to ensure the statistical meaningfulness and accuracy of the data analysis and the hit selection. Ideally, inactive samples should have similar mean values and generate a constant surface. In a real case, however, random errors produce random noise. For a large number of plates considered, the noise residuals should compensate each other in the computation of mean values. Systematic repeatable artifacts become more visible as the number of plates increases (Kevorkov and Makarenkov, 2005). The following steps can be carried out to pre-process experimental HTS data:

  1. Hit and outlier elimination (optional). This elimination can be carried out in order to reduce the influence of hits and outliers on the plates’ means and SDs (standard deviations). It can be particularly important when analyzing screens with few (<100) plates.
  2. Within-plate normalization of all samples, which can be done including or excluding hits and outliers, using the Z-score transformation (i.e. zero mean and unit variance standardization, Equation 1) or the Control normalization (Equation 2) can be carried out. Such transformations should be applied to analyze together experimental HTS data generated under different testing conditions. In case of Z-score, the following formula is used:


Formula 1

(1)
where xi—measured value at well i, Formula —normalized output value at well i, Formula —mean value.

The control normalization (i.e. normalized percent inhibition) is based on the following formula:


Formula 2

(2)
where xi—measured value at well i, H—mean of high controls, L—mean of low controls and Formula —evaluated percentage at well i.

The additivity of experimental data is a necessary property that should hold prior to the application of some statistical procedures. Plate means and SDs vary substantially from plate to plate. In order to compare and analyze together experimental data from various plates and data tested under different conditions, all measurements should be normalized.

(3) Data correction of all samples. This step can be conducted using the median polish procedure (Tukey, 1977), the background correction procedure (Kevorkov and Makarenkov, 2005), or the well correction method discussed in this article. The B-score transformation procedure (Brideau et al., 2003; Malo et al., 2006), additionally correcting for row and column biases, can also be carried out. The comparison of the data correction techniques is presented in the Simulation study section.

Also, background plates (i.e. control plates) can be inserted throughout a screen. Background plates are separate plates containing only control wells and no screening compounds. They are particularly useful for calculating the background levels of an assay and help determine whether an assay has sufficient signal which can be reliably detected (Fassina, 2006). Such additional plates enables one to create background signatures that lead to plate-based correction factors on, per well, per row or per column, basis for all other assay plates. The use of background plates gets around the main assumption made for the well correction procedure: when examined across plates, wells should not systematically contain compounds from the same family (see the description of the well correction method subsequently). The main inconvenience of this method is that it does not take into account possible errors that might occur in the plates processed between two background plates.

Note that for certain methods, the corrected data can be easily denormalized in order to obtain a data set scaled as the original one. Analysis of the hit distribution surface of the corrected data can then be carried out using the {chi}2-contingency test (see the results in the section 3.2).

2.3 Hit selection process
In the HTS workflow, the bias correction process is followed by hit selection (i.e. inference of active compounds). The selection of hits in the corrected data is often done using a preselected threshold (e.g. Formula , in case of an inhibition assay).

Hit selection is a process that should not only consider statistical treatment of HTS data. It should also be used in conjunction with the structure-activity-relationships (SAR) observed using the corrected HTS data (Gedeck and Willett, 2001). In SAR, the basic assumption for all molecule based hypotheses is that similar molecules have similar activities. The quality of hits is improved if SAR is taken into consideration. For instance, the likelihood that an identified hit is an artifact grows if a large number of highly related compounds was confirmed as inactive in the corrected data.

2.4 Analysis of hit distribution
The presence of systematic errors in an assay can be detected through the analysis of its hit distribution surface (Kevorkov and Makarenkov, 2005). This surface can be computed by estimating the number of selected hits within each well location. In the case of randomly distributed compounds, hits should be distributed evenly over the well locations. In the example presented in Figure 1, we considered the normalized McMaster assay (Elowe et al., 2005; Zolli-Juran et al., 2003) comprising 1250 plates arranged in eight rows and 10 columns (the control columns were not considered). For each well, we estimated the number of experimental values that deviated from the plate means by more than Formula (i.e. number of values that are lower than Formula at each well across plates).


Figure 1
View larger version (34K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Hit distribution surface for the McMaster data (1250 plates). Values deviating from the plate means for more than one SD were taken into account during the computation. (a) Well positional effects in 3D are shown; (b) Well, row and column positional effects are shown.

 
As the common strategy utilizes the Formula threshold for the hit selection, the considered data comprise all hits as well as all values close to them. The substantial variation of the measurements shown in Figure 1 illustrates the presence of systematic errors in this assay (see the first two columns in Table 5S for the results of the {chi}2-contingency test conducted on this surface). The detailed analysis of the McMaster hit distributions obtained for different thresholds is presented in the section 3.2.

2.5 Compared methods
The five following methods were compared in this study. Methods 1 and 2 do not involve any data correction, whereas Methods 3–5 proceed by the correction of systematic error before hit selection.

2.5.1 Method 1
Classical hit selection using the value Formula as a hit selection threshold, where the mean value Formula and SD are computed separately for each plate and c is a preliminary chosen constant. All values lower than or equal to the threshold value are considered as hits.

This method should be applied cautiously when samples are not randomly assigned to plates. Specifically, Method 1 can lead to increased rates of false positives and false negatives when analyzing plates with compounds belonging to the same family (e.g. in case of an inhibition assay, a high concentration of small hit values in a plate can lead to their transformation into false negative hits, whereas a high concentration of large values can transform some of the lowest non hit measurements into false-positive hits).

2.5.2 Method 2
Classical hit selection using the value Formula as a hit selection threshold, where the mean value Formula and SD are computed over all assay values, and c is a preliminary chosen constant. This method can be chosen when we are certain that all plates of the given assay were processed under the ‘same testing conditions’.

Method 2 should be applied cautiously when the plates have been tested either over numerous days and/or by different machines (robots, readers, etc). Experience suggests that the ‘same conditions’ requirement may be violated under these circumstances. Moreover, in at least some circumstances, the variability of the various compounds does not appear to be constant but rather follows an inverse gamma distribution (Malo et al., 2006). In the latter case, the pooled variance of Method 2 provides only one component of an individual compound's variance—the other component would be provided by the compound specific variance as estimated by replicates. However, if the differences among the compound variances are very small, then Method 2 is expected to do well.

2.5.3 Method 3
Median polish procedure (Tukey, 1977) can be used to remove the impact of systematic error. Median polish (Equation 3) works by alternately removing the row and column medians, and continues until the proportional reduction in the sum of absolute residuals is less than a fixed value {varepsilon} or until a fixed number of iterations has been carried out. The residual (rijp) of the measurement for row i and column j on the p-th plate is obtained by fitting a two-way median polish, and is defined as follows:


Formula 3

(3)

The residual is defined as the difference between the observed result (xijp) and the fitted value Formula , which is defined as the estimated average of the plate Formula + estimated systematic measurement offset for row i on plate Formula + estimated systematic measurement column offset for column j on plate Formula . Thus, the matrix of residuals R replaces the original matrix in the further computations. In our simulations, Method 2 was applied to the values of the matrix R in order to select hits.

2.5.4 Method 4
B-score (Equation 4) normalization procedure (Brideau et al., 2003) is designed to remove plate row/column biases in HTS (Malo et al., 2006). The residual (rijp) of the measurement for row i and column j on the p-th plate is obtained by fitting a two-way median polish. In addition, for each plate p, the adjusted median absolute deviation (MADp) is obtained from the rijp's. The B score is calculated as follows:


Formula 4

(4)
where MADp = median{|rijp – median(rijp)|}. The raw MAD used in the B-score calculation is rescaled by the multiplicative constant of 1.4826. To select hits, this computation was followed by Method 2, applied to the B-score matrix. Here we considered the version of B-score presented in Malo et al. (2006); the latter version of the method does not include the smoothness parameter used by Brideau et al. (2003). The main assumption that must be met to apply Methods 3 and 4 is that the compounds should be randomly distributed within each plate. Any systematic row or column placement of compounds within a plate will bias the results given by these two methods.

2.5.5 Method 5
Well correction procedure described below followed by Method 2.

The first two methods are the classical hit selection strategies not involving any correction of systematic bias, whereas the last three methods combine a data preprocessing procedure with the hit selection by Method 2. The results of the median polish and B-score methods were generated using the S-PLUS package (S-PLUS manual, 2006).

2.6 Well correction procedure
To be able to apply the new correction procedure to experimental data sets, the following assumptions about HTS data should be made: screened samples can be divided into active and inactive; the majority of the screened samples are inactive; values of the active samples differ substantially from the inactive ones; and systematic error causes a repeatable influence on the measurements within wells across plates. Also, wells, across plates, should not systematically contain compound samples belonging to the same family. However, it does not require the randomization of samples within plates, which seems to be a much more frequent situation in the real HTS campaigns. We studied the chemical structure of compounds within each well of the McMaster data set and have not found any systematic pattern in the compound distribution. Usually, each well location contains a large number of samples across plates (e.g. 1250 samples for the McMaster assay), and small systematic compound placements are very unlikely to bias the data.

The Z-score normalization produces a modified data set in which the values within each plate are zero-mean centered, whereas SD and variance are equal to unity. Once the data are plate-normalized, we propose to analyze the values within each well measured across all assay plates. If no systematic error is present in the data set, the distribution of measurements within wells should be also close to a zero-mean centered one with a SD close to unity. The well correction method consists of two main steps:

  1. Least-squares approximation of the data carried out separately for each well of the assay.
  2. Z-score normalization of the data within each well location of the assay.

The real distribution of values can differ substantially from the ideal one. The example presented in Figure 2S features the measurements obtained for the well located in column 1 and row 8 of the McMaster data (Elowe et al., 2005). The mean of the observed values is –0.37. Such a deviation suggests the presence of systematic error in this well location. Experimental values for a specific well location can also have ascending or descending trends. The well correction procedure first discovers these trends using the linear least-squares approximation; note that the fitting by a polynomial of a higher degree can be also carried out instead of the linear approximation. Thus, the obtained trend (e.g. a straight line y = ax + b in case of the linear fitting, where x denotes the plate number and y denotes the plate-normalized measurement) is subtracted from or added to the original measurements bringing the mean value of this well to zero. Because the optimal parameters are sought for each well location of the assay independently, well correction has more fitting parameters than B-score and median polish. For the analysis of large industrial assays, more sophisticated functions (e.g. higher degree polynomials or spline functions) can be also used. Alternatively, an assay can be divided into intervals and a particular trend function characterizing each interval can be determined through approximation.

Second, the well normalization using the Z-score normalization (Equation 1) of the well measurements is carried out independently for each well location. Then, we can select hits in the corrected data and reexamine the hit distribution surface.


    3 RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
3.1 Simulation study
To demonstrate the effectiveness of the well correction procedure, we first carried out simulations with random data. Specifically, we considered three types of random symmetrically distributed data: standard normal, heavy tailed (positive kurtosis) and light tailed (negative kurtosis) distributions. As with the McMaster data, our data sets consisted of 1250-plate assays, each plate comprising wells arranged in eight rows and 10 columns. First, three random null data sets (i.e. without hits) were generated. The hit selection procedure was carried out on these data and the false positive hit rates reported in Table 1 were found for the five different hit selection methods presented earlier.


View this table:
[in this window]
[in a new window]

 
Table 1. False positive hit rate for the five preprocessing methods. Random data without hits having standard normal, heavy and light tailed distributions were considered

 
Hit selection thresholds equal to Formula for the standard normal, to Formula for the light tailed, and to Formula for the heavy tailed data, were considered. The hit selection thresholds for the heavy and light tailed data were chosen to have approximately the same hit percentage found by the hit selection method based on the assay parameters (Method 2) for the three raw data sets (~ 0.14% of hits; i.e. 140 hits). Since the simulated data did not contain any hits, the hits identified by the methods were false-positives by definition. Note that Method 1 was very sensitive to the data distribution; it found the lowest number of false-positives in the case of the heavy tailed (83 hits) and standard normal distributions (104 hits), but 642 false-positive hits in case of the light tailed data. The median polish and B-score methods were unstable, yielding the most false positives (2685 and 2676 for the light tailed data, and 288 and 361 for the heavy tailed data, respectively). The most stable results were obtained by Method 2 and the well correction procedure. As expected, the largest percentage of the false-positive hits was found in the light tailed data. For each type of random data we then generated and added to plates k percent of hits, where k was consequently taking values 0.5, 1, 1.5, 2, 2.5 and 3%, whose locations and values were chosen arbitrarily; the probability of each well in each plate to contain a hit was k percent. One thousand replicates of data of each distribution and for each hit percentage were generated. The values of hits were assumed to have a standard normal distribution with the parameters N(Formula , SD) for the standard normal, N(Formula , SD) for the light tailed and Formula for the heavy tailed distributions, respectively, where Formula is the mean value and SD is the standard deviation of the observed plate.

Second, row and column biases were generated as follows. For each row and each column of a given random assay we generated a systematic error that was identical for all assay plates. We also added a small random error to all assay measurements. Therefore, the error-perturbed value, Formula , of the measurement in row i and column j on the p-th plate was obtained as follows:


Formula 5

(5)
where xijp is the observed result in well ij of plate p, si is the systematic error present in row i, sj is the systematic error present in column j and randijp is the random error in well ij of plate p (Table 1S, case a). The variables si and sj in Equation (5) had a standard normal distribution with parameters N(0, c), where the variable c was consequently taking the values 0, 0.6SD, 1.2SD, 1.8SD, 2.4SD and 3SD. For each value of the variable c, a different set of assays was generated and tested. For all values of the parameter c, the random error rand was always distributed according to a standard normal low with parameters N(0, 0.6SD).

We carried out the five preprocessing methods described earlier, choosing as hits the measurements with the values lower than Formula , Formula and Formula , for the standard normal, light tailed, and heavy tailed data, respectively. Statistics describing the impact of systematic error on the hit selection process are reported in Tables 2S–4S. Specifically, the hit detection rate as well as the false positive and the false negative rates were computed during the simulations. The results in Tables 2 and 3S,4S are indicated for the data with 1% of added hits whereas the systematic error varied from 0 to 3.0SD. The hit detection rates for the five competing methods corresponding to the systematic error of 1.2SD are depicted in Figures 2 and 3S, 4S. As the tables and graphics suggest, the well correction procedure showed the most stable behavior regardless the data distribution, level of systematic bias, and hit percentage. In all situations, well correction, median polish and B-score methods removed systematic error regardless of amount of bias (Figures 2, 3S, 4S, a and c). However, the well correction procedure generally outperformed the median polish and B-score methods. The two latter methods were able to remove systematic trend and return the correct residuals in the easiest cases, but they often converged to a local instead of a global minimum when the data structure was rather fuzzy. We can also observe that Method 2 based on the assay parameters was more precise than Method 1 using the plates’ parameters. Consequently, when the testing conditions are similar for all plates of the given assay, treating all assay measurements as a whole batch should be preferred to the plate-by-plate analysis. On the other hand, the usage of the well correction procedure can be advocated for any type of data regardless of hit percentage. Compared to the four other methods, the well correction procedure was particularly accurate as to the false-negative rate (Tables 2S–4S). At the same time, when the well correction was applied to the data free of noise, it did not have any negative influence to the false-positive rate (Table 1).

The B score method with this type of normally distributed constant variance data did not perform well, although the median for the hits was separated from the median for the non-hits to a greater extent than in the well correction method which, in turn, was less separated from the non-hits than raw data (Fig. 8S). This effect was offset, however, by the increased variance for both the non-hits and the hits. The B-score method improved accuracy somewhat but at the high cost of a large decrease in precision. Accordingly, B-score method should not be used unless there is evidence of row or column effects.

We also constructed the Receiver Operating Characteristic (ROC) curves for the five methods under study. ROC curves provide a graphical representation of the relationship between the true-positive and false-positive prediction rate of a model. There are many advantages to this approach, including that thresholds do not need to be determined in advance.

The y-axis corresponds to the sensitivity of the model, i.e. how well the model is able to predict true positives (real cleavages); the y-coordinates are calculated as follows:


Formula 6

(6)
where TP is the number of true positives and FN is the number of false negatives. The x-axis corresponds to 1-specificity, i.e. the ability of the model to identify true negatives. An increase in specificity (i.e. a decrease along the x-axis) results in an increase in sensitivity. The x-coordinates are calculated as follows:


Formula 7

(7)
where TN is the number of true negatives and FP is the number of false positives. The greater the sensitivity at high specificity values (i.e. high y-axis values at low x-axis values) the better the model. A numerical measure of the accuracy of the model can be obtained from the area under the curve, where an area of 1.0 signifies near perfect accuracy, while an area of less than 0.5 indicates that the model is worse than just random. Figure 3 illustrates the ROC curves associated with the five methods compared in this article. The curves were obtained for the standard normal data with 1% of added hits without (a) and with (b) systematic error. The ROC curves confirm the conclusions that can be made while observing the methods’ performances reported in Table 2S and depicted in Figure 2. The well correction procedure and Method 2 provide the best results for data free of systematic bias (Fig. 3a), whereas median polish and B-score methods fail to recover correct hits in this situation. After the addition of systematic noise (Fig. 3b) well correction procedure outperformed the four other methods, whereas the performances of Methods 1 and 2, not assuming any correction of systematic bias, decreased compared to the case of the error free data.


Figure 2
View larger version (21K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. (see also Table 2S). True hit rate and total of the number of false-positive and false-negative hits for the noisy standard normal data with systematic error stemming from row x column interactions which are constant across plates. The results were obtained with the methods using plates’ parameters (i.e. Method 1, open diamond), assay parameters (i.e. Method 2, open square), median polish (cross), B-score (open circle) and well correction (open triangle). The abscissa axis indicates the noise factor (a and c—with fixed hit percentage of 1%) and the percentage of added hits (b and d—with fixed error rate of 1.2SD).

 

Figure 3
View larger version (10K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. ROC curves for the noisy standard normal data with systematic error stemming from row x column interactions which are constant across plates. The results were obtained using: Z-score (i.e. Method 1, open diamond), raw data (i.e. Method 2, open square), median polish (cross), B-score method (open circle), and the well correction procedure (open triangle). The graph (a) corresponds to the case: 1% of added hits and no systematic error; the graph (b) corresponds to the case: 1% of added hits and systematic error of 1.2SD.

 
Finally, for the noisy standard normal data only, we carried out simulations with four additional types of error. The data generation diagram for these simulations is presented in Table 1S (see cases b–e). The following additional error conditions were considered:
(b) Systematic error with column effects across plates (Fig. 5S).
(c) Systematic error varying from well to well (no row x column interactions involved) across plates (Fig. 6S).
(d) Systematic error stemming from row x column interactions and changing from plate to plate (Fig. 4).
(e) Random error only varying from well to well and from plate to plate (Fig. 7S).


Figure 4
View larger version (19K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. True hit rate and total of the number of false positive and false negative hits for the noisy standard normal data with systematic error stemming from row x column interactions which are varying across plates. The same five methods as in Figure 2 are presented above.

 
These situations account for the most realistic scenarios, although it is of course not possible to represent all contingencies. For these additional data sets, the well correction procedure was generally more accurate than the four other methods. The only case when the B-score method outperformed the well correction procedure was the case where systematic error stemmed from row x column interactions, which were changing from plate to plate (i.e. systematic error was not constant across plates, Fig. 4) and this error was sufficiently large (1.2SD and more for the true hit rate, and 2.3SD and more for the sum of false positives and false negatives).

3.2 Well correction of the McMaster data
We fitted the McMaster assay data to a Gaussian distribution (Fig. 9S). The raw values were first plate normalized using Z-scores. The experimental distribution was evaluated by counting the number of measurements in 0.1SD intervals in the range Formula The Gaussian distribution was modeled using the parameters of the experimental one. The hit selection area (Formula is shown in the upper right corner of Fig. 9S).

One popular hit selection method in high-throughput screening proceeds by fixing a constant threshold (usually, Formula ) for all considered wells. For an inhibition assay, all measurements that are lower than this threshold are identified as hits. The procedure assumes that the measurement distributions in each well have the same shapes and properties. We verified this assumption while examining the cumulative distribution functions at wells of the McMaster assay (80 functions for 8 x 10 plates). These functions have a broad band of shapes. These differences can be due to systematic biases and can have a substantial impact on the hit selection procedure.

After the plate normalization by Z-scores, the values in each plate are zero-mean centered, and their SD and variance are equal to unity. However, the values in wells, measured across all plates, can have different SDs and variances.

The example in Figure 2S shows that the mean value of the normalized measurements from the well located in column 1 and row 8 (McMaster assay) is –0.37. This deviation is likely to be caused by systematic error. Furthermore, Figure 5 shows that the variation of values within a well can have descending (Fig. 5a) and ascending (Fig. 5b) trends. Thus, to identify hits in the McMaster assay we applied the classical hit selection algorithm based on the assay mean and SD (Method 2) and the well correction procedure (Method 5). The well correction algorithm first evaluates trends using a least-squares approximation. These trends are removed from the experimental data. Then, the algorithm normalizes the modified assay values within each well separately using Z-scores. We examined and compared the hit distribution surfaces obtained using Methods 2 and 5 for different hit selection thresholds. The hit distribution surfaces for the McMaster raw and well-corrected data sets are shown in Figures 6 and 10S. Figure 6 presents the hit distributions for the following hit selection thresholds Formula and Figure 10S presents the hit distribution surfaces for the thresholds (Formula , Formula and Formula ). Each value on the graphic depicts the number of hits found at the associated well. Figures 6 and 10S suggest that the well correction procedure allows one to attenuate the impact of systematic bias. The improvements in the hit distribution surfaces are more evident for the bigger hit selection thresholds (Fig. 6). For all obtained hit distributions, we also carried out a {chi}2-contingency test (with the parameter {alpha} of 0.01). In our case, the null hypothesis (H0) assumes that the observed hit distribution is a constant surface. The results of this test are reported in Tables 5S and 6S.


Figure 5
View larger version (75K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Variation of the plate-normalized measurements in two different wells along 1250 plates of the McMaster assay; descending (a) and ascending (b) trends are highlighted.

 

Figure 6
View larger version (47K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6. Hit distributions for the raw (a, c and e) and well-corrected (b, d and f) McMaster data sets obtained for the thresholds Figure 6 (a and b), Figure 6 (c and d) and Figure 6 (e and f).

 
Figure 6 a (raw data) and b (well-corrected data) depicts the hit distributions for the Formula threshold. The null hypothesis was rejected in both cases (Table 5S). However, the well-corrected data set demonstrated a substantial improvement of the {chi}2-statistic compared to the raw data. The {chi}2-value decreased from 2377.4 to 204.6 (with critical value equal to 111.1), i.e. this value for the corrected data is about 11.6 times lower compared to the raw ones. Figure 6 c (raw data) and d (well-corrected data) depict the hit distributions for the Formula threshold. After the well correction, the {chi}2-coefficient decreased from 1258.4 to 173.6; i.e. it is about seven times lower for the well-corrected data compared to the raw ones, but it was still larger than the {chi}2-critical value (111.1). Figure 6 e (raw data) and f (well-corrected data) shows the hit distribution surfaces for the Formula threshold. The {chi}2-contingency test failed to reject the null hypothesis (H0) for the corrected data ({chi}2-value of 74.8) and rejected it for the raw data ({chi}2-value of 438.6). Figure 10S illustrates the hit distribution surfaces obtained for the raw and well-corrected data for commonly used hit selection thresholds. The thresholds (Formula , Formula and Formula ) were employed to identify hits in the raw and corrected McMaster data. The null hypothesis, postulating that the hit distribution corresponds to a constant surface, was not rejected for the well-corrected data in case of all three considered hit selection thresholds (Table 6S). For the raw data, the null hypothesis was rejected for all considered thresholds, except Formula , for which the values of the {chi}2-coefficient for the raw and corrected data were close (106.2 and 105.3, respectively). This is certainly due to the decrease in the number of hits when lowering the hit selection threshold. The mean numbers of hits per well for the well-corrected data set were usually slightly lower than for the raw data (Tables 5S and 6S). Tables 7S and 8S report the hit numbers per well in the raw and well-corrected McMaster data computed for the Formula threshold. Even though for the corrected data set the null hypothesis was rejected for the thresholds Formula and Formula , the well correction procedure led to an important reduction of the {chi}2-statistic.

We also analyzed the list of active compounds from the Test Library of the McMaster data set. The samples in the original data set were identified as Consensus hits by the organizers of the McMaster HTS competition if both of their replicate measurements were lower or equal to 75% with respect to the reference controls. Only 42 of 50 000 different tested compounds indicated by their MAC-IDs in Table 9S satisfied this property. Our experiments showed that the selection of these 42 replicate compounds can be reached by carrying out Method 1 on the non-normalized data (hit selection by plates, where the values lower than Formula are identified as hits) with the SD coefficient c = 2.29. Among the 42 consensus hits identified in such a manner, the competition organizers also identified 14 compounds having well behaved dose–response curves (they are highlighted in Table 9S). We also performed the analysis of the original data set using the well correction procedure which was carried out prior to the selection of hits (Method 5 with hit selection carried out by plate). All other parameters were identical to those of Method 1. With these parameters the application of Method 5 led to the identification of 40 hits. Among these hits, 31 were those found by Method 1 (Table 9S), and nine hits were new. Note that the proportion of compounds with well behaved dose-response curves was better for the well-corrected data (~42%; this percentage does not include the nine new compounds for which the follow up tests were not conducted) than for the raw data (~33%). However, a more detailed study comparing the dose-response behavior of the hits identified by all the five competing methods was not possible because the dose-response follow up information is not available for the nine hits found by Method 5. To conduct a comprehensive study of the five methods compared in this manuscript an experimental dose-response follow-up of the hits obtained using each of these methods is certainly necessary.

It would be quite practical to have the benchmark data sets for which all the results, including the confirmed hits, and testing conditions are known. We think that the scientists from the HTS Laboratory of McMaster University who organized the HTS Data Mining and Docking competition are on the way of doing it: after the announcement of the competition results a special issue of Journal of Bimolecular Screening was dedicated to the analysis of the test data set (see Elowe et al., 2005 and the articles in the same issue).


    4 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We described a method that can be used to refine the analysis of experimental HTS data by eliminating systematic biases from them prior to the hit selection procedure. The proposed method, called a well correction, rectifies the distribution of assay measurements by normalizing data within each considered well across all assay plates. Simulations were carried out with standard normal, heavy and light tailed random data sets. They suggest that the well correction procedure is a robust method that should be used prior to the hit selection process. Well correction generally outperformed the median polish and B-score methods as well as the classical hit selection procedure. In the situations when neither hits nor systematic errors were present in the data, the well correction method showed the performance similar to the traditional method of hit selection. The well correction method also compares advantageously (Gagarin et al., 2006b) to the background correction procedure (Kevokov and Makarenkov, 2005). In the future, it would be interesting to examine how robust the methods are to violations of normality of HTS data.

We also examined an experimental assay generated at the HTS Laboratory of McMaster University. The analysis of the hit distribution of the raw McMaster data set showed the presence of systematic errors. The McMaster data were processed using different hit selection thresholds varying from Formula to Formula . Note that for all considered thresholds, except Formula , for the raw McMaster data, the {chi}2-contingency test rejected the null hypothesis, postulating that the hit distribution surface is a constant. The analysis of the well-corrected data sets showed that the new method considerably smoothes the hit distribution surfaces for the Formula and Formula thresholds. When applied to the well-corrected data set, the {chi}2-contingency test failed to reject the null hypothesis for the thresholds Formula to Formula . Furthermore, the simulation study also confirmed that Method 2 based on the assay parameters was more accurate than Method 1 based on the plates’ parameters. Therefore, in case of identical testing conditions for all plates of the given assay, all assay measurements should be treated as a single batch.

The HTS Corrector software (Makarenkov et al., 2006, http://www.labunix.uqam.ca/~makarenv/hts.html), including the methods for data preprocessing and correction of systematic error, was developed. HTS Corrector includes all data correction methods discussed is this article (well correction, B-score, and median polish) as well as different methods of hit selection (e.g. Methods 1 and 2 compared in this study). Note that for large industrial assays, a procedure allowing one to divide assays into homogeneous sub-assays has been included in the program. HTS Corrector first establishes a user-defined distance measure between plates and carries out a k-means partitioning algorithm (Legendre and Legendre, 1998; MacQueen, 1967) to form k homogeneous subassays.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We thank Genome Quebec for funding this project. We also thank two anonymous referees for their helpful comments.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Jonathan Wren

Received on December 7, 2006; revised on February 22, 2007; accepted on April 10, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Brideau C, et al. Improved statistical methods for hit selection in HTS. J. Biomol. Screen (2003) 8:634–647.[Abstract/Free Full Text]

    Elowe NH, et al. Experimental screening of dihydrofolate reductase yields a "Test Set" of 50 000 small molecules for a computational data-mining and docking competition. J. Biomol. Screen (2005) 10:653–657.[Abstract]

    Fassina G. HTS of combinatorial libraries. Survey of applications of combinatorial technologies. In: Training Course Presentation (2006) http://www.ics.trieste.it/Documents/Downloads/df3983.pdf.

    Gagarin A, et al. Clustering techniques to improve the hit selection in HTS. J. Biomol. Screen (2006a) 11:903–914.[Abstract/Free Full Text]

    Gagarin A, et al. Comparison of two methods for detecting and correcting systematic error in HTS data. In: Data Science and Classification—In Batagelj V, Bock HH, Ferligoj A, Ziberna A, eds. (2006b) IFCS 2006, Studies in Classification, Data Analysis, and Knowledge Organization, Springer Verlag. 241–249.

    Gedeck P, Willett P. Visual and computational analysis of structure-activity relationships in high-throughput screening data. Curr. Opin. Chem. Biol (2001) 5:389–395.[CrossRef][Web of Science][Medline]

    Gunter B, et al. Statistical and graphical methods for quality control determination of HTS data. J. Biomol. Screen (2003) 8:624–633.[Abstract/Free Full Text]

    Heuer C, et al. A novel approach for quality control and correction of HTS data based on artificial intelligence. In: Pharmaceutical Discovery & Development Report (2003) PharmaVentures.

    Heyse S. Comprehensive analysis of high-throughput screening data. (2002) 4626. Proceedings of SPIE, 2002: Bellingham, WA. 535–547.

    Kaul A. The impact of sophisticated data analysis on the drug discovery process. Business Briefing: Future Drug Discovery 2005 (2005).

    Legendre P, Legendre L. Numerical Ecology (1998) 2nd English. Amsterdam: Elsevier Science BV. 739–746.

    Kevorkov D, Makarenkov V. Statistical analysis of systematic errors in HTS. J. Biomol. Screen (2005) 10:557–567.[Abstract]

    MacQueen J. Some methods for classification and analysis of multivariate observations. In Le Cam LM, Neyman J, eds. (1967) Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1, Statistics. Berkeley: University of California Press.

    Makarenkov V, et al. HTS-Corrector: new application for statistical analysis and correction of experimental data. Bioinformatics (2006) 22:1408–1409.[Abstract/Free Full Text]

    Malo N, et al. Statistical practice in high-throughput screening data analysis. Nat. Biotechnol (2006) 24:167–175.[CrossRef][Web of Science][Medline]

    S-PLUS 6. S-plus programmer's guide, Insightful, URL: http://www.insightful.com/support/doc_splus_win.asp. (2006).

    Tukey JW. Exploratory Data Analysis (1977) MA, Addison-Wesley: Cambridge.

    Zhang JH, et al. A simple statistic parameter for use in evaluation and validation of HTS assays. J. Biomol. Screen (1999) 4:67–73.[Abstract/Free Full Text]

    Zhang JH, et al. Confirmation of primary active substances from HTS of chemical and biological populations: a statistical approach and practical considerations. J. Comb. Chem (2000) 2:258–265.[CrossRef][Web of Science][Medline]

    Zolli-Juran M, et al. HTS identifies novel inhibitors of Escherichia coli dihydrofolate reductase that are competitive with dihydrofolate. Bioorg. Med. Chem. Lett (2003) 13:2493–2496.[CrossRef][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
J Biomol ScreenHome page
A. B. Shapiro, G. K. Walkup, and T. A. Keating
Correction for Interference by Test Samples in High-Throughput Assays
J Biomol Screen, September 1, 2009; 14(8): 1008 - 1016.
[Abstract] [PDF]


Home page
J Biomol ScreenHome page
W. P. Janzen and I. G. Popa-Burke
Review: Advances in Improving the Quality and Flexibility of Compound Management
J Biomol Screen, June 1, 2009; 14(5): 444 - 451.
[Abstract] [PDF]


Home page
BioinformaticsHome page
X. D. Zhang and J. F. Heyse
Determination of sample size in genome-scale RNAi screens
Bioinformatics, April 1, 2009; 25(7): 841 - 844.
[Abstract] [Full Text] [PDF]


Home page
J Biomol ScreenHome page
I. Coma, L. Clark, E. Diez, G. Harper, J. Herranz, G. Hofmann, M. Lennon, N. Richmond, M. Valmaseda, and R. Macarron
Process Validation and Screen Reproducibility in High-Throughput Screening
J Biomol Screen, January 1, 2009; 14(1): 66 - 76.
[Abstract] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary data
Right arrow All Versions of this Article:
23/13/1648    most recent
btm145v2
btm145v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (9)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Makarenkov, V.
Right arrow Articles by Nadon, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Makarenkov, V.
Right arrow Articles by Nadon, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?