Skip Navigation


Bioinformatics Advance Access originally published online on July 28, 2006
Bioinformatics 2006 22(19):2381-2387; doi:10.1093/bioinformatics/btl399
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/19/2381    most recent
btl399v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Oberg, A. L.
Right arrow Articles by Therneau, T. M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Oberg, A. L.
Right arrow Articles by Therneau, T. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Joint estimation of calibration and expression for high-density oligonucleotide arrays

Ann L. Oberg *, Douglas W. Mahoney , Karla V. Ballman and Terry M. Therneau

Division of Biostatistics, Department of Health Sciences Research, Mayo Clinic College of Medicine 200 First Street SW, Rochester, MN 55905, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 DATA
 3 CALIBRATION
 4 IMPLEMENTATION
 5 RESULTS
 6 DISCUSSION
 REFERENCES
 

Motivation: The need for normalization in microarray experiments has been well documented in the literature. Currently, many analysis methods treat normalization and analysis as a series of steps, with summarized data carried forward to the next step.

Results: We present a unified algorithm which incorporates normalization and class comparison in one analysis using probe level perfect match and mismatch data. The algorithm is based on calibration models common to most biological assays, and the resulting chip-specific parameters have a natural interpretation. We show that the algorithm fits into the statistical generalized linear models framework, describe a practical fitting strategy and present results of the algorithm applied to an example dataset as well as based on metrics used in affycomp. The algorithm ranks amongst the top third of the affycomp competitors, performing best in measures of bias.

Availability: R functions are available on request from the authors.

Contact: oberg.ann{at}mayo.edu


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 DATA
 3 CALIBRATION
 4 IMPLEMENTATION
 5 RESULTS
 6 DISCUSSION
 REFERENCES
 
Microarray technology provides researchers with a powerful tool to measure expression levels of thousands of genes in a specimen sample simultaneously. Owing to the costly nature of these studies and sometimes sample scarcity, an experiment typically yields results for only a few samples. In addition, it is usually feasible to pursue further research regarding only a small subset of the genes on the array. Hence, it is important to make efficient use of the data in order to distinguish biological variation from random error.

Currently, many methods of class comparison involve a separate normalization step and comparison step, although unifying normalization together with class comparison into one analysis has been advocated by Kerr et al. (2000), Kepler et al. (2002), Emptage et al. (2003) and others. The purpose of normalization is to remove systematic sources of variability while preserving the biologic variation of interest. The simplest of the normalization methods [applied by Affymetrix MAS software (Affymetrix, 2001), and others http://www.affymetrix.com/support/technical/manuals.affx, Affymetrix, Inc] involves setting the overall mean of each chip to a common value. (The microarray laboratory in our institution for instance, uses the value of 1500.) This assumes a linear relationship between the log true expression level and the log fluorescent intensity actually observed over the entire range of gene expression values. There is compelling evidence that this relationship is in fact non-linear (Bolstad et al., 2003; Dudoit et al., 2002; Kepler et al., 2002; Naef et al., 2003), and several more sophisticated normalization methods have been proposed. The normalized probe-level data are then typically summarized into a per-gene estimate of expression. Class comparisons and other analyses are then performed on the data at this level.

We show here that the massive number of probes on a high-density oligonucleotide array chip actually allows the construction of an indirect calibration curve using the available data in an unsummarized form, and that these curves have several desirable features. We present an algorithm that unifies the normalization and class comparison steps into one analysis, show how it fits into the statistical class of generalized linear models of McCullagh and Nelder (1983) and describe a practical fitting strategy. Section 2 describes the datasets used for illustrative purposes. Section 3 motivates and introduces the concept of calibration and describes its application here. Section 4 describes the algorithm and its implementation and Section 5 describes the performance of the method when applied to an example dataset as well as based on the benchmark of Affymetrix GeneChip expression measures (Affycomp) of Cope et al. (2004) and Irizarry et al. (2005). Conclusions and discussion are presented in Section 6.


    2 DATA
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 DATA
 3 CALIBRATION
 4 IMPLEMENTATION
 5 RESULTS
 6 DISCUSSION
 REFERENCES
 
The properties of the proposed algorithm were evaluated on data from publicly available spike-in experiments as well as on an experimental data set published by Abraham et al. (2005). These datasets are briefly described in this section. The spike-in datasets are available from the Affymetrix website http://www.affymetrix.com or on the affycomp website http://affycomp.jhsph.edu.

2.1 Affymetrix U95A spike-in data
The Affymetrix U95A spike-in dataset has 16 genes which were spiked in varying concentrations into a pancreas background in concentrations ranging from 0 to 1024 pM and hybridized in a cyclic Latin Square design onto Hu95A chips. There were at least three chips per concentration combination. In this dataset, there are only 16 genes expected to show fold changes in concentration expression. All other 12 610 genes should be identically expressed on all arrays. A more complete description can be found in Cope et al. (2004) and Irizarry et al. (2005) or via a search for ‘Latin square data’ on the Affymetrix website.

2.2 Affymetrix U133A spike-in data
The Affymetrix U133A spike-in dataset has 42 genes spiked into a human HeLa cell line background in a cyclic Latin square fashion and hybridized onto Hu133A chips. The concentrations range from 0 to 512 pM and the sample was hybridized to 42 chips. In this dataset, there are only 42 genes expected to show fold changes in concentration expression. The other 22 258 genes should be identically expressed on all arrays. More details can be found via a search for ‘Latin square data’ on the Affymetrix website or in Cope et al. (2004) and Irizarry et al. (2005).

2.3 Immunoglobulin light chain amyloidosis (AL) data
Abraham et al. (2005) performed microarray experiments using the Affymetrix Hu133A platform in order to identify a set of genes which could distinguish immunoglobulin light chain amyloidosis (AL) from multiple myeloma (MMy). Class comparison analyses were performed on data from 24 AL and 28 MMy patients.


    3 CALIBRATION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 DATA
 3 CALIBRATION
 4 IMPLEMENTATION
 5 RESULTS
 6 DISCUSSION
 REFERENCES
 
In simpler, low volume assays, it is common to normalize the data using a direct calibration curve f

Formula
where y is the observed data from the assay, x is the true concentration, and {varepsilon} is random error. In a 96 well ELISA assay, for instance, f is directly estimated using reference samples of known concentration in one row of wells. The fitted curve is used to recover estimates of the true values x from the data y.

Microarray data does not typically contain the information needed for a direct calibration. In particular, it is not clear what could be used as reference targets that would fairly represent the wide variety of probes on the array. (See the Gene Logic technical note entitled ‘Optimization of an external standard for the normalization of Affymetrix GeneChip arrays’ available from the Gene Logic website http://www.genelogic.com for one approach to this issue, however.) Here, we focus on an indirect calibration function applied to an array as a whole accounting for the largest effects, with future work dedicated to accounting for probe-specific background effects.

Hekstra et al. (2003) and Burden et al. (2004) have shown that the Langmuir isotherm

Formula
is an appropriate equation for the binding curves on theoretical grounds, and that it successfully fits several datasets. As elsewhere in this paper, y and x are on the log2 scale. The parameters a and b are scaling factors per chip. The per-probe constant K depends on the binding efficiency of each probe.

It is a general principle that for any biological assay that spans a wide range, an S-shaped curve between the true (unmeasured) value and the observed assayed value is almost a guarantee. A lower threshold on the observed data can be owing to background binding, lower limits of detection for the instrumentation, or other causes [the mismatch (MM) probes are actually designed to estimate this], while an upper limit may be owing to either biochemical or instrumentation saturation. Given the wide range of gene expression values, from ~10 to 46 000 seen on the Affymetrix platform, this S-shape relationship is likely to be true.

Furthermore, the review article of Finney (1976) describes analytical approaches to the binding problem in radioligand assay. He states that for most problems, a logistic or probit function fit to the log of the true value (on the horizontal axis) versus the log of the observed value is sufficient. A further advantage of this is that the data in question is approximately equivariant on the log scale (Ballman and Therneau, 2005), making both plots and analyses more straightforward.

Figure 1 contains two Langmuir isotherms over the concentration and fluorescence range found in the U95 spike-in data with matching logistic curves overlayed. The Langmuir and logistic curves are nearly identical in each case, and we have chosen to retain the logistic fit for this report owing to its statistical familiarity. For the Langmuir form, a and b control the upper and lower thresholds, and log2(K) translates the curve left-right. For the logistic form Formula, {eta} = {gamma}3(xK), {gamma}1 and {gamma}2 again control the upper and lower limits, the location parameter K corresponds to a per probe binding constant, and {gamma}3 = 1/2 causes the curve to have slope 1 at the midpoint; this is what is shown in the figure.


Figure 1
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 Two Langmuir isotherm curves (dashed lines) over the concentration and fluorescence range found in the U95 spike-in data. Matching logistic curves (solid lines) are overlayed in order to show the similarities between the two curves.

 
The importance of a per-probe binding constant is shown in Figures 2 and 3, both of which show the data for chip 1 of the U95 experiment. Figure 2 shows the raw fluorescence data versus the spiked in values, equivalent to assuming K = 0 for all probes, along with a lowess smooth; Wu and Irizarry (2004) pursue this in more detail. In contrast, Figure 3 shows the result with probe-specific values for K; based on a fit to all 59 chips and 256 PM/MM probe pairs of the U95 spike-in with x = the true spike-in amount. The fit is much tighter, and in particular the logistic (Langmuir) form of the true calibration curve is much more clear.


Figure 2
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2 True concentration in pM versus observed intensity for chip 1 of the Affymetrix U95 spike-in experiment, along with a fitted lowess curve. Genes spiked in at concentration 0 are plotted at 1/2 of the next largest concentration, but labeled as 0 concentration.

 


Figure 3
View larger version (9K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3 Log true PM concentration (i.e. gik) plus probe effect (i.e. pij) in pM versus log observed intensity for chip 1 of the Affymetrix U95 spike-in experiment, along with a fitted logistic curve allowing for a probe effect in the model. Genes spiked in at concentration 0 are plotted at 1/2 of the next largest concentration, but labeled as 0 concentration.

 

    4 IMPLEMENTATION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 DATA
 3 CALIBRATION
 4 IMPLEMENTATION
 5 RESULTS
 6 DISCUSSION
 REFERENCES
 
4.1 An algorithm
A goal of many microarray experiments is to evaluate differences in gene expression between experimental groups. The normalization function is necessary, but is not of interest in and of itself. Our basic calibration model is

Formula
where yijk is the observed intensity value on the log2 scale for the j-th probe associated with the i-th gene on the k-th chip, fk is the chip specific calibration function for the k-th chip, and Formula where gik is the expression of the i-th gene on the k-th chip and pij is the binding efficiency of the j-th probe associated with the i-th gene. The primary parameters of interest are the gene effects gik, secondarily the calibration functions fk themselves, and the probe effects pij are ancillary.

Note that if the chip functions fk were known, then this is the estimation of a generalized linear model (McCullagh and Nelder, 1983) with f–1 as the link function and {eta}ijk as the linear predictor. This opens up a whole set of well-developed tools for estimation and hypothesis testing.

It is not feasible to fit all the parameters at once, however, and we use a simple iterative algorithm where each step is described in more detail in subsequent sections:

  1. Create initial estimates Formula of fk.
  2. Solve for pij and gik, given Formula and yijk, creating Formula and Formula. Notice that this step can be done separately for each gene i.
  3. Solve for fk, given Formula, Formula, and yijk, creating an updated Formula. Notice that this step can be done separately for each chip k.
  4. Iterate steps 1 and 2 a fixed number of times, or until convergence.

4.2 Calibration function, fk
We parameterize the logistic calibration function for the kth chip as

Formula

The parameters are the lower threshold {gamma}1k, the upper threshold {gamma}1k + {gamma}2k, the inflection point {gamma}4k and the slope {gamma}3k. The chip inflection parameter {gamma}4k is completely confounded with the per-chip gene expression gik. This confounding corresponds to an obvious physical interdigitation: if each signal on chip 2 were 20% larger than the corresponding signal on chip 1, it is not possible to say, without outside information or assumptions, whether this is a chip effect (such as a different sample handling method or different scanner) or that all gene products on the second chip are actually more highly expressed (such as in a dilution experiment). The first corresponds to {gamma}41 > {gamma}42 and gi1 = gi2 for all i, the second to {gamma}41 = {gamma}42 and gi1 > gi2 for all i.

For computation, we will assume without loss of generality that {gamma}4k = 0 for all k, i.e. set the chip effect to 0. After the fit, this can be adjusted based on chosen constraints, for instance rescaling so that all of the gene effects gik have mean 0 within a chip will credit all systematic variations to chip effects rather than to the experimental unit that was hybridized to that chip. This is probably what will be done most often in practice.

The other non-identifiability is between gene effects and probes within gene: Formula for any constant c. We have chosen to make the perfect match (PM) probe effects within each gene sum to zero. Any comparison of the absolute values of gik between genes would require the assumption that the average binding efficiency of their two probe sets is identical.

The MM data are used in addition to the PM data in this algorithm. A wide range of pij s facilitates detection of the curvature by the fitting process, and the MM data help to ensure a wide range. Hence, the information added by the MM data is key to the success of this algorithm. In fact, the algorithm surprisingly has substantially more difficulty recovering the true calibration curve without the MM data when evaluated in simulations (data not shown).

4.3 Initial estimates
As an initial estimate of the lower threshold of the calibration function, we use a percentile of the MM data. If the MM probes measure only background binding, as was purposed in their design, then we would expect an average MM probe to be close to the true lower threshold. In actuality, some of them will have activity (Ballman and Therneau, 2005 http://mayoresearch.mayo.edu/mayo/research/biostat/techreports.cfm), so a percentile between the 20th and 30th is used. For the upper threshold, we use the 99th percentile of the PM values since the 100th percentile represents saturated housekeeping genes.

An initial value for the slope {gamma}3k can be set to 1 since this only controls the scale of the primary parameters gik and pij. As stated above, the offset parameter {gamma}4k is set to 0.

4.4 Estimation of gene and probe effects given fk
Since this estimation will be done on a very large number of genes, the calculations should be as fast as possible. Calls to a general non-linear estimation would be slow. The iterative weighted least squares (IWLS) approach pioneered in generalized linear models (McCullagh and Nelder, 1983) allows fast and easy estimation of the logistic parameters. For a given gene i, we have

Formula 1(1)
Suppressing the subscript i, we see that the design matrix X is the same as that for a classic two-way balanced analysis of variance. The index j ranges from 1 to twice the number of probe pairs for an Affymetrix array (both PM and MM data are used), and k from 1 to the number of chips. Given starting estimates for the lower and upper threshold parameters {gamma}1k and {gamma}1k + {gamma}2k, the IWLS update is the solution to a weighted regression with a working dependent variable z and case weights wijk, where

Formula 1
where the derivative in dijk is evaluated at Formula 1 and Formula 1 is the variance of the yijk vector evaluated at the current value of the linear predictor Formula 1.

Since we are using the log2(intensity) data without background subtraction, the variance is approximately constant, i.e. Formula 1 (this is evident in Fig. 3 in that the width of the scatter around the predicted line does not vary greatly over the range of the data). Ballman et al. (2004) examined plots for all 1508 probes for the spiked in genes of the U95 and U133 spike in experiments and reached the same conclusion. This provides a key simplification in the estimation procedure. Background subtraction makes no essential difference in the formulation of Equation (1), as it corresponds more or less to subtracting a constant from {gamma}1; but it would add a major complication to the variance function Formula 1 in that it is no longer constant.

With V a constant and using the logistic formulation for fk, the update formula simplifies to a regression of Formula 1 on DX, where D is a diagonal matrix with elements dijk. The weights dijk serve to down-weight points in the far left or right tails of the function. Notice that if we were to assume d = 1, then this becomes a bias correction algorithm similar to fastlo (Ballman et al., 2004), but with a more complex linear model (both gene and probe effects rather than the simple probe mean) and a logistic mean function rather than a non-parametric smooth.


    5 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 DATA
 3 CALIBRATION
 4 IMPLEMENTATION
 5 RESULTS
 6 DISCUSSION
 REFERENCES
 
The calibration algorithm was applied to the datasets described in Section 2. The resulting gene estimates (i.e. the gik), centered and scaled to attribute the systematic effects to the chip, were submitted to affycomp (Cope et al., 2004; Irizarry et al., 2005). The full report can be found on the affycomp website http://affycomp.jhsph.edu/ under the method name chipcal4 on the new assessment link.

Figure 4 shows the estimated calibration curve for chip 1 of the U95 data from the proposed method, along with the best possible estimate created from a fit where the true concentrations are known (i.e. the model fit to the spike in genes only). The two curves are surprisingly similar.


Figure 4
View larger version (10K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4 Estimated true calibration curve for chip 1 of the U95 experiment (solid line), from a model where the true concentration is known (i.e. fit to the spike in genes only) and the probe specific binding constants (left/right offsets) are estimated. The dashed line is the estimated binding curve from the proposed algorithm, where true concentrations are not known.

 
At the time of this writing, competition results were available on the new assessment link of the affycomp website for 48 methods in the U95 table and 42 unique methods for the U133 table. In the 14-dimensional score displayed for all submitted methods our calibration method has average rank (based on absolute distance from the ideal value) of 21 for the U95 and 19 for the U133 result tables, receiving the highest ranks in the items measuring bias and the lowest ranks in the items measuring variance. When the methods are ranked based on their average score ranking, this algorithm places 13th and 14th for the U95 and U133 data, respectively.

A plot of observed expression intensity versus true expression intensity is displayed in Figure 4a of the reports. The slopes from a regression line fitted to these data (observed versus nominal expression values) are 0.94 and 0.95 for the U95 and U133 data, respectively. A value of 1 would indicate no bias. While these slopes show that there is bias towards zero, this algorithm ranks fourth and sixth on this measure in the U95 and U133 datasets, respectively, indicating that this method has less bias than most. It is our thought that the excellent performance in this area may be owing to the fact that we are, at least approximately, capturing the correct functional form of the calibration curve directly in the model.

The Achilles heel of the current algorithm lies in variance. Figure 2b for both the U95 and U133 reports indicate that variability for the non-spiked in genes increases at both the lowest and highest expression values, with the median standard deviation ranking in the top third for both datasets. The R2 values from the regression lines fit to the observed versus nominal expression values of 0.70 and 0.82 for the U95 and U133 fits rank near the bottom of the list. When signal detection is based on the criteria of a fold change >2, the algorithm performs reasonably well for medium and high fold changes with ROC AUC values of 0.80 and 0.77 for the U95 data and 0.76 and 0.96 for the U133 data. However, for low expression genes, the AUC is <0.5 at 0.36 and 0.44 in the U95 and U133 experiments, respectively, indicating a large number of false positives. Further exploration of the false positives, however, revealed that a large fraction of them are owing to a single aberrant probe which had an overly large influence on the two-way ANOVA of Section 4.4 and that they are located on the horizontally flat portions of the logistic curve.

Going beyond the Affycomp results, we explored the behavior of contrasts for both the spike in and null genes, using the six chips in experiments 1 and 2 of the U133 experiment. For each gene i, the fit gives six gene effect estimates Formula 1 for which k = 1, 2, 3 represent the first spike-in pattern and k = 4, 5, 6 the second. The variance matrix and residual standard error of the fit can be used to test whether c'g = 0 for these six chips where c = (1, 1, 1, –1, –1, –1, 0, ... , 0) and Formula 1, a contrast which incorporates both gene and probe variability. We compare these to results from RMA which are based on only the six summary expression values. Similarly, we explored the behavior of these contrasts in the U95 data utilizing the two spike in combinations having 12 replicates each.

The two chosen spike in combinations in both datasets are such that all but two of the spike in genes have a nominal fold change of 2, with the other two spike in genes having fold changes of 0 versus 0.125 or 512 versus 0 for the U133 data and 0 versus 1024 or 0.25 versus 0 for the U95 data. The remaining genes on the chip have a nominal fold change of zero. Contrast estimates and confidence intervals for the genes having fold change of two from both datasets are displayed in Figure 5 for both our algorithm and RMA. Ideally the contrast estimates would be centered at one (the plots are on the log2 scale) and the confidence intervals would not include zero. Both algorithms have greater difficulty in detecting true changes in the lower abundance range than in the high abundance range. Performance in the mid abundance range is better in the U95 than in the U133 data. Of the 42 fold changes in the U133 data 25 are significant at the 5% level with this algorithm while RMA declares 30 significant (24 of them are in common). There are 11 genes which neither algorithm declares significantly differentially expressed. Of the 16 fold changes in the U95 data 15 are significant at the 5% level with this algorithm while RMA declares 16 significant. This algorithm declares 6.3 and 19.4% of the null genes significantly differentially expressed at the 5% level in the U133 and U95 data, respectively, while RMA declares 5.7 and 4.3% significant. Scatter plots of the distributions of the null gene test statistics for this algorithm versus the RMA algorithm are displayed in Figure 6. These results indicate that both algorithms do a reasonable job of controlling type I error in the U133 data. The current algorithm is not controlling this well in the U95 data. Inspection revealed that, as for signal detection via fold change, a single probe outlier was typically the cause of the false positives, indicating that robust estimation methods will probably improve the control of type I error. As would be expected, the statistical criterion for significance performs much better than the fold change criterion which does not account for variability present in the data.


Figure 5
View larger version (13K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5 Contrast estimates with associated confidence intervals for the comparison of spike in genes between experiments 1 and 2 of the U133 experiment (top panel) and between the two spike in combinations having 12 replicates each in the U95 data (bottom panel). All have expected fold change of 2. Results are shown for this algorithm (solid lines) and RMA (dashed lines), and are sorted in order of total nominal abundance with lowest total nominal abundance on the left.

 


Figure 6
View larger version (17K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6 Distribution of null gene test statistics for the calibration algorithm (x-axis) versus RMA (y-axis) for the U133 data (top panel) and the U95 data (bottom panel). Dashed lines indicate significance at the 5% level for a z-statistic for calibration and for a t distribution with four degrees of freedom for the U133 data or 22 degrees of freedom for the U95 data for RMA. Numbers indicate the false positive rate (%) within each region of the plot.

 
The algorithm was applied to the AL data described in Section 2.3 and the hypothesis of differential expression between AL and MMy samples was tested. Before and after calibration, box plots of the overall chip distributions in Figure 7 demonstrate that the calibration algorithm removed systematic differences between the chips. The false discovery rate (FDR) versus type I error rate is displayed in Figure 8 for both our algorithm and RMA. Our algorithm clearly controls the FDR better than RMA does across the entire range of alpha values. At an alpha level of 0.01 (0.05), the FDR for our algorithm is 1.4% (4.3%) while for RMA it is 5.2% (12%).


Figure 7
View larger version (30K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 7 Pre-(top panel) and post-(bottom panel) normalization box plots of probe expression values from chips in the AL experiment described in Section 2.3. Values from the AL patients are in the first 24 boxes, values from the MMy chips are the remaining 28 boxes.

 


Figure 8
View larger version (11K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 8 False discovery rate (FDR) versus significance level (type I error rate) for this algorithm (solid line) and RMA (dashed line) when applied to the AL data described in Section 2.3.

 

    6 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 DATA
 3 CALIBRATION
 4 IMPLEMENTATION
 5 RESULTS
 6 DISCUSSION
 REFERENCES
 
We have presented an algorithm for normalization and analysis of high-density oligonucleotide microarray data. The current algorithm, cast into the generalized linear model framework, brings a statistical framework to bear on the normalization problem and paves the way for a large set of well-established methods to be easily applied in this situation. In addition, the generalized linear model framework allows one to conduct hypothesis tests based on variability at the probe level, accounting for all sources of variation present in an experiment. While our implementation assumes the variance is not a function of the mean, the algorithm itself does not rely on this assumption. The implementation can be modified accordingly in order to allow application to background subtracted data.

The algorithm incorporates biological principles as well as efficient classical statistical techniques. It uses probe level PM and MM data as the input and results in a chip-specific normalization or indirect calibration function. The logistic function used to model this is based on calibration models common to most biological assays and the parameters have a natural interpretation. Parameter estimates from these calibration functions could eventually be used for quality control purposes once their behavior is better understood. The algorithm is simple to implement using standard software. R functions that implement this algorithm are available from the authors.

The algorithm estimates the true calibration functions well. Results from the affycomp competition show that on average, this algorithm is amongst the top third of affycomp competitors based on average rank in the 14-dimensional score and performs well with respect to bias. There is room for improvement with respect to variance and signal detection. Including the MM probes in the modeling process is probably adding variability to the estimates, in a sense acting like background subtraction. However, it is well-known that MM probes measure signal, and without the MMs the algorithm has mediocre performance. In addition, knowledge of probe-specific non-specific binding (such as GC content) has not yet been incorporated into this algorithm. Reports have shown that incorporating this knowledge into normalization routines improves the variance at the lower expression levels (Naef and Magnasco, 2003; Wu et al., 2004). Work is in progress to incorporate probe-specific background binding information into the algorithm.

Regarding signal detection, the affycomp results indicate that this algorithm produces a large number of false positives based on the criteria of fold change greater than two, and the statistical contrast results demonstrate that type I error is not well controlled in the U95 data. Inspection of the null (non-spiked in) genes (i.e. those with expected fold change of zero) which were given large fold changes by the algorithm reveals that these particular genes are those that fall on the horizontally flat portions of the logistic curve and the large fold changes are owing to one or two outliers. This suggests that using robust methods would improve the algorithm with respect to this metric. Hence, we are currently working to incorporate robust methods into this algorithm.


    Acknowledgments
 
The authors are grateful to the reviewers for their insightful comments which resulted in an improved manuscript, to Mr E. M. Lunde for his programming support and to Dr R. S. Abraham for use of the AL data. Partial funding for A.L.O. was provided by the Fraternal Order of Eagles Cancer Research Fund.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alvis Brazma

Received on December 23, 2006; revised on June 9, 2006; accepted on July 18, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 DATA
 3 CALIBRATION
 4 IMPLEMENTATION
 5 RESULTS
 6 DISCUSSION
 REFERENCES
 

    Abraham, R., et al. (2005) Functional gene expression analysis of clonal plasma cells identifies a unique molecular profile for light chain amyloidosis. Blood, 105, 794–803[Abstract/Free Full Text].

    Affymetrix. (2001) Microarray suite user guide, version 5. Technical Report.

    Ballman, K.V. and Therneau, T.M. (2005) An exploration of affymetrix probe-set intensities in spike-in experiments. , Rochester, MN Technical Report No. 74 Department of Health Sciences Research, Mayo Clinic.

    Ballman, K.V., et al. (2004) Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics, 20, 2778–2786[Abstract/Free Full Text].

    Bolstad, B.M., et al. (2003) A comparison of normalization methods for high-density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185–193[Abstract/Free Full Text].

    Burden, C.J., et al. (2004) Statistical analysis of adsorption models for oligonucleotide microarrays. Stat Appl. Genetic Mol. Biol, . 3, Article 35.

    Cope, L.M., et al. (2004) A benchmark for affymetrix genechip expression measures. Bioinformatics, 20, 323–331[Abstract/Free Full Text].

    Dudoit, S., et al. (2002) Statistical methods for identifying genes with differential expression in replicated cdna microarray experiments. Statistica Sinica, 12, 111–139[ISI].

    Emptage, M.R., et al. (2003) Treatment of microarray experiments as split-plot designs. Biopharm. Stat, . 13, 159–178[CrossRef].

    Finney, D.J. (1976) Radioligand assay. Biometrics, 32, 721–740[CrossRef][ISI][Medline].

    Hekstra, D., et al. (2003) Absolute mRNA concentrations from sequence-specific calibration of oligonucleotide arrays. Nucleic Acids Res, . 31, 1962–1968[Abstract/Free Full Text].

    Irizarry, R.A., Wu, Z., Jaffee, H.A. (2005) Comparison of affymetrix genechip expression measures. Technical Report Working Paper 86, Johns Hopkins University, Department of Biostatistics Working Papers.

    Kepler, T.B., et al. (2002) Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biol, . 3, research0037.1–research0037.12.

    Kerr, M.K., et al. (2000) Analysis of variance for gene expression microarray data. J. Comput. Biol, . 7, 819–837[CrossRef][ISI][Medline].

    McCullagh, P. and Nelder, J.A. Generalized Linear Models, (1983) Chapman and Hall.

    Naef, F. and Magnasco, M.O. (2003) Solving the riddle of the bright mismatches: Labeling and effective binding in oligonucleotide arrays. Phys. Rev. E Stat. Nonlin. Soft Matter Phys, . 68, 011906–011904[Medline].

    Naef, F., et al. (2003) A study of accuracy and precision in oligonucleotide arrays: extracting more signal at large concentrations. Bioinformatics, 19, 178–184[Abstract/Free Full Text].

    Wu, Z. and Irizarry, R. (2004) Stochastic models inspired by hybridization theory for short oligonucleotide arrays [extended abstract]. San Diego, CA. RECOMB, Copyright 2004 ACM 1-58113-755-9/04/0003.

    Wu, Z., et al. (2004) A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc, . 99, 909–917[CrossRef][ISI].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/19/2381    most recent
btl399v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Oberg, A. L.
Right arrow Articles by Therneau, T. M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Oberg, A. L.
Right arrow Articles by Therneau, T. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?