Skip Navigation


Bioinformatics Advance Access originally published online on July 27, 2007
Bioinformatics 2007 23(18):2391-2398; doi:10.1093/bioinformatics/btm361
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/18/2391    most recent
btm361v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (2)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Fan, J.
Right arrow Articles by Niu, Y.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Fan, J.
Right arrow Articles by Niu, Y.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Selection and validation of normalization methods for c-DNA microarrays using within-array replications

Jianqing Fan * and Yue Niu

Department of Operations Research and Financial Engineering Princeton University, Princeton, NJ 08544, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Normalization of microarray data is essential for multiple-array analyses. Several normalization protocols have been proposed based on different biological or statistical assumptions. A fundamental problem arises whether they have effectively normalized arrays. In addition, for a given array, the question arises how to choose a method to most effectively normalize the microarray data.

Results: We propose several techniques to compare the effectiveness of different normalization methods. We approach the problem by constructing statistics to test whether there are any systematic biases in the expression profiles among duplicated spots within an array. The test statistics involve estimating the genewise variances. This is accomplished by using several novel methods, including empirical Bayes methods for moderating the genewise variances and the smoothing methods for aggregating variance information. P-values are estimated based on a normal or {chi} approximation. With estimated P-values, we can choose a most appropriate method to normalize a specific array and assess the extent to which the systematic biases due to the variations of experimental conditions have been removed. The effectiveness and validity of the proposed methods are convincingly illustrated by a carefully designed simulation study. The method is further illustrated by an application to human placenta cDNAs comprising a large number of clones with replications, a customized microarray experiment carrying just a few hundred genes on the study of the molecular roles of Interferons on tumor, and the Agilent microarrays carrying tens of thousands of total RNA samples in the MAQC project on the study of reproducibility, sensitivity and specificity of the data.

Availability: Code to implement the method in the statistical package R is available from the authors.

Contact: jqfan{at}princeton.edu


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Microarray techniques have been widely used in many areas of biological research. They have substantial impact on tumor diagnostics, and classification and understanding of the molecular mechanisms of biochemical processes, tumorigenesis and tumor developments. Proper statistical analysis is vital for revealing meaningful biological results. For an overview of statistical analysis of DNA microarrays, we refer to Fan and Ren (2006) and references therein.

The quality of microarray data is very important for downstream statistical analysis (Eisenstein, 2006; Marshall, 2004; Patterson et al., 2006; Shi et al., 2006). Experimental variations, such as RNA quality, probe labeling, hybridization condition, washing, signal and background detection in the scanning process, slide and block effects, pose significant challenges in the analysis of microarray data. The first step in microarray analysis is to remove the systematic biases due to the variations in experimental conditions so as to make multiple array analyses meaningful. These efforts are collectively referred to as the normalization of microarray data in the literature.

A number of useful normalization protocols have been proposed based on different assumptions. These include the global normalization (Kroll and Wölfl, 2002), rank invariant normalization (Tseng et al., 2001), LOWESS normalization (Dudoit et al., 2002), Semi-Linear In-slide Model (SLIM, Fan et al., 2004, 2005), Two-way Semi-Linear Models (TW-SLM, Huang et al., 2005), robust TW-SLM (Ma et al., 2006; Wang et al., 2005) and normalization of small diagnostic microarrays (Jaeger and Spang, 2006). Recently, two seminal normalization methods (CADS and eCADS) based on dye swap and statistical models are proposed by Dabney and Storey (2007a, b). All of the aforementioned methods are based on some statistical and biological assumptions. For example, the global normalization is adequate only when there is no print-tip block effect and no intensity effect; the LOWESS method assumes implicitly that at each given intensive level the average expression level of up- and down-regulated genes are about the same in each print-tip block; SLIM, TW-SLM, robust TW-SLIM all require that the statistical models are correct. The questions then arise naturally for a given array, which methods are most effective to normalize the data and whether the data have been properly normalized. These questions are very fundamental to the statistical analysis of microarray and have not yet been addressed.

Our study is motivated by the aforementioned fundamental concerns. Our approach is to use the duplicated spots within an array. They contain the most valuable information about possible systematic biases in the microarray experiments. The idea is that if microarray data have been properly normalized, there should be no systematic biases among duplicated spots. Therefore, when the sum of the square differences of duplicated spots, standardized by the estimated genewise variances, are aggregated among many different genes having duplications, the test statistic follows approximately a {chi}2-distribution with a large degree of freedom, or more formally a normal distribution. This provides a simple and useful diagnostic test statistic to check if an array has been properly normalized by a particular method. Regarding the test statistic as a measure of the discrepancy of replicated spots after normalization, we select a normalization method that has the smallest value of the test statistics. In addition, the associated P-value of the test statistic enables us to judge the degree to which the normalization has been properly carried out.

In implementing the validation tests, it involves inevitably the estimation of genewise SDs and variances. A precise estimate of SDs and variances will improve the statistical power of the validation tests. It has also important applications in selecting significant genes (Cui et al., 2005; Dudoit et al., 2003; Reiner et al., 2003; Storey and Tibshirani, 2003; Symth et al., 2005; Tusher et al., 2001;). The precision of estimating genewise SDs and variances depends on the number of replications that are available. Several innovative strategies will be introduced to enhance the precision of the estimate.

Duplicated spots play very important roles in the analysis of microarray data. They are not only powerful for normalization (Fan et al., 2004), but also useful for genewise variance estimation (Smyth et al., 2005) in selecting statistically differently expressed genes. Furthermore, they are fundamental to our proposed validation tests. The availability of such duplicated genes can be accomplished by the designs of c-DNA microarrays. For example, in the microarrays analyzed by Fan et al. (2004), 111 out of 19 968 clones of genes were printed twice randomly on the 32 print-tip blocks. This enables them to untangle the block effects and intensity effects from these 111 duplicated spots. With the increased popularity of customized microarrays, which enables researchers to focus only on hundreds of genes of their primary interest with more reliable measurements, within-array replications can easily be obtained. The gene selection biases in customized arrays require more sophisticated normalization techniques. The validation tests are essential for controlling the quality of downstream statistical data analysis of customized arrays.


    2 METHODS OF NORMALIZATION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
There are several useful normalization methods, which are based on different biological or statistical assumptions. We briefly review several of them that will be used in our numerical studies.

Suppose that we have J replications of a c-DNA microarray experiment. For each given array, there are N different genes. Among them, G genes are replicated I times. Let Ig denote the number of replications for gene g, with Ig = 1 indicating no duplication. For customized arrays, usually G = N (all genes have within-array replications) and Ig = 2 or 3 or 4. However, this can also be designed differently. For the microarrays analyzed in Fan et al. (2004), G = 111 and N = 19 968 – G so that Ig = 2 only for 111 clones that are duplicated and randomly placed on 32 print-tip blocks.

Let Rgij and Ggij be respectively the intensities of red (Cy3) and green (Cy5) channels for the ith duplication of the gth gene in the jth array, and bgi be the print-tip block where the ith duplication of the gth gene resides. Then, we can compute the log-ratios and log-intensities as


Formula

The global normalization is to compute the median Formula of log-ratios {Ygij} of the jth array and to normalize the data for the jth array as


Formula 1

(1)

This basically assumes that the up-regulated and down-regulated genes are about the same, which does not usually hold for customized arrays due to gene selection biases. In addition, the global normalization assumes no block or intensity effects.

To address the above two concerns, Dudoit et al. (2002) apply the global normalization technique more locally to each block and each intensity level, resulting in the LOWESS normalization. For the data {(Xgij,Ygij):bgi = b} in a given print-tip block b of the jth array, they applied the LOWESS smoother to estimate the conditional mean function Formula of the log-ratios Y given the intensity level X = x, and computed the normalized log-ratios in the bth block in jth array as follows:


Formula 2

(2)
This significantly relaxes the restrictions of the global normalization, but still assumes that up-regulated and down-regulated genes are about the same at each given intensity level. Again, this might not be appropriate when the cells or tissues are treated by cytokines. It might not be valid for customized arrays due to gene selection biases.

To address these issues, Fan et al. (2004) introduced the SLIM technique based on the model assumption:


Formula 3

(3)
in which {alpha}g is the treatment effect on gene g, βj and mj represent respectively the array-specific block and intensity effect and {varepsilon} is the stochastic noise. For each given array j, Fan et al. (2004) estimated the model parameters using the data from duplicated spots and computed the normalized data


Formula 4

(4)
In Fan et al. (2005), the efficiency of parameters in (4) is improved by aggregating the data from all arrays. Namely, the parameters in (3) are jointly estimated using all arrays.

TW-SLM (Huang et al., 2005) and robust TW-SLM (Ma et al., 2006; Wang et al., 2005) are in the same spirit as SLIM. It does not require duplications of spots but relies heavily on the assumption that the treatment effect {alpha}g is independent of the array. Statistically, it is a specific model of (3) with β = 0. With estimated parameters, the normalized data are computed as in (4) with Formula . TW-SLM applies the technique for each given block so that the block effects in each array have been properly taken care of.

The effectiveness of SLIM, TW-SLM and robust TW-SLM depends critically on the statistical model assumption. With so many normalization methods, the questions arise naturally which method is the most appropriate for a specific array and whether the expression profiles have been properly normalized.

The CADS and eCADS (Dabney and Storey, 2007a, b) are the normalization methods based on dye-swap and statistical models. They aim at preserving time differential expression relationships among the treatment and control arrays after normalization.


    3 VALIDATION TESTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Within array replications not only provide useful information about the possible systematic biases: block effect and intensity effect, but also play an important role in validation tests of the necessity and effectiveness of the normalization methods. When the log-ratios have been properly normalized, the systematic biases should be negligible and hence the following model holds approximately:


Formula 5

(5)
where Ig is the number of duplication for gene g. See also (4).

Duplicated spots provide valuable information about the validity of model (5) for each given array. We will use only G genes with I replications for each given array to develop the validation tests, and drop the subscript j to facilitate the notation. This leads to the simplified notation:


Formula 6

(6)

Hence, the difference Formula should have mean zero, where Formula . We will assume that the first four moments of the noise behave like those of a normal distribution:


Formula

3.1 Genewise standardization
Two natural test statistics for testing model (6) are based on the genewise standardized L2- and L1-norm, aggregated over G genes having duplications. Specifically, we let


Formula 7

(7)

Under the normality assumption Formula , if the data have been properly normalized, the test statistic T1 follows the {chi}2-distribution with degree of freedom (I 1)G. In particular, when I = 2,


Formula

Note that the test statistic above is reasonably robust to the normality assumption. In the validation test, the number of genes with duplications G is reasonably large and by the Central Limit Theorem, T1 follows approximately a normal distribution. In this case, Formula is also approximately a normal distribution. Hence, the null distribution Formula is approximately valid.

Instead of using the L2-norm, one can use a more robustified norm L1-norm. This leads naturally to consider the test based on the genewise standardized L1-norm:


Formula 8

(8)
In particular, when I = 2, the test statistic reduces to


Formula

By the Central Limit Theorem,


Formula 9

(9)

where Formula and Formula . Specifically,


Formula

3.2 Aggregated standardization
Accurate estimates of the genewise SD {sigma}g are challenging. The process itself may depend on the selection of a method of normalization. The test statistics T1 and T2 may not be well approximated by the {chi}2-distribution or the normal distribution, when the genewise variances are not estimated accurately. In addition, the standardized square differences in T1 and T2 are sensitive to the estimation error in {sigma}g. On the other hand, the aggregated variance Formula or aggregated SD Formula can be more accurately estimated. These considerations lead us the unweighted differences among the expression profiles of replicated spots.

The aggregated differences among replicated spots when standardized, are given by


Formula 10

(10)
The test statistic T3 is obtained by aggregation first followed by standardization. On the other hand, the test statistic T1 is obtained by standardization first and followed by aggregation. The null distribution of T3 follows approximately Formula when G is large.

Following the same spirit, a more robustified counterpart of T2 is


Formula 11

(11)
where {xi}I and {eta}I are two constants defined in (10). The null distribution of T4 follows approximately Formula , when the data are properly normalized.

Note that the test statistic T4 involves the estimation of aggregated SD Formula . If this is estimated with bias, then the null distribution will be shifted. The genewise variance is usually estimated by the sample variance or aggregated sample variance Formula . Suppose that Formula has the distribution Formula with a given degree of freedom K, then Formula is an unbiased estimator of Formula , but Sg is not an unbiased estimator of {sigma}g. An unbiased estimator of {sigma}g is given by (Gurland and Tripathi, 1971)


Formula 12

(12)
In our numerical studies, this is implemented in T2 and T4. Our experiences show that the correction factor in (12) is necessary in order to obtain a more accurate approximation of the null distribution.

3.3 Choosing a method of normalization
The test statistics T1,...,T4 can be regarded as measures of effectiveness of normalization. The smaller, the less discrepancy among repeated measurements, and the more effectiveness of a normalization method. For a given array, among several normalization methods, we would choose the one that has the smallest test statistic. The associated P-value gives us an idea on the extent to which the expression profiles have been normalized. The power of the validation test depends on the number of data points G in the training set. Excessively, large G will result in overpower of the tests to reject even a tiny systematic bias.

As to be discussed in Section 4, the implementation of the validation tests might require the choice of an appropriate normalization method first in order to estimate the genewise variances. The aggregated standardization tests T3 and T4 can be used for this purpose, since the normalization constants can be ignored.

3.4 Training and testing sets
In many situations, there are many genes that have duplications. This is particularly the case for the customized arrays. In these situations, we can randomly select between 50 and 100 different genes as the testing set and use the remaining genes as the training set. The training set is used to estimate the parameters in the normalization, while the testing set is applied to the validation tests.

In other situations, there are limited genes that have duplicated spots. In this case, the multi-fold cross-validation ideas can be employed to choose the training and testing sets.


    4 ESTIMATION OF GENEWISE VARIANCE
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Accurate estimation of genewise variance Formula is important for assessing the effectiveness of normalization. It is also critically important for selecting the genes that are statistically differently expressed among treatments and controls (Cui et al., 2005; Fan and Ren 2006; Fan et al., 2004; Storey, and Tibshirani, 2003; Tusher, et al., 2001). In particular, Cui et al. (2005) demonstrates that genewise variance estimation has direct impact on the sensitivity and specificity of selecting differently expressed genes.

4.1 Use of within array replications
A natural estimate of the genewise variance is the sample variance of the duplicated expressions. These log-ratios are computed after the data are normalized. If there are several normalization methods available, one can use T3 and T4 without standardization to help select a method of normalization.

Suppose that we have J replicated arrays. If we assume that Formula , which are the same across J arrays, then we would pool the variability information from other arrays. This leads to a pooled estimator of Formula by


Formula 13

(13)

The above estimate ignores the correlation among duplicated genes. If within-array replications have a common correlation {rho}g = {rho} and observations across arrays are independent, Smyth et al. (2005) introduced the residual maximum likelihood (REML) estimator of Formula as follows:


Formula 14

(14)
where Formula with Formula is the between-arrays variance and Formula is an estimate of {rho}. They also proposed the REML estimation of Formula by


Formula

They argued further that the degree of freedom of Formula is (IJ – 1).

4.2 Smoothing estimator
The degree of freedom for the within-array estimate Formula of the variance is still limited. As the variability of c-DNA microarray measurements is related to the intensity level (Fan et al., 2004; Tseng et al., 2001), one can pool the information of variability from expression profiles with similar intensity levels (Fan et al., 2004). This results in a non-parametric estimate of the intensity-specific variance function {sigma}2(·) from the non-parametric regression model:


Formula 15

(15)

See also (5) with the array index j suppressed. The procedure of Fan et al. (2004) is to first smooth Formula on {Xgi} by using a local linear estimate, which is really a smoothing estimator of the scatter plot Formula , to obtain an estimate of {alpha}g by regarding it as a smooth function of the log-intensity Xgi, and then smooth the squared residuals Formula on {Xgi} to obtain an estimate of the intensity-specific variance function {sigma}2(·). The fundamental assumption in Fan et al. (2004) is that {alpha}g is basically a smooth function of Xgi. When this assumption is not valid, the estimator will be biased.

The bias issue in Fan et al. (2004) can be significantly reduced when the within-array replications are available as in (15). Consider those genes with I replications. Replacing {alpha}g by its unbiased estimate Formula , we have the variance of the residual


Formula

When the variability of log-intensities is not large, we can approximate Xgj by its average Formula . Owing to the smoothness assumption, we have


Formula 16

(16)

Letting Formula be the sample SD, we have


Formula

Therefore, the intensity-specific variance {sigma}2(·) can be estimated by smoothing the pairs Formula , resulting in an estimated function Formula . Hence, our estimate of Formula is


Formula

The approach is particularly appealing to the customized arrays, in which the variability of intensities is not large and the replicated spots are available. See Section 5.3.

4.3 Empirical Bayes estimator
With within-array replications, we have two estimators: REML Formula and the intensity-specific estimator Formula . One naturally uses the intensity-specific estimator Formula to augment the REML estimator Formula . One simple way to do this is the following empirical Bayes shrinkage estimator:


Formula 17

(17)
where d is the degree of freedom of Formula . The estimator (17) is the Bayes estimator with the inverse Gamma prior with the prior mean Formula and prior degree of freedom d. Since the prior parameters are estimated from the data, (17) is indeed an empirical Bayes estimator.

The degree of freedom of the non-parametric estimator Formula is often estimated by the effective sample size, which is the inverse of the asymptotic variance of the kernel local linear regression estimator. For example, if the kernel local regression estimator is used, then, we have


Formula

where K is the kernel function used in the smoothing . For example, in the LOWESS smoother, the kernel function is Formula .

Note that when the within-array variance Formula in (13) is used in (17), the factor (IJ 1) should be replaced by I(J – 1). In addition, the intensity-specific estimate of variance is usually not as reliable as the within-array variance Formula , the constant d can be replaced by some smaller numbers to reduce somewhat of its influence.


    5 SIMULATIONS AND APPLICATIONS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
In this section, we first use the simulated data set to illustrate the validity and the power of our proposed validation tests. In particular, the advantages of the aggregated standardization tests T3 and T4 are demonstrated. The three real data analyses illustrate the methodological power of our approaches.

5.1 Simulation
In each simulation, we generate J = 4 arrays from model (3) with G = N = 2000 genes, each having I = 3 replications randomly assigned over the 48 blocks. The details of simulation scheme for this example are summarized as follows:

Formula : The expression levels of the first 150 genes are generated from the standard double exponential distribution. The rest are 0's;. These expression levels are the same over four arrays in each simulation, but may vary over simulations.

Formula : The 48 parameters for the block effects Formula j in array j are all set to Formula , which is given by


Formula

These parameter values are taken from the estimates in Ma et al. (2006).

X: The intensity is generated from a mixture distribution: with probability 0.8 from probability distribution 0.0004(16 x)3I(6 < x < 16) and 0.2 from the uniform distribution over [4, 16].

mj (·): Set the function Formula , whose expectation with respect to X is approximately zero.

bg: For each given gene, its associated block is assigned at random at one of 48 print-tip blocks.

{varepsilon}Formula is generated from the standard normal distribution with mean zero and variance Formula .

This is a heteroscedastic model with small block effect and intensity effect. Three normalization methods are used: Global normalization, LOWESS normalization using all blocks and the SLIM normalization (Fan et al., 2004). They are applied to 200 pseudo-data sets, each having J = 4 arrays, giving a total of 800 arrays.We drew 200 genes at random from the 2 000 different genes as the testing set, and the remaining genes as the learning set.

We first apply the validation test statistic T3 with genewise variances estimated by the REML estimators (14) to check the effectiveness of the three normalization methods over 200 pseudo-data sets, which comprise of 800 pseudo-arrays. The distributions of the test statistic |T3| based on the three normalization methods are presented in Figure 1. They are well separated (Fig. 1a). First of all, T3 for the global normalization is the same that with no normalization, which by far the largest. This shows the global normalization is the worst method for the data sets. Indeed, the corresponding 800 P-values are all zero, which shows the power of validation test is 100 %. The LOWESS normalization using all blocks ignores the small block effects Formula . The resulting statistics T3 from 200 simulations are smaller than those based on the global normalization, but are stochastically larger than those based on the SLIM normalization, which accounts for these small block effects. This shows that the LOWESS normalization ignoring block effect is not as effective as the SLIM, but is more effective than the global normalization. Figure 1 also depicts the distribution of the 800 P-values of the LOWESS and SLIM method. The P-values of SLIM follow nearly a uniform distribution, which indicates that systematically biases have been effectively removed. On the other hand, for a portion of arrays, the LOWESS normalization is inadequate. This demonstrates the power of the validation: even ignoring small block effects, the test statistic T3 is able to detect these small systematic biases.


Figure 1
View larger version (25K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. (a) Distributions of |T3| before normalization (right) and after normalization using SLIM (left). (b) Distribution of |T3| after the LOWESS normalization without accounting small block effects. (c) and (d) Distributions of P-values of the validation test T3 after normalization using SLIM (left panel) and LOWESS (right panel) methods. P-values based on test T3 before normalization are all zero.

 
After demonstrating the power of the test statistic T3, we now examine the accuracy of the null distributions T1,...,T4 using SLIM as the method of normalization. Since the correct method is used in the normalization for the pseudo-data, the estimation errors come from two sources: estimation of block and intensity effect and estimation of genewise variance by (14). First of all, without normalization, all validation test statistics T1,...,T4 report zeros P-values for all realizations. This indicates the sensitivity of these tests. When the SLIM is used to normalize the data, if the normalization method is effective and the null distributions are accurate, the P-values follow the uniform distribution on the interval [0, 1]. Figure 2 depicts the P-values based on the SLIM normalization. The distributions of P-values based on T3 are reasonably uniform. This shows that both the normalization method is effective and the null distribution is accurate for the test statistic T3. However, the distributions of T2 and T4 have large deviations from the uniform distribution, which indicates the estimation errors from the REML estimators (14) of genewise SDs. These deviations have greatly been mitigated by correcting the REML estimators to the unbiased estimators (12).


Figure 2
View larger version (30K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Distributions of P-values for test statistics T1,T2,T3 and T4 after the SLIM normalization. (a)–(d) Distributions of P-values for test statistics T1,T3,T2 and T4 after the SLIM normalization. (e) and (f) Distributions of P-values for test statistics T2 and T4 calculated by using unbiased estimators of genewise SD (13). The distribution of T3 is somewhat more uniform than that based on T1. The same conclusion applies to T4 and T2, both using the REML estimators and unbiased REML estimators.

 
5.2 Application to human placenta data
A collection of human placenta cDNAs comprising 7042 clones was identified and used as the probe set for cDNA microarray fabrication in this study (Ma et al., 2005). Three kinds of RNA samples were used. These include the common reference RNA derived from the probe set (PS) in equal amounts representing artificial RNA produced by in vitro transcription, the ‘Universal Human Reference RNA’ from Stratagene, which is comprised of 10 different cell lines and human full-term placenta RNA. The original goal of the study was to evaluate the performance of the PS RNA as a reference RNA in comparison with that of Stratagene's; universal reference RNA.

For the sake of studying the normalization and validation tests, we only compare the ‘Universal Human Reference’ RNA with human placenta RNA in this study. Gene expression values were obtained through direct hybridizations between these two kinds of RNAs. There are four slides, including two dye-swapped slides. Each clone was printed three times on different blocks in each slide. There are 48 blocks on each array. After preprocessing that filters low quality spots, there remained 2149 genes that have three replications. Our analysis focuses on this subset. We compared the effectiveness of four normalization methods: Global, LOWESS, SLIM and aggregated SLIM by using test statistics T3 and T4. One hundred different genes were selected as the validation set, while the remaining 2049 genes were used to estimate the parameters.

The key assumption of the LOWESS method is that up- and down-regulated genes' expressions are symmetrically distributed around 0, which is usually not the case for genes investigated in placenta tissue (Ma et al., 2005). Figure 3 (a)–(d) compares the effectiveness of the four normalization methods using the validation test statistics T3 and T4. The genewise variances are estimated by (14). The results show clearly that normalizations are needed for each array. In addition, the blockwise LOWESS normalization is inadequate except the fourth array at significant level 5 %. The outcomes of validation tests show that we can choose either SLIM or aggregated SLIM to normalize the log-ratios for each array.


Figure 3
View larger version (18K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. The P-values for validation tests T3 and T4, based on the human placenta data, under four different normalization approaches: Global (labeled ‘1’), LOWESS (labeled ‘2’), SLIM (labeled ‘3’), and aggregated SLIM (labeled ‘4’) methods. The y-axis is Figure 3. The dark gray columns are P-values for T3, while the light gray ones are for T4. The results for four arrays are plotted in (a) – (d). The lines correspond to 5 % significance level. The P-values for Global normalization are highly statistically significant at level 5 % for all four arrays, indicating the ineffectiveness of the method. The LOWESS normalization method improves the P-values a lot but still are statistically significant at level 5 % except the fourth array. The P-values based on SLIM and aggregated SLIM methods show that the systematic biases due to the variation of experimental conditions have been effectively removed by either of these two methods. Therefore, the resulting log-ratios can be used for the downstream statistical analysis.

 
5.3 Application to interferons data using customized arrays
An important property of interferons (IFNs), a cytokine, is their anti-tumor activity. IFNs have efficacy in the treatment of several types of solid tumors carcinomas. Interestingly, it has been reported that IFN-β has greater anti-tumor effects than IFN-{alpha} on melanoma. One probable mechanism for the different effects may be the different affinity of IFN-{alpha} and IFN-β binding to the IFN receptors. Another possibility may be the differences in intracellular signaling. To address whether different signal pathways are involved in IFN-{alpha} and IFN-β-mediated anti-tumor activity, customized c-DNA chips are designed to include IFN stimulated genes and genes involved in multiple pathways. Gene expression changes induced by IFN-{alpha} and IFN-β were investigated and compared by using the customized c-DNA microarrays.

The customized c-DNA microarrays provides an ideal platform for our validation tests. Usually, only several hundreds of genes of primary interest are monitored for the changes of expression profiles and these genes are often duplicated several times. For our particular applications, 768 genes that might be induced by IFN are printed on the 16 blocks with 12 rows and 12 columns of spots in each block. These 768 genes are duplicated 3 times. The three replications of each gene reside in the same row in the same block but adjacent columns. The expression profiles of these 768 genes with IFN-{alpha} or IFN-β stimulations are compared with those without stimulations. This results in two customized microarrays, one with INF-{alpha} stimulation and the other with INF-β stimulation, each having with 3x768 spots. After some preprocessing that filtered the data with low quality, there were 572 genes left for further analysis.

To examine the necessity and effectiveness of normalization, we randomly selected 50 genes as the test set; the remaining 522 genes were used as the learning set. Three normalization methods are employed: Global, LOWESS and SLIM normalizations. Due to the specific designs of the replications, the block effect is not estimable. The genewise variance are estimated by using the empirical Bayes method in Section 4.3—the variance estimates from two arrays are not aggregated. Table 1 depicts the P-values using the test statistics T1,...,T4. From the table, we conclude that there is no need for normalization since all of the P-values for test statistics are high for both slides and for all methods. Note that the P-values by using the SLIM normalization are a little bit higher than those using the Global normalization, and they are larger than the P-values using the LOWESS method. That is because the fundamental assumption for LOWESS method—up- and down- regulated genes are about the same—are not substantiated here.


View this table:
[in this window]
[in a new window]

 
Table 1. P-values for T1, ..., T4 based on human placenta data

 
5.4 Application to human total RNA samples using Agilent arrays
Our third example comes from the Microarray Quality Control (MAQC) project (Patterson et al., 2006). The MAQC project studies the reproducibility, sensitivity and specificity of the microarray data across different platforms and sites. It compared two RNA samples, Stratagene Universal Human Reference total RNA and Ambion Human Brain Reference total RNA using different microarray technology. Our study focuses only on the RNA samples generated at three test sites using Agilent platform. At each site, 10 Agilent two-color microarrays were processed with five arrays for each dye configuration, which assayed a total of 30 microarrays. Following Patterson et al. (2006), we excluded two outlier microarrays based on single microarray quality metrics, resulting in 28 microarrays. After preprocessing, we obtained 21 767 genes from a total of 43 931 and found four genes each having 10 replications, randomly located on the microarray.

For our validation test, gProcessedSignal and rProcessedSignal values from Agilent's; Feature Extraction software were used as input to calculate the test statistics. If the genewise variance is estimated by using (14), all P-values are at least 99.99 %, indicating the proper normalization of all Agilent arrays. Even using the most stringent estimation of genewise variance (13), almost all the microarrays pass the validation test at significant level 1 % except for one array AGL-3-D3 at the test site 3. See Table 2. The results show the data processed by Agilent software are properly normalized and reliable. These are in line with the conclusion of the MAQC project.


View this table:
[in this window]
[in a new window]

 
Table 2. P-values based on T1, ..., T4 for MAQC project data

 

    6 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Motivated by the urge of measures for comparing different normalization methods, we proposed four validation tests to evaluate the necessity and effectiveness of normalization methods, relying on the replicated clones. They are based on the standardized differences of expression profiles among replicated clones aggregated over different genes, resulting in the test statistics T1 and T2. These tests depend on genewise variances and SDs and hence cannot be used to compare the effectiveness of the normalization without estimation of these genewise variances. This leads us to consider the (unweighted) differences of expression profiles among replicated clones aggregated over different genes, standardized after aggregation, resulting in the test statistics T3 and T4. The unscaled test statistics T3 and T4 can be used to compare the effectiveness of normalization without estimating genewise variance (which itself depends on selecting a method of normalization). The aggregated standardization tests T3 and T4 depend on the aggregated SD and variance, which can be more precisely estimated than the genewise SD and variance. As a result, the null distribution of the test statistics T3 and T4 can be more accurately approximated.

We have also demonstrated that the within-array replications are essential for estimating the genewise variance. A novel non-parametric approach is proposed, which aggregates variance information from genes with similar intensity level. Several innovative approaches are proposed to enhance the accuracy of the estimation of the genewise variance. These new methods can also be used to improve the power of selecting significantly differently expressed genes (Cui et al., 2005; Smyth et al., 2005).

Our simulation studies show convincingly the power of the validation tests and their validity. The applications to three real data sets demonstrate the methodological power of our proprietary methods in choosing a normalization method and in assessing whether the systematic biases due to variations in the experimental conditions have been properly removed. The customized array provides an ideal platform for the applications of our proposed approaches. Furthermore, our idea does not restrain to the two-color arrays. As long as replicated genes exist, we could apply our validation test to check the necessity and effectiveness of normalization. The validation tests could have been reliably applied the one-color Affymetrix GeneChip arrays, if they were within-array replications. Without such replications, we need to aggregate information across arrays. Assuming the absence of the gene and array interactions, our validation tests can be applied to the Affymetrix array to check the necessity and effectiveness of normalization, by treating each array as a block in a synthetic super-array.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
The authors thank Dr Yi Ren of Rutgers University for printing the customized arrays used in the study described in the article. This research was supported by NIH Grant R01-GM072611, NSF Grant DMS-0354223 and DMS-0714554.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Olga Troyanskaya

Received on February 22, 2007; revised on June 26, 2007; accepted on July 9, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS OF NORMALIZATION
 3 VALIDATION TESTS
 4 ESTIMATION OF GENEWISE...
 5 SIMULATIONS AND APPLICATIONS
 6 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Cui X, et al. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics (2005) 6:59–75.[Abstract]

    Dabney AR, Storey J. A new approach to intensity-dependent normalization of two-channel microarrays. Biostatistics (2007a) 8:128–139.[Abstract/Free Full Text]

    Dabney AR, Storey J. Normalization of two-channel microarrays accounting for experimental design and intensity-dependent relationships. Genome Biol (2007b) 8. R44.

    Dudoit S, et al. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc (2002) 97:77–87.[CrossRef][Web of Science]

    Dudoit S, et al. Multiple hypothesis testing in microarray experiments. Stat. Sci (2003) 18:71–103.[CrossRef][Web of Science]

    Eisenstein M. Microarrays: quality control. Nature (2006) 442:1067–1070.[Medline]

    Fan J, Ren Y. Statistical analysis of DNA microarray data in cancer research. Clin. Cancer Res (2006) 12:4469–4473.[Abstract/Free Full Text]

    Fan J, et al. Semilinear high-dimensional model for normalization of microarray data: a theoretical analysis and partial consistency. J. Am. Stat (2005) 100:781–813. (with discussion).[CrossRef]

    Fan J, et al. Normalization and analysis of cDNA micro-arrays using within-array replications applied to neuroblastoma cell response to a cytokine. Proc. Natl Acad. Sci. USA (2004) 101:1135–1140.[Abstract/Free Full Text]

    Gurland J, Tripathi RC. A simple approximation for unbiased estimation of the standard deviation. Am. Stat (1971) 25:30–32.

    Huang J, et al. A two-way semi-linear model for normalization and significant analysis of cDNA microarray data. J. Am. Stat. Assoc (2005) 100:814–829.[CrossRef][Web of Science]

    Jaeger J, Spang R. Selecting normalization genes for small diagnostic microarrays. BMC Bioinformatics (2006) 7:388.[CrossRef][Medline]

    Kroll TC, Wölfl S. Ranking: a closer look on globalisation methods for normalisation of gene expression arrays. Nucleic Acids Res (2002) 30:1–6.[Abstract/Free Full Text]

    Ma S, et al. Robust semiparametric microarray normalization and significance analysis. Biometrics (2006) 62:555–561.[CrossRef][Web of Science][Medline]

    Marshall E. Getting the noise out of gene arrays. Science (2004) 306:630–631.[Abstract/Free Full Text]

    Patterson T, et al. Performance comparison of one-color and two-color platforms within the MicroArray Qualtiy Control (MAQC) project. Nat. Biotechnol (2006) 24:1140–1150.[CrossRef][Web of Science][Medline]

    Reiner A, et al. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics (2003) 19:368–375.[Abstract/Free Full Text]

    Shi L, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol (2006) 24:1151–1161.[CrossRef][Web of Science][Medline]

    Smyth G, et al. Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics (2005) 21:2067–2075.[Abstract/Free Full Text]

    Storey JD, Tibshirani R. Statistical significance for genome-wide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.[Abstract/Free Full Text]

    Tseng GC, et al. Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res (2001) 29:2549–2557.[Abstract/Free Full Text]

    Tusher VG, et al. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA (2001) 98:5116–5121.[Abstract/Free Full Text]

    Wang D, et al. A robust two-way semi-linear model for normalization of cDNA microarray data. BMC Bioinformatics (2005) 6:14.[CrossRef][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Physiol. GenomicsHome page
T. J. Chu and D. G. Peters
Serial analysis of the vascular endothelial transcriptome under static and shear stress conditions
Physiol Genomics, July 1, 2008; 34(2): 185 - 192.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/18/2391    most recent
btm361v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (2)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Fan, J.
Right arrow Articles by Niu, Y.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Fan, J.
Right arrow Articles by Niu, Y.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?