Bioinformatics Advance Access originally published online on January 18, 2005
Bioinformatics 2005 21(9):2067-2075; doi:10.1093/bioinformatics/bti270

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Use of within-array replicate spots for assessing differential expression in microarray experiments

Gordon K. Smyth *, Joëlle Michaud and Hamish S. Scott

Walter and Eliza Hall Institute of Medical Research Melbourne, Vic 3050, Australia

*To whom correspondence should be addressed.


    Abstract

Motivation: Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. The usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This results in the loss of valuable information about genewise variability.

Results: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small.

Availability: The methodology is implemented in the limma software package for R, available from the CRAN repository http://www.r-project.org

Contact: smyth{at}wehi.edu.au


    1 INTRODUCTION
Microarrays measure the mRNA expression of tens of thousands of genes in a single hybridization experiment. Designed experiments involving two or more microarrays hybridized with RNA from different sources generate expression profiles which can help classify the genes according to functional groups or molecular pathways. Although much attention has been given to the statistical analysis of microarray data, many problems are still unresolved (Nguyen et al., 2002; Smyth et al., 2003; Parmigiani et al., 2003; Speed, 2003; Causton et al., 2003; Firestein and Pisetsky, 2002; Tilstone, 2003). Particular challenges and opportunities arise from the multiplicity of genes and the possibilities for parallel inference.

A standard analysis method is to fit the same statistical model separately to the expression measurements for each gene (Wolfinger et al., 2001; Yang and Speed, 2003). A number of authors have noted that inference for each individual gene can be made more reliable by making use of information generated from the whole ensemble of genes (Newton et al., 2001; Tusher et al., 2001; Efron et al., 2001; Efron and Tibshirani, 2002; Lönnstedt and Speed, 2002; Kendziorski et al., 2003; Newton et al., 2004; Smyth, 2004). Such methods have not as yet been applied to experimental designs in which there are technical or biological replicates leading to multiple strata of random variation for each gene. This article develops a between-gene moderation method appropriate for a particular type of technical replication, that of within-array replicate spots. The method proposed is particularly simple in that a suitably chosen parameter is constrained to be common between the genes. The treatment proposed here for within-array replicates may be combined with moderation methods designed for a single error stratum.

Spotted microarrays are produced by printing cDNA or oligonucleotide sequences on glass slides using a robotic printer. The spots are laid down using a printhead made up of capillary print tips or pins or inkjets. The DNA is prepared in 96-well or 384-well plates ready for printing, with normally one well for each distinct probe. The robot acquires DNA by dipping the tips of the print head into the wells of the plate before depositing the DNA on the glass slide. In most cases only a small proportion of the DNA in each well is actually printed onto the arrays and any excess is discarded. Provided that there is space on the array, there is no cost, apart from printing time, in printing two or more spots on the arrays from each well. This is accomplished by programming the robotic printer to dip the print head more than once into the same set of wells. This results in a printed array in which each gene appears two or more times a fixed distance apart. Usually multiple printing produces two spots of each gene but an arbitrarily large number of replicate spots may be printed if there is sufficient space to accommodate them on the arrays.

Normally the replicate spots are printed side-by-side in the same row, in the same column or at the top and bottom halves of the array. Any intensity or log-ratio measurements made from the replicate spots will be positively correlated through being observed on the same array. Replicate spots which are side-by-side are likely to be very highly correlated since they are not only printed with the same gene but are also spatially close together and therefore likely to share many common causes including local effects on the array surfaces as well as hybridization and labelling effects. Indeed the value of having multiple prints of each clone on an array has often been questioned given the low within-array variability compared with between-array variability (Tran et al., 2002). Replicate spots in the top and bottom halves of the array are also likely to be positively correlated but less so than side-by-side replicates.

Replicate spots are often used as a quality assessment tool since disagreement between replicates is strong evidence that at least one of the spots is affected by a local artifact. Repeatability of the log-ratios across replicate spots within arrays can be used as a basis for removing outlier spots (Tseng et al., 2001; Hoffmann et al., 2002; Yang et al., 2002; Jenssen et al., 2002; Lyne et al., 2003; König et al., 2004), to construct spot-quality measures (Beissbarth et al., 2000) or to evaluate the effectiveness of a spot-quality scheme (Wang et al., 2001). It is almost universal practice to average the log-intensities or log-ratios obtained from within-array replicate spots before conducting a formal statistical analysis of differential expression (Andrews et al., 2000; Tseng et al., 2001; Berwanger et al., 2002; Hoffmann et al., 2002; Yang et al., 2002; Kaynak et al., 2003; Lyne et al., 2003), although averaging can cause complications when some of the log-ratios are missing or when there are spot-specific quality weights. Many public microarray database programs, such as the Stanford Microarray Database, automatically average log-ratios from duplicate spots. A relatively small number of studies have used within-array replicate level information to improve the assessment of differential expression (Baggerly et al., 2001; Boer et al., 2001; Fan et al., 2004).

The method developed in this paper extracts more information from the within-array replicate spots by estimating the correlation between them. A simple model is explored in which the between-replicate correlation is taken to be constant across genes. The method uses a consensus estimator of the correlation across genes in such a way that the correlation can be taken to be known at the individual gene level. Compared with simply averaging replicate spots, this method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes.

The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the within-array correlation method results in substantially better discrimination of differentially expressed genes from those which are not, compared with simply averaging the replicate spots. Based on these data, the proposed method increases the power to detect differential expression when it is present without incurring a greater rate of Type I errors when it is not.


    2 cDNA MICROARRAY PREPARATION METHODS
2.1 Spike-in control spots
This study uses data from a set of 26 cDNA microarrays which were printed and hybridized as part of a study on human transcription factors. The paper presents data only from the spike-in control spots.

The arrays were printed at the Australian Genome Research Facility with the Hs8k cDNA clone library from Research Genetics and a selection of control spots. Each array was printed with 12 sets of the Lucidea Universal Scorecard system (Amersham). Spots were printed in duplicate, side-by-side by rows, including the 12 sets of ScoreCard spots.

The RNA samples hybridized to the arrays included ScoreCard spike mixes according to the Lucidea ScoreCard User's Guide. The ScoreCard system includes calibration and ratio control spots designed to generate pre-determined fold changes. Each set of ScoreCard spots includes 10 calibration spots, labeled here Calib1–Calib10, which have a theoretical fold change of 1 and are expressed at successively decreasing intensities. The ratio controls have fold changes as follows: 3-fold up and down at low intensity (3UL and 3DL), 3-fold up and down at high intensity (3UH and 3DH), 10-fold up and down at low intensity (10UL and 10DL) and 10-fold up and down at high intensity (10UH and 10DH). The same spike mix was applied to all the arrays, so the arrays can be treated as a set of replicate arrays for the purposes of the ScoreCard spots.

2.2 Hybridization
An aliquot of 50 µg of total RNA extracted from HeLa cells and 1 µl of either reference or test spike mRNA was reverse transcribed using an anchored oligo(dT) primer and 200 units of Superscript II reverse transcriptase (Invitrogen) in the presence of 25 mM dATP, 25 mM dCTP, 25 mM dGTP, 15 mM aminoallyl-dUTP (SIGMA #A0410) and 10 mM dTTP. The single strand cDNA was purified using QIAquick PCR purification kit (Qiagen) and labelled with CyDye post-labelling dye (Amersham) for an hour. After a second purification as above, both Cy-5 and Cy-3 labelled cDNAs were pooled and mixed with 25 µg of human Cot1 DNA, 38 µg of polyA DNA and 50 µg of salmon sperm DNA. The mixture was concentrated using a vacuum dryer and resuspended in 50% formamide, 5x SSC and 0.1% SDS.

The arrays were incubated in 50% formamide, 5x SSC, 0.1% SDS and 10 mg/ml BSA for 45 min at 42°C, rinsed with distilled water and dried using an air gun. The labelled cDNA mixture was denatured at 95°C for 5 min, incubated at 45°C for 30 min and cooled to room temperature before being pipetted onto the array. The slides were incubated overnight at 42°C in a hybridization chamber (Corning) placed in a water bath. After incubation the slides were washed in 1 x SSC/0.2% SDS solution for 5 min, in 0.1 x SSC/0.2%SDS solution for 5 min, and twice in 0.1 x SSC for 2 min. The slides were then spun dry using a centrifuge.

2.3 Image analysis and normalization
The arrays were scanned using a Genepix 4000B scanner with adjusted settings in order to obtain a similar green and red overall intensity. The images were analyzed using the SPOT software (Buckley, 2000, http://www.cmis.csiro.au/iap/Spot/spotmanual.htm). Foreground intensities were background corrected using the ‘morph’ background measure and the ScoreCard spot log-ratios were normalized using global loess normalization with the default smoothing span of 0.3 (Yang et al., 2001; Smyth and Speed, 2003).


    3 THE BALANCED SINGLE SAMPLE PROBLEM
3.1 Individual correlations
For simplicity, consider first a series of $n$ replicate two-color microarray experiments, each array hybridized with RNA from the same two sources. Suppose that each gene is replicated $m$ times on each array at a fixed distance apart. Image analysis and normalization of the microarray data will yield a log-ratio of expression $y_{gij}$ for each spot. Here $y_{gij}$ is the log-ratio for gene $g = 1,\ldots,G$, array $i = 1,\ldots,n$ and replicate $j = 1,\ldots,m$. Usually $y_{gij}$ is a normalized version of $\log_2(R_{gij}/G_{gij})$ where $R_{gij}$ is the measured red intensity while $G_{gij}$ is the measured green intensity for that spot. Assume that

$$E(y_{gij}) = \mu_g$$

where $\mu_g$ is the true log-ratio of the expression levels for gene $g$. Interest lies in estimating $\mu_g$ and especially in testing $H_0 : \mu_g = 0$.

It is reasonable to assume that observations made on different arrays for a given gene are independent or nearly so. On the other hand, replicate observations made on the same array are likely to be correlated, perhaps highly so. For the remainder of this paper, the term ‘replicate spots’ will be taken to refer to spots on the same array. Let $\rho_g$ be the correlation between replicate spots for gene $g$. We will assume that

$$\mathrm{var}(y_{gij}) = \sigma_g^2$$

and

$$\mathrm{cor}(y_{gij}, y_{gij'}) = \rho_g$$

for $j \ne j'$. Observations with different $i$ are assumed to be independent. Observations on different genes on the same array are also likely to be correlated. The correlations between genes, however, are highly problematic to estimate, because of the very large number of genes compared to the number of arrays, and so these correlations are left unspecified in this article. If the replicate spots are close together we expect $\rho_g$ to be large, perhaps close to unity. If the replicate spots are far apart, the correlation will be much smaller. Note that $\rho_g$ is constrained according to $-1/(m-1) \le \rho_g \le 1$ by the requirement that the covariance matrix of the $y_{gij}$ be non-negative definite.

It will be further assumed in this article that the $y_{gij}$ are normally distributed. Although this assumption will be used in deriving specific results in this paper, most of the results of this paper do not depend on normality for their usefulness. See further comments on this issue in Section 8.

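For each gene $g$ write $\bar y_{gi\cdot}$ for the sample mean of the replicate observations on array $i$ and $\bar y_{g\cdot\cdot}$ for the overall sample mean across all arrays. For each gene let $s_{bg}$ be the between-arrays standard deviation,

$$s_{bg}^2 = \frac{m}{n-1}\sum_{i=1}^{n}\left(\bar y_{gi\cdot} - \bar y_{g\cdot\cdot}\right)^2$$

and $s_{wg}$ be the within-arrays standard deviation,

$$s_{wg}^2 = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(y_{gij} - \bar y_{gi\cdot}\right)^2.$$

Then $\bar y_{g\cdot\cdot}$, $s_{bg}^2$ and $s_{wg}^2$ are mutually independent and sufficient for $\mu_g$, $\sigma_g$ and $\rho_g$ with

$$\bar y_{g\cdot\cdot} \sim N\!\left(\mu_g,\ \sigma_g^2\{1+(m-1)\rho_g\}/(nm)\right),$$

$$s_{bg}^2 \sim \sigma_g^2\{1+(m-1)\rho_g\}\,\chi^2_{n-1}/(n-1),$$

$$s_{wg}^2 \sim \sigma_g^2(1-\rho_g)\,\chi^2_{n(m-1)}/\{n(m-1)\}.$$

Under this model, inference about $\mu_g$ can be conducted entirely using $\bar y_{g\cdot\cdot}$ and $s_{bg}$. The within standard deviation $s_{wg}$ does not contribute any further information. The maximum likelihood estimator of $\mu_g$ is

$$\hat\mu_g = \bar y_{g\cdot\cdot}$$

and the most powerful test statistic for testing $H_0 : \mu_g = 0$ is

$$t_{g,b} = \frac{\bar y_{g\cdot\cdot}}{s_{bg}/\sqrt{nm}}. \qquad (1)$$

If $\mu_g = 0$ then $t_{g,b} \sim t_{n-1}$. This explains why it is usual practice to average the replicate spots before undertaking inference for microarrays with within-array replication.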
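The quantities in this subsection can be sketched for a single gene as follows. This is a minimal illustration, not the limma implementation: it assumes the between-arrays variance is scaled by $m$ (so that it estimates $\sigma_g^2\{1+(m-1)\rho_g\}$) and that the within-arrays variance estimates $\sigma_g^2(1-\rho_g)$. The function names `between_within` and `t_averaged` are hypothetical.

```python
import math

def between_within(y):
    """Summaries for one gene; y is a list of n arrays, each a list of m replicate log-ratios."""
    n, m = len(y), len(y[0])
    array_means = [sum(row) / m for row in y]
    grand_mean = sum(array_means) / n
    # Between-arrays variance on n - 1 df, scaled by m (assumed convention).
    s2_b = m * sum((a - grand_mean) ** 2 for a in array_means) / (n - 1)
    # Within-arrays variance on n(m - 1) df.
    s2_w = sum((v - a) ** 2 for row, a in zip(y, array_means) for v in row) / (n * (m - 1))
    return grand_mean, s2_b, s2_w

def t_averaged(y):
    """t-statistic of Equation (1): uses only the grand mean and the between-arrays variance."""
    n, m = len(y), len(y[0])
    grand_mean, s2_b, _ = between_within(y)
    return grand_mean / math.sqrt(s2_b / (n * m))
```

With this scaling, `t_averaged` gives the same value as an ordinary one-sample t-statistic computed on the $n$ array means, which is the sense in which averaging the replicate spots loses nothing when each gene has its own correlation.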

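It is useful for later reference to consider the estimation of $\rho_g$ even though it does not contribute here to inference about $\mu_g$. Let

$$\theta_g = \frac{1}{2}\log\frac{1+(m-1)\rho_g}{1-\rho_g}.$$

Note that $\theta_g$ is a monotonic increasing transformation of $\rho_g$ which takes values on the whole real line. The transformation is reversed by

$$\rho_g = \frac{\exp(2\theta_g)-1}{\exp(2\theta_g)+m-1}$$

which reduces to $\rho_g = \tanh\theta_g$ when $m = 2$. The residual maximum likelihood (REML) (Searle et al., 1992) estimator of $\theta_g$ is

$$\hat\theta_g = \frac{1}{2}\log\frac{s_{bg}^2}{s_{wg}^2}$$

which is distributed as $\theta_g + \frac{1}{2}\log F_{n-1,\,n(m-1)}$. This shows that

$$E(\hat\theta_g) = \theta_g + b\{n-1,\ n(m-1)\}$$

where the bias is determined by the function

$$b(d_1,d_2) = \frac{1}{2}\left\{\psi(d_1/2) - \log(d_1/2) - \psi(d_2/2) + \log(d_2/2)\right\}$$

where $\psi$ is the digamma function. The variance is

$$\mathrm{var}(\hat\theta_g) = v\{n-1,\ n(m-1)\}$$

with

$$v(d_1,d_2) = \frac{1}{4}\left\{\psi'(d_1/2) + \psi'(d_2/2)\right\}$$

where $\psi'$ is the trigamma function. The distribution of $\hat\theta_g$ is somewhat skew to the left because of the differing degrees of freedom for $s_{bg}^2$ and $s_{wg}^2$. In the worst case with $n = m = 2$ the bias of $\hat\theta_g$ is –0.35.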
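The transformation and the bias quoted above can be checked numerically. The sketch below assumes that $\theta = \frac{1}{2}\log[\{1+(m-1)\rho\}/(1-\rho)]$, which is the form whose inverse reduces to $\tanh\theta$ when $m = 2$, and that the bias is the mean of half the log of an F variate on $(d_1, d_2)$ degrees of freedom. The `digamma` helper is a hand-written approximation (recurrence plus asymptotic series) included only to keep the example self-contained; all function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def rho_to_theta(rho, m):
    """Map a replicate correlation to the whole real line (assumed form of the transformation)."""
    return 0.5 * math.log((1 + (m - 1) * rho) / (1 - rho))

def theta_to_rho(theta, m):
    """Inverse transformation; equals tanh(theta) when m = 2."""
    e = math.exp(2 * theta)
    return (e - 1) / (e + m - 1)

def bias(d1, d2):
    """Mean of 0.5 * log(F) for an F variate on (d1, d2) degrees of freedom."""
    return 0.5 * (digamma(d1 / 2) - math.log(d1 / 2) - digamma(d2 / 2) + math.log(d2 / 2))
```

Under these assumptions `bias(1, 2)` corresponds to the worst case $n = m = 2$ and evaluates to $-\frac{1}{2}\log 2$, which rounds to the value of –0.35 quoted in the text.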

3.2 Common correlation
Now we make the simplifying assumption that the between-replicate correlation is common across genes, $\rho_g = \rho$ for all $g$. This assumption is motivated by the belief that the correlation springs mainly from the physical proximity of the replicate spots on the same array. The robotic printing ensures that the spacing between the replicate spots is the same for all genes and all arrays. In practice it will not be necessary that the assumption be precisely true but rather that the correlations be sufficiently stable to make the common correlation model a useful one. This is likely to be true when the between and within standard deviations $s_{bg}$ and $s_{wg}$ are positively associated across genes, meaning that the correlations are much more stable than the variances. This has been true in all microarray experiments seen by the authors so far.

Let

$$\theta = \frac{1}{2}\log\frac{1+(m-1)\rho}{1-\rho}.$$

If observations on different genes were independent then the REML estimator of $\theta$ would be

$$\hat\theta = \frac{1}{2}\log\frac{\sum_{g=1}^{G} s_{bg}^2}{\sum_{g=1}^{G} s_{wg}^2} \qquad (2)$$

which would be distributed as $\theta + \frac{1}{2}\log F_{G(n-1),\,Gn(m-1)}$. This estimator remains consistent as $n \to \infty$ even if the genes are not independent because it requires only that the mean of the $s_{bg}^2$ and the mean of the $s_{wg}^2$ converge to quantities in the ratio of $1 + (m-1)\rho$ to $1 - \rho$. For the same reasons it requires only weak assumptions on the independence between genes to be consistent as $G \to \infty$. In practice this estimator is likely to be very accurate if the number of genes $G$ is large. Under the assumption of independence between genes, the bias is

$$E(\hat\theta) - \theta = b\{G(n-1),\ Gn(m-1)\}$$

and the variance is

$$\mathrm{var}(\hat\theta) = v\{G(n-1),\ Gn(m-1)\}$$

both of which are very small when $G$ is large. For example, if $G = 1000$ and $n = m = 2$ the above bias is minimal at –0.00025 while the standard deviation is 0.016.

The fact that the correlation is common between genes does not change the estimator $\hat\mu_g$ for each gene but does substantially change the inference about $\mu_g$. Since the common correlation can be estimated very accurately from the ensemble of genes, $\rho$ may be treated as known to a very good approximation when undertaking inference about each individual gene. This means that $s_{wg}$ can contribute to the estimator of $\sigma_g^2$, improving the precision with which we judge whether $\mu_g$ is nonzero. Writing $\hat\rho$ for the value of $\rho$ corresponding to $\hat\theta$, the REML estimator of $\sigma_g^2$ is approximately

$$s_g^2 = \frac{1}{nm-1}\left\{\frac{(n-1)\,s_{bg}^2}{1+(m-1)\hat\rho} + \frac{n(m-1)\,s_{wg}^2}{1-\hat\rho}\right\} \qquad (3)$$

which is approximately distributed as $\sigma_g^2\chi^2_{nm-1}/(nm-1)$. The test statistic for testing $H_0 : \mu_g = 0$ now becomes

$$t_g = \frac{\bar y_{g\cdot\cdot}}{s_g\sqrt{\{1+(m-1)\hat\rho\}/(nm)}} \qquad (4)$$

and this follows a $t_{nm-1}$ distribution under the null hypothesis. The number of degrees of freedom associated with the test statistic is more than doubled compared with Section 3.1 even in the most conservative case when there are $m = 2$ within-array replicates.
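The common correlation workflow for the balanced single-sample case can be sketched end to end. This is an illustration under stated assumptions rather than the authors' code: the consensus estimator of Equation (2) is taken to be half the log of the ratio of the pooled between-arrays to pooled within-arrays variances, the pooled variance of Equation (3) weights each component by its degrees of freedom, and the names `gene_stats` and `common_correlation_t` are hypothetical.

```python
import math

def gene_stats(y):
    """Grand mean, between-arrays variance and within-arrays variance for one gene."""
    n, m = len(y), len(y[0])
    means = [sum(row) / m for row in y]
    grand = sum(means) / n
    s2_b = m * sum((a - grand) ** 2 for a in means) / (n - 1)
    s2_w = sum((v - a) ** 2 for row, a in zip(y, means) for v in row) / (n * (m - 1))
    return grand, s2_b, s2_w

def common_correlation_t(genes):
    """genes: list of per-gene data, each n lists of m replicate log-ratios (balanced)."""
    n, m = len(genes[0]), len(genes[0][0])
    stats = [gene_stats(y) for y in genes]
    # Consensus estimate of theta from variances pooled over genes (assumed form of Equation 2).
    theta = 0.5 * math.log(sum(s[1] for s in stats) / sum(s[2] for s in stats))
    e = math.exp(2 * theta)
    rho = (e - 1) / (e + m - 1)
    t_stats = []
    for grand, s2_b, s2_w in stats:
        # Pooled variance on nm - 1 df, treating rho as known (Equation 3).
        s2 = ((n - 1) * s2_b / (1 + (m - 1) * rho)
              + n * (m - 1) * s2_w / (1 - rho)) / (n * m - 1)
        # t-statistic on nm - 1 df (Equation 4).
        t_stats.append(grand / math.sqrt(s2 * (1 + (m - 1) * rho) / (n * m)))
    return rho, t_stats
```

When only one gene is supplied the consensus correlation is that gene's own estimate, and the statistic collapses to the averaged-spot value of Equation (1). The gain comes from borrowing the correlation from many genes, so that the within-arrays variance carries information about $\sigma_g^2$.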


    4 RESULTS FOR BALANCED LINEAR MODELS
Section 3 considered only replicate arrays comparing two RNA sources. The results of Section 3 generalize easily to arbitrarily complicated microarray experiments comparing two or more RNA sources. Let $y_g$ be the vector containing the $nm$ log-ratios or log-intensities observed for gene $g$. A general microarray experiment can be represented by a linear model

$$E(y_g) = X\beta_g$$

where $X$ is a known $nm \times k$ dimensional design matrix specifying the experimental design and $\beta_g$ is a vector of $k$ regression coefficients (Yang and Speed, 2003; Smyth, 2004). In order for the linear model to be identifiable, we assume that $k < n$ and that the matrix $X$ is of full column rank. When there are $m$ replicates of each gene on each array, there will be $m$ repeated rows of the design matrix $X$ corresponding to each set of $m$ replicate spots. The covariance matrix is

$$\mathrm{var}(y_g) = \sigma_g^2 R_g$$

where $R_g$ is the block diagonal matrix with $n$ blocks equal to the $m \times m$ correlation matrix

$$\begin{pmatrix} 1 & \rho_g & \cdots & \rho_g \\ \rho_g & 1 & \cdots & \rho_g \\ \vdots & \vdots & \ddots & \vdots \\ \rho_g & \rho_g & \cdots & 1 \end{pmatrix}.$$

Let $\alpha_g = c^T\beta_g$, where $c$ is a vector of known constants, be a particular contrast or linear combination of the regression coefficients and suppose that interest lies in testing $H_0 : \alpha_g = 0$. This formulation is sufficiently general to accommodate a wide variety of microarray experiments including dye-swaps, time course experiments and factorial experiments. It is also applicable to single-channel microarray experiments for which $y_{gij}$ is a normalized version of $\log_2 I_{gij}$, where $I_{gij}$ is the measured intensity for that spot.

Generalizing from replicate arrays to the linear model causes little extra complication for the theory of Section 3. Let $\bar y_g$ be the $n$-vector of array means and let $\bar X$ be the reduced $n \times k$ dimensional design matrix in which there is only one row instead of $m$ rows for each gene by array combination. Then

$$E(\bar y_g) = \bar X\beta_g. \qquad (5)$$

To generalize the results of Section 3.2, we simply generalize the between-arrays variance $s_{bg}^2$ to be $m$ times the residual mean square which arises from fitting the linear model (5). This mean square is now on $n-k$ instead of $n-1$ degrees of freedom. Let $\hat\beta_g$ be the estimator of $\beta_g$ from fitting this model and let $\hat\alpha_g = c^T\hat\beta_g$. The estimate $\hat\beta_g$ from the reduced linear model (5) is the same as that from the full linear model for $y_g$. Note that

$$\mathrm{var}(\hat\alpha_g) = u\,\sigma_g^2\{1+(m-1)\rho_g\}/m$$

with

$$u = c^T(\bar X^T\bar X)^{-1}c.$$

The t-statistic (1) arising from the individual correlation model is now

$$t_{g,b} = \frac{\hat\alpha_g}{s_{bg}\sqrt{u/m}}$$

which is on $n-k$ degrees of freedom.

Assuming now that $\rho_g = \rho$, the common correlation estimator (2) would now be distributed as $\theta + \frac{1}{2}\log F_{G(n-k),\,Gn(m-1)}$ if the genes were independent. The pooled variance estimator (3) now becomes

$$s_g^2 = \frac{1}{mn-k}\left\{\frac{(n-k)\,s_{bg}^2}{1+(m-1)\hat\rho} + \frac{n(m-1)\,s_{wg}^2}{1-\hat\rho}\right\}$$

which is on $mn-k$ degrees of freedom. The t-statistic (4) now becomes

$$t_g = \frac{\hat\alpha_g}{s_g\sqrt{u\{1+(m-1)\hat\rho\}/m}}$$

on $mn-k$ degrees of freedom and is used to test $H_0 : \alpha_g = 0$.

It can be seen that the relative difference in degrees of freedom between $s_g$ and $s_{bg}$ can be large if $k > 1$ and especially if $k$ is not much smaller than $n$. This means that the gain in degrees of freedom of $s_g$ over $s_{bg}$ which results from assuming common correlations is especially important for larger values of $k$, i.e. for designed experiments involving a larger number of distinct RNA sources to be compared.
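The degrees-of-freedom argument can be made concrete with a short helper. It assumes, as in the balanced linear model of this section, that averaging the replicate spots leaves $n - k$ residual degrees of freedom while the common correlation approach leaves $mn - k$. The function name `residual_df` is hypothetical.

```python
def residual_df(n, m, k):
    """Residual degrees of freedom for n arrays, m replicate spots and k coefficients."""
    if not 0 < k < n:
        raise ValueError("the linear model requires 0 < k < n")
    return {"averaged": n - k, "common_correlation": m * n - k}

# Example: six arrays with duplicate spots and an increasing number of coefficients.
df_table = {k: residual_df(6, 2, k) for k in (1, 2, 4)}
```

For six arrays with duplicate spots and $k = 4$ coefficients, averaging leaves only 2 residual degrees of freedom whereas the common correlation model leaves 8, which illustrates why the gain matters most when $k$ is not much smaller than $n$.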


    5 RESULTS FOR UNBALANCED MODELS
Suppose that now there are spot-specific weights $w_{gij}$ associated with the observations so that

$$\mathrm{var}(y_{gij}) = \sigma_g^2/w_{gij}.$$

The weights may arise from quality assessment or quality filtering of the spots (Smyth and Speed, 2003). In general the weights are non-negative but may be permitted to take value zero corresponding to log-ratios or log-intensities which are missing. The linear model is as before

$$E(y_g) = X\beta_g$$

but the covariance matrix is now

$$\mathrm{var}(y_g) = \sigma_g^2\,W_g^{-1/2}R_gW_g^{-1/2}$$

where $W_g$ is the diagonal matrix of the weights $w_{gij}$. Unlike in Section 4 the estimator of $\beta_g$ is now somewhat dependent on the estimated value for $\rho_g$. This produces an unbalanced statistical model in which there are no non-iterative formulae for the REML estimators of $\sigma_g$ or $\rho_g$. On the other hand, iterative computational procedures are readily available to compute the numerical REML estimates $\hat\sigma_g$ and $\hat\rho_g$ for any given dataset (Pinheiro and Bates, 2000).

Assume now that $\rho_g = \rho$. Even assuming independence between the genes, exact REML estimation of the common correlation would require iterative computation using the entire dataset. This is at best very unattractive computationally and would in most cases involve prohibitive memory storage requirements. An alternative and much easier strategy is to estimate the common correlation $\rho$ by combining the individual correlation estimators $\hat\rho_g$. The fact that this estimation method is not fully efficient is not important when the number of genes is large. Let

$$\hat\theta_g = \frac{1}{2}\log\frac{1+(m-1)\hat\rho_g}{1-\hat\rho_g}$$

where $\hat\rho_g$ is the REML estimator of $\rho_g$ from the data for gene $g$. By analogy with the balanced case we can conclude that

$$E(\hat\theta_g) \approx \theta + b(d_{bg}, d_{wg})$$

where $d_{bg}$ is the between-array degrees of freedom and $d_{wg}$ is the within-array degrees of freedom for gene $g$. A combined estimator of $\theta$ is

$$\tilde\theta = \frac{1}{G}\sum_{g=1}^{G}\left\{\hat\theta_g - b(d_{bg}, d_{wg})\right\}.$$

This estimator is consistent for $\theta$ as $n \to \infty$ regardless of the dependence structure between the genes and is consistent as $G \to \infty$ given weak assumptions on the dependence structure. The estimator of $\rho$ is recovered by

$$\tilde\rho = \frac{\exp(2\tilde\theta)-1}{\exp(2\tilde\theta)+m-1}.$$
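Having estimated the common correlation, the regression coefficients can be estimated by weighted least squares of $y_g$ on $X$ with weight matrix

$$\hat V_g^{-1} = W_g^{1/2}\tilde R^{-1}W_g^{1/2}$$

where $\tilde R$ is equal to $R_g$ with $\tilde\rho$ substituted for $\rho_g$ and $W_g$ is the diagonal matrix of the weights $w_{gij}$. The weighted least squares estimator is

$$\hat\beta_g = (X^T\hat V_g^{-1}X)^{-1}X^T\hat V_g^{-1}y_g$$

and the approximate REML estimator of $\sigma_g^2$ is the residual variance

$$s_g^2 = \frac{(y_g - X\hat\beta_g)^T\hat V_g^{-1}(y_g - X\hat\beta_g)}{nm-k}.$$

The t-statistic for testing $H_0 : \alpha_g = 0$ is

$$t_g = \frac{\hat\alpha_g}{s_g\sqrt{u_g}}$$

where

$$u_g = c^T(X^T\hat V_g^{-1}X)^{-1}c$$

is the unscaled variance of $\hat\alpha_g = c^T\hat\beta_g$. The t-statistic is on $nm-k$ degrees of freedom. If $w_{gij} = 1$ then $s_g^2$ and $t_g$ reduce to the same forms as in the balanced case in Section 4 apart from differences in the estimation of $\rho$, specifically the replacement of $\hat\theta$ with $\tilde\theta$.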

Note that the t-statistic $t_g$ is not sensitive to small changes in the estimated correlation $\tilde\rho$, since the estimated residual variance $s_g^2$ will tend to compensate. This reassures us that the common correlation model will not lead to misleading results if it fails to be exactly correct for some genes.

The efficiency of $\tilde\theta$ relative to the REML estimator $\hat\theta$ can be computed for the balanced case under the assumption of independence between genes. When $G = 1000$ and $n = m = 2$, the standard deviation of $\tilde\theta$ is 0.029, showing that its relative efficiency compared to the REML estimator is about 30%. This is the worst case; efficiency increases with the number of arrays. For example, the efficiency is 70% if there are $n = 6$ arrays. For the purposes of the methodology of this paper, these are acceptable efficiencies.
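The strategy of combining genewise estimates can be sketched for the balanced case, where each genewise estimate on the transformed scale is half the log of the ratio of the between-arrays to within-arrays variance. The sketch assumes a simple unweighted average of bias-corrected genewise values, which is one plausible reading of the combined estimator and not necessarily the exact form used in limma. The `digamma` helper is a hand-written approximation, and the name `combined_rho` is hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def bias(d1, d2):
    """Mean of 0.5 * log(F) for an F variate on (d1, d2) degrees of freedom."""
    return 0.5 * (digamma(d1 / 2) - math.log(d1 / 2) - digamma(d2 / 2) + math.log(d2 / 2))

def combined_rho(s2_between, s2_within, n, m):
    """Average bias-corrected genewise estimates on the theta scale, then back-transform."""
    d_b, d_w = n - 1, n * (m - 1)  # balanced single-sample degrees of freedom
    thetas = [0.5 * math.log(b / w) - bias(d_b, d_w)
              for b, w in zip(s2_between, s2_within)]
    theta = sum(thetas) / len(thetas)
    e = math.exp(2 * theta)
    return (e - 1) / (e + m - 1)
```

Averaging on the transformed scale keeps every genewise value on the whole real line, so a few genes with extreme variance ratios cannot push the combined correlation outside its admissible range.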


    6 COMBINING WITH EMPIRICAL BAYES MODERATION
A number of authors have shown that one can improve on the use of t-statistics for assessing differential expression in microarray experiments by using appropriate shrinkage methods to moderate the genewise sample variances (Baldi and Long, 2001; Efron et al., 2001; Tusher et al., 2001; Lönnstedt and Speed, 2002; Broberg, 2003; Smyth, 2004). We show here that the empirical Bayes method of Smyth (2004) combines in a natural way with the methods of this paper.

In the separate correlation model of Section 3.1, an inverse Gamma prior would be applied to the between-array variances yielding posterior variances

$$\tilde s_{bg}^2 = \frac{d_0s_0^2 + (n-1)\,s_{bg}^2}{d_0 + n - 1}$$

where $s_0^2$ is the prior value and $d_0$ the prior degrees of freedom. Replacing the sample variance in Equation (1) with the posterior variance produces the moderated t-statistic

$$\tilde t_{g,b} = \frac{\bar y_{g\cdot\cdot}}{\tilde s_{bg}/\sqrt{nm}}$$

which follows a t-distribution on $n - 1 + d_0$ degrees of freedom if $\mu_g = 0$ (Smyth, 2004). In the common correlation model of Section 3.2 an inverse Gamma prior would be applied to $\sigma_g^2$ yielding posterior variances

$$\tilde s_g^2 = \frac{d_0s_0^2 + (nm-1)\,s_g^2}{d_0 + nm - 1}.$$

Replacing the sample variance in Equation (4) with the posterior variance produces the moderated t-statistic

$$\tilde t_g = \frac{\bar y_{g\cdot\cdot}}{\tilde s_g\sqrt{\{1+(m-1)\hat\rho\}/(nm)}}$$

which is t-distributed on $nm - 1 + d_0$ degrees of freedom if $\mu_g = 0$. The same technique could be applied to the individual and common correlation models of Sections 4 and 5. In Section 5 an inverse Gamma prior for $\sigma_g^2$ would lead to posterior variances

$$\tilde s_g^2 = \frac{d_0s_0^2 + (nm-k)\,s_g^2}{d_0 + nm - k}$$

and to moderated t-statistics

$$\tilde t_g = \frac{\hat\alpha_g}{\tilde s_g\sqrt{u_g}}$$

on $nm - k + d_0$ degrees of freedom, where $u_g$ is the unscaled variance of $\hat\alpha_g$. The use of empirical Bayes results in effect in a further $d_0$ degrees of freedom for the estimation of the genewise sample variances, where $d_0$ is estimated from the data. The common correlation methodology proposed in this paper and the use of empirical Bayes to smooth the variances are complementary techniques in the sense that using both techniques together results in the greatest possible increase in the effective degrees of freedom for estimating the variances.

Note that empirical Bayes smoothing could in principle be applied to the correlations as well as the variances. In fact, smoothing the between and within variances $s_{bg}^2$ and $s_{wg}^2$ independently leads immediately to smoothed correlation estimators from

$$\tilde\theta_g = \frac{1}{2}\log\frac{\tilde s_{bg}^2}{\tilde s_{wg}^2}.$$

This, however, turns out to be equivalent to averaging the log-ratios over replicate spots and then applying smoothing to the variances, i.e. empirical Bayes smoothing of the correlations does not add more information over that of smoothing the variances alone. So it appears that to get an extra benefit it is necessary to smooth the correlations to a greater degree than the variances, e.g. by setting them equal as in this paper.
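The moderation step described in this section amounts to a weighted average of a prior variance and the genewise sample variance, with weights given by their degrees of freedom. The sketch below takes the prior value and prior degrees of freedom as inputs. In practice they would be estimated from the ensemble of genes (Smyth, 2004), a step not shown here. The function names are hypothetical.

```python
import math

def posterior_variance(s2, df, s2_prior, d0):
    """Shrink a sample variance on df degrees of freedom towards the prior value s2_prior."""
    return (d0 * s2_prior + df * s2) / (d0 + df)

def moderated_t(estimate, s2, df, unscaled_var, s2_prior, d0):
    """Return the moderated t-statistic and its degrees of freedom (df + d0)."""
    s2_post = posterior_variance(s2, df, s2_prior, d0)
    return estimate / math.sqrt(s2_post * unscaled_var), df + d0
```

Setting `d0 = 0` recovers the ordinary t-statistic, while a large `d0` pulls every genewise variance towards the prior value. For the common correlation model the input `df` would be $nm - k$ rather than $n - k$, so the two sources of extra degrees of freedom add together.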


    7 RESULTS WITH SPIKE-IN DATA
The methodology is demonstrated on a set of 26 microarrays for which the differential expression status of a set of control spots is known. Figure 1 shows boxplots of t-statistics for the ScoreCard series of control spots. There are 12 t-statistics in each box. The grey filled boxplots, on the left of each pair of boxplots, show statistics computed using common correlations while the white boxplots on the right of each pair show statistics computed by averaging the duplicate spots.



Fig. 1 Boxplots of Z-score equivalents of ordinary t-statistics (top) and of moderated t-statistics (bottom) for different types of spike-in spot-pairs. The grey filled boxes are for statistics based on estimated between-replicate correlations while the unfilled (white) boxes are for statistics based on averaging the replicate observations. Statistics are calculated from the whole series of 26 arrays. Control spots labeled Calib1–10 are non-differentially expressed calibration spots at increasing dilutions and therefore decreasing intensities. Control spots labeled 3DL and 3UL are ratio controls designed to be 3-fold downregulated and 3-fold upregulated, respectively. Control spots labeled 3DH and 3UH are similar but at high rather than low intensity. Control spots labeled 10DL, 10UL, 10DH and 10UH are similar but are 10-fold up or downregulated.

 
The t-statistics produced by averaging the duplicate spots are on fewer degrees of freedom than those produced by the common correlation method, meaning that they are not directly comparable on the basis of magnitudes alone. One way to compare the t-statistics would be to compute P-values. The vertical axis in the plot actually shows Z-score equivalents of the t-statistics, i.e. the standard normal deviate which has the same P-value as has the t-statistic. The Z-scores put t-statistics with different degrees of freedom on the same scale. Comparing Z-scores is equivalent to comparing P-values but the Z-scores are better suited to graphical presentation.
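
The conversion to Z-score equivalents described above can be sketched in a few lines. This is only an illustration, not the code used for the figures: the function names `t_two_sided_p` and `z_equivalent` are hypothetical, the degrees of freedom are assumed to be a positive integer, and the tail probability uses the classical finite trigonometric series for Student's t rather than a statistics library.

```python
import math
from statistics import NormalDist


def t_two_sided_p(t, df):
    """Two-sided P-value of a t-statistic on an integer number of degrees of freedom."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # Odd df: series in cos(theta), cos(theta)^3, ..., cos(theta)^(df-2).
        term, total = c, 0.0
        for k in range(1, (df - 1) // 2 + 1):
            total += term
            term *= (2 * k) / (2 * k + 1) * c * c
        central = (2.0 / math.pi) * (theta + s * total)
    else:
        # Even df: series in 1, cos(theta)^2, ..., cos(theta)^(df-2).
        term, total = 1.0, 0.0
        for k in range(1, df // 2 + 1):
            total += term
            term *= (2 * k - 1) / (2 * k) * c * c
        central = s * total
    # central is P(|T| <= |t|); guard against rounding the tail to zero.
    return max(1.0 - central, 1e-300)


def z_equivalent(t, df):
    """Standard normal deviate with the same two-sided P-value as t, keeping the sign of t."""
    p = t_two_sided_p(t, df)
    z = -NormalDist().inv_cdf(p / 2.0)
    return math.copysign(z, t)


# The two-sided 5% points of t on 1, 2 and 25 df all map to a Z-score close to 1.96.
for t_value, df in [(12.706, 1), (4.3027, 2), (2.0595, 25)]:
    print(df, round(z_equivalent(t_value, df), 3))
```

The same t-statistic therefore corresponds to a larger Z-score when it rests on more degrees of freedom, which is why the two methods are compared on the Z scale rather than on raw magnitudes.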

An ideal test statistic will show Z-score values which are randomly distributed about zero with as little variability as possible for the calibration spots and Z-scores as far from zero as possible for the ratio controls. The greater the separation between the calibration values and the ratio values, the better the performance of the statistic. The plot shows that the t-statistics computed assuming common correlations give much larger absolute Z-scores for the differentially expressed genes while maintaining a similar null distribution for the non-differentially expressed spots. This shows that the common correlation t-statistics have greater power for detecting differential expression while producing no more false positives on average.

The bottom panel of Figure 1 shows the results with empirical Bayes smoothing of the sample variances while the top panel shows results with ordinary t-statistics. The relatively large number of arrays here means that the sample variances are fairly reliable so that use of empirical Bayes changes the picture only slightly.

The minimum number of arrays for which t-statistics can be computed is two, this being the minimum number to return a degree of freedom for error when the duplicate spot values are averaged. In order to examine this extreme situation, we separated the 26 arrays into 13 pairs of arrays and computed t-statistics for each pair. Figure 2 shows the results. Each boxplot in Figure 2 represents 156 values, i.e. 12 values for each of the 13 pairs. The fact that t-statistics are computed from only 2 arrays instead of all 26 means that they are less able to distinguish which spots are differentially expressed; however, the t-statistics using common correlations do markedly better. As before, the common correlation t-statistics have greater power to detect differential expression while having a similar null distribution (Fig. 2, top panel). The relative gain of the common correlation method compared with averaging the duplicate spots is perhaps even greater here with n = 2 than with the larger sample size.



Fig. 2 Boxplots of Z-score equivalents of ordinary t-statistics (top) and of moderated t-statistics (bottom) for different types of spike-in spot-pairs. The grey filled boxes are for statistics based on estimated between-replicate correlations while the unfilled (white) boxes are for statistics based on averaging the replicate observations. Statistics are calculated from two arrays. The boxes include statistics from 13 such sets of two arrays.

 
With only two arrays for each t-statistic, the sample variances are rather unreliably estimated. When the duplicate spots are averaged, the sample variances have in fact only one degree of freedom. In this situation, empirical Bayes smoothing can be expected to have a large impact on the reliability of the statistics. The bottom panel of Figure 2 shows the same results as the top panel but with empirical Bayes smoothing of the variances. The empirical Bayes method greatly improves the performance of both statistics, with and without common correlations, and the separation of calibration and ratio values is improved relative to the top panel. The comparison between common and individual correlations is no longer clear cut because the Z-scores for ratio controls without common correlations are so variable, sometimes larger and sometimes smaller than the statistics with common correlations. The important observation here is that the common correlation (grey) boxes are noticeably more compact than the white boxes for all intensities of calibration spots, i.e. the rate of false discoveries is reduced. Furthermore, assuming common correlations also gives larger median Z-scores for all types of ratio controls except for 10DL, meaning that the common correlation method gives greater power in most cases as well as better control of type I errors.
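
Why smoothing matters so much on one degree of freedom can be seen in a small numerical sketch. It assumes the moderated variance of Smyth (2004), a weighted average of a prior value and the sample variance with the degrees of freedom as weights. The prior values `D0` and `S0_SQ`, the two example genes and the function names are hypothetical; in practice the prior values are estimated from the data.

```python
import math


def moderated_variance(s2, d, s0_sq, d0):
    """Shrink the sample variance s2 (d df) towards the prior value s0_sq (d0 df)."""
    return (d0 * s0_sq + d * s2) / (d0 + d)


def ordinary_t(beta, s2, v):
    """Ordinary t-statistic for an estimate beta with variance s2 * v."""
    return beta / math.sqrt(s2 * v)


def moderated_t(beta, s2, d, v, s0_sq, d0):
    """Moderated t-statistic and its degrees of freedom (d0 + d)."""
    s2_tilde = moderated_variance(s2, d, s0_sq, d0)
    return beta / math.sqrt(s2_tilde * v), d0 + d


# Hypothetical prior values, for illustration only.
D0, S0_SQ = 4.0, 0.04

# Two arrays with duplicates averaged: d = 1 residual df, v = 1/2 for the mean log-ratio.
genes = [
    ("null gene, tiny variance by chance", 0.10, 0.0001),
    ("ratio control", 1.50, 0.09),
]
for name, beta, s2 in genes:
    t_ord = ordinary_t(beta, s2, 0.5)
    t_mod, df = moderated_t(beta, s2, 1, 0.5, D0, S0_SQ)
    print(f"{name}: ordinary t = {t_ord:.2f} (1 df), moderated t = {t_mod:.2f} ({df:.0f} df)")
```

With these made-up numbers the ordinary t-statistic ranks the null gene above the ratio control purely because its one-degree-of-freedom variance happens to be tiny, while the moderated statistic reverses that ordering.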


    8 DISCUSSION
 
This paper shows that setting the between-replicate correlation constant across genes is a useful strategy. Results using spike-in probes show that the statistics assuming common correlations give clearly improved assessment of differential expression. Any bias which is introduced by assuming correlations to be equal seems to be more than offset by an increase in the precision with which the genewise variances are estimated. When the number of arrays is small, the spike-in results were further improved by empirical Bayes smoothing of the sample variances as in Smyth (2004).

The authors have applied the methodology to a variety of microarray experiments with arrays printed in several different laboratories with several different clone libraries. Our experience has been that correlations between side-by-side duplicates are estimated typically in the range 0.7–0.9, suggesting that side-by-side duplicates share about half of their variability as measured by squared correlation. Correlations between replicates in the top and bottom halves of the array are typically estimated in the range 0.5–0.6, suggesting that duplicates at the maximum distance apart still share about a quarter of their variability. These observations are consistent with the idea that spots which are further apart should be less highly correlated.

In most experiments the genewise correlation estimates are found to be too variable across genes to be compatible with a common true correlation and the theoretical scaled Fnk,n sampling variability of the estimates (data not shown). In other words, the assumption of constant correlation across genes does not appear to be strictly tenable. On the other hand, the between and within sample variances have been found to be positively associated, meaning that the correlation estimates are less variable, relative to the theoretical F-distribution, than are the sample variances relative to their theoretical χ² distribution. So the assumption of constant correlation appears to be valid in practice in the relative sense that the correlations are more nearly constant than are the variances themselves.

The effectiveness of the common correlation model seems to be due to three main characteristics. First, the estimated common correlation is very stable, it being a consensus estimator derived from a large number of genes. This stability results in a favorable variance–bias trade-off, especially for small data sets. Second, the correlation is a nuisance parameter rather than a quantity of primary interest. It has been noted that the genewise t-statistics are not sensitive to small changes in the correlation estimate, so it is not necessary to track small differences in the genewise correlations provided that the common correlation estimate is broadly correct. Third, the common correlation model causes genes with poor quality data to be downweighted. Good quality data seems to give rise to consistently high correlations between replicate spots. Even for arrays with a lot of poor quality spots, the common correlation is generally large and positive. Those genes which do give rise to low or even negative correlations seem to do so most often because of poor quality data, e.g. artifacts on the surface of the arrays which affect only one of a set of replicate spots. Holding the correlation fixed forces the estimated residual variance for these genes to be relatively large to reflect the disagreement between the replicate spots. This means that statistical significance for these genes is downweighted, a phenomenon which seems conservative and desirable. Allowing each gene to estimate its own correlation would cause disagreements between replicate spots to be disregarded.
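
The downweighting effect can be illustrated with a toy calculation. The sketch below is not the limma implementation. It assumes a single-sample design (one mean log-ratio per gene), duplicate spots on each array, and generalized least squares with the within-array correlation held at a fixed value `rho`. The function names and the two example genes are hypothetical.

```python
import math


def t_averaging(pairs):
    """t-statistic computed from the array averages of duplicate spots (n - 1 df)."""
    n = len(pairs)
    means = [(a + b) / 2.0 for a, b in pairs]
    mu = sum(means) / n
    var = sum((m - mu) ** 2 for m in means) / (n - 1)
    return mu / math.sqrt(var / n)


def t_fixed_correlation(pairs, rho):
    """t-statistic from generalized least squares with the duplicate correlation fixed at rho (2n - 1 df)."""
    n = len(pairs)
    mu = sum(a + b for a, b in pairs) / (2.0 * n)  # GLS estimate equals the overall mean here
    quad = 0.0
    for a, b in pairs:
        r1, r2 = a - mu, b - mu
        # Residual quadratic form r' R^{-1} r for R = [[1, rho], [rho, 1]].
        quad += (r1 * r1 + r2 * r2 - 2.0 * rho * r1 * r2) / (1.0 - rho * rho)
    sigma2 = quad / (2 * n - 1)
    se = math.sqrt(sigma2 * (1.0 + rho) / (2.0 * n))
    return mu / se


# Two hypothetical genes with identical array averages on four arrays.
consistent = [(1.0, 1.1), (0.9, 1.0), (1.2, 1.1), (1.0, 0.9)]
discordant = [(1.55, 0.55), (0.45, 1.45), (1.65, 0.65), (0.45, 1.45)]

for name, gene in [("consistent duplicates", consistent), ("discordant duplicates", discordant)]:
    print(name, round(t_averaging(gene), 2), round(t_fixed_correlation(gene, 0.8), 2))
```

Averaging gives both genes the same t-statistic because it never sees the disagreement within arrays. With the correlation fixed at a high value, the disagreement inflates the residual variance of the discordant gene and its t-statistic drops sharply.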

The formal calculations in this paper have assumed normality of the expression log-ratios as well as independence and constant variances across arrays for each gene. There are several reasons to expect the methodology to remain useful even for data which deviate from normality. First, apart from the bias correction b(f1,f2) which is relatively small in magnitude, the estimators derived here remain consistent given only the first and second moments of the response distribution. Second, the estimation procedures can be modified to make them more resistant to outliers. A simple method which has proved effective is to estimate θ not from the estimator given earlier in the paper but from a robust mean of the genewise estimates. This has the effect of ignoring a small proportion of aberrant correlation estimates. The default estimator used in the limma software package is the trimmed mean removing 15% of the values from each tail.
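
A trimmed mean of the kind just described is straightforward to sketch. The function below is an illustration, not the limma code: the name `trimmed_mean` and the example values are hypothetical, and the number of values dropped from each tail is assumed to be the trimming fraction times the sample size, rounded down.

```python
import math


def trimmed_mean(values, trim=0.15):
    """Mean after dropping floor(trim * n) values from each tail."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    xs = sorted(values)
    k = int(math.floor(trim * len(xs)))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


# Hypothetical genewise estimates: mostly near 0.8, with a few aberrant values.
estimates = [0.8] * 14 + [-0.9, -0.5, 0.95, 0.99, 0.85, 0.75]

print("plain mean:  ", round(sum(estimates) / len(estimates), 3))
print("trimmed mean:", round(trimmed_mean(estimates, trim=0.15), 3))
```

Here the two strongly negative values pull the plain mean well below the bulk of the estimates, while the trimmed mean ignores them.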

Third, and perhaps most importantly, the basic purpose of differential expression analyses for microarray data is to rank the genes in terms of evidence for differential expression (Smyth et al., 2003). An effective ranking, which reliably ranks the truly differentially expressed genes near the top, is even more important than the ability to decide which genes are significantly differentially expressed. It is therefore more important that the P-values for different genes are correctly ordered than that the P-values have the correct uniform null distribution. On this measure, the common correlation method has clear advantages over alternatives even when the underlying model is not exactly correct. It is more effective than averaging the replicate spots because it takes into account deviations between replicates when estimating the precision of the data for each gene. Compared with rank-based or permutation tests, the parametric method described here has the advantage of greater resolution, i.e. lack of granularity in the estimators and P-values, allowing genes to be more finely graduated for small or moderate sized datasets.

For the reasons explained above, the application of the methods described here is not limited to high quality data sets for which normality might be reasonable nor to very large data sets for which rigorous checking of the distributional assumptions might be feasible. In fact the benefits, relative to alternative methods such as averaging the replicate spots or simply ranking genes on fold change, may be most pronounced in experiments with very few arrays or with poor quality data. This expectation is borne out by the spike-in experimental data with n = 2.

As with any statistical modelling technique, it is assumed that appropriate quality assessment of the data has been done before application of the method proposed here. The method is capable of incorporating spot and array quality weights which might arise from such quality assessment, as described earlier in the paper.

The method used in this paper differs from previous work on empirical Bayes or shrinkage estimators in that a suitably chosen parameter is simply set equal across genes. The idea works here because the correlation parameter is of secondary interest from an inferential point of view and because it is relatively stable across genes. The technique is applicable to other situations involving mixed model analyses of microarray data such as those with technical as well as biological replication or the separate channel analyses described by Jin et al. (2001) and Wolfinger et al. (2001). These situations have within-block or within-spot correlations for which consensus estimators might be used across genes.

The methods described in this paper are implemented in the software package limma for the R computing environment (Smyth et al., 2004, http://bioinf.wehi.edu.au/limma). Limma is part of the Bioconductor project at http://www.bioconductor.org (Gentleman et al., 2004).


    Acknowledgments
 
The authors thank Lisa Martin, Melanie O'Keefe and Cathy Jensen of the Australian Genome Research Facility for printing the microarrays used in the study described in the paper. Thanks are due to Terry Speed and Matthew Ritchie for valuable discussions. This research was supported by NHMRC Grants 257501 and 257529.

Received on May 9, 2004; revised on December 26, 2004; accepted on January 7, 2005

    REFERENCES
 

    Andrews, J., Bouffard, G.G., Cheadle, C., Lu, J., Becker, K.G., Oliver, B. (2000) Gene discovery using computational and microarray analysis of transcription in the Drosophila melanogaster testis. Genome Res., 10, 2030–2043.

    Baggerly, K.A., Coombes, K.R., Hess, K.R., Stivers, D.N., Abruzzo, L.V., Zhang, W. (2001) Identifying differentially expressed genes in cDNA microarray experiments. J. Comput. Biol., 8, 639–659.

    Baldi, P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, 509–519.

    Beissbarth, T., Fellenberg, K., Brors, B., Arribas-Prat, R., Boer, J., Hauser, N.C., Scheideler, M., Hoheisel, J.D., Schutz, G., Poustka, A., Vingron, M. (2000) Processing and quality control of DNA array hybridization data. Bioinformatics, 16, 1014–1022.

    Berwanger, B., Hartmann, O., Bergmann, E., Bernard, S., Nielsen, D., Krause, M., Kartal, A., Flynn, D., Wiedemeyer, R., Schwab, M., Schafer, H., Christiansen, H., Eilers, M. (2002) Loss of a FYN-regulated differentiation and growth arrest pathway in advanced stage neuroblastoma. Cancer Cell, 2, 377–386.

    Boer, J.M., Huber, W.K., Sültmann, H., Wilmer, F., von Heydebreck, A., Haas, S., Korn, B., Gunawan, B., Vente, A., Fuzesi, L., Vingron, M., Poustka, A. (2001) Identification and classification of differentially expressed genes in renal cell carcinoma by expression profiling on a global human 31,500-element cDNA array. Genome Res., 11, 1861–1870.

    Broberg, P. (2003) Statistical methods for ranking differentially expressed genes. Genome Biol., 4, R41.

    Buckley, M.J. (2000) Spot User's Guide. CSIRO Mathematical and Information Sciences, Sydney, Australia.

    Causton, H.C., Quackenbush, J., Brazma, A. (2003) Microarray Gene Expression Data Analysis: A Beginner's Guide. Blackwell Publishing, Malden, MA.

    Efron, B. and Tibshirani, R. (2002) Empirical Bayes methods and false discovery rates for microarrays. Genet. Epidemiol., 23, 70–86.

    Efron, B., Tibshirani, R., Storey, J.D., Tusher, V. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc., 96, 1151–1160.

    Fan, J., Tam, P., Woude, G.V., Ren, Y. (2004) Normalization and analysis of cDNA microarrays using within-array replications applied to neuroblastoma cell response to a cytokine. Proc. Natl Acad. Sci. USA, 101, 1135–1140.

    Firestein, G.S. and Pisetsky, D.S. (2002) DNA microarrays: boundless technology or bound by technology? Guidelines for studies using microarray technology. Arthritis Rheum., 46, 859–861.

    Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y., Zhang, J. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.

    Hoffmann, K.F., Johnston, D.A., Dunne, D.W. (2002) Identification of Schistosoma mansoni gender-associated gene transcripts by cDNA microarray profiling. Genome Biol., 3, Research0041.

    Jenssen, T.K., Langaas, M., Kuo, W.P., Smith-Sorensen, B., Myklebost, O., Hovig, E. (2002) Analysis of repeatability in spotted cDNA microarrays. Nucleic Acids Res., 30, 3235–3244.

    Jin, W., Riley, R.M., Wolfinger, R.D., White, K.P., Passador-Gurgel, G., Gibson, G. (2001) The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat. Genet., 29, 389–395.

    Kaynak, B., von Heydebreck, A., Mebus, S., Seelow, D., Hennig, S., Vogel, J., Sperling, H.P., Pregla, R., Alexi-Meskishvili, V., Hetzer, R., Lange, P.E., Vingron, M., Lehrach, H., Sperling, S. (2003) Genome-wide array analysis of normal and malformed human hearts. Circulation, 107, 2467–2474.

    Kendziorski, C.M., Newton, M.A., Lan, H., Gould, M.N. (2003) On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med., 22, 3899–3914.

    König, R., Baldessari, D., Pollet, N., Niehrs, C., Eils, R. (2004) Reliability of gene expression ratios for cDNA microarrays in multiconditional experiments with a reference design. Nucleic Acids Res., 32, e29.

    Lönnstedt, I. and Speed, T. (2002) Replicated microarray data. Stat. Sinica, 12, 31–46.

    Lyne, R., Burns, G., Mata, J., Penkett, C.J., Rustici, G., Chen, D., Langford, C., Vetrie, D., Bahler, J. (2003) Whole-genome microarrays of fission yeast: characteristics, accuracy, reproducibility, and processing of array data. BMC Genomics, 4, 27.

    Newton, M.A., Kendziorski, C.M., Richmond, C.S., Blattner, F.R., Tsui, K.W. (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol., 8, 37–52.

    Newton, M.A., Noueiry, A., Sarkar, D., Ahlquist, P. (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5, 155–176.

    Nguyen, D.V., Arpat, A.B., Wang, N., Carroll, R.J. (2002) DNA microarray experiments: biological and technological aspects. Biometrics, 58, 701–717.

    (Eds.) (2003) The Analysis of Gene Expression Data. Statistics for Biology and Health. Springer-Verlag, NY.

    Pinheiro, J.C. and Bates, D.M. (2000) Mixed-effects Models in S and S-PLUS. Statistics and Computing. Springer, NY.

    Searle, S.R., Casella, G., McCulloch, C.E. (1992) Variance Components. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, NY.

    Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol., 3, Article 3.

    Smyth, G.K. and Speed, T. (2003) Normalization of cDNA microarray data. Methods, 31, 265–273.

    Smyth, G.K., Yang, Y.H., Speed, T. (2003) Statistical issues in cDNA microarray data analysis. Methods Mol. Biol., 224, 111–136.

    Smyth, G.K., Thorne, N., Wettenhall, J. (2004) Limma: Linear Models for Microarray, User's Guide.

    Speed, T.P. (Ed.) (2003) Statistical Analysis of Gene Expression Microarray Data. Chapman and Hall/CRC Press, Boca Raton, FL.

    Tilstone, C. (2003) DNA microarrays: vital statistics. Nature, 424, 610–612.

    Tran, P.H., Peiffer, D.A., Shin, Y., Meek, L.M., Brody, J.P., Cho, K.W. (2002) Microarray optimizations: increasing spot accuracy and automated identification of true microarray signals. Nucleic Acids Res., 30, e54.

    Tseng, G.C., Oh, M.K., Rohlin, L., Liao, J.C., Wong, W.H. (2001) Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res., 29, 2549–2557.

    Tusher, V.G., Tibshirani, R., Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.

    Wang, X., Ghosh, S., Guo, S.W. (2001) Quantitative quality control in microarray image processing and data acquisition. Nucleic Acids Res., 29, E75-5.

    Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., Paules, R.S. (2001) Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol., 8, 625–637.

    Yang, Y.H. and Speed, T.P. (2003) Design and analysis of comparative microarray experiments. In Speed, T.P. (Ed.), Statistical Analysis of Gene Expression Microarray Data. Chapman and Hall/CRC Press, FL, pp. 35–91.

    Yang, Y.H., Dudoit, S., Luu, P., Speed, T.P. (2001) Normalization for cDNA microarray data. In Bittner, M.L., Chen, Y., Dorsel, A.N., Dougherty, E.R. (Eds.), Microarrays: Optical Technologies and Informatics. Volume 4266 of Proceedings of SPIE, pp. 141–152.

    Yang, I.V., Chen, E., Hasseman, J.P., Liang, W., Frank, B.C., Wang, S., Sharov, V., Saeed, A.I., White, J., Li, J., Lee, N.H., Yeatman, T.J., Quackenbush, J. (2002) Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol., 3, Research 0062.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
EndocrinologyHome page
B. Morte, D. Diez, E. Auso, M. M. Belinchon, P. Gil-Ibanez, C. Grijota-Martinez, D. Navarro, G. Morreale de Escobar, P. Berbel, and J. Bernal
Thyroid Hormone Regulation of Gene Expression in the Developing Rat Fetal Cerebral Cortex: Prominent Role of the Ca2+/Calmodulin-Dependent Protein Kinase IV Pathway
Endocrinology, February 1, 2010; 151(2): 810 - 820.
[Abstract] [Full Text] [PDF]


Home page
BloodHome page
N. Droin, A. Jacquel, J.-B. Hendra, C. Racoeur, C. Truntzer, D. Pecqueur, N. Benikhlef, M. Ciudad, L. Guery, V. Jooste, et al.
Alpha-defensins secreted by dysplastic granulocytes inhibit the differentiation of monocytes in chronic myelomonocytic leukemia
Blood, January 7, 2010; 115(1): 78 - 88.
[Abstract] [Full Text] [PDF]


Home page
MicrobiologyHome page
W. Vongsangnak, M. Salazar, K. Hansen, and J. Nielsen
Genome-wide analysis of maltose utilization and regulation in aspergilli
Microbiology, December 1, 2009; 155(12): 3893 - 3902.
[Abstract] [Full Text] [PDF]


Home page
Hum Mol GenetHome page
A. J. Jasinska, S. Service, O.-w. Choi, J. DeYoung, O. Grujic, S.-y. Kong, M. J. Jorgensen, J. Bailey, S. Breidenthal, L. A. Fairbanks, et al.
Identification of brain transcriptional variation reproduced in peripheral blood: an approach for mapping brain expression traits
Hum. Mol. Genet., November 15, 2009; 18(22): 4415 - 4427.
[Abstract] [Full Text] [PDF]


Home page
J Exp BotHome page
I. Ginzberg, G. Barel, R. Ophir, E. Tzin, Z. Tanami, T. Muddarangappa, W. de Jong, and E. Fogelman
Transcriptomic profiling of heat-stress response in potato periderm
J. Exp. Bot., November 1, 2009; 60(15): 4411 - 4421.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
X.-Q. Xia, M. McClelland, S. Porwollik, W. Song, X. Cong, and Y. Wang
WebArrayDB: cross-platform microarray data analysis and public data repository
Bioinformatics, September 15, 2009; 25(18): 2425 - 2429.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
J. M. Peters, R. A. Mooney, P. F. Kuan, J. L. Rowland, S. Keles, and R. Landick
Rho directs widespread termination of intragenic and stable RNA transcription
PNAS, September 8, 2009; 106(36): 15406 - 15411.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
Y. Benita, H. Kikuchi, A. D. Smith, M. Q. Zhang, D. C. Chung, and R. J. Xavier
An integrative genomics approach identifies Hypoxia Inducible Factor-1 (HIF-1)-target genes that form the core response to hypoxia
Nucleic Acids Res., August 1, 2009; 37(14): 4587 - 4602.
[Abstract] [Full Text] [PDF]


Home page
CSH ProtocolsHome page
Y. Wang, M. McClelland, and X.-Q. Xia
Analyzing Microarray Data Using WebArray
CSH Protocols, August 1, 2009; 2009(8): pdb.prot5260 - pdb.prot5260.
[Abstract] [Full Text]


Home page
J. Biol. Chem.Home page
B. Schade, T. Rao, N. Dourdin, R. Lesurf, M. Hallett, R. D. Cardiff, and W. J. Muller
PTEN Deficiency in a Luminal ErbB-2 Mouse Model Results in Dramatic Acceleration of Mammary Tumorigenesis and Metastasis
J. Biol. Chem., July 10, 2009; 284(28): 19018 - 19026.
[Abstract] [Full Text] [PDF]


Home page
Mol PlantHome page
L. Song, X.-Y. Zhou, L. Li, L.-J. Xue, X. Yang, and H.-W. Xue
Genome-Wide Analysis Revealed the Complex Regulatory Network of Brassinosteroid Effects in Photomorphogenesis
Mol Plant, July 1, 2009; 2(4): 755 - 772.
[Abstract] [Full Text] [PDF]


Home page
Molecular Cancer TherapeuticsHome page
J. Kwong, H. Kulbe, D. Wong, P. Chakravarty, and F. Balkwill
An antagonist of the chemokine receptor CXCR4 induces mitotic catastrophe in ovarian cancer cells
Mol. Cancer Ther., July 1, 2009; 8(7): 1893 - 1905.
[Abstract] [Full Text] [PDF]


Home page
Plant Physiol.Home page
B. Singh, U. Avci, S. E. Eichler Inwood, M. J. Grimson, J. Landgraf, D. Mohnen, I. Sorensen, C. G. Wilkerson, W. G.T. Willats, and C. H. Haigler
A Specialized Outer Layer of the Primary Cell Wall Joins Elongating Cotton Fibers into Tissue-Like Bundles
Plant Physiology, June 1, 2009; 150(2): 684 - 699.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
H. Guo, L. Li, H. Ye, X. Yu, A. Algreen, and Y. Yin
Three related receptor-like kinases are required for optimal cell elongation in Arabidopsis thaliana
PNAS, May 5, 2009; 106(18): 7648 - 7653.
[Abstract] [Full Text] [PDF]


Home page
Cancer Res.Home page
V. C. Daniel, L. Marchionni, J. S. Hierman, J. T. Rhodes, W. L. Devereux, C. M. Rudin, R. Yung, G. Parmigiani, M. Dorsch, C. D. Peacock, et al.
A Primary Xenograft Model of Small-Cell Lung Cancer Reveals Irreversible Changes in Gene Expression Imposed by Culture In vitro
Cancer Res., April 15, 2009; 69(8): 3364 - 3373.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
P. Leprohon, D. Legare, F. Raymond, E. Madore, G. Hardiman, J. Corbeil, and M. Ouellette
Gene expression modulation is associated with gene amplification, supernumerary chromosomes and chromosome loss in antimony-resistant Leishmania infantum
Nucleic Acids Res., April 1, 2009; 37(5): 1387 - 1399.
[Abstract] [Full Text] [PDF]


Home page
J. Immunol.Home page
R. Brambilla, T. Persaud, X. Hu, S. Karmally, V. I. Shestopalov, G. Dvoriantchikova, D. Ivanov, L. Nathanson, S. R. Barnum, and J. R. Bethea
Transgenic Inhibition of Astroglial NF-{kappa}B Improves Functional Outcome in Experimental Autoimmune Encephalomyelitis by Suppressing Chronic Central Nervous System Inflammation
J. Immunol., March 1, 2009; 182(5): 2628 - 2640.
[Abstract] [Full Text] [PDF]


Home page
J. Bacteriol.Home page
S. G. Dashper, C.-S. Ang, P. D. Veith, H. L. Mitchell, A. W. H. Lo, C. A. Seers, K. A. Walsh, N. Slakeski, D. Chen, J. P. Lissel, et al.
Response of Porphyromonas gingivalis to Heme Limitation in Continuous Culture
J. Bacteriol., February 1, 2009; 191(3): 1044 - 1055.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
L. Milani, A. Lundmark, J. Nordlund, A. Kiialainen, T. Flaegstad, G. Jonmundsson, J. Kanerva, K. Schmiegelow, K. L. Gunderson, G. Lonnerholm, et al.
Allele-specific gene expression patterns in primary leukemic cells reveal regulation of gene expression by CpG site methylation
Genome Res., January 1, 2009; 19(1): 1 - 11.
[Abstract] [Full Text] [PDF]


Home page
J. Bacteriol.Home page
F. Hommais, C. Oger-Desfeux, F. Van Gijsegem, S. Castang, S. Ligori, D. Expert, W. Nasser, and S. Reverchon
PecS Is a Global Regulator of the Symptomatic Phase in the Phytopathogenic Bacterium Erwinia chrysanthemi 3937
J. Bacteriol., November 15, 2008; 190(22): 7508 - 7522.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
K. Lemuth, T. Hardiman, S. Winter, D. Pfeiffer, M. A. Keller, S. Lange, M. Reuss, R. D. Schmid, and M. Siemann-Herzberg
Global Transcription and Metabolic Flux Analysis of Escherichia coli in Glucose-Limited Fed-Batch Cultivations
Appl. Envir. Microbiol., November 15, 2008; 74(22): 7002 - 7015.
[Abstract] [Full Text] [PDF]


Home page
Eukaryot CellHome page
M. M. Abhyankar, A. E. Hochreiter, J. Hershey, C. Evans, Y. Zhang, O. Crasta, B. W. S. Sobral, B. J. Mann, W. A. Petri Jr., and C. A. Gilchrist
Characterization of an Entamoeba histolytica High-Mobility-Group Box Protein Induced during Intestinal Infection
Eukaryot. Cell, September 1, 2008; 7(9): 1565 - 1572.
[Abstract] [Full Text] [PDF]


Home page
EndocrinologyHome page
D. Diez, C. Grijota-Martinez, P. Agretti, G. De Marco, M. Tonacchera, A. Pinchera, G. Morreale de Escobar, J. Bernal, and B. Morte
Thyroid Hormone Action in the Adult Brain: Gene Expression Profiling of the Effects of Single and Multiple Doses of Triiodo-L-Thyronine in the Rat Striatum
Endocrinology, August 1, 2008; 149(8): 3989 - 4000.
[Abstract] [Full Text] [PDF]


Home page
CarcinogenesisHome page
E. Hervouet, A. Cizkova, J. Demont, A. Vojtiskova, P. Pecina, N. L.W. Franssen-van Hal, J. Keijer, H. Simonnet, R. Ivanek, S. Kmoch, et al.
HIF and reactive oxygen species regulate oxidative phosphorylation in cancer
Carcinogenesis, August 1, 2008; 29(8): 1528 - 1537.
[Abstract] [Full Text] [PDF]


Home page
J. Am. Soc. Nephrol.Home page
M. J. Vitalone, P. J. O'Connell, E. Jimenez-Vera, A. Yuksel, M. Wavamunno, C. L.-S. Fung, J. R. Chapman, and B. J. Nankivell
Epithelial-to-Mesenchymal Transition in Early Transplant Tubulointerstitial Damage
J. Am. Soc. Nephrol., August 1, 2008; 19(8): 1571 - 1583.
[Abstract] [Full Text] [PDF]


Home page
Plant CellHome page
C.-y. Wu, A. Trieu, P. Radhakrishnan, S. F. Kwok, S. Harris, K. Zhang, J. Wang, J. Wan, H. Zhai, S. Takatsuto, et al.
Brassinosteroids Regulate Grain Filling in Rice
PLANT CELL, August 1, 2008; 20(8): 2130 - 2145.
[Abstract] [Full Text] [PDF]


Home page
Genes Dev.Home page
Y. Ghavi-Helm, M. Michaut, J. Acker, J.-C. Aude, P. Thuriaux, M. Werner, and J. Soutourina
Genome-wide location analysis reveals a role of TFIIS in RNA polymerase III transcription
Genes & Dev., July 15, 2008; 22(14): 1934 - 1947.
[Abstract] [Full Text] [PDF]


Home page
Plant Physiol.Home page
D. Riewe, L. Grosman, A. R. Fernie, C. Wucke, and P. Geigenberger
The Potato-Specific Apyrase Is Apoplastically Localized and Has Influence on Gene Expression, Growth, and Development
Plant Physiology, July 1, 2008; 147(3): 1092 - 1109.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
A. Y.-L. So, S. B. Cooper, B. J. Feldman, M. Manuchehri, and K. R. Yamamoto
Conservation analysis predicts in vivo occupancy of glucocorticoid receptor-binding sequences at glucocorticoid-induced genes
PNAS, April 15, 2008; 105(15): 5745 - 5749.
[Abstract] [Full Text] [PDF]


S. Szameit, K. Vierlinger, L. Farmer, H. Tuschl, and C. Noehammer
Microarray-Based In Vitro Test System for the Discrimination of Contact Allergens and Irritants: Identification of Potential Marker Genes
Clin. Chem., March 1, 2008; 54(3): 525 - 533.


T. Mourier, C. Carret, S. Kyes, Z. Christodoulou, P. P. Gardner, D. C. Jeffares, R. Pinches, B. Barrell, M. Berriman, S. Griffiths-Jones, et al.
Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum
Genome Res., February 1, 2008; 18(2): 281 - 292.


C. Pridans, M. L. Holmes, M. Polli, J. M. Wettenhall, A. Dakic, L. M. Corcoran, G. K. Smyth, and S. L. Nutt
Identification of Pax5 Target Genes in Early B Cell Differentiation
J. Immunol., February 1, 2008; 180(3): 1719 - 1728.


S. Raengpradub, M. Wiedmann, and K. J. Boor
Comparative Analysis of the σB-Dependent Stress Responses in Listeria monocytogenes and Listeria innocua Strains Exposed to Selected Stress Conditions
Appl. Envir. Microbiol., January 1, 2008; 74(1): 158 - 171.


M. Alhagdow, F. Mounet, L. Gilbert, A. Nunes-Nesi, V. Garcia, D. Just, J. Petit, B. Beauvoit, A. R. Fernie, C. Rothan, et al.
Silencing of the Mitochondrial Ascorbate Synthesizing Enzyme L-Galactono-1,4-Lactone Dehydrogenase Affects Plant and Fruit Development in Tomato
Plant Physiology, December 1, 2007; 145(4): 1408 - 1422.


M. E. Ritchie, J. Silver, A. Oshlack, M. Holmes, D. Diyagama, A. Holloway, and G. K. Smyth
A comparison of background correction methods for two-colour microarrays
Bioinformatics, October 15, 2007; 23(20): 2700 - 2707.


J. Fan and Y. Niu
Selection and validation of normalization methods for c-DNA microarrays using within-array replications
Bioinformatics, September 15, 2007; 23(18): 2391 - 2398.


A. Reiner-Benaim, D. Yekutieli, N. E. Letwin, G. I. Elmer, N. H. Lee, N. Kafkafi, and Y. Benjamini
Associating quantitative behavioral traits with gene expression in the brain: searching for diamonds in the hay
Bioinformatics, September 1, 2007; 23(17): 2239 - 2246.


Y. Choi, Y. Qin, M. F. Berger, D. J. Ballow, M. L. Bulyk, and A. Rajkovic
Microarray Analyses of Newborn Mouse Ovaries Lacking Nobox
Biol Reprod, August 1, 2007; 77(2): 312 - 319.


V. J. Armstrong, M. Muzylak, A. Sunters, G. Zaman, L. K. Saxon, J. S. Price, and L. E. Lanyon
Wnt/β-Catenin Signaling Is a Component of Osteoblastic Bone Cell Early Responses to Load-bearing and Requires Estrogen Receptor α
J. Biol. Chem., July 13, 2007; 282(28): 20715 - 20727.


C. Gao, K. Furge, J. Koeman, K. Dykema, Y. Su, M. L. Cutler, A. Werts, P. Haak, and G. F. Vande Woude
Chromosome instability, chromosome transcriptome, and clonal evolution of tumor cell populations
PNAS, May 22, 2007; 104(21): 8995 - 9000.


A. Oshlack, A. E. Chabot, G. K. Smyth, and Y. Gilad
Using DNA microarrays to study gene expression in closely related species
Bioinformatics, May 15, 2007; 23(10): 1235 - 1242.


M. Shemesh, A. Tam, and D. Steinberg
Differential gene expression profiling of Streptococcus mutans cultured under biofilm and planktonic conditions
Microbiology, May 1, 2007; 153(5): 1307 - 1317.


B. Zhou, Y. Li, Z. Xu, H. Yan, S. Homma, and S. Kawabata
Ultraviolet A-specific induction of anthocyanin biosynthesis in the swollen hypocotyls of turnip (Brassica rapa)
J. Exp. Bot., May 1, 2007; 58(7): 1771 - 1781.


M.-P. Puissegur, G. Lay, M. Gilleron, L. Botella, J. Nigou, H. Marrakchi, B. Mari, J.-L. Duteyrat, Y. Guerardel, L. Kremer, et al.
Mycobacterial Lipomannan Induces Granuloma Macrophage Fusion via a TLR2-Dependent, ADAM9- and beta1 Integrin-Mediated Pathway
J. Immunol., March 1, 2007; 178(5): 3161 - 3169.


J. Lee, K. He, V. Stolc, H. Lee, P. Figueroa, Y. Gao, W. Tongprasit, H. Zhao, I. Lee, and X. W. Deng
Analysis of Transcription Factor HY5 Genomic Binding Sites Revealed Its Hierarchical Role in Light Regulation of Development
PLANT CELL, March 1, 2007; 19(3): 731 - 749.


L. Strmecki, G. Bloomfield, T. Araki, E. Dalton, J. Skelton, C. Schilde, A. Harwood, J. G. Williams, A. Ivens, and C. Pears
Proteomic and Microarray Analyses of the Dictyostelium Zak1-GSK-3 Signaling Pathway Reveal a Role in Early Development
Eukaryot. Cell, February 1, 2007; 6(2): 245 - 252.


D. S. Yuan and R. A. Irizarry
High-resolution spatial normalization for microarrays containing embedded technical replicates
Bioinformatics, December 15, 2006; 22(24): 3054 - 3060.


P. C. LaRosa, J. Miner, Y. Xia, Y. Zhou, S. Kachman, and M. E. Fromm
Trans-10, cis-12 conjugated linoleic acid causes inflammation and delipidation of white adipose tissue in mice: a microarray and histological analysis
Physiol Genomics, November 21, 2006; 27(3): 282 - 294.


K. Rogasch, V. Ruhmling, J. Pane-Farre, D. Hoper, C. Weinberg, S. Fuchs, M. Schmudde, B. M. Broker, C. Wolz, M. Hecker, et al.
Influence of the Two-Component System SaeRS on Global Gene Expression in Two Different Staphylococcus aureus Strains.
J. Bacteriol., November 1, 2006; 188(22): 7742 - 7758.


A. Petri, J. Ahnfelt-Ronne, K. S. Frederiksen, D. G. Edwards, D. Madsen, P. Serup, J. Fleckner, and R. S. Heller
The effect of neurogenin3 deficiency on pancreatic gene expression in embryonic mice.
J. Mol. Endocrinol., October 1, 2006; 37(2): 301 - 316.


M. Lo, D. M. Bulach, D. R. Powell, D. A. Haake, J. Matsunaga, M. L. Paustian, R. L. Zuerner, and B. Adler
Effects of Temperature on Gene Expression Patterns in Leptospira interrogans Serovar Lai as Assessed by Whole-Genome Microarrays
Infect. Immun., October 1, 2006; 74(10): 5848 - 5859.


D. Abdueva, D. Skvortsov, and S. Tavare
Non-linear analysis of GeneChip arrays
Nucleic Acids Res., September 10, 2006; 34(15): e105 - e105.


J. Fan and Y. Ren
Statistical Analysis of DNA Microarray Data in Cancer Research
Clin. Cancer Res., August 1, 2006; 12(15): 4469 - 4473.


I. C. Gunesekere, C. M. Kahler, D. R. Powell, L. A. S. Snyder, N. J. Saunders, J. I. Rood, and J. K. Davies
Comparison of the RpoH-Dependent Regulon and General Stress Response in Neisseria gonorrhoeae
J. Bacteriol., July 1, 2006; 188(13): 4769 - 4776.


M. H. Lahoud, A. I. Proietto, K. H. Gartlan, S. Kitsoulis, J. Curtis, J. Wettenhall, M. Sofi, C. Daunt, M. O'Keeffe, I. Caminschi, et al.
Signal Regulatory Protein Molecules Are Differentially Expressed by CD8- Dendritic Cells
J. Immunol., July 1, 2006; 177(1): 372 - 382.


I. C. Gunesekere, C. M. Kahler, C. S. Ryan, L. A. S. Snyder, N. J. Saunders, J. I. Rood, and J. K. Davies
Ecf, an Alternative Sigma Factor from Neisseria gonorrhoeae, Controls Expression of msrAB, Which Encodes Methionine Sulfoxide Reductase.
J. Bacteriol., May 1, 2006; 188(10): 3463 - 3469.


R. Schwab, S. Ossowski, M. Riester, N. Warthmann, and D. Weigel
Highly Specific Gene Silencing by Artificial MicroRNAs in Arabidopsis
PLANT CELL, May 1, 2006; 18(5): 1121 - 1133.

