Bioinformatics Advance Access originally published online on November 2, 2005
Bioinformatics 2006 22(1):50-57; doi:10.1093/bioinformatics/bti750
Using a calibration experiment to assess gene-specific information: full Bayesian and empirical Bayesian models for two-channel microarray data
1Department of Statistics G.Parenti, University of Florence & Biostatistic Unit CSPO, Florence
2Department Area Critica Medico Chirurgica University of Florence, Careggi Hospital, AOC Florence, Italy
3Cytogenetic and Genetic Unit, Careggi Hospital, AOC Florence, Italy
*To whom correspondence should be addressed at Imperial College School of Medicine, Norfolk Place, London W2 1PG, UK.
| ABSTRACT |
|---|
Motivation: Microarray studies make it possible to quantify expression levels on a global scale by measuring the transcript abundance of thousands of genes simultaneously. A difficulty when analysing expression measures is how to model variability for the whole set of genes. It is usually unrealistic to assume a common variance for all genes, and several approaches to modelling gene-specific variances have been proposed. We take advantage of calibration experiments, in which the probes hybridized on the two channels come from the same population (self-self experiment). In this case it is possible to estimate the gene-specific variance, which can then be incorporated in comparative experiments on the same tissue, cellular line or species.
Results: We present two approaches to introduce prior information on gene-specific variability from a calibration experiment: an empirical Bayes model and a full Bayesian hierarchical model. We apply the methods in the analysis of human lipopolysaccharide-stimulated leukocyte experiments.
Availability: The calculations are implemented in WinBUGS. The code is available on request from the authors.
Contact: m.blangiardo{at}imperial.ac.uk
| 1 INTRODUCTION |
|---|
In the framework of microarray analysis there are two main research goals: one is the identification of differentially expressed genes among several varieties (class comparison), while the other is the discovery of clusters within a collection of samples (class discovery) (Simon et al., 2003). Class comparison is related to the assessment of exposure or treatment effects (e.g. comparison of gene expression between populations of smokers and non-smokers), and the comparison can be performed directly (e.g. loop design) or indirectly (e.g. reference design). Class discovery is based on distances between the gene expression profiles of pairs of samples (Dobbin and Simon, 2002) and can be absolute or relative. For class comparison, the classical statistical approach is based on modified Student t-test procedures where, for each gene, the numerator is the difference between the gene expression levels in the two conditions to be tested and the denominator is the square root of the variance divided by the number of replicates (Wit and McClure, 2004, p. 183 ff.). In this context a crucial point is how to obtain a suitable estimate of the variance. When the number of replicates is very small, the sampling distribution of the variance is very asymmetric, with higher probability for small values, and the pivotal t-value is strongly unstable. For this reason many authors have proposed procedures to stabilize the variability measure (Speed, 2003, p. 51 ff.). One possibility is to consider a single variance estimate for the whole set of genes, or a function of the variance for all the genes. This approach could be used for single-array inference (e.g. the Bayesian approach of Newton et al., 2001). Generally speaking it implies a loss of power, because it tends to be very conservative and to increase the number of false negative results. A better way to proceed can be found in a parametric or non-parametric framework.
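The instability of the per-gene t-value with few replicates can be illustrated with a minimal sketch. It assumes a one-sample t-value computed on replicated log-ratios; the function name and the two example genes are hypothetical and not taken from the paper.

```python
import math
import statistics


def gene_t_statistic(log_ratios):
    """One-sample t-value for the replicated log-ratios of a single gene.

    Numerator: mean log-ratio (difference between the two conditions).
    Denominator: sqrt(sample variance / number of replicates).
    """
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two replicates to estimate a variance")
    mean = statistics.fmean(log_ratios)
    var = statistics.variance(log_ratios)  # unbiased sample variance
    if var == 0.0:
        raise ValueError("zero sample variance: t-value undefined")
    return mean / math.sqrt(var / n)


# Two hypothetical genes with the same mean log-ratio (0.5) and three replicates.
# The second has, by chance, a tiny sample variance and an inflated t-value.
t_moderate = gene_t_statistic([0.3, 0.5, 0.7])
t_inflated = gene_t_statistic([0.49, 0.50, 0.51])
```

With only three replicates, an accidentally small sample variance inflates the t-value by more than an order of magnitude even though the mean log-ratio is unchanged, which is the motivation for the variance-stabilizing procedures discussed next.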
In a parametric context, many authors consider gene-specific variance estimates for the denominator of the t-test, but add a stabilizing constant for the whole set of genes. Baldi and Long (2001) use a full Bayesian hierarchical model for the log-expression. They discuss point estimates for the parameter and hyperparameter values. Regularized expressions for the variance of each gene are derived by combining the empirical variance with a prior variance $\sigma_0^2$. Several choices for the prior are proposed, among them the variance of the neighbouring genes contained in a window of predefined size w (e.g. ranking the genes on the basis of their expression measure, the 50 genes immediately above or below the gene under consideration). An additional hyperparameter $\nu_0$ (prior degrees of freedom) is necessary to determine the weight assigned to the prior variance. It is tuned so that its sum with the number of replicates n is equal to a given constant ($\nu_0 + n = K$).
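A minimal sketch of such a regularized variance follows. It assumes the commonly quoted form in which the prior variance and the empirical variance are averaged with weights given by the prior degrees of freedom and the number of replicates; the exact expression should be checked against Baldi and Long (2001). Function and argument names are illustrative.

```python
def regularized_variance(sample_var, n, prior_var, prior_df):
    """Weighted combination of an empirical and a prior variance.

    Assumed form: (prior_df * prior_var + (n - 1) * sample_var) / (prior_df + n - 2).
    """
    denom = prior_df + n - 2
    if denom <= 0:
        raise ValueError("prior_df + n must exceed 2")
    return (prior_df * prior_var + (n - 1) * sample_var) / denom


# Hypothetical gene: n = 2 replicates and K = 10, so prior_df = K - n = 8.
# The prior variance would come, for example, from neighbouring genes in the window.
K, n = 10, 2
v = regularized_variance(sample_var=0.01, n=n, prior_var=0.25, prior_df=K - n)
```

With few replicates the prior dominates, so a gene with an accidentally small empirical variance (0.01 here) is pulled towards the variance of its neighbours (0.25).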
Lönnstedt and Speed (2002) propose a method that can be classified as empirical Bayesian: unlike a full Bayesian approach, they do not define prior distributions on the hyperparameters, but replace them with frequentist estimates based on the marginal distribution. In particular, the authors present a $B_g$-statistic (a Bayes posterior log-odds) instead of the classical t-statistic used to classify the differentially expressed genes. Following the same philosophy, the variance has a gene-specific component and a constant term $a_0$. Values of $B_g$ are explicitly calculated assuming conjugate priors on the gene expression mean and variance.
Other authors have worked on specific parametric models for the errors, starting from the idea that the standard deviation of the expression measure increases proportionally to the level of expression (Newton et al., 2001), but does not tend to 0 for non-expressed genes. From this assumption Rocke and Durbin (2001) develop an error model that includes a gene-specific additive component and a gene-specific multiplicative one, and propose several ways to estimate the model, based on negative controls or replicates.
In a non-parametric framework, Tusher et al. (2001) work on t-tests and assign a score $t_g$ to each gene on the basis of its change in gene expression relative to the standard deviation calculated on repeated measures. Permutations are used to identify significantly altered genes and to estimate the false discovery rate. They add a fudge factor $s_0$ to the denominator of the t-test to prevent genes with low expression from dominating the results; it is chosen to minimize the coefficient of variation. This method is framed in a frequentist approach and does not assume any distribution on the parameters.
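The effect of a fudge factor in the denominator can be seen in a small sketch. The value of $s_0$ used here is an arbitrary illustration, not the result of the coefficient-of-variation criterion, and the two genes are hypothetical.

```python
def score(mean_diff, std_err, s0=0.0):
    """Change in expression divided by (standard error + fudge factor s0)."""
    denom = std_err + s0
    if denom <= 0:
        raise ValueError("denominator must be positive")
    return mean_diff / denom


# (mean difference, standard error) for two hypothetical genes.
genes = {
    "low_expression": (0.05, 0.001),   # tiny change, tiny standard error
    "high_expression": (1.00, 0.200),  # large change, moderate standard error
}
plain = {name: score(d, s) for name, (d, s) in genes.items()}
fudged = {name: score(d, s, s0=0.1) for name, (d, s) in genes.items()}
```

Without the fudge factor the low-expression gene ranks first purely because of its very small standard error; with `s0 = 0.1` the ranking is reversed.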
Very similarly, Efron et al. (2001) propose a simple empirical Bayes model in which the fudge factor added to the denominator is the 90th percentile of the standard deviation over all the genes. Delmar et al. (2005) develop a finite mixture model for the marginal gene-specific distribution (which can be classified as non-parametric maximum likelihood). In particular, estimating gene-specific variances can be seen as a classification problem, where the number of components and the group membership of each gene are estimated. Since the number of groups is much lower than the number of genes, the estimates of the group variances are very stable.
Heuristically, Comander et al. (2004) pooled genes to calculate more reliable variance estimates by average of minimum intensity values. There is no parametric statistical modelling of the variance as a function of intensity; instead a loess-smoothed estimate of the variance is derived. Uncertainty in this procedure is not considered and a Z-test is used.
All the previous approaches work with a classical comparative experiment (with replications), where samples from two populations are compared. A different approach is introduced by Tseng et al. (2001), who propose calibration experiments in which the probes hybridized on the two channels come from the same population (self-self experiment). Such experiments make it possible to incorporate gene-specific variability information in comparative experiments on the same tissue, cellular line or species, with prior ignorance on the remaining parameters, and represent an alternative way to address the problem of variance estimation.
We followed Tseng's approach and performed a calibration experiment before the comparative one. We built a full Bayesian model and a simpler empirical Bayesian model. We analysed data on lipopolysaccharide (LPS)-stimulated and un-stimulated human leukocytes, obtaining prior knowledge on variability from the self-self experiment.
The structure of the paper is as follows. In Section 2 we describe the calibration and comparative experiments (Subsection 2.1) and the data preprocessing phase (Subsection 2.2). In Section 3 we present the normalization procedure, then focus on the full Bayesian model and on the empirical Bayesian one; model graphs and implementation details follow. In Section 4 we describe the results in terms of differentially expressed genes. In Section 5 a sensitivity analysis is reported, and in Section 6 we discuss the differences between the two models.
| 2 MATERIALS |
|---|
2.1 LPS microarray experiment
2.1.1 Calibration experiment
Mononuclear cells were obtained from the peripheral blood (PBMC) of 10 healthy subjects by density gradient centrifugation on Ficoll-Hypaque. Cells from each subject were incubated in RPMI 1640 at 37°C in a humidified atmosphere with 5% CO2 for 3 h in standard conditions (absence of lipopolysaccharide). Total RNA was extracted and equal amounts of total RNA from the different subjects were pooled. The pooled RNA was split into six aliquots, which were then retro-transcribed with amino-allyl-dUTP, hydrolysed, purified and labelled with NHS-Cyanine dyes (three aliquots with Cy3, probe A, and three aliquots with Cy5, probe B). Then, three arrays were produced, each with the two probes purified, mixed and hybridized on the array. After incubation, the three arrays were scanned with the 4000B scanner (Axon). Image analysis was performed with GenePix 4.1 software.
2.1.2 Comparative experiment
Mononuclear cells were obtained from the peripheral blood (PBMC) of the same 10 healthy subjects used in the calibration experiment by density gradient centrifugation on Ficoll-Hypaque. Cells from each subject were divided into two aliquots; the first was incubated in RPMI 1640 at 37°C in a humidified atmosphere with 5% CO2 for 3 h in the presence of LPS (10 µg/ml, stimulated cells). The second was incubated in the same conditions but in the absence of LPS (un-stimulated cells). Total RNA was extracted and equal amounts of total RNA were pooled, separately for stimulated and un-stimulated cells. The pooled RNAs were retro-transcribed with amino-allyl-dUTP, hydrolysed, purified and labelled with NHS-Cyanine dyes following the dye-swap design (Cy3 and Cy5 coupled to un-stimulated and stimulated specimens). The two probes were purified, mixed and hybridized on the arrays. After incubation, the arrays were scanned with the 4000B scanner (Axon). Image analysis was performed with GenePix 4.1 software. For the comparative experiment, two arrays were finally printed according to the dye-swap design.
Therefore, the complete experiment consists of 5 arrays, each made up of grids of 22 x 21 spots, for a total of 14 784 spots. The 14 784 spots included 13 971 oligonucleotides each representing a different gene, 29 negative controls (mixtures of oligonucleotides from other organisms), 2 positive controls (a mixture of all the human oligonucleotides) and 872 blanks (printing solution only). Out of 14 784 spots, 1502 (10.2%) were absent because of a failure during the printing procedure.
2.2 Microarray data preprocessing
2.2.1 Quality control
The process of microarray fabrication is subject to many sources of variability and the resulting data can contain a large amount of noise. In particular, it is possible that the noise dominates the signal for some spots. We applied the quality control available in GenePix Pro 4.1, with the aim of detecting artefacts (bubbles, hair, fibres). After the GenePix Pro 4.1 quality control and visual inspection, the analysable spots were 80, 87 and 90% for the 3 self-self arrays, and 83 and 87% for the 2 arrays of the comparative experiment.
2.2.2 Spots selection for the analysis of gene-specific variances
For the purpose of the present paper, we restricted our attention to a subset of genes for which extraneous sources of variability can be excluded. To select these spots, all five arrays were screened following the criteria suggested by Simon et al. (2003). In particular, we excluded a spot if the number of pixels used to calculate the foreground intensity was less than 25 in either channel, if the signal was lower than 200 in both channels, or if the ratio between the average foreground intensity and the median background intensity was smaller than 1.5 in either channel. Spots with a large signal in one channel and a low, undetectable signal in the other were not eliminated but modified to become analysable, by forcing the low-intensity signal (defined as less than 200) to 200. In this paper we considered the 2887 genes represented in all 5 arrays (3 calibration arrays and 2 comparative arrays).
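The screening rules above can be sketched as a small filter. The `Spot` record and its field names are hypothetical, and the order in which the rules are applied, as well as how the flooring rule interacts with the ratio criterion, is an assumption not specified in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Spot:
    """Per-channel measurements for one spot; each field is (channel 1, channel 2)."""
    fg_pixels: Tuple[int, int]      # pixels used for the foreground intensity
    signal: Tuple[float, float]     # signal intensity
    fg_mean: Tuple[float, float]    # average foreground intensity
    bg_median: Tuple[float, float]  # median background intensity


def keep_spot(spot, min_pixels=25, min_signal=200.0, min_ratio=1.5):
    """Return True if the spot passes the three exclusion criteria."""
    if any(p < min_pixels for p in spot.fg_pixels):
        return False  # too few foreground pixels in either channel
    if all(s < min_signal for s in spot.signal):
        return False  # signal too low in both channels
    if any(fg < min_ratio * bg for fg, bg in zip(spot.fg_mean, spot.bg_median)):
        return False  # foreground/background ratio too small in either channel
    return True


def floor_signal(spot, floor=200.0):
    """Force a low signal up to the floor so a one-channel spot stays analysable."""
    return tuple(max(s, floor) for s in spot.signal)


one_channel = Spot((60, 58), (5000.0, 120.0), (5100.0, 400.0), (100.0, 110.0))
faint = Spot((60, 58), (150.0, 120.0), (300.0, 280.0), (100.0, 110.0))
tiny = Spot((20, 58), (5000.0, 4000.0), (5100.0, 4100.0), (100.0, 110.0))
```

Here `one_channel` is kept and its second-channel signal is raised to 200, while `faint` (low signal in both channels) and `tiny` (too few pixels) are excluded.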
| 3 METHODS |
|---|
In this section we present the two methods we used to analyse the data. The first is a full Bayesian hierarchical model, while the second, originally proposed by Tseng et al. (2001), is an instance of the empirical Bayes approach.
3.1 Normalization
We performed two different types of normalization (Yang et al., 2002). For the empirical Bayes model, a local A-dependent normalization (loess) was applied to each slide, considering all the genes present on the array. For the Bayesian hierarchical model, the normalization step was part of the modelling phase.
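For readers unfamiliar with A-dependent normalization, the sketch below computes the two quantities it operates on: the log-ratio M and the average log-intensity A of a spot. The base-2 logarithm and the red/green argument order are conventions assumed here; the loess fit of M against A, which is then subtracted from M, is not shown.

```python
import math


def ma_values(red, green):
    """Log-ratio M and average log-intensity A of one spot (base-2 logs)."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)


# Hypothetical spot: a four-fold difference between the two channels.
m, a = ma_values(800.0, 200.0)
```

A local normalization then estimates the trend of M as a smooth function of A over all spots of a slide and removes it, so that intensity-dependent dye bias does not masquerade as differential expression.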
3.2 Models
3.2.1 Bayesian hierarchical model
The model is split into two parts.
Calibration model. The first submodel is used to estimate gene-specific variances from the calibration experiment. For this purpose we specified the following model, which follows the same philosophy as Lewin et al. (2005), for the unnormalized log-intensity $x_{igc}$ of gene $g$ on array $i$ and channel $c$:
$$x_{igc} \sim N(\mu_{igc}, \sigma^2_g) \qquad (1)$$
with $\mu_{igc}$ as the mean and $\sigma^2_g$ as the variance.
The normalization procedure was achieved by an ANOVA model (see Kerr et al., 2002 for a general introduction to the analysis of variance approach to microarray data)
$$\mu_{igc} = \alpha_g + \gamma_c + \eta_{ig} \qquad (2)$$
where $\eta_{ig}$ denotes the gene-specific array-gene interactions, $\gamma_c$ the dye effects and $\alpha_g$ the normalized gene effects. The gene effects $\alpha_g \sim N(\mu_\alpha, \sigma^2_\alpha)$ are exchangeable, with non-informative Gaussian and Gamma hyperpriors on $\mu_\alpha$ and $1/\sigma^2_\alpha$, respectively. All the other normalization parameters were fixed effects modelled with non-informative Gaussian hyperpriors. The gene-specific variances were assumed to follow a Lognormal distribution, $\sigma^2_g \sim \log N(\mu_\sigma, \tau^2_\sigma)$, with $\mu_\sigma \sim N(0, 10\,000)$ and $1/\tau^2_\sigma \sim Ga(0.001, 0.001)$ non-informative hyperpriors. This assumption of a skewed distribution for the variance is standard and flexible enough to allow high variances for a few genes.
Comparative model. The second submodel is specified for the comparative experiment and incorporates relevant information from the calibration experiment. The kernel likelihood is the same as for the calibration model. For the i-th array (i = 1, 2) the unnormalized log-intensity is modelled as
$$x_{igc} \sim N(\mu_{igc}, \sigma^2_g) \qquad (3)$$
where $\sigma^2_g \sim \log N(\mu_\sigma, \tau^2_\sigma)$ with informative parameter values obtained from the self-self experiment. In particular, we assumed $\mu_\sigma$ equal to the mean of the appropriate posterior distribution on the self-self data:
$$\mu_\sigma = E(\mu_\sigma \mid x^{self}) \qquad (4)$$
For $\tau^2_\sigma$ we plugged in the mean of the corresponding posterior distribution $f(\tau^2_\sigma \mid x^{self})$.
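A linear model was assumed for $\mu_{igc}$ as follows:
[Equation (5)]
where $\delta_g$ can be interpreted as a normalized log-ratio and quantifies the treatment (LPS) effect. Its distribution was assumed Gaussian with gene-specific mean $\mu_{\delta g}$ and variance $\sigma^2_{\delta g}$. Summarizing, the prior distributions for $\delta_g$, $\mu_{\delta g}$ and $\sigma^2_{\delta g}$ were assumed as follows:
$$\delta_g \sim N(\mu_{\delta g}, \sigma^2_{\delta g}) \qquad (6)$$
$$\mu_{\delta g} \sim N(\mu_\mu, \sigma^2_\mu), \qquad \sigma^2_{\delta g} \sim 1/\Gamma(\alpha_\sigma, \beta_\sigma) \qquad (7)$$
with hyperparameters $\mu_\mu$, $\sigma^2_\mu$, $\alpha_\sigma$, $\beta_\sigma$.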
This formulation is sensible since a Gaussian distributed effect parameter $\delta_g$, on the log scale, is justified by most of the literature on generalized linear mixed models (see Clayton in Markov Chain Monte Carlo in Practice, 1996). The conjugate hyperpriors [Equation (6)] are standard and assume an exchangeable structure, i.e. the same ignorance about the status of each gene (differentially or not differentially expressed). More sophisticated mixture models could be introduced (see Parmigiani et al., 2002).
Informative prior on log-ratio. The values for $\mu_\mu$, $\sigma^2_\mu$, $\alpha_\sigma$, $\beta_\sigma$ were obtained from the calibration experiment as follows. On the calibration arrays we calculated a residual effect $r_{igc} = x_{igc} - \mu_{igc}$ and reconstructed, for each slide, a normalized log-ratio under the null hypothesis as the difference between the residual effect of channel c = 1 and that of channel c = 2:
$$d_{ig} = r_{ig1} - r_{ig2} \qquad (8)$$
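As a small worked example of this step, the sketch below computes the null log-ratio of one gene on one calibration array as the difference between the residuals of the two channels. The function name and the numerical values are hypothetical.

```python
def null_log_ratio(x, mu):
    """Difference of channel residuals for one gene on one calibration array.

    x  : observed log-intensities (channel 1, channel 2)
    mu : fitted means from the normalization model (channel 1, channel 2)
    """
    r1 = x[0] - mu[0]  # residual effect, channel c = 1
    r2 = x[1] - mu[1]  # residual effect, channel c = 2
    return r1 - r2


# Hypothetical gene: channel 1 slightly above and channel 2 slightly below its fitted mean.
d = null_log_ratio(x=(10.2, 9.9), mu=(10.0, 10.0))
```

Because both channels carry the same sample in a self-self array, any non-zero value reflects noise rather than treatment, which is what makes these quantities usable as prior information.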
Then for each gene we calculated the plug-in values for the prior of $\mu_{\delta g}$ as:
[Equation (9)]
[Equation (10)]
(Fig. 1).
Similarly, we obtained the plug-in values for the prior Gamma parameters $\alpha_\sigma$ and $\beta_\sigma$ from a mean and a variance computed on the calibration arrays:
[Equation (11)]
[Equation (12)]
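One standard way to turn a mean and a variance into Gamma parameters is moment matching. The sketch below assumes the shape-rate parameterization (mean = shape/rate, variance = shape/rate^2), which is the one used by the `dgamma` distribution in WinBUGS; that the authors used exactly this rule is an assumption, and the input values are hypothetical.

```python
def gamma_moment_match(mean, variance):
    """Shape and rate of the Gamma distribution with the given mean and variance."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    shape = mean ** 2 / variance
    rate = mean / variance
    return shape, rate


# Hypothetical summary statistics.
shape, rate = gamma_moment_match(mean=2.0, variance=4.0)
```

The result can be checked by recomputing the moments: `shape / rate` recovers the mean and `shape / rate**2` recovers the variance.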
3.2.2 Tseng's empirical Bayes model
To adapt the model proposed by Tseng et al. (2001) we reformulated it as follows. We normalized the data externally by loess (Yang et al., 2002) through the MAANOVA library (Wu et al., 2003) implemented in R (www.r-project.org). The normalized log-ratios $m_{ig}$ for the g-th gene and i-th array were modelled as
$$m_{ig} \sim N(\delta_g, {}_m\sigma^2_g) \qquad (13)$$
where $\delta_g$ was the mean and ${}_m\sigma^2_g$ was the variance of the log-ratio over the replicates of the comparative experiment for gene g. To ease comparison with the full Bayesian model, the likelihood can be written as follows:
[Equation (14)]
[Equation (15)]
The distribution of $\delta_g$ was assumed Gaussian with gene-specific parameters and all the hyperparameters had a classic Bayesian non-informative distribution [compare with Equations (6) and (7)]. The information pooled from the calibration experiment was used to obtain an informative prior distribution for ${}_m\sigma^2_g$:
[Equation (16)]
which involves a $\chi^2$-deviate; $w_g$ was a weighted average of the gene-specific and overall empirical variances calculated on the calibration arrays ($i = 1, \dots, I^{self}$) as follows:
[Equation (17)]
[Equation (18)]
[Equation (19)]
In other words, in the Tseng model the information on the gene-specific variability from the self-self experiment is used to derive an informative inverse Gamma prior.
However, the two approaches to variance modelling are deeply different. The empirical Bayes approach uses the information from the self-self experiment to plug in values for the parameters of the gene-specific variance prior, whereas the full Bayes approach uses the posteriors given the calibration data to obtain values for the hyperparameters of the hyperpriors governing the gene-specific variance priors, $\sigma^2_{\delta g} \sim 1/[\Gamma(\alpha_\sigma, \beta_\sigma)]$.
3.2.3 Tseng's prior with internal normalization
To better address model comparison we modified the empirical Bayes model proposed by Tseng by including the normalization step in the model as follows:
[Equation (20)]
[Equation (21)]
[Equation (22)]
where the prior parameters of $\delta_g$ were informative, coming from the calibration experiment (see Subsection 3.2.1), and the normalization parameters were modelled following standard ANOVA [see Equation (5)]. The hyperpriors for the gene-specific variances were modelled following Tseng's proposal (Fig. 2).
3.2.4 Bayesian hierarchical model with loess normalization
We also modified the Bayesian hierarchical model to carry out a loess normalization instead of the linear one. We performed a loess normalization through the MAANOVA library and then calculated the normalized values for the two channels as follows:
[Equation (23)]
The normalized channel intensities (on the log scale) are
[Equation (24)]
[Equation (25)]
[Equation (26)]
3.3 The graph of the model
A system of conditional distributions can often be represented through the corresponding directed acyclic graph (DAG: directed because each link between a pair of nodes has a direction, acyclic because it is impossible to return to a node after leaving it by following the direction of the arrows) (Gilks et al., 1996). In a DAG, circles denote unobserved quantities, single squares indicate observed quantities and double squares indicate mathematical quantities; solid arrows between nodes denote stochastic dependence, while dashed arrows denote functional relationships; solid lines show undirected stochastic dependence. Repetitive structures (arrays, for example) are shown as stacked rectangles. Figure 3 shows the graph for the Bayesian hierarchical model presented in Section 3.2.1, while Figure 4 shows the DAG for Tseng's model presented in Section 3.2.2.
3.4 Implementation
To estimate the parameters of interest we used the marginal posterior distributions approximated by MCMC methods implemented in WinBUGS 1.4 (Spiegelhalter et al., 2003). The Bayesian hierarchical model, with ANOVA normalization as well as with loess normalization, and Tseng's model with internal normalization were estimated by a Metropolis-within-Gibbs routine, a generalization of Gibbs sampling that can be used for non-log-concave sampling (Tanner, 1996); Tseng's empirical Bayes model can also be fitted by Gibbs sampling in WinBUGS. We checked convergence both visually, by Gelman-Rubin statistics (Gelman and Rubin, 1992), and by using different starting points. We performed 10 000 burn-in iterations followed by 4000 sampling iterations for all the models. Fitting the Bayesian hierarchical model on the calibration experiment takes 1 h per 100 iterations on an HP XW6000 workstation with 2 GB RAM and an Intel Xeon 2.8 GHz CPU, because of the large number of posterior distributions that have to be stored for subsequent incorporation in the comparative experiment analysis. Fitting the comparative experiment takes 380 s per 1000 iterations. Fitting Tseng's model takes 300 s per 1000 iterations.
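The convergence check can be illustrated with a minimal sketch of the basic Gelman-Rubin potential scale reduction factor for a single parameter, computed from several chains started at different points. This is the textbook form of the statistic, not the WinBUGS implementation, and the simulated chains are purely illustrative.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = statistics.fmean([statistics.variance(c) for c in chains])
    if within == 0.0:
        raise ValueError("chains have zero within-chain variance")
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)
# Two chains sampling the same distribution, and two chains stuck in different regions.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 5.0)]
r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to 1 (as for `mixed`) indicate that the chains agree; values well above 1 (as for `stuck`) indicate that the between-chain variability is still large relative to the within-chain variability and that more iterations are needed.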
| 4 RESULTS |
|---|
We explored the posterior distribution of the treatment effects $\delta_g$ to identify the differentially expressed genes, using a 95% two-sided probability level. Genes found differentially expressed by at least one of the two methods are shown in Table 1. Using the Bayesian hierarchical model we found 26 differentially expressed genes, of which IFI30 and PRKAG2 were under-expressed in LPS-stimulated leukocytes. Using the Tseng et al. model we found 46 differentially expressed genes, of which 20 emerged as downregulated in LPS-stimulated leukocytes. Of the 26 genes identified by the first model, 22 were also highlighted by the Tseng et al. model (Fig. 5 and Table 1).
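A decision rule of this kind can be sketched from MCMC output as follows. It assumes that a gene is called differentially expressed when the central 95% posterior interval of its treatment effect excludes zero; the text does not spell out the exact rule, so this is one plausible reading. The simulated draws stand in for real posterior samples.

```python
import random
import statistics


def credible_interval_95(samples):
    """Central 95% interval from posterior draws (2.5% and 97.5% cut points)."""
    cuts = statistics.quantiles(samples, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def call_gene(samples):
    """Classify a gene from the posterior draws of its treatment effect."""
    lo, hi = credible_interval_95(samples)
    if lo > 0:
        return "up"
    if hi < 0:
        return "down"
    return "unchanged"


rng = random.Random(1)
# 4000 draws per gene, matching the number of sampling iterations used in the paper.
induced = [rng.gauss(1.0, 0.2) for _ in range(4000)]
null = [rng.gauss(0.0, 0.2) for _ in range(4000)]
calls = {"induced": call_gene(induced), "null": call_gene(null)}
```

Because the rule uses the full posterior of each effect, genes with a small point estimate but large posterior uncertainty are not called, which is where the gene-specific variance information from the calibration arrays enters.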
The LPS-induced transcripts identified by both models mainly consist of genes encoding proteins associated with cytokines and chemokines, including interleukin (IL)-1 beta, IL-1 receptor antagonist (RA), macrophage inflammatory protein (MIP)-1 alpha, MIP-1 beta, MIP-2 beta and MIP-3 alpha; cytoskeletal proteins such as vimentin and cofilin 2 (Mor-Vaknini et al., 2003); and plasminogen activator inhibitor type 2 (PAI-2) (Pepe et al., 1997).
To facilitate the interpretation of our results, we report the results obtained by a classic analysis of the comparative arrays only, without taking the calibration arrays into account. The analysis of the comparative experiment by Tusher's SAM resulted in 18 significantly differentially expressed genes, using a cut-off of p = 0.01. Fifteen were also identified by the Bayesian approaches. Owing to the limited sample size, a low sensitivity is expected compared with the analysis that took the calibration arrays into account. The Bayesian approaches also provided more stable inference on genes with a small sample standard deviation, among which three were significant by SAM but were not confirmed by the Bayesian analyses. No negative log-ratios emerged as significant by SAM.
| 5 SENSITIVITY ANALYSIS AND MODEL COMPARISON |
|---|
The results presented in the previous section are difficult to interpret comparatively because the two models use different normalization procedures. To gain insight into the behaviour of the different approaches we need to evaluate the differentially expressed genes while holding the normalization procedure fixed (Subsection 3.2.3).
The largest differences were observed in the downregulated genes. The full Bayesian models found two and three downregulated genes under the two normalization procedures. On the other hand, the Tseng model with external loess normalization found 20 downregulated genes, but only 2 when using the internal linear ANOVA normalization. Generally speaking, as theoretically expected, the full Bayesian model seems more conservative and more robust with regard to the choice of normalization procedure. The Tseng model seems less conservative and more sensitive to the normalization procedure adopted. Since this result is based on the analysis of only one dataset, we do not know whether one particular normalization procedure should be recommended. The reader should note that theoretically the EB model is more sensitive to normalization procedures. A full comparison among different normalization approaches to be used in the EB approach is outside the scope of the present paper.
| 6 DISCUSSION |
|---|
The observed differences in number of differentially expressed genes between the Bayesian hierarchical model and the Tseng empirical Bayesian one are related to different factors, namely normalization method and specification of prior information. In the Bayesian Hierarchical method, the normalization step is performed inside the model through a multi-slide linear normalization (ANOVA). In the empirical Bayesian approach, data are normalized outside the model, through a loess normalization performed separately for each array. When incorporating the normalization into the model, the likelihood is based on single channel expression measures over replicates, while with an external normalization, the likelihood is based on an empirical measure of relative expression.
This is a very important point in modelling gene-specific variances. In fact, many ratios with high variances result from spots that have a medium or high intensity in one channel and a very low intensity in the other (Comander et al., 2004, p. 4), and building a model on single channel intensities can be much more sensitive than modelling the empirical log-ratio. Consistently, using the Tseng prior with the normalization step inside the model (Subsection 3.2.3), all the genes that had emerged as downregulated in the previous analysis were no longer differentially expressed.
Using the Bayesian hierarchical model with loess normalization (Subsection 3.2.4), 27 genes were found differentially expressed; 18 of the 27 overlap with those obtained by the empirical Bayes model and only 2 of them were downregulated.
The full Bayesian model is likely to yield more conservative estimates of relative expression than the empirical Bayes one. The sensitivity analysis performed in the previous section shows that the Bayesian model is more robust to the different normalization procedures adopted.
The empirical Bayesian model and the full Bayesian one insert prior information on variability from the calibration experiment in different ways. In the first, the prior distribution for the variance of the normalized gene log-ratio (${}_m\sigma^2_g$) is a function of a weighted average between the observed gene-specific variances ($s_g$) and their average over the set of genes ($s_\cdot$) on the calibration arrays [Equation (16)]. No hyperprior distribution is assumed on the prior parameters; instead an estimate is plugged in, following the empirical Bayesian approach. The estimate proposed in the Tseng model rests on the theory of the generalized James-Stein estimator (Efron and Morris, 1972) and has optimality properties from a frequentist point of view.
The full Bayesian hierarchical model inserts information from the self-self experiment at the normalized log-ratio level for each gene, as well as at the single channel intensity level (Fig. 3).
The gene-specific log-ratio ($\delta_g$) probability density has an informative distribution on its parameters $\mu_{\delta g}$, $\sigma^2_{\delta g}$ [Equation (7)]. The single channel intensity likelihood has a gene-specific prior distribution for the variance with parameters $\mu_\sigma$, $\tau^2_\sigma$ estimated from the self-self experiment [Equation (4)]. An alternative would be to consider the whole posterior distribution of $\mu_\sigma$ and $\tau^2_\sigma$ from the calibration experiment. The hierarchical structure of the model is a robust answer to the problem of incorporating prior knowledge. The introduction of a supplementary layer in the model makes it possible to filter the available prior information in a sensible way.
As shown in Figure 2, in our data the Bayesian posterior estimates of the gene-specific variances tend to be larger than the empirical Bayes estimates. The reader can also appreciate that the distribution of log-ratios from the calibration experiment (Fig. 1) has a heavier tail for negative values and a positive mode. Consistently, our Bayesian analysis of the comparative experiment is more conservative and penalizes negative log-ratios more heavily.
Both models reveal a shrinkage effect: additional materials illustrating this point can be requested from the authors.
In conclusion, we showed how information from calibration experiments can be utilized to improve inference on differentially expressed genes in comparative experiments.
The approach presented is specific for two-channel arrays. However, our modelling is based on absolute gene expression level, the log-ratio being a model parameter to be estimated. Therefore it can be adapted to Affymetrix platforms.
The calibration experiment is a good answer to the problem of estimating gene-specific variability and allows us to include prior information in both a full Bayesian and an empirical Bayesian framework. It extends naturally to a sequence of experiments (e.g. time course experiments): it makes it possible to update prior information and to control sources of variation that can be introduced between different experiments. Moreover, a calibration experiment can be used as a baseline for future experiments on the same tissue, cellular line or species.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alvis Brazma
Received on January 14, 2005; revised on August 4, 2005; accepted on October 27, 2005
| REFERENCES |
|---|
Baldi, P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, 509–519.
Comander, J., et al. (2004) Improving the statistical detection of regulated genes from microarray data using intensity-based variance estimation. BMC Genomics, 5, 1–21.
Delmar, P., Robin, S., Daudin, J.J. (2005) Efficient variance modelling for differential analysis of replicated gene expression data. Bioinformatics, 21, 502–508.
Dobbin, K. and Simon, R. (2002) Comparison of microarray designs for class comparison and class discovery. Bioinformatics, 18, 1438–1445.
Efron, B. and Morris, C. (1972) Empirical Bayes on vector observations: an extension of Stein's method. Biometrika, 59, 335–347.
Efron, B., et al. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc., 96, 1151–1160.
Gelman, A. and Rubin, D.B. (1992) Inference from iterative simulations using multiple sequences. Stat. Sci., 7, 457–511.
Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (1996) Markov Chain Monte Carlo in Practice. Chapman and Hall, London.
Kerr, M.K., et al. (2002) Statistical analysis of gene expression microarray experiment with replication. Stat. Sin., 12, 203–217.
Lönnstedt, I. and Speed, T. (2002) Replicated microarray data. Statistica Sinica, 12, 31–46.
Lewin, A., Richardson, S., Marshall, C., Glazier, A., Aitman, T. (2005) Bayesian Modelling of Differential Gene Expression. Biometrics, in press.
Mor-Vaknini, T., et al. (2003) Vimentin is secreted by activated macrophages. Nat. Cell Biol., 5, 59–63.
Newton, M.A., et al. (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol., 8, 37–52.
Parmigiani, G., et al. (2002) A statistical framework for expression based molecular classification in cancer. J. R. Stat. Soc., 64, 717–736.
Pepe, G., et al. (1997) Tissue factor and plasminogen activator inhibitor type 2 expression in human stimulated monocytes is inhibited by heparin. Semin. Thrombosis Hemostasis, 23, 135–141.
Rocke, D.M. and Durbin, B. (2001) A model for measurement error for gene expression data. J. Comput. Biol., 8, 557–569.
Simon, R.M., Korn, E.L., McShane, L.M. (2003) Design and Analysis of DNA Microarray Investigations. Springer-Verlag, New York.
Speed, T. (2003) Statistical Analysis of Gene Expression Microarray Data. Chapman and Hall, New York, NY.
Spiegelhalter, D., Thomas, A., Best, N., Lunn, D. (2003) WinBUGS, version 1.4. User manual. MRC Biostatistics Unit, Cambridge, UK.
Tanner, M.A. (1996) Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer, New York.
Tseng, G.C., et al. (2001) Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res., 29, 2549–2557.
Tusher, V.G., et al. (2001) Significance analysis of microarray applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.
Wit, E. and McClure, J. (2004) Statistics for Microarrays. John Wiley and Sons, Chichester, UK.
Wu, H., Kerr, M.K., Cui, X., Churchill, G.A. (2003) MAANOVA: a software package for the analysis of spotted cDNA microarray experiments. In Parmigiani, G., Garett, E., Irizarry, R., Zeger, S. (Eds.), The Analysis of Gene Expression Data: Methods and Software. Springer, New York, NY, pp. 313–341.
Yang, Y.H., et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res., 30, e15.