Bioinformatics Advance Access originally published online on July 4, 2006
Bioinformatics 2006 22(17):2107-2113; doi:10.1093/bioinformatics/btl361
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Probe-level measurement error improves accuracy in detecting differential gene expression
1 School of Computer Science, University of Manchester Oxford Road, Manchester M13 9PL, UK
2 Department of Computer Science, University of Sheffield Regent Court 211 Portobello Street, Sheffield S1 4DP, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also provide a level of uncertainty in this measurement. This probe-level measurement error provides useful information which can help in the identification of differentially expressed genes.
Results: We propose a Bayesian method to include probe-level measurement error into the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in dataset and a mouse time-course dataset. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression.
Availability: The MAP approximation and variational inference described in this paper have been implemented in an R package pplr. The MCMC method is implemented in Matlab. Both software are available from http://umber.sbs.man.ac.uk/resources/puma
Contact: magnus.rattray{at}manchester.ac.uk
Supplementary Information: Supplementary data are available at Bioinformatics Online.
| 1 INTRODUCTION |
|---|
|
|
|---|
Microarrays (Schena et al., 1995; Lockhart et al., 1996) are widely used for large-scale measurements of gene expression levels. Microarray experiments are a complicated multiple step procedure and variability exists in every step of the experiment. The generated data are therefore notoriously noisy. In order to provide meaningful information from these data, various levels of analysis should be conducted. The first stage is probe-level analysis which provides a summary of the expression level for each gene. The type of downstream analysis to be performed then depends on the particular biological questions being addressed. Detecting differentially expressed genes is often the most basic aim of a microarray experiment. Affymetrix GeneChips are a popular choice of microarray technology and in this article we focus on detecting differential gene expression using these arrays.
Owing to the high variability in microarray data, replicate arrays are often used to obtain improved accuracy and reproducibility. Because of the high cost of the experiment, the number of chips for each condition is usually small, typically 24 replicates. There are two main reasons that make the detection of differential gene expression difficult in this context:
- Microarray experiments are associated with low precision probe-level measurements, especially for weakly expressed genes (probe-level measurement error).
- The small number of replicates makes it difficult to obtain an accurate variance estimate for each gene across replicates (between-replicate variance).
Many different approaches have been devised to address the second difficulty and obtain accurate between-replicate variance, such as the widely used methods Cyber-T (Baldi and Long, 2001) and SAM (Tusher et al., 2001). Most of these methods are based on single point estimates of gene expression values that can be obtained from probe-level analysis methods, such as MAS 5.0 (Affymetrix, 2001) or RMA (Irizarry et al., 2003) for Affymetrix arrays. A small number of replicates will often lead to an underestimated variance. To overcome this problem, Cyber-T implements a regularized t-test and SAM adds a regularizing constant to the gene-specific standard deviation. The reader is referred to Delmar et al. (2005) and Yang et al. (2005) for a detailed review of other recent methods for detecting differentially expressed genes.
Few methods have been proposed to include probe-level measurement error in finding differential gene expression. Each gene on the Affymetrix array has 1120 probe-pairs and these provide useful information about probe-level measurement error that is typically discarded. Probe-level probabilistic models, gMOS (Milo et al., 2003), BGX (Hein et al., 2005), mgMOS and multi-mgMOS (Liu et al., 2005), have been developed and provide an estimate of this probe-level measurement error. Our most recent model, multi-mgMOS, provides accurate gene expression measurements along with the associated uncertainty in these measurement (Liu et al., 2005). It has been shown that the measurement variance obtained from probe-level analysis can be propagated through the downstream analysis using probabilistic methods, thereby improving the performance of the analysis (Sanguinetti et al., 2005).
In this contribution, we demonstrate the usefulness of the measurement error in finding differentially expressed genes. We propose a Bayesian hierarchical model to incorporate the probe-level measurement error into the variance estimate of gene expression levels. The probe-level measurements from multi-mgMOS are used in our work. The introduction of an additional variance term makes Bayesian hyper-parameter estimation intractable and we use a variational method for approximate parameter inference. We also compare the variational method with maximum a posteriori (MAP) and Markov chain Monte Carlo (MCMC) in terms of computational efficiency and accuracy. Results from an artificial spike-in dataset and a real mouse time-course dataset show that the inclusion of probe-level measurement error improves the accuracy of finding differentially expressed genes.
| 2 METHODS |
|---|
|
|
|---|
Biological replicates are usually used in an experiment to obtain more confident gene expression measurements. For each gene g, a Gaussian distribution of logged expression level across replicates is commonly assumed (Baldi and Long, 2001; Delmar et al., 2005; Krohn et al., 2005),
, where i is the index of replicate and j is the index of condition. The parameter
is the mean logged expression level under condition j and
is the inverse of the between-replicate variance (we use this form because it is more natural to place a prior on
). We take the probe-level measurement error
into consideration, so the measured expression level
can be expressed as
. We assume a zero-mean Gaussian measurement noise,
, with variance
obtained from the probe-level analysis (Liu et al., 2005; Sanguinetti et al., 2005). After including the probe-level measurement variance in the model we have
![]() | (1) |
From the probabilistic probe-level analysis we can obtain measured logged gene expression level
for each gene on each single chip together with a measurement variance
. The parameters
are to be estimated from this data. Each gene is treated independently on the array and we focus on each individual gene, so we omit the index of the gene, g, in the following equations.
2.1 Likelihood and prior
We consider a full Bayesian approach for the combination of replicate signals. With the absence of probe-level measurement variances it has been widely accepted that
and
should be dependent variables. However, it is unclear whether this remains true when we account for probe-level measurement error. We therefore make a prior assumption that
and
are independent and select a Gaussian prior over
,
![]() | (2) |
and
are hyperparameters. In practice, there are tens of thousands of genes to be processed in each experiment. In order to make our model simple for efficient computation, we put non-informative hyperpriors on
and
.
We assume that
is shared across different conditions and measures the gene-specific variability. We denote the common variance as
. In practice many experiments consider only a small number of conditions with a small number of replicates. To reduce the effect of the small size of dataset, we select a conjugate gamma prior over
,
![]() | (3) |
and ß.
The different contributions to the variance play similar roles to the contributions in Rocke and Durbin's two-component model (Rocke and Durbin, 2001). In their model the additive measurement noise dominates for weakly expressed genes, similar to our measurement noise variance
, while the multiplicative component in their model corresponds to our replicate variance parameter
, which dominates for highly expressed genes. The difference here is that we use the internal probe replication on Affymetrix arrays to estimate
.
We now have a hierarchical model with parameters
and hyperparameters
, where µ represents the set of all
. We are interested in the posterior distribution
which summarizes the information about the mean expression value of replicates under each condition. With the assumption of the independence of observations, the likelihood of the observed data D is
![]() | (4) |
, c is the number of conditions and
is the number of replicates under condition j. The prior on parameters is given by
![]() | (5) |
2.2 Parameter estimation
With the introduction of measurement error in (1), Bayesian inference becomes intractable. We use various parameter estimation methods and we compare the different methods in terms of efficiency and accuracy. We use MAP for crude estimation, a variational method for approximate Bayesian inference and MCMC for more accurate Bayesian inference.
2.2.1 MAP approximation
With the selected priors on
in (5) there is no closed form solution for the marginal likelihood
. We obtain rough inference based on MAP estimates,
, of the posterior
(Gelman et al., 2004). The conditional posterior of
,
, given other parameters as fixed at their modal values, is
![]() | (6) |
. The crude estimates are calculated simply and efficiently (see Gelman et al., 2004, p. 276) but discard the variability in the parameters. For more details please refer to the Supplementary Material.
2.2.2 Variational Inference
In order to account for variability in parameters, we use the Expectation-Maximisation (EM) algorithm (Dempster et al., 1977) combined with a variational method (Ghahramani and Beal, 2001; Jordan et al., 1999) to optimise a lower bound on
and work out the posterior distribution
.
The log marginal likelihood,
, of data D is a function of
,
![]() | (7) |
The integral is intractable, so we use a distribution Q(
) over
and Jensen's inequality to get a lower bound on
:
![]() | (8) |
The lower bound of
is known as the KullbackLeibler (KL) divergence (except for an additive constant) which measures the discrepancy between Q(
) and
. If there are no constraints on the form of Q(
), optimizing the KL-divergence results in
![]() | (9) |
We use an EM algorithm to optimize (8) with respect to Q(
) and
iteratively. Starting from some initial hyper-parameters
0:
![]() | (10) |
At the E-step the posterior distribution
is not tractable, however by introducing constraints on the form of Q(
) we can approximate the distribution. In variational inference a factorization constraint on distribution Q is often used. Instead of optimizing (8) over the whole Q, we optimize with respect to the distribution of disjoint subsets of
. For our model, we have assumed that
is independent of
, so a reasonable approximation should be to assume Q(
) factorizes as
![]() | (11) |
We substitute the factorized Q distribution into (8) and optimize it with respect to Q(µ) and Q(
) respectively. The optimal expressions of Q(µ) and Q(
) are
![]() | (12) |
![]() | (13) |
·
denotes the expectation of a function with respect to Q(
) or Q(µ) and f(
) denotes
. In the variational approach, when there is no standard form for the Q distribution, importance sampling can be used to obtain the required expectations (Lawrence et al., 2004). Equation (13) suggests an importance sampling procedure to estimate an expectation of interest under the distribution Q(
). If we take n samples
k from Ga(
t, ßt), the approximation to a function g(
) is
![]() | (14) |
At each E-step the distributions Q(µ) and Q(
) are calculated iteratively until the parameters have converged. When the whole EM algorithm in (10) converges, Q(µ) in (12) is the approximated posterior distribution P(µ|D,
) of mean gene expression level. From (12) we can see that as the number of replicates increases, we obtain more confidence in the mean expression level since the inverse variance grows with the number of replicates.
2.2.3 MCMC
Intractable hierarchical Bayesian models are usually solved by random sampling from the posterior distribution of model parameters. We apply standard MCMC to summarize the Bayesian inference of our model by assuming a flat uniform prior on µ0 and flat Gamma priors on
and
0 with shape and inverse scale both 0.001. Gibbs sampling is used for updates of parameters µ, µ0 and
0. The remaining parameter log(
) is updated by the Metropolis algorithm using a Gaussian proposal distribution. For more details please refer to the Supplementary Material.
2.3 Significance of differential expression
Once the posterior distribution P(µ|D,
) is obtained, it is possible to compute the significance of differential expression between any two conditions. Taking a treatment and a control (indicated by 1 and 2 respectively), for instance, the probability of positive log-ratio (PPLR), P(µ1 > µ2|D,
), can be calculated by
![]() | (15) |
Equation (15) gives the posterior probability of increased expression in a treatment compared with a control. One can find the up-regulated genes by setting a level of confidence, like an
-level in a conventional statistical test. Down-regulated genes can also be found using an equation similar to (15) by calculating the integral of P(µ1 µ2 | D,
) over (
,0).
2.4 Implementation and computation time
During the developmental stage of our models, we implement all three parameter estimation approaches in Matlab. For importance sampling in the variational method, we draw 1000 samples once at each EM iteration. At each E-step, Q(µ) and Q(
) are considered to have converged when parameters change by <106 per iteration. For the implementation of MCMC we simulate 10 parallel sequences, each of length 200. We monitor the convergence of the iterative simulation by estimating the potential scale reduction
(Gelman et al., 2004) for all parameters. If
, we draw another 200 samples for each sequence. After discarding the first half of the simulations and mixing the remaining simulated sequences, the length of the effective simulations are between 1000 and 3000 for most genes. We use the fast C program donlp2 (Spellucci et al., 1998) for parameter optimization.
To process the golden spike-in dataset (described in Section 3.1), MAP approximation, variational inference and MCMC take 5 min, 4 and 50 h, respectively, using a 1.8 GHz AMD Opteron machine with 512 MB RAM. We conclude that MCMC is too computationally expensive to use in practice although it is useful to compare with other methods (see Section 3.3). Variational inference is a good compromise between computational efficiency and accuracy. MAP estimation is very fast, but less accurate, so we recommend it for crude inference. For more accurate inference, one can choose to use the variational method. We have implemented both MAP estimation and variational inference in an R-package pplr for public use of our models. For MCMC we provide the Matlab implementation for comparison.
| 3 RESULTS AND DISCUSSION |
|---|
|
|
|---|
3.1 Data sets
We compare our method with one of the most popular approaches, Cyber-T (Baldi and Long, 2001), to show the improvement obtained by including probe-level measurement variance. Two datasets are used to perform this comparison. One is a wholly defined spike-in dataset called the golden spike-in dataset (Choe et al., 2005). The other is a real-world mouse time-course dataset which was used to identify hair-cycle associated genes by (Lin et al., 2004).
The golden spike-in dataset includes two conditions each of which has three replicate chips. This dataset contains a large number of differentially expressed genes with known fold-change, 1.2- to 4-fold, and provides enough true positives to obtain adequate statistics. The two conditions, control and spike, are labelled as C and S respectively. The S sample contains the same cRNAs as the C sample except that some selected groups of cRNAs are present at a defined different increasing concentration in the S sample. The remaining cRNAs are at identical concentration. This leads to 1331 up-regulated genes and 12 679 invariant genes. For more details please refer to Choe et al. (2005).
The mouse time-course dataset is used by Lin et al. (2004) to discover regulators in hair-follicle morphogenesis and cycling by measuring gene expression patterns in mouse back skin at eight representative time points. Each time point has three or four biological replicates. Eight hair-growth associated genes are validated by quantitative real-time PCR (qr-PCR) experiments. The qr-PCR data at each time point have three replicate measurements. For the first hair-growth cycle microarray data includes five time points (day 1, 6, 14, 17 and 23) and qr-PCR data cover eight time points (day 1, 5, 8, 12, 15, 17, 19 and 23). They have three time points in common (day 1, 17 and 23). Please refer to Lin et al. (2004) for more information.
3.2 Making use of measurement error
Before we apply the Bayesian hierarchical model on replicated data, we would like to show the usefulness of the measurement error from multi-mgMOS in a single chip experiment. We select chip C1 and S1 from the golden dataset as the control and treatment chips. Since all spiked-in genes in S1 are up-regulated, we use (15) to calculate PPLR in S1 compared with C1 using the measured gene expression values and variances from multi-mgMOS. The histogram of PPLR is shown in Figure 1a. The PPLRs of most spike-in genes are close to 1 which shows the high confidence of the increasing gene expression in S1. This is consistent with the golden data where all spike-in genes are up-regulated. Most invariant genes have PPLR close to 0.5 and there is no obvious evidence for these genes to be differentially expressed. Figure 2 shows the top 50 differential expression ranking between chip C1 and S1 by (1) simple fold-change and (2) PPLR. Without considering the measurement variance, multi-mgMOS obtains a large number of false positives by fold-change ranking. After taking the credibility intervals into account, there are now no false positives in the top 50 positions.
|
|
The average receiver operator characteristic (ROC) curves for all nine possible single chip-pairs between condition C and S in the golden spike-in dataset are plotted in Figure 3s. As well as the two methods using multi-mgMOS, as shown in Figure 2, we also include a combined method suggested in Choe et al. (2005) which we denote as cMAS 5.0 because the major procedures in this method come from MAS 5.0. The results in Choe et al. (2005) show that cMAS 5.0 performs best among current statistical methods, including RMA (Irizarry et al., 2003) and GCRMA (Wu et al., 2004), on the golden spike-in dataset. We used the same form of loess normalization for all methods as used in Choe et al. (2005) to make results comparable. The area under ROC curves are 0.9226, 0.8062 and 0.7869 for PPLR of multi-mgMOS, fold-change of cMAS 5.0 and fold-change of multi-mgMOS, respectively. We notice that at the upper-right part of ROC curves PPLR results obtains slightly lower true positive rate than the other two methods. In practice we are often more concerned with obtaining a low false positive rate. For a reasonable number of false positives, on the left of the ROC curves, PPLR is much better than the other two alternatives. These results show that the uncertainty of the estimated expression level helps in detecting differential gene expression where there is only a single chip for each condition.
|
3.3 Combining replicates
In practice, people usually use replicates to estimate a level of uncertainty with the estimated gene expression level. We use the proposed Bayesian hierarchical method to include the measurement error of replicates and improve the estimation of the uncertainty of gene expression measurements.
3.3.1 Comparison of estimation accuracy
MCMC simulation obtains reliable results when it converges. We use the results from MCMC as a gold standard to evaluate the accuracy of MAP estimation and variational inference. Figure 4 shows the distribution of signal log ratio between days 14 and 1 for two probe-sets in the mouse time-course dataset. The distribution shown in Figure 4a is for one probe-set of gene Dab2 which is a known hair-growth associated gene. The estimated expression levels of this probe-set are variable over different time points. MAP approximation and variational inference obtain similar accuracy for this probe-set. The probe-set shown in Figure 4b is randomly selected and is not obviously hair-growth related, so the signal log ratio is not as high as for the probe-set in Figure 4a. For this probe-set, the variational method is closer than the MAP approximation to MCMC. In general we find that the posterior estimates of the variational approach are closer than those provided by the MAP approximation to MCMC results. In the following examples, we therefore use the variational method.
|
3.3.2 Performance on artifical dataset
Figure 1b shows the histogram of PPLR between replicated condition C and S in the golden spike-in dataset. In addition to more spike-in genes moving close to one, non-spike-in genes get tighter 0.5 compared with the histogram of PPLR between C1 and S1 in Figure 1a. We therefore obtain more confidence in the up-regulated genes and the invariant genes. We plot ROC curves (Fig. 3b) on the golden spike-in dataset to show the ability of our proposed combination method compared with the widely used approach Cyber-T which does not consider the measurement variance. From Figure 3b we can see that the inclusion of measurement error does improve the ability of multi-mgMOS to detect differentially expressed genes. The area under ROC curves for the Bayesian hierarchical method and Cyber-T on results from multi-mgMOS are 0.9431 and 0.9310, respectively, and 0.9306 for cMAS 5.0 combined with Cyber-T. More results from popular statistical methods combined with Cyber-T on the golden spike-in dataset are presented in Choe et al. (2005).
3.3.3 Performance on real dataset
In order to examine our method in an application on a real-world experiment, we applied the method to the mouse time-course dataset. Figure 5 is the temporal profile of one probe-set related to gene Dab2 in this dataset. The first hair-growth cycle is shown, which is covered by five time points. It can be seen that the combined signal obtains more confidence for each condition compared with the signal from each individual replicate and remains consistent with the qr-PCR profile.
|
The qr-PCR data in the mouse time-course dataset have three measurements at each time point. We make use of these replicate values and process them using Cyber-T at those time points that were also measured by microarray data (day 1, 17 and 23). Using the obtained statistical values of qr-PCR data, we validate the PPLR calculated from our methods and compare them with the results from Cyber-T using expression measurements obtained from cMAS 5.0 and the popular probe-level processing method GCRMA (Wu et al., 2004). The results generated from Cyber-T are associated with a P-value which does not have the same meaning as for PPLR and we cannot compare them dirrectly. In order to make them comparable, we select different credibility levels so that methods obtain a similar number of significant genes. For PPLR we set the significant level at 0.06, while the P-values for the other methods are set at 0.01. The number of significant genes at the different credibility levels for different methods are shown in the lower part of Table 1. A global scaling normalization is used for results from multi-mgMOS and cMAS 5.0 at the probe-set level, and GCRMA uses quantile normalisation at the probe level. The eight PCR validated genes have 14 associated probe-sets shown in Table 1. We find the differential gene expression between two pairs of time points, (day 1, 17) and (day 17, 23). We exclude day 1 of gene Crisp1 owing to the absence of replicate measurements of qr-PCR data at this time point. There are 27 tests altogether in this data. From these 27 tests PPLR obtains two inconsistent results with qr-PCR, while cMAS 5.0+Cyber-T and GCRMA+Cyber-T obtain three. Results from our method therefore show more consistency with results from qr-PCR data.
|
| 4 CONCLUSION |
|---|
|
|
|---|
We have presented an approach using probe-level measurement error in order to improve the detection of differential gene expression and compared three different computation methods, MAP approximation, a variational method and MCMC, to solve the intractability in the model owing to the incorporation of probe-level measurement error. This approach makes full use of information in microarray experimental data and performs relatively efficient and accurate inference based on this rich information. Results on a spike-in dataset and a real time-course dataset show that the inclusion of probe-level measurement error improves the accuracy of finding differentially expressed genes, especially when there are few replicate chips for conditions. Recent work on PCA demonstrates that probe-level uncertainties can also be incorporated into other probabilistic models, leading to an improved method (Sanguinetti et al., 2005). Where possible we recommend that probe-level measurement error should be included in the downstream analysis in order to obtain more reasonable biological conclusions from a microarray experiment.
| Acknowledgments |
|---|
The authors gratefully acknowledge Padhraic Smyth and Bogi Andersen for providing the mouse time-course qr-PCR data, Leo Zeef for helpful suggestions on the software and three anonymous reviewers for constructive comments. M.R. and N.D.L. gratefully acknowledge a BBSRC award Improved processing of microarray data with probabilistic models. M.M. is supported by an Advanced Training Fellowship from the Wellcome Trust.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on March 30, 2006; revised on June 5, 2006; accepted on June 26, 2006
| REFERENCES |
|---|
|
|
|---|
Affymetrix. Microarray Suite User Guide Version 5.0, (2001) , Santa Clara, CA Affymetrix Inc.
Baldi, P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, 509519
Choe, S.E., et al. (2005) Preferrend analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol, . 6, R16[CrossRef][Medline].
Delmar, P., et al. (2005) VarMixt: efficient variance modelling for the differential analysis of replicated gene expression data. Bioinformatics, 21, 502508
Dempster, A.P., et al. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat Soc, . B39, 138.
Gelman, A., et al. Bayesian Data Analysis, (2004) 2nd , Boca Raton, FL Chapman & Hall/CRC.
Ghahramani, Z. and Beal, M.J. (2001) Graphical models and variational methods. In Opper, M. and Saad, D. (Eds.). Advanced Mean Field MethodsTheory and Practice, , Cambridge, MA MIT Press, pp. 161177.
Hein, A.K., et al. (2005) BGX: a fully Bayesian integrated approach to the analysis of Affymetrix GeneChip data. Biostatistics, 6, 349373
Irizarry, R.A., et al. (2003) Exploration, normalization and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249264[Abstract].
Jordan, M.I., et al. (1999) An introduction to variational methods for graphical models. Mach. Learn, . 37, 183233[CrossRef].
Krohn, K., et al. (2005) Increased power of microarray analysis by use of an algorithm based on a multivariate procedure. Bioinformatics, 21, 35303534
Lawrence, N.D., et al. (2004) Reducing the variability in cDNA microarray image processing by Bayesian inference. Bioinformatics, 20, 518526
Lin, K.K., et al. (2004) Identification of hair cycle-associated genes from time-course gene expression profile data by using replicate variance. Proc. Natl Acad. Sci. USA, 101, 1595515960
Liu, X., et al. (2005) A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips. Bioinformatics, 21, 36373644
Lockhart, D.J., et al. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol, . 14, 16751680[CrossRef][ISI][Medline].
Milo, M., et al. (2003) A probabilistic model for the extraction of expression levels from oligonucleotide arrays. Biochem. Soc. Trans, . 31, 15101512[ISI][Medline].
Rocke, D.M. and Durbin, B. (2001) A model for measurement error for gene expression arrays. J. Comput. Biol, . 8, 557569[CrossRef][ISI][Medline].
Sanguinetti, G., et al. (2005) Accounting for probe-level noise in principal component analysis of microarray data. Bioinformatics, 21, 37483754
Schena, M., et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467470
Spellucci, P. (1998) A SQP method for general nonlinear programs using only equality constrained subproblems. Math. Program, . 82, 413448.
Tusher, V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 51165121
Wu, Z., et al. (2004) A model based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc, . 99, 909917[CrossRef][ISI].
Yang, Y.H., et al. (2005) Identifying differentially expressed genes from microarray experiments via statistic synthesis. Bioinformatics, 21, 10841093
This article has been cited by other articles:
![]() |
B. Jayawardhana, D. B. Kell, and M. Rattray Bayesian inference of the sites of perturbations in metabolic pathways via Markov chain Monte Carlo Bioinformatics, May 1, 2008; 24(9): 1191 - 1197. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. B. Silkworth, E. A. Carlson, C. McCulloch, K. Illouz, S. Goodwin, and T. R. Sutter Toxicogenomic Analysis of Gender, Chemical, and Dose Effects in Livers of TCDD- or Aroclor 1254-Exposed Rats Using a Multifactor Linear Model Toxicol. Sci., April 1, 2008; 102(2): 291 - 309. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||





















