Bioinformatics Advance Access originally published online on June 20, 2006
Bioinformatics 2006 22(19):2452; doi:10.1093/bioinformatics/btl333
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Response to comments on Bayesian Hierarchical Error Model for Analysis of Gene Expression Data
Department of Public Health Sciences, University of Virginia, Charlottesville VA 22908, USA
Department of Statistics, Korea University, Seoul, Korea
Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
We thank the authors of this letter (Wu et al., 2006) for pointing out the significance of our original contribution, the hierarchical error model (HEM) of Cho and Lee (2004). As the authors suggested, we agree that HEM can be extended to gene expression data with biological and/or experimental correlations. Here, however, we respond to several of the points raised in this letter.
First, in this letter the simplified posterior distributions were derived under the assumption of equal numbers of biological and experimental replicates for all conditions, i.e. $m_{ij} = m$ and $n_{ijk} = n$. In practical microarray studies, however, the numbers of replicates often differ among conditions (e.g. $m_{i1} = 5$ and $m_{i2} = 6$). Our original HEM model therefore allowed for such heterogeneous sample sizes across combinations of genes and conditions. Furthermore, expression values can be completely missing for a certain combination of gene and condition, i.e. $n_{ijk} = 0$, owing to quality control and other experimental reasons. In this case the posterior probability of $x_{ijk}$ has to be defined with a reduced form, which is a logical imputation for the missing combination from the available information on the corresponding gene and condition. Thus, we respectfully argue that our simplified form of Equation (6) is appropriate for the aforementioned general cases. We also note that the case $n_{ijk} = 1$ for all $i$, $j$, $k$ was presented in the section entitled "Hierarchical error model with no replicates" in our original paper.
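The way a posterior reduces when a gene-condition combination has no observations can be illustrated with a minimal sketch. It assumes a generic normal-normal conjugate update with known variances, which is a simplification chosen for illustration and not the HEM package's implementation; the function name `posterior_x` and the parameters `prior_mean`, `var_bio` and `var_exp` are hypothetical.

```python
def posterior_x(observations, prior_mean, var_bio, var_exp):
    """Conditional posterior (mean, variance) of a latent expression x.

    Assumed model (illustrative only):
        y_l | x ~ Normal(x, var_exp)        for each observed replicate
        x       ~ Normal(prior_mean, var_bio)
    The list `observations` may have any length, including zero.
    """
    n = len(observations)
    # Posterior precision is the prior precision plus n data precisions.
    precision = 1.0 / var_bio + n / var_exp
    # Posterior mean is the precision-weighted average of prior and data.
    mean = (prior_mean / var_bio + sum(observations) / var_exp) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Unequal replicate counts are handled by the same formula.
    for obs in ([], [9.0], [9.0, 10.0]):
        print(len(obs), posterior_x(obs, prior_mean=8.0, var_bio=0.5, var_exp=0.25))
```

With an empty list the data terms vanish and the function returns the prior mean and variance, so under these assumptions the missing combination is imputed from the gene and condition structure alone.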
Second, as pointed out in this letter, the parameter must be replaced with its counterparts subscripted "..", "i." and "j" for $\mu$, $g_i$ and $c_j$, respectively. These errors, however, arose while we were simplifying the notation in our paper; they do not indicate a mistake in our actual derivation and implementation of the corresponding posterior distributions. Indeed, our HEM methodology extends a previous study (Lee et al., 2001), in which similar expressions were given without such simplification. Therefore, all the results in Cho and Lee (2004) were obtained from the correct posterior distributions, as implemented in our software HEM, available at the Bioconductor website (http://www.bioconductor.org).
Third, we again completely agree that the extension proposed in this letter addresses the important issue of biological and experimental correlations in practical microarray studies. However, we cautiously point out that it may be extremely difficult to estimate the numerous correlation parameters precisely from practical microarray data, which are often based on a small sample size, e.g. duplicates or triplicates per condition.
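The small-sample difficulty can be illustrated by simulation. The sketch below is not part of HEM and makes assumptions of its own: bivariate normal data, a hypothetical true correlation of 0.5, and the plain sample Pearson correlation as the estimator. The helper names `pearson`, `simulate_correlations` and `spread` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_correlations(n_replicates, true_rho, n_sims=2000, seed=0):
    """Estimate a correlation repeatedly from n_replicates paired values."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        xs, ys = [], []
        for _ in range(n_replicates):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(z1)
            # Build y so that corr(x, y) equals true_rho in the population.
            ys.append(true_rho * z1 + math.sqrt(1 - true_rho ** 2) * z2)
        estimates.append(pearson(xs, ys))
    return estimates


def spread(values):
    """Sample standard deviation of the simulated estimates."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


if __name__ == "__main__":
    for n in (2, 3, 30):
        print(n, round(spread(simulate_correlations(n, true_rho=0.5)), 3))
```

With duplicates the sample correlation is always +1 or -1, so it carries no information about the strength of the correlation, and with triplicates the estimates vary far more across simulations than with larger samples.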
Finally, we would like to mention that we have recently further refined our HEM methodology by precisely estimating heterogeneous errors based on an error-pooling-based empirical Bayes prior specification (under review); this refined algorithm has also been added to our HEM package at the Bioconductor website.
In summary, as the authors of this letter pointed out, we strongly believe that the HEM methodology is useful in microarray data analysis because it carefully incorporates the complex error structures of such high-throughput biological data, and that further extensions and refinements, such as those proposed in this letter and our empirical Bayes approach, will greatly enhance the utility of this technology.
REFERENCES
Cho, H. and Lee, J.K. (2004) Bayesian hierarchical error model for analysis of gene expression data. Bioinformatics, 20, 2016–2025.
Lee, J.K., et al. (2001) Analysis of gene expression data of the NCI 60 cancer cell lines using Bayesian hierarchical effects model. Proc. SPIE, 4266, 228–235.
Wu, X.L., Forney, L.J. and Joyce, P. (2006) Comments on "Bayesian hierarchical error model for analysis of gene expression data". Bioinformatics, 22, 2446–2451.