Bioinformatics Advance Access originally published online on May 16, 2006
Bioinformatics 2006 22(14):1682-1689; doi:10.1093/bioinformatics/btl183

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Combining multiple microarrays in the presence of controlling variables

Taesung Park 1,*, Sung-Gon Yi 1, Young Kee Shin 2 and SeungYeoun Lee 3

1 Department of Statistics, College of Pharmacy, Seoul National University, Seoul, Korea
2 Department of Pharmacy, College of Pharmacy, Seoul National University, Seoul, Korea
3 Department of Applied Mathematics, Sejong University, Seoul, Korea

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: Microarray technology enables the monitoring of expression levels for thousands of genes simultaneously. As experiments grow in scale, it becomes common to use the same type of microarray in different laboratories or hospitals. It is then important to analyze the microarray data jointly to derive a combined conclusion after accounting for the differences among sites. One of the main objectives of a microarray experiment is to identify genes that are differentially expressed among experimental groups. The analysis of variance (ANOVA) model has been commonly used to detect differentially expressed genes after accounting for the sources of variation commonly observed in microarray experiments.

Results: We extended the usual ANOVA model to account for additional variability resulting from confounding variables such as the effect of different hospitals. The proposed model is a two-stage ANOVA model. The first stage adjusts for the effects of no interest. The second stage detects differentially expressed genes among the experimental groups using the residuals obtained from the first stage; based on these residuals, we propose a permutation test. The proposed model is illustrated using data from 133 microarrays collected at three different hospitals. The proposed approach is more flexible than the meta-analysis approach, and it accommodates individual covariates more easily.

Availability: A set of programs written in R will be electronically sent upon request.

Contact: tspark{at}stats.snu.ac.kr


    1 INTRODUCTION
Microarray technology has important applications in pharmaceutical and clinical research. By comparing gene expression in control and tumor tissues, for example, microarrays can be used to identify tumor-related genes and targets for therapeutic drugs. In microarray experiments, the identification of differentially expressed genes is an important issue, and statistical test procedures have served as useful tools for this purpose. For single-slide experiments, a simple cutoff method was proposed to identify differentially expressed genes (Chen et al., 1997), and a hierarchical Bayesian model was proposed using the posterior odds of change (Newton et al., 2001). Many researchers have since emphasized the importance of replication in microarray experiments, mainly to increase the precision of estimated quantities and to quantify the uncertainty of estimates (Kerr and Churchill, 2001; Lee et al., 2000). Recently, many statistical models have been proposed for analyzing multiple slides (Kendziorski et al., 2003; Efron et al., 2001; Kerr et al., 2000; Park et al., 2003; Ideker et al., 2000; Dudoit et al., 2002, 2003; Tusher et al., 2001; Pan, 2003).

With an increase in the number of microarrays, however, the sources of errors also increase. For example, the microarray experiment in our study was conducted at three different hospitals. The main objective of the study was to identify the differentially expressed genes between control tissues and tumor tissues. As expected, the hospitals served as an important source of variability in our microarray experiment. The main interest of this study was not a comparison among the hospitals but identification of the differentially expressed genes after accounting for differences among the hospitals.

Recently, statistical approaches based on meta-analysis have been proposed to combine independent and heterogeneous microarray studies (Rhodes et al., 2002, 2004; Choi et al., 2003a,b). The key idea of meta-analysis is to combine summary statistics from each study, most commonly significance levels (p-values) or effect sizes, across studies to estimate an overall summary statistic.

In this article, we propose an alternative statistical procedure to identify genes with different expression profiles in the presence of many controlling variables. We extend the usual analysis of variance (ANOVA) model to account for additional variability resulting from confounding variables such as hospitals. The key idea of this approach is to treat hospital as one of the controlling variables in the model. The proposed model provides an integrated analysis of microarray data from multiple sources. It is a two-stage ANOVA model: the first stage adjusts for the effects of no interest, and the second stage detects differentially expressed genes among the experimental groups by a permutation test based on the residuals from the first stage. The proposed model is illustrated using data from 133 microarrays collected at three different hospitals.

The article is organized as follows. The proposed ANOVA models and test procedures are presented in Section 2, together with the test statistics and control of the false discovery rate (FDR). The results of the analysis are presented in Section 3. Finally, concluding remarks are given in Section 4.


    2 MATERIALS AND METHODS
2.1 Materials
Four independent cDNA microarray datasets were generated at three hospitals using two different chips, as described in Choi et al. (2003a). In summary, Chip 1 contains 10 336 human cDNA clones that were verified by single-pass sequencing, and Chip 2 contains 10 368 human cDNA clones. Both are two-color cDNA chips; the scheme for labeling samples and controls is described in Table 1. Spot intensities were computed using GenePix Pro5.x (Axon). The resolution of each image file was 2049 x 3802, and each file was ~50–60 Mb. The fluorescence wavelengths were 635 nm for Cy5 and 532 nm for Cy3. A more detailed description of the chips has been deposited in the Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) under accession number GPL2911.


Table 1 Descriptive information for datasets

 
In our analysis, we used the 9984 clones common to both chips. Chip 1 consists of 32 subgrids of 19 x 17 spots, and Chip 2 consists of 32 subgrids of 18 x 18 spots. The remaining 352 (i.e. 10 336 − 9984) and 384 (i.e. 10 368 − 9984) clones were chip-specific, such as control and spiked-in clones, and were not of interest. The main objective of the study was to identify the genes differentially expressed in hepatitis B virus (HBV)-positive hepatocellular carcinoma (HCC) samples. HCC and adjacent control tissues, including liver cirrhosis and chronic hepatitis tissues, were obtained from the patients.
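As a quick arithmetic check of the layouts above, the following Python sketch multiplies out the subgrid dimensions (reading "19 x 17" as rows x columns of spots, an assumption that does not affect the product) and recovers both the clone counts and the chip-specific remainders.

```python
# Subgrid layouts as described in the text: (number of subgrids, rows, columns).
CHIP_LAYOUTS = {"Chip 1": (32, 19, 17), "Chip 2": (32, 18, 18)}
COMMON_CLONES = 9984  # clones shared by both chips


def spots(n_subgrids: int, rows: int, cols: int) -> int:
    """Total number of spots (one clone per spot) on a chip."""
    return n_subgrids * rows * cols


for chip, layout in CHIP_LAYOUTS.items():
    total = spots(*layout)
    print(f"{chip}: {total} spots, {total - COMMON_CLONES} chip-specific")
```

This prints 10336 and 352 for Chip 1, and 10368 and 384 for Chip 2, matching the counts quoted above.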

Although the microarray experiment was performed under relatively controlled experimental conditions, many differences existed among the hospitals owing to different experimental procedures and laboratory conditions such as sample preparation, hybridization and image analysis.

Table 1 summarizes some characteristics of the normalized datasets D1–D4: the chip type (1 and 2), labeling scheme, hospital and number of samples. The data were normalized by locally weighted scatterplot smoothing (LOWESS; Cleveland, 1979), with a span parameter of 0.75 and the tricube function as the weight function; Tukey's biweight function was used for robustness. On each microarray, two different tissues were labeled with Cy5 and Cy3, as summarized in the fourth column of Table 1. A reference sample different from those of the other three experiments was used for D1, while the opposite labeling scheme was used for D4. Table 1 also gives descriptive statistics of the normalized data: the missing data proportion and the means of the standard deviations (SDs) and median absolute deviations (MADs). A spot was treated as missing if its background intensity exceeded its signal intensity in the Cy5 or Cy3 channel, where channel intensity was defined as the signal intensity minus the background intensity.
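The preprocessing just described can be illustrated in Python. This is a sketch under stated assumptions, not the authors' pipeline (their programs are in R): we assume LOWESS is fitted to the log-ratio M = log2(Cy5/Cy3) as a function of the average log-intensity A = (1/2)log2(Cy5·Cy3), the usual intensity-dependent form; a spot whose background is at or above its signal in either channel is set to missing (treating equality as missing avoids log(0)); and three biweight robustness iterations are used. The function names are hypothetical.

```python
import numpy as np


def channel_intensity(signal, background):
    """Background-subtracted intensity; NaN where background >= signal (missing spot)."""
    diff = np.asarray(signal, dtype=float) - np.asarray(background, dtype=float)
    return np.where(diff > 0, diff, np.nan)


def lowess_fit(x, y, span=0.75, robust_iters=3):
    """Local linear LOWESS with tricube weights and Tukey biweight robustness steps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = max(int(np.ceil(span * n)), 2)  # points in each local neighbourhood
    robust_w = np.ones(n)
    fitted = np.empty(n)
    for it in range(robust_iters + 1):
        for i in range(n):
            dist = np.abs(x - x[i])
            h = max(np.sort(dist)[k - 1], 1e-12)  # distance to k-th nearest neighbour
            u = np.clip(dist / h, 0.0, 1.0)
            w = (1 - u**3) ** 3 * robust_w  # tricube weights times robustness weights
            X = np.column_stack([np.ones(n), x - x[i]])
            XtW = X.T * w  # X^T W
            beta = np.linalg.lstsq(XtW @ X, XtW @ y, rcond=None)[0]
            fitted[i] = beta[0]  # local line evaluated at x[i]
        if it == robust_iters:
            break
        resid = y - fitted
        s = np.median(np.abs(resid))
        if s == 0:
            break  # perfect fit: nothing to downweight
        u = resid / (6.0 * s)
        robust_w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)  # Tukey biweight
    return fitted


def lowess_normalize(cy5_sig, cy5_bg, cy3_sig, cy3_bg, span=0.75):
    """LOWESS-normalized log2(Cy5/Cy3) ratios; missing spots stay NaN."""
    red = channel_intensity(cy5_sig, cy5_bg)
    green = channel_intensity(cy3_sig, cy3_bg)
    m = np.log2(red / green)
    a = 0.5 * np.log2(red * green)
    ok = ~np.isnan(m)
    m_norm = np.full(m.shape, np.nan)
    m_norm[ok] = m[ok] - lowess_fit(a[ok], m[ok], span=span)
    return m_norm
```

The double loop costs O(n^2) per iteration and is written for readability; for 10 000-spot arrays a library routine such as the statsmodels `lowess` function (with `frac=0.75`) would be the practical choice.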

These descriptive statistics reveal some differences among the four datasets. For example, the missing data proportions of D1 and D4, computed over all slides in each dataset, exceed 20%. After removing missing data, only 39 genes were observed in all 133 slides, whereas 8227 genes were observed in more than 100 slides. The mean SD of D2 is considerably smaller than those of the other datasets.
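The summaries quoted above (the dataset-level missing proportion and the numbers of genes observed in all slides or in more than a given number of slides), together with the per-slide missing proportions plotted in Figure 2c and d, can be computed as below. This is a minimal sketch assuming the normalized data are held as a genes x slides NumPy array with NaN marking missing spots; the function and key names are hypothetical.

```python
import numpy as np


def missing_summary(m, min_slides):
    """Missing-data summaries for a genes x slides log-ratio matrix (NaN = missing)."""
    observed = ~np.isnan(m)
    n_obs = observed.sum(axis=1)  # number of slides in which each gene is observed
    return {
        "dataset_missing_prop": 1.0 - observed.mean(),  # over all spots of all slides
        "slide_missing_prop": 1.0 - observed.mean(axis=0),  # one value per slide
        "genes_in_all_slides": int(np.sum(n_obs == m.shape[1])),
        "genes_in_more_than": int(np.sum(n_obs > min_slides)),
    }
```

For the combined data, `missing_summary(m_all, 100)` would return the two gene counts reported in the text (39 and 8227), with `m_all` a hypothetical 9984 x 133 matrix.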

Figure 1 shows the box-plots of original intensities of Cy5 and Cy3 channels from all 133 slides. Each vertical line in Figure 1 shows the distribution of the original spot intensities. ‘T’ represents the tumor samples and ‘C’ represents the control samples.


Fig. 1 Box-plots showing the distribution of the original two channel intensities from the 133 slides of the tumor and control samples after location normalization.

 
Figure 2 shows summary box-plots for these datasets. Figure 2a and b show box-plots of the mean intensities for the original Cy5 and Cy3 channels, respectively, drawn separately for the control and tumor samples; for each slide, a single mean intensity was computed over all genes. As expected, many differences were observed among the four datasets. In particular, dataset D4 showed larger differences in mean values between the control and tumor samples than the other datasets. Figure 2c and d show box-plots of the missing proportions computed for each slide. Again, the four datasets differed considerably; datasets D1 and D4, in particular, showed high missing proportions.


Fig. 2 Box-plots of summary measures of the log-transformed ratios for the tumor and control samples.

 
To explore the datasets further, we applied diagnostic plots to identify outlying chips (Park et al., 2005). The numbers of outlying chips are summarized in the last two columns of Table 1. Figure 3 shows scatterplots for the six outlying slides detected at α = 0.05. The x-axis represents the log-transformed Cy5 channel intensity, and the y-axis represents the log-transformed Cy3 channel intensity. These scatterplots show patterns quite distinct from typical Cy5 versus Cy3 plots. In Section 3, we present results for all data and after removing the six slides.


Fig. 3 Original red and green channel intensity plots of six outlying slides.

 
2.2 Methods
Table 2 shows the structure of the data for fitting the ANOVA model. For the i-th chip from hospital h, let $y_{hi}$ be the log-transformed ratio of the two channel intensities. Smyth et al. (2003) explained why the log-transformed value should be used instead of the original intensity value for cDNA microarray data. Let $T_{hi}$ be a class variable distinguishing the control and tumor tissues, and let $H_{hi}$ and $C_{hi}$ be class variables for the hospital and chip, respectively. For example, $H_{hi}$ consists of two dummy variables representing the three hospitals. The ANOVA model accounting for the effects of the hospital and chip is given by the following equation:

$$y_{hi} = \mu + \beta_T T_{hi} + \beta_H H_{hi} + \beta_C C_{hi} + \varepsilon_{hi} \qquad (1)$$

where $\mu$ is the overall mean and $\beta_T$ is the effect of main interest, representing the treatment effect between the control and tumor tissues. $\beta_H$ and $\beta_C$ represent the hospital effect and the chip effect, respectively. The main goal of fitting Model (1) is to find the genes with significant $\beta_T$, that is, the genes differentially expressed between the control and tumor tissues. Note that all the effects in the model are assumed to be linear.
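To make Model (1) concrete, the sketch below builds a design matrix with reference (dummy) coding, so that $\beta_H$ is carried by two hospital dummies and $\beta_C$ by one chip dummy, and fits one gene by least squares. The level labels ("C"/"T", "H1" to "H3", "chip1"/"chip2"), the helper names and the choice of the first level as reference are all assumptions for illustration.

```python
import numpy as np


def dummies(labels, levels):
    """Reference coding: one 0/1 column for each level except the first."""
    labels = np.asarray(labels)
    return np.column_stack([(labels == lev).astype(float) for lev in levels[1:]])


def design_model1(treatment, hospital, chip):
    """Columns: intercept (mu), T (beta_T), two hospital dummies (beta_H), chip dummy (beta_C)."""
    n = len(treatment)
    return np.column_stack([
        np.ones(n),
        dummies(treatment, ["C", "T"]),
        dummies(hospital, ["H1", "H2", "H3"]),
        dummies(chip, ["chip1", "chip2"]),
    ])


def fit_gene(y, X):
    """Least-squares estimates for one gene.

    If the design were rank deficient (e.g. chip completely confounded with
    hospital), lstsq would return the minimum-norm solution instead.
    """
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

In this coding, the second entry of the returned coefficient vector estimates $\beta_T$ for the gene.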


Table 2 Data structure for fitting ANOVA models

 
The ANOVA model assumes that the error term $\varepsilon_{hi}$ is normally distributed. However, we do not expect this assumption to hold in microarray experiments. Therefore, we propose a permutation test that does not require the normality assumption. To perform the permutation test more efficiently, we employed the following two-stage approach (Park et al., 2003). The first stage accounts for the hospital effect and the chip effect by fitting a model with only these effects. The residuals from this model are then free of these effects but still contain information on the treatment effect. The second stage extracts the treatment information from the residuals by using the permutation test.
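Stage 1. Fit the ANOVA model without $\beta_T$:

$$y_{hi} = \mu + \beta_H H_{hi} + \beta_C C_{hi} + \varepsilon_{hi} \qquad (2)$$

Stage 2. Calculate the residuals of Model (2):

$$r_{hi} = y_{hi} - \hat{\mu} - \hat{\beta}_H H_{hi} - \hat{\beta}_C C_{hi} \qquad (3)$$

Then perform two-sample permutation tests on the residuals $r_{hi}$, using a statistic such as the t-statistic, to compare the control and tumor groups defined by $T_{hi}$.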
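The two stages can be sketched in Python as follows, assuming complete data (no missing spots) held as a genes x slides array `Y`, a nuisance design `Z` containing the intercept, hospital and chip dummies (Model (2) without the treatment column), and a boolean NumPy array `is_tumor`. Stage 1 fits Model (2) to all genes at once and returns the residuals $r_{hi}$; Stage 2 shuffles the tumor/control labels over the residuals and compares |t| with its permutation distribution. The Welch form of the t-statistic and the function names are our assumptions.

```python
import numpy as np


def stage1_residuals(Y, Z):
    """Stage 1: fit Model (2) to every gene (row of Y) and return the residuals r_hi."""
    B = np.linalg.lstsq(Z, Y.T, rcond=None)[0]  # (q x genes) nuisance coefficients
    return Y - (Z @ B).T


def welch_t(R, is_tumor):
    """Per-gene two-sample t-statistic (tumor minus control), Welch form."""
    a, b = R[:, is_tumor], R[:, ~is_tumor]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se


def permutation_pvalues(R, is_tumor, n_perm=1000, seed=0):
    """Stage 2: two-sided permutation p-values from shuffled tumor/control labels."""
    rng = np.random.default_rng(seed)
    t_obs = welch_t(R, is_tumor)
    exceed = np.zeros(R.shape[0])
    for _ in range(n_perm):
        t_perm = welch_t(R, rng.permutation(is_tumor))
        exceed += np.abs(t_perm) >= np.abs(t_obs)
    return t_obs, (exceed + 1) / (n_perm + 1)
```

Shuffling labels across all slides treats the residuals as exchangeable once the hospital and chip effects have been removed, which is the premise of the two-stage approach. The resulting per-gene p-values still require multiplicity control.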

Stage 2 can be performed using SAM (Significance Analysis of Microarrays) (Tusher et al., 2001), in which the error rate is controlled through the false discovery rate (FDR) (Benjamini and Hochberg, 1995). Alternatively, the family-wise error rate can be controlled by adjusting the p-values (Westfall and Young, 1993).
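SAM estimates the FDR from permutations of the data. The step-up rule of Benjamini and Hochberg (1995) cited above instead works directly on per-gene p-values, such as the permutation p-values from Stage 2. The following is a minimal sketch of that rule, not of SAM itself; the function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up rule at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # q * i / m for the i-th smallest p-value
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True  # reject all hypotheses up to that rank
    return reject
```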

Although the proposed method uses a permutation test that does not require the normality assumption, it is expected to perform better when the normality assumption holds, because the residuals $r_{hi}$ represent the residual effects after removing the effects of the factors of no interest under the normality assumption.


    3 RESULTS
First, we analyzed each dataset separately using the permutation test based on the two-sample t-statistic. In Table 3, the four columns under D1–D4 show the numbers of significant genes detected by separate analyses of each dataset. The next column, ALL slides, gives the number of genes detected using the proposed ANOVA model. The numbers in parentheses under D1–D4 are the numbers of genes also detected by the ALL slides analysis. As the upper bound of the FDR increases, the number of detected genes also increases in each analysis.


Table 3 Original SAM approach: number of genes differentially expressed between tumor and control tissues

 
When the two-stage approach was applied to all datasets, we identified 78 genes differentially expressed between the tumor and control groups at an estimated FDR of 1%. In general, the proposed ALL analysis yielded more significant genes than the separate analyses. Except for D2, the individual datasets rarely yielded significant genes even at an FDR of 5%.

Figure 4 shows quantile–quantile (QQ) plots from the SAM analysis (Tusher et al., 2001). The x-axis represents the expected values of the test statistics based on the permuted data, and the y-axis represents the observed statistics. Figure 4a shows the QQ plot for the ALL analysis, and Figures 4b–e show the QQ plots for D1 to D4, respectively. Dots outside the dotted lines represent significant genes. Apart from D2, none of the individual datasets showed any significant genes, and the ALL analysis yielded more genes. In all five QQ plots, the points lie along lines with slopes close to one; that is, the observed test statistics are close to their expected values, except for some genes in the lower-left corner detected in D2 and ALL.
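The QQ plots described above compare each ordered observed statistic $d_{(i)}$ with its expected value under the null, estimated by averaging the ordered statistics over label permutations. A simplified Python sketch of this SAM-style calculation follows, assuming a genes x slides matrix (for example the Stage 2 residuals) and a boolean tumor indicator. The fudge constant `s0` is fixed here for simplicity, whereas SAM chooses it from the data, and the cutoff rule follows SAM's asymmetric-cutoff idea only in outline; function names are hypothetical.

```python
import numpy as np


def sam_d(R, is_tumor, s0=0.1):
    """SAM-type relative difference d_i = (mean_T - mean_C) / (s_i + s0)."""
    a, b = R[:, is_tumor], R[:, ~is_tumor]
    n1, n2 = a.shape[1], b.shape[1]
    pooled = ((n1 - 1) * a.var(axis=1, ddof=1) + (n2 - 1) * b.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    s = np.sqrt(pooled * (1 / n1 + 1 / n2))
    return (a.mean(axis=1) - b.mean(axis=1)) / (s + s0)


def sam_qq(R, is_tumor, delta, n_perm=200, s0=0.1, seed=0):
    """Ordered observed d, their permutation expectations, and genes called at delta."""
    rng = np.random.default_rng(seed)
    d = sam_d(R, is_tumor, s0)
    observed = np.sort(d)
    expected = np.mean(
        [np.sort(sam_d(R, rng.permutation(is_tumor), s0)) for _ in range(n_perm)], axis=0
    )
    dev = observed - expected  # vertical distance from the 45-degree line
    called = np.zeros(d.size, dtype=bool)
    up = (dev > delta) & (observed > 0)
    if up.any():
        called |= d >= observed[up].min()  # everything beyond the first large upward departure
    down = (dev < -delta) & (observed < 0)
    if down.any():
        called |= d <= observed[down].max()  # likewise on the negative side
    return observed, expected, called
```

Plotting `observed` against `expected` gives a QQ plot of the kind shown in Figure 4; genes with large negative effects appear in the lower-left corner.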


Fig. 4 SAM plots for combined dataset (a) and for four datasets (b-e) (FDR = 5%).

 
Combining multiple datasets usually increases the power to detect differentially expressed genes. Figure 2a suggests why the separate analyses detected significant genes only in D2. For example, D1 shows no difference between the means of the control and tumor groups; D2 shows differences in the means with very small dispersion; and D3 and D4 show some differences in the means, but their SDs are too large for the differences to be detected as significant.

To assess the effect of the outlying slides, we repeated the analysis after removing (R) the six outlying slides. As shown in Figure 3, these six slides have characteristics quite different from the others. For the same FDR value, the number of detected genes was slightly larger than in the ALL analysis (Table 3), suggesting that excluding the six outlying slides increased the power of the test.

In addition, we analyzed the data using the classical F-test of Model (1) under the normality assumption. For FDR values ≤10%, this test detected fewer significant genes than the proposed method (Table 3), suggesting that the F-test has lower power than the permutation test.
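The classical F-test of $\beta_T = 0$ compares the residual sums of squares of Model (1) (full) and Model (2) (reduced). A minimal sketch follows, with hypothetical function names. Under normal errors the p-value would come from the F distribution, for instance via `scipy.stats.f.sf(F, df_num, df_den)`; that step is omitted to keep the block dependency-free.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)


def f_test_treatment(y, X_full, X_reduced):
    """F statistic for H0: beta_T = 0, comparing Model (1) (full) with Model (2) (reduced)."""
    p_full = np.linalg.matrix_rank(X_full)
    df_num = p_full - np.linalg.matrix_rank(X_reduced)
    df_den = len(y) - p_full
    f = ((rss(y, X_reduced) - rss(y, X_full)) / df_num) / (rss(y, X_full) / df_den)
    return f, int(df_num), int(df_den)
```

With only an intercept as the nuisance part, the F statistic reduces to the square of the pooled two-sample t-statistic, which is a convenient sanity check.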

Finally, we analyzed the data by meta-analysis: we computed the effect size for each dataset and then performed the original SAM analysis to identify differentially expressed genes. For FDR values ≤10%, the meta-analysis detected fewer significant genes than the proposed method; at an FDR of 20%, it detected more. At such a high FDR, however, the results are unlikely to be reliable. As before, the numbers in parentheses are the numbers of genes also detected by the ALL slides analysis.

The meta-analysis result shown in Table 3 differs from that of Choi et al. (2003a). In Table 3, we applied the standard SAM procedure to the estimated effect sizes, so the significant genes were ordered by their test statistics, which were compared with their expected values under the null hypothesis (Tusher et al., 2001). As shown in Figure 4, most significant genes have negative effects; that is, the selected genes are more highly expressed in control tissues than in tumor tissues. In contrast, Choi et al. (2003a) used an ad hoc meta-analytic approach that fixes a threshold on the test statistic, e.g. |Z| > 2.5, chosen to achieve a preselected FDR value.

For comparison, we also performed the analysis of Choi et al. (2003a); the results are summarized in Table 4. For each fixed FDR value, the threshold of the proposed test statistic was determined; for example, at an FDR of 20%, the threshold for the proposed ANOVA test was 1.965. Unlike the ordinary SAM analysis, this ad hoc approach yielded fewer significant genes for the proposed ANOVA test than for the meta-analysis. A total of 313 genes were identified by both methods; the proposed method detected an additional 23 genes, whereas the meta-analysis detected an additional 165 genes. A partial list of the genes detected by the meta-analysis was given in Choi et al. (2003a).
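The fixed-threshold procedure described above can be sketched as follows. For each candidate cutoff c on |statistic|, the FDR is estimated as the average number of permuted statistics exceeding c divided by the number of observed statistics exceeding c, and the smallest c whose estimate meets the target is chosen. This is a simplified illustration that assumes the permuted statistics are available as an (n_perm x genes) array; it is not the exact procedure of Choi et al. (2003a), and the function name is hypothetical.

```python
import numpy as np


def threshold_for_fdr(obs, null, target_fdr):
    """Smallest cutoff c on |statistic| whose permutation-estimated FDR <= target_fdr.

    obs:  observed statistics, shape (genes,)
    null: statistics from permuted labels, shape (n_perm, genes)
    Returns (cutoff, number of genes called, estimated FDR), or None.
    """
    cands = np.sort(np.abs(obs))  # candidate cutoffs = observed |statistics|
    null_sorted = np.sort(np.abs(null).ravel())
    n_perm = null.shape[0]
    # number of observed |statistics| >= each candidate cutoff
    n_called = cands.size - np.searchsorted(cands, cands, side="left")
    # average number of permuted |statistics| >= each cutoff (expected false calls)
    n_false = (null_sorted.size - np.searchsorted(null_sorted, cands, side="left")) / n_perm
    fdr = n_false / n_called
    ok = np.nonzero(fdr <= target_fdr)[0]
    if ok.size == 0:
        return None
    i = ok[0]
    return float(cands[i]), int(n_called[i]), float(fdr[i])
```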


Table 4 Ad hoc SAM approach: number of genes differentially expressed between tumor and control tissues

 
Table 5 lists the 21 known genes detected by both methods at an FDR of 1%. All 21 known genes were downregulated in hepatocellular carcinoma (HCC) and cover a broad range of functional activities. The expression of some of these genes, including C9, ADH1C, CYP2C9, HP, BHMT, TDO2, APOA1 and SLC22A1, is confined mainly to the liver, and, in accordance with previous reports, their downregulation reflects the derangement of liver function in HCC. The impaired expression of BHMT indicates a reduced capacity of HCC tissue to catabolize homocysteine (Avila et al., 2000; Hoffman, 1984).


Table 5 Common gene list of ANOVA and meta analysis when the FDR value is 1%

 
SLC22A1, one of the polyspecific organic cation transporters, showed reduced expression in HCC, in contrast to P-gp, which is elevated in both chemically induced malignant neoplastic liver lesions and hepatocarcinoma cell lines. As reported for the progression of HCV-infected liver to HCC (Tsunedomi et al., 2005), CYP2C9 was downregulated in HCC associated with HBV infection. PRRG2 contains an N-terminal Gla domain with the highly conserved residues implicated in gamma-carboxylation. The Gla residues within Gla domains are produced by the posttranslational modification of specific glutamic acid residues by a vitamin K-dependent gamma-carboxylase. PRRG2 downregulation may therefore be related to the abnormal gamma-carboxylation process observed in HCC (Huisse et al., 1994; Naraki et al., 2002).

Table 5 also includes two unexpected genes, UBE4A and SDCCAG1, that require further investigation because they may be causal genes of HCC. Because derangement of the 11q23 region containing the UBE4A gene is frequently involved in specific cancers such as neuroblastoma and some types of leukemia (Contino et al., 2004), the reduced expression of UBE4A, a U-box-type ubiquitin ligase, in HCC suggests that UBE4A downregulation is critical for hepatocarcinogenesis. SDCCAG1, a mediator of nuclear export, was not expressed in five human lung carcinoma cell lines (Bi et al., 2005).

Table 6 lists the genes detected by the proposed ANOVA method at an FDR of 5%. It includes several interesting genes that were not detected by the meta-analysis but have shown downregulation in previous reports: C3, Ob-R (db, leptin receptor), DKK3 (REIC, DORGH-1) and CXCL12 (SDF-1alpha). The expression of leptin and Ob-R was markedly reduced in hepatocellular carcinoma (Wang et al., 2004). DKK3, an inhibitor of Wnt signaling, functions as a tumor suppressor and is downregulated in some cancers, including hepatocellular carcinoma (Hsieh et al., 2004). CXCL12 downregulation during hepatocarcinogenesis was validated by immunohistochemical analysis (Shibuta et al., 2002).


Table 6 Gene list detected only by the ANOVA analysis when the FDR value is 5%

 
Table 7 shows the numbers of significant genes with positive effects (+) and with negative effects (–). At an FDR of 1%, both the ANOVA and meta approaches yielded genes with negative effects. At an FDR of 5%, the original SAM approaches yielded only genes with negative effects, while the ad hoc SAM approach yielded some genes with positive effects; the ANOVA and meta approaches exhibited similar patterns. Table 7 thus shows a predominance of downregulated genes among the significant genes. However, when we counted all upregulated and downregulated genes, the ad hoc SAM approach yielded 4998 ‘+’ and 4986 ‘–’ genes. Because these totals are approximately equal, the predominance of downregulated genes among the significant genes reflects stronger effects among the downregulated genes rather than a larger number of them.


Table 7 Numbers of significant genes with positive effects and negative effects, respectively

 

    4 DISCUSSION
In this article, we propose a statistical model to identify genes with different expression profiles in the presence of many controlling variables. The proposed ANOVA model is much more flexible than meta-analysis. One of the main limitations of meta-analysis is that it cannot handle slide-specific covariates appropriately. The effect size in meta-analysis is simply the standardized mean difference between the tumor tissues and the control tissues. The differences for each hospital are then combined to estimate the overall difference, typically with a random-effects model. Because the mean is computed over all tumor (control) slides, it cannot account for the effect of a slide-level covariate. For example, suppose a controlling variable Z takes only the values 1 and 2 in hospital 1, and 3 and 4 in hospital 2. Then the usual meta-analysis cannot handle Z appropriately, because Z varies at the slide level and is unbalanced across hospitals.
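To illustrate why slide-level covariates are lost, the sketch below reduces each hospital's slides to a single standardized mean difference and then combines the per-hospital values. We assume Hedges' g with its usual small-sample correction and variance approximation, and a DerSimonian–Laird random-effects model; this is one common choice, and the estimator used in the meta-analyses cited above may differ. Only group means and variances enter, so a slide-level covariate such as Z never reaches the model. Function names are hypothetical.

```python
import numpy as np


def hedges_g(tumor, control):
    """Bias-corrected standardized mean difference and its approximate variance."""
    n1, n2 = len(tumor), len(control)
    sp = np.sqrt(((n1 - 1) * np.var(tumor, ddof=1) + (n2 - 1) * np.var(control, ddof=1)) / (n1 + n2 - 2))
    d = (np.mean(tumor) - np.mean(control)) / sp
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))  # small-sample correction factor
    var_g = (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))
    return g, var_g


def random_effects_z(effects, variances):
    """DerSimonian-Laird random-effects combination; returns (pooled effect, z-score)."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1 / v
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fixed) ** 2)  # Cochran's heterogeneity statistic
    tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1 / (v + tau2)  # weights including between-study variance
    mu = np.sum(w_star * y) / np.sum(w_star)
    return mu, mu / np.sqrt(1 / np.sum(w_star))
```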

In microarray experiments, taking individual characteristics into account in the analysis is of great importance. For example, clinical covariates such as age, gender and tumor stage may be important controlling variables. These covariates are usually slide-specific, that is, they differ from slide to slide. Meta-analysis loses slide-specific information by computing mean values over all slides from the same hospital, and may thus ignore slide-specific controlling variables. In contrast, the proposed ANOVA model can handle slide-specific information because all slide-specific covariates can be included in the model. As a result, the proposed model can easily adjust for covariates, irrespective of whether they are hospital-specific or slide-specific.

Furthermore, the proposed ANOVA model can provide additional information. For example, to determine whether some genes are related to liver tumors in a specific hospital, we can add an interaction term between hospital and treatment. Similarly, to determine whether certain genes are more strongly differentially expressed on a specific chip type, we can add an interaction term between chip and treatment.
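Adding a treatment-by-hospital interaction amounts to appending the products of the treatment dummy with the hospital dummies to the design matrix. The treatment effect in a non-reference hospital is then $\beta_T$ plus that hospital's interaction coefficient. The following is a minimal sketch with hypothetical names, assuming a reference-coded design like that of Model (1) with the chip column left out for brevity; a chip-by-treatment interaction would be built the same way.

```python
import numpy as np


def add_interaction(X, t_col, group_cols):
    """Append columns X[:, t_col] * X[:, g] for each group dummy column g."""
    inter = X[:, [t_col]] * X[:, group_cols]
    return np.column_stack([X, inter])


def hospital_specific_effects(beta, t_col, inter_cols):
    """Treatment effect in the reference hospital, then in each other hospital."""
    return np.concatenate([[beta[t_col]], beta[t_col] + beta[inter_cols]])
```

Testing whether the interaction coefficients are zero (for example with the F-test or the permutation approach) then asks whether the treatment effect differs among hospitals.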

In the 133 LOWESS-normalized slides, the dispersion of the distributions differed among the four datasets (Fig. 1). In a further analysis, we therefore performed a weighted ANOVA to account for this variability; however, the results did not differ considerably from those of the original ANOVA analysis.
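The text does not specify the weights used in the weighted ANOVA. One natural choice, assumed here, gives every slide the inverse of its dataset's residual variance and refits by weighted least squares. The function names are hypothetical.

```python
import numpy as np


def wls_fit(y, X, weights):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i' b)^2."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]


def dataset_weights(resid, dataset):
    """One weight per slide: inverse residual variance of the slide's dataset."""
    dataset = np.asarray(dataset)
    w = np.empty(len(resid))
    for d in np.unique(dataset):
        idx = dataset == d
        w[idx] = 1.0 / np.var(resid[idx], ddof=1)
    return w
```

In use, `resid` would be the residuals from an unweighted fit (for example Stage 1) and `dataset` the labels D1 to D4 for each slide. Equal weights reproduce the ordinary least-squares fit.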


    Acknowledgments
 
The authors would also like to thank Dr J. K. Choi and Dr S. Kim for kindly providing the dataset and H. S. Lee for providing the GEO database information. The authors would like to thank the three anonymous referees for their helpful comments. This work was partially supported by the National Research Laboratory Program of Korea Science and Engineering Foundation (M10500000126).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on January 2, 2006; revised on May 8, 2006; accepted on May 8, 2006

    REFERENCES

    Avila, M.A., et al. (2000) Reduced mRNA abundance of the main enzymes involved in methionine metabolism in human liver cirrhosis and hepatocellular carcinoma. J. Hepatol., 33, 907–914.

    Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289–300.

    Bi, X., et al. (2005) Drosophila caliban, a nuclear export mediator, can function as a tumor suppressor in human lung cancer cells. Oncogene, 24, 8229–8239.

    Chen, Y., et al. (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Opt., 2, 364–374.

    Choi, J.K., et al. (2003a) Integrative analysis of multiple gene expression profiles applied to liver cancer study. FEBS Lett., 565, 93–100.

    Choi, J.K., et al. (2003b) Combining multiple microarray studies and modeling interstudy variation. Bioinformatics, 19 (Suppl. 1), i84–i90.

    Cleveland, W.S. (1979) Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc., 74, 829–836.

    Contino, G., et al. (2004) Expression analysis of the gene encoding for the U-box-type ubiquitin ligase UBE4A in human tissues. Gene, 328, 69–74.

    Dudoit, S., et al. (2002) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sinica, 12, 111–139.

    Dudoit, S., et al. (2003) Multiple hypothesis testing in microarray experiments. Stat. Sci., 18, 71–103.

    Efron, B., et al. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc., 96, 1151–1160.

    Hoffman, R.M. (1984) Altered methionine metabolism, DNA methylation and oncogene expression in carcinogenesis. A review and synthesis. Biochim. Biophys. Acta, 738, 49–87.

    Hsieh, S.Y., et al. (2004) Dickkopf-3/REIC functions as a suppressor gene of tumor growth. Oncogene, 23, 9183–9189.

    Huisse, M.G., et al. (1994) Mechanism of the abnormal vitamin K-dependent gamma-carboxylation process in human hepatocellular carcinomas. Cancer, 74, 1533–1541.

    Ideker, T., et al. (2000) Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. J. Comput. Biol., 7, 805–817.

    Kendziorski, C.M., et al. (2003) On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med., 22, 3899–3914.

    Kerr, M.K., et al. (2000) Analysis of variance for gene expression microarray data. J. Comput. Biol., 7, 819–837.

    Kerr, M.K. and Churchill, G.A. (2001) Experimental design for gene expression microarrays. Biostatistics, 2, 183–201.

    Lee, M.-L.T., et al. (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl Acad. Sci. USA, 97, 9834–9839.

    Naraki, T., et al. (2002) γ-Carboxyglutamic acid content of hepatocellular carcinoma-associated des-γ-carboxy prothrombin. Biochim. Biophys. Acta, 1586, 287–298.

    Newton, M.A., et al. (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol., 8, 37–52.

    Pan, W. (2003) On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics, 19, 1333–1340.

    Park, T., et al. (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics, 4, 33.

    Park, T., et al. (2005) Diagnostic plots for detecting outlying slides in a cDNA microarray experiment. BioTechniques, 38, 463–471.

    Rhodes, D.R., et al. (2002) Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res., 62, 4427–4433.

    Rhodes, D.R., et al. (2004) Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl Acad. Sci. USA, 101, 9309–9314.

    Shibuta, K., et al. (2002) Regional expression of CXCL12/CXCR4 in liver and hepatocellular carcinoma and cell-cycle variation during in vitro differentiation. Jpn. J. Cancer Res., 93, 789–797.

    Smyth, G.K., Yang, Y.H. and Speed, T. (2003) Statistical issues in cDNA microarray data analysis. In Brownstein, M.J. and Khodursky, A.B. (eds), Methods in Molecular Biology series. Humana Press, Totowa, NJ, USA, pp. 111–136.

    Tsunedomi, R., et al. (2005) Patterns of expression of cytochrome P450 genes in progression of hepatitis C virus-associated hepatocellular carcinoma. Int. J. Oncol., 27, 661–667.

    Tusher, V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.

    Wang, X.J., et al. (2004) Potential involvement of leptin in carcinogenesis of hepatocellular carcinoma. World J. Gastroenterol., 10, 2478–2481.

    Westfall, P.H. and Young, S.S. (1993) Resampling-based Multiple Testing: Examples and Methods for p-value Adjustment. John Wiley & Sons, Inc., New York, NY, USA.

