Bioinformatics Advance Access originally published online on October 10, 2006
Bioinformatics 2006 22(23):2898-2904; doi:10.1093/bioinformatics/btl500
Large scale data mining approach for gene-specific standardization of microarray gene expression data
1 Department of Biological Sciences, Sookmyung Women's University Hyochangwongil 52, Youngsan-gu, Seoul, Republic of Korea, 140-742
2 Research Center for Women's Diseases (RCWD), Sookmyung Women's University Hyochangwongil 52, Youngsan-gu, Seoul, Republic of Korea, 140-742
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: The identification of changes in gene expression in multifactorial diseases, such as breast cancer, is a major goal of DNA microarray experiments. Here we present a new data mining strategy to better analyze marginal differences in gene expression between microarray samples. The idea is based on the notion that considering a gene's behavior in a wide variety of experiments can improve the statistical reliability of identifying genes with moderate changes between samples.
Results: The availability of a large collection of array samples sharing the same platform in public databases, such as NCBI GEO, enabled us to re-standardize the expression intensity of a gene using its mean and variation across a wide variety of experimental conditions. This approach was evaluated via the re-identification of breast cancer-specific gene expression. It successfully prioritized several genes associated with breast tumors, for which the expression difference between normal and breast cancer cells was marginal and thus would have been difficult to recognize using conventional analysis methods. By maximizing the utility of microarray data in public databases, it provides a valuable tool, particularly for the identification of previously unrecognized disease-related genes.
Availability: A user friendly web-interface (http://compbio.sookmyung.ac.kr/~lage/) was constructed to provide the present large-scale approach for the analysis of GEO microarray data (GS-LAGE server).
Contact: yoonsj{at}sookmyung.ac.kr
| 1 INTRODUCTION |
|---|
One of the most popular uses of DNA microarrays is the comparison of differences in gene expression under two distinct experimental conditions (treated versus untreated samples, diseased versus normal tissue, mutant versus wild-type organisms, etc.) (Breitling et al., 2004). In this type of experimental setup, a major challenge is the identification of those genes whose expression is significantly different between two conditions (Aittokallio et al., 2003). Many sophisticated statistical methods have been tested, in attempts to achieve a more reliable identification of differentially regulated genes (Huber et al., 2002; Irizarry et al., 2003; Yang et al., 2001; Yang et al., 2002). In fact, many of the existing statistical methods for microarray analysis have been developed by using datasets, in which changes in gene expression are abundant, with many genes having a high magnitude of change that far exceeds the observed variability in expression (Ramaswamy et al., 2001). However, for many physiological and metabolic conditions, the changes in gene expression are often moderate compared to the array-wide variability in expression, thus leading to modest P-values, such that existing statistical models often miss most of the real changes (Mootha et al., 2003).
Therefore, we sought to develop an analytical approach that provides experimental biologists with a more thorough understanding of the statistical significance of any list of genes produced under conditions where the real changes are of modest magnitude but the expression level is still high in both experimental conditions. An important issue is the normalization of the relative expression of genes across a series of microarray experiments (Colantuoni et al., 2002). Normalization across arrays has been extensively used to minimize systematic variations in specific samples (Bolstad et al., 2003; Gautier et al., 2004; Huber et al., 2002; Irizarry et al., 2003; Yang et al., 2002). The selection of appropriate controls for normalization has been proposed for comparisons of expression levels across samples (Yang et al., 2002). A set of controls (microarray sample pool) with minimal sample-specific bias over a large intensity range was introduced to aid in intensity-dependent normalization. In Affymetrix GeneChip arrays, each gene is represented by a set of 11–20 pairs of probes (a perfect match and a mismatch), and the intensities for each probe set are summarized by the log scale robust multi-array analysis (RMA) (Irizarry et al., 2003). It has been reported (Li and Wong, 2001) that the variation of a specific probe across multiple arrays could be considerably smaller than the variance across probes within a probe set. The RMA method effectively accounted for this strong probe affinity effect, and consequently improved the ability to detect differentially expressed genes between samples.
Since using multiple arrays for normalization had improved the detection of differentially expressed genes between samples, we attempted to further develop a gene-specific, multi-array standardization method using a large collection of expression data available from the public database. Our focus was to better understand the biological relevance of a detected difference in gene expression, rather than to improve the fluorescence intensity normalization that many existing methods (e.g. the RMA method) focus on. With the rapid increase of microarray expression data in the public database over the past few years, it has become possible to monitor the general expression level of a gene in diverse biological samples under various conditions. Through the creation of a database-wide expression profile for individual genes (or probesets), which allows an estimation of the gene-specific distribution of expression level in various experimental conditions, it is possible to standardize individual gene expression intensities in a specific assay by using their unique database-wide means and standard deviations. This consideration of a gene's behavior in a wide variety of biological conditions gives new insight into interpreting the expressional difference between given samples. From a biological point of view, the expressional difference of a gene with a small DB-wide expressional variation deserves more attention than that of a gene with a large DB-wide variation. Retrieving a large amount of expression data available in the public database, NCBI GEO (Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo) (Barrett et al., 2005), we developed a web-based computational tool that applies GEO-wide means and standard deviations to re-standardize individual gene expression levels in specific samples.
The organization of gene expression data in GEO is schematically presented in Figure 1. Submitted samples are assembled into biologically meaningful and statistically comparable GEO DataSets (GDS). Samples within a GDS refer to the same platform, i.e. a common set of elements is assayed. Each individual entity is assigned a unique and stable accession number; the accession number prefix indicates whether the record is a GEO Platform (GPL), Dataset (GDS) or Sample (GSM). GEO is the largest fully public repository for gene expression data. It currently holds over 70 000 samples (GSMs) generated from more than 2000 different DNA chips (GPLs).
It is assumed that the gene expression data in the GEO database are deposited after log ratio transformation for dual channel data and log (signal − background) transformation for single channel data. We tried to re-standardize single channel data based on the GEO-wide normal distribution model, N(µGPL, σGPL), where the mean (µGPL) and standard deviation (σGPL) of a gene were calculated from all the available expression data of the gene sharing the same GEO platform (i.e. DNA chip). This approach provides a unique gene-specific standardization based on the expression distribution of the gene in a collection of GDSs sharing the same platform. Data from dual channel experiments do not allow GEO-wide mean expression levels and standard deviations to be calculated for individual genes, since the deposited expression data are in the form of a ratio (R/G ratio) between a test and a reference set. Thus, the focus of this study was on the analysis of log (background-subtracted signal intensity) data from single channel experiments.
In order to calculate means and standard deviations from a large collection of heterogeneous datasets (GDSs), the individual array data must be locally standardized in advance to achieve a common scale. Thus, in practice, we attempted a two-step standardization procedure for single channel microarray data. Within-array standardization (array-specific Z-score calculation) was followed by gene-specific multi-array standardization using the GEO-wide mean and standard deviation of individual genes (gene-specific Z-score calculation). For the demonstration, a GEO Dataset (GDS) including samples of normal cells and a breast cancer cell line was analyzed by the present two-step procedure. The possibility of obtaining meaningful information from the second step, gene-specific standardization, was investigated in comparison with the result from the one-step array-specific standardization. The second step standardization intrinsically prioritizes genes that have small expressional variations in the database (small σGPL). Although the differential expression of a gene with a large database-wide variation (large σGPL) may be underestimated by the present method, it is typically well recognized by conventional methods, such as direct t-tests and the ranking tools provided on the GEO website. In this sense, the present method is complementary to existing analytical tools, and thus helps to maximize the utility of gene expression data deposited in the public database.
| 2 METHODS |
|---|
2.1 Microarray gene expression data acquisition
Microarray gene expression data were obtained from the Entrez Gene Expression Omnibus (GEO) ftp site (ftp.ncbi.nlm.nih.gov/pub/geo) (Barrett et al., 2005; Edgar et al., 2002). A set of single channel microarray data on normal and breast cancer cell lines was analyzed for the demonstration. GEO Accession no. GDS817, which includes six samples (GSMs), is a collection of microarray experiments for the comparison of gene expression between breast cancer and normal epithelial cell lines (Fig. 1). Two samples in GDS817 are experiments done with the breast cancer cell line, HCC622. Another two involve a normal epithelial cell line. We analyzed the difference in gene expression between these two types of cell lines. The GDS817 experiments were carried out with the Affymetrix U95A DNA chip (GEO Accession no. GPL91), which includes 12 651 probesets. GDS817 contains expression data records for only 12 625 probesets. Thus, our analysis was limited to this subset of probesets in GPL91. In addition to GDS817, a total of 72 additional GDSs sharing a common platform, GPL91, were retrieved from the current version of GEO for the gene-specific GEO-wide standardization procedure. In summary, expression data for 12 625 genes in a total of 1850 GSMs from 73 GDSs, which share a common platform, GPL91, were retrieved and analyzed in this study.
2.2 Gene specific large-scale analysis of gene expression (GS-LAGE)
It is standard practice to correct foreground intensities by background subtraction (Edwards, 2003). For single channel experiment data deposited in GEO, it is assumed that the values were submitted as normalized (scaled) signal count data [e.g. log (signal − background) transformation]. However, in order to calculate the mean and standard deviation of the expression level of a gene using a large collection of datasets (GDSs) that were prepared and deposited by different research groups, the individual array data from GEO needed to be re-standardized to achieve a common scale. Thus, we first carried out within-array Z-transformation on each collected GEO sample by using the mean expression level and its standard deviation on the array basis, as below.
$$u_i = \frac{x_i - \mu_{GSM}}{\sigma_{GSM}}$$
xi: the deposited expression value of the ith gene in the given GSM.
µGSM: the mean of all genes in the given GSM.
σGSM: the standard deviation of all genes in the given GSM.
ui is the standardized intensity of the ith gene in the given GSM. This GSM-based standardization process was repeated for all 1850 GSMs sharing the GPL91 platform, to permit comparisons between samples to be made. We next calculated the mean expression level and its standard deviation of each gene across arrays using the Z-transformed expression data from 1850 GSMs.
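The within-array step described above can be sketched in a few lines of Python. This is a minimal illustration, not the GS-LAGE implementation: the function name `within_array_z` and the toy values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import fmean, pstdev

def within_array_z(values):
    """Standardize one array (GSM): u_i = (x_i - mean) / sd over all genes on that array."""
    mu = fmean(values)
    sigma = pstdev(values)  # assumption: population SD; the source does not specify
    if sigma == 0:
        raise ValueError("array has zero variance and cannot be standardized")
    return [(x - mu) / sigma for x in values]

# Toy log-intensities for five probesets on one hypothetical array
gsm = [6.0, 7.0, 8.0, 9.0, 10.0]
u = within_array_z(gsm)
```

After this step every array has mean 0 and standard deviation 1, which is what makes values from independently deposited datasets comparable on a common scale.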
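$$\mu_{i,GPL} = \frac{1}{n}\sum_{j=1}^{n} u_{i,j}$$
ui,j: the standardized intensity of the ith gene in the jth GSM; n: the number of GSMs sharing the platform.
µi,GPL is thus the GPL-wide mean of the ith gene. The Z-transformed expression level (ui) in a specific assay (GSM) was next re-scaled by the model established from the GPL-wide mean and standard deviation.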
$$v_i = \frac{u_i - \mu_{i,GPL}}{\sigma_{i,GPL}}$$
σi,GPL: the standard deviation of GPL-wide expressions of the ith gene.
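As a rough sketch of this second, gene-specific step, the Python below re-scales within-array Z-scores by each gene's mean and standard deviation across all arrays of a platform. The function name `gene_specific_z`, the row/column layout and the toy numbers are illustrative assumptions, as is the population standard deviation.

```python
from statistics import fmean, pstdev

def gene_specific_z(u_matrix):
    """u_matrix[j][i] is the within-array Z-score of gene i on array j.
    Returns v with the same shape, where v = (u - mu_i) / sigma_i for each gene i."""
    n_genes = len(u_matrix[0])
    columns = [[row[i] for row in u_matrix] for i in range(n_genes)]
    mu = [fmean(col) for col in columns]
    sigma = [pstdev(col) for col in columns]  # assumption: population SD
    if any(s == 0 for s in sigma):
        raise ValueError("a gene with zero platform-wide variance cannot be re-scaled")
    return [[(row[i] - mu[i]) / sigma[i] for i in range(n_genes)] for row in u_matrix]

# Four hypothetical arrays x two genes: gene 0 is stable, gene 1 fluctuates widely
u_matrix = [[1.0, -2.0], [1.1, 0.0], [0.9, 2.0], [1.2, 0.4]]
v = gene_specific_z(u_matrix)
```

On the last toy array, gene 0 sits only 0.15 above its platform-wide mean and gene 1 sits 0.3 above its own, yet gene 0 receives the larger v because its platform-wide variation is much smaller.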
The expression level, vi, of a gene thus represents the re-scaling based on the observation of its expressional behavior in a large collection of diverse experiments. It is assumed that the expressional variation of a gene in the database follows the normal distribution, N(µi,GPL, σi,GPL), when n is large. To check this assumption, we implemented an iterative procedure to remove outliers from the final µi,GPL and σi,GPL calculations. An outlier was defined as an expression level that lies more than three times the original σGPL above or below the original µGPL. Then, the final µi,GPL and σi,GPL values were used for a chi-square goodness-of-fit test between the observed distribution and its ideal normal distribution. Genes whose expressional variation deviated significantly from the normal distribution (P < 0.05 in the chi-square test) were removed from the second standardization (v calculation).
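The outlier removal and normality check can be sketched as follows. This is one possible reading under stated assumptions rather than the authors' procedure: the mean and standard deviation are re-estimated on each pass, the goodness-of-fit test uses eight equal-probability bins, and the P < 0.05 decision is made by comparing the statistic with the tabulated chi-square critical value for 5 degrees of freedom (8 bins minus 1, minus 2 estimated parameters). All names and the toy data are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

def trim_outliers(values, k=3.0, max_iter=10):
    """Repeatedly drop values farther than k SDs from the mean until none remain."""
    kept = list(values)
    for _ in range(max_iter):
        mu, sigma = fmean(kept), pstdev(kept)
        inside = [x for x in kept if abs(x - mu) <= k * sigma]
        if len(inside) == len(kept):
            break
        kept = inside
    return kept

def chi_square_normal_stat(values, n_bins=8):
    """Chi-square statistic of observed counts against equal-probability normal bins."""
    dist = NormalDist(fmean(values), pstdev(values))
    # Bin edges at the fitted normal quantiles, so each bin expects n / n_bins values
    edges = [dist.inv_cdf(b / n_bins) for b in range(1, n_bins)]
    counts = [0] * n_bins
    for x in values:
        counts[sum(x > e for e in edges)] += 1
    expected = len(values) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)

# Chi-square critical value for alpha = 0.05 and df = 8 - 1 - 2 = 5
CRITICAL_0_05_DF5 = 11.070

# Hypothetical platform-wide values for one gene: 200 normal quantiles plus one extreme value
std_normal = NormalDist()
values = [std_normal.inv_cdf((k + 0.5) / 200) for k in range(200)] + [10.0]
kept = trim_outliers(values)
stat = chi_square_normal_stat(kept)
keep_gene = stat < CRITICAL_0_05_DF5  # gene stays in the v calculation if normality is not rejected
```

A gene whose trimmed values still give a statistic above the critical value would be excluded from the second standardization.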
In this study, we compared ui and vi in identifying changes in gene expression between normal and breast cancer cells. ui is assumed to be a quantity provided by the original contributor, which only considers the expression data within the GDS for the preparation. The difference in normalized gene expression between normal and breast cancer cells was calculated as below.
$$\Delta u_i' = \frac{u_i^{+} - u_i^{-}}{\mathrm{avr}(|\Delta u|)}$$
avr(|Δu|): the absolute average difference between u+ and u−.
Δui' represents the relative change in gene expression in comparison to the average change in the given assay. The relative change in gene expression was also calculated using the Δvi' value, where the intensity level was re-scaled based on the gene-specific behavior in a wide variety of experiments.
$$\Delta v_i' = \frac{v_i^{+} - v_i^{-}}{\mathrm{avr}(|\Delta v|)}$$
avr(|Δv|): the absolute average difference between v+ and v−.
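The relative-change score can be sketched in Python as below. It assumes that the averaging term is the mean of the absolute per-gene differences over all genes in the comparison; the function name `relative_change` and the toy values are hypothetical. The same function applies to u or v values.

```python
from statistics import fmean

def relative_change(plus, minus):
    """Per-gene difference between two conditions, divided by the mean absolute difference."""
    diffs = [p - m for p, m in zip(plus, minus)]
    scale = fmean([abs(d) for d in diffs])  # assumption: average taken over all genes
    return [d / scale for d in diffs]

# Toy standardized values for three genes in the two conditions being compared
u_plus = [1.0, 0.5, 2.0]
u_minus = [0.5, 0.5, 0.5]
delta_u = relative_change(u_plus, u_minus)
```

By construction the mean absolute score is 1, so a score of 2 reads as a change twice the average change in that comparison.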
2.3 Validation using an Affymetrix spike-in study dataset
For the validation study, a dataset from the spike-in study by Affymetrix was retrieved from Affycomp website (http://affycomp.biostat.jhsph.edu/). In this dataset, Human cRNA fragments matching 16 probesets on the HGU95A GeneChip were added to the hybridization mixture of the arrays at concentrations ranging from 0 to 1024 pM. The same hybridization mixture, obtained from a common tissue source, was used for all arrays. The details of the spike-in data are found in the literature (Cope et al., 2004; Irizarry et al., 2003) and the website (http://affycomp.biostat.jhsph.edu/). The fluorescence intensities were normalized by RMA method. These RMA data were used for further analysis by the present two-step standardization method after log2 transformation.
2.4 Gene expression data analysis by GEO-provided tools
For comparison with the present method, we also analyzed the gene expression data between breast cancer and normal cell lines using GEO-provided analytical tools. Three methods were used to generate lists of probesets that showed significantly higher expression levels in the breast cancer cell line than in the normal epithelial cell line. The options were selected as below so that the final lists included a similar number of entries. Here, A represents gene expression data on the breast cancer cell line (HCC1954) and B represents gene expression data on the normal epithelial cell line.
- One-tailed t-test (A > B) (0.010 significance level) selected a total of 254 probesets.
- Query mean group A versus B by values (4-fold higher) selected a total of 266 probesets.
- Query mean group A versus B by ranks (3-fold higher) selected a total of 204 probesets.
| 3 RESULTS |
|---|
Since the downloaded data for the demonstration were the normalized and combined Affymetrix data from GEO (GPL91), the MA plot of test samples from GDS817 was already in good shape (Fig. 2A). The present two-step standardization procedure was applied to these datasets in order to achieve a better resolution in detecting modest expressional differences in genes with a relatively small DB-wide standard deviation, σGPL. We first carried out within-array Z-transformation on each collected sample by using the mean expression level and its standard deviation on the assay basis. The resulting scatter plot (Fig. 2B) is the same as the MA plot (Fig. 2A). After the first step, array-specific standardization, the difference in gene expression between normal and cancer cells showed a symmetrical distribution along the diagonal axis (Fig. 2B). After u+ and u− were re-scaled via the gene-specific normal distribution, i.e. N(µi,GPL, σi,GPL) for the ith gene, the relatively large deviations in gene expression between normal and breast cancer cells were shifted to the high intensity region (Fig. 2C). This observation suggests that genes with large µi,GPL have more variation in σGPL than those with small µi,GPL.
To further investigate the difference between Figure 2B and C, we plotted the mean (µGPL) and standard deviation (σGPL) of each gene's expression in 1850 experiments (Fig. 3). These means and standard deviations were used for the second standardization (v calculation). It has been well recognized that the variance of the measured spot intensities increases with their mean. The standard deviation increases roughly linearly with the mean (Huber et al., 2002). The present plot of the DB-wide analysis also indicates that expressional variation (i.e. σGPL in the plot) increases as µGPL increases. In addition, the plot shows that the vertical distribution of σGPL becomes wider as µGPL increases. This relatively large variation in σGPL values in the high µGPL region contributed to the difference in plot shape between Figure 2B and C. For a given µGPL value, a large σGPL resulted in a small v value, while a small σGPL value resulted in a large v in the second standardization. This analysis confirms that the present model, based on N(µi,GPL, σi,GPL), provides additional resolution, particularly for recognizing the expressional difference of genes with relatively small σGPL and high expressional intensity, i.e. large µGPL.
Figure 2A and B show that the single step standardization actually represents the original normalization of Affymetrix GeneChip data. However, the second step standardization generated a substantial shift in the distribution from that of the single step standardization of gene expression. From a biological point of view, a large expressional change between specific samples for a gene that has a large σGPL may be less meaningful than a moderate expressional change for a gene having a consistent expression in the database (low σGPL). Typical array-specific standardization and a consequent comparison of the intensity of gene expression between samples lack this kind of biological consideration. In this sense, our two-step analysis provides a unique tool for identifying additional genes with moderate changes between samples that are not highly prioritized by conventional assay-specific standardization methods.
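A short worked example may make this weighting concrete. It is a sketch that assumes the gene-specific Z-score has the form v = (u − µGPL)/σGPL described in the Methods, and the numbers for the two genes are hypothetical. Because the GPL-wide mean cancels in a between-sample difference, the same difference in u is simply scaled by 1/σGPL:

```latex
\begin{align*}
v_i^{+} - v_i^{-}
  &= \frac{u_i^{+} - \mu_{i,GPL}}{\sigma_{i,GPL}} - \frac{u_i^{-} - \mu_{i,GPL}}{\sigma_{i,GPL}}
   = \frac{u_i^{+} - u_i^{-}}{\sigma_{i,GPL}} \\
\text{gene A: } & u^{+} - u^{-} = 0.3,\quad \sigma_{GPL} = 0.1 \;\Rightarrow\; v^{+} - v^{-} = 3.0 \\
\text{gene B: } & u^{+} - u^{-} = 0.3,\quad \sigma_{GPL} = 0.5 \;\Rightarrow\; v^{+} - v^{-} = 0.6
\end{align*}
```

The consistently expressed gene A is therefore ranked well above gene B, even though both show the same difference after array-specific standardization.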
We compared the performance of the two methods (i.e. Δu' versus Δv' scorings) in evaluating differences in gene expression between normal and breast cancer cells (Fig. 4A). A significant number of genes with low and high expression levels were evaluated differently by these two methods. The result shows that the utility of the two-step standardization method lies in identifying those genes in which the difference of expression between samples is underestimated by the single step, array-specific standardization method. On the upper boundary region of the distribution shown in Figure 4A, the Δv' calculation gives up to a 2-fold higher estimate than the Δu' calculation for the gene expression difference between samples. Among 10 792 test genes from GPL91, 90.5% showed a discrepancy of less than 1.0 between the two methods (i.e. |Δv' − Δu'| < 1.0), while 9.5% of genes showed a discrepancy of greater than 1.0 (Fig. 4B). A total of 2.4% of genes showed a discrepancy of greater than 2.0 between the two methods.
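The discrepancy summary reported above amounts to counting genes whose two scores differ by more than a threshold. A minimal Python sketch, with a hypothetical function name and made-up scores for four genes, could look like this:

```python
def discrepancy_fractions(du, dv, thresholds=(1.0, 2.0)):
    """Fraction of genes whose |dv - du| exceeds each threshold."""
    gaps = [abs(v - u) for u, v in zip(du, dv)]
    return {t: sum(g > t for g in gaps) / len(gaps) for t in thresholds}

# Hypothetical relative-change scores from the one-step (du) and two-step (dv) methods
du = [0.2, 1.0, -0.5, 3.0]
dv = [0.4, 2.5, -0.6, 0.5]
fractions = discrepancy_fractions(du, dv)
```

Here two of the four toy genes differ by more than 1.0 and one of them by more than 2.0.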
We also compared the result of the present method with that of the RMA method. The RMA method was developed for better normalization of fluorescence intensity data by accounting for the probe affinity effect, while the present method is intended to provide better biological insight into the expressional difference of genes in the database that are assumed to have already been properly normalized. We thus applied the present method to data that were already normalized by the RMA method. For this comparative analysis, a dataset from the spike-in study by Affymetrix was used. Human cRNA fragments matching 16 probesets on the HGU95A GeneChip were added to the hybridization mixture of the arrays at concentrations ranging from 0 to 1024 pM. The same hybridization mixture, obtained from a common tissue source, was used for all arrays (see Methods section for details). The fluorescence intensity data were normalized by the RMA method, and then the present two-step method was applied to the normalized data. Observed concentrations are comparatively plotted against nominal concentration (Fig. 5). In this analysis, the observed intensities are averaged at each nominal concentration value, resulting in a single mean curve. Since the log2 scale was applied to the concentrations, observed concentrations should be linear in true concentrations. We therefore fit a simple linear model to the scatterplot data and report the R2 coefficient. The result shows that the present two-step standardization (LAGE) data have an R2 coefficient similar to that of the original RMA data. This confirms that the additional standardization by N(µGPL, σGPL) does not change the linearity of the original data. However, the rank order of test genes by observed differential expressions (Δv' and ΔRMA) between samples showed a large disagreement between the present method and the RMA method (Table 1). When the expressional difference between samples was compared among 14 genes, the RMA method provided a more consistent performance than the present method. These 14 test genes showed large variations in their µGPL and σGPL values. The variation in ΔRMA among the 14 genes is random, i.e. it shows no correlation with their σGPL. However, the present method gives additional weight to the expressional difference of those genes (e.g. 407_at) that have a relatively small σGPL, while it gives low significance to the expressional difference of genes with a large σGPL (e.g. 33818_at). We believe that this σGPL-dependent prioritization of gene expression difference improves the identification of previously unknown disease-related genes from database search.
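The linearity check described above (a simple linear fit of observed against nominal log2 concentration, summarized by R2) can be sketched as follows. The observed values, the µ and σ used for re-scaling, and the function name `r_squared` are all hypothetical; the sketch only illustrates, for a single probeset, why a gene-specific shift and scale leaves R2 unchanged.

```python
import math

def r_squared(x, y):
    """R^2 of an ordinary least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Nominal spike-in concentrations (pM) on the log2 scale; zero is omitted
nominal = [math.log2(c) for c in (0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)]
# Made-up observed log2 intensities for one spiked probeset
observed = [6.1, 6.6, 7.4, 8.3, 9.2, 10.1, 11.0, 11.8, 12.5, 13.1, 13.5, 13.8, 14.0]
# Gene-specific re-scaling with a hypothetical platform-wide mean and SD
rescaled = [(y - 10.0) / 2.0 for y in observed]

r2_original = r_squared(nominal, observed)
r2_rescaled = r_squared(nominal, rescaled)
```

Since the re-scaling of one gene is an affine transformation with a positive scale, the two R2 values agree up to rounding, while the ranking across genes with different σ can still change.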
To demonstrate the usefulness of the present method, we compared the performance of the two-step method with that of the methods provided by the GEO website (see Methods section for details) in evaluating differences in gene expression between normal and breast cancer cells. A total of 10 791 probesets were first ranked based on the Δv' score. Then, the 100 top-ranked probesets were examined to see whether they were also prioritized by the GEO-provided analytical methods. As a result, 22 probesets on the top ranking list by the Δv' score were unique and not found in the GEO analysis reports, while 78 probesets were found in both the Δv'-ranking list and the GEO analysis reports (Table 2). These two sets of genes (un-overlapped and overlapped with GEO lists) commonly showed relatively low µGPL and σGPL in comparison with those of the total 10 791 probesets in GPL91. However, the un-overlapped set of probesets showed a lower GPL-wide expression variation (average σGPL = 0.10) than the overlapped set (average σGPL = 0.12). This result shows that the consideration of DB-wide expressional variations (σGPL values) has contributed to the identification of 22 additional probesets that are hardly prioritized by other methods.
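The overlap analysis above is essentially a ranking followed by a set comparison. The Python below is a small sketch of that idea; the function name `split_top_ranked`, the probeset identifiers and the scores are hypothetical and do not come from GDS817.

```python
def split_top_ranked(scores, reference_ids, top_n=100):
    """Rank probesets by score (highest first) and split the top_n into
    those absent from reference_ids and those also present in it."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    unique = [p for p in ranked if p not in reference_ids]
    shared = [p for p in ranked if p in reference_ids]
    return unique, shared

# Hypothetical two-step scores and the union of probesets reported by other tools
scores = {"p1": 4.2, "p2": 3.9, "p3": 3.1, "p4": 0.7, "p5": 0.2}
reported_elsewhere = {"p1", "p3", "p5"}
unique, shared = split_top_ranked(scores, reported_elsewhere, top_n=3)
```

In this toy case the top three probesets are p1, p2 and p3, of which only p2 is not reported by the other tools.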
A total of 22 probesets (actually 20 different genes) that were exclusively found in the top ranking list by the two-step method needed to be further analyzed to determine if there are breast cancer-related genes that were not identified by GEO analysis reports. A subset of 12 probesets that showed a large difference in ranking between Δu' and Δv' is listed (Table 3). From a literature search, we found that 3 of these 12 genes were specifically elevated in cancer-related cells. For example, the MRE11 (gene ID: AF073362)–Rad50–NBS1 complex is a cell cycle checkpoint protein, and tumor cells have defects in the cell cycle checkpoint protein. It has been known that two components of the MRE11–Rad50–NBS1 complex, RAD50 and NBS1, are breast cancer susceptibility genes associated with genomic instability (Heikkinen et al., 2006) and the MRE11 gene is mutated in an ataxia-telangiectasia-like disorder (Stewart et al., 1999). Insufficient information was available in the current NCBI database to determine whether the other nine genes exclusively found on the top ranking list of Δv' scoring are associated with breast cancer pathogenesis. Table 3 shows that the difference of expressional intensity between normal and breast cancer cell lines for these genes was estimated to be at least 1.5-fold higher in the Δv' measure than in the Δu' measure. It can be concluded that relatively low σGPL values compensate for the moderate difference in expression between samples and consequently rank genes in a different order. Further experimental study is needed to confirm the association of the selected nine genes with breast tumors.
The relative merit of the present method depends on its ability to successfully identify genes that are differentially expressed, while avoiding classifying highly fluctuating genes (i.e. genes with large σGPL) as being differentially expressed (i.e. their false positive or Type I Error rate). Since the false positive rate increases exponentially as the rank goes to the bottom (Norris and Kahn, 2006), medium-level fold changes (moderate Δu') in gene expression were usually not considered for further experimental validation. In this sense, this new approach can enrich the hit list with genes for which the expression difference between normal and breast cancer cells was moderate.
For public access to this two-step standardization method for GEO gene expression data, we constructed a user friendly web-based database, GS-LAGE (Gene Specific Large-scale Analysis of Gene Expression), which includes all single channel microarray experiments listed in the GEO database (Fig. 6). It can be accessed via http://compbio.sookmyung.ac.kr/~lage/index.html. It provides comparative values of Δu' and Δv' for each gene between user-selected experimental samples. It will provide a valuable tool for the in silico identification of previously unknown specific (or differential) gene expression patterns in disease-related samples.
| Acknowledgments |
|---|
The authors appreciate helpful and stimulating discussions with Dr. Young Ju Suh. This work was supported by the SRC/ERC program of MOST/KOSEF (R11-2005-017-01003-0) and by grant No. R01-2006-000-10515-0 from the Basic Research Program of the Korea Science & Engineering Foundation.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Joaquin Dopazo
Received on May 22, 2006; revised on September 7, 2006; accepted on September 30, 2006
| REFERENCES |
|---|
Aittokallio, T., et al. (2003) Computational strategies for analyzing data in gene expression microarray experiments. J. Bioinform. Comput. Biol., 1, 541–586.
Barrett, T., et al. (2005) NCBI GEO: mining millions of expression profiles - database and tools. Nucleic Acids Res., 33, D562–D566.
Bolstad, B.M., et al. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185–193.
Breitling, R., et al. (2004) Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett., 573, 83–92.
Colantuoni, C., et al. (2002) SNOMAD (Standardization and NOrmalization of MicroArray Data): web-accessible gene expression data analysis. Bioinformatics, 18, 1540–1541.
Cope, L.M., et al. (2004) A benchmark for Affymetrix GeneChip expression measures. Bioinformatics, 20, 323–331.
Edgar, R., et al. (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res., 30, 207–210.
Edwards, D. (2003) Non-linear normalization and background correction in one-channel cDNA microarray studies. Bioinformatics, 19, 825–833.
Gautier, L., et al. (2004) affy - analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20, 307–315.
Heikkinen, K., et al. (2006) RAD50 and NBS1 are breast cancer susceptibility genes associated with genomic instability. Carcinogenesis, 27, 1593–1599.
Huber, W., et al. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18 (Suppl. 1), S96–S104.
Irizarry, R.A., et al. (2003) Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res., 31, e15.
Li, C. and Wong, W.H. (2001) Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl Acad. Sci. USA, 98, 31–36.
Mootha, V.K., et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet., 34, 267–273.
Norris, A.W. and Kahn, C.R. (2006) Analysis of gene expression in pathophysiological states: balancing false discovery and false negative rates. Proc. Natl Acad. Sci. USA, 103, 649–653.
Ramaswamy, S., et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl Acad. Sci. USA, 98, 15149–15154.
Stewart, G.S., et al. (1999) The DNA double-strand break repair gene hMRE11 is mutated in individuals with an ataxia-telangiectasia-like disorder. Cell, 99, 577–587.
Yang, M.C., et al. (2001) A statistical method for flagging weak spots improves normalization and ratio estimates in microarrays. Physiol. Genomics, 7, 45–53.
Yang, Y.H., et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res., 30, e15.