Bioinformatics Advance Access originally published online on September 22, 2005
Bioinformatics 2005 21(23):4255-4262; doi:10.1093/bioinformatics/bti684
Noise and rank-dependent geometrical filter improves sensitivity of highly specific discovery by microarrays
Department of Neurological Sciences, Section of Neuro-Oncology, Rush University Medical Center 1725 West Harrison Street, Chicago, IL 60612, USA
ABSTRACT
Summary: MASH is a mathematical algorithm that discovers highly specific states of expression from genomic profiling by microarrays. The goal at the outset of this analysis was to improve the sensitivity of MASH. The geometrical representations of microarray datasets in the 3D space are rank-dependent and unique to each dataset. The first filter (F1) of MASH defines a zone of instability whose F1-sensitive ratios have large variations. A new filter (Fs) constructs in the 3D space rank-dependent lower and upper-bound contour surfaces, which are modeled based on the geometry of the unique noise intrinsic to each dataset. As compared with MASH, Fs increases sensitivity significantly without lowering the high specificity of discovery. Fs facilitates studies in functional genomics and systems biology.
Contact: hfathall{at}rush.edu
Supplementary information: http://www.rushu.rush.edu/neurosci/Fathallah.html
INTRODUCTION
The genomes of several organisms have recently been sequenced and the enthusiasm about the potential of microarrays has been intense (Schena et al., 1995; Lockhart et al., 1996; DeRisi et al., 1997). Microarray studies are increasingly being used to explore biological causes and effects and even to diagnose diseases; however, their data are very noisy and the patterns of expression and molecular signatures of microarrays may not be reproducible (Kothapalli et al., 2002; Ntzani and Ioannidis, 2003; Tan et al., 2003).
MASH is a mathematical algorithm that yields highly specific states of genetic expression (up or downregulation) from the genome-scale profiling of two samples (Fathallah-Shaykh et al., 2004). The term highly specific refers to the high specificity of states of genetic expression discovered by MASH. Specifically, the false discovery rates of MASH and MIDAS in same-to-same comparisons using the 19K microarrays are 1/192 000 and 1347/192 000 measurements, respectively. MASH specificity is significantly better, but its sensitivity is equal to MIDAS. My goals at the outset of this analysis were to better understand the noise and to generate a new algorithm that significantly improves the sensitivity of MASH without lowering its high specificity. Interestingly, sensitivity is not only dependent on the analytical method but also on the quality of the dataset (Fathallah-Shaykh et al., 2004).
MATERIALS AND METHODS
Microarrays
Normal brain RNA is obtained by pooling RNA from human occipital lobes harvested from four individuals with no known neurological disease whose brains are frozen <3 h post mortem. Tumor RNAs, isolated from 35 surgical gliomas, 10 surgical meningiomas and 6 cultured glial cell lines, are profiled as compared with aliquots from the same normal brain RNA (Fathallah-Shaykh et al., 2003, 2002, 2004; Fathallah-Shaykh, 2005a). The quality of RNA is assayed by gel electrophoresis; only high-quality RNA is processed. Microarray experiments use 1.7K (1920 genes) and 19K (19 200 genes) microarrays containing cDNAs spotted in duplicate (Ontario Cancer Institute, Ontario, Canada). The design, which includes dye swapping as described elsewhere, generates four replicate measurements per gene and sample (Fathallah-Shaykh et al., 2003, 2002). Each slide contains two replicate adjacent spots. The Cy3/Cy5 design generates two ratios. The Cy5/Cy3 design generates two additional ratios. The total is four replicate ratios with dye-swapping. RNA used in spike-in experiments is transcribed from the same Arabidopsis cDNA spotted on microarray slides.
Analysis
The mathematical analysis is performed using functions written in Matlab (Mathworks, Natick, MA). Fs is freely available to academics for non-commercial use. To obtain executable software, please send a request by e-mail to Fs_request{at}rush.edu.
RESULTS
Definitions
The state of genetic expression of a spot in sample A versus sample B assayed by cDNA arrays is measured by the ratio of the background-subtracted intensity of sample A to the background-subtracted intensity of sample B. A ratio >1 (log2 > 0) implies upregulation of the gene in sample A as compared with B. The terms genes, spots, symmetrical, rank, and spot order are defined using the 1.7K arrays; these terms are also applicable to other microarrays. The 1.7K microarray contains 1920 cDNAs or controls, here referred to as genes, spotted in duplicate for a total of 3840 spots. The term symmetrical refers to the two images, corresponding to the Cy3 and Cy5 fluorescent dyes, generated from a single microarray slide. Background-subtracted spot intensities are sorted in ascending order to assign a rank to every spot. For instance, a spot whose rank is 3000 has a higher background-subtracted spot intensity than all spots whose ranks are <3000. A microarray Spot Order (SO) is a listing of its spots sorted by their ranks. A cDNA spotted slide generates two spot orders, SO1 and SO2, which correspond to Cy3 and Cy5, respectively.
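The definitions above can be illustrated with a minimal Python sketch. It is not the author's Matlab code: the function names `assign_ranks` and `log2_ratios` and the toy intensities are hypothetical, ties are broken by spot index (the text does not specify tie handling), and all background-subtracted intensities are assumed to be positive.

```python
import math


def assign_ranks(intensities):
    """Rank 1 goes to the lowest background-subtracted intensity.

    Ties are broken by spot index, which is an assumption of this sketch.
    """
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    ranks = [0] * len(intensities)
    for rank, spot in enumerate(order, start=1):
        ranks[spot] = rank
    return ranks


def log2_ratios(sample_a, sample_b):
    """log2 of intensity(A) / intensity(B) for each spot; > 0 means up in A."""
    return [math.log2(a / b) for a, b in zip(sample_a, sample_b)]


# Toy background-subtracted intensities for four spots on one slide.
cy3 = [120.0, 800.0, 45.0, 300.0]
cy5 = [100.0, 200.0, 90.0, 310.0]

so1 = assign_ranks(cy3)          # ranks in SO1 (Cy3 image)
so2 = assign_ranks(cy5)          # ranks in SO2 (Cy5 image)
ratios = log2_ratios(cy3, cy5)   # here Cy3 is treated as sample A

print(so1, so2, ratios)
```

Each spot then carries the triple (rank in SO1, rank in SO2, log2(ratio)), which is the coordinate system used in the 3D plots discussed later.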
Dye swapping refers to experiments where the Cy3 and Cy5 dyes are reversed between the two samples; they are performed to annul confounding variables introduced by heterogeneous fluorescence of the Cy5 and Cy3 molecules. Each microarray slide yields a set of symmetrical Cy3/Cy5 images that generate two replicate ratios. Each dye swapping dataset generates four replicate ratios.
The datasets and rationale
The true negative datasets compare the same pool of brain RNA with itself (same-to-same). The goal of the same-to-same comparisons is to collect experimental noise (technical artifacts) independent of biological heterogeneity. In this design, normalized expression ratios ≠ 1 (log2 ≠ 0) are false positive (noise) because the Cy3/Cy5 symmetrical images contain identical genetic information. The artifactual measurements may be caused by several factors including slide-to-slide differences, variations in the reverse transcription reactions, hybridization, labeling and laser. The same-to-same comparisons include 18 and 20 experiments that generate a total of 9 and 10 dye swapping datasets using the human 1.7K and 19K microarrays, respectively. The experiments are paired by consecutive order. The goal of the new algorithm is to filter the largest number of same-to-same expression ratios originating from technical noise. Ideally, as compared with MASH the new algorithm should discover a smaller number of genes as being differentially expressed in the same-to-same design.
The 1.7K microarray includes 64 genes of Arabidopsis cDNA. The true positive datasets include four sets of spike-in dye swapping experiments using 1.7K microarrays, where 1 ng of Arabidopsis RNA is added to one sample but not the other. In this design, all 64 genes of Arabidopsis cDNA serve as true positives. The sensitivity of MASH is 26/64 [41%, (Fathallah-Shaykh et al., 2004)]. Ideally, the new algorithm is expected to discover all 64 Arabidopsis genes as being differentially expressed.
MASH summary
MASH includes two filters, F1 and F2. A spot is sensitive to F1 if both its symmetrical ranks in SO1 and SO2 are less than the Cutoff Rank (CR). To be resistant to F1, either Cy3 or Cy5 images of the spot must contain enough signals such that at least one of the symmetrical ranks is larger than the CR. The latter is computed empirically from the slopes of the ranking curves (Fathallah-Shaykh et al., 2004).
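The F1 rule can be sketched in a few lines of Python. This is an illustration only: the function name `f1_sensitive` and the toy ranks are hypothetical, the CR is passed in as a given number (its empirical computation from the ranking curves is not reproduced here), and a rank exactly equal to the CR is treated as resistant, which the text leaves unspecified.

```python
def f1_sensitive(rank_so1, rank_so2, cutoff_rank):
    """A spot is F1-sensitive when BOTH symmetrical ranks are below the CR."""
    return rank_so1 < cutoff_rank and rank_so2 < cutoff_rank


# Toy spots given as (rank in SO1, rank in SO2); CR chosen for illustration.
cutoff_rank = 10000
spots = [(500, 800), (500, 12000), (15000, 300), (11000, 14000)]

flags = [f1_sensitive(r1, r2, cutoff_rank) for r1, r2 in spots]
print(flags)  # only the first spot has both ranks below the CR
```

Only the first toy spot is filtered; each of the others has enough signal in at least one of the two images to be resistant to F1.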
The second filter (F2) of MASH consists of two rules. The first Rule (F2a or f4) necessitates that all four replicate ratios consistently show up or downregulation; i.e. all four replicate log2(ratios) > 0 or all four < 0. The second Rule of F2 (F2b) necessitates that all four replicate F2b-resistant log2(ratios) must be outside the interval of ± 3 * the largest standard deviations of all F1-resistant log2(ratios). Genes sensitive to either f4 or F2b are filtered by transforming their mean log2(ratio) to 0.
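The two rules of F2 can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the function names are hypothetical, and `largest_sd` is taken to be the largest of the standard deviations of the F1-resistant log2(ratios), supplied by the caller, which is one reading of the rule above.

```python
def f4_resistant(replicates):
    """F2a (f4): all four replicate log2(ratios) share the same sign."""
    return all(r > 0 for r in replicates) or all(r < 0 for r in replicates)


def f2b_resistant(replicates, largest_sd, k=3.0):
    """F2b: every replicate lies outside the interval +/- k * largest_sd."""
    return all(abs(r) > k * largest_sd for r in replicates)


def mash_f2(replicates, largest_sd):
    """Return the mean log2(ratio), or 0.0 when the gene is filtered."""
    if f4_resistant(replicates) and f2b_resistant(replicates, largest_sd):
        return sum(replicates) / len(replicates)
    return 0.0


print(mash_f2([1.0, 1.5, 2.0, 1.5], largest_sd=0.2))   # passes both rules
print(mash_f2([1.0, -1.5, 2.0, 1.5], largest_sd=0.2))  # mixed signs: f4-sensitive
print(mash_f2([0.5, 1.5, 2.0, 1.5], largest_sd=0.2))   # 0.5 is inside +/- 0.6
```

A gene has to pass both rules to keep its mean log2(ratio); failing either one sets the value to 0.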
Heterogeneous geometrical distributions of noise
The same-to-same datasets consist of errors in measurement generated by technical noise. Figure 1a and d plot the distributions of same-to-same 19K and 1.7K microarray datasets in the 3D space defined by (1) ranks in SO1 (x-axis), (2) ranks in SO2 (y-axis), and (3) log2(ratios) (z-axis), respectively. If the experimental system generates no noise, the z-axis coordinates would all be equal to 0. Large positive or negative log2(ratios) reflect large errors (red arrows). The findings reveal that the distributions of noise in the 3D space are heterogeneous because each microarray dataset generates its unique geometrical structure, which differs between datasets. Specifically, the z-axis variations of log2(ratios) about 0 are rank-dependent and unique to a specific dataset (Fig. 1, black arrows).
Zone of instability
Figure 1a and 1d also reveal that the distributions in the 3D space include zones of instability where the log2(ratios) have large absolute values (red arrows). Figure 2a plots the projections of the distribution of noise of Figure 1a onto the 2D space defined by ranks in SO1 (x-axis) and log2(ratios) (y-axis). Figure 2b plots the projections of the distribution of noise of Figure 1a onto the 2D space defined by ranks in SO2 (x-axis) and log2(ratios) (y-axis), respectively. To visualize the rank-dependent behavior of noise, the spots are colored by their ascending ranks (Fig. 2). As in Figure 1a, Figure 2a and b also reveal that the distributions of noise include zones of instability, where log2(ratios) have large values (large errors; red arrows).
Figures 1 and 2 suggest that the zone of instability is dependent on the ranks in both SO1 and SO2. For example, the spots of Figure 2a and b whose ranks in SO1 and SO2 are both <10 000 generate unstable or large log2(ratios). The number 10 000 is unique to the dataset plotted in Figure 2; other datasets may be associated with different ranks. Since the first filter of MASH (F1) excludes spots whose ranks in SO1 and SO2 are both smaller than the CR, the question arises whether F1 defines the zone of instability.
A spot is resistant to F1 if either one of its ranks in either SO1 or SO2 is larger than the CR. To study the effects of F1 on the zone of instability, it is applied to filter the data shown in Figures 1a, d, 2a and b. Figure 1b and e plot the log2(ratios) of F1-resistant spots of Figure 1a and d, respectively. Figure 2c and d plot the log2(ratios) of F1-resistant spots of Figure 2a and b, respectively. Figure 2e reveals that the standard deviations of F1-sensitive log2(ratios) (blue; spots filtered by F1) are 5–10 fold larger than those of F1-resistant log2(ratios) (green; spots not filtered by F1). The findings support the conclusions that (1) spots whose symmetrical ranks are both less than the CR generate a zone of instability containing large errors of measurement [large absolute log2(ratios)], and (2) F1 filters the zone of instability.
Mathematical modeling of noise
Next, I study the effects of F1 on the distributions of different-to-different datasets. As in the geometrical distributions of same-to-same datasets, the distributions of the different-to-different datasets (1) are heterogeneous between datasets and rank-dependent, and (2) include F1-sensitive zones of instability (Fig. 3). However, unlike the same-to-same design, where any log2(ratio) different than 0 is false positive, the distinction between true and false positive ratios in different-to-different comparisons is not evident.
Figures 2 and 3 reveal that F1 deletes the zone of instability, which includes a large portion of the data. The goal of the next section is to devise a method that models the noise intrinsic within each dataset in such a way that the method (1) is applicable to all datasets despite their heterogeneity (Fig. 1), and (2) contours the zone of instability instead of deleting it.
Geometrical filter Fs
The filter f4 (F2a) was devised in the glioma study (Fathallah-Shaykh et al., 2002). A gene is resistant to f4 if its four replicate log2(ratios) are all positive or all negative (consistently showing up or downregulation). A gene is sensitive to f4 if its four replicate log2(ratios) do not all have the same sign. Because the false negative rate of f4 is only 1.6%, the predominant majority of f4-sensitive spots are false positive or noise.
Figures 1c, 1f and 3c plot the f4-sensitive spots of the datasets shown in Figures 1a, 1d and 3a. Interestingly, the findings reveal that the geometrical distribution of f4-sensitive spots (noise) replicates the unique geometrical distribution of all the spots in the dataset. This is not surprising considering that (1) in same-to-same experiments any log2(ratio) different than 0 is false positive, and (2) only a small fraction of different-to-different datasets is truly differentially expressed. Most importantly, because the geometrical structures/distributions created by f4-sensitive noise are independent of the spots that are truly differentially expressed, they will serve as a platform for constructing a new filter.
Each dataset generates two geometrical structures in the 3D spaces generated by the log2(ratios) of (1) all the spots (Figs 1a, d and 3a), and (2) the f4-sensitive spots (spots filtered by f4; Figs 1c, f and 3c). The distribution of the latter represents noise intrinsic to each dataset. In reality, the two distributions are interwoven in the same space. However, for practical purposes, the spaces/geometrical distributions of all spots in the dataset and f4-sensitive spots will be referred to as G and G4, respectively (Fig. 4a and b).
The rationale of the new filter (Fs) is based on the idea that a method that filters all f4-sensitive spots in G4 (G4, see Fig. 4b) will lead to a high degree of certainty that the unfiltered spots of G are true (Fig. 4a). Fs consists of upper and lower bound contour surfaces that are patterned based on the geometrical structure of G4. These contour surfaces will set the upper and lower bound z-axis limits at specific ranks such that any spot of G that maps outside these bounds is true to a high degree of certainty.
The geometrical structures in G and G4 consist of spots whose x-, y- and z-axis coordinates are the ranks in SO1, SO2 and log2(ratio), respectively (Figs 1–4). For every spot k of coordinates (xk, yk, zk) in G (Fig. 4a), the algorithm applies a square-column (Ck) within G4 such that (1) the column is parallel to the z-axis, and (2) the center of the square maps at the coordinates (xk, yk, zk) (Fig. 4b). Since the log2(ratios) of the spots present within each column reflect the local variability of noise at ranks xk and yk, they are isolated and their standard deviation and the means of the positive and negative log2(ratios) are computed (Fig. 4c and d). The algorithm increases the width of the square column until it includes a minimum of 100 spots. The optimal number of spots isolated by Ck is varied and computed empirically when the algorithm is completed; 100 spots optimize the false discovery rate.
Let sd be the standard deviation of all log2(ratios) isolated by column Ck. Let µp and µn be the means of their positive and negative log2(ratios), respectively (Fig. 4d). At every spot of coordinates (xk, yk, zk) in G, the upper and lower limits are set at spots in G having the following coordinates:
- An upper-bound limit at (xk, yk, µp + n * sd).
- A lower-bound limit at (xk, yk, µn − n * sd).
- A spot is filtered if its log2(ratio) maps within the 3D space bound by the upper and lower contour surfaces. Alternatively, a spot is resistant if its log2(ratio) maps above the upper-bound surface or below the lower-bound surface.
- A gene is resistant if all of its four replicate spots are resistant to the rule above. The log2(ratio) of a sensitive gene is transformed to 0.
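The construction of the contour limits and the filtering rules listed above can be sketched in Python. This is a minimal illustration under several assumptions and is not the distributed Fs software: all function names are hypothetical; the column widens by a fixed `step` in rank units (the growth schedule is not given in the text); the population standard deviation is used; µp or µn defaults to 0 when a column holds no positive or no negative values; and a single G4 is shared by all four replicate spots for simplicity.

```python
import statistics


def column_spots(g4, xk, yk, min_spots=100, step=50):
    """log2(ratios) of the G4 spots inside a square column centred on ranks
    (xk, yk); the column widens until it holds at least min_spots spots."""
    if not g4:
        raise ValueError("G4 is empty: no f4-sensitive spots to model noise")
    target = min(min_spots, len(g4))  # guard so the loop always terminates
    half_width = step                 # assumed growth schedule
    while True:
        inside = [z for (x, y, z) in g4
                  if abs(x - xk) <= half_width and abs(y - yk) <= half_width]
        if len(inside) >= target:
            return inside
        half_width += step


def contour_limits(g4, xk, yk, n=3.0, min_spots=100, step=50):
    """Return (lower, upper) = (mu_n - n * sd, mu_p + n * sd) at (xk, yk)."""
    zs = column_spots(g4, xk, yk, min_spots, step)
    sd = statistics.pstdev(zs)  # population SD is an assumption
    positives = [z for z in zs if z > 0]
    negatives = [z for z in zs if z < 0]
    mu_p = sum(positives) / len(positives) if positives else 0.0
    mu_n = sum(negatives) / len(negatives) if negatives else 0.0
    return mu_n - n * sd, mu_p + n * sd


def spot_resistant(g4, spot, n=3.0, min_spots=100, step=50):
    """A spot is resistant when it maps above the upper or below the lower limit."""
    xk, yk, zk = spot
    lower, upper = contour_limits(g4, xk, yk, n, min_spots, step)
    return zk > upper or zk < lower


def gene_resistant(g4, replicate_spots, n=3.0, min_spots=100, step=50):
    """A gene is resistant only if all of its replicate spots are resistant."""
    return all(spot_resistant(g4, s, n, min_spots, step)
               for s in replicate_spots)


# Toy G4: a 20 x 20 grid of ranks with noise alternating between +0.1 and -0.1.
g4 = [(x, y, 0.1 if (x + y) % 2 == 0 else -0.1)
      for x in range(20) for y in range(20)]
strong = [(10, 10, 1.0)] * 4                  # all four replicates far outside
weak = [(10, 10, 1.0)] * 3 + [(10, 10, 0.2)]  # one replicate inside the bounds

print(gene_resistant(g4, strong, step=1))  # True
print(gene_resistant(g4, weak, step=1))    # False
```

In the toy data the local noise has a standard deviation of about 0.1, so at n = 3 the limits sit near ±0.4. A replicate at 0.2 falls between the surfaces and causes the whole gene to be filtered.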
In theory, the variable n determines both specificity and sensitivity. For example, if n is very large, one expects sensitivity to be low because (1) the z-axis limits of the contour surfaces will also be large, and (2) the space between the contour surfaces will include all log2(ratios). However, if n is small, specificity could be low because the contour surfaces may not include all the noise. The goal is to find a value of n that yields optimal specificity and sensitivity. Specificity will be assayed as 1 − the false discovery rate of Fs in same-to-same comparisons. Ideally, Fs should filter all same-to-same log2(ratios). Sensitivity will be assayed by percent discovery of Arabidopsis genes in different-to-different spike-in experiments (see above). Ideally, Fs should discover all the Arabidopsis genes.
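The two measures can be written out as a short Python sketch. The function names are hypothetical, and a measurement counts as "discovered" when its filtered log2(ratio) is non-zero, following the convention that filtered genes are transformed to 0. The numbers in the example are the MASH figures quoted earlier in the text (1 false discovery in 192 000 same-to-same measurements, and 26 of 64 Arabidopsis genes).

```python
def false_discovery_rate(filtered_log2_ratios):
    """Fraction of same-to-same measurements left non-zero after filtering."""
    discovered = sum(1 for z in filtered_log2_ratios if z != 0)
    return discovered / len(filtered_log2_ratios)


def specificity(filtered_log2_ratios):
    """Specificity assayed as 1 - false discovery rate (same-to-same design)."""
    return 1.0 - false_discovery_rate(filtered_log2_ratios)


def sensitivity(discovered_genes, true_positive_genes):
    """Fraction of spiked-in true positives that survive filtering."""
    return len(discovered_genes & true_positive_genes) / len(true_positive_genes)


# MASH figures quoted in the text, rebuilt as toy inputs.
same_to_same = [0.0] * 191999 + [1.2]   # one surviving measurement
arabidopsis = set(range(64))            # 64 spike-in genes (ids are arbitrary)
found_by_mash = set(range(26))          # 26 of them discovered

print(false_discovery_rate(same_to_same))
print(round(100 * sensitivity(found_by_mash, arabidopsis)))  # 41 (percent)
```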
Optimizing n and comparing the sensitivity and specificity of Fs to MASH
The goal is to compare the specificity of MASH to Fs while varying n within the interval [2,6]. MASH consists of F1 + f4 (F2a) + F2b. The false discovery rate is computed from nine 1.7K and ten 19K same-to-same experiments (Fig. 6a and b). The results reveal that the specificity of Fs alone is as high as MASH for n ≥ 3 (Fig. 6a and b and Table 1).
Sensitivity is assayed by the percentage of Arabidopsis genes discovered from the best of four replicate spike-in experiments, where 1 ng Arabidopsis RNA is added to one RNA sample but not the other (Fig. 6c and d). The following filter combinations are applied: (1) Fs alone, and (2) F1 and Fs (Fig. 6c). As compared with MASH, Fs improves the best sensitivity from 41 to 91% at n = 3. However, adding F1 and f4 to Fs lowers the sensitivity to 86% at n = 3. In addition, Figure 6d demonstrates that, as compared with MASH, the increase in sensitivity of Fs at n = 3 is statistically significant in all four replicate spike-in experiments (P = 0). These findings support the conclusion that Fs at n = 3 significantly improves sensitivity without lowering the high specificity of MASH. Receiver Operating Characteristics (ROC) is the standard approach to evaluate the sensitivity and specificity of diagnostic procedures (Swets and Pickett, 1992). MASH and Fs at n = 2, 2.5 and 3 generate the empiric ROC areas of 0.703, 0.96, 0.96 and 0.95, respectively. The accuracy rates are 99.8%, 99.9%, 100% and 100%, respectively (Table 1).
Fs is also applied to analyze four same-to-same datasets of Rosenzweig et al. (2004). Each dataset includes 710 genes spotted in duplicates to a total of 1420 spots. The false discovery rates per 2840 genes are 4, 2 and 0 at n = 2, 4 and 36, respectively. The findings demonstrate that Fs is also effective in filtering the noise of datasets acquired in independent laboratories.
DISCUSSION
The findings reveal that microarray datasets are heterogeneous (Figs 1–3). This heterogeneity is reflected by their geometrical structures in the 3D space, whose axes are the ranks in SO1 and SO2 and the log2(ratios). Specifically, this geometry/distribution (1) is unique to each dataset (Figs 1 and 3), (2) includes a zone of instability, whose F1-sensitive spots generate large errors (Figs 1–3), and (3) displays rank-dependent variability of log2(ratios). Interestingly, the f4-sensitive spots intrinsic to each dataset (1) replicate the geometry/distribution of all spots in the dataset (Figs 1 and 3) and (2) are independent of the genes that are differentially expressed. This new algorithm constructs rank-dependent upper- and lower-bound contour surfaces that are patterned based on the geometrical structure of f4-sensitive spots (Fig. 4).
The zone of instability is generated by ratios whose ranks are both less than the CR. This finding is consistent with the results of Baggerly et al. (2001) who report that ratios computed from spots containing a small amount of total signal are highly variable, whereas ratios derived from spots containing a large amount of total signal are fairly stable. The zone of instability (Figs 1–3) may also explain the results of Tan et al. (2003) who demonstrate poor reproducibility of states of genetic expression across different platforms.
Sensitivity is a function of measurable quality parameters; specifically, it is negatively correlated with the Noise Factor (Fathallah-Shaykh et al., 2004). In addition, poor data quality has a negative impact on the efficient detection of low-level regulated genes (Raffelsberger et al., 2003). Specifically, the distributions of false positive ratios vary between datasets; poor quality datasets contain large false positive ratios (Fathallah-Shaykh et al., 2004). Therefore, the degree of confidence that a low- or moderate-level expression ratio is true is dependent, not only on the analytical methods, but also on the unique distribution of noise in that specific dataset. Thus, to annul the confounding effects of data quality on sensitivity, the true positives of this study are designed to represent large differentials generated by adding Arabidopsis RNA to one sample but not the other. The specificity and sensitivity of the algorithm are optimal at n = 3. Values of n > 3 yield lower sensitivity (Fig. 6c) and values of n < 3 yield lower specificity (Fig. 6a and b). The z-axis positions of the contour surfaces are dependent on the standard deviation of the local (neighborhood) noise isolated by the square columns at specific ranks. The theory developed in this paper leads to a test (Fs) that compares the geometrical structures of distributions in the 3D space. Specifically, Fs divides the space into small subspaces (neighborhoods) and constructs contour surfaces whose z-axis variance is based on the local distributions at specific ranks (Fig. 4).
The first filter of MASH, F1, is stochastic; in addition, the position of the CR is computed empirically (Fathallah-Shaykh et al., 2004). However, the analysis detailed in this paper generates the mathematical basis for F1; specifically, F1 filters the zone of instability (Figs 1–3). It is not surprising that Fs is more sensitive than F1 (Fig. 6c); specifically, instead of deleting the zone of instability, Fs generates upper- and lower-bound contour surfaces around it (Figs 1, 2 and 5). Fs, MASH and MIDAS were applied to analyze the same datasets. The specificity of Fs at n = 3 is similar to MASH (Table 1 and Fig. 6), whose specificity is significantly better than MIDAS (Fathallah-Shaykh et al., 2004). MIDAS includes the Locfit (LOWESS) normalization (Quackenbush, 2002; Yang IV et al., 2002), standard deviation regularization (Yang YH et al., 2002), iterative linear regression normalization (Quackenbush, 2002), iterative log mean centering normalization (Causton et al., 2003), ratio statistics normalization and confidence interval checking (confidence range at 99%) (Chen et al., 1997), standard deviation regularization, low intensity filter, slice analysis (Quackenbush, 2002; Yang IV et al., 2002), and flip dye consistency checking (Yang YH et al., 2002; Quackenbush, 2002).
Unlike other methods, this analysis does not (1) assume linearity in the error model, (2) correlate levels of transcripts to signal levels, or (3) address the question of accuracy of fold-changes in gene expression (Newton et al., 2001; Theilhaber et al., 2001; Yang et al., 2002; Huber et al., 2002; Goryachev et al., 2001; Bolstad et al., 2003; Irizarry et al., 2003). The goal is to discover the genes that are up or downregulated between the samples to a high degree of certainty. The results reveal that the geometrical distributions of f4-sensitive spots (noise) in the 3D space are non-linear (Figs 1–3). Nonetheless, because the distribution of f4-sensitive spots models the distribution of all spots in the dataset (Figs 1 and 3), the algorithm builds contour upper- and lower-bound surfaces based on the distributions of f4-sensitive spots (Figs 4 and 5). Herein, the datasets are normalized by the non-linear method described elsewhere (Fathallah-Shaykh et al., 2004). Colantuoni et al. (2002) have also described methods for local normalization by non-linear transformations.
Fs is applicable to 2-color (2-channel) microarray data with dye-swapping replicates. Highly specific discovery of states of genetic expression has immediate and numerous applications; specifically, it generates testable hypotheses in biology and medicine that uncover molecular systems behind biological phenotypes. Examples include the phenotypes of resistance to oxidative stress and motility in cultured glioma and ectopic calcification in meningioma (Fathallah-Shaykh et al., 2003; Fathallah-Shaykh, 2005a, b).
Conflict of Interest: None declared.
Received on July 27, 2005; revised on September 17, 2005; accepted on September 20, 2005
REFERENCES
Baggerly, K., et al. (2001) Identifying differentially expressed genes in cDNA microarray experiments. J. Comput. Biol., 8, 639–659.
Bolstad, B.M., et al. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185–193.
Causton, H.C., Quackenbush, J., Brazma, A. (2003) Microarray Gene Expression Data Analysis: A Beginner's Guide. Blackwell Publishing, pp. 55–56.
Chen, Y., et al. (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics, 2, 364–374.
Colantuoni, C., et al. (2002) SNOMAD (Standardization and NOrmalization of MicroArray Data): web-accessible gene expression data analysis. Bioinformatics, 18, 1540–1541.
DeRisi, J.L., et al. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278, 680–686.
Fathallah-Shaykh, H.M. (2005a) Genomic discovery reveals a molecular system for resistance to ER and oxidative stress in cultured glioma. Arch. Neurol., 62, 233–236.
Fathallah-Shaykh, H.M. (2005b) Logical networks inferred from highly specific discovery of transcriptionally regulated genes predict protein states in cultured gliomas. Biochem. Biophys. Res. Comm., 336, 1278–1284.
Fathallah-Shaykh, H.M., et al. (2002) Mathematical modeling of noise and discovery of genetic expression classes in gliomas. Oncogene, 21, 7164–7174.
Fathallah-Shaykh, H.M., et al. (2003) Genomic expression discovery predicts pathways and opposing functions behind phenotypes. J. Biol. Chem., 278, 23830–23833.
Fathallah-Shaykh, H.M., et al. (2004) Mathematical algorithm for discovering states of expression from direct genetic comparison by microarrays. Nucleic Acids Res., 32, 3807–3814.
Goryachev, A.B., et al. (2001) Unfolding of microarray data. J. Comp. Biol., 8, 443–461.
Huber, W., et al. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18, S96–S104.
Irizarry, R.A., et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249–265.
Kothapalli, R., et al. (2002) Microarray results: how accurate are they? BMC Bioinformatics, 3, 22.
Lockhart, D.J., et al. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol., 14, 1675–1680.
Metz, C.E. (1986) Methodology in radiologic imaging. Invest. Radiol., 21, 720–733.
Newton, M., et al. (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comp. Biol., 8, 37–52.
Ntzani, E.E. and Ioannidis, J.P. (2003) Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet, 362, 1439–1444.
Obuchowski, N.A. (2003) Receiver operating characteristic curves and their use in radiology. Radiology, 229, 3–8.
Quackenbush, J. (2002) Microarray data normalization and transformation. Nat. Genetics, 32, Suppl., 496–501.
Raffelsberger, W., et al. (2003) Quality indicators increase the reliability of microarray data. Genomics, 80, 385–394.
Rosenzweig, B.A., et al. (2004) Dye bias correction in dual-labeled cDNA microarray gene expression measurements. Environ. Health Perspect., 112, 480–487.
Schena, M., et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467–470.
Swets, J.A. (1979) ROC analysis applied to the evaluation of medical imaging techniques. Invest. Radiol., 14, 109–121.
Swets, J.A. and Pickett, R.M. (1992) Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press, New York.
Tan, P.K., et al. (2003) Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res., 31, 5676–5684.
Theilhaber, J., et al. (2001) Bayesian estimation of fold-changes in the analysis of gene expression: the PFOLD algorithm. J. Comp. Biol., 8, 585–614.
Yang, I.V., et al. (2002) Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol., 3, research0062.
Yang, Y.H., et al. (2002) Normalization of cDNA microarray data; a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res., 30, e15.





