Bioinformatics Advance Access originally published online on November 15, 2007
Bioinformatics 2008 24(5):666-673; doi:10.1093/bioinformatics/btm507
Genome-wide co-expression based prediction of differential expressions
Department of Statistics and Biostatistics Center, The George Washington University, 2140 Pennsylvania Avenue, NW Washington, DC 20052, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Microarrays have been widely used for medical studies to detect novel disease-related genes. They enable us to study differential gene expressions at a genomic level. They also provide us with informative genome-wide co-expressions. Although many statistical methods have been proposed for identifying differentially expressed genes, genome-wide co-expressions have not been well considered for this issue. Incorporating genome-wide co-expression information in the differential expression analysis may improve the detection of disease-related genes.
Results: In this study, we proposed a statistical method for predicting differential expressions through the local regression between differential expression and co-expression measures. The smoother span parameter was determined by optimizing the rank correlation between the observed and predicted differential expression measures. A mixture normal quantile-based method was used to transform data. We used the gene-specific permutation procedure to evaluate the significance of a prediction. Two published microarray data sets were analyzed for applications. For the data set collected for a prostate cancer study, the proposed method identified many genes with weak differential expressions. Several of these genes have been shown in literature to be associated with the disease. For the data set collected for a type 2 diabetes study, no significant genes could be identified by the traditional methods. However, the proposed method identified many genes with significantly low false discovery rates.
Availability: The R code is freely available at http://home.gwu.edu/~ylai/research/CoDiff, where the gene lists ranked by our method are also provided as the Supplementary Material.
Contact: ylai{at}gwu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
Since microarray technology was introduced (Lockhart et al., 1996; Schena et al., 1995), it has been widely used in biological and medical studies (Golub et al., 1999; Spellman et al., 1998). A microarray is an experimental platform on which thousands of genes can be printed on a small chip. With microarrays, we can monitor gene expressions at a genomic level. Novel disease-related genes may be identified when microarrays are used to study samples from different groups, such as normal/disease groups or groups at different disease stages (Mootha et al., 2003; Singh et al., 2002). These data also allow us to perform disease classifications at a molecular level (Golub et al., 1999; van't Veer et al., 2002).
When microarrays are used to identify disease-related genes, we usually conduct two-sample or multi-sample statistical tests simultaneously for a huge number of variables (genes). Therefore, it is necessary to adjust for multiple hypothesis testing (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). Although we can simultaneously monitor a huge number of genes, the sample size is usually small because microarrays are relatively expensive. Furthermore, since the microarray is a relatively new technology, it still has considerable systematic noise due to various experimental factors. Because of these limitations, it is usually difficult to control false positives and to identify disease-related genes with weak differential expressions (Mootha et al., 2003).
Based on the exploration of microarray data, many statistical methods have been proposed to improve the detection of differential expressions. These include several different generalized t/F-statistics (Baldi and Long, 2001; Cui and Churchill, 2003; Cui et al., 2005; Lai, 2006; Wu, 2005, 2006; Tusher et al., 2001), a likelihood ratio test (Wang and Ethier, 2004) and other approaches (Guan and Zhao, 2005). Each of these methods has been shown to provide a satisfactory performance under certain situations. These methods are all univariate. However, it is well known that genes interact with each other during cellular and molecular processes. Disease-related genes are not isolated. Furthermore, to understand gene interactions, we can use microarray data to measure the co-expressions among genes at the mRNA level. The genome-wide co-expression information is useful in the detection of disease-related genes with relatively weak differential expressions, since these genes are expected to co-express with many other differentially expressed genes. Therefore, we expect to further improve the detection of disease-related genes if an efficient statistical method can be developed to incorporate the genome-wide co-expression information into differential expression analysis. However, this issue has not been well addressed in the literature although the co-expression analysis has been widely studied (Lee et al., 2004).
Recently, Storey et al. (2007) proposed an optimal discovery procedure for large-scale significance testing. Their method evaluates the differential expression of a gene with the consideration of information from the other genes. However, the co-expression information is not explicitly considered in the procedure. Tibshirani and Wasserman (2006) proposed a correlation-sharing method for detecting differential expressions. Their method evaluates differential expressions through a correlation-sharing based maximization procedure: for a fixed gene X, its neighbors are first defined as those genes (including X itself) whose absolute correlations with X are greater than a given threshold value; then, the average of the differential expression measures of these neighbors is obtained; finally, the maximal average is reported after the threshold value is varied from 0 to 1. However, since the correlation is only used to rank genes, the magnitude of co-expression is not explicitly considered in the procedure.
In this study, we incorporate the genome-wide co-expression information into the differential expression analysis through a local regression technique that has been widely used in microarray data pre-processing. In this way, the magnitude of co-expression can be considered when we evaluate the differential expression of a gene. In the following sections, we first introduce a motivating example and then propose the genome-wide co-expression based prediction of differential expressions. Two published microarray gene expression data sets are used for illustrations. Finally, we discuss the advantages and disadvantages of our method, and propose several future research topics.
| 2 METHODS |
|---|
2.1 A motivating example
Before introducing our method, we first provide a motivating example. Figure 1 shows the scatter plots for six genes based on our first experimental data set: the three genes in the upper panel have the most significant differential expressions and the three genes in the lower panel have the least significant differential expressions. The traditional Student's t-test was used to measure the differential expressions (observed) of all genes. For a given gene, we used Pearson's correlation coefficient to measure the co-expressions between this gene and all the other genes. Then, both the differential expression and co-expression measures were modified so that we could focus on non-negative correlations. Therefore, for the given gene, there is a pair of modified differential expression and co-expression measures from each other gene, and a scatter plot can be generated based on all these pairs. Figure 1 shows that, for each gene, the LOWESS (a local regression technique) curve can be used to model the relationship between the differential expression and co-expression measures, and then it can be extended to predict the differential expression of the given gene. It is surprising that the observed and predicted differential expression measures are very close.
Since the differential expressions of these six genes are either very strong or very weak, they are likely to be truly differentially expressed or truly non-differentially expressed. Figure 1 suggests the feasibility of genome-wide co-expression based prediction of differential expressions. In the Results section, we will further show more advantages of our method. In addition to its improved control of false positives, our method can also detect disease-related genes with weak differential expressions.
Figure 1 also suggests that the prediction of differential expressions depends on the smoothness of LOWESS curve, which is determined by the proportion of nearest neighbors used for the local regression. More technical details are described as follows.
2.2 Data sets
We consider two published microarray gene expression data sets to illustrate our method. The first one was collected for a prostate cancer study (Singh et al., 2002). There are 50 normal and 52 cancerous samples in the data set, which contains expression measurements for 6034 genes after data pre-processing (Singh et al., 2002). When the traditional univariate methods are used for detecting differentially expressed genes, many genes can be identified after multiple comparison adjustment (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). These include gene hepsin, which is a well-known prostate cancer-related gene (DeMarzo et al., 2003), and other novel genes. However, several other well-known disease-related genes, such as genes GSTP1 and AMACR (DeMarzo et al., 2003), do not appear near the top of the list.
The second data set was collected for a type 2 diabetes study (Mootha et al., 2003). There are 17 normal and 18 diabetic samples in the data set, which contains expression measurements for 10 983 genes after data pre-processing (Mootha et al., 2003). When the traditional univariate methods are used for detecting differentially expressed genes, no gene can be identified after multiple comparison adjustment. However, Mootha et al. (2003) introduced a gene set enrichment analysis, which identified several clusters of genes related to type 2 diabetes.
2.3 Data transformation
We have proposed a mixture normal quantile-based data transformation method (Lai et al., 2004). Based on our experiences, this method can generally improve the agreement between the underlying distribution of data and normality. Furthermore, it can reduce the impact from outliers, especially in the correlation-related analyses (data not shown). Let n be the total sample size (combining all different sample groups). The procedure of this method is described as follows.
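- Rank all observations for each gene $X$ in increasing order: $\{x_1, x_2, \ldots, x_n\} \rightarrow \{r_1, r_2, \ldots, r_n\}$;
- Construct a mixture cumulative distribution function (c.d.f.) from two normal distributions: $\Psi(x) = \hat{\pi}_1 \Phi(x; \hat{\mu}_1, \hat{\sigma}_1^2) + \hat{\pi}_2 \Phi(x; \hat{\mu}_2, \hat{\sigma}_2^2)$, where $\Phi(\cdot; \mu, \sigma^2)$ denotes the normal c.d.f. with mean $\mu$ and variance $\sigma^2$, and $\hat{\pi}_j$, $\hat{\mu}_j$ and $\hat{\sigma}_j^2$ are the sample proportion, sample mean and sample variance in the $j$-th group, respectively, $j = 1, 2$;
- Perform the inverse transformation $t = \Psi^{-1}[r/(n + 1)]$ through simulations: $\{r_1, r_2, \ldots, r_n\} \rightarrow \{t_1, t_2, \ldots, t_n\}$.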
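The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `mixture_quantile_transform` and its arguments are hypothetical, ties are broken by sample position, and the mixture c.d.f. is inverted by bisection rather than through simulations as described above.

```python
from statistics import NormalDist, mean, variance

def mixture_quantile_transform(values, labels):
    """Transform one gene's observations via a two-group normal mixture.

    values: expression measurements of one gene over all n samples.
    labels: group label (1 or 2) of each sample.
    """
    n = len(values)
    # Step 1: ranks 1..n in increasing order (ties broken by position).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank

    # Step 2: mixture c.d.f. from per-group proportion, mean and variance.
    components = []
    for g in (1, 2):
        xs = [v for v, lab in zip(values, labels) if lab == g]
        components.append((len(xs) / n, NormalDist(mean(xs), variance(xs) ** 0.5)))

    def mixture_cdf(x):
        return sum(p * dist.cdf(x) for p, dist in components)

    # Step 3: invert the c.d.f. at r / (n + 1).
    # Assumption: bisection replaces the simulation-based inversion.
    lower = min(d.mean - 10 * d.stdev for _, d in components)
    upper = max(d.mean + 10 * d.stdev for _, d in components)

    def inverse(u):
        a, b = lower, upper
        for _ in range(100):
            mid = (a + b) / 2
            if mixture_cdf(mid) < u:
                a = mid
            else:
                b = mid
        return (a + b) / 2

    return [inverse(r / (n + 1)) for r in ranks]
```

Because the transformation is monotone in the ranks, the ordering of a gene's observations is preserved while extreme values are pulled toward the fitted mixture.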
2.4 Measures for differential expression and co-expression
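In this study, we focus on two-sample comparisons. We consider the traditional Student's t-test for measuring differential expressions. Let $n_1$ and $n_2$ be the sample sizes of the first and second groups, respectively, $n_1 + n_2 = n$. For a gene $X$, we use $\{x_{1,1}, \ldots, x_{1,n_1}\}$ and $\{x_{2,1}, \ldots, x_{2,n_2}\}$ to denote its observations in the first and second groups, respectively. The Student's t-test is given by $t = (\bar{x}_1 - \bar{x}_2)/s$, where $s^2 = (1/n_1 + 1/n_2)\,s_p^2$ and $s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n - 2)$; $\bar{x}_j = n_j^{-1}\sum_{l=1}^{n_j} x_{j,l}$ and $s_j^2 = (n_j - 1)^{-1}\sum_{l=1}^{n_j}(x_{j,l} - \bar{x}_j)^2$, $j = 1, 2$.
For co-expression, we use Pearson's correlation coefficient between the expression profiles of two genes (see Section 2.1).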
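As a minimal sketch of the two measures used in this study, the following Python functions compute a pooled-variance two-sample Student's t-statistic and Pearson's correlation coefficient. The function names are hypothetical, and the pooled-variance form of the t-statistic is assumed here as the standard definition.

```python
from math import sqrt
from statistics import mean, variance

def student_t(group1, group2):
    """Pooled-variance two-sample Student's t-statistic for one gene."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))

def pearson(x, y):
    """Pearson's correlation coefficient between two expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```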
2.5 Predicting differential expressions
We consider a local regression technique to model the relationship between differential expression and co-expression measures, and then to predict the differential expression measure of a given gene. For a given gene $X_i$, we have its measure of co-expression $\rho_{ik}$ between $X_i$ and another gene $X_k$, which has a measure of differential expression $t_k$. Therefore, for each gene $X_k$, $k = 1, 2, \ldots, m$ ($m$ is the total number of genes under study), there is a pair of measurements $(\rho_{ik}, t_k)$. To gather more observations for the local regression, we consider a modified pair $(\rho'_{ik}, t'_k) = (s_{ik}\rho_{ik}, s_{ik}t_k)$, where $s_{ik}$ is the sign of $\rho_{ik}$. In this way, all the co-expression measures will be non-negative (see Fig. 1 for illustrations).
For a given gene $X_i$, there are $m$ paired measures $\{(\rho'_{ik}, t'_k) : k = 1, 2, \ldots, m\}$. We consider $(\rho'_{ii} = 1, t'_i)$ as an outlying pair and exclude it from the local regression. We use LOWESS, a well-known statistical technique for local regression, to fit the remaining $m - 1$ pairs. To predict the differential expression measure of $X_i$, we linearly extend the fitted curve to the right and use the fitted value at $\rho'_{ii} = 1$ for prediction.
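The construction of the modified pairs and the extension of the fitted curve can be sketched as below. The names `modified_pairs` and `extend_to_one` are hypothetical. Two details are assumptions rather than part of the description above: a correlation of exactly zero is given a positive sign, and the curve is extended along the straight line through its two rightmost fitted points.

```python
def modified_pairs(i, corr_row, t_scores):
    """Sign-adjusted (co-expression, differential expression) pairs for gene i.

    corr_row: co-expression measures between gene i and every gene.
    t_scores: differential expression measures of every gene.
    The gene's own pair (co-expression 1) is left out, giving m - 1 pairs.
    """
    pairs = []
    for k, (rho, t) in enumerate(zip(corr_row, t_scores)):
        if k == i:
            continue
        sign = -1.0 if rho < 0 else 1.0  # assumption: zero counts as positive
        pairs.append((sign * rho, sign * t))
    return pairs

def extend_to_one(fitted):
    """Linearly extend a fitted curve, given as (x, z) points, to x = 1.

    Assumption: the line through the two rightmost fitted points is used,
    and those two points have distinct x values.
    """
    pts = sorted(fitted)
    (x1, z1), (x2, z2) = pts[-2], pts[-1]
    slope = (z2 - z1) / (x2 - x1)
    return z2 + slope * (1.0 - x2)
```

Flipping the sign of both coordinates keeps the direction of the association intact: a gene that is negatively correlated with the given gene and down-regulated contributes the same evidence as a positively correlated, up-regulated one.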
2.6 LOWESS
LOWESS is a well-developed statistical technique for local regression (Cleveland, 1979). Here, we briefly describe it as follows. Suppose there are $l$ paired observations $\{(x_i, y_i) : i = 1, 2, \ldots, l\}$. Let $f$ be the smoother span with constraint $0 < f \le 1$ and $r = [fl]$ be the integer closest to $fl$. For each $x_i$, we define the weight for $x_k$ relative to $x_i$ as $w_k(x_i) = W((x_k - x_i)/h_i)$, where $W(\cdot)$ is the tricube function and $h_i$ is the distance from $x_i$ to the $r$-th nearest neighbor of $x_i$. Then, we perform the following procedure:
- (a) For each $x_i$, fit a $z_i$ for $y_i$ by the weighted least squares with a polynomial model with degree $d$ and weights $\{w_k(x_i)\}$;
- (b) Calculate the residuals $\{y_i - z_i\}$ and update the weights with the bisquare function;
- (c) Repeat steps (a) and (b) $b$ times;
- (d) Return the final $\{z_i\}$ as the robust locally weighted regression fitted values for $\{y_i\}$.
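The procedure can be sketched in plain Python for the local linear case (d = 1). This is an illustrative, unoptimized version and not the R `lowess` implementation: function names are hypothetical, the loop is read as one initial fit followed by b robust refits, and the residual scale 6 × median|residual| follows Cleveland (1979).

```python
from statistics import median

def tricube(u):
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def bisquare(u):
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0

def weighted_line_at(xs, ys, ws, x0):
    """Weighted least-squares straight line (d = 1), evaluated at x0."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0.0:
        return my  # all weight on one x value: fall back to the weighted mean
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)

def lowess(xs, ys, f=2.0 / 3.0, b=2):
    """Robust locally weighted linear regression; returns fitted values z_i."""
    l = len(xs)
    r = max(2, min(l, round(f * l)))   # number of nearest neighbours
    robust = [1.0] * l                 # robustness weights, all 1 at first
    fitted = list(ys)
    for iteration in range(b + 1):     # one plain fit, then b robust refits
        new_fit = []
        for i in range(l):
            dists = [abs(x - xs[i]) for x in xs]
            h = sorted(dists)[r - 1]   # distance to the r-th nearest neighbour
            ws = [robust[k] * (tricube(d / h) if h > 0 else float(d == 0))
                  for k, d in enumerate(dists)]
            # keep the previous value if every neighbour was down-weighted to 0
            new_fit.append(weighted_line_at(xs, ys, ws, xs[i])
                           if sum(ws) > 0 else fitted[i])
        fitted = new_fit
        if iteration == b:
            break
        residuals = [y - z for y, z in zip(ys, fitted)]
        s = median(abs(e) for e in residuals)
        if s == 0:
            break                      # perfect fit: nothing to down-weight
        robust = [bisquare(e / (6.0 * s)) for e in residuals]
    return fitted
```

The bisquare update is what makes the fit robust: observations with large residuals receive weight zero in the next pass, so a single aberrant pair has little influence on the final curve.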
2.7 Optimizing LOWESS
There are several parameters to be determined in the above LOWESS procedure. In practice, satisfactory results have been observed when d = 1 and b = 2. The default value for the smoother span in the R function lowess is f = 2/3. However, the smoother span f is the proportion of nearest neighbors used for the local regression, and in practice its choice still depends on the data. For example, Berger et al. (2004) discussed the choice of f to optimize LOWESS normalization for microarray data. In this study, we intend to predict the differential expression for each gene. One may consider choosing a different f for each gene. However, this may over-fit the data and also increase the computational burden. To obtain more robust genome-wide predictions and to reduce the computational burden, we propose the following global optimization of f so that the predicted and observed differential expression measures are consistent at the genome-wide level.
For a given smoother span $f$, we can obtain a set of predicted differential expression measures $\{\hat{t}_i\}$ in addition to the observed measures $\{t_i\}$. We use Spearman's rank correlation coefficient to calculate the correlation between these two vectors of measures. Among the set $\{0.05, 0.1, \ldots, 0.95\}$, $f$ is chosen to be the value that maximizes this rank correlation (see Fig. 2 for illustrations).
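A minimal sketch of this global choice of the smoother span is given below. The callable `predict_all` is a hypothetical placeholder for the genome-wide prediction step: it is assumed to take a span f and return the predicted measure for every gene, in the same order as the observed measures. Ties are handled with average ranks, which is an assumption.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for j in range(start, end + 1):
            ranks[order[j]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

def choose_span(observed, predict_all, grid=None):
    """Return the span in the grid that maximizes the rank correlation
    between observed and predicted differential expression measures."""
    if grid is None:
        grid = [round(0.05 * j, 2) for j in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return max(grid, key=lambda f: spearman(observed, predict_all(f)))
```

Using a rank correlation makes the criterion insensitive to the scale of the predictions, so only the agreement in gene ordering drives the choice of f.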
2.8 Significance assessment
Gene-specific permutation P-value.
The permutation procedure has been widely used in microarray studies to evaluate P-values. The traditional permutation procedure simply permutes the sample group labels and recalculates the test statistics. However, this procedure does not change the co-expression measures, which may impair the random patterns (the relationship between the differential expression and co-expression measures) that the permutation procedure intends to generate. Therefore, we use the following gene-specific permutation procedure for evaluating P-values.
- (a) Obtain a permuted data set by randomly reassigning the expression measurements to different arrays for each gene;
- (b) Obtain a group of permuted scores by performing the genome-wide co-expression based prediction of differential expressions for each gene with the above permuted data set;
- (c) Obtain a pool from R groups of permuted scores by performing steps (a) and (b) R (= 100) times;
- (d) Evaluate the P-value of a calculated score with the above pool as a null distribution.
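The permutation steps above might be sketched as follows. The callable `score_fn` is a hypothetical stand-in for the whole prediction pipeline: it is assumed to take a genes-by-arrays matrix (a list of per-gene lists) and return one score per gene. The two-sided P-value, computed as the proportion of pooled null scores at least as extreme in absolute value, is also an assumption, since the exact formula is not spelled out above.

```python
import random
from bisect import bisect_left

def permute_within_genes(data, rng):
    """Shuffle each gene's measurements across arrays, independently per gene."""
    return [rng.sample(row, len(row)) for row in data]

def permutation_pvalues(data, score_fn, n_perm=100, seed=0):
    """Gene-specific permutation P-values from a pooled null distribution."""
    rng = random.Random(seed)
    observed = score_fn(data)
    pool = []
    for _ in range(n_perm):
        permuted = permute_within_genes(data, rng)
        pool.extend(abs(s) for s in score_fn(permuted))
    pool.sort()
    # proportion of pooled null scores with absolute value >= the observed one
    return [(len(pool) - bisect_left(pool, abs(s))) / len(pool) for s in observed]
```

Because each gene is shuffled independently, both the group differences and the gene-gene correlations are destroyed in the permuted data, which is the property the gene-specific procedure is designed to achieve.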
False discovery rate.
Benjamini and Hochberg (1995) introduced the concept of false discovery rate (FDR) to evaluate the expected proportion of false positives among the claimed positives. They also proposed a procedure to control FDRs (Benjamini and Hochberg, 1995). Although a more sophisticated FDR estimation procedure has been also proposed (Storey and Tibshirani, 2003), it requires an estimation of proportion of differentially expressed genes. The purpose of this study is to introduce a statistical method for predicting differential expressions with the consideration of co-expression information. To keep a simple illustration, we will still use the FDR control procedure (Benjamini and Hochberg, 1995) in this study.
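For reference, the Benjamini and Hochberg (1995) step-up procedure can be written as a short function that converts P-values into adjusted values; a gene is then declared significant at FDR level q when its adjusted value is at most q. The function name is hypothetical and the code is a generic sketch of the procedure, not the code used in this study.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```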
False positives versus claimed positives.
To show the improvement of our method against the traditional univariate method (Student's t-test), we consider the curve of the number of false positives against the number of claimed positives (Lai, 2006; Wu, 2005). The procedure for generating the curve is briefly described as follows. Given a positive threshold value c, for the vector of scores calculated based on the original data, count the number of genes with absolute values of scores greater than c. This is the number of claimed positives. For each vector of scores calculated based on the permuted data (see above for the permutation procedure), also calculate such a number of genes. The average is an upper bound for the number of false positives (if we assume all genes are non-differentially expressed). A curve can be generated after screening different values of c. It is obvious that a good method should give a relatively low curve.
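The curve described above can be computed with a few lines of Python. The function and argument names are hypothetical; `permuted_sets` is assumed to hold one vector of scores per permuted data set.

```python
def fp_vs_cp_curve(observed, permuted_sets, thresholds):
    """Points (claimed positives, average false positives), one per threshold."""
    curve = []
    for c in thresholds:
        # genes whose observed score exceeds the threshold in absolute value
        claimed = sum(1 for s in observed if abs(s) > c)
        # same count in each permuted score vector, averaged over permutations
        counts = [sum(1 for s in perm if abs(s) > c) for perm in permuted_sets]
        curve.append((claimed, sum(counts) / len(counts)))
    return curve
```

For example, `fp_vs_cp_curve([3.0, -2.5, 0.5, 1.0], [[0.2, -1.5, 0.3, 2.2], [1.1, 0.1, -0.4, 0.6]], [1.0, 2.0])` returns `[(2, 1.5), (2, 0.5)]`: raising the threshold from 1 to 2 keeps two claimed positives while the estimated number of false positives drops from 1.5 to 0.5.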
| 3 RESULTS |
|---|
3.1 Prostate cancer data set
The optimized smoother span for this data set is f = 0.25, or 25% of the nearest neighbors are used for local fitting for each gene. Genes hepsin, AMACR and GSTP1 have been shown to be important in the study of prostate cancer (DeMarzo et al., 2003). When the traditional Student's t-test was used for measuring differential expressions, gene hepsin received the first rank. However, genes AMACR and GSTP1 were ranked only 672 and 106, respectively, and it was difficult to identify them. When our method was used for predicting differential expressions, we were surprised to observe that the ranks of genes AMACR and GSTP1 were increased to 10 and 35, respectively. Furthermore, the rank of gene hepsin was still as high as 17. Figure 3 gives their scatter plots and LOWESS curves as well as the predictions of differential expressions. The whole gene list ranked by our method was included in the Supplementary Material.
Figure 4 gives the comparison of differential expression measures and their FDRs between the observations (given by the traditional t-test) and predictions (given by our method). Many genes with weak observed differential expressions have significant predicted differential expressions. Another observation from the FDR plot is that if a gene has a relatively significant observed differential expression, then its predicted differential expression will usually be relatively significant as well (the upper left portion of the FDR plot is almost empty). Figure 5 shows the improvement in the curve of the number of false positives against the number of claimed positives when our method (predicted) is compared with the Student's t-test (observed). Both curves are significantly lower than the diagonal line (which represents that data are generated totally from randomness). With some overlaps, almost the whole curve given by our method is clearly lower than the curve given by the Student's t-test.
3.2 Type 2 diabetes data set
The optimized smoother span for this data set is f = 0.3, or 30% of the nearest neighbors are used for local fitting for each gene. Figure 6 gives the comparison of differential expression measures and their FDRs between the observations (given by the traditional t-test) and predictions (given by our method). Figure 5 shows the improvement in the curve of the number of false positives against the number of claimed positives when our method (predicted) is compared with the Student's t-test (observed). Notice that the curve given by the Student's t-test is almost overlapped with the diagonal line, which implies that the number of disease-related genes in the data is relatively small and their differential expressions are weak. However, our method gives a clearly lower curve. The advantage is clearer when the lower left portion of the plot is enlarged.
For three genes highly ranked by our method with annotations, Figure 7 gives their scatter plots and LOWESS curves as well as the predictions of differential expressions. To our surprise, the fitted LOWESS curves are all very close to straight lines, which is quite different from what has been observed from the previous prostate cancer data analysis. The relationship between co-expression and differential expression measures can be very different for different data sets, which suggests the necessity of optimizing the smoother span f based on the data structure.
The whole gene list ranked by our method was included in the Supplementary Material. Among the top ranks, many annotated genes were found in literature to be directly or indirectly associated with type 2 diabetes. These include ADSL (Jenkins et al., 1988), ANK1 and ANK3 (Schwartz et al., 1991), APOD (Hansen et al., 2004), AQP3 (Ma et al., 2000), CAPN3 (Zatz and Starling, 2005), GAD2 (Zinman et al., 2004), PDK4 (Ma et al., 2005) and TCF7 (Noble et al., 2003; for type 1 diabetes).
| 4 DISCUSSION |
|---|
We proposed a statistical method for predicting the differential expression of a given gene with the consideration of genome-wide co-expressions. Since genes interact with each other during cellular and molecular processes, co-expression and differential expression measures are likely to be highly correlated. In this study, we considered LOWESS, a well-developed statistical technique for robust local weighted regression, for modeling the relationship between the co-expression and differential expression measures. Such a model could be used to predict the differential expression of a given gene. We used a gene-specific permutation procedure to evaluate the significance of a prediction. Through the applications to two microarray gene expression data sets, we showed that our method could provide better control of false positives and identify disease-related genes with weak differential expressions. Recently, several multivariate statistical methods have been proposed for detecting disease-related genes (Storey et al., 2007; Tibshirani and Wasserman, 2006). It is necessary to conduct a comprehensive comparison study so that the performance of different methods can be thoroughly understood.
One may wonder whether it is possible to find a common correlation threshold for the prediction of differential expressions (extensions of LOWESS curves). If this is possible, then the predictions can be simplified to linear regression lines based on genes with co-expression measures greater than the threshold. However, Figures 1, 3 and 7 show different LOWESS patterns as well as different genome-wide correlation ranges for different genes. Furthermore, Figure 8 shows the empirical distributions of the 95-th percentile of absolute correlations in the prostate cancer data set and the type 2 diabetes data set. (For each gene, the absolute Pearson's correlation coefficients between this gene and all the other genes are calculated; then, the 95-th percentile is obtained from these correlations.) Many genes have relatively low genome-wide correlations, while many other genes have relatively high genome-wide correlations. Therefore, the correlation threshold is gene-specific. It is difficult to rigorously define this gene-specific threshold.
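The per-gene summary behind Figure 8 can be sketched as below. The function name is hypothetical, and the percentile is computed with linear interpolation between order statistics, which is an assumption because the percentile definition is not stated above.

```python
def percentile_abs_correlation(i, corr_row, q=0.95):
    """q-th percentile of |correlation| between gene i and all other genes.

    corr_row: Pearson correlations between gene i and every gene
              (the entry at index i, the gene with itself, is skipped).
    """
    vals = sorted(abs(r) for k, r in enumerate(corr_row) if k != i)
    pos = q * (len(vals) - 1)          # fractional position in the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(vals) - 1)
    return vals[lo] + (pos - lo) * (vals[hi] - vals[lo])
```

Applying this function to every gene and plotting the resulting values gives an empirical distribution of the kind described for the two data sets.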
The prediction results may depend on the microarray chip used in the study. To understand the impact of the chip size (the number of genes), we also conducted the following analyses (data not shown). Based on an experimental data set, for a given gene, we made the chip size smaller by randomly sampling a certain proportion of the remaining genes. In this way, the proportions of non-differentially/differentially expressed genes, as well as the correlation structure among these genes, could be relatively well maintained. With the reduced microarray data set, we predicted the differential expression for the given gene. We observed consistent predictions when the chip size was relatively large (e.g. > 3000 genes). However, when the chip size was relatively small (e.g. < 100 genes), the predictions could be very unreliable. It is necessary to pursue more theoretical and simulation studies so that the chip impact can be further understood.
There are certain disadvantages of our proposed method. First, predicting a differential expression is actually the prediction of an outlier (see Figs 1, 3 and 7), and therefore may not be reliable. Second, the choice of smoother span f in LOWESS is not trivial. Although we have proposed a global optimization for this parameter, it is still not clear whether there is a better way to determine f. Third, the method is computationally intensive since we need to calculate all possible pairwise correlations and the permutation procedure has to be used for evaluating significance. Therefore, it is necessary to develop more statistically and computationally efficient methods for predicting differential expressions.
In addition to genome-wide co-expression information, incorporating more biological information may further improve the detection of disease-related genes. Recently, Pan (2006) has shown that incorporating the function information of genes can improve the clustering analysis of microarray gene expression data. In addition to gene ontology information, there is other genome-wide information collected to understand cellular and molecular processes, such as the DNA sequence data (International Human Genome Sequencing Consortium, 2004) and the cell cycle gene expression data (Whitfield et al., 2002). We expect to develop more efficient statistical and computational methods for detecting disease-related genes if these different types of biological information can be efficiently incorporated into the analysis.
| ACKNOWLEDGEMENTS |
|---|
We thank the associate editor and an anonymous reviewer for their valuable comments. This work was supported by NIH/NIDDK grant DK-75004.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Trey Ideker
Received on May 31, 2007; revised on September 9, 2007; accepted on October 4, 2007
| REFERENCES |
|---|
Baldi P, Long AD. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics (2001) 17:509–519.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser B (1995) 57:289–300.
Berger JA, et al. Optimized LOWESS normalization parameter selection for DNA microarray data. BMC Bioinformatics (2004) 5:194.
Cleveland WS. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc (1979) 74:829–836.
Cui X, Churchill GA. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol (2003) 4:210.
Cui X, et al. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics (2005) 6:59–75.
DeMarzo AM, et al. Pathological and molecular aspects of prostate cancer. Lancet (2003) 361:955–964.
Golub TR, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (1999) 286:531–537.
Guan Z, Zhao H. A semiparametric approach for marker gene selection based on gene expression data. Bioinformatics (2005) 21:529–536.
Hansen L, et al. Expression profiling of insulin action in human myotubes: induction of inflammatory and pro-angiogenic pathways in relationship with glycogen synthesis and type 2 diabetes. Biochem. Biophys. Res. Commun (2004) 323:685–695.
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature (2004) 431:931–945.
Jenkins RL, et al. Adenine nucleotide metabolism in hearts of diabetic rats. Comparison to diaphragm, liver, and kidney. Diabetes (1988) 37:629–636.
Lai Y. A statistical method for estimating the proportion of differentially expressed genes. Comput. Biol. Chem (2006) 30:193–202.
Lai Y, et al. A statistical method for identifying differential gene-gene co-expression patterns. Bioinformatics (2004) 20:3146–3155.
Lee HK, et al. Coexpression analysis of human genes across many microarray data sets. Genome Res (2004) 14:1085–1094.
Lockhart D, et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol (1996) 14:1675–1680.
Ma K, et al. Cloning of the rat pyruvate dehydrogenase kinase 4 gene promoter. J. Biol. Chem (2005) 280:29525–29532.
Ma T, et al. Nephrogenic diabetes insipidus in mice lacking aquaporin-3 water channels. Proc. Natl Acad. Sci. USA (2000) 97:4386–4391.
Mootha VK, et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet (2003) 34:267–273.
Noble JA, et al. A polymorphism in the TCF7 gene, C883A, is associated with type 1 diabetes. Diabetes (2003) 52:1579–1582.
Pan W. Incorporating gene functions as priors in model-based clustering of microarray gene expression data. Bioinformatics (2006) 22:795–801.
Schena M, et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science (1995) 270:467–470.
Schwartz RS, et al. Oxidation of spectrin and deformability defects in diabetic erythrocytes. Diabetes (1991) 40:701–708.
Singh D, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell (2002) 1:203–209.
Spellman PT, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell (1998) 9:3273–3297.
Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.
Storey JD, et al. The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics (2007) 8:414–432.
Tibshirani R, Wasserman L. Correlation-sharing for detection of differential gene expression. In: Technical report. (2006).
Tusher VG, et al. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA (2001) 98:5116–5121.
van't Veer LJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature (2002) 415:530–536.
Wang S, Ethier S. A generalized likelihood ratio test to identify differentially expressed genes from microarray data. Bioinformatics (2004) 20:100–104.
Whitfield ML, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell (2002) 13:1977–2000.
Wu B. Differential gene expression detection using penalized linear regression models: the improved SAM statistics. Bioinformatics (2005) 21:1565–1571.
Wu B. A unified statistical framework for differential gene expression detection and sample classification using penalized linear regression models. Bioinformatics (2006) 22:472–476.
Zatz M, Starling A. Calpains and disease. N. Engl. J. Med (2005) 352:2413–2423.
Zinman B, et al, for the ADOPT Study Group. Phenotypic characteristics of GAD antibody-positive recently diagnosed patients with type 2 diabetes in North America and Europe. Diabetes (2004) 53:3193–3200.