Bioinformatics Advance Access originally published online on June 22, 2007
Bioinformatics 2007 23(17):2298-2305; doi:10.1093/bioinformatics/btm328
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Multivariate correlation estimator for inferring functional relationships from replicated genome-wide data
1Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, MO 64110 and 2Department of Statistics, University of Michigan, Ann Arbor, MI 48105, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Summary: Estimating pairwise correlation from replicated genome-scale (a.k.a. OMICS) data is fundamental to cluster functionally relevant biomolecules to a cellular pathway. The popular Pearson correlation coefficient estimates bivariate correlation by averaging over replicates. It is not completely satisfactory since it introduces strong bias while reducing variance. We propose a new multivariate correlation estimator that models all replicates as independent and identically distributed (i.i.d.) samples from the multivariate normal distribution. We derive the estimator by maximizing the likelihood function. For small sample data, we provide a resampling-based statistical inference procedure, and for moderate to large sample data, we provide an asymptotic statistical inference procedure based on the Likelihood Ratio Test (LRT). We demonstrate advantages of the new multivariate correlation estimator over Pearson bivariate correlation estimator using simulations and real-world data analysis examples.
Availability: The estimator and statistical inference procedures have been implemented in an R package CORREP that is available from CRAN [http://cran.r-project.org] and Bioconductor [http://www.bioconductor.org/].
Contact: doz{at}stowers-institute.org or dongxiaozhu{at}yahoo.com
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
The ever-increasing use of high-throughput data acquisition technologies, such as gene expression arrays and mass spectrometry, has generated an enormous amount of genome-scale data, in which every gene or gene product is associated with a series of numeric measurements. Analysis of these data provide many opportunities to understand the underlying cellular processes. Some of the familiar tasks in analysis of genome-wide measurements include pre-processing (Irizarry et al., 2003; Li and Wong, 2001; Zhou and Rocke, 2005), detecting differentially expressed genes (Cui and Churchill, 2003; Pavelka et al., 2004; Sartor et al., 2006), gene clustering (Medvedovic and Sivaganesan, 2002; Medvedovic et al., 2004; Liu et al., 2006; Yeung et al., 2001), sample classification and biomarker discovery (Nguyen and Rocke, 2002a, Nguyen and Rocke, 2002b; Yeung and Bumgarner, 2005) and gene network reconstruction (Lee et al., 2004; Zhu et al., 2005a). These analyses are useful in understanding the complex web of bio-molecule interaction and regulation in the living cell.
Many types of genome-wide data are noisy, and have to be replicated to account for the inherent variability. In this work, we discuss replication in two contexts, i.e. profiling gene expression using expression arrays, and profiling protein abundance in biological samples using mass-spectrometry.
Various forms of data replication are employed in both cDNA and Affymetrix microarray experiments. In cDNA microarray experiments, the differences between replication strategies are in the extent to which the replicates can be treated as statistically independent random samples from a population. The more popular replication methods can be divided into within-slide and between-slide replications (Speed, 2003). The former is generated by printing the same cDNA on a slide more than once for quality control purposes. The latter is generated by using different mRNA aliquot for each slide hybridization that can be further divided into technical replicates and biological replicates depends on whether the mRNA aliquots are from a single mouse or strain or not. In Affymetrix experiments, each probe set consists of about 11 probe pairs. The probe-level intensities can be treated as replicated measures of the gene expression scores.
In proteomics research, quantification experiments of the proteins or the peptides present in a sample are usually performed in replicates. For example, the recently introduced iTRAQ protein quantification technique (Ross et al., 2004) derivatizes peptides using multiple (e.g. four) isobaric mass tags i.e. mass-to-charge ratios (m/z) 114, 115, 116 and 117 Da, and it also imparts identical reversed-phase retention properties to differentially labeled peptides. Therefore, four separate peptide mixtures can be individually labeled with different mass tags, combined, and separated by reversed-phase high performance liquid chromatography (HPLC). The relative abundance of a protein is thus represented by a number of ratios of peptides abundance to the internal standard. These ratios can be viewed as replicated measurements of the protein abundance (Hardt et al., 2005).
Many data analysis approaches were able to draw reliable inference from noisy data by taking good advantage of replicates. For example, analysis of variance (ANOVA) have been applied to screen differentially expressed genes between different physiological/genetic conditions (Cui and Churchill, 2003). Bayesian mixture models have been developed and applied to replicated microarray data in order to infer gene clusters and classify tissues (Medvedovic and Sivaganesan, 2002; Medvedovic et al., 2004; Yeung and Bumgarner, 2005). However, many popular approaches to multivariate pattern discovery, such as hierarchical clustering or co-expression network construction, are based on estimating the statistical correlation between a pair of replicated biomolecule expression profiles, and the widespread bivariate correlation estimators used for that purpose (e.g. Pearson correlation coefficient) employ average expression profiles over replicates. The averaging, while on one hand reduces variance, on the other hand, tends to give rise to more biased correlation estimation. Thus the simple averaging over replicates is not completely satisfactory especially for noisy omics data that do not have good within-replicate correlation.
Here, we propose a new multivariate correlation estimator that exploits all replicates of a data set by assuming they are i.i.d. samples from the multivariate normal distribution. Based on a covariance structure that explicitly models within-replicate and between-replicate correlations of the data, we derive a maximum likelihood (ML)-based correlation estimator by maximizing the likelihood function of the replicated data. We establish a theoretical connection between the two correlation estimators when there is no replicate, and we provide a suite of statistical inference procedures for small sample size and large sample size data. Our work was strongly influenced by the previous works in estimating interclass and intraclass correlation from familial data (Chu and kshirsagar, 1994; Keen and Srivastava, 1991; Konishi, 1985; Konishi et al., 1991; Srivastava, 1984).
| 2 METHODS |
|---|
|
|
|---|
2.1 Pearson bivariate correlation estimator
Suppose the biomolecule abundance levels are simultaneously measured over n independent samples. There are m replicated measures available for each sample. Let xij, yij denote the abundance levels for arbitrary biomolecule X and biomolecule Y in the i th replicate in the j th sample. The application of Pearson correlation coefficient requires averaging biomolecule abundance profiles, i.e.
|
| (1) |
2.2 Multivariate correlation estimator
Instead of averaging, we exploit all the replicated observations by assuming the data are i.i.d. samples from a multivariate normal distribution with a specified correlation matrix and a mean vector, i.e. Zj = (xj1, ...,xjm,yj1,...,yjm)T, j = 1, 2, ...,n, follows a 2m-variate normal distribution N(µ,
), where
is an m x 1 vector, the correlation matrix
is a 2m x 2m matrix:
|
| (2) |
is the parameter of interest, and the intra-molecule correlation
x or
y are nuisance parameters. The
x and
y indicate the quality of replicates that high quality replicates tend to have high value, and vice versa. We employ three parameters:
,
x and
y to model the correlation structure of replicated data.
Assuming a multivariate normal model, the maximum likelihood estimate (MLE) of
can be derived as follows (see Appendix for more details):
|
| (3) |
|
| (4) |
|
| (5) |
To derive the MLE of
, it would be preferable to obtain the likelihood explicitly as a function of
. However, this is intractable in practice (see Appendix for a detailed discussion). Our approach is to use the average of the elements of
estimated via Equation (5):
|
| (6) |
It follows that the
remains invariant for data with unequal numbers of replicates (m) since it is not a function of m.
2.3 Theoretical connection between the two estimators
The sample Pearson correlation coefficient [Equation (1)] can be also written into the following form:
|
| (7) |
is reduced to a 2 by 2 matrix with diagonal elements equal to 1 and off-diagonal elements equal to
. It is easy to show from Equation (5) that
|
| (8) |
|
| (9) |
2.4 Small sample inference procedure
For small sample size data, we provide a resampling-based statistical inference procedure. For n replicated bivariate observations Z1 = (X.1,Y.1), Z2 = (X.2,Y.2), ..., Zn = (X.n,Y.n), where . represents row index, and 1,..., n represent column index, we test the following hypothesis:
|
| (10) |
|
|
2.5 Large sample inference procedure
For moderate to large sample data, we provide an LRT procedure for testing the hypothesis that the multivariate correlation
vanishes. Under the multivariate normal distribution assumption stated in Section 2.2, Zj
N(µ,
), and we test the following hypothesis:
|
| (13) |
0 and
1 are (m1 + m2) x (m1 + m2) matrices, where m1 and m2 are number of replicates for biomolecule X and Y (in the case of equal numbers of replicates for X and Y, we have m1 = m2 = m) and
x and
y, with diagonal elements identity and all the other entries being
x and
y, respectively. 0m12 is an m1 x m2 zero matrix and 0m21 is an m2 x m1 zero matrix, i.e. under the null hypothesis, the intermolecule correlation
vanishes.
xy is an m1 x m2 matrix with all entries equal to
. Likewise,
. The likelihood ratio (LR) statistic for testing the two different correlation structures can be derived as follows:
|
| (14) |
. The MLE of the correlation matrix under H0 can be determined as |
| (15) |
|
| (16) |
The LR statistic, denoted by G2, is therefore:
|
| (17) |
2 distributed with 2(m1x m2) degrees of freedom under H0 (Anderson, 1958). | 3 RESULTS |
|---|
|
|
|---|
3.1 Simulation setup
We used mean squared error (MSE) as an objective criterion for comparison. It is defined as
|
| (18) |
is the true correlation coefficient and
Genome-wide data may have different sample sizes (n), different number of replicates (m) and different, often not well-studied, correlation structure (
). We aim to show that our multivariate estimator is consistently superior to Pearson bivariate correlation estimator in almost all the realistic combinations of the above parameters. In particular, we set the parameters to the following values:
- The sample size n: sample size represents the number of independent biological samples of interest. We set n to be 5, 10, 20 and 50, corresponding to small/median/large samples.
- The number of replicates m: most data only have a few replicates due to the cost of the genome-wide experiments. Therefore, we set m to be 3, 4 and 5.
- The within-replicate correlations
x or
y (see Methods Section): it is an indicator of data quality. We set it at three levels: very noisy (L) (0.1–0.3), noisy (M) (0.3–0.5) and clean (H) (0.6–0.8). Evaluation based on the first two levels are of particular interest to us, since genome-wide profiling is typically noisy.
- The between-replicate correlation
(see Methods Section): similarly,
can be chosen as low (0.2), median (0.4) and high (0.6). The high between-replicate correlation (typically 0.6) is of particular interest since it more reliably predict functional similarity (Zhu et al., 2005b).
3.2 Simulation results
In Figures 1 and 2, vertical axis represents the MSE ratio of Pearson estimator over multivariate estimator. Ratios greater than 1 indicate that the multivariate correlation estimator outperforms the traditional bivariate estimator. The horizontal axis represents the different combinations of categorical within-replicate correlations. We compared the two estimators under five within-replicate correlation structures: i.e. (L, L), (L, M), (L, H), (M, M) and (M, H) and three between-replicate correlation cut-offs, 0.2, 0.4 and 0.6. Note that under few combination, simulations could not be preformed because the corresponding correlation matrices were not positive definite (PD).
|
|
In Figure 1, we fixed the sample size at n = 20 to examine the performance of the multivariate estimator with varying number of replicates versus the bivariate estimator by averaging over replicates. In Figure 1a and b, almost all examined MSE ratio are greater than 1 indicating superior performance of the multivariate estimator to the bivariate estimator. In particular, Figure 1a (lower between-replicate correlation cut-off
= 0.4) has a much higher overall MSE ratio. In Figure 2, we fixed the number of replicates at m = 4, which is typical in omics experiments to examine the performance over different number of samples. Similarly, almost all examined MSE ratio are greater than 1. In general, the superior performance of the multivariate estimator to the bivariate estimator is an increasing function of the sample size (n), number of replicates (m), but it is a decreasing function of data quality (
x and
y) and correlation cut-off. For real-world data, we expect that our correlation estimator work even better than Pearson estimator as sample size, number of replicates increase and data quality decrease. The set of comprehensive comparison results are summarized in the Supplementary Table S1. Checking into the results more carefully, we found that the mean bivariate estimates are greatly biased upward and the mean multivariate estimates are slightly biased downward (Figure 3, Supplementary Table S1). This can be translated into a significantly reduced false positive rate when applying the multivariate estimator. We also examined variances of the two estimators, our estimator has smaller variance for noisy data, and larger variance otherwise (Supplementary Table S1). Our multivariate estimator, while a little conservative and sometime having larger variance, is much more accurate than the Pearson bivariate correlation estimator (Figure 3).
|
3.3 Microarray data analysis examples
Many real-world data analysis tasks rely on accurate estimation of correlation matrix. Affymetrix made two spike-in data sets publicly available as benchmark to compare different probe set expression summarization methods, such as RMA (Irizarry et al., 2003) and GCRMA (Wu and Irizarry, 2005) [downloadable from http://affycomp.biostat.jhsph.edu/]. We instead use the two spike-in data sets as benchmark to compare correlation estimation methods: Pearson correlation coefficient using either RMA or GCRMA expression scores (GR and GG in Table 1). Multivariate correlation estimator using probe-level data where perfect match (PM) intensities of the same probe set are treated as replicated measures of the gene expression score (PR and PG in Table 1). For each spike-in data set, both nominal values of gene expression and observed probe-level intensities are available. The correlation matrix estimated by the four methods summarized in Table 1 are then compared to the nominal correlation matrix in terms of MSE. The smaller the MSE is, the closer the estimated correlation matrix to the nominal one.
|
Either array or probe set can be treated as multivariate random variable, correspondingly we calculated MSE of both array correlation matrix and probe set correlation matrix. Note that in the second data set, only probe set correlation matrix is computable due to unequal probe levels within array. In Table 2, we observe that MSE of the multivariate correlation estimators (PR and PG) are uniformly smaller than that of Pearson correlation estimator (GR and GG). It strongly indicates that pattern discovery methods based on multivariate correlation estimator are more capable to discover the true patterns of the data.
We then compared our multivariate correlation estimator with Pearson correlation estimator in the context of hierarchical gene clustering. We analyzed a subset of 205 genes whose expression were profiled using four replicates under 20 physiological/genetic conditions (Ideker et al., 2000). The 205 genes were previously classified into four functional groups (Yeung et al., 2003) that we employed it as the external knowledge for comparing performance of two correlation estimators through hierarchical gene clustering approaches. Similar to Medvedovic et al. (2004), we constructed six data sets of two replicates and four data sets of three replicates. The (average) adjusted RAND index (Hubert and Arabie, 1985) was used to measure the consistency between clustering results and the external knowledge.
|
As shown in Table 3, overall clustering algorithms based on the multivariate correlation estimator are better consistent to the external biological knowledge (higher adjusted RAND index). Note that we are interested in identifying four large clusters instead of many small clusters, therefore, we expect the divisive hierarchical clustering more suitable for this task (Speed, 2003). In Table 3, the (average) adjusted RAND index of divisive hierarchical clustering (represented by Diana) monotonically increases with the number of replicates increases and is consistent best to the external knowledge with four replicates (adjusted RAND index 0.95).
|
We further compared the sensitivity and specificity of the two test procedures over a wide biologically relevant range. For fair comparisons, asymptotic Fisher Z-transform was used as test procedure for Pearson correlation estimator in parallel with the asymptotic LRT test procedure for the multivariate correlation estimator. Sensitivity is calculated as the ratio between the number of gene pairs called significant by the test procedure and the number of genes pairs that does fall into the same one of four previously defined functional categories. Specificity is calculated as the ratio between the number of gene pairs NOT called significant and the number of gene pairs that does NOT fall into the same one of four previously defined functional categories.
We made comparisons using top correlated gene pairs (1–15%) and the selection is based on the empirical results that only top correlated gene pairs are likely to be functionally relevant (Griffith et al., 2005; Lee et al., 2004). In the context of co-expression network, it corresponds to the well-observed phenomenon that gene co-expression network is typically very sparse (Lee et al., 2004; Zhu et al., 2005a). Our LRT procedure performs slightly better than Fisher's Z-transform test procedure in terms of both sensitivity and specificity (Supplementary Table S2). Note that there is much overlap of significant calls between the two test procedures (Supplementary Table S2).
We claim that the better performance of our correlation estimator and test procedure was achieved under an adverse condition of high within-replicate correlation (75% quantile: 0.89, 50% quantile: 0.73, 25% quantile: 0.34, see Supplementary Fig. S1). As demonstrated in simulations, we expect our estimator to significantly outperform Pearson bivariate estimator with noisy data (low within-replicate correlation) and more replicates.
| 4 CONCLUSION |
|---|
|
|
|---|
In this article, we proposed a new multivariate correlation estimator that exploits all replicated data points in genome-scale data instead of averaging over replicates when using bivariate correlation estimator. Our simulation studies showed that the new estimator possesses low MSE, high accuracy and comparable variance to the Pearson correlation estimator. The analysis of a real-world data set further demonstrated the potential application of our method to pattern discovery problems from noisy genome-wide profiling data. In particular, our approach provide a new way of exploiting Affymetrix probe-level data to pattern discovery problems. Using two Affymetrix spike-in data sets, we have shown that our estimated correlation matrix is closer to the nominal correlation matrix than the competing methods. It seems to indicate that pattern discovery methods based on multivariate correlation estimator are better able to discover the true patterns from probe-level data. Since the multivariate correlation estimator was presented in a closed form, its computational cost is moderate compared to Pearson correlation estimator, therefore, pattern discovery algorithms based on the multivariate correlation estimator is very scalable to large omics data set.
| Appendix |
|---|
|
|
|---|
The likelihood function of a 2m-variate normal family is as the following:
|
| (19) |
is the 2m x 2m covariance matrix with the structure specified in Equation (2). To obtain the MLE, equivalently, we wish to minimize:
|
| (20) |
|
| (21) |
|
| (22) |
, we do a slight transformation on the log-likelihood: |
| (23) |
|
| (24) |
|
| (25) |
is to get the likelihood explicitly expressed as a function of the unknown parameters
= (µx, µy,
x,
y,
)T. In order to do that, we can introduce the transformation used by Srivastava (1984) for interclass correlations in familial data.Let em = (1, ... , 1)T, 0m = (0, ... ,0)T be constant m x 1 vectors. Im denotes the m x m identity matrix and 0m is an m x m zero matrix. The canonical transformation of the original random vector Zj, |
| (26) |
|
| (27) |
|
| (28) |
–1 in the log likelihood function can be expressed as:
|
| (29) |
| in terms of the parameters
x,
y and
explicitly:
|
| (30) |
|
| (31) |
x + 1, b = (m– 1)
y + 1 and h = [(m – 1)
x + 1][(m – 1)
y + 1] – m2
2, and SSxj, SSyj denote the sample sum of squares for gene X and Y under the jth condition, respectively:
|
| (32) |
|
| (33) |
= (µx, µy,
x,
y,
)T, we can get the true MLE estimates. However, we cannot get explicit solutions for any of the unknown parameters. The calculations were proved to be intractable in implementation, and we chose to use the estimation specified in Section 2 rather that the MLE solution for
.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
We thank Arcady Mushegian, Norman Pavelka and Frank Emmert-Streib for helpful discussion. We would to thank anonymous reviewers for their helps in improving the quality of this manuscript.
| FOOTNOTES |
|---|
Associate Editor: Golan Yona
Received on January 23, 2007; revised on May 31, 2007; accepted on June 17, 2007
| REFERENCES |
|---|
|
|
|---|
Anderson TW. An introduction to Multivariate Analysis, ( (1958) ) New York: Wiley,.
Chu SK, Kshirsagar AM. Correlation coefficient between two variables when the data set consists of observations on twins. Parisankhyan Samikkha: Int. J. Stat, ( (1994) ) 1, : 1..
Cui XQ, Churchill GA. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol, ( (2003) ) 4, : 201.[CrossRef][Medline].
Efron B, Tibshirani RJ. An Introduction to the Boostrap, ( (1993) ) Chapman & Hall, New York, USA..
Griffith OL, et al. Assessment and integration of publicly available SAGE, cDNA microarray, and oligonucleotide microarray expression data for global coexpression analyses. Genomics, ( (2005) ) 86, : 476–488.[CrossRef][ISI][Medline].
Hardt M, et al. Assessing the effects of diurnal variation on the composition of human parotid saliva: quantitative analysis of native peptides using iTRAQ reagents. Anal. Chem, ( (2005) ) 77, : 4947–4954.[Medline].
Hollander A, Wolfe D. Nonparametric Statistical Methods, ( (1999) ) Wiley-Interscience, Hoboken NJ, USA..
Hubert L, Arabie P. Comparing partitions. J. Classif, ( (1985) ) 2, : 193–218.[CrossRef].
Ideker T, et al. Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. J. Comput. Biol, ( (2000) ) 7, : 805–817.[CrossRef][ISI][Medline].
Irizarry RA, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, ( (2003) ) 4, : 249–264.[Abstract].
Keen KJ, Srivastava MS. The asymptotic variance of the interclass correlation coefficient. Biometrika, ( (1991) ) 78, : 225–228.
Konishi S. Normalizing and variance stabilizing transformations for intraclass correlations. Ann. Inst. Stat. Math, ( (1985) ) 37, : 87–94.[CrossRef].
Konishi S, et al. Inferences on multivariate measures of interclass and intraclass correlations in familial data. J. R. Stat. Soc. B, ( (1991) ) 53, : 649–659..
Lee HK, et al. Coexpression analysis of human genes across many microarray data sets. Genome Res, ( (2004) ) 14, : 1085–1094.
Li C, Wong WH. Model-based analysis of oligonucleotide arrays: expression score computation and outlier detection. Proc. Natl Acad. Sci. USA, ( (2001) ) 98, : 31–36.
Liu X, et al. Bayesian context-specific infinite mixture model for clustering of gene expression profiles across diverse microarray datasets. Bioinformatics, ( (2006) ) 22, : 1737–1744.
Medvedovic M, Sivaganesan S. Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics, ( (2002) ) 18, : 1194–1206.
Medvedovic M, et al. Bayesian mixtures for clustering replicated microarray data. Bioinformatics, ( (2004) ) 20, : 1222–1232.
Nguyen DV, Rocke DM. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, ( (2002a) ) 18, : 39–50.
Nguyen DV, Rocke DM. Multi-class cancer classification via partial least squares using gene expression profiles. Bioinformatics, ( (2002b) ) 18, : 1216–1226.
Pavelka N, et al. A power law global error model for the identification of differentially expressed genes in microarray data. BMC Bioinformatics, ( (2004) ) 5, : 203.[CrossRef][Medline].
Ross PL, et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell Proteomics, ( (2004) ) 3, : 1154–1169.
Sartor MA, et al. Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics, ( (2006) ) 7, : 538.[CrossRef][Medline].
Speed T, ed. Statistical analysis of gene expression microarray data. ( (2003) ) CRC Press. Chapman & Hall Boca Raton FL USA..
Srivastava MA. Estimation of interclass correlations in familial data. Biometrika, ( (1984) ) 71, : 177–185.
Wu Z, Irizarry RA. Stochastic models inspired by hybridization theory for short oligonucleotide arrays. J. Comput. Biol, ( (2005) ) 12, : 882–893.[CrossRef][ISI][Medline].
Yeung KY, Bumgarner R. Multi-class classification of microarray data with repeated measurements: application to cancer. Genome Biol, ( (2005) ) 4, : R83..
Yeung KY, et al. Model-based clustering and data transformations for gene expression data. Bioinformatics, ( (2001) ) 17, : 977–987.
Yeung KY, Medvedovic M, Bumgarner R. Clustering gene expression data with repeated measurements. Genome Biol, ( (2003) ) 4, : R34.[CrossRef][Medline].
Zhou L, Rocke D. An expression index for Affymetrix GeneChips based on the generalized algorithm. Bioinformatics, ( (2005) ) 21, : 3983–3989.
Zhu D, et al. High throughput screening of co-expressed gene pairs with controlled False Discovery Rate (FDR) and Minimum Acceptable Strength (MAS). J. Comput. Biol, ( (2005a) ) 12, : 1027–1043..
Zhu D, et al. Network constrined clustering for gene microarray data. Bioinformatics, ( (2005b) ) 21, : 4014–4021.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

10 then





