Bioinformatics Advance Access originally published online on May 5, 2007
Bioinformatics 2007 23(12):1537-1544; doi:10.1093/bioinformatics/btm129
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
A Markov random field model for network-based analysis of genomic data
1Genomics and Computational Biology Graduate Group and 2Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: A central problem in genomic research is the identification of genes and pathways involved in diseases and other biological processes. The genes identified or the univariate test statistics are often linked to known biological pathways through gene set enrichment analysis in order to identify the pathways involved. However, most of the procedures for identifying differentially expressed (DE) genes do not utilize the known pathway information in the phase of identifying such genes. In this article, we develop a Markov random field (MRF)-based method for identifying genes and subnetworks that are related to diseases. Such a procedure models the dependency of the DE patterns of genes on the networks using a local discrete MRF model.
Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity and lower false discovery rates than the commonly used procedures that do not use the pathway structure information. Applications to two breast cancer microarray gene expression datasets identified several subnetworks on several of the KEGG transcriptional pathways that are related to breast cancer recurrence or survival due to breast cancer.
Conclusions: The proposed MRF-based model efficiently utilizes the known pathway structures in identifying the DE genes and the subnetworks that might be related to phenotype. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.
Contact: hongzhe{at}mail.med.upenn.edu or hli{at}cceb.upenn.edu
| 1 INTRODUCTION |
|---|
|
|
|---|
Identification of genes and pathways involved in diseases and other biological processes is one of the important problems in genomic research. Microarray technology makes it possible to measure the expression levels of almost all human genes and therefore facilitate the identification of genes and pathways that are related to disease initiation and development. In a typical experiment, several phenotypes are compared, with a certain number of biological replicates for each phenotype. The goal is to identify the differentially expressed (DE) genes among the different phenotype groups.
There are many novel statistical methods that have been developed for identifying the DE genes. A general approach is to conduct a hypothesis test at each gene and then correct for multiple testing. Most of the statistics used are t statistics and differ primarily in the estimation of the variance (Dudoit et al., 2002; Tusher et al., 2001). Other methods include the empirical Bayes methods that can effectively pool data from different genes (Efron et al., 2001; Kendziorski et al., 2003; Lonnstedt and Speed, 2002; Newton et al., 2003). These DE genes identified are often linked to a predefined list of groups of genes such as known pathways in order to identify which groups include more DE genes than expected by chance using a hypergeometric distribution. Alternatively, the gene set enrichment analysis (GSEA) (Subramanian et al., 2005; Tian et al., 2005) can be used. For such a GSEA, one starts with a predefined list of groups of genes and assigns every such group a score that is essentially the average of the univariate test statistics of its member genes. The groups with high scores are more likely to be DE and P-values can be obtained by permutation methods.
One limitation of the commonly used methods of identifying the DE genes or the GSEA is that network structures are not utilized in the analysis. However, the interaction network is a more precise way to represent information than lists of genes or pathways, as it describes which genes are closely connected within a given pathway. Hence, it has the potential to detect more subtle changes of gene expressions, such as local disturbances within known pathways. Rahnenführer et al. (2004) demonstrated that the sensitivity of detecting relevant pathways can be improved by integrating information about pathway topology. In Sivachenko et al. (2005), a network topology extracted from literature was used jointly with microarray data to find significantly affected pathway regulators. Nacu et al. (2006) proposed an interesting permutation-based test for identifying subnetworks from a known network of genes that are related to phenotypes. The method is essentially based on a spatial scan statistic treating genes collected on the networks as neighbors. However, their method does not explicitly utilize the dependency of gene differential expression patterns on the network. Rapaport et al. (2007) proposed to first smooth the gene expression data on the network based on the spectral graph theory and then to use the smoothed data for classification. The method explicitly assumes that the true gene expression levels should be similar for genes that are neighbors on the networks. However, this assumption may be questionable due to both activating and inhibiting effects of gene regulations.
Markov random field (MRF) models have been widely used in image analysis in order to account for the local dependency of the observed pixel intensities (Besag, 1986) and have also been applied for functional prediction of proteins in order to account for the local dependency of protein functions in the protein–protein interaction networks (Deng et al., 2002, 2004; Letovsky and Kasif, 2003). The MRF model has also been applied for discovering molecular pathways from protein interaction and gene expression data (Segal et al., 2003). In this article, we propose to develop a MRF-based method for identifying the DEs and the subnetworks that are related to the phenotypes, where the MRF model is used to capture the dependency of the differential expression patterns for genes on the networks. Our method combines the two-group empirical Bayes method of Kendziorski et al. (2003) and Newton et al. (2003) with a MRF model to model the dependency of the DE patterns. In our model and those of Kendziorski et al. (2003) and Newton et al. (2003), each gene is either DE or equally expressed. Those genes which are Equally Expression (EE) present data according to some background Gamma distribution, and those which are DE present data according to a different distribution. The specific forms of these distributions arise by another layer of mixing over the latent mean expression level for each gene and the latent means follow another Gamma distribution. Such empirical Bayes approaches allow a level of information sharing amongst genes.
The rest of the article is organized as follows. We first introduce the model assumptions and a MRF model. We then provide an iterative conditional mode algorithm (ICM) of Besag (1986) for parameter estimation and for identifying the DE genes. We present simulation studies to demonstrate the methods and to compare the results with other commonly used methods for DE gene identification. Finally, we present results of applying the proposed method to two breast cancer gene expression datasets in order to identify the genes and the subnetworks that are related to breast cancer recurrence or death due to breast cancer. We conclude the article with a brief discussion of the results and methods.
| 2 STATISTICAL MODELS AND METHODS |
|---|
|
|
|---|
2.1 Notation and assumptions
Given microarray gene expression profiling data under two conditions, we want to determine which genes are differentially expressed. Each gene can have two states, labeled 0 and 1, representing EE and DE, respectively. An arbitrary state assignment of gene set S will be denoted by
In order to specify the joint distribution of Y, we make the following assumptions:
Assumption 1.
Given any particular realization x, the random variablesare conditionally independent and each Yi has the same unknown conditional density function
, dependent only on xi. The conditional density of the observed gene expression y, given, x, is simply,
Assumption 2.
The true stateis a realization of a locally dependent discrete MRF with a specified distribution
, which is defined in the next section.
2.2 Gamma–Gamma model for gene expression data
In this section, we first briefly review the Gamma–Gamma model for gene expression data introduced in Newton et al. (2003) and Kendziorski et al. (2003). We define
, which characterizes fluctuations in repeated measurements under the same condition from a gene i having latent mean expression level µi, and
, which describes fluctuations in these means among genes. We assume that the observation yi is a sample from a gamma distribution having shape parameter
> 0 and a mean value µi; thus, with scale parameter
. The corresponding density function can be written as
|
|
, the quantity
0 and scale parameter v. Let |
|
|
|
|
|
|
|
|
|
Therefore, given the differential expression state xi, we have
|
|
|
|
|
| (1) |
2.3 Discrete local MRF model for joint differential expression states
The gene differential expression states
are not independent. For example, if a gene is DE, it is more likely that its upstream regulators are also DE and that these regulators in turn affect their downstream-regulated genes. In order to explicitly account for such dependency of DE patterns over genes on the networks, we propose to use the known biological network information compiled in the form of pathways. Examples of such pathways include the KEGG pathway (Kanehisa and Goto, 2002) and BioCyc pathways (http://biocyc.com/). In our model, the network is expressed as an undirected graph with the nodes for genes and edges for connections on the network. Consider the p genes on the network, let
be the vector of unobserved differential expression states for the p genes. We propose to model the dependency of
using a MRF with parameter
. Specifically, we assume
|
|
0 and
1 are arbitrary parameters and we require that ß > 0, which discourages neighboring genes to have different DE states. By considering any two realizations which differ only at gene i, it follows that the conditional probability of state k occurring for gene i, given the states of all other genes is
|
| (2) |
, however, is computationally intractable. In general, it is the constant of proportionality in |
| (3) |
In order to account for different numbers of neighbors for different genes on the network, we propose to modify the conditional probability (2) as
|
| (4) |
2.4 Parameter estimation using ICM and identification of subnetworks
When inferring the true differential expression state x* for the p genes, the parameter estimation must be carried out simultaneously. We propose the following algorithm based on the ICM algorithm of Besag (1986) to estimate the parameter
in the Gamma–Gamma model for gene expression data and the parameter
in the MRF model. The algorithm involves the following iterative steps:
- Obtain an initial estimate
of the true state x*, using a simple two sample t-test.
- Estimate
by the value
which maximizes the likelihood
[see Equation (1)].
- Estimate
by the value
which maximizes the conditional likelihood
[see Equation (3)] based on current
.
- Carry out a single cycle of ICM based on the current
and
, to obtain a new
. Specifically, for i = 1 to p, update xi which maximizes
subject to xi = 1 or xi = 0.
- Go to step 2 for a fixed number of cycles or until approximate convergence of
.
The converged
are then taken to be the estimate of the true DE states. These estimates can then be mapped back to the network to identify the subnetworks, which are defined as those connected genes that show DE between the two experimental conditions.
| 3 SIMULATION STUDIES |
|---|
|
|
|---|
We first present simulation results to demonstrate our proposed methods. To simulate the data, we first obtained the network structure of 33 human regulatory pathways from the KEGG database (December 2006 version). We are only interested in gene–gene regulatory relations and any non-gene–gene interactions, e.g. compound-gene relations, compound–compound relations, were excluded from our analysis. The remaining gene–gene regulatory data are represented as an undirected graph where each node is a gene and two nodes are connected by an edge if there is a regulatory relation between them. Loops (nodes connected to themselves) were eliminated. This results in a graph with 1668 nodes and 8011 edges.
To simulate X, the gene expression states, we initialized the genes in the K pathways to be DE and the rest of genes to be EE, which gives us the initial
. Then we performed sampling five times based on
, according to Equation (3), with
. We chose
, so that we have different percentages of genes in DE states. Next, given X, we simulated the gene expression level Y according to GG model [Equation (1)] for 1668 genes in two conditions, having three replicates in each condition. We took model parameters similar to those in Newton et al. (
). Simulations were repeated 100 times to assess the sensitivity, specificity, and false discovery rates (FDRs) of the proposed MRF GG model (MRFGG). We used the conditional probability (2) and the conditional likelihood (3) in our analysis. As a comparison, we also performed analysis on the simulated datasets using the standard two sample t-test which does not consider any prior information at all, and the empirical Bayesian GG model of Kendziorski et al. (2003).
The results over 100 replications are presented in Table 1, where the sensitivity is calculated as the average over the 100 replications of the fraction of DE genes correctly identified by the method; specificity is the average of the EE genes correctly identified; and the FDR is the average of the ratio of the number of false positives to the number of the genes identified as DE. For t-tests, a cut-value of 0.05 was used for declaring a gene to be the DE. We observed that overall specificity is high for all three procedures and the MRFGG model resulted in higher sensitivity than the GG model while the FDRs are similar. As expected, using a P-value of 0.05 can result in substantially higher FDRs. On the other hand, if the FDR controlling procedure of Benjamini and Hochberg (1995) was used, the two sample t-test resulted in very low sensitivity. The gain in sensitivity over the GG model is greater when there are more DE genes. These results demonstrated that by incorporating the network structure information, we can indeed gain sensitivity in identifying the DE genes.
|
| 4 APPLICATION TO REAL DATASETS |
|---|
|
|
|---|
We present results from application of the proposed methods to two breast cancer microarray gene expression studies in order to identify the subnetworks that are related to breast cancer metastasis or survival from breast cancer. We used the modified conditional probability (4) and the corresponding conditional likelihood function in the following analyses.
4.1 Application to the breast cancer gene expression dataset of Wang et al.
Wang et al. (2005) reported a large Affymetrix-based gene expression profiling for 286 patients with lymph-node-negative primary breast cancer. These patients were treated between 1980 and 1995 with age at surgery ranging 26–86 years and a median age at surgery of 52 years. No patient received any adjuvant therapy. During the follow-up period, 179 of these patients were relapse-free at 5 years, and 107 of them developed distant metastasis. Gene expression profiling using Affymetrix HG-133A was performed on all these patients, including 17 819 transcripts that were present in two or more samples. We merge the gene expression data with the 33 KEGG regulatory pathways and identified 1533 genes on the U133A array that can be found in the 1668-node KEGG network with 8011 edges. Our goal is to identify which genes and which subnetworks of the KEGG network of 33 pathways are related to breast cancer metastasis.
Two-sample t-tests identified only eight DE genes for FDR of 0.05 using the Benjamini and Hochberg's procedure. As a comparison, our proposed procedure identified 72 DE genes. The parameter estimates were
and ß = 1.24 in the MRF model. Figure 1 shows 17 of these genes that are mapped to the KEGG pathways, where the largest connected subnetwork includes six genes on the Cytokine–cytokine receptor interaction pathway. This subnetwork, centered around interleukin 8 receptor Beta (IL8RB), is down-regulated in cancer with relapse. In addition, the chemokine receptor (CCR6) and chemokine (C-C motif) ligand 20 (CCL20) are also down-regulated in cancers with relapse, indicating that a chemokine pathway is down-regulated in cancers with relapse. In addition, CCL20/CCR6 involvement in the neoplastic progression and metastatic spread was reported in several tumor types (Rubie et al., 2006), including breast cancer metastasis (Muller et al., 2001).
|
Another subnetwork includes four genes on the Wnt signaling pathway, including WNT4, WNT11 and secreted frizzled-related protein 1 (SFRP1), which were down-regulated in cancer with metastasis. SFRPs are secreted Wnt antagonists that directly interact with the Wnt ligand to inhibit signaling and members of the SFRP class bind directly to Wnts, thereby altering their ability to bind to the Wnt receptor complex. In particular, the SFRP1 gene is found at chromosome 8p21, a site of frequent loss of heterozygosity in human tumors and is down-regulated in cervical carcinoma, breast carcinoma and ovary and kidney carcinomas (Shulewitz et al., 2006). Wnt5b partially inhibits the canonical Wnt/beta-catenin signaling pathway. These findings agree with current knowledge of the involvement of the Wnt signaling pathway in breast cancer progression (Barker and Clevers, 2006).
We also found that the fibroblast growth factor 3 (FGF-3) and fibroblast growth factor 14 (FGF-14) are up-regulated in breast cancers with metastasis, while the fibroblast growth factor receptor-4 (FGFR4) is down-regulated. These three genes are connected on the MAPK signaling pathway. The four closely related human FGFRs and their more than 20 known ligands control a multitude of cellular processes, including cell growth, differentiation and migration, and it has been shown that the FGF/FGFR system plays a critical role in cancer development due to its angiogenic potential or direct enhancement of tumor growth (Burke et al., 1998).
Finally, recent study by Souaze et al. (2006) supports the contribution of neurotensin receptor (NTSR) in human breast cancer progression and pointed out the utility to develop therapeutic molecules targeting the neurotensin or NT1 receptor signaling cascade. We found that NTS and NTSR in the neuroactive ligand–receptor interaction pathway are linked and are both up-regulated in breast cancer with metastasis.
4.2 Application to the breast cancer gene expression dataset of Miller et al.
Miller et al. (2005) reported a gene expression profiling study of 251 primary breast cancer tissues resected in Uppsala County, Sweden from January 1, 1987 to December 31, 1989, using Affymetrix Chip HG-133A and HG-133B (GEO Accession No. GSE3494
[NCBI GEO]
). The authors identified an expression signature for p53 which can be used for predicting the mutation status, transcriptional effects, and patient survival. Among these patients, 236 of them had follow-up information in terms of time and event of disease-specific survival. Different from the previous dataset, these patients included both lymph-negative and lymph-positive patients.
In single gene analysis using t-tests, we obtained only four genes that are DE for FDR = 0.05 using the Benjamini and Hochberg's FDR procedure. The GG methods of Kendziorski et al. (2003) identified 82 genes and our MRFGG method identified 103 genes. The parameter estimates were
and ß = 0.55 in the MRF model. Figure 2 shows several of the connected subnetworks that were identified by the MRFGG method. Similar to the previous example, we found the genes in the WNT pathway (WNT4, SFRP4 and FZD7), genes related to the chemokine pathway (CCR7 and CCL19) and neurotensin and its receptor (NTS and NTSR2) are related to survival from breast cancer. We also found that Caveolin-1, Caveolin-2 (CAV1 and CAV2) are down-regulated in cancers in patients who died of breast cancer. A recent study by Sagara et al. (2004) indicated that a reduced CAV1 mRNA level was significantly associated with increasing tumor size and negative estrogen receptor status. There was also a significant association between low CAV2 mRNA level and negative progesterone receptor status. Sagara et al. (2004) further indicated that CAV1 suppression correlated closely with that of CAV2 in breast cancer, that CAV1 level was inversely correlated with tumor size and that CAV1 and CAV2 levels were correlated with hormonal receptor status. Therefore, CAV1 and CAV2 play an important role in tumor progression in breast cancer patients.
|
The largest subnetwork identified includes six genes on the GAP Junction pathway, of which five genes (CDC25B, CDC2, TUBA1, PRKACB and RAP1B) are up-regulated in cancer samples from individuals who died of cancer. Interestingly, the gap junction membrane channel protein alpha 1 (GJA1) is, however, down-regulated. GJA1 has been reported to suppress cell proliferation (Yu et al., 2006) and therefore its down-regulation can lead to more cancer cell proliferation and increase the chance of death from breast cancer. In addition, we also observed over-expression of the tight junction proteins claudin-3 (CLDN3) and claudin-8 (CLDN8). Claudin-3 and claudin-4 are frequently over-expressed in several neoplasias, including ovarian, breast, pancreatic and prostate cancers (Hewitt et al., 2006; Morin, 2005).
We also observed over-expression of genes related to Cyclin B1 (CCNB1) and B2 (CCNB2), together with over-expression of STRATIFIN (SFN) in cancer samples from individuals who died of cancer. These three genes are on the cell cycle pathway. Cyclins are a family of regulatory proteins that play a key role in controlling the cell cycle. Abnormalities of cell cycle regulators, including cyclins and cyclin-dependent kinases, have been reported in various malignant tumors. Zhao et al. (2006) observed significantly greater cyclin B1 expression in invasive cervical cancer than in normal cervical tissue and indicated that aberrant expression of cyclin B1 might play an important role in cervical carcinogenesis. In addition, CCNB1 expression was highly correlated with the labeling index for antigen identified by mAb ki067 (Ki067, associated with increased tumor cell proliferation), which suggests a key role for CCNB1 in the regulation of neuroendocrine tumor cell proliferation (Igarashi et al., 2004; Lahad et al., 2005).
Other genes related to breast cancer survival include genes involving ECM-receptor interaction and cell communication pathway (COL2A1, SDC2, TNN, LAMA2, LAMA3 and SV2B) and genes in the insulin signaling pathway (GYS2, PPP1CAACACB, SREBF1 and PFKP).
| 5 CONCLUSION AND DISCUSSION |
|---|
|
|
|---|
We have proposed a MFR-based procedure that uses information of interaction networks in identification of DE genes. The proposed method utilizes the structure information of the interaction networks in order to capture the dependency of differential expression patterns for genes on the network. By doing so, we can expect to obtain results that are not found by single–gene analysis. Simulation studies and application to several real microarray gene expression datasets demonstrated that our methods are more sensitive in identifying the DE genes than some of the commonly used methods while maintaining low FDR. Results from analysis of two breast cancer microarray gene expression datasets identified several sub-networks that are related to breast metastasis or death from breast cancer. Some of the subnetworks were reported in the literature.
We make several assumptions for the proposed methods. First, our proposed methods depend on the reliability of the structure of the interaction networks. We used KEGG interaction networks made of 33 regulatory pathways in our analysis of the breast cancer gene expression data. The edges of the networks include both protein–protein and DNA–protein interactions. Our methods treat all interactions equally, regardless of type and direction. However, if different types of interactions can be clearly defined, we can modify our method to allow for different dependency parameters for different interaction types. Important future research is to refine network structures and to extend our MFR models to more complex and refined network structures. Second, we used the Gamma–Gamma model of Kendziorski et al. (2003) and Newton et al. (2003) for modeling the gene expression data. Alternatively, one can assume a log-normal–normal model for the gene expression data. However, as shown in Kendziorski et al. (2003), the Gamma–Gamma model is quite robust to model misspecification. Third, in our model formulation, for each gene, we only consider its immediate neighbors on the network as its neighbors (i.e. first degree neighbors). However, if the differential expression patterns are dependent in neighbors centered at this gene with a radius r, we may want to include as its neighbors all the genes in this ball. This can potentially increase the sensitivity of identifying more DE genes.
In this article, we have focused on the problem of identifying the differentially expressed genes between two experimental conditions. The MRF methods can, however, be extended in several ways. First, it can be easily extended for identifying genes that show DEs among multiple groups using replicated gene expression profiles following the parametric empirical Bayes setup of Kendziorski et al. (2003). Second, the methods can also be extended to deal with other phenotypes such as continuous or censored survival phenotypes by considering whether a gene is related to the phenotype as a latent state and using the MFR for modeling such latent states. Third, the methods can also be extended to microarray time course gene expression data in order to identify subnetworks that change their expression states during a biological time course such as cancer initiation and progression. We are currently working on these extensions. Finally, important future research will include how to represent and assess the uncertainty of the inference of the true DE states x*.
In summary, we have proposed an MRF model for identifying DE genes between two experimental conditions in order to utilize the network structure information. As more and more networks become available, we expect more applications of such methods for identifying genes and pathways that are related to various phenotypes.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
This research was supported by NIH grant ES009911 and a grant from the Pennsylvania Department of Health. We thank Mr. Edmund Weisberg, MS at Penn CCEB for editorial assistance and two referees for helpful comments.
Conflict of interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Olga Troyanskaya
Received on February 1, 2007; revised on March 26, 2007; accepted on March 27, 2007
| REFERENCES |
|---|
|
|
|---|
Barker N, Clevers H. Mining the Wnt pathway for cancer therapeutics. Nat. Rev. Drug Discov, ( (2006) ) 5, : 997–1014.[CrossRef][ISI][Medline].
Benjamini Y, Horchberg Y. Controling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, ( (1995) ) 57, : 289–300..
Besag J. On the statistical analysis of dirty pictures. J. R. Stat. Soc. B, ( (1986) ) 48, : 259–302..
Burke D, et al. Fibroblast growth factor receptors: lessons from the genes. Trends Biochem. Sci, ( (1998) ) 23, : 59–62.[CrossRef][ISI][Medline].
Deng MH, et al. Integrated probabilistic model for functional prediction of proteins. J. Comput. Biol, ( (2004) ) 11, : 463–476.[CrossRef][ISI][Medline].
Deng MH, et al. Prediction of protein function using protein-protein interaction data. ( (2002) ) The First IEEE Computer Society Bioinformatics Conference, CSB2002. 117–126..
Dudoit S, et al. Statistucal methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sin, ( (2002) ) 12, : 111–139..
Efron B, et al. Empirical Bayes Analysis of Microarray Experiment. J. Am. Stat. Assoc, ( (2001) ) 96, : 1151–1160.[CrossRef][ISI].
Hewitt KJ, et al. The claudin gene family: expression in normal and neoplastic tissues. BMC Cancer, ( (2006) ) 6, : 186.[CrossRef][Medline].
Igarashi T, et al. Divergent cyclin B1 expression and Rb/p16/cyclin D1 pathway aberrations among pulmonary neuroendocrine tumors. Mod. Pathol, ( (2004) ) 17, : 1259–1267.[CrossRef][ISI].
Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res, ( (2002) ) 28, : 27–30..
Kendziorski CM, et al. On paramertic empirical Bayes methods for comparing multiple groups using replicated gene expressionm profiles. Stat. Med, ( (2003) ) 22, : 3899–3914.[CrossRef][ISI][Medline].
Lahad JP, et al. Stem cell ness: a "magic marker" for cancer. J. Clin. Invest, ( (2005) ) 115, : 1463–1467.[CrossRef][ISI][Medline].
Letovsky S, Kasif S. Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics, ( (2003) ) 19, (Suppl. 1): i197–i204.[Abstract].
Lönnstedt I, Speed T. Replicated microarray data. Stat. Sin, ( (2002) ) 12, : 31–46..
Miller LD, et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc. Natl Acad. Sci, ( (2005) ) 102, : 13550–13555.
Morin PJ. Claudin proteins in human cancer: promising new targets for diagnosis and therapy. Cancer Res, ( (2005) ) 65, : 9603–9606.
Muller A, et al. Involvement of chemokine receptors in breast cancer metastasis. Nature, ( (2001) ) 410, : 50–56.[CrossRef][Medline].
Nacu S, et al. Gene expression network analysis, and applications to immunity. In: Technical report., ( (2006) ) Department of Statistics, Stanford Univerisity..
Newton MA, et al. On differntial variability of expression ratios: improving statistical inference abou gene expression changes from micorarray data. J. Comput. Biol, ( (2001) ) 8, : 37–52.[CrossRef][ISI][Medline].
Rahnenführer J, et al. Calculating the statistical significance of changes in pathway activity from gene expression data. Stat. Appl. Genet. Mol. Biol, ( (2004) ) 3, . Article 16..
Rapaport F, et al. Classification of microarray data using gene networks. BMC Bioinformatics, ( (2007) ) Feb 1; 8, : 35.[CrossRef][Medline].
Rubie C, et al. Chemokine receptor CCR6 expression in colorectal liver metastasis. J. Clin. Oncol, ( (2006) ) 24, : 5173–5174.
Sagara Y, et al. Clinical significance of Caveolin-1, Caveolin-2 and HER2/neu mRNA expression in human breast cancer. Br. J. Cancer, ( (2004) ) 91, : 959–965.[ISI][Medline].
Segal E, et al. Discovering Molecular Pathways from Protein Interaction and Gene Expression Data. Bioinformatics, ( (2003) ) 19, (Suppl. 1): 264–272.[CrossRef].
Shulewitz M, et al. Repressor roles for TCF-4 and Sfrp1 in Wnt signaling in breast cancer. Oncogene, ( (2006) ) 25, : 4361–9.[CrossRef][ISI][Medline].
Sivachenko A, et al. Identifying Local Gene Expression Patterns in Biomolecular Networks. ( (2005) ) Proceedings of 2005 IEEE Computational Systems Bioinformatics Conference: Stanford, California..
Souaze F, et al. Expression of neurotensin and NT1 receptor in human breast cancer: a potential role in tumor progression. Cancer Res, ( (2006) ) 66, : 6243–9.
Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA, ( (2005) ) 102, : 15545–50.
Tian L, et al. Discovering statistically significant pathways in expression profiling studies. Proc. Natl Acad. Sci, ( (2005) ) 103, : 13544–13549..
Tusher V, et al. Significance analyusis of miocrarrays applied to ionizing radiation response. Proc. Natl Acad. Sci, ( (2001) ) 98, : 5116–5121.
Wang Y, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet, ( (2005) ) 365, : 671–9.[ISI][Medline].
Yu K, et al. A modular analysis of breast cancer reveals a novel low-grade molecular signature in estrogen receptor positive tumors. Clin. Cancer Res, ( (2006) ) 12, : 3288–3296.
Zhao M, et al. Expression profiling of cyclin B1 and D1 in cervical carcinoma. Exp. Oncol, ( (2006) ) 28, : 44–48.[Medline].
This article has been cited by other articles:
![]() |
C. Li and H. Li Network-constrained regularization and variable selection for analysis of genomic data Bioinformatics, May 1, 2008; 24(9): 1175 - 1182. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Sanguinetti, J. Noirel, and P. C. Wright MMG: a probabilistic tool to identify submodules of metabolic pathways Bioinformatics, April 15, 2008; 24(8): 1078 - 1084. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Noirel, S. Y. Ow, G. Sanguinetti, A. Jaramillo, and P. C. Wright Automated extraction of meaningful pathways from quantitative proteomics data Brief Funct Genomic Proteomic, March 7, 2008; (2008) eln011v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Wei and W. Pan Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model Bioinformatics, February 1, 2008; 24(3): 404 - 411. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||





