Bioinformatics Advance Access originally published online on March 29, 2005
Bioinformatics 2005 21(11):2730-2738; doi:10.1093/bioinformatics/bti398
Correlation between gene expression profiles and protein–protein interactions within and across genomes
Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: Function annotation of an unclassified protein on the basis of its interaction partners is well documented in the literature. Reliable predictions of interactions from other data sources such as gene expression measurements would provide a useful route to function annotation. We investigate the global relationship of protein–protein interactions with gene expression. This relationship is studied in four evolutionarily diverse species, for which substantial information regarding their interactions and expression is available: human, mouse, yeast and Escherichia coli.
Results: In E.coli the expression of interacting pairs is highly correlated in comparison to random pairs, while in the other three species, the correlation of expression of interacting pairs is only slightly stronger than that of random pairs. To strengthen the correlation, we developed a protocol to integrate ortholog information into the interaction and expression datasets. In all four genomes, the likelihood of predicting protein interactions from highly correlated expression data is increased using our protocol. In yeast, for example, the likelihood of predicting a true interaction, when the correlation is >0.9, increases from 1.4 to 9.4. The improvement demonstrates that protein interactions are reflected in gene expression and the correlation between the two is strengthened by evolution information. The results establish that co-expression of interacting protein pairs is more conserved than that of random ones.
Availability: Complete lists of metagenes across the genomes and the microarray and protein interaction datasets used in this study are available on our webpage: http://proteomics.bioengr.uic.edu/inter_expr/index.htm
Contact: huilu{at}uic.edu
| INTRODUCTION |
|---|
Protein function annotation is one of the most challenging problems in the post-genomic era. In this context, the search for decisive bioinformatics methods for assigning protein functions is of prime consequence. Various approaches are available for assigning putative functions to unannotated proteins including sequence similarity methods, clustering patterns of co-regulated genes (Zhang, 1999; Harrington et al., 2000), phylogenetic profiles (Pellegrini et al., 1999), protein–protein interactions (Uetz et al., 2000) and protein complexes (Ito et al., 2001; Schwikowski et al., 2000). Many approaches combine and use more than one of the above properties.
An important part of protein function annotation is identifying and characterizing protein–protein interactions. The function of an unclassified protein can be suggested by that of its interaction partners. This approach is becoming more reliable with the accumulation of interaction data and improvement in their quality. Vazquez et al. (2003) proposed assignment of functions to proteins on the basis of their network of physical interactions. Letovsky and Kasif (2003) developed a method based on a probabilistic analysis of graph neighborhoods in a protein–protein interaction network to predict functions. Tornow and Mewes (2003) used protein interactions with gene expression to discover functional modules. The correlation between protein–protein interactions and other types of data, including protein function and subcellular location, was reviewed recently (Chen and Xu, 2003).
In the current work, we investigate the global relationship of protein interactions with gene expression within and across evolutionarily divergent organisms. Interacting proteins are more likely to be involved in similar biological functions and processes and thus they are more likely to be co-expressed. Earlier, Grigoriev (2001) analyzed physical interactions in yeast and observed that proteins encoded by co-expressed genes interact with each other more frequently than with random pairs. Ge et al. (2001) showed that interacting protein pairs are more likely to be in the same expression cluster than random pairs for yeast. On a genomic scale, Jansen et al. (2002) attempted to relate the absolute mRNA expression levels and the expression profiles in yeast to protein–protein interactions. They found that apart from a few big known protein complexes, such as ribosome and proteasome that have clearly defined interactions between their subunits, the relationship between the two is weak.
A few hypotheses may explain the observation of a genome-scale weak correlation in yeast: first, gene expression is weakly correlated with protein interactions only in yeast, in which case the relationship should be studied in other species; second, the expression data is too noisy to capture the correlation, and thus efficient ways to reduce the noise are required. The third possibility, of course, is that the two might be very weakly correlated in all species and their relationship is no easier to detect than that among random pairs.
To comprehensively investigate the relationship on a genomic and evolutionary scale, we study four genomes: Escherichia coli, Saccharomyces cerevisiae (yeast), Mus musculus (mouse) and Homo sapiens (human). The choice of these four species was based partly on the fact that the data for these species are largely available, which facilitates inference of statistically significant and sound conclusions about these species. More importantly, these species broadly cover the diverse life-forms: from a prokaryote with a few thousand genes (E.coli) to the highly complex eukaryote (H.sapiens). We examine, across this range of genomes, whether the gene expression correlations between interacting pairs are significantly different from those between random pairs. A consensual conclusion will be valuable in the general context of phylogenesis.
Various components and features are preserved by natural selection to the extent of their contribution to conserved cellular functions. In this context, previous studies have claimed that the evolutionary rate of a protein correlates with the essential function of the protein and its level of interaction with other proteins (Hurst and Smith, 1999; Hirsh and Fraser, 2001; Hirsh and Fraser, 2003; Jordan et al., 2002; Fraser et al., 2004; Pal et al., 2003). It has been shown that including multiple species in the analysis of gene expression data reduces the noise and improves function prediction (Stuart et al., 2003). Evolutionary conservation combined with co-expression is a better criterion to identify the genes that are functionally important than either information in isolation (Stuart et al., 2003).
On the basis of the above observations, we argued that the co-expression of two proteins that interact is more conserved than random and that this conservation can be used to strengthen the correlation between interactome and transcriptome data. The goal here is to combine information from heterogeneous datasets and thereby decrease the noise level of genome-wide predictions. The stronger relationship may also help to integrate other types of functional genomics and proteomics data.
We first present the methodology used to investigate the relationship between protein–protein interactions and gene expression. We then describe our approach of integrating conservation information to strengthen the correlation and present our results. Finally, we conclude with an interpretation of the results and discuss future work that will further complement and extend these findings.
| MATERIALS AND METHODS |
|---|
Databases
Various sets of pre-processed gene expression data were obtained from Stanford Microarray Database (Golub et al., 2003; http://genome-www5.stanford.edu) for the four genomes (Table 1). These sets, though not very large in profile length, provide a good sampling of different cellular states and biological conditions: cell-cycle in different states, stress response to different perturbations, sporulation, signaling, stimulation, heat-shock response, UV radiation exposure and enzyme inhibition/promotion. The data was normalized against the different conditions used in these experiments by setting the mean of the profile in each experiment to 0 in the z-score fashion, as used in other published works (Cheadle et al., 2003).
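The normalization step can be illustrated with a minimal sketch. It assumes a genes-by-experiments matrix of expression values and applies a z-score to each experiment (column); scaling to unit variance with the population standard deviation is an assumption here, since the text only states that the mean of each experiment is set to 0. The function name `zscore_columns` and the toy values are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(matrix):
    """Z-score each experiment (column) of a genes x experiments matrix."""
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # One (mean, standard deviation) pair per experiment.
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(row[j] - stats[j][0]) / stats[j][1] if stats[j][1] > 0 else 0.0
         for j in range(n_cols)]
        for row in matrix
    ]

# Toy values: three genes (rows) measured in two experiments (columns).
expr = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = zscore_columns(expr)
```

After this transformation every experiment has mean 0, so profiles assembled from experiments with different overall intensities become comparable.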
Protein–protein interaction data were obtained from the DIP database of protein interactions (Xenarios et al., 2002; http://dip.doe-mbi.ucla.edu). Although the quality of the results from the two-hybrid studies, which are one of the sources of the data, is debated (Ravasz et al., 2002), this manually created database represents the current best approximation for protein interactions and provides sufficient data for unambiguous statistical analysis. The database for yeast had approximately 11 000 interactions but the number of pairs for the other three species was not very large at the time of download (Table 1).
Orthologs across different species were obtained from the InParanoid Database of Pairwise Orthologs (Remm et al., 2001; http://inparanoid.cgb.ki.se/), which is publicly available. The calculations in this database are based on all entries in SwissProt/Trembl for a given species.
Quantifying the relationship between the expression and the interaction data
We used Pearson correlation (PC) coefficient as the measure of relationship between gene expression and protein interactions for individual species. For each species, we computed the PC coefficient between the expression profiles of the genes whose corresponding proteins are known to interact. The PC coefficient measures the relative shape of the relationship rather than absolute levels and it captures both positive and negative correlation. Any interactions involving a protein with itself (homodimers) were discarded because these would indicate a perfect correlation that skews the results for a trivial reason. To judge the statistical significance of the interaction data we compared its distributions to that of a control set. The control set was obtained by generating approximately the same number of random gene pairs as the actual interaction pairs (Table 1).
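The comparison described above can be sketched as follows. This is an illustrative outline, not the authors' code: the helper names (`pearson`, `pair_correlations`, `random_pairs`), the gene identifiers and the toy profiles are all hypothetical, and the control set is drawn uniformly at random from the genes that have expression profiles, which is an assumption.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a flat profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def pair_correlations(pairs, profiles):
    """PC for each pair with both profiles available; self-pairs are skipped."""
    return [pearson(profiles[a], profiles[b])
            for a, b in pairs
            if a != b and a in profiles and b in profiles]

def random_pairs(genes, n_pairs, seed=0):
    """Control set: random pairs of two distinct genes."""
    rng = random.Random(seed)
    return [tuple(rng.sample(genes, 2)) for _ in range(n_pairs)]

profiles = {
    "geneA": [0.1, 0.5, 0.9, 1.4],
    "geneB": [0.2, 0.6, 1.1, 1.5],
    "geneC": [1.0, 0.2, 0.8, 0.1],
}
# The ("geneA", "geneA") homodimer is discarded before correlation.
interactions = [("geneA", "geneB"), ("geneA", "geneA"), ("geneB", "geneC")]
observed = pair_correlations(interactions, profiles)
control = pair_correlations(random_pairs(sorted(profiles), len(observed)), profiles)
```

The two lists `observed` and `control` are the inputs to the distribution comparisons described next.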
For a quantitative comparison of the interaction dataset with the random one, three key statistics of the datasets were used. For a quick estimation of the dissimilarity of the two datasets, we calculated the difference of their means and medians. In addition, we employed the non-parametric Kolmogorov–Smirnov (KS) test, which has been used for similar purposes in earlier studies (Fraser et al., 2004). The P-value reported by this test represents the probability that both the datasets were sampled from the same underlying distribution. Thus, a lower P-value indicates greater dissimilarity of the two datasets. In particular, the P-value depends upon the sample size as well as the dissimilarity, so we can distinguish objectively when the datasets are similar. Besides these statistical features, an estimate of the likelihood of predicting a true interaction when the correlation between the corresponding genes is significantly strong would be of equal interest. At a coarse-grained level, this can be obtained by dividing the number of true interactions by the number of random interactions having correlation values above a particular PC cut-off, say 0.7.
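A small sketch of these statistics is given below, under stated assumptions. The KS statistic is computed directly from the two empirical distribution functions, and the P-value uses the standard asymptotic Kolmogorov series with a common small-sample correction, which may differ from whatever software the study used. The likelihood counts pairs strictly above the cut-off and assumes the two sets are of about equal size, as the text describes. All function names and toy values are hypothetical.

```python
import math

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic P-value for the two-sample KS statistic (approximation)."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def likelihood_ratio(interacting_pc, random_pc, cutoff=0.7):
    """Interacting pairs above the cut-off divided by random pairs above it."""
    hits_true = sum(r > cutoff for r in interacting_pc)
    hits_random = sum(r > cutoff for r in random_pc)
    return hits_true / hits_random if hits_random else float("inf")

interacting_pc = [0.9, 0.8, 0.75, 0.4, 0.1, -0.2]
random_pc = [0.85, 0.3, 0.1, 0.0, -0.3, -0.6]
d = ks_statistic(interacting_pc, random_pc)
p = ks_pvalue(d, len(interacting_pc), len(random_pc))
ratio = likelihood_ratio(interacting_pc, random_pc)
```

With sets this small the P-value is not meaningful; the toy data only show how the three quantities are derived from the two lists of correlation values.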
Meta-interactions across different species
We integrated conservation information with protein interactions and gene expression datasets using the following approach. One species (called the template species) is selected, say yeast, and for each gene from this species, all the orthologs from the other three species were listed. This single set of genes across the multiple species was called a metagene (Stuart et al., 2003) and had genes from at least two species (template species and at least one of the other three species) (Fig. 1A). Genes from the template species that could not find orthologs from any of the other species were removed. In some cases, more than one ortholog appeared from a single species because of one-to-many and/or many-to-many orthology. In a lot of cases, the metagene did not have orthologous genes from all four species because not every gene in the template species may have its ortholog in all the other three species. We define a meta-interaction as the interaction between two metagenes whose component proteins from the template species are known to interact physically. The two metagenes are then said to meta-interact. The procedure was performed for all four species one by one obtaining four such lists.
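The construction of metagenes and meta-interactions can be sketched as follows. The data layout is an assumption made for illustration: `orthologs` maps each template gene to a dictionary of species and ortholog identifiers, and every identifier shown (`YA`, `HA1`, and so on) is made up.

```python
def build_metagenes(template_species, template_genes, orthologs):
    """Group each template gene with its orthologs from the other species."""
    metagenes = {}
    for gene in template_genes:
        others = {sp: list(ids) for sp, ids in orthologs.get(gene, {}).items() if ids}
        if others:  # template genes without any ortholog are removed
            metagenes[gene] = {template_species: [gene], **others}
    return metagenes

def meta_interactions(template_interactions, metagenes):
    """Keep template interactions whose two partners both form metagenes."""
    return [(a, b) for a, b in template_interactions
            if a != b and a in metagenes and b in metagenes]

# Yeast as the template; YA has two human orthologs (one-to-many orthology).
orthologs = {
    "YA": {"human": ["HA1", "HA2"], "mouse": ["MA"]},
    "YB": {"human": ["HB"]},
    "YC": {},
}
metagenes = build_metagenes("yeast", ["YA", "YB", "YC"], orthologs)
meta = meta_interactions([("YA", "YB"), ("YA", "YC")], metagenes)
```

Here `YC` has no ortholog and is dropped, so only the `YA`/`YB` interaction survives as a meta-interaction. Repeating the procedure with each species as the template yields the four lists mentioned above.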
We computed the PC coefficient between the expression profiles of the metagenes that meta-interact. The correlation between two metagenes was obtained by averaging the total correlation from all the four species (Fig. 1B). In case a set of multiple genes appeared from a species, correlation coefficients were calculated for all possible pairs from this set. One single correlation coefficient for this species was obtained by dividing the sum of these correlation coefficients by the total number of pairs (as in Fig. 1B for human). Besides reflecting the relationship (positive or negative) between gene expression profiles and protein–protein interactions, PC coefficients serve another purpose. If the co-expression of genes encoding for interacting proteins is conserved across species, there would be a high correlation between the metagenes. The extent of this conservation of co-expression of interaction pairs across the species would be reflected in the value of the PC coefficient. It should be noted that in order to calculate the correlation coefficient between the metagenes, the expression profiles for the two component genes from a species must be from the same experiment and this in turn poses a constraint. For mouse, especially, whose available interaction data is already not very large, this constraint further reduces it.
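The averaging scheme can be written out as a short sketch. It assumes that a species contributes only when both metagenes have at least one member with an expression profile in that species, and that the final value is the mean over the contributing species; how missing species were handled in the study is not stated, so this is an assumption. Names and toy profiles are hypothetical.

```python
import math
from itertools import product

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def metagene_correlation(meta_a, meta_b, profiles):
    """Average the per-species mean correlation between two metagenes."""
    per_species = []
    for species in meta_a.keys() & meta_b.keys():
        expr = profiles.get(species, {})
        # All possible cross pairs when a species contributes several genes.
        values = [pearson(expr[g], expr[h])
                  for g, h in product(meta_a[species], meta_b[species])
                  if g in expr and h in expr]
        if values:
            per_species.append(sum(values) / len(values))
    return sum(per_species) / len(per_species) if per_species else None

meta_a = {"yeast": ["YA"], "human": ["HA1", "HA2"]}
meta_b = {"yeast": ["YB"], "human": ["HB"]}
profiles = {
    "yeast": {"YA": [1, 2, 3, 4], "YB": [2, 4, 6, 8]},
    "human": {"HA1": [1, 2, 3, 4], "HA2": [4, 3, 2, 1], "HB": [1, 2, 3, 4]},
}
result = metagene_correlation(meta_a, meta_b, profiles)
```

In this toy case yeast contributes a correlation of 1, while the two human pairs (1 and -1) average to 0, giving a metagene correlation of 0.5.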
To judge the statistical significance of the distribution of the correlation values for the metagenes that meta-interact, we compare it to the distribution for the control networks using the same statistical measures as mentioned above. We built control networks by generating random gene pairs for the template species. We then picked up their orthologous counterparts in the other three species from the orthologs list and carried out similar computations for this random data. Although we pick up the genuine orthologs, this dataset still remains random because the pairs of genes in the template species are random.
| RESULTS |
|---|
Relationship between the expression profiles and the protein–protein interactions in individual species
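We calculated the correlation coefficient between the expression profiles of the genes encoding for proteins that are known to interact. For comparisons, we used the kernel density function, the probability density estimation f(x) of a random variable X. The kernel density function $\hat{f}_h(x)$ of the density value f(x) at point x is defined as

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where $x_1, \ldots, x_n$ are the observed values, K is the kernel and h is the bandwidth. With a Gaussian kernel (mean 0, variance 1), the kernel density function then becomes

$$\hat{f}_h(x) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} \exp\left(-\frac{(x - x_i)^2}{2h^2}\right)$$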
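A kernel density estimate of the correlation values can be sketched in a few lines. The Gaussian kernel and the hand-picked bandwidth `h` are assumptions made for illustration; the bandwidth used in the study is not stated in the text, and the function name and sample values are hypothetical.

```python
import math

def gaussian_kde(samples, h):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return density

# Toy correlation values; a density curve is evaluated on a regular grid.
correlations = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.8]
density = gaussian_kde(correlations, h=0.2)
grid = [-3.0 + 0.01 * k for k in range(601)]
curve = [density(x) for x in grid]
```

Evaluating `density` on the same grid for the interacting and the random pairs gives two curves that can be overlaid, which is the kind of comparison the distribution plots rely on.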
For E.coli, there is a significant correlation between expression profiles and protein interactions while for the other species, this correlation is only weakly defined. Figure 2 shows that for E.coli, there is a marked difference in the distribution of correlation values for the interaction and the random pairs towards the right half of the plot, especially for r above 0.6. Statistically, the strong correlation is also reflected in a very large difference in the means and the medians of the two datasets and by a low P-value from the KS-test (Table 2). A large difference in the distribution of these two types of data for E.coli might be partly attributed to the small size of its genome. The results above imply that in E.coli, the protein–protein interactions are strongly mirrored in their gene expression profiles.
The protein interaction data and the gene expression data do not reinforce each other strongly for the other three species. There is no significant difference in the distributions of the correlation values of the random and the interaction data for these species as suggested by their plots and key statistics (Fig. 2, Table 2). In yeast, particularly, there is little distinction between the two distributions and the difference in their means and medians is not significant. However, a low P-value for yeast still suggests that the distribution of interacting pairs is different from that of random pairs. For human and mouse, although there is some difference in the two distributions towards the right extreme, the difference in the means and medians of the interaction and random data is not statistically significant. For mouse especially, the P-value is not small enough to reject the null hypothesis that both the datasets originate from the same parent distribution.
The results above suggest that while for other species, protein interaction data and expression data are not significantly correlated, for E.coli due to a strong correlation between the two datasets, reliable predictions of interactions can be made from expression profiles. A quantitative idea of this confidence can be derived from the likelihood values (the last column of Table 1), which are higher for E.coli. For other species, a stronger correlation between protein interactions and gene expression needs to be identified in order to increase the confidence of prediction.
Relationship across the species
A previous study shows that the connectivity of well-conserved proteins in the interaction network is negatively correlated with their rate of evolution implying that proteins with more interactions evolve more slowly and selectively than others (Fraser et al., 2002). Another study that dealt with the interaction network of yeast classified the proteins into isotemporal categories and showed that two proteins tend to interact with each other if they are in the same or similar categories, but tend to avoid each other otherwise (Qin et al., 2003). These observations lay the groundwork for our hypothesis that co-expression of interacting proteins is more conserved than that of random pairs. Thus, we argued that including evolutionary considerations should strengthen the correlation between the expression and the interaction data.
First, using the current knowledge of protein interactions, we determined the percentage of interactions that remained intact over the period of evolution. Here a protein interaction remaining intact means that if two proteins interact in one species, their orthologous counterparts in the other species are also known to interact. We find that, on the information available so far, <10% of the interactions in yeast are intact in human and mouse, whereas this value for E.coli is higher, approximately 40%. This might be explained by the smaller genetic evolutionary distance between yeast and E.coli than between yeast and human, or yeast and mouse. However, these numbers are not necessarily representative because the current information available about protein interactions is far from complete and very sparse, especially for mouse. Thus we decided to use a less rigorous definition for meta-interaction: when two proteins interact in the template species, their orthologs are assumed to interact in the other species.
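The percentage of intact interactions can be estimated along these lines. The sketch makes one assumption the text does not spell out: only template interactions whose two partners both have orthologs in the other species are counted in the denominator. The function name and all identifiers are hypothetical.

```python
def fraction_intact(template_interactions, orthologs, other_interactions):
    """Fraction of mappable template interactions also known in another species."""
    other = {frozenset(pair) for pair in other_interactions}
    mappable = intact = 0
    for a, b in template_interactions:
        orth_a, orth_b = orthologs.get(a, []), orthologs.get(b, [])
        if not orth_a or not orth_b:
            continue  # no ortholog on at least one side: cannot be assessed
        mappable += 1
        # Intact if any pair of orthologs is a known interaction.
        if any(frozenset((x, y)) in other for x in orth_a for y in orth_b):
            intact += 1
    return intact / mappable if mappable else 0.0

yeast_interactions = [("YA", "YB"), ("YA", "YC"), ("YB", "YD")]
yeast_to_human = {"YA": ["HA"], "YB": ["HB1", "HB2"], "YC": ["HC"]}
human_interactions = [("HB2", "HA")]
share = fraction_intact(yeast_interactions, yeast_to_human, human_interactions)
```

In the toy data two yeast interactions can be mapped to human and one of them is also recorded there, so `share` is 0.5.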
We then calculate the PC coefficient between the metagenes whose proteins from the template species are known to interact and compare the distribution of their correlation values with those in the control sets (see Methods section). We find that including other species can capture the correlation between protein interactions and corresponding gene expressions when it is not otherwise strongly defined (Fig. 4).
The plots of the distribution of the correlation values for the random and the metagene interaction data (Fig. 4) show that in the case of yeast (apart from E.coli), there is now a consistently larger number of interacting pairs than random pairs towards the positive extreme of the correlation range (>0.3). It can be seen that when this species is chosen as the template, the distribution of the correlation values for the meta-interaction data shows more deviation from randomness than when it is considered individually. This improvement is also clearly reflected quantitatively in the key statistics (Table 2). For yeast there is a sharp increase not only in the difference between the means and the medians, but also in the likelihood of correct prediction when PC is above 0.7. In the case of mouse, although there is not a large increase in the difference between the means and the medians, the P-value is now small enough (<0.05) to safely reject the null hypothesis of the two datasets following the same distribution. An increase in the likelihood of a true prediction is also observed for mouse. The observations above imply that the signal-to-noise ratio increases for these species when multiple species are included in the analysis. For human, although a significant deviation of the interaction data from randomness is not observed (also suggested by only a small increase in the difference of the means and the medians and in the likelihood), the P-value decreases further. For E.coli, there is only a slight improvement in terms of all key statistics and the likelihood. To further quantify the improvement in prediction power that follows the inclusion of multiple species, we plot the ratio of the distribution of interacting pairs to that of random pairs at each abscissa for single species and the same ratio for the meta-interaction data when each species is taken as the template (Fig. 5). The ratio can be interpreted as the likelihood of predicting a true interaction in the corresponding cases. So, the vertical difference between the two plots at any abscissa directly gives the improvement in this likelihood.
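The ratio at each abscissa can be approximated with a simple binned version. This sketch swaps in normalized histograms over the correlation range [-1, 1] in place of whatever density estimate underlies the published plot, so it is only an approximation of that figure; the bin count, function name and toy values are assumptions.

```python
def binned_ratio(interacting_pc, random_pc, n_bins=10):
    """Ratio of the interacting to the random distribution in each bin."""
    width = 2.0 / n_bins

    def histogram(values):
        counts = [0] * n_bins
        for r in values:
            idx = min(int((r + 1.0) / width), n_bins - 1)  # r = 1 goes in the last bin
            counts[idx] += 1
        return [c / len(values) for c in counts]

    frac_true, frac_random = histogram(interacting_pc), histogram(random_pc)
    centers = [-1.0 + (k + 0.5) * width for k in range(n_bins)]
    # None marks bins with no random pairs, where the ratio is undefined.
    ratios = [a / b if b > 0 else None for a, b in zip(frac_true, frac_random)]
    return centers, ratios

interacting_pc = [0.9, 0.8, 0.6, 0.2, -0.3, -0.7]
random_pc = [0.7, 0.3, 0.1, -0.2, -0.4, -0.8]
centers, ratios = binned_ratio(interacting_pc, random_pc, n_bins=4)
```

Computing `ratios` once for a single species and once for the meta-interaction data, and subtracting bin by bin, gives the improvement in likelihood described above.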
Yeast, among all the four species, witnesses the highest improvement as shown by the striking difference between the two ratios after the first quarter of the positive half. The ratio for the multiple species becomes sharply skewed towards high values along the abscissa. The ratio increases to 9.4 when multiple species are included as compared to only 1.4 in single species towards the right extreme of the range. The improvement is also evident in case of mouse; the likelihood of a correct prediction increases from 4.2 to 12.7. The reason for improvement in true prediction likelihood in these two species is that in the transition from single species to multiple species, random distribution becomes narrower as compared to the interaction distribution. This is not surprising as it is less probable for a random pair to be co-expressed in all the four species than in just one species.
It can be observed that improvement in the case of human and E.coli is minor. In E.coli the ratio of interaction and random pairs is already significantly high when we consider it as a single species thus mitigating the further improvement due to addition of other species.
Finally, we analyzed highly correlated random cases in yeast, for the prediction of protein–protein interactions based on both the information from individual species and, after including conservation, information from multiple species. Of the total random sets of protein pairs, the ones having the average expression correlation >0.9 were examined for their subcellular localizations and functions. We found that the number of protein pairs having the same function and localizations increased as we included the conservation information from the other species (Table 3). This is a straightforward prediction that uses co-expression, protein functional role and its cellular localization to predict protein–protein interactions.
The results above establish that integration of interactome and transcriptome data in conjunction with conservation information can capture an otherwise weakly defined correlation. Mathematically, for a meta-interaction, there is a higher correlation between component genes from all the four species resulting in a higher average correlation than when the species are considered individually. A higher average correlation can be observed only in the event of co-expression of interaction pairs across all genomes. So, our results establish that co-expression of interacting proteins is more conserved than random. A stronger correlation increases the confidence in the prediction of protein–protein interactions from the microarray data.
| DISCUSSION AND CONCLUSION |
|---|
We have investigated the relationship between protein–protein interactions and gene expression profiles for both individual species and across evolutionarily distant species. For individual genomes, we have found that in E.coli there is a strong correlation between the expression profiles for interacting pairs when compared with random pairs, while in other species the correlation is only slightly more significant than random. This implies that protein–protein interactions can be predicted with much more confidence in E.coli than in the other three species.
It may be argued that this kind of relationship is true for the species with small genome size. So, to test whether this behavior is representative of prokaryotes in general or is observed only in E.coli, we studied this relationship in another prokaryote, Helicobacter pylori, using the same approaches as used for E.coli. Interaction and expression data for this organism were obtained from the same sources as for the other species. The interaction database for this species was not very large (<200) at the time of download. Expression data collected for this species extended over a range of biological conditions and included 32 microarray data points. We plotted the distribution of the correlation values for the interaction and the random data (Fig. 6). Although we observe that there is only a slight difference between the distributions for interacting and random pairs (PC above 0.6, Fig. 6), the sparseness of the interaction data prevented any statistically significant conclusion. More experimental data in H.pylori or any other prokaryote is required before any sound hypothesis can be formulated or validated about the relationship of protein interactions with gene expression for prokaryotes in general.
Previous studies have discussed the general relationship between the protein interactions and genetic expression. Our study shows that the strength of this relationship is species-dependent. For some species there is a very strong signal coming from the interaction data to the expression data whereas for others, this relationship is only weakly defined. To some extent, this is not surprising because of the different complexities and levels of evolution of these species.
We have further shown that inclusion of multiple species in the analysis strengthens the correlation between protein–protein interactions and gene expression towards developing a protocol to increase the signal-to-noise ratio. These results establish that co-expression of interacting proteins is more conserved than that of random pairs on the genome-wide scale of such different organisms. This work further lays the basis for predicting protein–protein interactions using not only their co-expression properties but also that of their orthologous counterparts. For example, in the case of yeast the odds of predicting a true protein interaction, when the correlation of the two proteins is >0.9, are just 1.4:1 when yeast alone is considered; the same odds increase to 9.1:1 when information from all four species is included. These results demonstrate that integration of interactome data with genomic data in conjunction with conservation information can transform protein–protein interaction from noisy information into a purposive source of knowledge.
Although it has been shown that the distribution of the correlation values of interacting proteins improves significantly when diverse species are included in the analysis, there are a number of interaction pairs whose correlation values are not significant. One of the reasons for this is partial co-expression: two proteins that interact in one physiological condition need not interact in all other conditions. This decimates the value of the correlation when the expression profiles extend over different conditions. Another reason could be that of time variance: if correlation exists but with a time lag, it cannot be captured by the PC coefficient. In the case of a time-delay, a cross-correlation function could be used which breaks the expression profiles of two genes into two optimal parts to look for a possible correlation between the two profiles (Kato et al., 2001).
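The effect of a time lag on the PC coefficient can be illustrated with a plain lag scan. This is a simplified stand-in, not the cross-correlation method of Kato et al. (2001): it shifts one profile against the other by a few time points and keeps the lag with the largest absolute correlation. The function names, the `max_lag` parameter and the toy profiles are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; NaN if either profile is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def best_lagged_correlation(x, y, max_lag=2):
    """Scan small shifts; a positive lag means y trails x by that many points."""
    best = (0, pearson(x, y))
    for lag in range(1, max_lag + 1):
        shifted = ((lag, x[:-lag], y[lag:]), (-lag, x[lag:], y[:-lag]))
        for signed_lag, u, v in shifted:
            if len(u) < 3:
                continue  # too few overlapping points to correlate
            r = pearson(u, v)
            if not math.isnan(r) and abs(r) > abs(best[1]):
                best = (signed_lag, r)
    return best

# y repeats the pattern of x one time point later.
x = [0, 1, 0, -1, 0, 1, 0, -1]
y = [-1, 0, 1, 0, -1, 0, 1, 0]
lag, r = best_lagged_correlation(x, y)
```

For these two profiles the zero-lag PC coefficient is 0, yet shifting by one time point recovers a near-perfect correlation, which is the situation the paragraph above describes.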
It seems reasonable that interacting proteins should be simultaneously represented in a cell. However, the relationship between protein–protein interactions and gene expression can be complicated. The expression level of a gene does not always reflect the true abundance of its protein. Besides, protein–protein interactions exhibit complex and dynamic behavior. For this study in particular, a possible limitation was the sparseness of the interaction data for some of the species, which was further reduced by the constraint of the presence of the interacting pair in the same microarray experiment. Also, part of the interaction data for this study comes from yeast two-hybrid experiments, which are known to include false positives and false negatives. Nevertheless, this cross-correlation study reveals the generic trend inside the data. Therefore, it is important and useful to integrate the two datasets to formulate more meaningful hypotheses. The availability of more high quality interaction data and microarray data will facilitate further studies.
The preference of a protein to interact with another protein in the same isotemporal category and selective evolution of proteins with more interactions form the basis of our hypothesis and consequently our observation that inclusion of multiple species improves the signal-to-noise ratio (Fraser et al., 2002; Qin et al., 2003). Using known conserved orthologous proteins across species in addition to gene co-expression within each species, we have developed a method to globally establish protein–protein interaction prediction whose confidence is greater than that of using co-expression within individual species.
The main objective of this work is similar in spirit to previous efforts correlating protein interaction data with expression data. However, here we go beyond the concept of co-expression alone and use conservation as the additional tool of cross-correlation of two heterogeneous data. In other words, to determine whether any given proteins interact, we would not only want to look at their co-expression but also would want their orthologs (if present) in other species (all or some) to interact. However, as these constraints might be too imposing sometimes, we suggest that further studies of protein–protein interaction prediction will be more comprehensive if they are supplemented with other information such as sequence information, phylogenetic trees (Sato et al., 2003; Gertz et al., 2003) or even the protein structures if known (Aloy and Russell, 2002) or predicted (Lu et al., 2002; Lu et al., 2003). Integration of these data will provide a framework to help formulate more meaningful biological hypotheses than the hypotheses generated from either approach individually.
| Acknowledgments |
|---|
The current work was partially supported by the startup fund to H.L. from the UIC Bioengineering Department. The authors wish to thank Marianela Nelson for critical reading of the manuscript and also the reviewers for valuable suggestions on data-representation.
Received on August 11, 2004; revised on March 18, 2005; accepted on March 18, 2005
| REFERENCES |
|---|
Aloy, P. and Russell, R.B. (2002) Interrogating protein interaction networks through structural biology. Proc. Natl Acad. Sci. USA, 99, 5896–5901.
Cheadle, C., et al. (2003) Application of z-score transformation to Affymetrix data. Appl. Bioinform., 2, 209–217.
Chen, Y. and Xu, D. (2003) Computational analyses of high-throughput protein–protein interaction data. Curr. Protein Peptide Sci., 4, 159–181.
Fraser, H.B., et al. (2002) Evolutionary rate in the protein interaction network. Science, 296, 751–753.
Fraser, H.B., et al. (2004) Coevolution of gene expression among interacting proteins. Proc. Natl Acad. Sci. USA, 101, 9033–9038.
Ge, H., et al. (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet., 29, 482–486.
Gertz, J., et al. (2003) Inferring protein interactions from phylogenetic distance matrices. Bioinformatics, 19, 2039–2045.
Golub, J., et al. (2003) The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res., 31, 94–96.
Grigoriev, A. (2001) A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae. Nucleic Acids Res., 29, 3513–3519.
Harrington, H.C., et al. (2000) Monitoring gene expression using DNA microarrays. Curr. Opin. Microbiol., 3, 285–291.
Hirsh, A.E. and Fraser, H.B. (2001) Protein dispensability and rate of evolution. Nature, 411, 1046–1049.
Hirsh, A.E. and Fraser, H.B. (2003) Genomic function (communication arising): rate of evolution and gene dispensability. Nature, 421, 497–498.
Hurst, L.D. and Smith, N.G. (1999) Do essential genes evolve slowly? Curr. Biol., 9, 747–750.
Ito, T., et al. (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA, 98, 4569–4574.
Jansen, R., et al. (2002) Relating whole-genome expression data with protein–protein interactions. Genome Res., 12, 37–46.
Jordan, I.K., et al. (2002) Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res., 12, 962–968.
Kato, M., et al. (2001) Lag analysis of genetic networks in the cell cycle of budding yeast. Genome Inform., 12, 266–267.
Letovsky, S. and Kasif, S. (2003) Predicting protein function from protein–protein interaction data: a probabilistic approach. Bioinformatics, 19, 197–204.
Lu, L., et al. (2002) MULTIPROSPECTOR: an algorithm for the prediction of protein–protein interaction by multimeric threading. Proteins, 49, 350–364.
Lu, H., et al. (2003) Development of unified statistical potentials describing protein–protein interactions. Biophys. J., 84, 1895–1901.
Pal, C., et al. (2003) Genomic function (communication arising): rate of evolution and gene dispensability. Nature, 421, 496–497.
Pellegrini, M., et al. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl Acad. Sci. USA, 96, 4285–4288.
Qin, H., et al. (2003) Evolution of the yeast protein interaction network. Proc. Natl Acad. Sci. USA, 22, 12820–12824.
Ravasz, E., et al. (2002) Hierarchical organization of modularity in metabolic networks. Science, 297, 1551–1555.
Remm, M., et al. (2001) Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol., 314, 1041–1052.
Sato, T., et al. (2003) Prediction of protein–protein interactions from phylogenetic trees using partial correlation coefficient. Genome Inform., 14, 496–497.
Schwikowski, B., et al. (2000) A network of protein–protein interactions in yeast. Nat. Biotechnol., 18, 1257–1261.
Stuart, J.M., et al. (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science, 302, 249–255.
Tornow, S. and Mewes, H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289.
Uetz, P., et al. (2000) A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature, 403, 623–627.
Vazquez, A., et al. (2003) Global protein function prediction from protein–protein interaction networks. Nat. Biotechnol., 21, 697–700.
Xenarios, I., et al. (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res., 30, 303–305.
Zhang, M.Q. (1999) Promoter analysis of co-regulated genes in the yeast genome. Comput. Chem., 23, 233–250.