Bioinformatics Advance Access originally published online on January 18, 2005
Bioinformatics 2005 21(9):2076-2082; doi:10.1093/bioinformatics/bti273
Online Predicted Human Interaction Database
1Division of Cancer Informatics, Ontario Cancer Institute, University of Toronto Toronto, Ontario, Canada
2Department of Medical Biophysics, University of Toronto Toronto, Ontario, Canada
3Department of Computer Science, University of Toronto Toronto, Ontario, Canada
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: High-throughput experiments are being performed at an ever-increasing rate to systematically elucidate protein–protein interaction (PPI) networks for model organisms, while the complexities of higher eukaryotes have prevented these experiments for humans.
Results: The Online Predicted Human Interaction Database (OPHID) is a web-based database of predicted interactions between human proteins. It combines the literature-derived human PPI from BIND, HPRD and MINT, with predictions made from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Mus musculus. The 23 889 predicted interactions currently listed in OPHID are evaluated using protein domains, gene co-expression and Gene Ontology terms. OPHID can be queried using single or multiple IDs and results can be visualized using our custom graph visualization program.
Availability: Freely available to academic users at http://ophid.utoronto.ca, both in tab-delimited and PSI-MI formats. Commercial users, please contact I.J.
Contact: juris{at}ai.utoronto.ca
Supplementary information: http://ophid.utoronto.ca/supplInfo.pdf
| INTRODUCTION |
|---|
The network of protein–protein interactions (PPIs), referred to as the interactome, forms a backbone of signaling pathways, metabolic pathways and cellular processes required for normal cell function. Complete knowledge of these pathways will help in the understanding of the normal processes in the cell, as well as how diseases such as cancer develop from mutation of individual pathway components. It has been the central aim of many high-throughput (HTP) experiments to elucidate the PPI networks in model organisms such as Saccharomyces cerevisiae (Gavin et al., 2002; Ho et al., 2002; Ito et al., 2001; Uetz et al., 2000), Caenorhabditis elegans (Li et al., 2004), Drosophila melanogaster (Giot et al., 2003) and Mus musculus (Suzuki et al., 2003). While few studies have been performed in humans (Colland et al., 2004; Lehner et al., 2004), we have used the HTP model organism interactions to infer some of the millions of potential human PPIs.
Many databases are devoted to the human interactome, with a substantial number of them appearing in recent months [DIP, HPID, HPRD, MINT, PINdb (Han et al., 2004; Luc and Tempst, 2004; Peri et al., 2003; Xenarios et al., 2000; Zanzoni et al., 2002)]. However, the majority of these databases are derived from hand-curated, literature-based interactions. Although highly useful in providing ready access to the known human interactions, they do little to expand the knowledge of the interactome. Several databases have also been published that make predictions about the functional relationships between proteins based on a variety of in silico methods (Predictome, STRING, Prolinks, POINT) (Bowers et al., 2004; Huang et al., 2004; Mellor et al., 2002; von Mering et al., 2003).
The Online Predicted Human Interaction Database (OPHID) was designed to extend the human interactome using model organism data and to provide a repository for already known, experimentally derived human PPIs. While these predictions should be thought of as hypotheses until experimentally validated, there is increasing evidence that PPIs are conserved through evolution (Pagel et al., 2004; Wuchty et al., 2003). OPHID catalogs 16 034 known human PPIs obtained from BIND, MINT and HPRD, and makes predictions for 23 889 additional interactions.
Multiple types of evidence have been used in the literature both to support experimentally derived PPIs and to predict interactions in silico. Examples include domain–domain co-occurrence (Deng et al., 2002; Sprinzak and Margalit, 2001), gene co-expression (Bader et al., 2004; Deane et al., 2002; Deng et al., 2003) and Gene Ontology (GO) terms (Bader et al., 2004; Sprinzak et al., 2003). Using the combination of the three types of evidence allows us to support a broader range of PPIs than any single method.
We have applied all three evidence types to OPHID, providing support for 5483 (23%) of our predicted PPIs. We believe that OPHID will be a useful resource for researchers concerned with the human interactome, especially when integrated with additional HTP datasets that are likely to be available in the future.
| SYSTEM AND METHODS |
|---|
OPHID generation
OPHID was constructed by mapping model organism PPIs to human protein orthologs using BLASTP and the reciprocal best-hit approach. Briefly, a database of model organism-to-human orthologs was constructed by BLASTing each model organism protein against the Swiss-Prot database filtered for human proteins. Each top BLAST hit with an E-value < 10^-5 was BLASTed back against the set of all model organism protein sequences. If the top hit in the reverse direction (with E-value < 10^-5) matched the original query protein, the matching human protein was selected as a potential ortholog. These were filtered to remove any hits that occurred over <50% of the query sequence length, to avoid interactions that may involve a single protein domain.
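The reciprocal best-hit filter described above can be sketched as follows. This is a minimal illustration, not the OPHID implementation: the `Hit` record, the dictionary layout of the BLAST results and the function names are assumptions made for the example, and only the two thresholds (E-value < 10^-5, at least 50% query coverage) come from the text.

```python
from typing import Dict, List, NamedTuple, Optional


class Hit(NamedTuple):
    """One BLAST hit (hypothetical record layout for this sketch)."""
    subject: str     # ID of the matched sequence
    evalue: float    # BLAST E-value of the match
    coverage: float  # fraction of the query length covered by the alignment


E_CUTOFF = 1e-5      # E-value threshold stated in the text
MIN_COVERAGE = 0.5   # hits over <50% of the query length are discarded


def best_hit(hits: List[Hit]) -> Optional[Hit]:
    """Return the lowest-E-value hit if it passes the E-value cutoff."""
    passing = [h for h in hits if h.evalue < E_CUTOFF]
    return min(passing, key=lambda h: h.evalue) if passing else None


def reciprocal_best_hits(
    forward: Dict[str, List[Hit]],   # model protein -> hits against human proteins
    reverse: Dict[str, List[Hit]],   # human protein -> hits against model proteins
) -> Dict[str, str]:
    """Map each model organism protein to its putative human ortholog."""
    orthologs: Dict[str, str] = {}
    for query, hits in forward.items():
        top = best_hit(hits)
        if top is None or top.coverage < MIN_COVERAGE:
            continue
        # Keep the pair only if the human protein's best hit is the query itself.
        back = best_hit(reverse.get(top.subject, []))
        if back is not None and back.subject == query:
            orthologs[query] = top.subject
    return orthologs
```

Requiring agreement in both directions trades coverage for specificity: a model organism protein whose best human match prefers a different model organism protein is dropped rather than mapped.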
Each model organism protein was translated to its human ortholog and a predicted human interaction was added if both proteins in the model organism interaction were conserved in humans. Model organism PPIs were added from S.cerevisiae, C.elegans, D.melanogaster and M.musculus using this technique. For a complete listing of data sources and references, refer to Table 1.
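The mapping step above, in which an interaction is transferred only when both partners have human orthologs, can be sketched as below. The function name and the use of unordered pairs are assumptions for illustration; the protein identifiers in the usage note are placeholders.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def map_interactions(
    model_ppis: Iterable[Tuple[str, str]],
    orthologs: Dict[str, str],
) -> Set[FrozenSet[str]]:
    """Translate model organism PPIs into predicted human PPIs.

    An interaction is kept only if both partners have a human ortholog.
    Pairs are stored as frozensets so that (A, B) and (B, A) collapse
    into a single undirected interaction.
    """
    predicted: Set[FrozenSet[str]] = set()
    for a, b in model_ppis:
        human_a = orthologs.get(a)
        human_b = orthologs.get(b)
        if human_a is not None and human_b is not None:
            predicted.add(frozenset((human_a, human_b)))
    return predicted
```

For example, `map_interactions([("A", "B"), ("A", "C")], {"A": "hA", "B": "hB"})` yields the single predicted pair `{hA, hB}`, because `C` has no ortholog in the mapping.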
Domain co-occurrence dataset generation
The literature-derived PPIs from BIND, DIP (see Footnotes), HPRD and MINT were used to create a domain–domain co-occurrence network using the InterPro domains obtained from Swiss-Prot. For every interacting protein pair, each domain from protein A was connected to the domains in protein B. The frequency of these domain pairs was determined for all interacting protein pairs (n = 16 107), as well as all non-interacting pairs (i.e. all proteins not reported to interact in BIND, DIP, MINT or HPRD; n = 1.8 x 10^7). The hypergeometric distribution was used to determine which domain pairs were enriched in interacting protein pairs compared to the non-interacting pairs. After applying the Bonferroni correction to account for repeated sampling, 4182 domain–domain pairs were identified with P < 9.2 x 10^-7 between 1164 domains.
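The enrichment test can be illustrated with a short sketch. It assumes a one-sided hypergeometric test in which the population is all protein pairs (interacting plus non-interacting), the draws are the interacting pairs, and a success is a pair that contains the domain pair of interest; the exact parameterization used for OPHID is not given in the text. The function names and the `counts` layout are hypothetical, and probabilities are computed in log space so that populations of ~10^7 pairs do not overflow.

```python
import math
from typing import Dict, Tuple


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    lo = max(k, 0, n - (N - K))
    hi = min(K, n)
    if lo > hi:
        return 0.0
    log_denom = log_comb(N, n)
    logs = [log_comb(K, i) + log_comb(N - K, n - i) - log_denom
            for i in range(lo, hi + 1)]
    peak = max(logs)  # factor out the largest term for numerical stability
    return min(1.0, math.exp(peak) * sum(math.exp(x - peak) for x in logs))


def significant_domain_pairs(
    counts: Dict[Tuple[str, str], Tuple[int, int]],
    n_interacting: int,
    n_total: int,
    alpha: float = 0.05,
) -> Dict[Tuple[str, str], float]:
    """Return domain pairs enriched among interacting protein pairs.

    counts maps a domain pair to (occurrences in interacting pairs,
    occurrences in all pairs). The Bonferroni correction divides alpha
    by the number of domain pairs tested.
    """
    if not counts:
        return {}
    cutoff = alpha / len(counts)
    result = {}
    for pair, (k, K) in counts.items():
        p = hypergeom_sf(k, n_total, K, n_interacting)
        if p < cutoff:
            result[pair] = p
    return result
```

The value `alpha = 0.05` is an assumed default for the sketch; the text reports only the corrected threshold.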
Co-expression dataset
Human gene expression data was obtained from the GeneAtlas Affymetrix dataset, which includes expression data for 44 775 human genes from 79 normal human tissues (Su et al., 2004). Gene co-expression was determined using the Pearson correlation coefficient between gene vectors for each protein in the interaction.
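As a minimal sketch of the co-expression measure, the Pearson correlation between two expression vectors (one value per tissue) can be computed as below. The function names are illustrative; the default cutoff of 0.607 is the significance threshold reported in this paper, and treating the boundary as inclusive is an assumption.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def is_coexpressed(x: Sequence[float], y: Sequence[float],
                   cutoff: float = 0.607) -> bool:
    """True if the two genes are co-expressed at or above the cutoff."""
    return pearson(x, y) >= cutoff
```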
GO term similarity measure
We used a modification of the semantic similarity measure (Lord et al., 2003) to determine the relatedness of each interacting protein pair. The semantic similarity method examines the frequency with which each GO term appears in Swiss-Prot for human proteins and assigns a higher score to terms that appear less frequently (i.e. have greater information content). For example, non-specific terms such as the top-level molecular_function (GO:0003674) provide little information about the relatedness of two proteins, reflected in the P-value = 1.0. In contrast, more descriptive terms such as translation regulator activity (P = 0.0048) or chaperone activity (P = 0.0052) have greater information content, as they are used less frequently to describe human proteins and are potentially more specific for function. The GO similarity was determined by calculating the maximum semantic similarity from the set of all GO term pairs between interacting proteins. See Supplementary information for a complete example.
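The following sketch illustrates an information-content similarity of the kind described above. It assumes the basic form in which the similarity of two terms is the information content, -ln p, of their most informative shared ancestor; the specific modification used for OPHID is not described in this section, so this should be read as an approximation. The three-term hierarchy is a toy example, and the probabilities are the ones quoted in the text.

```python
import math
from itertools import product
from typing import Dict, Iterable, Sequence, Set


def ancestors(term: str, parents: Dict[str, Sequence[str]]) -> Set[str]:
    """Return the term together with all of its ancestors in the GO graph."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def information_content(p: float) -> float:
    """-ln p, so that rarely used (more specific) terms score higher."""
    return -math.log(p) if p < 1.0 else 0.0


def term_similarity(t1: str, t2: str, parents: Dict[str, Sequence[str]],
                    prob: Dict[str, float]) -> float:
    """Information content of the most informative shared ancestor."""
    shared = ancestors(t1, parents) & ancestors(t2, parents)
    return max((information_content(prob[t]) for t in shared), default=0.0)


def protein_similarity(terms_a: Iterable[str], terms_b: Iterable[str],
                       parents: Dict[str, Sequence[str]],
                       prob: Dict[str, float]) -> float:
    """Maximum term similarity over all GO term pairs of two proteins."""
    return max((term_similarity(a, b, parents, prob)
                for a, b in product(list(terms_a), list(terms_b))),
               default=0.0)


# Toy hierarchy (assumed for illustration) with probabilities from the text.
PARENTS = {
    "molecular_function": [],
    "translation regulator activity": ["molecular_function"],
    "chaperone activity": ["molecular_function"],
}
PROB = {
    "molecular_function": 1.0,
    "translation regulator activity": 0.0048,
    "chaperone activity": 0.0052,
}
```

In this toy graph two proteins that share only the root term score 0, whereas two proteins both annotated with chaperone activity score -ln(0.0052), reflecting the greater specificity of the shared term.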
Background distributions
Statistically significant cutoffs for domain co-occurrence, gene co-expression and GO term similarity were determined by estimating the background distributions using a bootstrap approach. Briefly, all OPHID PPIs (known and predicted) were randomized 1000 times to produce equivalent-sized random networks. The mean of the 95th percentiles was chosen as a cutoff. The thresholds for each metric are: domain co-occurrence (one significant domain pair); gene co-expression (Pearson = 0.607; see Supplementary information); GO similarity (GOSim = 3.14).
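A sketch of this cutoff estimation is given below. The randomization scheme is an assumption: each random network is built by drawing protein pairs uniformly from the proteins present in the dataset, which may differ from the procedure actually used. The `score` callable stands in for any of the three metrics, and the nearest-rank percentile is one of several common definitions.

```python
import random
import statistics
from typing import Callable, Sequence


def percentile(values: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100)
    return ordered[rank - 1]


def background_cutoff(
    proteins: Sequence[str],
    n_pairs: int,
    score: Callable[[str, str], float],
    n_rounds: int = 1000,
    q: int = 95,
    seed: int = 0,
) -> float:
    """Mean of the q-th percentiles over n_rounds random networks.

    Each random network has n_pairs protein pairs, matching the size of
    the real network, and is scored with the same metric.
    """
    rng = random.Random(seed)
    cutoffs = []
    for _ in range(n_rounds):
        scores = [score(*rng.sample(proteins, 2)) for _ in range(n_pairs)]
        cutoffs.append(percentile(scores, q))
    return statistics.mean(cutoffs)
```

Averaging the percentile over many random networks, rather than taking it from a single one, reduces the dependence of the cutoff on any particular randomization.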
| IMPLEMENTATION |
|---|
Databases and software
Known (literature-derived, LIT) human PPIs were acquired from BIND, DIP, HPRD and MINT (see Supplementary information). The data and sequences from Swiss-Prot (v. 45.0) were loaded into our IBM DB2 database (v. 8.1.1.16 [EC]). Protein sequences for each organism were obtained from the following sources: S.cerevisiae, Yeast Protein Databank (YPD); C.elegans, WormPep; D.melanogaster, FlyBase; M.musculus, Swiss-Prot (see Supplementary information for full versions). A local NCBI BLAST server (v. 2.2.4) was run through IBM's Information Integrator (v. 8.1.1) using the default BLAST settings. GO terms and InterPro domains were gathered from Swiss-Prot. The OPHID web interface and query engine were implemented on an IBM WebSphere web server (v. 5.0.0). All additional processing software was written in Java.
| RESULTS |
|---|
Protein interaction network
OPHID was generated from a total of 108 867 model organism PPIs mapped to human proteins through orthology. Orthologs were identified using the reciprocal best-hit approach (see Systems and Methods section). In total, 31.9% of the S.cerevisiae proteins had orthologs in humans, while 39.7 and 21.2% of the D.melanogaster and C.elegans proteins had orthologs, respectively. Through this orthology database, 23 889 model organism PPIs were mapped to human proteins, providing predictions for interactions that may occur in the human interactome, including 929 that are confirmed human interactions. Seventy-two of the predicted interactions were from more than one model organism.
The predicted PPI dataset from OPHID (referred to as the OPHID set hereafter) contains 4552 proteins, 1872 of which are not in the LIT set (6144 proteins). Thus, OPHID extends the human interactome by hundreds of proteins that have not yet been included in the literature-derived databases.
Importantly, there is a large difference in the types of proteins that are being covered in the two datasets. Figure 1 shows the distributions of the functional categories represented in the LIT dataset, compared to the interactions in OPHID. The proteins in the LIT dataset are primarily involved in cellular fate and organization pathways (29.3%), such as apoptosis, cell cycle regulation and cytoskeletal remodeling, followed by transcription (9.8%) and transport and sensing (9.0%). Only 19.9% of the proteins in this set are Uncharacterized, meaning that they lack GO terms in the Swiss-Prot database. In contrast, 29.1% of the proteins involved in OPHID are uncharacterized. OPHID is enriched for proteins involved in energy production (2.3% versus 0.9%) and other metabolism (6.0% versus 2.8%) compared to the LIT set, while the LIT set is enriched for proteins involved in stress and defense. These data suggest that the known and predicted interactions complement each other in many GO categories. In addition, the linking of the uncharacterized proteins, which make up ~30% of OPHID, to known interactions will help provide functional information for these unannotated proteins.
The use of HTP experiments from model organisms has the potential to include false positive interactions. For example, Sprinzak et al. (2003) suggested that only 50% of yeast Y2H interactions are reliable. Producing a predicted PPI network may compound this problem by including those false positives, as well as potentially creating new ones through the ortholog mapping. In order to help filter out noisy interactions, we chose to look for additional supporting evidence in the form of protein domains, gene co-expression and GO terms (see Systems and Methods section). In essence, this additional evidence provides in silico validation of the OPHID interactions and will help rank the predicted interactions for future experimental confirmation.
Support through domain co-occurrence
The presence of domain pairs has been used extensively to predict de novo protein interactions (Deng et al., 2002; Wojcik and Schachter, 2001), as well as for the validation of reported interactions (Sprinzak and Margalit, 2001). Here, we have used more than 16 000 human PPIs from the LIT dataset to produce a domain co-occurrence network and selected those domain pairs that are significantly enriched in the interacting proteins compared to the non-interacting pairs (Systems and Methods section). While 93.0% of the LIT PPIs have at least one domain for each of the proteins in the pair, 44.3% of those have ≥2 statistically significant domain pairs (Fig. 2). This is in contrast to the OPHID dataset, where 92.1% of the PPIs have domain information, with 5.6% of these containing significant domains.
This difference in domain support is likely due to two factors: (1) the domain network was derived from the LIT dataset, which should lead to higher support for this dataset; and (2) differences in the functions of the proteins in the LIT dataset will also be reflected in the types of domains that are present in this network. The predicted network likely utilizes somewhat different domains than the LIT set. This is in line with the findings of Betel et al. (2004), who recently assessed domain–domain networks in S.cerevisiae and found that there are fundamental differences in the topology of these networks arising from the various yeast HTP datasets. These findings, combined with the data from Figure 1, suggest that at least some of the reduced support for the predicted interactions may be due to the differences in functional categories of the respective interaction networks, as well as the purification techniques that may bias towards transient or stable complexes. In addition, greater annotation of the human proteins will lead to increased support for the predicted interactions. For instance, between Build 44.0 and 45.0 of Swiss-Prot, support for the predicted interactions through domains increased from 3.1 to 5.6%.
Gene co-expression
Several studies have suggested that gene co-expression provides evidence for protein interactions (Deane et al., 2002; Ge et al., 2001; Kemmeren et al., 2002). We used the human GeneAtlas data (Su et al., 2004), derived from 79 normal human tissues, to provide evidence of PPIs through gene co-expression. The cutoff for significance of co-expression was found to correspond to a Pearson correlation = 0.607. GeneAtlas contains gene-expression data for both proteins in the interaction for 85.0% of LIT PPIs, with 9.0% significantly coexpressed. This compares with 86.2% of the OPHID interactions that have expression data, of which 17.3% are statistically significant. The most highly coexpressed protein pairs in the OPHID set involve ribosomal and proteasomal subunits, which show Pearson correlations >0.90. This finding indicates not only the presence of known stable complexes, but also that the gene co-expression of these complexes is conserved from yeast to humans (Jansen et al., 2002).
GO terms
Traditional approaches using GO terms to validate PPIs have employed the Jaccard similarity metric, which looks for co-occurring terms (Bader et al., 2004). This approach works well for highly annotated proteins, such as those found in yeast; however, human proteins do not share this level of annotation. Further, this method fails to take into account the depth within the GO tree of the overlapping terms, where deeper terms imply greater specificity (weight). We therefore used a modified semantic similarity measure described in Lord et al. (2003) (see Systems and Methods section).
The LIT set had a semantic similarity score for 76.9% of the PPIs, with 19.6% of these being significant (Fig. 2). The OPHID set, with a larger fraction of hypothetical and unannotated proteins, had a semantic similarity score for 58.2% of the PPIs, with 12.0% of these being significant. As the annotation of human proteins increases, we expect that support from GO similarity will increase, as was observed for domain support.
Measuring reliability by combined evidence
For the LIT interactions, 99.2% have at least one piece of evidence present (i.e. at least one of domains, expression data or GO terms for both proteins). Of these, 42.5% have evidence that is statistically significant. If the same number of interactions is chosen at random from the same set of proteins (to maintain similar levels of annotation), 10.1% of the randomized interactions are significant. For LIT interactions that have two or more pieces of evidence (92.9%), 15.9% are significant, indicating that ~16% of the known human PPIs are supported by at least two of these evidence types. This compares favorably to the 0.7% that are significant in the randomized network. While it would not be expected that all interactions would be supported by all evidence types, 16% is likely a lower limit on the number that may be supported in future. There are still more than 23% of the known interactions without related GO terms and many others with few terms present.
In the OPHID dataset, 23.0% of the predicted interactions have at least one significant piece of supporting evidence and 5.7% have ≥2 statistically significant pieces of evidence. This compares with 9.3 and 0.6% for the matching randomized non-interacting set (P < 0.05). Since there are 23 889 predicted PPIs, 5483 PPIs have some evidence (at least one type) and 1364 have ≥2 pieces of supporting evidence.
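The tally behind these figures can be sketched as follows, assuming each interaction carries three boolean flags (significant domain pair, significant co-expression, significant GO similarity). The function name and input layout are illustrative only.

```python
from typing import Iterable, Tuple


def tally_support(
    evidence: Iterable[Tuple[bool, bool, bool]],
) -> Tuple[int, int]:
    """Count interactions with >=1 and with >=2 significant evidence types.

    Each item holds three flags: (domain, co-expression, GO similarity).
    """
    at_least_one = 0
    at_least_two = 0
    for flags in evidence:
        n_significant = sum(flags)
        if n_significant >= 1:
            at_least_one += 1
        if n_significant >= 2:
            at_least_two += 1
    return at_least_one, at_least_two


# Consistency check of the reported counts against the reported percentages.
TOTAL_PREDICTED = 23889
pct_one = round(100 * 5483 / TOTAL_PREDICTED, 1)   # share with >=1 evidence type
pct_two = round(100 * 1364 / TOTAL_PREDICTED, 1)   # share with >=2 evidence types
```

With the counts given in the text, `pct_one` evaluates to 23.0 and `pct_two` to 5.7, matching the percentages quoted above.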
Evaluating the model organism source datasets
To examine the reliability of the model organism data, we have broken down the support for the interactions according to the source of the prediction. Figure 3A shows the breakdown of the percentage of original interactions that were supported by at least two types of evidence. Not surprisingly, two of the Riken (M.musculus) datasets (Suzuki et al., 2001; 2003) showed the highest support, since they are LIT interactions mapped from mouse to humans. This was also expected, as mice are closer evolutionarily to humans than S.cerevisiae, C.elegans or D.melanogaster, with 99% of the mouse genes having a human homolog and 80% having 1 : 1 human orthologs (Waterston et al., 2002). The next most reliable dataset is the INTEROLOG subset mapped from C.elegans. This subset includes interactions that were mapped from S.cerevisiae to C.elegans and then to humans, and thus likely represents a group of highly conserved protein interactions. The C.elegans LITERATURE set is similar to the Riken data, in that it was derived from small-scale published experiments and is therefore of higher quality. The MIPS, high and medium confidence datasets are derived from yeast, but represent the highest quality interactions in yeast, elucidated by multiple experiments. Finally, the remaining C.elegans (CORE_1, CORE_2, NON_CORE) and D.melanogaster (FlyHigh, FlyLow) Y2H experiments appear to be the least reliable source, which is not surprising given the inherent inaccuracy of Y2H (Sprinzak et al., 2003).
Figure 3B shows the number of interactions that have two or more types of supporting evidence, albeit not statistically significant. These graphs are not reciprocals, as interactions having only one supporting evidence type are not included. However, Figure 3B shows trends similar to those seen in Figure 3A, e.g. the C.elegans CORE and D.melanogaster datasets appear to be the least accurate.
OPHID web interface
OPHID has been designed not only to aid the prediction of novel PPIs, but also to provide a regularly updated and expanded dataset that is easily accessible and can be used both to further small-scale experiments and to support large-scale bioinformatics efforts. Thus, OPHID has been made available as a web-accessible database, where queries can be entered using a single identifier or by large batch queries using a variety of ID types (Genbank, Swiss-Prot, Unigene, LocusLink, etc.). The entire dataset can be downloaded as a tab-delimited text file or in the PSI-compliant XML format (Hermjakob et al., 2004). The OPHID interface contains a Java-based viewer to display the resulting PPI networks, which allows for the expansion of the search based on selected nodes in the graph or saving the visualized networks as either JPEG or SVG files.
| DISCUSSION |
|---|
One goal of the many proteomics projects published to date has been to map the PPI networks that exist in the respective organisms and thus determine the interactions that govern normal cell function. OPHID was designed to utilize this model organism interaction data in order to rapidly extend our knowledge of the human interactome. Only recently have LIT databases of human interactions begun to catch up with those devoted to model organisms, but while these are highly useful resources that improve access to the human interactome, these databases only recapitulate the known interactions published in the literature. Although HTP experiments are being performed on increasingly complex organisms, to date, few have been performed on mouse or humans.
Given the combinatorial explosion in the mouse and human interactomes that will surely emanate from the 20 000 to 25 000 genes in the genomes (International Human Genome Sequencing Consortium, 2004) (compared to 6000 in S.cerevisiae, 22 000 in C.elegans and 13 500 in D.melanogaster), it is unlikely that the higher eukaryote interactomes will be fully covered by experimental means in the near future. Thus, model organism interactomes must be used to gain insight into the human interaction networks and to begin using the resulting network to explore normal and disease processes in the near term. Further, this provides an opportunity for functional annotation of human and mouse proteins (currently 27 939 human proteins lack GO terms in Swiss-Prot Build 45.0) and provides a means for studying evolutionary conservation of important subnetworks in PPI datasets.
OPHID provides predictions of ~24 000 PPIs, many of which we have supported with additional evidence. The database can be used in several ways. First, as a model of the human interactome, it can be used to explore known pathways, add new proteins to existing pathways or develop novel pathways altogether. Second, OPHID may be used as an aid in designing new PPI experiments by indicating whether orthologous proteins have been reported to interact in other organisms. Third, the data within OPHID can be integrated with additional datasets (e.g. expression data from disease profiles, OMIM data on disease-related proteins) to reveal new protein interactions and pathways that may be involved in human disease (Barrios-Rodiles et al., 2005). As new PPI datasets become available, they are being incorporated into OPHID; thus, OPHID will continue to represent an up-to-date, valuable resource for experiment planning.
Homology-based approaches to predicting PPIs may contain some inaccuracies (Deane et al., 2002; Matthews et al., 2001) depending on the filtering criteria used. For example, in mapping S.cerevisiae interactions to C.elegans, Matthews et al. (2001) were only able to reproduce 16–31% of the predicted interactions in a Y2H system. In this experiment, the method of mapping interactions was to consider only the best matching C.elegans homolog for each S.cerevisiae protein. The reciprocal best match approach that we have used (System and Methods section) provides a more stringent mapping between orthologous proteins. While providing a lower coverage of the potential interactome, this method provides better accuracy in the predicted interactions (Yu et al., 2004).
Other groups have used InParanoid to predict human PPIs (Lehner and Fraser, 2004) rather than the reciprocal best-hit approach. Using our semantic similarity measure, only 13.7% of interactions in the Lehner dataset are supported, while OPHID has 20.6% supported interactions (considering only those PPIs with GO terms). The reciprocal best-hit approach thus has more in silico support, which suggests greater accuracy than the InParanoid-based predictions.
Our additional evidence currently supports 23% of the predicted PPIs. This figure is influenced by limitations in the domain network and sparse GO annotation of the human proteins, and therefore likely represents a lower limit to the interaction support. Further, it has been suggested that only 66% of previously known PPIs may show co-expression at the mRNA level (Kemmeren et al., 2002). Therefore, a lack of in silico validation does not necessarily indicate that the interaction is less reliable, but may simply be due to the lower level of annotation of human proteins to date. Despite these challenges, OPHID provides a sizable number of novel PPIs supported by in silico evidence.
In building OPHID, we chose to include the entire von Mering dataset (von Mering et al., 2002), which consists of high, medium and low confidence subsets. The protein complexes in this dataset were connected in an all-to-all (matrix) fashion. While the matrix model has been shown to be less accurate than the spoke model (Bader and Hogue, 2002), the decision to include this data in its entirety was based on providing the largest possible coverage of the human interactome and then filtering at a later time by using supporting evidence. Although the low confidence subset contains fewer supportable interactions relative to the high and medium subsets (Fig. 3B), it is important to note that the results are comparable to the most reliable experimental C.elegans interactions (CORE_1, CORE_2) or the D.melanogaster Y2H interactions.
OPHID users can easily filter out less reliable interactions and include only the highest quality interaction data in their subsequent analysis, bearing in mind that reducing the false-positive rate increases the false-negative rate. We believe that there are numerous reliable (supportable) interactions to be gained by including the low quality data from each of these subsets (yeast low, NON_CORE and FlyLow) and we have indeed found many mapped interactions from these subsets that appear to be reliable human interactions.
| FUTURE DIRECTIONS |
|---|
OPHID will continue to grow as new interaction datasets become available and additional evidence will continue to be sought. We expect the in silico evidence for the OPHID interactions to improve in parallel with the annotation of human proteins. Additionally, including metrics such as coevolution can help reinforce the relatedness of the individual predicted interactions (Tan et al., 2004). Ultimately, a machine classifier will be developed to provide a unified confidence score for the OPHID interactions that will allow users an additional means of filtering the predicted protein interactions.
| Acknowledgments |
|---|
The authors thank R. Lu and D. Otasek for software development. We acknowledge the hardware and software support from IBM Life Sciences through a Shared University Research Grant and support from the National Science and Engineering Research Council (RGPIN 203833-02), the Institute for Robotics and Intelligent Systems, Precarn Inc, National Institutes of Health (#P50-GM62413), Fashion Show and Younger Foundations change.
| Footnotes |
|---|
Note: DIP is only used internally for analysis. It is not reproduced on the OPHID website due to copyright restrictions.
Received on September 23, 2004; revised on January 10, 2005; accepted on January 11, 2005
| REFERENCES |
|---|
International Human Genome Sequencing Consortium. (2004) Finishing the euchromatic sequence of the human genome. Nature, 431, 931945[CrossRef][Medline].
Bader, G.D. and Hogue, C.W. (2002) Analyzing yeast proteinprotein interaction data obtained from different sources. Nat. Biotechnol, 20, 991997[CrossRef][Web of Science][Medline].
Bader, J.S., Chaudhuri, A., Rothberg, J.M., Chant, J. (2004) Gaining confidence in high-throughput protein interaction networks. Nat. Biotechnol, 22, 7885[CrossRef][Web of Science][Medline].
Barrios-Rodiles, M., Brown, K.R., Ozdamar, B., Liu, Z., Donovan, R.S., Shinfo, F., Liu, Y., Bose, R., Dembowy, J.R. (2005) High-Throughput Mapping of a Dynamic Signalling Network In Mammalian Cells. Science, in press.
Betel, D., Isserlin, R., Hogue, C.W. (2004) Analysis of domain correlations in yeast protein complexes. Bioinformatics, 20, Suppl 1, SI55SI62.
Bowers, P.M., Pellegrini, M., Thompson, M.J., Fierro, J., Yeates, T.O., Eisenberg, D. (2004) Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol, 5, R35[CrossRef][Medline].
Colland, F., Jacq, X., Trouplin, V., Mougin, C., Groizeleau, C., Hamburger, A., Meil, A., Wojcik, J., Legrain, P., Gauthier, J.M. (2004) Functional proteomics mapping of a human signaling pathway. Genome Res, 14, 13241332
Deane, C.M., Salwinski, L., Xenarios, I., Eisenberg, D. (2002) Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol. Cell. Proteomics, 1, 349356
Deng, M., Mehta, S., Sun, F., Chen, T. (2002) Inferring domaindomain interactions from proteinprotein interactions. Genome Res, 12, 15401548
Deng, M., Sun, F., Chen, T. (2003) Assessment of the reliability of protein–protein interactions and protein function prediction. Pac. Symp. Biocomput., 140–151.
Gavin, A.-C., Bösche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J., Michon, A.-M., Cruciat, C., et al. (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415, 141–147.
Ge, H., Liu, Z., Church, G.M., Vidal, M. (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet., 29, 482–486.
Giot, L., Bader, J.S., Brouwer, C., Chaudhuri, A., Kuang, B., Li, Y., Hao, Y.L., Ooi, C.E., Godwin, B., Vitols, E., et al. (2003) A protein interaction map of Drosophila melanogaster. Science, 302, 1727–1736.
Han, K., Park, B., Kim, H., Hong, J., Park, J. (2004) HPID: the human protein interaction database. Bioinformatics, 20, 2466–2470.
Hermjakob, H., Montecchi-Palazzi, L., Bader, G., Wojcik, J., Salwinski, L., Ceol, A., Moore, S., Orchard, S., Sarkans, U., von Mering, C., et al. (2004) The HUPO PSI's molecular interaction format – a community standard for the representation of protein interaction data. Nat. Biotechnol., 22, 177–183.
Ho, Y., Gruhler, A., Heilbut, A., Bader, G.D., Moore, L., Adams, S.L., Millar, A., Taylor, P., Bennett, K., Boutilier, K., et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 415, 180–183.
Huang, T.-W., Tien, A.-C., Huang, W.-S., Lee, Y.C.G., Peng, C.-L., Tseng, H.-H., Kao, C.-Y., Huang, C.-Y.F. (2004) POINT: a database for the prediction of protein–protein interactions based on the orthologous interactome. Bioinformatics, 20, 3273–3276.
Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., Sakaki, Y. (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA, 98, 4569–4574.
Jansen, R., Greenbaum, D., Gerstein, M. (2002) Relating whole-genome expression data with protein–protein interactions. Genome Res., 12, 37–46.
Kemmeren, P., van Berkum, N.L., Vilo, J., Bijma, T., Donders, R., Brazma, A., Holstege, F.C.P. (2002) Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol. Cell, 9, 1133–1143.
Lehner, B. and Fraser, A.G. (2004) A first-draft human protein-interaction map. Genome Biol., 5, R63.61–R63.69.
Lehner, B., Semple, J.I., Brown, S.E., Counsell, D., Campbell, R.D., Sanderson, C.M. (2004) Analysis of a high-throughput yeast two-hybrid system and its use to predict the function of intracellular proteins encoded within the human MHC class III region. Genomics, 83, 153–167.
Li, S., Armstrong, C.M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P.O., Han, J.D., Chesneau, A., Hao, T., et al. (2004) A map of the interactome network of the metazoan C. elegans. Science, 303, 540–543.
Lord, P.W., Stevens, R.D., Brass, A., Goble, C.A. (2003) Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics, 19, 1275–1283.
Luc, P.V. and Tempst, P. (2004) PINdb: a database of nuclear protein complexes from human and yeast. Bioinformatics, 20, 1413–1415.
Matthews, L.R., Vaglio, P., Reboul, J., Ge, H., Davis, B.P., Garrels, J., Vincent, S., Vidal, M. (2001) Identification of potential interaction networks using sequence-based searches for conserved protein–protein interactions or interologs. Genome Res., 11, 2120–2126.
Mellor, J.C., Yanai, I., Clodfelter, K.H., Mintseris, J., DeLisi, C. (2002) Predictome: a database of putative functional links between proteins. Nucleic Acids Res., 30, 306–309.
Pagel, P., Mewes, H.W., Frishman, D. (2004) Conservation of protein–protein interactions – lessons from ascomycota. Trends Genet., 20, 72–76.
Peri, S., Navarro, J.D., Amanchy, R., Kristiansen, T.Z., Jonnalagadda, C.K., Surendranath, V., Niranjan, V., Muthusamy, B., Gandhi, T.K.B., Gronborg, M., et al. (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res., 13, 2363–2371.
Sprinzak, E. and Margalit, H. (2001) Correlated sequence-signatures as markers of protein–protein interaction. J. Mol. Biol., 311, 681–692.
Sprinzak, E., Sattath, S., Margalit, H. (2003) How reliable are experimental protein–protein interaction data? J. Mol. Biol., 327, 919–923.
Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.
Suzuki, H., Fukunishi, Y., Kagawa, I., Saito, R., Oda, H., Endo, T., Kondo, S., Bono, H., Okazaki, Y., Hayashizaki, Y. (2001) Protein–protein interaction panel using mouse full-length cDNAs. Genome Res., 11, 1758–1765.
Suzuki, H., Saito, R., Kanamori, M., Kai, C., Schonbach, C., Nagashima, T., Hosaka, J., Hayashizaki, Y. (2003) The mammalian protein–protein interaction database and its viewing system that is linked to the main FANTOM2 viewer. Genome Res., 13, 1534–1541.
Tan, S.H., Zhang, Z., Ng, S.K. (2004) ADVICE: automated detection and validation of interaction by co-evolution. Nucleic Acids Res., 32, W69–W72.
Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al. (2000) A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature, 403, 623–627.
von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., Snel, B. (2003) STRING: a database of predicted functional associations between proteins. Nucleic Acids Res., 31, 258–261.
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P. (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417, 399–403.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexanderson, M., An, P., et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.
Wojcik, J. and Schachter, V. (2001) Protein–protein interaction map inference using interacting domain profile pairs. Bioinformatics, 17, S296–S305.
Wuchty, S., Oltvai, Z.N., Barabasi, A.L. (2003) Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat. Genet., 35, 176–179.
Xenarios, I., Rice, D.W., Salwinski, L., Baron, M.K., Marcotte, E.M., Eisenberg, D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res., 28, 289–291.
Yu, H., Luscombe, N.M., Lu, H.X., Zhu, X., Xia, Y., Han, J.D., Bertin, N., Chung, S., Vidal, M. (2004) Annotation transfer between genomes: protein–protein interologs and protein–DNA regulogs. Genome Res., 14, 1107–1118.
Zanzoni, A., Montecchi-Palazzi, L., Quondam, M., Ausiello, G., Helmer-Citterich, M., Cesareni, G. (2002) MINT: a Molecular INTeraction database. FEBS Lett., 513, 135–140.
This article has been cited by other articles:
H. Fukasawa, S. Bornheimer, K. Kudlicka, and M. G. Farquhar. Slit Diaphragms Contain Tight Junction Proteins. J. Am. Soc. Nephrol., July 1, 2009; 20(7): 1491–1503.
H. Blankenburg, F. Ramirez, J. Buch, and M. Albrecht. DASMIweb: online integration, analysis and assessment of distributed protein interaction data. Nucleic Acids Res., July 1, 2009; 37(suppl_2): W122–W128.
H. Blankenburg, R. D. Finn, A. Prlic, A. M. Jenkinson, F. Ramirez, D. Emig, S.-E. Schelhorn, J. Buch, T. Lengauer, and M. Albrecht. DASMI: exchanging, annotating and assessing molecular interaction data. Bioinformatics, May 15, 2009; 25(10): 1321–1328.
N. Tuncbag, G. Kar, O. Keskin, A. Gursoy, and R. Nussinov. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief Bioinform, May 1, 2009; 10(3): 217–232.
J. Wilflingseder, A. Kainz, P. Perco, R. Korbely, B. Mayer, and R. Oberbauer. Molecular predictors for anaemia after kidney transplantation. Nephrol. Dial. Transplant., March 1, 2009; 24(3): 1015–1023.
G. Chaurasia, S. Malhotra, J. Russ, S. Schnoegl, C. Hanig, E. E. Wanker, and M. E. Futschik. UniHI 4: new tools for query, analysis and visualization of the human protein-protein interactome. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D657–D660.
M. D. McDowall, M. S. Scott, and G. J. Barton. PIPs: human protein-protein interaction prediction database. Nucleic Acids Res., January 1, 2009; 37(suppl_1): D651–D656.
Y. J. Huang, D. Hang, L. J. Lu, L. Tong, M. B. Gerstein, and G. T. Montelione. Targeting the Human Cancer Pathway Protein Interaction Network by Structural Genomics. Mol. Cell. Proteomics, October 1, 2008; 7(10): 2048–2060.
M. Michaut, S. Kerrien, L. Montecchi-Palazzi, F. Chauvat, C. Cassier-Chauvat, J.-C. Aude, P. Legrain, and H. Hermjakob. InteroPORC: automated inference of highly conserved protein interaction networks. Bioinformatics, July 15, 2008; 24(14): 1625–1631.
A. Ozgur, T. Vu, G. Erkan, and D. R. Radev. Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics, July 1, 2008; 24(13): i277–i285.
L. Li, C. Wu, H. Huang, K. Zhang, J. Gan, and S. S.-C. Li. Prediction of phosphotyrosine signaling networks using a scoring matrix-assisted ligand identification approach. Nucleic Acids Res., June 1, 2008; 36(10): 3263–3273.
J. Geisler-Lee, N. O'Toole, R. Ammar, N. J. Provart, A. H. Millar, and M. Geisler. A Predicted Interactome for Arabidopsis. Plant Physiology, October 1, 2007; 145(2): 317–329.
A. Schlicker, C. Huthmacher, F. Ramirez, T. Lengauer, and M. Albrecht. Functional evaluation of domain–domain interactions and human protein interaction networks. Bioinformatics, April 1, 2007; 23(7): 859–865.
M. E. Futschik, G. Chaurasia, and H. Herzel. Comparison of human protein–protein interaction maps. Bioinformatics, March 1, 2007; 23(5): 605–611.
N. Przulj. Biological network comparison using graphlet degree distribution. Bioinformatics, January 15, 2007; 23(2): e177–e183.
G. Chaurasia, Y. Iqbal, C. Hanig, H. Herzel, E. E. Wanker, and M. E. Futschik. UniHI: an entry gate to the human protein interactome. Nucleic Acids Res., January 12, 2007; 35(suppl_1): D590–D594.
J. Xu and Y. Li. Discovering disease-genes by topological features in human protein-protein interaction network. Bioinformatics, November 15, 2006; 22(22): 2800–2805.
R. A. George, J. Y. Liu, L. L. Feng, R. J. Bryson-Richardson, D. Fatkin, and M. A. Wouters. Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucleic Acids Res., November 14, 2006; 34(19): e130.
N. Przulj and D. J. Higham. Modelling protein-protein interaction networks via a stickiness index. J R Soc Interface, October 22, 2006; 3(10): 711–716.
P. F. Jonsson and P. A. Bates. Global topological features of cancer proteins in the human interactome. Bioinformatics, September 15, 2006; 22(18): 2291–2297.
X. Guo, R. Liu, C. D. Shriver, H. Hu, and M. N. Liebman. Assessing semantic similarity measures for the characterization of human regulatory pathways. Bioinformatics, April 15, 2006; 22(8): 967–973.
S. Wachi, K. Yoneda, and R. Wu. Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics, December 1, 2005; 21(23): 4205–4208.
M. E. Cusick, N. Klitgord, M. Vidal, and D. E. Hill. Interactome: gateway into systems biology. Hum. Mol. Genet., October 15, 2005; 14(suppl_2): R171–R181.