Bioinformatics Advance Access originally published online on January 18, 2005
Bioinformatics 2005 21(9):2076-2082; doi:10.1093/bioinformatics/bti273

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Online Predicted Human Interaction Database

Kevin R. Brown 1,2 and Igor Jurisica 1,2,3,*

1 Division of Cancer Informatics, Ontario Cancer Institute, University of Toronto, Toronto, Ontario, Canada
2 Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
3 Department of Computer Science, University of Toronto, Toronto, Ontario, Canada

*To whom correspondence should be addressed.


    Abstract

Motivation: High-throughput experiments are being performed at an ever-increasing rate to systematically elucidate protein–protein interaction (PPI) networks for model organisms, while the complexity of higher eukaryotes has so far limited such experiments in humans.

Results: The Online Predicted Human Interaction Database (OPHID) is a web-based database of predicted interactions between human proteins. It combines the literature-derived human PPIs from BIND, HPRD and MINT with predictions made from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Mus musculus. The 23 889 predicted interactions currently listed in OPHID are evaluated using protein domains, gene co-expression and Gene Ontology terms. OPHID can be queried using single or multiple IDs, and results can be visualized using our custom graph visualization program.

Availability: Freely available to academic users at http://ophid.utoronto.ca, both in tab-delimited and PSI-MI formats. Commercial users, please contact I.J.

Contact: juris{at}ai.utoronto.ca

Supplementary information: http://ophid.utoronto.ca/supplInfo.pdf


    INTRODUCTION
The network of protein–protein interactions (PPIs), referred to as the interactome, forms a backbone of signaling pathways, metabolic pathways and cellular processes required for normal cell function. Complete knowledge of these pathways will help in understanding the normal processes in the cell, as well as how diseases such as cancer develop from mutation of individual pathway components. It has been the central aim of many high-throughput (HTP) experiments to elucidate the PPI networks in model organisms such as Saccharomyces cerevisiae (Gavin et al., 2002; Ho et al., 2002; Ito et al., 2001; Uetz et al., 2000), Caenorhabditis elegans (Li et al., 2004), Drosophila melanogaster (Giot et al., 2003) and Mus musculus (Suzuki et al., 2003). While few studies have been performed in humans (Colland et al., 2004; Lehner et al., 2004), we have used the HTP model organism interactions to infer some of the millions of potential human PPIs.

Many databases are devoted to the human interactome, with a substantial number of them appearing in recent months [DIP, HPID, HPRD, MINT, PINdb (Han et al., 2004; Luc and Tempst, 2004; Peri et al., 2003; Xenarios et al., 2000; Zanzoni et al., 2002)]. However, the majority of these databases are derived from hand-curated, literature-based interactions. Although highly useful in providing ready access to the known human interactions, they do little to expand the knowledge of the interactome. Several databases have also been published that make predictions about the functional relationships between proteins based on a variety of in silico methods (Predictome, STRING, Prolinks, POINT) (Bowers et al., 2004; Huang et al., 2004; Mellor et al., 2002; von Mering et al., 2003).

The Online Predicted Human Interaction Database (OPHID) was designed to extend the human interactome using model organism data and to provide a repository for already known, experimentally derived human PPIs. While these predictions should be thought of as hypotheses until experimentally validated, there is increasing evidence that PPIs are conserved through evolution (Pagel et al., 2004; Wuchty et al., 2003). OPHID catalogs 16 034 known human PPIs obtained from BIND, MINT and HPRD, and makes predictions for 23 889 additional interactions.

Multiple types of evidence have been used in the literature both to support experimentally derived PPIs and to predict interactions in silico. Examples include domain–domain co-occurrence (Deng et al., 2002; Sprinzak and Margalit, 2001), gene co-expression (Bader et al., 2004; Deane et al., 2002; Deng et al., 2003) and Gene Ontology (GO) terms (Bader et al., 2004; Sprinzak et al., 2003). Combining the three types of evidence allows us to support a broader range of PPIs than any single method.

We have applied all three evidence types to OPHID, providing support for 5483 (23%) of our predicted PPIs. We believe that OPHID will be a useful resource for researchers concerned with the human interactome, especially when integrated with additional HTP datasets that are likely to be available in the future.


    SYSTEM AND METHODS
OPHID generation
OPHID was constructed by mapping model organism PPIs to human protein orthologs using BLASTP and the reciprocal best-hit approach. Briefly, a database of model organism-to-human orthologs was constructed by BLASTing each model organism protein against the Swiss-Prot database filtered for human proteins. Each top BLAST hit with an E-value < 10⁻⁵ was BLASTed back against the set of all model organism protein sequences. If the top hit in the reverse direction (with E-value < 10⁻⁵) matched the original query protein, the matching human protein was selected as a potential ortholog. These were then filtered to remove any hits that covered <50% of the query sequence length, to avoid interactions that may involve a single protein domain.
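A minimal Python sketch of the reciprocal best-hit test described above, assuming the BLAST searches have already been run and parsed into per-query hit lists. The `Hit` record, its fields and the example identifiers are hypothetical, not OPHID's actual code or data. The 10⁻⁵ E-value cutoff and the 50% query-coverage filter come from the text; ranking hits by E-value alone is a simplifying assumption (bit score is a common alternative).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

E_VALUE_CUTOFF = 1e-5     # "E-value < 10^-5" in both directions
MIN_QUERY_COVERAGE = 0.5  # drop hits covering < 50% of the query length


@dataclass(frozen=True)
class Hit:
    subject: str           # ID of the matched sequence
    evalue: float          # BLAST E-value of the alignment
    query_coverage: float  # fraction of the query length covered by the alignment


def top_hit(hits: List[Hit]) -> Optional[Hit]:
    """Lowest-E-value hit that passes the cutoff, or None if no hit passes."""
    passing = [h for h in hits if h.evalue < E_VALUE_CUTOFF]
    return min(passing, key=lambda h: h.evalue) if passing else None


def reciprocal_best_hits(forward: Dict[str, List[Hit]],
                         reverse: Dict[str, List[Hit]]) -> Dict[str, str]:
    """Map model-organism proteins to human orthologs by reciprocal best hit.

    forward: model protein -> hits against human Swiss-Prot entries
    reverse: human protein -> hits against all model-organism proteins
    """
    orthologs = {}
    for model_protein, hits in forward.items():
        best = top_hit(hits)
        # The coverage filter is applied to the selected forward hit.
        if best is None or best.query_coverage < MIN_QUERY_COVERAGE:
            continue
        back = top_hit(reverse.get(best.subject, []))
        if back is not None and back.subject == model_protein:
            orthologs[model_protein] = best.subject
    return orthologs
```

A protein is kept only when the human hit "points back" to it; a reverse search whose best match is a different (e.g. paralogous) model protein rejects the pair.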

Each model organism protein was translated to its human ortholog and a predicted human interaction was added if both proteins in the model organism interaction were conserved in humans. Model organism PPIs were added from S.cerevisiae, C.elegans, D.melanogaster and M.musculus using this technique. For a complete listing of data sources and references, refer to Table 1.
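Once an ortholog table exists, translating interactions is a dictionary lookup. The sketch below is an illustration under stated assumptions, not the OPHID pipeline: `map_interactions` and its input layout are hypothetical. It keeps a predicted human pair only when both partners have an ortholog, stores pairs as sorted tuples so that A–B and B–A collapse into one prediction, and records which source organisms support each prediction, which is one way to count interactions derived from more than one model organism.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def map_interactions(
    sources: Dict[str, Tuple[List[Pair], Dict[str, str]]]
) -> Dict[Pair, Set[str]]:
    """sources: organism -> (model-organism PPIs, model protein -> human ortholog).

    Returns each predicted human PPI (as a sorted tuple) with the set of
    organisms it was mapped from.
    """
    predicted: Dict[Pair, Set[str]] = defaultdict(set)
    for organism, (ppis, orthologs) in sources.items():
        for a, b in ppis:
            ha, hb = orthologs.get(a), orthologs.get(b)
            if ha is None or hb is None:
                continue  # at least one partner is not conserved in humans
            predicted[tuple(sorted((ha, hb)))].add(organism)
    return dict(predicted)
```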


Table 1 Model organism protein–protein interactions in OPHID

 
Domain co-occurrence dataset generation
The literature-derived PPIs from BIND, DIP¹, HPRD and MINT were used to create a domain–domain co-occurrence network using the InterPro domains obtained from Swiss-Prot. For every interacting protein pair, each domain of protein A was connected to each domain of protein B. The frequency of these domain pairs was determined for all interacting protein pairs (n = 16 107), as well as for all non-interacting pairs (i.e. all protein pairs not reported to interact in BIND, DIP, MINT or HPRD; n = 1.8 × 10⁷). The hypergeometric distribution was used to determine which domain pairs were enriched in interacting protein pairs compared with non-interacting pairs. After applying the Bonferroni correction for multiple testing, 4182 domain–domain pairs involving 1164 domains were identified with P < 9.2 × 10⁻⁷.
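The enrichment test can be sketched with the hypergeometric upper tail computed exactly from binomial coefficients. In this sketch the "population" is all interacting plus non-interacting protein pairs, the "successes" are the interacting pairs, and each domain pair is tested on the protein pairs in which it occurs. This framing, the function names and α = 0.05 are assumptions for illustration, since the exact contingency set-up is not given in the text.

```python
from math import comb
from typing import Dict, Tuple


def hypergeom_sf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws), computed exactly."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(total - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(total, draws)


def enriched_domain_pairs(counts: Dict[Tuple[str, str], Tuple[int, int]],
                          n_interacting: int, n_noninteracting: int,
                          alpha: float = 0.05) -> Dict[Tuple[str, str], float]:
    """counts: domain pair -> (interacting pairs containing it,
    non-interacting pairs containing it).

    Returns the domain pairs that pass a Bonferroni-corrected threshold,
    with their uncorrected P-values.
    """
    if not counts:
        return {}
    total = n_interacting + n_noninteracting
    threshold = alpha / len(counts)  # Bonferroni: one test per domain pair
    significant = {}
    for pair, (k_int, k_non) in counts.items():
        p = hypergeom_sf(k_int, total, n_interacting, k_int + k_non)
        if p < threshold:
            significant[pair] = p
    return significant
```

Exact integer arithmetic avoids underflow in the tail sum; for very large populations a library survival function (e.g. SciPy's) would be faster.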

Co-expression dataset
Human gene expression data were obtained from the GeneAtlas Affymetrix dataset, which includes expression data for 44 775 human genes from 79 normal human tissues (Su et al., 2004). Gene co-expression was determined as the Pearson correlation coefficient between the expression vectors (across the 79 tissues) of the genes encoding the two proteins in each interaction.
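A minimal sketch, assuming each gene is represented by its vector of expression values across the GeneAtlas tissues: co-expression then reduces to a Pearson correlation compared against the 0.607 cutoff reported under Background distributions. The function names are hypothetical, and handling of missing values or multiple probes per gene is not shown.

```python
from math import sqrt
from typing import Sequence

COEXPRESSION_CUTOFF = 0.607  # Pearson cutoff reported in the text


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def coexpressed(x: Sequence[float], y: Sequence[float],
                cutoff: float = COEXPRESSION_CUTOFF) -> bool:
    """True if the two genes' profiles correlate at or above the cutoff."""
    return pearson(x, y) >= cutoff
```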

GO term similarity measure
We used a modification of the semantic similarity measure of Lord et al. (2003) to determine the relatedness of each interacting protein pair. The semantic similarity method examines the frequency with which each GO term appears in Swiss-Prot annotations of human proteins and assigns a higher score to terms that appear less frequently (i.e. have greater ‘information content’). For example, non-specific terms such as the top-level ‘molecular_function’ (GO:0003674) provide little information about the relatedness of two proteins, which is reflected in an annotation probability of p = 1.0. In contrast, more descriptive terms such as ‘translation regulator activity’ (p = 0.0048) or ‘chaperone activity’ (p = 0.0052) have greater information content, as they are used less frequently to describe human proteins and are potentially more specific for function. The GO similarity of an interacting pair was defined as the maximum semantic similarity over the set of all GO term pairs between the two proteins. See Supplementary information for a complete example.
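The information-content idea behind the Lord et al. (2003) measure can be sketched as follows. Each term's probability p is the fraction of annotated proteins carrying that term or any descendant, and its information content is -ln p. Two terms score the largest information content among their shared ancestors, and two proteins score the maximum over all their term pairs. The text describes a modification of this measure whose details are in the Supplementary information, so this is the unmodified, Resnik-style form. The natural logarithm, the function names and the toy GO graph used in testing are assumptions.

```python
from math import log
from typing import Dict, Iterable, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """The term itself plus every ancestor reachable via `parents` (term -> parent terms)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities(annotations: Dict[str, Set[str]],
                       parents: Dict[str, Set[str]]) -> Dict[str, float]:
    """p(t): fraction of annotated proteins carrying t or any descendant of t."""
    counts: Dict[str, int] = {}
    annotated = 0
    for terms in annotations.values():
        if not terms:
            continue  # unannotated proteins do not contribute
        annotated += 1
        # A protein annotated with a term implicitly carries all its ancestors.
        for t in set().union(*(ancestors(t0, parents) for t0 in terms)):
            counts[t] = counts.get(t, 0) + 1
    return {t: c / annotated for t, c in counts.items()}


def term_similarity(t1: str, t2: str, parents: Dict[str, Set[str]],
                    prob: Dict[str, float]) -> float:
    """Information content (-ln p) of the most informative shared ancestor."""
    shared = ancestors(t1, parents) & ancestors(t2, parents)
    return max((-log(prob[t]) for t in shared if t in prob), default=0.0)


def go_similarity(terms_a: Iterable[str], terms_b: Iterable[str],
                  parents: Dict[str, Set[str]], prob: Dict[str, float]) -> float:
    """Protein-pair score: maximum term similarity over all term pairs."""
    terms_b = list(terms_b)
    return max((term_similarity(a, b, parents, prob)
                for a in terms_a for b in terms_b), default=0.0)
```

Terms annotated to every protein (such as a root term) score -ln 1 = 0, matching the point above that they carry no information about relatedness.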

Background distributions
Statistically significant cutoffs for domain co-occurrence, gene co-expression and GO term similarity were determined by estimating the background distributions with a bootstrap approach. Briefly, all OPHID PPIs (known and predicted) were randomized 1000 times to produce equivalent-sized random networks. For each metric, the 95th percentile of the scores in each random network was computed, and the mean of these 95th percentiles was chosen as the cutoff. The thresholds for each metric are: domain co-occurrence, one significant domain pair; gene co-expression, Pearson r = 0.607 (see Supplementary information); GO similarity, GOSim = 3.14.
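The randomization step can be sketched as below, under the assumption that a random network is built by drawing the same number of protein pairs uniformly from the same protein set; the text does not say whether node degree was preserved. For each run, the 95th percentile of the metric over pairs that have data is recorded, and the cutoff is the mean of those percentiles. `bootstrap_cutoff` and its `score` callback (returning a metric value, or None when data are missing) are hypothetical names.

```python
import random
from statistics import mean, quantiles
from typing import Callable, Hashable, Iterable, Optional


def bootstrap_cutoff(proteins: Iterable[Hashable], n_ppis: int,
                     score: Callable[[Hashable, Hashable], Optional[float]],
                     n_runs: int = 1000, seed: int = 0) -> float:
    """Mean, over random networks, of the 95th percentile of a pairwise metric."""
    rng = random.Random(seed)
    pool = list(proteins)
    p95s = []
    for _ in range(n_runs):
        scores = []
        for _ in range(n_ppis):
            a, b = rng.sample(pool, 2)  # one random pair of distinct proteins
            s = score(a, b)
            if s is not None:  # skip pairs lacking data for this metric
                scores.append(s)
        if len(scores) >= 2:  # quantiles() needs at least two values
            # With n=20 there are 19 cut points; the last is the 95th percentile.
            p95s.append(quantiles(scores, n=20, method="inclusive")[-1])
    return mean(p95s)
```

The "inclusive" quantile method never extrapolates beyond the observed scores, which keeps the cutoff inside the metric's range even for small samples.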


    IMPLEMENTATION
Databases and software
Known (literature-derived, LIT) human PPIs were acquired from BIND, DIP, HPRD and MINT (see Supplementary information). The data and sequences from Swiss-Prot (v. 45.0) were loaded into our IBM DB2 database (v. 8.1.1.16). Protein sequences for each organism were obtained from the following sources: S.cerevisiae, Yeast Protein Databank (YPD); C.elegans, WormPep; D.melanogaster, FlyBase; M.musculus, Swiss-Prot (see Supplementary information for full versions). A local NCBI BLAST server (v. 2.2.4) was run through IBM's Information Integrator (v. 8.1.1) using the default BLAST settings. GO terms and InterPro domains were gathered from Swiss-Prot. The OPHID web interface and query engine were implemented on an IBM WebSphere web server (v. 5.0.0). All additional processing software was written in Java.


    RESULTS
Protein interaction network
OPHID was generated from a total of 108 867 model organism PPIs, which were mapped to human proteins through orthology. Orthologs were identified using the reciprocal best-hit approach (see System and Methods section). In total, 31.9% of the S.cerevisiae proteins had human orthologs, compared with 39.7% of the D.melanogaster and 21.2% of the C.elegans proteins. Through this orthology database, 23 889 model organism PPIs were mapped to human proteins, providing predictions for interactions that may occur in the human interactome, including 929 that are confirmed human interactions. Seventy-two of the predicted interactions were derived from more than one model organism.

The predicted PPI dataset from OPHID (referred to as the OPHID set hereafter) contains 4552 proteins, 1872 of which are not in the LIT set (6144 proteins). Thus, OPHID extends the human interactome by 1872 proteins that have not yet been included in the literature-derived databases.

Importantly, the two datasets differ substantially in the types of proteins they cover. Figure 1 shows the distribution of functional categories represented in the LIT dataset compared with the interactions in OPHID. The proteins in the LIT dataset are primarily involved in ‘cellular fate and organization’ pathways (29.3%), such as apoptosis, cell cycle regulation and cytoskeletal remodeling, followed by ‘transcription’ (9.8%) and ‘transport and sensing’ (9.0%). Only 19.9% of the proteins in this set are ‘Uncharacterized’, meaning that they lack GO terms in the Swiss-Prot database. In contrast, 29.1% of the proteins in OPHID are ‘Uncharacterized’. OPHID is enriched for proteins involved in ‘energy production’ (2.3% versus 0.9%) and ‘other metabolism’ (6.0% versus 2.8%) compared with the LIT set, while the LIT set is enriched for proteins involved in ‘stress and defense’. These data suggest that the known and predicted interactions complement each other in many GO categories. In addition, linking the uncharacterized proteins, which make up ~30% of OPHID, to known interactions will help provide functional information for these unannotated proteins.



Fig. 1 Distribution of functional categories of proteins within OPHID. The GO terms obtained from Swiss-Prot for each protein within the interaction network were mapped to one of 11 broad categories on a first-matched basis using a custom keyword dictionary. The distributions of protein function are shown for the ‘Known’ PPIs (LIT/BIND, DIP, HPRD and MINT data; 6144 proteins), for the ‘Predicted’ interactions that were mapped from model organisms (4552 proteins) and for all known human proteins within Swiss-Prot (‘HUMAN’; 57 400 proteins). Proteins in the ‘Not matched’ category did not match against the keyword dictionary, while the ‘Uncharacterized’ category represents proteins that lacked any descriptive GO terms.
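The first-match mapping described in the Fig. 1 caption can be sketched as follows. The keyword dictionary below is a hypothetical three-entry stand-in for the unpublished 11-category dictionary, and treating an empty term list as ‘Uncharacterized’ simplifies the caption's "lacked any descriptive GO terms". In this sketch the order of categories in the table decides which match wins.

```python
from typing import List, Sequence, Tuple

# Hypothetical keyword table; the real 11-category dictionary is not reproduced here.
CATEGORY_KEYWORDS: List[Tuple[str, Tuple[str, ...]]] = [
    ("cellular fate and organization", ("apoptosis", "cell cycle", "cytoskeleton")),
    ("transcription", ("transcription",)),
    ("energy production", ("electron transport", "atp synthesis")),
]


def categorize(go_term_names: Sequence[str],
               keyword_table=CATEGORY_KEYWORDS) -> str:
    """Assign a protein to the first category whose keyword occurs in any of its GO term names."""
    if not go_term_names:
        return "Uncharacterized"
    lowered = [name.lower() for name in go_term_names]
    for category, keywords in keyword_table:  # table order = match priority
        if any(k in name for name in lowered for k in keywords):
            return category
    return "Not matched"
```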

 
The use of HTP experiments from model organisms has the potential to introduce false-positive interactions. For example, Sprinzak et al. (2003) suggested that only 50% of yeast Y2H interactions are reliable. Producing a predicted PPI network may compound this problem by including those false positives, as well as potentially creating new ones through the ortholog mapping. To help filter out noisy interactions, we looked for additional supporting evidence in the form of protein domains, gene co-expression and GO terms (see System and Methods section). In essence, this additional evidence provides in silico validation of the OPHID interactions and will help rank the predicted interactions for future experimental confirmation.

Support through domain co-occurrence
The presence of domain pairs has been used extensively to predict de novo protein interactions (Deng et al., 2002; Wojcik and Schachter, 2001), as well as to validate reported interactions (Sprinzak and Margalit, 2001). Here, we used more than 16 000 human PPIs from the LIT dataset to produce a domain co-occurrence network and selected those domain pairs that are significantly enriched in interacting proteins compared with non-interacting pairs (System and Methods section). In the LIT set, 93.0% of the PPIs have at least one domain for each of the proteins in the pair, and 44.3% of those have ≥2 statistically significant domain pairs (Fig. 2). This contrasts with the OPHID dataset, where 92.1% of the PPIs have domain information but only 5.6% of these contain significant domain pairs.
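Applying the domain network to a single interaction can be sketched as counting the distinct significant domain pairs that link the two proteins, ignoring the order of the two domains. Domain IDs such as `D1` are placeholders, the function names are hypothetical, and the minimum number of pairs is a parameter (default 2, following the ≥2 criterion in Fig. 2).

```python
from itertools import product
from typing import FrozenSet, Iterable, Set


def supporting_domain_pairs(domains_a: Iterable[str], domains_b: Iterable[str],
                            significant: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Significant domain pairs linking protein A to protein B (order-independent)."""
    linked = {frozenset((x, y)) for x, y in product(domains_a, domains_b)}
    return linked & significant


def domain_supported(domains_a: Iterable[str], domains_b: Iterable[str],
                     significant: Set[FrozenSet[str]], min_pairs: int = 2) -> bool:
    """True if at least `min_pairs` distinct significant domain pairs link the proteins."""
    return len(supporting_domain_pairs(domains_a, domains_b, significant)) >= min_pairs
```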



Fig. 2 Supporting evidence for known and predicted PPIs. Evidence was gathered to support each of the LIT interactions (BIND, DIP, HPRD and MINT) or OPHID predictions in the form of domain–domain co-occurrence (‘Domains’), gene co-expression (‘Express’) and GO term semantic similarity (‘GO Terms’). For each dataset, the fraction of total PPIs with each evidence type is shown by the white bars. The fraction of total PPIs with significant evidence (≥2 domains, r ≥ 0.607 and GOSim ≥ 3.14, respectively) is indicated in black (n(known) = 16 107 PPIs; n(predicted) = 23 889 PPIs).

 
This difference in domain support is likely due to two factors: (1) the domain network was derived from the LIT dataset, which should lead to higher support for this dataset; (2) differences in the functions of the proteins in the LIT dataset will also be reflected in the types of domains present in this network. The predicted network likely uses somewhat different domains than the LIT set. This is in line with the findings of Betel et al. (2004), who recently assessed domain–domain networks in S.cerevisiae and found fundamental differences in the topology of the networks arising from the various yeast HTP datasets. These findings, combined with the data in Figure 1, suggest that at least some of the reduced support for the predicted interactions may be due to differences in the functional categories of the respective interaction networks, as well as to purification techniques that may bias towards transient or stable complexes. In addition, greater annotation of human proteins should lead to increased support for the predicted interactions. For instance, between Build 44.0 and 45.0 of Swiss-Prot, support for the predicted interactions through domains increased from 3.1 to 5.6%.

Gene co-expression
Several studies have suggested that gene co-expression provides evidence for protein interactions (Deane et al., 2002; Ge et al., 2001; Kemmeren et al., 2002). We used the human GeneAtlas data (Su et al., 2004), derived from 79 normal human tissues, to provide evidence of PPIs through gene co-expression. The cutoff for significance of co-expression was found to correspond to a Pearson correlation of 0.607. GeneAtlas contains expression data for both proteins in 85.0% of LIT PPIs, with 9.0% significantly co-expressed. This compares with 86.2% of the OPHID interactions that have expression data, of which 17.3% are statistically significant. The most highly co-expressed protein pairs in the OPHID set involve ribosomal and proteasomal subunits, which show Pearson correlations >0.90. This finding suggests not only the presence of known stable complexes, but also that the gene co-expression of these complexes is conserved from yeast to humans (Jansen et al., 2002).

GO terms
Traditional approaches using GO terms to validate PPIs have employed the Jaccard similarity metric, which looks for co-occurring terms (Bader et al., 2004). This approach works well for highly annotated proteins, such as those found in yeast; however, human proteins are not annotated to this level. Further, this method does not take into account the depth within the GO tree of the overlapping terms, where deeper terms imply greater specificity (weight). We therefore used a modified semantic similarity measure described by Lord et al. (2003) (see System and Methods section).
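To illustrate the depth problem, a Jaccard score on GO term sets gives the same value to two proteins that share only the generic top-level term as to two proteins that share a highly specific term. The sketch below is a plain Jaccard index on term-name sets; the function name is hypothetical.

```python
from typing import Set


def jaccard(terms_a: Set[str], terms_b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


# Sharing only the uninformative root term scores the same as sharing a specific term.
generic = jaccard({"molecular_function"}, {"molecular_function"})
specific = jaccard({"chaperone activity"}, {"chaperone activity"})
```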

The LIT set had a semantic similarity score for 76.9% of the PPIs, with 19.6% of these being significant (Fig. 2). The OPHID set, with a larger fraction of hypothetical and unannotated proteins, had a semantic similarity score for 58.2% of the PPIs, with 12.0% of these being significant. As the annotation of human proteins increases, we expect that support from GO similarity will increase, as was observed for domain support.

Measuring reliability by combined evidence
For the LIT interactions, 99.2% have at least one piece of evidence present (i.e. at least one of domains, expression data or GO terms for both proteins). Of these, 42.5% have evidence that is statistically significant. If the same number of interactions is chosen at random from the same set of proteins (to maintain similar levels of annotation), 10.1% of the randomized interactions are significant. For LIT interactions that have two or more pieces of evidence (92.9%), 15.9% are significant, indicating that 16% of the known human PPIs are supported by at least two of these evidence types. This compares favorably with the 0.7% that are significant in the randomized network. While it would not be expected that all interactions would be supported by all evidence types, 16% is likely a lower limit on the number that may be supported in the future, since more than 23% of the known interactions still lack related GO terms and many others have few terms present.

In the OPHID dataset, 23.0% of the predicted interactions have at least one significant piece of supporting evidence and 5.7% have ≥2 statistically significant pieces of evidence. This compares with 9.3 and 0.6%, respectively, for the matching randomized non-interacting set (P < 0.05). Since there are 23 889 predicted PPIs, this corresponds to 5483 PPIs with at least one significant type of evidence and 1364 with ≥2.
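The combined-evidence tally can be sketched as counting, per interaction, how many of the three evidence types pass their cutoffs. The cutoffs come from the text (≥2 significant domain pairs as in Fig. 2, r ≥ 0.607, GOSim ≥ 3.14); the `Evidence` record and function names are hypothetical, with None marking missing data. As an arithmetic check, 5483/23 889 ≈ 23.0% and 1364/23 889 ≈ 5.7%, consistent with the percentages above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DOMAIN_MIN_PAIRS = 2        # significant domain pairs (Fig. 2 criterion)
COEXPRESSION_CUTOFF = 0.607
GOSIM_CUTOFF = 3.14


@dataclass
class Evidence:
    domain_pairs: Optional[int]  # significant domain pairs; None = no domain data
    pearson_r: Optional[float]   # None = no expression data for both genes
    go_sim: Optional[float]      # None = no GO terms for both proteins


def significant_evidence_count(ev: Evidence) -> int:
    """Number of evidence types (0 to 3) that pass their cutoffs."""
    checks = [
        ev.domain_pairs is not None and ev.domain_pairs >= DOMAIN_MIN_PAIRS,
        ev.pearson_r is not None and ev.pearson_r >= COEXPRESSION_CUTOFF,
        ev.go_sim is not None and ev.go_sim >= GOSIM_CUTOFF,
    ]
    return sum(checks)


def summarize(evidence: List[Evidence]) -> Tuple[float, float]:
    """Fractions of PPIs with >=1 and >=2 significant evidence types."""
    counts = [significant_evidence_count(e) for e in evidence]
    n = len(counts)
    return sum(c >= 1 for c in counts) / n, sum(c >= 2 for c in counts) / n
```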

Evaluating the model organism source datasets
To examine the reliability of the model organism data, we broke down the support for the interactions according to the source of the prediction. Figure 3A shows the percentage of original interactions that were supported by at least two types of evidence. Not surprisingly, two of the Riken (M.musculus) datasets (Suzuki et al., 2001; 2003) showed the highest support, since they are LIT interactions mapped from mouse to humans. This was also expected, as mice are evolutionarily closer to humans than S.cerevisiae, C.elegans or D.melanogaster, with 99% of mouse genes having a human homolog and 80% having 1 : 1 human orthologs (Waterston et al., 2002). The next most reliable dataset is the INTEROLOG subset mapped from C.elegans. This subset includes interactions that were mapped from S.cerevisiae to C.elegans and then to humans, and thus likely represents a group of highly conserved protein interactions. The C.elegans LITERATURE set is similar to the Riken data, in that it was derived from small-scale published experiments and is therefore of higher quality. The MIPS, high and medium confidence datasets are derived from yeast but represent the highest quality yeast interactions, elucidated by multiple experiments. Finally, the remaining C.elegans (CORE_1, CORE_2, NON_CORE) and D.melanogaster (FlyHigh, FlyLow) Y2H experiments appear to be the least reliable sources, which is not surprising given the inherent inaccuracy of Y2H (Sprinzak et al., 2003).



Fig. 3 Reliability of predicted interactions in OPHID. We examined which source datasets (after mapping to human proteins) had the most supporting evidence. (A) The proportion of interactions from each dataset with ≥2 pieces of supporting evidence (domains, co-expression or GO similarity). (B) The proportion of interactions with ≥2 evidence types that are not supported by that evidence. For instance, of the 914 interactions predicted from MIPS, 626 have ≥2 evidence types. Of those, 103 (11.3% of all 914) are supported, while 523 (57.2%) are not. Another 205 (22.4%) are supported by only one piece of evidence.

 
Figure 3B shows the proportion of interactions that have two or more types of supporting evidence, none of which is statistically significant. The two panels are not complements of each other, as interactions with only one supporting evidence type are not included. However, Figure 3B is consistent with the trends in Figure 3A; for example, the C.elegans CORE and D.melanogaster datasets appear to be the least accurate.

OPHID web interface
OPHID was designed not only to aid the prediction of novel PPIs but also to provide a regularly updated and expanded dataset that is easily accessible and can support both small-scale experiments and large-scale bioinformatics efforts. Thus, OPHID has been made available as a web-accessible database, where queries can be entered using a single identifier or as large batch queries using a variety of ID types (GenBank, Swiss-Prot, UniGene, LocusLink, etc.). The entire dataset can be downloaded as a tab-delimited text file or in the PSI-MI-compliant XML format (Hermjakob et al., 2004). The OPHID interface contains a Java-based viewer to display the resulting PPI networks, which allows the search to be expanded from selected nodes in the graph and the visualized networks to be saved as JPEG or SVG files.
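For users of the tab-delimited download, a minimal sketch of loading the file into an adjacency map and running a batch neighbor query is shown below. The column layout (interactor IDs in the first two columns, one header row) is an assumption, not the documented OPHID format; adjust `id_a_col`, `id_b_col` and `skip_header` after inspecting the actual file. Both function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def load_network(lines: Iterable[str], id_a_col: int = 0, id_b_col: int = 1,
                 skip_header: bool = True) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from tab-delimited PPI rows."""
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)
    graph: Dict[str, Set[str]] = defaultdict(set)
    for row in reader:
        if len(row) <= max(id_a_col, id_b_col):
            continue  # skip blank or malformed rows
        a, b = row[id_a_col].strip(), row[id_b_col].strip()
        graph[a].add(b)
        graph[b].add(a)
    return graph


def batch_query(graph: Dict[str, Set[str]], ids: Iterable[str]) -> Dict[str, List[str]]:
    """Sorted interaction partners for each queried ID (empty list if absent)."""
    return {i: sorted(graph.get(i, ())) for i in ids}
```

A file opened with `open(path, newline="")` can be passed directly as `lines`.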


    DISCUSSION
One goal of the many proteomics projects published to date has been to map the PPI networks that exist in the respective organisms and thus determine the interactions that govern normal cell function. OPHID was designed to use this model organism interaction data to rapidly extend our knowledge of the human interactome. Only recently have LIT databases of human interactions begun to catch up with those devoted to model organisms. While these are highly useful resources that improve access to the human interactome, they only recapitulate the known interactions published in the literature. Although HTP experiments are being performed on increasingly complex organisms, to date few have been performed on mouse or humans.

Given the combinatorial explosion in the mouse and human interactomes that will surely emanate from the 20 000–25 000 genes in their genomes (International Human Genome Sequencing Consortium, 2004) (compared with 6000 in S.cerevisiae, 22 000 in C.elegans and 13 500 in D.melanogaster), it is unlikely that the higher eukaryote interactomes will be fully covered by experimental means in the near future. Thus, model organism interactomes must be used to gain insight into the human interaction networks and to begin using the resulting network to explore normal and disease processes in the near term. Further, this provides an opportunity for functional annotation of human and mouse proteins (27 939 human proteins currently lack GO terms in Swiss-Prot Build 45.0) and a means for studying the evolutionary conservation of important subnetworks in PPI datasets.
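To make the scale of this combinatorial explosion concrete, the number of distinct unordered protein pairs that could in principle interact grows quadratically with the number of genes n. A back-of-the-envelope count, assuming one protein per gene and ignoring splice variants and self-interactions:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{6000}{2} = 17\,997\,000 \approx 1.8 \times 10^{7}, \qquad
\binom{20\,000}{2} = 199\,990\,000 \approx 2.0 \times 10^{8}, \qquad
\binom{25\,000}{2} = 312\,487\,500 \approx 3.1 \times 10^{8}
```

Moving from a yeast-sized to a human-sized genome thus multiplies the candidate pair space by roughly an order of magnitude.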

OPHID provides predictions of ~24 000 PPIs, many of which we have supported with additional evidence. The database can be used in several ways. First, as a model of the human interactome, it can be used to explore known pathways, add new proteins to existing pathways or develop novel pathways altogether. Second, OPHID may be used as an aid in designing new PPI experiments by indicating whether orthologous proteins have been reported to interact in other organisms. Third, the data within OPHID can be integrated with additional datasets (e.g. expression data from disease profiles, OMIM data on disease-related proteins) to reveal new protein interactions and pathways that may be involved in human disease (Barrios-Rodiles et al., 2005). As new PPI datasets become available, they are being incorporated into OPHID; thus, OPHID will continue to represent an up-to-date, valuable resource for experiment planning.

Homology-based approaches to predicting PPIs may contain some inaccuracies (Deane et al., 2002; Matthews et al., 2001), depending on the filtering criteria used. For example, in mapping S.cerevisiae interactions to C.elegans, Matthews et al. (2001) were able to reproduce only 16–31% of the predicted interactions in a Y2H system. In that study, interactions were mapped by considering only the best-matching C.elegans homolog for each S.cerevisiae protein. The reciprocal best-hit approach that we used (System and Methods section) provides a more stringent mapping between orthologous proteins. While it provides lower coverage of the potential interactome, this method yields better accuracy in the predicted interactions (Yu et al., 2004).

Other groups have used InParanoid to predict human PPIs (Lehner and Fraser, 2004) rather than the reciprocal best-hit approach. Using our semantic similarity measure, only 13.7% of interactions in the Lehner dataset are supported, while OPHID has 20.6% supported interactions (considering only those PPIs with GO terms). The reciprocal best-hit approach thus has more in silico support, which suggests greater accuracy than the InParanoid-based predictions.

Our additional evidence currently supports 23% of the predicted PPIs. This figure is influenced by limitations in the domain network and the sparse GO annotation of human proteins, and therefore likely represents a lower limit on the interaction support. Further, it has been suggested that only 66% of previously known PPIs may show co-expression at the mRNA level (Kemmeren et al., 2002). Therefore, a lack of in silico validation does not necessarily indicate that an interaction is less reliable; it may simply reflect the low level of annotation of human proteins to date. Despite these challenges, OPHID provides a sizable number of novel PPIs supported by in silico evidence.

In building OPHID, we chose to include the entire von Mering dataset (von Mering et al., 2002), which consists of high, medium and low confidence subsets. The protein complexes in this dataset were connected in an all-to-all (matrix) fashion. While the matrix model has been shown to be less accurate than the spoke model (Bader and Hogue, 2002), the decision to include this data in its entirety was based on providing the largest possible coverage of the human interactome and then filtering at a later time by using supporting evidence. Although the low confidence subset contains fewer supportable interactions relative to the high and medium subsets (Fig. 3B), it is important to note that the results are comparable to the most reliable experimental C.elegans interactions (CORE_1, CORE_2) or the D.melanogaster Y2H interactions.
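The difference between the two ways of turning a purified complex into binary interactions can be sketched as follows (function names are hypothetical). For a complex of n members, the matrix model yields n(n−1)/2 pairs, while the spoke model yields only the n−1 bait–prey pairs, which is why matrix expansion adds many more, and potentially more false-positive, interactions.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def matrix_model(members: Iterable[str]) -> Set[Pair]:
    """All-to-all: every pair of complex members is treated as an interaction."""
    return {tuple(sorted(p)) for p in combinations(set(members), 2)}


def spoke_model(bait: str, members: Iterable[str]) -> Set[Pair]:
    """Spoke: only bait-prey pairs are treated as interactions."""
    return {tuple(sorted((bait, m))) for m in set(members) if m != bait}
```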

OPHID users can easily filter out less reliable interactions and include only the highest quality interaction data in their subsequent analysis, bearing in mind that reducing the false-positive rate increases the false-negative rate. We believe that there are numerous reliable (supportable) interactions to be gained by including the low quality data from each of these subsets (yeast low, NON_CORE and FlyLow) and we have indeed found many mapped interactions from these subsets that appear to be reliable human interactions.


    FUTURE DIRECTIONS
OPHID will continue to grow as new interaction datasets become available and additional evidence will continue to be sought. We expect the in silico evidence for the OPHID interactions to improve in parallel with the annotation of human proteins. Additionally, including metrics such as coevolution can help reinforce the relatedness of the individual predicted interactions (Tan et al., 2004). Ultimately, a machine classifier will be developed to provide a unified confidence score for the OPHID interactions that will allow users an additional means of filtering the predicted protein interactions.


    Acknowledgments
 
The authors thank R. Lu and D. Otasek for software development. We acknowledge hardware and software support from IBM Life Sciences through a Shared University Research Grant, and support from the Natural Sciences and Engineering Research Council (RGPIN 203833-02), the Institute for Robotics and Intelligent Systems, Precarn Inc., the National Institutes of Health (#P50-GM62413), and the Fashion Show and Younger Foundations.


    Footnotes
 
Note: DIP is only used internally for analysis. It is not reproduced on the OPHID website due to copyright restrictions.

Received on September 23, 2004; revised on January 10, 2005; accepted on January 11, 2005

    REFERENCES
 

    International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature, 431, 931–945.

    Bader, G.D. and Hogue, C.W. (2002) Analyzing yeast protein–protein interaction data obtained from different sources. Nat. Biotechnol., 20, 991–997.

    Bader, J.S., Chaudhuri, A., Rothberg, J.M., Chant, J. (2004) Gaining confidence in high-throughput protein interaction networks. Nat. Biotechnol., 22, 78–85.

    Barrios-Rodiles, M., Brown, K.R., Ozdamar, B., Liu, Z., Donovan, R.S., Shinfo, F., Liu, Y., Bose, R., Dembowy, J.R. (2005) High-throughput mapping of a dynamic signalling network in mammalian cells. Science, in press.

    Betel, D., Isserlin, R., Hogue, C.W. (2004) Analysis of domain correlations in yeast protein complexes. Bioinformatics, 20, Suppl 1, SI55–SI62.

    Bowers, P.M., Pellegrini, M., Thompson, M.J., Fierro, J., Yeates, T.O., Eisenberg, D. (2004) Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol., 5, R35.

    Colland, F., Jacq, X., Trouplin, V., Mougin, C., Groizeleau, C., Hamburger, A., Meil, A., Wojcik, J., Legrain, P., Gauthier, J.M. (2004) Functional proteomics mapping of a human signaling pathway. Genome Res., 14, 1324–1332.

    Deane, C.M., Salwinski, L., Xenarios, I., Eisenberg, D. (2002) Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol. Cell. Proteomics, 1, 349–356.

    Deng, M., Mehta, S., Sun, F., Chen, T. (2002) Inferring domain–domain interactions from protein–protein interactions. Genome Res., 12, 1540–1548.

    Deng, M., Sun, F., Chen, T. (2003) Assessment of the reliability of protein–protein interactions and protein function prediction. Pac. Symp. Biocomput., 140–151.

    Gavin, A.-C., Bösche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J., Michon, A.-M., Cruciat, C., et al. (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415, 141–147.

    Ge, H., Liu, Z., Church, G.M., Vidal, M. (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet., 29, 482–486.

    Giot, L., Bader, J.S., Brouwer, C., Chaudhuri, A., Kuang, B., Li, Y., Hao, Y.L., Ooi, C.E., Godwin, B., Vitols, E., et al. (2003) A protein interaction map of Drosophila melanogaster. Science, 302, 1727–1736.

    Han, K., Park, B., Kim, H., Hong, J., Park, J. (2004) HPID: the human protein interaction database. Bioinformatics, 20, 2466–2470.

    Hermjakob, H., Montecchi-Palazzi, L., Bader, G., Wojcik, J., Salwinski, L., Ceol, A., Moore, S., Orchard, S., Sarkans, U., von Mering, C., et al. (2004) The HUPO PSI's molecular interaction format: a community standard for the representation of protein interaction data. Nat. Biotechnol., 22, 177–183.

    Ho, Y., Gruhler, A., Heilbut, A., Bader, G.D., Moore, L., Adams, S.L., Millar, A., Taylor, P., Bennett, K., Boutilier, K., et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 415, 180–183.

    Huang, T.-W., Tien, A.-C., Huang, W.-S., Lee, Y.C.G., Peng, C.-L., Tseng, H.-H., Kao, C.-Y., Huang, C.-Y.F. (2004) POINT: a database for the prediction of protein–protein interactions based on the orthologous interactome. Bioinformatics, 20, 3273–3276.

    Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., Sakaki, Y. (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA, 98, 4569–4574.

    Jansen, R., Greenbaum, D., Gerstein, M. (2002) Relating whole-genome expression data with protein–protein interactions. Genome Res., 12, 37–46.

    Kemmeren, P., van Berkum, N.L., Vilo, J., Bijma, T., Donders, R., Brazma, A., Holstege, F.C.P. (2002) Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol. Cell, 9, 1133–1143.

    Lehner, B. and Fraser, A.G. (2004) A first-draft human protein-interaction map. Genome Biol., 5, R63.61–R63.69.

    Lehner, B., Semple, J.I., Brown, S.E., Counsell, D., Campbell, R.D., Sanderson, C.M. (2004) Analysis of a high-throughput yeast two-hybrid system and its use to predict the function of intracellular proteins encoded within the human MHC class III region. Genomics, 83, 153–167.

    Li, S., Armstrong, C.M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P.O., Han, J.D., Chesneau, A., Hao, T., et al. (2004) A map of the interactome network of the metazoan C. elegans. Science, 303, 540–543.

    Lord, P.W., Stevens, R.D., Brass, A., Goble, C.A. (2003) Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics, 19, 1275–1283.

    Luc, P.V. and Tempst, P. (2004) PINdb: a database of nuclear protein complexes from human and yeast. Bioinformatics, 20, 1413–1415.

    Matthews, L.R., Vaglio, P., Reboul, J., Ge, H., Davis, B.P., Garrels, J., Vincent, S., Vidal, M. (2001) Identification of potential interaction networks using sequence-based searches for conserved protein–protein interactions or ‘interologs’. Genome Res., 11, 2120–2126.

    Mellor, J.C., Yanai, I., Clodfelter, K.H., Mintseris, J., DeLisi, C. (2002) Predictome: a database of putative functional links between proteins. Nucleic Acids Res., 30, 306–309.

    Pagel, P., Mewes, H.W., Frishman, D. (2004) Conservation of protein–protein interactions: lessons from ascomycota. Trends Genet., 20, 72–76.

    Peri, S., Navarro, J.D., Amanchy, R., Kristiansen, T.Z., Jonnalagadda, C.K., Surendranath, V., Niranjan, V., Muthusamy, B., Gandhi, T.K.B., Gronborg, M., et al. (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res., 13, 2363–2371.

    Sprinzak, E. and Margalit, H. (2001) Correlated sequence-signatures as markers of protein–protein interaction. J. Mol. Biol., 311, 681–692.

    Sprinzak, E., Sattath, S., Margalit, H. (2003) How reliable are experimental protein–protein interaction data? J. Mol. Biol., 327, 919–923.

    Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.

    Suzuki, H., Fukunishi, Y., Kagawa, I., Saito, R., Oda, H., Endo, T., Kondo, S., Bono, H., Okazaki, Y., Hayashizaki, Y. (2001) Protein–protein interaction panel using mouse full-length cDNAs. Genome Res., 11, 1758–1765.

    Suzuki, H., Saito, R., Kanamori, M., Kai, C., Schonbach, C., Nagashima, T., Hosaka, J., Hayashizaki, Y. (2003) The mammalian protein–protein interaction database and its viewing system that is linked to the main FANTOM2 viewer. Genome Res., 13, 1534–1541.

    Tan, S.H., Zhang, Z., Ng, S.K. (2004) ADVICE: automated detection and validation of interaction by co-evolution. Nucleic Acids Res., 32, W69–W72.

    Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al. (2000) A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature, 403, 623–627.

    von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., Snel, B. (2003) STRING: a database of predicted functional associations between proteins. Nucleic Acids Res., 31, 258–261.

    von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P. (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417, 399–403.

    Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexanderson, M., An, P., et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.

    Wojcik, J. and Schachter, V. (2001) Protein–protein interaction map inference using interacting domain profile pairs. Bioinformatics, 17, S296–S305.

    Wuchty, S., Oltvai, Z.N., Barabasi, A.L. (2003) Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat. Genet., 35, 176–179.

    Xenarios, I., Rice, D.W., Salwinski, L., Baron, M.K., Marcotte, E.M., Eisenberg, D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res., 28, 289–291.

    Yu, H., Luscombe, N.M., Lu, H.X., Zhu, X., Xia, Y., Han, J.D., Bertin, N., Chung, S., Vidal, M. (2004) Annotation transfer between genomes: protein–protein interologs and protein–DNA regulogs. Genome Res., 14, 1107–1118.

    Zanzoni, A., Montecchi-Palazzi, L., Quondam, M., Ausiello, G., Helmer-Citterich, M., Cesareni, G. (2002) MINT: a Molecular INTeraction database. FEBS Lett., 513, 135–140.




This article has been cited by other articles:

H. Fukasawa, S. Bornheimer, K. Kudlicka, and M. G. Farquhar
Slit Diaphragms Contain Tight Junction Proteins
J. Am. Soc. Nephrol., July 1, 2009; 20(7): 1491 - 1503.

H. Blankenburg, F. Ramirez, J. Buch, and M. Albrecht
DASMIweb: online integration, analysis and assessment of distributed protein interaction data
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W122 - W128.

H. Blankenburg, R. D. Finn, A. Prlic, A. M. Jenkinson, F. Ramirez, D. Emig, S.-E. Schelhorn, J. Buch, T. Lengauer, and M. Albrecht
DASMI: exchanging, annotating and assessing molecular interaction data
Bioinformatics, May 15, 2009; 25(10): 1321 - 1328.

N. Tuncbag, G. Kar, O. Keskin, A. Gursoy, and R. Nussinov
A survey of available tools and web servers for analysis of protein-protein interactions and interfaces
Brief Bioinform, May 1, 2009; 10(3): 217 - 232.

J. Wilflingseder, A. Kainz, P. Perco, R. Korbely, B. Mayer, and R. Oberbauer
Molecular predictors for anaemia after kidney transplantation
Nephrol. Dial. Transplant., March 1, 2009; 24(3): 1015 - 1023.

G. Chaurasia, S. Malhotra, J. Russ, S. Schnoegl, C. Hanig, E. E. Wanker, and M. E. Futschik
UniHI 4: new tools for query, analysis and visualization of the human protein-protein interactome
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D657 - D660.

M. D. McDowall, M. S. Scott, and G. J. Barton
PIPs: human protein-protein interaction prediction database
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D651 - D656.

Y. J. Huang, D. Hang, L. J. Lu, L. Tong, M. B. Gerstein, and G. T. Montelione
Targeting the Human Cancer Pathway Protein Interaction Network by Structural Genomics
Mol. Cell. Proteomics, October 1, 2008; 7(10): 2048 - 2060.

M. Michaut, S. Kerrien, L. Montecchi-Palazzi, F. Chauvat, C. Cassier-Chauvat, J.-C. Aude, P. Legrain, and H. Hermjakob
InteroPORC: automated inference of highly conserved protein interaction networks
Bioinformatics, July 15, 2008; 24(14): 1625 - 1631.

A. Ozgur, T. Vu, G. Erkan, and D. R. Radev
Identifying gene-disease associations using centrality on a literature mined gene-interaction network
Bioinformatics, July 1, 2008; 24(13): i277 - i285.

L. Li, C. Wu, H. Huang, K. Zhang, J. Gan, and S. S.-C. Li
Prediction of phosphotyrosine signaling networks using a scoring matrix-assisted ligand identification approach
Nucleic Acids Res., June 1, 2008; 36(10): 3263 - 3273.

J. Geisler-Lee, N. O'Toole, R. Ammar, N. J. Provart, A. H. Millar, and M. Geisler
A Predicted Interactome for Arabidopsis
Plant Physiology, October 1, 2007; 145(2): 317 - 329.

A. Schlicker, C. Huthmacher, F. Ramirez, T. Lengauer, and M. Albrecht
Functional evaluation of domain domain interactions and human protein interaction networks
Bioinformatics, April 1, 2007; 23(7): 859 - 865.

M. E. Futschik, G. Chaurasia, and H. Herzel
Comparison of human protein protein interaction maps
Bioinformatics, March 1, 2007; 23(5): 605 - 611.

N. Przulj
Biological network comparison using graphlet degree distribution
Bioinformatics, January 15, 2007; 23(2): e177 - e183.

G. Chaurasia, Y. Iqbal, C. Hanig, H. Herzel, E. E. Wanker, and M. E. Futschik
UniHI: an entry gate to the human protein interactome
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D590 - D594.

J. Xu and Y. Li
Discovering disease-genes by topological features in human protein-protein interaction network
Bioinformatics, November 15, 2006; 22(22): 2800 - 2805.

R. A. George, J. Y. Liu, L. L. Feng, R. J. Bryson-Richardson, D. Fatkin, and M. A. Wouters
Analysis of protein sequence and interaction data for candidate disease gene prediction
Nucleic Acids Res., November 14, 2006; 34(19): e130 - e130.

N. Przulj and D. J Higham
Modelling protein-protein interaction networks via a stickiness index
J R Soc Interface, October 22, 2006; 3(10): 711 - 716.

P. F. Jonsson and P. A. Bates
Global topological features of cancer proteins in the human interactome
Bioinformatics, September 15, 2006; 22(18): 2291 - 2297.

X. Guo, R. Liu, C. D. Shriver, H. Hu, and M. N. Liebman
Assessing semantic similarity measures for the characterization of human regulatory pathways
Bioinformatics, April 15, 2006; 22(8): 967 - 973.

S. Wachi, K. Yoneda, and R. Wu
Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues
Bioinformatics, December 1, 2005; 21(23): 4205 - 4208.

M. E. Cusick, N. Klitgord, M. Vidal, and D. E. Hill
Interactome: gateway into systems biology
Hum. Mol. Genet., October 15, 2005; 14(suppl_2): R171 - R181.

