Bioinformatics Advance Access originally published online on July 24, 2008
Bioinformatics 2008 24(19):2265-2266; doi:10.1093/bioinformatics/btn380
Cytoprophet: a Cytoscape plug-in for protein and domain interaction networks inference
¹Department of Computer Science and Engineering and ²Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Cytoprophet is a software tool for predicting and visualizing protein and domain interaction networks. It is implemented as a plug-in for Cytoscape, an open-source software framework for the analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions from the domain composition of proteins and experimental assays: maximum likelihood estimation (MLE) using expectation maximization (EM), the maximum specificity set cover (MSSC) approach and the sum-product algorithm (SPA). Given an input set of proteins identified by UniProt IDs or accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of interactions between the domains of the proteins in the input list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and to be simple to use. An example of inference in a signaling network of the myxobacterium Myxococcus xanthus is presented and is available at Cytoprophet's website.
Availability: http://cytoprophet.cse.nd.edu
Contact: cytoprophet{at}cse.nd.edu
Supplementary information: Examples and supplementary data are accessible at http://cytoprophet.cse.nd.edu
| 1 INTRODUCTION |
|---|
Prediction of protein interaction networks is an important topic of research. It aims to elucidate and characterize novel protein interactions with the help of biological knowledge of proteins, large quantities of experimental data available in online databases and the use of biochemical and biophysical principles. Knowing the inherent protein interaction network in any organism provides an important road map to understanding the behavior and function of cellular pathways. Hence, computational tools that complement experimental data represent an important step towards the characterization of interactome networks in living systems.
To date, numerous algorithms and methods for predicting protein interactions have been developed. Different approaches rely on different paradigms to infer interactions, including knowledge of gene fusion events, phylogenetic profiles, in silico protein interaction assays and biophysical simulations (Valencia and Pazos, 2002). Although many approaches have been proposed, only a handful are actually accessible to researchers trying to extract biological insights from their results. In an attempt to broaden the impact of recent molecular interaction prediction algorithms, we developed a software tool that allows the user to predict and visualize interactions in Cytoscape (Shannon et al., 2003).
| 2 DESCRIPTION |
|---|
The Cytoscape plug-in, called Cytoprophet, has two types of inputs and outputs, described in Table 1. It takes as input a set of proteins identified by UniProt (Consortium, 2007) IDs or accession numbers, and it outputs networks annotated with interaction attributes.
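As an illustration of the input stage, the following minimal Python sketch sorts a protein list into accession numbers and entry names (IDs) before submission. It is not part of Cytoprophet. The accession pattern is the one given in the UniProt documentation; the entry-name pattern is a simplified assumption, and the function names `classify_identifier` and `split_input` are hypothetical. The check is purely syntactic and does not confirm that an entry exists.

```python
import re

# Accession pattern as published in the UniProt documentation.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Simplified entry-name (ID) pattern: mnemonic, underscore, species code.
# This is an assumption for illustration, looser than the real naming rules.
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify_identifier(token):
    """Return 'accession', 'id' or 'unknown' for one input token."""
    token = token.strip().upper()
    if ACCESSION_RE.fullmatch(token):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(token):
        return "id"
    return "unknown"


def split_input(tokens):
    """Group a protein list by identifier type before it is submitted."""
    groups = {"accession": [], "id": [], "unknown": []}
    for token in tokens:
        groups[classify_identifier(token)].append(token.strip().upper())
    return groups
```

Tokens that land in the `unknown` group could be reported back to the user instead of being sent for prediction.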
Three existing algorithms for the prediction of protein–protein interactions (PPI) and domain–domain interactions (DDI) are implemented on the Cytoprophet server: maximum likelihood estimation (MLE) with expectation maximization (EM) as proposed in Deng et al. (2002), a set cover approach called maximum specificity set cover (MSSC) as described in Huang et al. (2007) and the sum-product algorithm (SPA) for DDI and PPI inference as proposed in Sikora et al. (2007). All three algorithms use high-throughput experimental data (e.g. yeast two-hybrid screens) together with the domain architecture of proteins to estimate the probability that a given pair of domains or proteins interacts. Cytoprophet gathers experimental interaction data from specialized databases such as the Database of Interacting Proteins (DIP) (Xenarios et al., 2002). Information about the domain composition of proteins and domain interactions is obtained from Pfam (Bateman et al., 2004), and structural data are retrieved from the Protein Data Bank (PDB) (Deshpande et al., 2005).
Figure 1 shows the operation of the tool. Once a set of proteins with UniProt ID or accession labels has been specified, the user selects one of the three inference algorithms: MLE, MSSC or SPA. A second option is to draw, in a separate window, the network of interacting domains related to the predicted protein network. If two proteins are predicted to interact, the interaction is attributed to an underlying domain pair interaction, as depicted in the DDI network.
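The relationship between domain pair and protein pair probabilities can be illustrated with the independence model used in the MLE formulation of Deng et al. (2002): two proteins interact unless every pair of their domains fails to interact, so the PPI probability is one minus the product of (1 − DDI probability) over all domain pairs. The Python sketch below is a minimal illustration of that formula only, not the Cytoprophet server code; the domain names, the probabilities and the function name `ppi_probability` are hypothetical.

```python
from itertools import product


def ppi_probability(domains_a, domains_b, ddi_prob):
    """Probability that two proteins interact, given domain-pair probabilities.

    Assumes domain pairs interact independently, so the proteins interact
    unless every domain pair fails to interact.
    """
    p_no_interaction = 1.0
    for d1, d2 in product(domains_a, domains_b):
        # Domain pairs are unordered; unknown pairs contribute probability 0.
        lam = ddi_prob.get(frozenset((d1, d2)), 0.0)
        p_no_interaction *= 1.0 - lam
    return 1.0 - p_no_interaction


# Hypothetical domain-pair probabilities (illustrative values only).
ddi_prob = {
    frozenset(("DomA", "DomB")): 0.5,
    frozenset(("DomA", "DomC")): 0.2,
}
protein_x = ["DomA"]
protein_y = ["DomB", "DomC"]
print(ppi_probability(protein_x, protein_y, ddi_prob))
```

With these illustrative values the result is 1 − (0.5 × 0.8), i.e. approximately 0.6. A protein pair with no known domain pair in common receives probability zero under this model.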
The client sends a list of proteins to the Cytoprophet server, where the list is processed. The server precalculates the likelihood of every domain pair interaction for which information is available and uses these values to calculate PPI probabilities for the client program. The server returns a response in XGMML format that includes the nodes, the edges and attributes such as probabilities and GO distances.
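To make the shape of such a response concrete, the following Python sketch parses a small XGMML document into nodes and attributed edges using the standard library. The sample document is hypothetical: the node labels, the attribute names `probability` and `go_distance` and all values are placeholders chosen for illustration, not actual Cytoprophet output.

```python
import xml.etree.ElementTree as ET

XGMML_NS = {"x": "http://www.cs.rpi.edu/XGMML"}

# Hypothetical response; labels, attribute names and values are placeholders.
SAMPLE_RESPONSE = """\
<graph label="predicted_ppi" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="PROT_A"/>
  <node id="2" label="PROT_B"/>
  <edge source="1" target="2" label="PROT_A (pp) PROT_B">
    <att name="probability" type="real" value="0.82"/>
    <att name="go_distance" type="real" value="0.35"/>
  </edge>
</graph>
"""


def parse_xgmml(text):
    """Return (nodes, edges) from an XGMML string.

    nodes maps node id -> label; edges is a list of dicts holding the two
    endpoint labels plus every <att> child, with 'real' values as floats.
    """
    root = ET.fromstring(text)
    nodes = {n.get("id"): n.get("label") for n in root.findall("x:node", XGMML_NS)}
    edges = []
    for e in root.findall("x:edge", XGMML_NS):
        record = {"source": nodes[e.get("source")], "target": nodes[e.get("target")]}
        for att in e.findall("x:att", XGMML_NS):
            value = att.get("value")
            if att.get("type") == "real":
                value = float(value)
            record[att.get("name")] = value
        edges.append(record)
    return nodes, edges


nodes, edges = parse_xgmml(SAMPLE_RESPONSE)
for edge in edges:
    print(edge)
```

Because XGMML carries node, edge and attribute data in a single document, the client can build the network and its edge attributes from one server response.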
The user has the option to enrich the predicted PPI network with GO distance attributes. The visualization colors the edges according to a mapping that allows easy identification of key GO distance values, ranging from red for low distances to green for a distance of one. Edges between pairs for which no GO annotations are available are colored gray. Another visual property in Cytoprophet is the scaling of each node's size according to its degree, which makes it easier to identify proteins involved in many interactions.
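The two visual mappings can be sketched as follows. The text specifies only the endpoints of the color scale (red for low distances, green for a distance of one, gray when annotations are missing) and that node size depends on degree; the linear interpolation, the hex color values, the base size and the step size below are assumptions made for illustration, and the function names are hypothetical.

```python
def go_distance_color(distance):
    """Map a GO distance in [0, 1] to a hex color; None means no annotation."""
    if distance is None:
        return "#808080"  # gray for pairs without GO annotations
    d = min(max(distance, 0.0), 1.0)  # clamp to the expected range
    red = round(255 * (1.0 - d))
    green = round(255 * d)
    return f"#{red:02x}{green:02x}00"


def node_sizes(edges, base=20.0, step=5.0):
    """Scale node size linearly with degree (number of incident edges)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return {node: base + step * k for node, k in degree.items()}


# Hypothetical predicted edges between placeholder proteins.
predicted = [("PROT_A", "PROT_B"), ("PROT_A", "PROT_C")]
print(node_sizes(predicted))
print(go_distance_color(0.0), go_distance_color(1.0), go_distance_color(None))
```

In this sketch a hub such as `PROT_A` is drawn larger than its neighbors, and an edge without GO annotations stands apart from the red-to-green scale.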
| 3 EXAMPLE |
|---|
An example of how to use the plug-in is provided for a signaling network of the Gram-negative bacterium Myxococcus xanthus. The data and a step-by-step description are available at Cytoprophet's website. A number of proteins in M. xanthus are involved in switching the direction of motility of these bacteria. The frequency of reversals is dictated by a set of proteins called Frizzy proteins that form a biochemical oscillator (Igoshin et al., 2004). An important question is how the Frizzy proteins are linked to the proteins involved in the construction of the motility engines. Using Frizzy and other motility proteins of M. xanthus as input to Cytoprophet, we identified a series of plausible interactions that support previous hypotheses of how motility switching is signaled to the motility engines.
| 4 DISCUSSION |
|---|
We have introduced Cytoprophet as a plug-in for Cytoscape with the goal of reaching a larger set of scientists who could take advantage of PPI/DDI prediction algorithms. The tool is intended to aid in understanding biological networks and in discovering novel protein interactions. Because updates to the algorithms and datasets are made on the server side, modifications will be transparent to the end user. We chose to implement the algorithms as a Cytoscape plug-in because Cytoscape is known for its simplicity, powerful capabilities and strong support from the scientific community.
Funding: NSF grants (DBI-0450067 and CCF-0622940).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Olga Troyanskaya
Received on May 2, 2008; revised on May 2, 2008; accepted on July 19, 2008
| REFERENCES |
|---|
Bateman A, et al. The Pfam protein families database. Nucleic Acids Res (2004) 32(Database issue):D138–D141.
Consortium U. The Universal Protein Resource (UniProt). Nucleic Acids Res (2007) 35(Database issue):D193–D197.
Deng M, et al. Inferring domain-domain interactions from protein-protein interactions. Genome Res (2002) 12:1540–1548.
Deshpande N, et al. The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res (2005) 33(Database issue):D233–D237.
Huang C, et al. Predicting protein-protein interactions from protein domains using a set cover approach. IEEE/ACM Trans. Comput. Biol. Bioinform (2007) 4:78–87.
Igoshin OA, et al. A biochemical oscillator explains several aspects of Myxococcus xanthus behavior during development. Proc. Natl Acad. Sci. USA (2004) 101:15760–15765.
Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res (2003) 13:2498–2504.
Sikora M, et al. Bayesian inference of protein and domain interactions using the sum-product algorithm. In: Proceedings of the 2007 Information Theory and Applications Workshop. (2007) Institute of Electrical and Electronics Engineers (IEEE).
Valencia A, Pazos F. Computational methods for the prediction of protein interactions. Curr. Opin. Struct. Biol (2002) 12:368–373.
Xenarios I, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res (2002) 30:303–305.
