Bioinformatics Advance Access originally published online on February 1, 2006
Bioinformatics 2006 22(8):974-980; doi:10.1093/bioinformatics/btl030
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Efficient estimation of graphlet frequency distributions in proteinprotein interaction networks
ulj 1,3
1 Department of Computer Science, University of Toronto Toronto M5S 3G4, Canada
2 Ontario Cancer Institute, Division of Signaling Biology Toronto M5G 1L7, Canada
3 Department of Computer Science, UC Irvine Irvine, CA 92697-3435, USA
*To whom correspondence should be addressed.
Motivation: Algorithmic and modeling advances in the area of proteinprotein interaction (PPI) network analysis could contribute to the understanding of biological processes. Local structure of networks can be measured by the frequency distribution of graphlets, small connected non-isomorphic induced subgraphs. This measure of local structure has been used to show that high-confidence PPI networks have local structure of geometric random graphs. Finding graphlets exhaustively in a large network is computationally intensive. More complete PPI networks, as well as PPI networks of higher organisms, will thus require efficient heuristic approaches.
Results: We propose two efficient and scalable heuristics for finding graphlets in high-confidence PPI networks. We show that both PPI and their model geometric random networks, have defined boundaries that are sparser than the inner parts of the networks. In addition, these networks exhibit uniformity of local structure inside the networks. Our first heuristic exploits these two structural properties of PPI and geometric random networks to find good estimates of graphlet frequency distributions in these networks up to 690 times faster than the exhaustive searches. Our second heuristic is a variant of a more standard sampling technique and it produces accurate approximate results up to 377 times faster than the exhaustive searches. We indicate how the combination of these approaches may result in an even better heuristic.
Availability: Supplementary information is available at http://www.cs.toronto.edu/~natasha/BIOINF-2005-0946/Supplementary.pdf Software implementing the algorithms is available at http://www.cs.toronto.edu/~natasha/BIOINF-2005-0946/estimate_grap-hlets.html
Contact: juris{at}cs.toronto.edu; natasha{at}igor.ics.uci.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Received on May 19, 2005; revised on January 25, 2006; accepted on January 26, 2006
This article has been cited by other articles:
![]() |
G. Ciriello and C. Guerra A review on models and algorithms for motif discovery in protein-protein interaction networks Brief Funct Genomic Proteomic, April 28, 2008; (2008) eln015v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. J. Higham, M. Rasajski, and N. Przulj Fitting a geometric graph to a protein-protein interaction network Bioinformatics, April 15, 2008; 24(8): 1093 - 1099. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Przulj Biological network comparison using graphlet degree distribution Bioinformatics, January 15, 2007; 23(2): e177 - e183. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Aittokallio and B. Schwikowski Graph-based methods for analysing networks in cell biology Brief Bioinform, September 1, 2006; 7(3): 243 - 255. [Abstract] [Full Text] [PDF] |
||||


