Bioinformatics Advance Access originally published online on February 10, 2006
Bioinformatics 2006 22(8):1021-1023; doi:10.1093/bioinformatics/btl039
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CFinder: locating cliques and overlapping modules in biological networks
1 Department of Biological Physics, Eötvös University Pázmány P. stny. 1A, H-1117 Budapest, Hungary
2 Biological Physics Research Group of the Hungarian Academy of Sciences Pázmány P. stny. 1A, H-1117 Budapest, Hungary
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Summary: Most cellular tasks are performed not by individual proteins, but by groups of functionally associated proteins, often referred to as modules. In a protein assocation network modules appear as groups of densely interconnected nodes, also called communities or clusters. These modules often overlap with each other and form a network of their own, in which nodes (links) represent the modules (overlaps). We introduce CFinder, a fast program locating and visualizing overlapping, densely interconnected groups of nodes in undirected graphs, and allowing the user to easily navigate between the original graph and the web of these groups. We show that in gene (protein) association networks CFinder can be used to predict the function(s) of a single protein and to discover novel modules. CFinder is also very efficient for locating the cliques of large sparse graphs.
Availability: CFinder (for Windows, Linux and Macintosh) and its manual can be downloaded from http://angel.elte.hu/clustering.
Supplementary information: Supplementary data are available on Bioinformatics online.
Contact: cfinder{at}angel.elte.hu
| 1 INTRODUCTION |
|---|
|
|
|---|
High-throughput experimental techniques, e.g. proteinprotein interaction (PPI) and mRNA expression methods, have largely advanced our knowledge about the functioning of the cell. Gene (protein) association networks integrate the broadest possible set of evidenceincluding high-throughput dataon protein linkages: they provide an integrated list of binary interactions (von Mering et al., 2005; Salwinski et al., 2004) and allow the discovery of previously uncharacterized cellular systems (Date and Marcotte, 2003). One major goal of current research efforts is to elucidate how the observed behaviours of an entire cell can be understood in terms of the interactions of its protein modules. To identify such modules, a common approach is to search for groups of densely interconnected nodes in the cell's protein association network (Bader and Hogue, 2003; Rives and Galitski, 2003). Note, however, that modules strongly overlap. According to the CYGD database (Guldener et al., 2005), in Saccharomyces cerevisiae the number of proteins in known protein complexes (modules where the participating proteins physically interact at the same time) versus the sum of the sizes of these complexes is 2750/8932. Thus, most protein modules probably share many of their proteins with other modules.
We introduce CFinder, a platform-independent, stand-alone application locating overlapping groups of densely interconnected nodes in graphs and illustrate its use on the network of gene associations in the yeast genome. We decided to maintain CFinder as an independent program (as opposed to a package plugin), because it can be employed by potential users belonging to diverse fields including, in addition to bioinformatics, economics or sociology.
Generic graph visualization and analysis programs (Batagelj and Mrvar, 1998) are frequently used for the layout and structural analysis of networks. Recent bioinformatics software platforms (Shannon et al., 2003), on the other hand, enable the user to integrate many different types of data, e.g. PPI, expression levels and annotation information. CFinder reads a list of binary interactions, performs a search for dense subgraphs (groups), andunlike several currently used algorithms (Newman, 2004)it allows for any node to belong to more than one group. Owing to its algorithm and implementation, CFinder is efficient for networks with millions of nodes and, as a byproduct of its search, the full clique overlap matrix of the network is determined. Below we will show that in gene association networks CFinder's results can be used to predict novel modules and novel individual protein functions.
| 2 OVERVIEW OF CFINDER |
|---|
|
|
|---|
The input of CFinder is a file containing strings and numbers ordered into three columns; in each row the first two strings correspond to the two end points of a link and the third item is the weight of this link.
The computational core of CFinder was implemented in C++, while the visualization and analysis components were written in Java. The search algorithm uses the Clique Percolation Method (CPM, see Derényi et al., 2005) to locate the k-clique percolation clusters of the network that we interpret as modules. A k-clique is a complete subgraph on k nodes (k = 3, 4, ...), and two k-cliques are said to be adjacent, if they share exactly k 1 nodes. A k-clique percolation cluster consists of (1) all nodes that can be reached via chains of adjacent k-cliques from each other and (2) the links in these cliques. Note that larger values of k correspond to a higher stringency during the identification of dense groups and provide smaller groups with a higher density of links inside them. For both local and global analyses in a network, we suggest using such a value of k (typically between 4 and 6) that provides the user with the richest group structure (see Palla et al., 2005). In the presence of link weights CFinder can apply lower and upper cutoff values to keep only the set of connections judged to be significant by the user.
The user interface of CFinder offers several views of the analysed network and its module structure. As an example, Figure 1 shows the modules of the protein Pwp2 in the DIP yeast core network (Salwinski et al., 2004) at clique size k = 4. Alternative views currently available in CFinder are Communities (displaying the identified modules), Cliques, Stats (statistics of, e.g. module and overlap sizes) and Graph of communities. The special buttons forward, back, zoom and walk allow a quick navigation between the views. A wide variety of visualization settings can be adjusted in the Tools menu.
|
Figure 2 displays the network of modules produced by CFinder (k = 4) in the DIP yeast full dataset. In the complete map (a) each node represents a module, the area of a node is proportional to the number of proteins in the corresponding module, and the width of a link is proportional to the number of proteins shared by the two modules. Panel (b) shows a previously known complex identified by CFinder. Panels (c) and (d) both display a known complex grouped together with one additional protein (Msh2 and Vps8, respectively), leading to an improved functional annotation of that protein. In panel (e) Eeb1 (function currently unknown) is grouped together with proteins participating in vesicle-mediated transport, thus, we predict this to be a key function of Eeb1. Proteins in the marked dark blue and brown groups of panel (e) cooperate on the establishment of cell polarity, a function performed by a total of 103 proteins in the cell. (Please see colour figure online.) We anticipate that these two groups are biologically meaningful, novel modules within that larger set of 103 proteins. [Gene names and annotations were handled with Perl tools, e.g. GO::TermFinder (Boyle et al., 2004).]
|
| Acknowledgments |
|---|
The authors acknowledge funding from the Hungarian Scientific Research Fund, OTKA (Grants No. D048422, F047203 and T049674).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alfonso Valencia
Received on December 19, 2005; revised on February 1, 2006; accepted on February 2, 2006
| REFERENCES |
|---|
|
|
|---|
Bader, G.D. and Hogue, C.W.V. (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4, 2[CrossRef][Medline].
Batagelj, V. and Mrvar, A. (1998) Pajekprogram for large network analysis. Connections, 21, 4757.
Boyle, E.I., et al. (2004) GO::TermFinderopen source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics, 20, 37103715
Date, S.V. and Marcotte, E.M. (2003) Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nat. Biotechnol, . 21, 10551062[CrossRef][ISI][Medline].
Derényi, I., et al. (2005) Clique percolation in random networks. Phys. Rev. Lett, . 94, 160202.
Guldener, U., et al. (2005) CYGD: the Comprehensive Yeast Genome Database. Nucleic Acids Res, . 33, D364D368
von Mering, C., et al. (2005) STRING: known and predicted proteinprotein associations, integrated and transferred across organisms. Nucleic Acids Res, . 33, D433D437
Newman, M.E.J. (2004) Detecting community structure in networks. Eur. Phys. J. B, 38, 321330[CrossRef].
Palla, G., et al. (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, 814818[CrossRef][Medline].
Rives, A.W. and Galitski, T. (2003) Modular organization of cellular networks. Proc. Natl Acad. Sci. USA, 100, 11281133
Salwinski, L., et al. (2004) The Database of Interacting Proteins: 2004 update. Nucleic Acids Res, . 32, D449D451
Shannon, P., et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, . 13, 24982504
This article has been cited by other articles:
![]() |
Y. Qi, F. Balem, C. Faloutsos, J. Klein-Seetharaman, and Z. Bar-Joseph Protein complex identification by supervised graph local clustering Bioinformatics, July 1, 2008; 24(13): i250 - i268. [Abstract] [Full Text] [PDF] |
||||
![]() |
C.-Y. Lin, C.-H. Chin, H.-H. Wu, S.-H. Chen, C.-W. Ho, and M.-T. Ko Hubba: hub objects analyzer--a framework of interactome hubs identification for network biology Nucleic Acids Res., July 1, 2008; 36(suppl_2): W438 - W443. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. D. Wren, Y. Wu, and S.-W. Guo A system-wide analysis of differentially expressed genes in ectopic and eutopic endometrium Hum. Reprod., August 1, 2007; 22(8): 2093 - 2102. [Abstract] [Full Text] [PDF] |
||||
![]() |
X. Zhu, M. Gerstein, and M. Snyder Getting connected: analysis and principles of biological networks Genes & Dev., May 1, 2007; 21(9): 1010 - 1024. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. F. Jonsson and P. A. Bates Global topological features of cancer proteins in the human interactome Bioinformatics, September 15, 2006; 22(18): 2291 - 2297. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Aittokallio and B. Schwikowski Graph-based methods for analysing networks in cell biology Brief Bioinform, September 1, 2006; 7(3): 243 - 255. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||






