Skip Navigation


Bioinformatics Advance Access originally published online on September 27, 2007
Bioinformatics 2007 23(24):3388-3390; doi:10.1093/bioinformatics/btm454
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
23/24/3388    most recent
btm454v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Fusco, D.
Right arrow Articles by Cosentino Lagomarsino, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Fusco, D.
Right arrow Articles by Cosentino Lagomarsino, M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

DIA-MCIS: an importance sampling network randomizer for network motif discovery and other topological observables in transcription networks

D. Fusco 1,*, B. Bassetti 1,2, P. Jona 3 and M. Cosentino Lagomarsino 1,2,4

1Università degli Studi di Milano, Dip. Fisica, Via Celoria 16, 20133 Milano, 2I.N.F.N., Milano, 3Politecnico di Milano, Dip. Fisica, Pza Leonardo Da Vinci 32, 20133 Milano, Italy and 4UMR 168/Institut Curie, 26 rue d'Ulm 75005 Paris, France

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHM
 3 IMPLEMENTATION AND RESULTS
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Transcription networks, and other directed networks can be characterized by some topological observables (e.g. network motifs), that require a suitable randomized network ensemble, typically with the same degree sequences of the original ones. The commonly used algorithms sometimes have long convergence times, and sampling problems. We present here an alternative, based on a variant of the importance sampling Monte Carlo developed by (Chen et al.).

Availability: The algorithm is available at http://wwwteor.mi.infn.it/~bassetti/downloads.html

Contact: diana.fusco{at}studenti.unimi.it and marco.cosentino{at}unimi.it

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHM
 3 IMPLEMENTATION AND RESULTS
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
Gene regulatory networks represent interactions between genes or proteins. For example, in a transcription network the nodes are genes, and the edges represent TF-promoter interactions (Babu et al., 2004). Considered their topology, one can study the deviation of the empirical topology from a ‘typical case’ statistics (Milo et al., 2004). To this end, one generates so called ‘randomized counterparts’ of the original dataset as a null model. That is, an ensemble of random networks with some invariant properties, such as the degree sequences, i.e. the number of outgoing and incoming links for each node. This approach has a wider application for networks of different kinds (Kashtan et al., 2004; Milo et al., 2004). Some algorithms to generate this uniformly distributed ensemble are commonly used (Chen et al., 2005; Milo et al., 2003). In particular, one Markov Chain Monte Carlo (MCMC) algorithm is based on swapping edges at random (Molloy et al., 1995). This generates an ergodic dynamics, with, however, large relaxation times. Another type of algorithm is the so called ‘stub-pairing’ algorithm (Milo et al., 2003) that consists in randomly linking ‘stubs’ made of nodes with prescribed degrees (Maslov et al., 2005; Rao et al., 1996). This technique may fall in metastable states, where no stubs can be connected (King, 2004). The algorithm developed in Chen et al. (2005) is free of these two problems. Based on importance sampling Monte Carlo, it generates matrices with an almost uniform probability, and subsequently adjusts the sample, assigning to every element a certain weight. Moreover, it is able to estimate the size of the sampled ensemble. Here, we present an implementation of this algorithm that works specifically on transcription networks, but may be applied in general, with two variants. The first variant is designed to improve speed and make the algorithm competitive to the existing ones, while sampling more efficiently. The second variant deals with ensembles of structured matrices, in particular with structured diagonal, as it is often done in transcription networks when dealing with self-regulations (Kashtan et al., 2004).


    2 ALGORITHM
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHM
 3 IMPLEMENTATION AND RESULTS
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
A directed network can be conveniently represented as a zero-one adjacency matrix where element a(i,j) is 1 if node j has a directed link to node i (Fig. 1A). The null ensemble of degree-conserving graphs translates into a set of matrices having the same row and column sums of the empirical matrix. As the goal is the uniform distribution of the sample, the importance sampling weight for every element is 1/P(T), where P(T) is the matrix probability. The algorithm generates the matrix column by column as illustrated in Figure 1A. One has to consider the row sums having subtracted the first column. When all the columns are filled, the total probability of having a certain matrix is the product of all the column probabilities, which can be computed knowing the constraints of each column (Chen et al., 1997). This number allows to weigh correctly the matrix sample. We introduced the following two variants.


Figure 1
View larger version (27K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. (A) Description of the algorithm. Left: the graph is translated into an adjacent matrix, which is filled column by column. Right/top: the procedure for generating a matrix is showed. The second column (in pink) must be generated. In order to perform this operation, the updated row sums and the constraint are calculated. The free positions are extracted and the algorithm considers the next (and last) column. Right/bottom: the column is then reordered by the residual row sums. One constraint is found and the last column filled. The matrix is now complete. (B) Table summarizing the results for three triangular motifs and the feedback (measured by Mcore, see Cosentino Lagomarsino et al., 2006) of the E.coli and S.cerevisiae transcription networks. Note that the E. Coli dataset has no feedback. (C) Comparison of subgraph (FFL, TGC) distributions in E.coli randomizations for structured (light green) and unstructured (dark blue) diagonals. These distributions are systematically shifted with respect to each other.

 
2.1 Large matrices with compact indegree
Transcription networks typically have several hundreds of nodes. The computational cost for generating a column is of order Formula (M2c2) where M is the length of a column and c the number of 1s contained in that column (Chen et al., 2005). This is due to the fact that, the algorithm has to evaluate the probability of success for every position inside the column (Bekazova et al., 2006). The probability of success in a given position can be well approximated using the corresponding row-sum if the in-degree distribution is sufficiently limited in range (see Supplementary Material and Supplementary Fig. 1). This last feature is typical of transcription networks. The computational cost for generating a column is then reduced to order Formula (Mc).

2.2 Structured diagonal
Self-regulatory interactions are often considered to have a particular status (Kashtan et al., 2004). They are represented in the matrix by 1 on the diagonal. In order to constrain the diagonal, we accounted for the fact that some positions are not available for the extraction (see Supplementary Material).


    3 IMPLEMENTATION AND RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHM
 3 IMPLEMENTATION AND RESULTS
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
3.1 Triangular network motif
As an example of application, we have studied the occurrence of three triangular subgraphs (Fig. 1B and C). The FFL (Feed Forward Loop), SIM (triangular Single Input Module) and TGC (Three Gene Chain), for the transcription networks of Escherichia coli (Shen-Orr et al., 2002) and Saccharomyces cerevisiae (Guelzim et al., 2002) verifying the results that can be found in the literature (Kashtan et al., 2004), (Milo et al., 2004) and with the algorithm of (Chen et al., 2005) (Supplementary Fig. 1). In all cases, we find a quantitative difference between the subgraph distributions in the randomized ensembles with or without structured diagonal (Fig. 1B and C). In some instances, such as the FFL, this does not affect the status of motif. In other cases one can also find qualitative changes.

3.2 Feedback
We also evaluated (Fig. 1B) the feedback in the graph, using a simple decimation algorithm that removes the input- and output-treelike components (Cosentino Lagomarsino et al., 2006). With this algorithm, the feedback is measured by the size Mcore of the decimated graph. As expected, the sample with structured diagonal is shifted towards smaller amounts of feedback. This can be explained considering the lower amount of available links to rearrange if the selfregulators are fixed.


    4 CONCLUSIONS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHM
 3 IMPLEMENTATION AND RESULTS
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
In conclusion, we have implemented and tested a Monte Carlo importance sampling algorithm to randomize directed graphs conserving the degree sequence, and evaluate topological observables. The algorithm follows the design principles of Diaconis et al. but is more efficient without loss of uniformity on graphs with compact indegree such as the known transcription networks. Furthermore, we added a variant that works with constrained diagonal, as it is usually done in motif discovery (Kashtan et al., 2004). We implemented the algorithm as a C++ three-node motif and feedback finder (also available as linux and windows executable).


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHM
 3 IMPLEMENTATION AND RESULTS
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
The authors would like to thank F. Bassetti, S. Holmes and P. Diaconis for helpful discussions.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Anna Tramontano

Received on July 24, 2007; revised on August 18, 2007; accepted on August 27, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHM
 3 IMPLEMENTATION AND RESULTS
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Babu M, et al. Structure and evolution of gene regulatory networks. Curr. Opin. Struct. Biol (2004) 14:14–283.

    Bekazova I, et al. Negative examples for sequential importance sampling of binary contingency tables. Lecture Notes in Computer Science (2006) 4168, Springer, Berlin, pp.136–147.

    Chen SX, Liu JS. Statistical applications of the Poisson-binomial and conditional Bernoulli distributions. Statistica Sinica (1997) 7:875–892.[Web of Science]

    Chen Y, et al. Sequential Monte Carlo methods for statistical analysis of tables. J. Am. Stat. Assoc (2005) 100:109–120.[CrossRef][Web of Science]

    Cosentino Lagomarsino M, et al. Randomization and feedback properties of directed graphs inspired by gene networks (2006) q-bio.MN/0606039.

    Guelzim N, et al. Topological and causal structure of the yeast transcriptional regulatory network. Nat. Genet (2002) 31:60–63.[CrossRef][Web of Science][Medline]

    Kashtan N, et al. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics (2004) 20:1746–1758.[Abstract/Free Full Text]

    King OD. Comments on "Subgraphs in random networks". Phys. Rev (2004) 058101–1,2,3:E70.

    Maslov S, Sneppen K. Computational architecture of the yeast regulatory network. Phys. Biol (2005) 2:94.[Medline]

    Milo R, et al. Network motifs: simple building blocks of complex networks. Science (2002) 298:824.[Abstract/Free Full Text]

    Milo R, et al. On the uniform generation of random graphs with prescribed degree sequences. cond-mat/0312028 (2003).

    Milo R, et al. Superfamilies of designed and evolved networks. Science (2004) 303:1538.[Abstract/Free Full Text]

    Molloy M, Reed B. A critical point for random graphs with a given degree sequence. Random Structures and Algorithms (1995) 6:161–179.

    Rao A, et al. A Markov chain Monte Carlo method for generating random zero-one matrices with given marginals. Indian J. Stat (1996) 58:225.

    Shen-Orr SS, et al. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet (2002) 31:64–68.[CrossRef][Web of Science][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
23/24/3388    most recent
btm454v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Fusco, D.
Right arrow Articles by Cosentino Lagomarsino, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Fusco, D.
Right arrow Articles by Cosentino Lagomarsino, M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?