Skip Navigation


Bioinformatics Advance Access originally published online on January 10, 2006
Bioinformatics 2006 22(5):581-588; doi:10.1093/bioinformatics/btk030
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
22/5/581    most recent
btk030v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (7)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Louzoun, Y.
Right arrow Articles by Solomon, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Louzoun, Y.
Right arrow Articles by Solomon, S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Copying nodes versus editing links: the source of the difference between genetic regulatory networks and the WWW

Yoram Louzoun 1,*, Lev Muchnik 2 and Sorin Solomon 3

1Department of mathematics, Bar Ilan University Ramat Gan 52900, Israel
2Department of Physics, Bar Ilan University Ramat Gan 52900, Israel
3Department of Physics, Hebrew University Jerusalem, Israel

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 MODELS AND RESULTS
 DISCUSSION
 REFERENCES
 

We study two kinds of networks: genetic regulatory networks and the World Wide Web. We systematically test microscopic mechanisms to find the set of such mechanisms that optimally explain each networks' specific properties. In the first case we formulate a model including mainly random unbiased gene duplications and mutations. In the second case, the basic moves are website generation and rapid surf-induced link creation (/destruction). The different types of mechanisms reproduce the appropriate observed network properties. We use those to show that different kinds of networks have strongly system-dependent macroscopic experimental features. The diverging properties result from dissimilar node and link basic dynamics. The main non-uniform properties include the clustering coefficient, small-scale motifs frequency, time correlations, centrality and the connectivity of outgoing links. Some other features are generic such as the large-scale connectivity distribution of incoming links (scale-free) and the network diameter (small-worlds). The common properties are just the general hallmark of autocatalysis (self-enhancing processes), while the specific properties hinge on the specific elementary mechanisms.

Contact: louzouy{at}math.biu.ac.il

Supplementary information: Supplementary data are available at Bioinformatics Online.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 MODELS AND RESULTS
 DISCUSSION
 REFERENCES
 
Recent attempts to characterize the properties of networks (social, Internet, citations, WWW, etc.) (Albert and Barabasi, 2002; Amaral et al., 2000; Barabasi and Albert, 1999; Faloutsos et al., 1999; Jeong et al., 2000; Strogatz, 2001) have been directed to generic properties that are shared by most evolving systems such as a small-world character, scaling and autocatalytic growth (Barabasi and Albert, 1999; Dorogovtsev and Mendes, 2001; Jeong et al., 2000; Uetz et al., 2000). The stress was on the generality of the phenomena over many disciplines rather than the specific elementary interactions responsible for specific systems. In particular the focus was on the node connectivity scaling properties, which are universal and only depend on the autocatalytic nature of the processes (Simon, 1955). Some more elaborate studies have used generic mechanisms producing combinations of two required properties (e.g. Hallinan, 2004). We show here that the usage of a much larger scope of measurements allows the reproduction of the network-specific generating mechanisms with a reasonable level of details. We test a large variety of different combinations of the microscopic laws corresponding to the presumed evolution mechanisms governing individual gene and, respectively, the dynamics of WWW node formation. In both cases (the genetic regulatory network and the WWW) only very specific combinations reproduce the actual observed network clustering coefficient, its main subnetworks components, the incoming and outgoing node degree distribution and its time correlations. The systematic analysis of network properties at all scales allows us to suggest likely generative mechanisms for each network studied and even to propose an order of magnitude for the rate constants for some of the mechanisms parameters. We indeed find that in the two different network types studied, some mechanisms are similar (e.g. node creation), but occur with rate constants differing by a few orders. The measured properties range from the degree distribution, which is a function of each node by itself, through clustering and small-scale motifs, which are properties of neighbouring nodes to the betweenness centrality distribution that depends on the full network architecture.


    MODELS AND RESULTS
 TOP
 ABSTRACT
 INTRODUCTION
 MODELS AND RESULTS
 DISCUSSION
 REFERENCES
 
Genetic regulatory networks
We present a simple gene evolution model and show that it reproduces in detail the observed properties of genetics networks. The nodes in a genetic regulatory network are specific genes of a given organism. A link pointing from gene A to gene B implies that gene A regulates (through the protein that it codes for) gene B. The biological mechanisms assumed to take place in the generation of genetic regulatory networks are gene duplication, gene removal through deleterious mutations and gene mutations (Dehal et al., 2001; Friedman and Hughes, 2002; Gu et al., 2002; Seioghe and Wolfe, 1999; Sidow, 1996; Stubbs, 2002; Venter, 2001; Wolfe and Shields, 1997). We translate these biological reactions into abstract network operations:

  • Node copying corresponds to gene duplication. Note that we model only the duplication of the exons for which there is experimental evidence (Dehal et al., 2001; Stubbs, 2002; Venter, 2001); we do not assume duplication of the intronic and control regions, as there is no clear experimental evidence for it. This is in contrast with the generic ‘preferential attachment’ algorithm that amounts to the random creation of new nodes (followed by selection).
  • Node removal represents random deleterious mutations that result in gene destruction. Once a gene is non-functional, its regulating elements become irrelevant.
  • Link mutation represents mutations through which an existing protein binds a promoter of an unrelated gene or a hitherto unrelated protein binds the promoter of an existing gene. It is currently unknown which of those is the dominant mutation types, and we thus allow 50% for each type of mutations. This mechanism is obviously simplistic as mutations affecting an active site can probably affect more than one target at a time. Changing a fraction of the outgoing links, instead of just one had no significant effect on the results (see Supplementary Figure 1 for the effect of multiple changes per mutation).

Beyond the basic generative mechanisms, a description of a genetic regulatory network should address the issues of directionality and global dynamics:

  • Directionality: The model distinguishes between incoming and outgoing degree distributions. This difference is obvious and has been widely observed experimentally. In most experimental systems, originating and receiving a link are tied to very different properties. The production of an outgoing link requires some specific action of the origin node, while receiving an incoming link implies a triggering or referral mechanism of the target node. In the specific case of genetic regulatory networks, the trigger of the links incoming to a node depends on its transcription factor binding site sequences, and the action of links outgoing from a node is a function of the three-dimensional structure of the protein transcribed by it. Moreover, the observed distributions in most genetic regulatory networks (e.g. the Escherichia coli network) are scale free for incoming links, but approximately Poisson for outgoing links.
  • Global dynamics: The size of genetic regulatory networks saturates in the time scale of the network generation (Betran and Long, 2002; Forterre and Philippe, 1999; Holland, 2003). In fact, our model produces a scale-free incoming degree distribution. The exponent of the power law is {alpha} ~ 2, even for non-monotonic network size variation. This is again in contrast with most currently used models that assume a continuous growth of the network.

We have tested many model variants containing the above-mentioned interactions, and found that in order for a model to reproduce in detail the observed measurements, it should contain a proper balance between node copying (Fig. 1A), removal (Fig. 1B) and editing (Fig. 1C). More precisely we start the model with four nodes connected one to the other, apply the following dynamics and define the network resulting from the model as the network obtained once the number of nodes is stabilized. Note that following the number of nodes, all other measures rapidly reach steady state.

  • At each time step the nodes are copied [we arbitrarily set the copying rate to 1/step and thus define the time scale (Fig. 1A)]. When nodes are copied, only outgoing links are copied; the added node has no incoming links.
  • Each existing node can be removed with a probability {delta} per time step (Fig. 1B). When the nodes are removed, all incoming and outgoing links to and from these nodes are removed. {delta} is simply defined as one over the network size and is thus not a free parameter in the system.
  • Links can be mutated (Fig. 1C) randomly at a fixed rate. At each time step a constant number {lambda} of links are mutated. A mutation is either a link target or link origin duplication (Watts and Strogatz, 1998) i.e. a random link is chosen and a new link is created with either an identical target or an identical origin and random origin or targets, respectively. We have tested the possibility of replacing either only the origin or the target, but such mutations failed to reproduce some of the observed properties. We must thus mix the two types of mutations. Either an existing binding site in a protein is replaced by a different binding site or the creation of a new binding site in an existing protein for an existing promoter. Beyond the choice of basic mechanisms and the way mutations occur, the only free parameter in the system is {lambda}. The wealth of measurements we have on the system allows us to determine its value to be of the order of 0.2.


Figure 1
View larger version (37K):
[in this window]
[in a new window]
 
Fig. 1 Mechanisms of individual node evolution: the three elementary processes defining our genetic model. Their effect is demonstrated on the configuration of the left upper corner (only links and nodes relevant for the explanation are explicitly shown). The effect of a Node copying elementary event is shown in (A). The blue node is ‘duplicated’ by introducing the new brown node that has the same targets for its outgoing links (and no incoming links at all). The Node removal is illustrated in (B). The green node and all its links are deleted. The drawing (C) illustrates the elementary operation of Link mutation: the pink link is ‘copied’, i.e. a new link with the same origin but different target is created. These three elementary operations turn out to be sufficient for the formation of a steady-state directional hierarchical scale-free network, with the experimentally observed submotif distribution.

 
The main results of our model are as follows (for a summary see Supplementary Table 1):
  1. A scale-free incoming node degree distribution with an exponent (Fig. 2) compatible with the observed 2–2.5 (i.e. a cumulative exponent of 1–1.5) (Barabasi and Albert, 1999).
  2. A fixed-scale (approximately Poisson) outgoing node distribution, again compatible with the observations (Fig. 2 and Supplementary Figure 1).
  3. A hierarchical structure characterized by a high clustering coefficient (CC) that scales with the experimentally observed exponent of approximately –1 of the outgoing degree (Supplementary Figure 2) (Ravasz et al., 2002)
  4. The node distance (minimal path length) distribution is approximately Gaussian and its average is proportional to the log of the network size (Fig. 2) as is indeed experimentally observed (Agrawal, 2002; Watts and Strogatz, 1998). This together with the CC at (3) implies a small-world geometry.
  5. A positive correlation between the nodes age and their incoming links degree (Fig. 4) (Eisenberg and Levanon, 2003).
  6. Our model also reproduces the observed subgraph distribution. Using the algorithm in the work by Itzkovitz et al. (2003) we measured the relative occurrence frequency of the n-nodes ‘motifs’ with n < 5. An n-motif is a directed subnetwork with n nodes. We compared the occurrence frequency of each motif in our networks to their frequency in randomly generated networks with identical node degree distribution, obtained from link replacement in the original network that maintains the degree distribution but change the small-scale structure (Milo et al., 2002).


Figure 2
View larger version (15K):
[in this window]
[in a new window]
 
Fig. 2 Genetic regulatory network properties. The left panel represents the incoming (open squares) and outgoing (closed circles) node degree distributions. The outgoing link distribution is normal (as experimentally observed). The incoming link distribution is scale free over more than three orders of magnitude (1–10 000). The straight line corresponds on this double logarithmic graph to a power law with exponent –2.4 (similar to the value actually observed in most genetic regulatory networks). Increasing the network size has no effect on the exponent of the power distribution; it only increases its validity range. The right panel is the distribution of the shortest path between arbitrary pairs of nodes in the network. A regular one-dimensional lattice would produce a distance proportional to the network size (10 000).

 
In the networks generated by our algorithm, the most statistically significant deviation from the random network was obtained for the ‘Bi-fan’ motif (Fig. 3A). For instance, in a network of 894 nodes and 5074 edges it occurred 36 546 times, compared with 15 597 ± 447 in its random counterpart (Z-score = 47). This is the only 4-motif with an exceptionally high Z-score in all experimental genetic regulatory networks [e.g. the bacterium E.coli and the yeast Saccharomyces cerevisiae (Milo et al., 2002)]. The large Z-score of the Bi-fan motif is a direct result of the copying procedure performed in our algorithm. If X originally had links to Z and W, once copied to a new node Y, Y will have similar outgoing links. Thus the appearance of this motif in the above-mentioned experimental network suggests the microscopic mechanisms assumed in our model. The next very frequent motifs in our analysis were the ‘Converging fan’ (Fig. 3B) and two variations of the Bi-fan (Fig. 3C and D). The excess occurrence of these motifs in our networks is again consistent with the existing genetic observations (motif C is for example the next frequent motif after the Bi-fan in the E.coli genetic regulatory network) and can be assigned to the specific elementary operations in our model. The only motif experimentally in excess in genetic regulatory networks and not in our analysis is the feed-forward loop 3-motif (X->Y – Z and X->Z). This may actually prove that this motif is selected through evolution and is not a direct result of the random generation process (as suggested by Mangan and Alon, 2003). The fact that this is the only significant difference between our purely mechanistic model and the results probably highlight the fact that the effects of selection on the network structure may be of a second order.


Figure 3
View larger version (25K):
[in this window]
[in a new window]
 
Fig. 3 The 4 motifs most frequently over-expressed in the networks produced by the genetic model. They turn out to be the 4 motifs most frequently over-expressed in experimental genetic regulatory networks. The first motif A is denoted as a Bi-Fan and is consistently the most over-expressed in ALL genetic regulatory networks and in our model. Motifs C and D represent various combinations of the Bi-Fan and added links. Motif B excess is a result of the combination of node copying and removal.

 
All the above-mentioned (1–6) emerging properties are independent of the network's seed or history.

An analytic treatment of the incoming degree distribution in the model follows from the autocatalytic character of the ‘copying’ process. The combination of link removal and addition induces a random change in the probability that a node should have k-links, with an average of 0 and a variance of 2{delta}. Mutations add (in average) a constant number of incoming links. The probability of a node should have an incoming degree of k changes following

Formula
with nodes disappearing and appearing. This precise situation was studied by (Blank and Solomon, 2000; Solomon and Levy, 1996) to lead to a scale-free distribution with a power of 1 + (1/<k>), even if the agents number fluctuates.

WWW network
The WWW is the network of pages connected one to the other via links in the Internet. While the Internet is the backbone of servers physically connected one to the other, the WWW network is composed of virtual links, independent of their physical location. While the Internet formation depends on technological (hardware characteristic), geographical, economical, business and even political factors, WWW links are mostly content dependent. The mechanisms governing the WWW evolution are very different from the ones generating genetic regulatory networks. Genetic regulatory networks evolve through a slow random copying process (accompanied by some mutations and selection), while the WWW evolves mainly through the very rapid non-stop editing of the existing websites. We propose that the most frequent network operation in the WWW is link editing rather than node addition or removal. Consequently, one of the dramatic differences expected between the WWW and the genetic regulatory networks is in the correlation between the nodes degree and their age. While our genetic regulatory network model predicts strong correlations between the node degree and its age (since long-lived nodes continuously receive new incoming links), our WWW model has practically no age–degree correlations (because of the extensive reshuffling of links between node additions or removals). These predictions are validated by the actual data both in genetic regulatory networks (Eisenberg and Levanon, 2003) and in WWW measurements (Adamic et al., 2000) (Fig. 4).


Figure 4
View larger version (27K):
[in this window]
[in a new window]
 
Fig. 4 Correlations between node degree and their age. The upper drawings represent the WWW and the lower drawings represent genetic regulatory networks. The upper left drawing represents the experimental scatter plot of the websites incoming degree as a function of the year they were created [extracted with permission from Adamic,L.A. and Huberman,B.A. (2000) Science, 287, 2115a. Copyright 2000 AAAS.], while the upper right drawing represents our simulation results for the same scatter plot. One can clearly see that in the WWW our simulation reproduces properly the lack of correlation between the node age and its degree obvious in the experimental data. In our genetic regulatory network model, the scatter plot (lower left drawing) shows a clear correlation between the age of a gene and its degree. This was actually measured. The measurement results are presented in the lower right drawing [big grey circles reprinted with permission from ‘Preferential attachment in the protein network evolution’, Eisenberg,E. and Levanon,E.Y. (2003) Phys. Rev. Lett., 91(13). Copyright 2003 by the American Physical Society.]. We have plotted simultaneously the experimental data (big grey circles) and the simulation results of the average (open circles) in lower right drawing. One can see that our simulation results fit the experimental results. Note that the physical time scale in our simulation was set to fit the data, since the time units in our model are of gene duplications and not physical years.

 
The absence of time correlation suggests that link editing is the main factor shaping the Internet, but it does not supply information about the specific editing mechanism. The creation of links is the result of a combination between multiple possible mechanisms. Presently, there exists no data about the various personal link generation patterns. We propose the mechanism set, which reproduces most faithfully all the existing WWW empirical data. We predict a set of individual surfing and linking habits that we submit for further experimental validation/falsification.

Our WWW evolution model consists of the elementary known operations taking place in the WWW: node creation/copying with a rate of µ per node, node removal with a rate of {nu} per node and node editing sessions with a rate of {alpha} per node, ({alpha} >> µ and {nu}). The main difference between genetic regulatory networks and the WWW is obviously in the nature of editing.

  • Node creation/copying is typically performed through a ‘copy and paste’ action involving multiple websites. In our case we create new nodes using half of the outgoing links of a randomly chosen node and half of the outgoing links of another randomly chosen node. Thus newly created sites have no incoming links. Obviously some of the nodes are created de novo. We have validated that the precise details of the creation mechanisms have no significant effect on our results.
  • Node removal is straightforward. When a node is removed, all the links to and from the node are removed. Node removal is simply the removal of a website.
  • Node editing sessions consist of independent removals and additions of links originating from a randomly chosen node. Link targets are not chosen randomly, but rather through surfing and the detection of interesting websites or common interest among websites. Link addition is thus performed either through web browsing excursions (with a probability of {varepsilon}) or through detection of common interest (with a probability 1 – {varepsilon}). An excursion consists of following an outgoing link and continuing on one of the target's outgoing links and so on. The excursion stops with probability 0.1 per node by adding a link from the origin to the stopping point. Common interest links are added towards the nodes in the close proximity of an existing node (we interpret the non-directional proximity on the network as a measure of common interest among nodes).

In the specific application discussed here we added 0.02 nodes per existing node per time step and removed 0.016 nodes per node per time step. At each time step we performed on average one editing session per node. During an editing session, the link removal and addition probabilities are 5% per link; We also have a 5% probability of adding a link independent of the number of existing links.

This algorithm reproduces the observed collective properties of the WWW, namely

  1. Negligible correlations between the node age and their link degree distributions (Fig. 4),
  2. The observed average number of links per node (denoted as L) (Lexperimental = 7–8 versus Lmodel = 7.5) (Kumar et al., 1999),
  3. An appropriate exponent for the power of the incoming ({alpha}in ~ 2.1 ~ 1 + L/(L – 1)) (Fig. 5) (Solomon, 1998) and outgoing ({alpha}out ~ 3) link degree distributions (Fig. 5) (Adamic et al., 2000, 2001),
  4. A small-world network topology (Supplementary Figure 2) (Albert et al., 1999),
  5. We reproduce the appropriate small-scale motifs. The motifs experimentally observed in excess in the WWW compared with its random counterpart are only 3-motif triangles containing the feed-forward loop and 3-motifs based on the feed-forward loops with bidirectional links, instead of the unidirectional ones. In our WWW model, the feed-forward loop is indeed the most frequent over-expressed motif, and following it are also its extensions to bidirectional links (Supplementary Figure 3). The excellent fit between our model and the observed properties of the WWW allows us to conclude that our model does contain the crucial microscopic dynamic elements underlying the WWW evolution.


Figure 5
View larger version (14K):
[in this window]
[in a new window]
 
Fig. 5 WWW network. The format of this figure is similar to Figure 2. The main differences are the double power law for both incoming (open squares; exponent 1.8 ± 0.1) and outgoing links (closed circles; exponent 3.0 ± 0.2), The results here are in perfect agreement with the experimental data of the WWW.

 
Constraints versus parameters
The genetic regulatory network and the WWW generation models contain three and six parameters accordingly. We have employed definite values for each parameter in order to reproduce the observed properties. While all the used mechanisms are essential in order to obtain the results, the values of the rate constant for each one of them may vary. We have systematically varied all rate constants to test their importance. In the genetic regulatory network the node creation and destruction rates only affect the network size, and not its properties. The mutation rate (per node) on the other hand has to be lower or equal to the creation/destruction rate. Note that in this context a mutation is one that significantly affects the binding probability of an active site to a transcription factor and not any random mutation, whose rate is obviously much higher than the node creation and destruction rate. Thus actually, assuming the mechanism is correct, all the observations in this network depend on the tuning of a single parameter—the mutation rate.

In the WWW there are more parameters affecting the system, but again the node creation and destruction rate have a very minor effect on the observed properties. Their main effect is on the network node number growth rate. The editing rate on the other hand must be set to be much higher than the node creation/destruction rate. In fact the precise details of the node creation algorithm are not important. The details of the editing process in the WWW case are the ones determining the observed properties. For example, the incoming degree distribution power is determined by the ratio between link destruction and creation rates, during an editing session. The fraction of each small-scale motifs is determined by the ratio of forward browsing to bidirectional browsing ({varepsilon}). To summarize, while the genetic regulatory network properties are determined by the selected mechanisms and a single parameter, the WWW is affected by the editing mechanism properties. Both networks are much more affected by the mechanisms acting on links than on nodes.

Predictions
The models defined in the previous sections allow us to make new and contrasting predictions for many of the (mostly unmeasured until now) properties of genetic regulatory networks and the WWW. The experimental verifications of these predictions can be used to validate or to disqualify our model. Using the available data, we were able to measure some of the predicted values while the other values are left as predictions for others to measure. Following is a list of measurements proposed as validation of our models.

  • Betweenness centrality. The betweenness centrality of a node is defined as the portion of the shortest paths between nodes in the network passing through a given node (Freeman, 1977). The centrality is obviously related to the degree (P = 4 x 10–38, and 1 x 10–8 for the correlation between degree and betweenness centrality in the simulated genetic regulatory network and WWW1). One can however measure the normalized centrality [i.e. the centrality of a node divided by its (non-directional) degree as a function of its (non-directional degree)]. The normalized centrality distribution is flat in the WWW and raising in genetic regulatory networks. This is probably an evidence of the importance of hubs in genetic regulatory networks Supplementary Figure 4).
  • Clustering coefficient. There is some discrepancy between the simulated and observed CC in the genetic regulatory network. Although both scale similarly (with a scale approximately equal –1) (Supplementary Figure 2), the absolute value of the CC is lower in the genetic regulatory network model than in reality. This is probably due to the previously discussed over representation of FFL (which is simply a directed triangle) in experimental genetic regulatory networks.
  • Markov property. We have shown above that the WWW has a very weak correlation between a node age and its degree, while the genetic regulatory network has very strong positive correlations (older nodes have a higher degree). One could assume that these strong correlations are due to a ‘preferential attachment’ mechanism, where older nodes accumulate links more rapidly. The process implemented in the genetic regulatory network simulation was however age independent. If the proposed process is indeed correct then one would expect no correlations between the rate at which links are added to a given node and the node age. One can see that in our simulation there are indeed no such correlations (Supplementary Figure 5). The probability of adding an outgoing link to a node is independent of its age. This is in clear contrast with the WWW where there are positive correlations between the node age and the probability to add either incoming or outgoing links to it (Fig. 6). This non-Markovian property of the WWW results from the browsing process. The probability of reaching a node is proportional to how dense is its regions. Thus the WWW model contains ‘non-local’ interactions which are affected by more than a single node. Note that both the genetic regulatory network and the WWW have a clear correlation between the incoming degree and the number of links added based on the duplication process.
  • Neighbor degree correlations. Again, the local process in the genetic regulatory network induces no correlation between the neighbor degrees. In the WWW case, on the other hand, we predict a clear negative correlation between incoming and outgoing links. (Supplementary Figure 5) as was indeed observed (Newman, 2002).


Figure 6
View larger version (27K):
[in this window]
[in a new window]
 
Fig. 6 Markov properties of link addition. We have measured the frequency of link addition to nodes as a function of their degree and age. If the process is fully local and Markovian, this frequency should be constant. In other words the number of links added to nodes of age t and degree k should be linear with the frequency of such nodes. We have tested this frequency for incoming link as a function of incoming degree (upper panel) and outgoing links as a function of outgoing degree (lower panel) for genetic regulatory networks (left column) and the WWW (right column). The addition is indeed age and degree independent in the case of the incoming genetic links, and age independent in the case of outgoing genetic links. On the other hand in the WWW, the process is non-Markovian (at the single node level) and degree dependent.

 

    DISCUSSION
 TOP
 ABSTRACT
 INTRODUCTION
 MODELS AND RESULTS
 DISCUSSION
 REFERENCES
 
Network collective properties are currently extensively studied for a wide class of networks (Adamic et al., 2001; Agrawal, 2002; Albert and Barabasi, 2002; Amaral et al., 2000; Barabasi and Albert, 1999; Barrat et al., 2004; Bhan et al., 2002; Chung et al., 2003; Dorogovtsev and Mendes, 2001; Itzkovitz et al., 2003; Jeong et al., 2000; Mangan and Alon, 2003; Milo et al., 2002; Pastor-Satorras and Vespignani, 2001; Ravasz et al., 2002; Rzhetsky and Gomez, 2001; Sole and Pastor-Santorros, 2002; Strogatz, 2001; Wagner, 2001; Watts and Strogatz, 1998; Wu et al., 2004). This vast amount of information allows us to distinguish between network classes, but was never simultaneously correlated to the elementary mechanisms generating the networks. We have combined here a large set of network observations with known generation mechanisms to propose a detailed model for genetic regulatory networks and the WWW. In the case of genetic regulatory networks the main generating mechanism used was node copying, accompanied by a lower or equal rate of mutations. We reproduced all the known properties of genetic regulatory network, such as their link degree distributions, their topology, age correlations and the small-scale motifs appearing in the network using just three parameters. In the case of the WWW, the main mechanism is the extensive editing of nodes via browsing and the sharing of common interest between neighbouring nodes.

Although the WWW and genetic regulatory networks seem to have similar incoming link degree distribution scaling properties (and as such were classified under the generic rubric of ‘scale-free networks’) one can see that their structure is dramatically different. A large number of other networks, such as neural, social, citations, linguistic and ecological networks (see for example Adamic et al., 2001; Amaral et al., 2000; Newman, 2001; Watts and Strogatz, 1998; Wu et al., 2004), were also clustered under the same rubric of scale-free networks. These networks are obviously generated by very different mechanisms. The scale-free character of their incoming link distribution only reflects their autocatalytic nature and cannot reveal their specific properties. However the combination of multiple measures can hint to specific mechanism at a high enough precision to allow detailed predictions that can be either validated or proved wrong. Many of the elements introduced in our models were present in previous models. For example gene duplication models of monotonically growing genetic regulatory networks have been considered in combination with preferential attachment (Bhan et al., 2002; Chung et al., 2003; Rzhetsky and Gomez, 2001; Wagner, 2001). Bi-directional networks were studied too e.g. in the context of a small-world model (Bhan et al., 2002). However to the best of our knowledge, the present work incorporates the largest set of observations with the obvious elementary observed mechanisms to produce the experimentally observed statistical collective features of specific networks.


    Acknowledgments
 
The work of Y.L. and L.M. is covered by the Yeshaya Horowitz foundation and by the co3 pathfinder NEST grant of the EU 6th framework and by grant ISF.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Satoru Miyano

1The correlation is mainly observed in high degree nodes P = 1.555e–015; R = 0.9443 and P = 3.8726e–006; R = 0.74317 for nodes with an incoming degree higher than 100. Back

Received on October 23, 2005; revised on December 12, 2005; accepted on December 28, 2005

    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 MODELS AND RESULTS
 DISCUSSION
 REFERENCES
 

    Adamic, L.A., et al. (2000) Power-law distribution of the World Wide Web. Science, 287, 2115 www.sciencemag.org.

    Adamic, L.A., et al. (2001) Search in power-law networks. Phys. Rev. E. Stat. Nonlin. Soft Matter Phys, . 64, 046135[Medline].

    Agrawal, H. (2002) Extreme self-organization in networks constructed from gene expression data. Phys. Rev. Lett, . 89, 268702[CrossRef][Medline].

    Albert, R. and Barabasi, A.L. (2002) Statistical mechanics of complex networks. Rev. Modern Phys, . 74, 47–97[CrossRef][Web of Science].

    Albert, R., et al. (1999) Diameter of the World Wide Web. Nature, 401, 130–131[CrossRef].

    Amaral, L.A., et al. (2000) Classes of small-world networks. Proc. Natl Acad. Sci. USA, 97, 11149–11152[Abstract/Free Full Text].

    Barabasi, A.L. and Albert, R. (1999) Emergence of scaling in random networks. Science, 286, 509–512[Abstract/Free Full Text].

    Barrat, A., et al. (2004) The architecture of complex weighted networks. Proc. Natl Acad. Sci. USA, 101, 3747–3752[Abstract/Free Full Text].

    Betran, E. and Long, M. (2002) Expansion of genome coding regions by acquisition of new genes. Genetica, 115, 65–80[CrossRef][Web of Science][Medline].

    Bhan, A., et al. (2002) A duplication growth model of gene expression networks. Bioinformatics, 18, 1486–1493[Abstract/Free Full Text].

    Blank, A. and Solomon, S. (2000) Power laws in cities population, financimarkets and Internet sites (scaling in systems with a variable number of components). Physica A, 287, 279–288[CrossRef].

    Chung, F., et al. (2003) Duplication models for biological networks. J. Comput. Biol, . 10, 677–687[Medline].

    Dehal, P., et al. (2001) Human chromosome 19 and related regions in mouse: conservative and lineage-specific evolution. Science, 293, 104–111[Abstract/Free Full Text].

    Dorogovtsev, S.N. and Mendes, J.F.F. (2001) Scaling properties of scale-free, evolving networks: continuous approach. Phys. Rev. E. Stat. Nonlin. Soft Matter Phys, . 63, 1–18.

    Eisenberg, E. and Levanon, E.Y. (2003) Preferential attachment in the protein network evolution. Phys. Rev. Lett, . 91, 138701[CrossRef][Medline].

    Faloutsos, M., et al. (1999) On power-law relationships of the Internet topology. Comp. Comm. R, . 29, 251[CrossRef].

    Forterre, P. and Philippe, H. (1999) Where is the root of the universal tree of life? Bioessays, 21, 871–879[CrossRef][Web of Science][Medline].

    Freeman, L.C. (1977) A set of measures of centrality based on betweenness. Sociometry, 40, 35–41[CrossRef][Web of Science].

    Friedman, R. and Hughes, A. (2002) Gene duplication and the structure of eukaryotic genomes. Genome Res, . 11, 373–381.

    Gu, Z., et al. (2002) Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Mol. Biol. Evol, . 19, 256–262[Abstract/Free Full Text].

    Hallinan, J. (2004) Gene duplication and hierarchical modularity in intracellular interaction networks. Biosystems, 74, 51–62[CrossRef][Web of Science][Medline].

    Holland, P.W. (2003) More genes in vertebrates? J. Struct. Funct. Genomics, 3, 75–84[CrossRef][Medline].

    Itzkovitz, S., et al. (2003) Subgraphs in random networks. Phys. Rev. E. Stat. Nonlin. Soft Matter Phys, . 68, 026127[Medline].

    Jeong, H., et al. (2000) The large-scale organization of metabolic networks. Nature, 407, 651–654[CrossRef][Medline].

    Kumar, S.R., Raghavan, P., Rajagopalan, S., Tomkins, A. (1999) Trawling the Web for emerging cyber communities. Proceedings of the 8th WWW ConferenceAmsterdam, The Netherlands , pp. 403–416.

    Mangan, S. and Alon, U. (2003) Structure and function of the feed-forward loop network motif. Proc. Natl Acad. Sci. USA, 100, 11980–11985[Abstract/Free Full Text].

    Milo, R., et al. (2002) Network motifs: simple building blocks of complex networks. Science, 298, 824–827[Abstract/Free Full Text].

    Newman, M.E. (2002) Assortative mixing in networks. Phys. Rev. Lett, . 89, 208701[CrossRef][Medline].

    Newman, M.E.J. (2001) The structure of scientific collaboration networks. Proc. Natl Acad. Sci. USA, 98, 404–409[Abstract/Free Full Text].

    Pastor-Satorras, R. and Vespignani, A. (2001) Epidemic spreading in scale-free networks. Phys. Rev. Lett, . 86, 3200–3203[CrossRef][Web of Science][Medline].

    Ravasz, E., et al. (2002) Hierarchical organization of modularity in metabolic networks. Science, 297, 1551–1555[Abstract/Free Full Text].

    Rzhetsky, A. and Gomez, S.M. (2001) Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics, 17, 988–996[Abstract/Free Full Text].

    Seioghe, C. and Wolfe, K.H. (1999) Updated map of duplicated regions in the yeast genome. Gene, 238, 253–261[CrossRef][Web of Science][Medline].

    Sidow, A. (1996) Gen(om)e duplications in the evolution of early vertebrates. Curr. Opin. Genet. Dev, . 6, 715–722[CrossRef][Web of Science][Medline].

    Simon, H.A. (1955) On a class of skew distribution functions. Biometrica, 42, 425–440.

    Sole, R. and Pastor-Santorros, R. (2002) Complex networks in genomics and proteomics. Santa Fe Institute Working Paper, .

    Solomon, S. (1998) Stochastic Lotka-Volterra systems of competing auto-catalytic agents lead generically to truncated pareto power wealth distribution, truncated levy distribution of market returns, clustered volatility, booms and crashes. In Refenes, A.-P., Burgess, A.N., Moody, J.E. (Eds.). Decision Technologies for Computational Finance, , The Netherlands Kluwer Academic Publishers.

    Solomon, S. and Levy, M. (1996) Spontaneous scaling emergence in generic stochastic systems. Int. J. Mod. Phys. C, . 7, 745[CrossRef].

    Strogatz, S. (2001) Exploring random networks. Nature, 410, 268–276[CrossRef][Medline].

    Stubbs, L. (2002) Genome comparison techniques. In Galas, D. and McCormack, S. (Eds.). Genomic Technologies:Present and Future, , U.K Caister Academic Press.

    Uetz, P., et al. (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403, 623–627[CrossRef][Medline].

    Venter, J.C. (2001) The sequence of the human genome. Science, 291, 1304–1351[Abstract/Free Full Text].

    Wagner, A. (2001) The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol. Biol. Evol, 18, 1283–1292[Abstract/Free Full Text].

    Watts, D.J. and Strogatz, S.H. (1998) Collective dynamics of ‘small-world’ networks. Nature, 393, 440–442[CrossRef][Medline].

    Wolfe, K.H. and Shields, D.C. (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature, 387, 708–713[CrossRef][Medline].

    Wu, F., et al. (2004) Information flow in social groups. Phys. Stat. Mech. Appl, . 337, 327–335[CrossRef].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
22/5/581    most recent
btk030v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (7)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Louzoun, Y.
Right arrow Articles by Solomon, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Louzoun, Y.
Right arrow Articles by Solomon, S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?