Bioinformatics Advance Access originally published online on December 14, 2004
Bioinformatics 2005 21(8):1610-1616; doi:10.1093/bioinformatics/bti223
Dynamic simulation of protein complex formation on a genomic scale
Theoretical Systems Biology, Institute of Molecular Biotechnology Beutenbergstr. 11, 07745 Jena, Germany
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: One of the central questions in the post-genomic era is the understanding of protein-protein interactions and of protein complex formation. It has been observed that protein complex size distributions of the yeast Saccharomyces cerevisiae decay exponentially. The shape of these size distributions reflects mechanisms of protein complex association and dissociation.
Results: We present the most simple dynamic model that is able to reproduce the observed protein complex size distribution for yeast. This protein association-dissociation model (PAD-model) simulates the dynamics of protein complex formation on a genomic scale for about 50 million protein molecules. By ruling out different model variants it is possible to elucidate fundamental features of the protein complex dynamics, e.g. complex association is independent of complex size. In addition, the PAD-model provides information about the complexity of the yeast proteome and it gives an idea of how many complexes could not be identified during the measurements.
Availability: All programs used for this publication are available on request from the authors.
Contact: beyer{at}imb-jena.de
Supplementary information: Supplementary information about the model and its interpretation can be downloaded from http://www.imb-jena.de/tsb/pad
| INTRODUCTION |
|---|
Understanding the dynamics and complexity of the proteome is one of the core challenges of the post-genomic era. The dynamics of protein interactions is central for whole cell functioning (Ideker et al., 2001; Hlavacek et al., 2003). Simulations of the entire cell, performed to elucidate emergent statistical and dynamic properties, need detailed knowledge of the proteome dynamics (Endy and Brent, 2001; Tomita, 2001). Most cellular functions are conducted or regulated by protein complexes of varying size (Alberts et al., 2002). It is mainly the proteome (i.e. the entirety of all expressed proteins in an organism and their interactions) that actually executes the genetic code. The genome size of different organisms is much more similar than the difference in complexity of the organisms implies [the apparent paradox of too few genes in complex organisms (Rubin, 2001)]. Hence, it has been suggested that protein complexes may contribute substantially to an organism's complexity (International Human Genome Sequence Consortium, 2001; Venter et al., 2001; Gavin et al., 2002; Hollunder et al., 2005). An organism with 6000 different proteins can hypothetically create about 18 million different pairs of interacting proteins. Assuming complexes of size 3 we already get about 10^11 different variants. This combinatorial explosion (Endy and Brent, 2001; Hlavacek et al., 2003) shows how evolution could significantly increase the regulatory and metabolic complexity of organisms without substantially increasing the genome size (Koonin et al., 2002; Gavin and Superti-Furga, 2003). However, only a very small subset of these many possible protein complexes is actually realized (Sear, 2004).
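The combinatorial argument can be checked with a few lines of Python. This is a minimal sketch that counts unordered sets of distinct proteins; the text may count variants differently (for example by allowing ordered arrangements), so the size-3 figure here should be read as an order-of-magnitude check only.

```python
import math

n_proteins = 6000  # number of different proteins, as in the text

# Unordered pairs and triples of distinct proteins.
pairs = math.comb(n_proteins, 2)
triples = math.comb(n_proteins, 3)

print(f"pairs:   {pairs:.3e}")
print(f"triples: {triples:.3e}")
```

The pair count comes out just under 18 million, and the triple count is more than three orders of magnitude larger, which illustrates the combinatorial explosion.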
Recent large-scale measurements of characteristics of the yeast proteome (Gavin et al., 2002; Ho et al., 2002; Ghaemmaghami et al., 2003; Hurowitz and Brown, 2004; Beyer et al., 2004) paved the way for performing integrated analyses of the processes underlying such data, thereby improving our understanding of the proteome as a whole. Studies of this kind yield substantial new insights into cell regulation and cell dynamics (Ideker et al., 2001; Beyer et al., 2004). Based on large-scale measurements of protein complexes in the yeast Saccharomyces cerevisiae it was shown that the size-frequency distributions of such complexes have a number of common characteristics (Wilhelm et al., 2003). When plotting the number of different complexes of a given size versus the complex size one obtains an exponentially decreasing distribution. The authors hypothesize that the shape of this distribution reflects the nature of the underlying cellular dynamics which is creating the protein complexes. A simulation model that correctly depicts the most important processes should therefore yield the same complex size distribution.
Simulating the interaction of a large number of components is a well established technique in bioinformatics and systems biology that has successfully been applied in the past (Barabási et al., 1999; Endy and Brent, 2001; Dittrich et al., 2001; Rzhetsky and Gomez, 2001; Koonin et al., 2002; Furusawa and Kaneko, 2003; Sear, 2004). So-called birth, death and innovation models (BDIMs) have been used to reconstruct protein domain frequency distributions and to draw non-trivial conclusions (Wolf et al., 1999; Qian et al., 2001; Koonin et al., 2002; Karev et al., 2004). Our strategy follows a similar line of thought: if the model is capable of reproducing the observed protein complex frequency distribution, it can be used to draw instructive conclusions.
Here we outline the most simple dynamic model that explicitly simulates association and dissociation of protein complexes and that is capable of reproducing the observed complex size distribution in S.cerevisiae. The model that we propose simulates only two processes: association and dissociation of protein complexes. During discrete time steps single proteins and protein complexes are randomly selected to undergo association and dissociation. The system is considered to be spatially homogeneous. The value of such a simplistic model is that it helps in identifying the dynamics of the most fundamental processes underlying protein complex formation. This model allows us to answer the question as to which processes and what kind of dynamics are needed at minimum in order to reproduce the observed distribution. By comparing different variants of such a model it is possible to rule out certain mechanisms of protein complex formation and the relative importance of model parameters can be estimated. In addition, the model opens up the possibility (to our knowledge for the first time) of assessing the complexity of the yeast proteome as a whole. This gives us some idea of how much has been missed during the protein complex measurements.
| METHODS |
|---|
Protein complex data
The system that we simulate corresponds to an entire hypothetical yeast cell. We have analyzed protein complex data by Gavin et al. (2002) which were obtained by tandem affinity purification (TAP) and subsequent mass-spectrometric protein identification. Here we use the manually curated TAP complexes (Gavin et al., 2002), which is a set of 232 biologically meaningful complexes in yeast with sizes ranging from 1 to 88 different proteins per complex. In this study we only account for complexes containing at least two different proteins, yielding a subset of 229 complexes. For this data set, we calculated the complex size-frequency distribution (Fig. 1), which shows that the number of different complexes of a given size decays nearly exponentially.
Protein abundance data
For simulating complex formation we need data on the abundance of the 6200 different proteins in yeast cells. Since we simulate hypothetical proteins we do not need actual concentrations for every yeast protein, but we only need a realistic protein abundance distribution for parameterizing the model. The protein abundance distribution that we use follows protein concentrations compiled by Beyer et al. (2004) in the upper range with data from Ghaemmaghami et al. (2003) and Greenbaum et al. (2003). The data by Beyer et al. (2004) may underestimate the number of low-abundance proteins, because smaller signals are more difficult to detect. We, therefore, assume that 2000 proteins not contained in the Beyer et al. dataset are mainly expressed at low concentrations. The resulting abundance distribution roughly follows a power-law. More details about the protein abundance distribution are given in the supplementary information.
Dynamic complex formation model
We discuss three variants of the protein complex association-dissociation model (PAD-model) with the following features:
- In all three versions the composition of the proteome (i.e. the abundance of all proteins) does not change with time. This means, degradation of proteins is always balanced by an equal production of the same kind of proteins. This assumption is justified by the fact that we simulate the averaged complex size distribution of a large number of cells under constant log-growth conditions (Gavin et al., 2002).
- The cell consists of either one (PAD-A and B) or several (PAD-C) compartments in which proteins and protein complexes can freely interact with each other. Thus, all proteins can potentially bind to all other proteins in their compartments.
- Association and dissociation rate constants are the same for all proteins (i.e. the values chosen represent cellular averages). In PAD-models A and C association and dissociation are independent of complex size and complex structure.
- At each time step, a set of complexes is randomly selected to undergo association and dissociation. Association is simulated as the creation of new complexes by the binding of two smaller complexes and dissociation is simulated as the reverse process, i.e. the decay of a complex into two smaller complexes. The number of associations and dissociations per time step is ka · NC^2 and kd · NC, respectively, with NC being the total number of complexes in the cell and ka [1/(no. of complexes × time)] and kd (1/time) being the association and dissociation rate constants, respectively. The rate constants ka and kd are mathematically equivalent to biochemical rates of a reversible reaction (see Supplementary information).
PAD-A is the simplest model where all proteins can interact with each other (no partitioning) and it assumes that association and dissociation are independent of complex size. PAD-B is equivalent to PAD-A, except for the assumption that larger complexes are more likely to bind (preferential attachment). In this case we assume that the binding probability is proportional to i · j, where i and j are the sizes of two potentially interacting complexes.
Finally, model PAD-C extends PAD-A by assuming that proteins can interact only within groups of proteins (with partitioning). The sizes of these protein groups are based on the sizes of first level functional modules according to the yeast database CYGD (http://mips.gsf.de). PAD-C assumes 16 modules, each containing between 100 and 1000 different ORFs. Hence, the protein groups do not represent physical compartments, but rather resemble functional modules of interacting proteins. The total steady-state pool of protein complexes is obtained by independently simulating each protein group, which gives rise to one complex pool per group. In order to get whole-cell averages, results are averaged based on the sizes of the resulting complex pools. For instance, when averaging the resulting bait distributions, it is assumed that complexes are drawn with higher probability from larger pools.
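The PAD-A rules described above can be sketched as a small stochastic simulation. This is an illustrative toy, not the authors' program: the function names are hypothetical, the association count ka · NC^2 is inferred from the stated units of ka, a selected single protein is assumed to do nothing during dissociation, and the pool is far smaller than the roughly 50 million molecules of the real model.

```python
import random
from collections import Counter


def split_uniform(members, rng):
    """Split into two non-empty parts; every bipartition is equally likely."""
    while True:
        left, right = [], []
        for m in members:
            (left if rng.random() < 0.5 else right).append(m)
        if left and right:
            return left, right


def pad_a_step(complexes, ka, kd, rng):
    """Apply ka*NC^2 associations and kd*NC dissociations (both rounded)."""
    nc = len(complexes)
    n_assoc, n_dissoc = round(ka * nc * nc), round(kd * nc)
    for _ in range(n_assoc):
        if len(complexes) < 2:
            break
        i, j = sorted(rng.sample(range(len(complexes)), 2))
        merged = complexes[i] + complexes[j]
        # Swap-and-pop the higher index first so the lower index stays valid.
        for idx in (j, i):
            complexes[idx] = complexes[-1]
            complexes.pop()
        complexes.append(merged)
    for _ in range(n_dissoc):
        k = rng.randrange(len(complexes))
        if len(complexes[k]) < 2:
            continue  # assumption: a selected monomer simply does nothing
        left, right = split_uniform(complexes[k], rng)
        complexes[k] = left
        complexes.append(right)


def size_distribution(complexes):
    """Map complex size (total number of molecules) to its frequency."""
    return Counter(len(c) for c in complexes)


rng = random.Random(0)
pool = [[protein_type] for protein_type in range(200)]  # 200 monomers
for _ in range(50):
    pad_a_step(pool, ka=1e-3, kd=0.05, rng=rng)
print(sorted(size_distribution(pool).items()))
```

Each complex is stored as a list of protein-type labels, so the total number of molecules is conserved by construction. A PAD-C style run would call the same step function independently on each protein group.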
Mathematical description
Since explicit simulation of an entire cell (we simulate about 50 million protein molecules) is too time-consuming for many applications of the model, we also developed a mathematical description of the PAD model, which allows us to assess different scenarios and parameter combinations more quickly. The change of the number of complexes of size i, Δxi, during one time step Δt can be described as
![]() | (1) |
![]() | (2) |
![]() | (3) |
Here, a correction term accounts for even complex sizes i, NC = Σ xi (summed over all sizes i) is the total number of protein complexes and Nj = 2^(j−1) − 1 is the number of possible dissociations of a complex of size j.
Figure S1 (Supplementary information) shows a comparison of a numerical solution of Equation (1) with a stochastic simulation of the association-dissociation process. After a transient period a steady-state is reached. We are mainly interested in this steady-state distribution of frequencies xi. Hence, we have to find a set of xi solving Δxi/Δt = 0. The solution of this non-linear equation system is obtained by numerically minimizing all Δxi/Δt. Dividing Equation (1) by kd, it can be seen that the steady-state distribution is independent of the absolute values of ka and kd and depends only on the ratio of the two parameters, Rad = ka/kd. Hence, only two parameters affect the xi at steady-state: the total number of proteins NP (which indirectly determines NC) and the ratio of the two rate constants Rad.
The dissociation terms remain unchanged for PAD-B model, whereas the association terms have to be modified (see Supplementary information). In case of PAD-C we calculated weighted averages of results obtained with PAD-A.
Measurable size distribution and bait selection
Based on the distribution resulting from Equation (1) at steady-state, we derive two further distributions: (1) the measurable size distribution and (2) the bait distribution. The former is defined as the frequency distribution of the measurable complex sizes. The measurable complex size is the number of different proteins in a protein complex (as opposed to the total number of proteins). For the measurable size distribution we only count the number of complexes with distinct protein compositions (Fig. 2). In order to determine the distribution of measurable complex sizes corresponding to the steady-state distribution, we create a set of complexes according to the original steady-state size distribution by randomly filling the complexes with proteins from the protein abundance distribution. We then compute the resulting measurable size distribution. Results shown are the averages of several of such random sets.
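The difference between the total size and the measurable size of a complex can be illustrated as follows. This is a minimal sketch with hypothetical protein names and copy numbers; it draws proteins with replacement in proportion to their abundance, which is an approximation that seems reasonable only when the protein pool is much larger than any single complex.

```python
import random


def fill_complex(size, protein_types, abundances, rng):
    """Draw `size` molecules, each type chosen in proportion to its abundance."""
    return rng.choices(protein_types, weights=abundances, k=size)


def measurable_size(complex_members):
    """Number of different protein types in a complex."""
    return len(set(complex_members))


rng = random.Random(1)
types = ["P1", "P2", "P3", "P4"]   # hypothetical protein types
abundances = [1000, 100, 10, 1]    # hypothetical copy numbers
example = fill_complex(6, types, abundances, rng)
print(example, "total size:", len(example),
      "measurable size:", measurable_size(example))
```

With a strongly skewed abundance distribution, the abundant protein tends to appear several times in one complex, so the measurable size falls below the total size.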
The bait distribution is used to compare the simulation to the actual TAP measurements (Gavin et al., 2002). The bait distribution is obtained by randomly selecting and subsequently analyzing a subset from all simulated complexes. We call that distribution bait distribution, because the process of selecting a subset of all complexes corresponds to selecting bait proteins for pulling out the complexes during the measurements (Gavin et al., 2002). We always select 229 different complexes, which is the number of TAP complexes with which we compare the simulations.
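The bait selection step could be sketched as below. The helper name is hypothetical, and two choices are assumptions rather than statements from the text: a complex's composition is taken to be its set of different proteins, and every distinct composition is equally likely to be drawn.

```python
import random
from collections import Counter


def bait_distribution(complexes, n_baits, rng):
    """Sample n_baits distinct compositions and count them by measurable size."""
    # Assumption: the composition of a complex is its set of protein types.
    distinct = sorted({tuple(sorted(set(c))) for c in complexes})
    # Keep only complexes with at least two different proteins.
    distinct = [c for c in distinct if len(c) >= 2]
    chosen = rng.sample(distinct, min(n_baits, len(distinct)))
    return Counter(len(c) for c in chosen)


toy_pool = [["A", "B"], ["B", "A"], ["A", "A"], ["A", "B", "C"], ["C", "D"]]
print(bait_distribution(toy_pool, 229, random.Random(0)))
```

In the toy pool only three distinct compositions with two or more different proteins exist, so all of them are returned; with a realistic pool the function would return 229 of them.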
Computation of a dissociation constant KD
Mathematically, our model describes a reversible (bio-)chemical reaction (see Supplementary information). When following this interpretation of the model one can calculate an equilibrium dissociation constant KD, which quantifies the fraction of free subcomplexes A and B compared with the bound complex AB. Interestingly, this equilibrium is complex size dependent, because a large complex AB is less likely to randomly dissociate exactly into the two specific subunits A and B than a small complex. (Note that A and B can be ensembles of several proteins.) Therefore, we get for any given complex of size i the following KD:
![]() | (4) |
, with NC being the total number of complexes and xi being the number of complexes of size i.
Quality of regression
The quality of the regressions is quantified by the sum of squared and weighted residuals X:
![]() | (5) |
In Equation (5), the simulated term is the corresponding cumulative frequency of the simulated bait distribution. Frequencies are cumulated starting with the largest measured complex (size 88) and counting down to size i (Fig. 1).
| RESULTS |
|---|
We dynamically simulated the association and dissociation of 6200 different protein types yielding a set of about 50 million protein molecules. Subsequently we analyzed the resulting steady-state size distribution of protein complexes. This steady-state is thought to reflect the log-growth conditions under which the yeast cells were held when TAP-measuring the protein complexes (Gavin et al., 2002). Based on measured protein complex data (Gavin et al., 2002) we calculated a protein complex size distribution with which we can compare the simulation results (Fig. 1).
The TAP measurements do not provide concentrations of the measured complexes, but they only demonstrate the presence of a certain protein complex in yeast cells. In addition, the number of proteins of a certain type inside such a complex could not be measured. Hence, the complex size from Figure 1 does not represent real complex sizes (i.e. total number of proteins in the complex), but it refers to the number of different proteins in a complex. The measured data reflect the characteristics of only 229 different protein complexes of size 2 and larger, which is just a small subset of the complexosome. These peculiarities have to be taken into account when comparing simulation results to the observed complex size distribution. We refer to the measurable complex size as the number of distinct proteins in a protein complex (Fig. 2). When comparing our simulation results with the measurements, we always select a random subset of 229 different complexes from the simulated pool of complexes. This results in a complex size distribution comparable to the measured distribution from Figure 1 (bait distribution).
An exponential fit (a · e^(b·i), where i is the complex size) of the observed complex size distribution yields a slope b = −0.063, which is slightly shallower than the slopes obtained for other datasets (Wilhelm et al., 2003). In agreement with previous findings, the TAP complexes follow an exponential law much more closely than a power law (Wilhelm et al., 2003). For instance, the residuals X between the observed size distribution and the regression are 0.68 for the exponential fit and 5.0 for the respective power-law regression. The R^2 values also differ markedly (exponential fit: 0.98, power-law fit: 0.92). The best fit simulation using the simplest model PAD-A yields an X of 0.75, which is slightly above that of the exponential fit. The complex size distributions resulting from simulations with PAD-A are close to an exponential distribution. Koonin et al. (2002) used a mathematically similar model to simulate the evolution of protein-protein interactions. They also found that the number of distinct domains in proteins decreases exponentially, whereas the total number of domains per protein follows a power-law. In our model the former roughly corresponds to the measurable size distribution and the latter to the original size distribution. The original size distribution of PAD-A is slightly closer to a power-law than the measurable size distribution (Fig. 2), which is consistent with the findings by Koonin et al. (2002).
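A comparison of an exponential and a power-law fit can be sketched with ordinary least squares on log-transformed values. This is a simplified illustration on synthetic data, not on the TAP complexes, and it does not reproduce the weighted residual X, which is computed on cumulative frequencies; the decay constant used to generate the data is taken from the fit quoted above, read with a negative sign.

```python
import math


def linear_fit(xs, ys):
    """Least squares y = slope*x + intercept; returns slope, intercept, R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


sizes = list(range(2, 89))
freqs = [229 * math.exp(-0.063 * i) for i in sizes]  # synthetic, not TAP data
log_f = [math.log(f) for f in freqs]

# Exponential law: ln(f) is linear in i. Power law: ln(f) is linear in ln(i).
b_exp, _, r2_exp = linear_fit(sizes, log_f)
b_pow, _, r2_pow = linear_fit([math.log(i) for i in sizes], log_f)
print(f"exponential: b = {b_exp:.3f}, R^2 = {r2_exp:.4f}")
print(f"power law:   exponent = {b_pow:.3f}, R^2 = {r2_pow:.4f}")
```

On exponentially decaying data the first regression recovers the decay constant, while the power-law regression leaves a systematic curvature and therefore a lower R^2.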
The dynamic simulation needs only very few input parameters: a set of proteins, which is defined by the protein abundance distribution, and the association and dissociation rate constants, while only the ratio of the two rate constants Rad actually matters (see Methods). Since the protein abundance distribution is known (Beyer et al., 2004), the ratio Rad is the only free (i.e. adjustable) parameter of PAD-A. The best fit of the model to the observed complex size distribution is obtained for a Rad of 1.6 × 10^−7. Using Equation (4) and assuming a cell volume of 5 × 10^−11 ml one gets KD-values ranging between 10^5 nM (size 2) and 10^−25 nM (size 100) for this Rad.
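The order of magnitude of these KD-values can be checked with a back-of-envelope calculation. The sketch below does not reproduce Equation (4). It assumes instead that the dissociation constant for one specific split of a complex of size i is roughly 1/(Rad · Ni) complexes per cell, with Ni = 2^(i−1) − 1 possible splits, and it reads Rad as 1.6 × 10^−7 and the cell volume as 5 × 10^−11 ml (both exponent signs are assumptions).

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def kd_nanomolar(size, rad, cell_volume_litre):
    """Rough KD for one specific split of a complex of `size` proteins, in nM."""
    n_splits = 2 ** (size - 1) - 1             # possible dissociations
    kd_per_cell = 1.0 / (rad * n_splits)       # assumed form, in complexes per cell
    molar = kd_per_cell / (AVOGADRO * cell_volume_litre)
    return molar * 1e9


rad = 1.6e-7
volume_litre = 5e-11 * 1e-3  # 5e-11 ml expressed in litres
for size in (2, 100):
    print(size, f"{kd_nanomolar(size, rad, volume_litre):.2e} nM")
```

Under these assumptions the size-2 value falls in the 10^5 nM range and the size-100 value in the 10^−25 nM range, which matches the orders of magnitude quoted in the text.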
A comparison of the subset of 229 complexes (Fig. 1) and the complete simulated complexosome (Fig. 2) reveals a significant deviation of the two distributions. This indicates that only a small subset of all protein complexes in the cell has been observed during the TAP measurements. However, the measurable size distribution closely follows the distribution of the actual complex sizes in the simulated pool. Hence, only few complexes contain a protein more than once and hardly any complexes (except for very small ones) occur more than once in the simulated system. The simulated cell contains more than 3.5 million different complexes, which certainly exceeds the true number of different complexes in yeast cells. A crucial reason for this overestimation is that specificity of protein binding is not taken into account in PAD-A (see Discussion).
Since the model assumes that association and dissociation are independent of the complexes' composition, the number of the different protein types (i.e. number of expressed genes) has no effect on the distribution of actual complex sizes. Only the measurable complex size, i.e. the number of different proteins in a complex, is affected by the composition of the protein pool. Yet even this effect can be reduced to one characteristic value of the protein abundance distribution. The distribution of measurable complex sizes is mainly determined by the probability Pp that two randomly chosen molecules from the proteome are identical (see Supplementary information). In other words, all protein abundance distributions having the same probability Pp yield very similar complex size distributions. This reduces the relevant model parameters to three input parameters: Rad, NP and Pp. Among the model parameters, the kinetic parameter Rad and the number of proteins NP are the most sensitive ones (Table 1). Even a 10% deviation of Rad or NP from the best fit scenario may double the weighted residual X. However, none of the model parameters significantly changes the shape and slope of the distribution, i.e. the distributions are close-to-exponential independent of the selected parameters.
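The probability Pp can be computed directly from a list of copy numbers. This is a minimal sketch that draws the two molecules without replacement; the copy numbers are hypothetical and only serve to contrast a skewed with an even abundance distribution.

```python
def identical_pair_probability(copy_numbers):
    """Probability that two molecules drawn without replacement share a type."""
    total = sum(copy_numbers)
    same = sum(n * (n - 1) for n in copy_numbers)
    return same / (total * (total - 1))


skewed = [1000, 100, 10, 1]    # hypothetical: one dominant protein
even = [278, 278, 278, 277]    # hypothetical: same total, evenly spread
print(f"skewed: {identical_pair_probability(skewed):.3f}")
print(f"even:   {identical_pair_probability(even):.3f}")
```

A skewed abundance distribution gives a higher Pp than an even one with the same total, so complexes drawn from it are more likely to contain the same protein more than once.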
No preferential attachment
A core assumption of our complex formation model PAD-A is that association and dissociation are independent of the complex size. One possible extension of this model is to assume that association of larger complexes is more likely than association of smaller complexes. The rationale for this preferential attachment might be that larger complexes have more potential interaction surfaces than smaller complexes. However, sterical effects might oppose such preferential attachment. In order to test the hypothesis that association might depend on complex size we simulated a second model that includes a size dependence of association (PAD-B). Since association of complexes is an undirected process we presume that both binding partners equally contribute to the binding affinity. Hence, in the preferential attachment model two complexes associate with a likelihood proportional to the product of the complex sizes.
Figure 3 shows that the assumption of preferential attachment leads to a power-law complex size distribution. This result is in agreement with the previous finding that preferential attachment in the context of network growth leads to power-law connectivity distributions (Barabási et al., 1999). However, the experimentally observed distribution clearly follows an exponential-law [Fig. 1 and (Wilhelm et al., 2003)]. The fundamentally different character of these two distributions suggests that generally there is no preferential attachment during protein complex formation.
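The size-dependent partner selection of the preferential attachment variant could be implemented as below. This is a sketch with a hypothetical function name: two complexes are drawn independently with weights equal to their sizes, and draws that pick the same complex twice are rejected, which makes the probability of each pair proportional to the product of the two sizes.

```python
import random
from collections import Counter


def pick_pair_preferential(sizes, rng):
    """Pick two different complexes with probability proportional to size_i * size_j.

    Requires at least two complexes with positive size.
    """
    indices = range(len(sizes))
    while True:
        i, j = rng.choices(indices, weights=sizes, k=2)
        if i != j:
            return i, j


rng = random.Random(0)
sizes = [1, 1, 10]  # two monomers and one complex of size 10
counts = Counter(frozenset(pick_pair_preferential(sizes, rng)) for _ in range(2000))
for pair, count in sorted(counts.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), count)
```

In this toy pool the two monomers are paired with each other far less often than either monomer is paired with the large complex, which is the rich-get-richer behaviour that leads towards a power-law distribution.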
Specific binding
Another important simplification of PAD-A is that all proteins can potentially interact with each other. Obviously this is a vast simplification of biological reality. As a first step towards a better approximation of real protein-protein interactions we group the proteins into 16 modules of different sizes (corresponding to functional modules, see Methods section) and we restrict protein-protein interactions to proteins belonging to the same module (PAD-C). The best fit of PAD-C (Rad = 3.8 × 10^−6) yields better results than the best fit of PAD-A. In this case the weighted residual X drops to 0.63, which is even below the X of the exponential fit (Table 1). The character of the resulting distribution is the same as for PAD-A (i.e. close to exponential). However, the parameter for the best fit deviates significantly: The optimal Rad for PAD-C is larger by more than an order of magnitude (Table 1). In order to match the observations, the simulated affinity must be much larger if we assume a smaller number of interacting partners. Also, the number of different complexes drops significantly (PAD-A: 3.5 million, PAD-C: 2 million).
| DISCUSSION |
|---|
We developed a very simple, dynamic model that is capable of reproducing the observed complex size distribution. Given the small number of input parameters the very good fit of the observed data is astonishing. A number of conclusions with respect to the processes underlying protein complex formation can be drawn.
First, preferential attachment does not take place in yeast cells under the investigated conditions. The absence of preferential attachment is biologically plausible: specific and strong binding can be just as important for small protein complexes as for large complexes. This implies that also the dissociation should on average be independent of the complex size. The interpretation of the simulated association and dissociation in terms of KD-values suggests that larger complexes bind more strongly than smaller complexes. [See the Supplementary information for a detailed discussion of Equation (4).] However, the size dependence of KD is compensated by the higher number of possible dissociations in larger complexes. In all variants of the PAD-model we assume that all possible dissociations happen with the same probability. In reality it is more plausible that large complexes break into specific subcomplexes, which subsequently can be reused for a different purpose (Gavin and Superti-Furga, 2003; Hollunder et al., 2005). An improved version of the model should not only account for the specificity of association, but also for specific dissociation.
A second important conclusion is that the number of complexes that were missed during the TAP measurements is potentially large. Based on our simulations, for the first time we can give an upper limit of the number of different complexes in cells. At first glance, the number of different complexes in PAD-A (>3.5 million) and PAD-C (about 2 million) may appear to be far too large. Even PAD-C may overestimate the true number of different complexes, because association within the groups is unrestricted. However, the PAD-models do not only simulate functional, mature complexes, but they also consider all intermediate steps. Each of these steps is counted as a different protein complex. The large difference between the number of measured complexes and the (potential) number of existing complexes may partly explain the very small overlap that has been observed between different large scale measurements of protein complexes (von Mering et al., 2002; Wilhelm et al., 2003).
A correct interpretation of the kinetic parameters is important. First of all, ka and kd cannot be compared to real numbers, because the model does not define a length of the time steps for interpreting ka and kd as actual rate constants. In addition, the association-to-dissociation ratio Rad is not identical to a physical KD-value obtained by in vitro measurements of protein binding in water solutions. Several reasons do not allow for this simple interpretation:
- In vivo diffusion rates are below those in water due to the high concentration of proteins and other large molecules in the cytosol (Endy and Brent, 2001).
- Most proteins are either synthesized where they are needed or they get transported directly to the site where the complex gets compiled (Alberts et al., 2002). Hence, transport to the site of action is on average faster than random diffusion.
- Protein concentrations are often above the cell average due to the compartmentalization of the cell.
All these processes (protein production, transport and degradation) are not explicitly described in the PAD-model, but they are lumped in our assumptions. The Rad (and the KD derived from it) must, therefore, be interpreted as an operationally defined property. It characterizes the overall, cell averaged complex assembly process, which includes all steps necessary to synthesize a protein complex.
However, even the model-derived KD-values allow for some conclusions regarding complex formation. We calculated weighted averages of the size-dependent KD-values by using the steady-state complex size distribution of the best fit (see Methods). This yields average KD-values of 4.7 and 0.18 nM for the best fits of PAD-A and PAD-C, respectively. First, the fact that the average KD for PAD-C is below that of PAD-A underlines the notion that more specific binding is reflected by smaller KD values (Sear, 2004). Second, typical in vitro KD-values are above 1 nM (Amini et al., 2003; Sear, 2004), thus the average KD of PAD-C is comparatively low. The model, therefore, confirms that protein complex formation in vivo gets accelerated due to directed protein transport and due to the compartmentalization of eukaryotes. It is a surprising finding though, that important aspects of these highly regulated protein synthesis and transport processes can on average be described by a simple compartment model assuming random association and dissociation.
Large scale protein-protein interaction data sets are subject to substantial error (von Mering et al., 2002; Grünenfelder and Winzeler, 2002), resulting in a potentially large number of false positives and false negatives. In order to get a correct picture of the protein complex size distribution it is necessary to have an unbiased, random subset of all complexes in the cells. It is known that there are certain biases in the TAP data, e.g. there is a reduced number of membrane proteins among the bait proteins (Gavin et al., 2002). However, if compared to other datasets such as MIPS complexes (http://mips.gsf.de), the TAP complexes constitute a fairly random selection of all protein complexes in yeast (von Mering et al., 2002). Uncertainties in the TAP data do not affect our conclusions as long as they are not strongly biased with respect to the resulting complex size distribution. Since Gavin et al. (2002) have measured long-term interactions, our results apply to permanent complexes. Yet the model is applicable to future protein complex data that take account of transient binding.
In this study we use the protein complex data from Gavin et al. (2002) as an example to outline the methodology. The model is equally applicable to any other suitable complex data set, such as the data from Ho et al. (2002). In particular, the model can be applied to more reliable data in the future and it is useful for analyzing protein complex formation in other organisms. It then becomes possible to compare Rad-values (or KD-values) of different organisms and of different cell types. For example, one question that could be addressed is, whether larger cells exhibit different Rad-values. It is also interesting to see if the Rad is larger or smaller in quickly proliferating cells compared to quiescent cells. This could be helpful for better understanding and characterizing tumor cells.
The simulated complex size distribution is almost independent of the assumed protein abundance distribution. The detailed shape hardly matters and only Pp (i.e. the probability of randomly selecting the same protein twice) has some influence (Table 1). It turns out that Pp is a valuable summarizing property that can be used to characterize proteomes of different species. A decreasing Pp increases the number of different large complexes (the slope in Table 1 gets more shallow) because it is less likely that a large complex contains the same protein twice. Thus, Pp is a measure of complexity that not only relates to the diversity of the proteome but also to the composition of protein complexes.
Probably the most severe simplification in our model is the assumption that all proteins can potentially interact with each other. The PAD-model C is a first step towards more biological realism. By restricting the number of potential interaction partners it more closely maps functional modules and cell compartments, both of which restrict the interaction among proteins (Wilhelm et al., 2003). The partitioning in PAD-C connotes that proteins within one group exhibit very strong binding, whereas binding between protein groups is set to zero. This again is a simplification, since cross-talk between different modules or compartments is possible.
Future extensions of the model could incorporate increasingly detailed information about the binding specificity of proteins. Assuming even more specific binding will further reduce the number of different complexes, while the frequency of each remaining complex will increase. High binding specificity potentially lowers the complex sizes, so Rad has to be increased in order to fit the experimentally observed protein complex size distribution. On the other hand, cross-talk gives rise to larger complexes. Because these two refinements counteract each other, the best-fit Rad cannot be predicted in general; it depends on the quantitative details. A first additional refinement of PAD-C could account for the observed clustering of protein interaction networks (Wilhelm et al., 2003). In a second step one could simulate protein associations and dissociations according to predefined binary protein interactions. A very detailed model could additionally account for individual association/dissociation rates between individual proteins.
Such extensions will yield more realistic estimates of the number of different protein complexes created in yeast cells. However, starting model development with the simplest assumptions reveals which characteristics of the system are most important for reproducing the observations. The excellent match that we have already obtained with the simplest model, PAD-A, is striking.
| Acknowledgments |
|---|
This work has been funded by the Federal Ministry of Education and Research, Germany (grant 0312704E).
Received on October 10, 2004; revised on November 3, 2004; accepted on November 4, 2004
| REFERENCES |
|---|
Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P. (2002) Molecular Biology of the Cell. Taylor & Francis, New York, NY.
Amini, F., Denison, C., Lin, H.-J., Kuo, L., Kodadek, T. (2003) Using oxidative crosslinking and proximity labeling to quantitatively characterize protein–protein and protein–peptide complexes. Chem. Biol., 10, 1115–1127.
Barabási, A.-L., Albert, R., Jeong, H. (1999) Mean-field theory for scale-free random networks. Physica A, 272, 173–187.
Beyer, A., Hollunder, J., Nasheuer, H.-P., Wilhelm, T. (2004) Post-transcriptional expression regulation in the yeast Saccharomyces cerevisiae on a genomic scale. Mol. Cell. Proteomics, 3, 1083–1092.
Dittrich, P., Ziegler, J., Banzhaf, W. (2001) Artificial chemistries – a review. Artif. Life, 7, 225–275.
Endy, D. and Brent, R. (2001) Modelling cellular behaviour. Nature, 409, 391–395.
Furusawa, C. and Kaneko, K. (2003) Zipf's law in gene expression. Phys. Rev. Lett., 90, 088102-1–088102-4.
Gavin, A.C., Bösche, M., Krause, R. (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415, 141–147.
Gavin, A.C. and Superti-Furga, G. (2003) Protein complexes and proteome organization from yeast to man. Curr. Opin. Chem. Biol., 7, 21–27.
Ghaemmaghami, S., Huh, W.K., Bower, K., Howson, R.W., Belle, A., Dephoure, N., O'Shea, E.K., Weissman, J.S. (2003) Global analysis of protein expression in yeast. Nature, 425, 737–741.
Greenbaum, D., Colangelo, C., Williams, K., Gerstein, M. (2003) Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biol., 4, 117.1–117.8.
Grünenfelder, B. and Winzeler, E.A. (2002) Treasures and traps in genome-wide data sets: case examples from yeast. Nature Reviews, 3, 653–661.
Hlavacek, W.S., Faeder, J.R., Blinov, M.L., Perelson, A.S., Goldstein, B. (2003) The complexity of complexes in signal transduction. Biotech. Bioeng., 84, 783–794.
Ho, Y., Gruhler, A., Heilbut, A., Bader, G.D. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 415, 180–183.
Hollunder, J., Beyer, A., Wilhelm, T. (2005) Identification and characterization of protein subcomplexes in yeast, submitted.
Hurowitz, E.H. and Brown, P.O. (2004) Genome-wide analysis of mRNA lengths in Saccharomyces cerevisiae. Genome Biol. Proteomics, 5, in press.
Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K., Bumgarner, R., Goodlet, D.R., Aebersold, R., Hood, L. (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science, 292, 929–930.
International Human Genome Sequence Consortium. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
Karev, G.P., Wolf, Y.I., Berezovskaya, F., Koonin, E.V. (2004) Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models. BMC Evol. Biol., 4, 32.
Koonin, E.V., Wolf, Y.I., Karev, G.P. (2002) The structure of the protein universe and genome evolution. Nature, 420, 218–223.
Qian, J., Luscombe, N.M., Gerstein, M. (2001) Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model. J. Mol. Biol., 313, 673–681.
Rubin, G.M. (2001) Comparing species. Nature, 409, 820–821.
Rzhetsky, A. and Gomez, S. (2001) Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics, 17, 988–996.
Sear, R.P. (2004) Specific protein–protein binding in many-component mixtures of proteins. Phys. Biol., 1, 53–60.
Tomita, M. (2001) Whole-cell simulation: a grand challenge of the 21st century. Trends Biotech., 19, 205–210.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O. (2001) The sequence of the human genome. Science, 291, 1304–1351.
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P. (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417, 399–403.
Wilhelm, T., Nasheuer, H.-P., Huang, S. (2003) Physical and functional modularity of the protein network in yeast. Mol. Cell. Proteomics, 2, 281–291.
Wolf, Y.I., Brenner, S.E., Bash, P.A., Koonin, E.V. (1999) Distribution of protein folds in the three superkingdoms of life. Genome Res., 9, 17–26.