Bioinformatics Advance Access originally published online on November 19, 2007
Bioinformatics 2008 24(2):218-224; doi:10.1093/bioinformatics/btm565
Estimating node degree in bait-prey graphs
1Department of Preventive Medicine, Northwestern University Medical School, 680 N. Lake Shore Drive Suite 1102, Chicago, IL 60611-4402, USA, 2EMBL Outstation – Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK and 3Fred Hutchinson Cancer Research Center, Computational Biology Group, 1100 Fairview Avenue North – M2-B876, P.O. Box 19024, Seattle, Washington 98109-1024, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Proteins work together to drive biological processes in cellular machines. Summarizing global and local properties of the set of protein interactions, the interactome, is necessary for describing cellular systems. We consider a relatively simple per-protein feature of the interactome: the number of interaction partners for a protein, which in graph terminology is the degree of the protein.
Results: Using data subject to both stochastic and systematic sources of false positive and false negative observations, we develop an explicit probability model and resultant likelihood method to estimate node degree on portions of the interactome assayed by bait-prey technologies. This approach yields substantial improvement in degree estimation over the current practice that naïvely sums observed edges. Accurate modeling of observed data in relation to true but unknown parameters of interest gives a formal point of reference from which to draw conclusions about the system under study.
Availability: All analyses discussed in this text can be performed using the ppiStats and ppiData packages available through the Bioconductor project (http://www.bioconductor.org).
Contact: dscholtens{at}northwestern.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
Understanding the contribution of proteins to coordinated cellular systems requires knowledge of the various interactions they have with other proteins. Global and local statistics on the topology of interactome graphs aim to infer the nature and behavior of protein interactions, and can provide a basis for planning and interpretation of experiments. These measures capture simple but informative features of graphs, e.g. the number of interactors for each protein. There are caveats to the use of these measures, including the dynamic nature of the in vivo interactome as opposed to the static data yielded by currently available technologies. But these summary statistics capture characteristics of high-throughput observations in tractable form and accurate estimation is paramount to making correct conclusions about interactome behavior.
Affinity purification-mass spectrometry (AP-MS) technology provides data on complex co-membership interactions by using a series of bait proteins to detect the set of all prey proteins that share membership with the bait in at least one multi-protein complex. Yeast two-hybrid (Y2H) technology uses a fusion protein system to test for physical interactions between baits tagged with the DNA binding domain and prey with the activation domain. Hence AP-MS and Y2H technologies probe undirected, symmetric relationships in a directed manner from bait to prey. Inference on the features of AP-MS and Y2H graphs is complicated by this directedness, as well as sampling bias, incomplete coverage, and stochastic and systematic errors leading to both false positive (FP) and false negative (FN) observations. These issues have often been overlooked when analyzing protein interaction data. In this study, we apply a statistical modeling approach to ameliorate these difficulties for AP-MS, Y2H and other bait-prey technologies.
This report specifically demonstrates the use of statistical likelihood for estimating node degree to obtain a substantial improvement over the naïve approach in which degree is estimated by summing observed interactions. The implications are widespread since many other graph statistics, e.g. the clustering coefficient and node degree distribution, are functions of node degree and special biological interpretations are often assigned to nodes of particularly high degree. Furthermore, knowing the number of interactions for any protein is helpful for identifying its true interactors given a set of reported possibilities. Node degree estimation is one example of interactome-based statistical modeling. The paradigm we propose applies generally to other bait-prey graph statistics and is critical for accurately describing interactome behavior.
| 2 RESULTS |
|---|
|
|
|---|
2.1 Multinomial model for node degree
In concept, the set of edges in a bait-prey graph can be divided into distinct sets of doubly, singly and untested edges. If all experiments for each bait work properly and all proteins in the cell are available for detection as prey, then all edges between pairs of baits are tested twice, all edges extending from baits to non-bait proteins are tested once, and all edges between pairs of non-baits are untested. Under realistic experimental conditions, some proteins fail as baits and some proteins are not detectable as prey, and so the distinctions between these collections of edges blur. Chiang et al. (2007) discuss a viable-bait-prey (VBP) subgraph of a full set of bait-prey experimental data induced by the subset of bait proteins that detect at least one prey and are detected as prey by at least one other bait. By focusing on the set of proteins with direct evidence of viability as both bait and prey and by eliminating proteins prone to systematic bias (see Section 3.2), the VBP graph only includes edges for which two bait-prey assays running in opposite directions between a pair of proteins can reasonably be viewed as replicate observations on the same underlying true edge. Different experimental conditions, and other factors, dictate that VBP nodes may differ even for experiments using the same original bait set (see Section 2.4).
Despite replicate testing of all edges in the VBP graph, the observations in each direction are not necessarily consistent due to measurement error. For each VBP node there is an observed number of reciprocated edges, r, an observed number of unreciprocated edges, u (Fig. 1), and a true but unknown degree
. The joint probability of observing specific values of r and u for any given
can be written as a function of
, the TP probability (pTP) and the FP probability (pFP) using Multinomial models for both TP and FP observations; full model development is reported in Section 3.3. After adjusting the probability statement for only observing non-zero counts of interactions, statistical maximum-likelihood estimation (MLE) can be used to arrive at a degree estimator,
, that accounts for restrictions on observed data in the VBP graph as well as pTP and pFP. Current practice is to estimate degree naïve to pTP, pFP and subtleties in data collection, specifically
.
|
2.2 Estimating degree when pTP and pFP are known
2.2.1 Local performance under misspecification of pTP
Calculation of
For nodes with
ranging from 1 to 20, 500 observations were generated from graphs with 1000 nodes, doubly tested edges, pTP = 0.6, 0.7 and 0.8, and pFP = 0.001. This pFP parameter results in a mean of 2 FP observations per node with FP observations occurring in either direction. To simulate the VBP paradigm, observations of zero incident edges were excluded from analysis. Given the simulated observations,
was estimated under the true pTP and pFP parameters as well as incorrect assumed values for pTP. Figure 2 demonstrates the mean relative error for
using both correct and misspecified values of pTP. For these simulation parameters, relative bias in the estimator corresponds roughly with the amount of under- or overestimation of pTP; when pTP is under- or overestimated by 0.10 (0.20), degree is estimated on average within ten (twenty) percent of the true degree. Greater relative bias is apparent for nodes of lower degree.
|
2.2.2 Global performance under correct specification of pTP and pFP for varying graph topologies
A series of Erdös–Renyi (ER) random graphs containing 1000 nodes and 2000 edges were examined to explore global performance of
|
Much debate has centered on whether graphs exhibit node degree distributions with heavier tails or higher variability than ER random graphs (Li et al., 2006). To explore the performance of
2.3 Estimating pTP and pFP using a gold standard
For real data analysis, estimation of
requires estimates of pTP and pFP since their true values are generally unknown. A variety of techniques have been discussed for estimation of pTP and pFP (Collins et al., 2007; Deng et al., 2003; D'haeseleer and Church, 2004; Hart et al., 2006) but these do not directly account for bait-prey viability in an experiment. The method we develop here for AP-MS data first aligns protein complex viability in a gold standard set with the data under study. The observed interactions are then compared to the viable complexes in the gold standard to estimate values of pTP. Results are reported here for five AP-MS data sets on Saccharomyces cerevisiae: Gavin02 (Gavin et al., 2002), Ho02 (Ho et al., 2002), Krogan04 (Krogan et al., 2004), Gavin06 (Gavin et al., 2006) and Krogan06 (Krogan et al., 2006).
Our source of candidate gold standard complex co-memberships is a collection of 335 protein complexes culled from the Munich Information Center for Protein Sequences (MIPS) (Mewes et al., 2004) and Gene Ontology (GO) (The Gene Ontology Consortium, 2000), specifically excluding complex estimates based on the high-throughput AP-MS data sets under investigation in this text (see Section 3.4). All pairs of proteins jointly annotated in one of these 335 complexes could, in principle, be detected as interactors by AP-MS technology, as long as all members of the complex are viable proteins in the experiment under consideration. For each AP-MS experiment, we determine the subset of the 335 candidates whose constituent members are all reported in the data set as viable prey and, when applicable, viable baits. Given the resultant number of true complex co-memberships and the observed data, a slightly modified version of the probability statement in Equation (1) can be used to estimate pTP and its variance for each data set (see Section 3.4). Specific estimates of pTP for each experiment are reported in Table 2. Supplementary Figure 2 demonstrates the trend in estimated pTP as the gold standard set of complexes centers on the set of viable proteins in each experiment.
|
The lack of a robust set of high confidence protein complex non-comemberships prohibits estimation of pFP in the same manner as pTP; however, with an estimate of pTP in place, a corresponding value for pFP can be calculated using a method of moments approach (see Section 3.5). Table 2 records the method of moments pFP estimates for the five AP-MS data sets. The effects of the specific estimates on node degree are discussed in Section 2.4.
2.4 AP-MS data results
We estimated node degrees via the statistical likelihood for the five S.cerevisiae AP-MS data sets. As stated in Section 2.1, the Multinomial model assumes that baits prone to various types of systematic bias have been excluded from analysis and only stochastic errors, globally applicable to all VBP proteins, remain. We diagnosed systematically biased VBP proteins by examining the distribution of unreciprocated in- and out-edges as in Chiang et al. (2007), eliminating those with severe imbalance from further analysis. Supplementary Table 1 records the number of baits and prey originally reported for each data set as well as the number of VBP nodes both pre- and post-filtering for systematic bias. MLE estimates reported in Sections 2.4.1 and 2.4.2 were computed using the pTP and pFP estimates in Table 2.
2.4.1 Local analysis
Figure 3 plots estimated degree versus u for increasing values of r. The left panel reveals the exact linearity of
as r and u increase. The middle and right panels relevant to the Gavin06 and Krogan06 data, respectively, demonstrate the additional dependence of
on pTP and pFP. In practical terms, this means that the same numbers of observed reciprocated and unreciprocated interactions in different data sets can map to different values of
. The naïve approach equates observations across all experiments without regard to divergent error probabilities.
|
Figure 4 illustrates the differences in local node degree estimates for the MLE and naïve techniques. Actual numeric estimates are reported in Supplementary Table 3. In these figures, colors map to negative or positive values of
|
In addition to between-experiment variability, Figures 3 and 4 also illustrate within-experiment variability. Observations within a data set that would be equivalent under a naïve paradigm are not when estimated via MLE. For example, in the Krogan06 data r = 2 and u = 0 yield
2.4.2 Global degree analysis
Figure 5 contains density plots for estimates of AP-MS degree using
prior to filtering baits prone to systematic bias,
post-systematic-filtering, and
. The left panel compares degree density estimates for the most recent and larger Gavin06 and Krogan06 AP-MS data sets, and the right panel compares density estimates for the three smaller Gavin02, Ho02 and Krogan04 data sets.
|
Degree density estimates for the Gavin06 and Krogan06 data suggest that both experiments were prone to systematic error, the latter more so than the former. Degree densities for
In contrast to the genome-wide surveys, the earlier, smaller data sets are far less subject to systematic error and in general have much lower degree. Although the graphs contained a similar number of nodes, the degree density for Ho02 weights low degree nodes more heavily than the degree density for the Gavin02 data. Krogan04 baits were selected based on involvement in RNA-processing and prefractionization by high-speed centrifugation was used as an intentional means of investigating smaller protein complexes. Relatively low degrees are expected due to the small complexes under investigation, but some connectivity is observed because of the functional commonality of the baits under study.
The interplay between total graph size and local node degree must be considered when drawing biological conclusions about graph data, particularly in the face of measurement error. For example, the Krogan06 data had the highest expected number of FPs per individual node (Table 2), but the global degree densities for Ho02 and Krogan04 actually experienced the greatest shrinkage toward zero after estimation with
. Estimated degree for Krogan06 is in general much larger than Ho02 or Krogan04, hence the impact of modeling FPs on a local level plays out differently when the graphs are considered from a global point of view. In a second example, the number of nodes and global pTP and pFP estimates for Gavin02 are closest to those for Ho02. However, after modeling of stochastic error, the node degree distribution for Gavin02 in fact much more closely resembles that for the larger Gavin06 and Krogan06 data sets, indicating larger local degree estimates. For practical data analysis, both local degree estimates and global degree distributions provide complementary information about interactome graphs.
2.5 Y2H data results
Rather than select a particular pair of pTP and pFP parameters for estimating node degree via MLE, it is also of interest to explore a range of plausible error probability estimates and their effect on estimated node degree. Chiang et al. (2007) discuss a method of moments approach for estimating a family of solutions for pTP and pFP. After removing VBP proteins prone to systematic bias from the ItoCore01 (Ito et al., 2001) and Uetz00 (Uetz et al., 2000) data sets, we estimated
using the 25th, 50th and 75th percentiles for (pTP, pFP) pairs from the family of solutions for these two parameters and compared the resultant degree densities to the naïve estimate in Figure 6. Interestingly, for the ItoCore01 data the three parameter pairs make very little difference in node degree density (left panel of Fig. 6). For small numbers of observed reciprocated and unreciprocated edges, the MLE estimates are equal for a wide range of error probability combinations. Degree estimate densities for the Uetz00 data are plotted in the right hand panel of Figure 6, and for these lower estimates of pTP, we do see much more variation in resultant MLE density. For both Y2H data sets described here, the MLE estimates at all parameter combinations suggest that in general the naïve approach underestimates degree. We note that in the application of these methods to these two Y2H data sets, we assume the VBP paradigm as described in Section 3.2 is appropriate. These screens did use pooled tests and sampled a limited number of clones for analysis; were these parameters readily available, the bait-specific number of tests could easily be incorporated into analysis.
|
| 3 METHODS |
|---|
|
|
|---|
3.1 Graph theory and notation
A graph can be represented as G = (V, E) in which V is the set of nodes and E is the set of edges between the nodes. The nodes in V represent objects of interest, e.g. proteins, and |V| denotes the size of V, e.g. the number of proteins. The edges in E represent relationships between nodes, e.g. complex co-membership. A graph in which all edges are not directed is called an undirected graph while any graph with one or more directed edges is a directed graph. For an undirected graph, degree of a node is the number of incident edges. For a directed graph, the concept of degree is accommodated by considering the in-edges (in-degree) and out-edges (out-degree) separately. The true interactome graph is assumed to be an undirected graph. Data generated by AP-MS and Y2H technologies form a directed graph assay of the interactome with edges extending from baits to prey.
3.2 VBP graphs
Bait-prey technologies test for all relationships involving a set of pre-specified baits and all other prey expressed under the cellular conditions of the experiment. Experimental results are currently only reported for proteins that are observed in at least one interaction, leaving uncertainty as to which proteins were available at the time of interaction testing. Define V as the set of proteins reported as either bait or prey or both in the set of observed interactions. Let VB denote the set of viable baits defined as baits that detect at least one prey, and let VP denote the set of viable prey defined as the set of proteins detected by at least one bait. Note that VB
VP = V. The VBP graph is a subgraph of the original data induced by the set of nodes common to both VB and VP, or VBP = VB
VP. While it is possible that all nodes in V could be contained in VB and VP, that is not the case for any of the five AP-MS data sets discussed here. Furthermore, even though for all five data sets |VB| < |VP|, it is not the case that VB
VP.
The distributions of unreciprocated in- and out-edges for nodes in the VBP graph are used to diagnose nodes prone to systematic bias using the Binomial method of Chiang et al. (2007). Systematically biased nodes are excluded from further analysis in this report.
3.3 Multinomial model for node degree
Let RT and UT be random variables representing the number of reciprocated and unreciprocated observations for edges incident on the node of interest that exist in truth. Given
and the sensitivity of the technology, pTP, RT and UT are jointly distributed according to a Multinomial model, specifically
|
| (1) |
Both reciprocated and unreciprocated FP edges may also arise, and these are also jointly distributed according to a Multinomial model. Let RF and UF be random variables representing the number of reciprocated and unreciprocated observations for edges incident on the node of interest that do not exist in truth. Given
and the specificity of the technology, 1 – pFP, RF and UF are jointly distributed according to
|
| (2) |
Data arising from bait-prey technologies do not afford direct observation of RT, UT, RF and UF for a node. Rather, the observations are generated by R = RT + RF and U = UT + UF where R and U are random variables representing the total number (both TPs and FPs) of reciprocated and unreciprocated edges incident upon that node, respectively. The joint distribution of R and U is the convolution of the two distributions in (1) and (2), with a truncation factor that accounts for the fact that the VBP graph includes only nodes for which at least one in- and out-edge is observed. In particular, the convolution is divided by
.
The current practice naïve estimate of degree is
. The maximum likelihood estimate of
,
, is that which maximizes the joint probability of observing r and u given estimates of pTP and pFP.
3.4 Gold standard complex co-membership edges and estimation of pTP
Candidate complexes for the AP-MS gold standard data sets were obtained from both GO and MIPS. For the GO repository, the Cellular Component Ontology was searched to identify terms with the entire word complex or the suffixes ase or some. We performed manual curation to exclude the following terms that were not protein complexes but were rather subcellular locations with descriptions containing the words chromosome, endosome, chitosome, or kinetochore: GO:0000794, GO:0000780, GO:0000781, GO:0000784, GO:0000778, GO:0000942, GO:0031902, GO:0045009, GO:0000776. We also manually checked that all selected GO terms used the word complex as a noun. For the MIPS repository, the S. cerevisiae genome database was parsed under the complex catalog. We included all terms containing the word complex in the description but excluded any containing the word complexes so as to avoid miscellaneous collections of multiple complexes. After collection and curation, the set of protein complexes was then merged to form one aggregate set of 335 complexes after deleting redundant protein complexes. The complete set is available in the ScISIC data set in the R package ScISI (version 1.9.7).
For each data set, let
GS represent the total number of edges in the complex co-membership graph induced by the gold standard data set. Reciprocated and unreciprocated observations of this set of edges that we believe to exist in truth are then used to estimate pTP. Equation (1) is maximized for pTP after replacing
by
GS and letting RT and UT represent the number reciprocated and unreciprocated observations on the gold standard set, respectively. Truncation of observations is not a concern in this case since it is possible for none of the gold standard edges incident on any unique node to be observed. Specifically, the maximum-likelihood estimate for pTP is
. Variance for this estimator can be directly calculated to be
, and is estimated by plugging in
.
3.5 Method of moments estimator for pFP
Chiang et al. (2007) describe a method of moments approach for the problem of estimating pFP. We briefly describe their method here. Let |VBP| be the number of nodes in the VBP graph, then the largest number of possible distinct interacting protein pairs is
. Let
VBP be the true number of unique interacting pairs and
the number of non-interacting protein pairs. The expected number of reciprocally adjacent pairs RVBP and unreciprocally adjacent pairs UVBP in the VBP graph are:
|
| (3) |
|
| (4) |
These two independent equations in three parameters {(pTP, pFP,
VBP)} generate a family of solutions in which a value of any one of these parameters determines unique solutions for the other two (Chiang et al., 2007).
| 4 DISCUSSION |
|---|
|
|
|---|
Experimental data from bait-prey technologies are prone to sampling bias, FP and FN observations. While these issues are widely recognized, little has been done to statistically model these errors when estimating global and/or local graph statistics. Straightforward application of likelihood techniques can be used to estimate node degree with more accuracy than the current practice naïve estimate. Simulation studies demonstrate the accuracy of the MLE approach even under misspecification of stochastic error probabilities, and depictions of graphs with known node degree distributions show the extent to which MLE degree estimation can be used to more closely resemble the true distribution. Node degree itself is useful for understanding the range of influence of an individual protein in the cell, and is a helpful contributor for elucidating the identities of a protein's interactors given the possibilities reported in a data set. Accurate estimation of node degree, and many other graph parameters, cannot be overlooked.
Our MLE approach limits analysis to the VBP graph and in this report we compare it to the naïve estimator also used on the VBP graph. In practice, naïve summation is generally applied to the graph including all baits and prey-only proteins, regardless of the severe imbalance in the number of tested edges incident on each. Under perfect sensitivity and specificity, the number of detected interactions for a bait would equal its true degree since all incident edges are tested. On the other hand, for prey-only proteins, only the subset of edges connected to bait proteins is tested. The naïve sum approach generally treats the remaining untested edges as though they are tested but not observed. Hence, even under perfect sensitivity and specificity for tested edges, naïve summation actually only yields a lower bound for the true degree for prey-only proteins. Estimation of degree for nodes outside the VBP set could proceed quite naturally using the Hypergeometric distribution under the assumption that the VBP nodes are a random sample from the population of proteins. If the VBP nodes are not a random sample then it is not, in general, possible to make specific statements about the probability that a prey is observed. The AP-MS and Y2H data sets discussed here give no evidence that the set of VBP nodes is a representative random sample of proteins in the cell, hence extrapolation of the results to untested portions of the interactome using either the MLE or naïve degree estimation approach would be largely irrelevant.
Focusing on the VBP graph for data analysis does therefore demand careful interpretation of the term degree. Degree in this context represents the number of interactions between viable baits that were also viable prey, and is therefore representative of both bait selection and the set of constitutive proteins under the experimental conditions. For intended genome-wide experiments such as Gavin06 and Krogan06 or for smaller-scale experiments with baits targeting specific cellular processes such as Krogan04, interpretation is more straightforward. Experiments between the two extremes of genome-wide assay and those targeting specific cellular functions, e.g. Gavin02, Ho02, ItoCore01 and Uetz00 make the interpretation of VBP degree slightly more difficult. That said, degree density estimates for these experiments do point to different characteristics of the detected interactions. While AP-MS and Y2H data are used here to demonstrate MLE degree estimation, the technique is equally applicable and similar conclusions can be made for data generated by other bait-prey technologies for which the VBP paradigm is appropriate.
One limitation of the MLE estimation method in its current formulation is that the same estimate of degree is made for all nodes with the same numbers of observed unreciprocated and reciprocated edges within a data set. In reality, protein-specific covariates, e.g. PFAM domains, likely contribute to variability in the observed data. As further relationships between these types of protein characteristics and node degree are uncovered, they can easily be accommodated into the MLE paradigm by introducing protein-specific FP and FN error probabilities.
Classic statistical techniques are readily applicable to graph feature analysis, and in this specific example yield accurate estimates of node degree. This work represents a first step toward rigorous handling of experimental noise in bait-prey data when estimating graph statistics. Likelihood methods are also appropriate for estimation of other features of interactome behavior, and a move beyond naïve summary statistics of observed data is both needed and warranted. Provided data are reported with bait-prey distinctions, sampling, systematic errors and stochastic errors can be easily addressed. Such approaches promise greatly increased accuracy in estimation of global and local statistics, as well as more holistic model development.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
We acknowledge funding through the Human Frontiers Science Program Research Grant RGP0022/2005 to W.H. and R.G., also in support of T.C.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Chris Stoeckert
Received on September 7, 2007; revised on November 7, 2007; accepted on November 7, 2007
| REFERENCES |
|---|
|
|
|---|
Barabási A.-L, Albert R. Emergence of scaling in random networks. Science (1999) 286:509–512.
Chiang T, et al. Coverage and error models of protein-protein interaction data by directed graph analysis. Genome Biol. (2007) 8:R186.[CrossRef][Medline]
Collins S, et al. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics (2007) 6:439–450.
Deng M, et al. Assessment of the reliability of protein-protein interactions and protein function prediction. Pac. Symp. Biocomput. (2003) 8:140–151.
D'haeseleer P, Church G. Estimating and improving protein interaction error rates. In: Proceedings of the IEEE Computational Systems Bioinformatics Conference August 16-19 2004 (2004) California: IEEE Computer Society. 216–223.
Gavin AC, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature (2002) 415:141–147.[CrossRef][Medline]
Gavin AC, et al. Proteome survery reveals modularity of the yeast cell machinery. Nature (2006) 440:631–636.[CrossRef][Medline]
Hart GT, et al. How complete are current yeast and human protein-interaction networks? Genome Biol. (2006) 7:120.[CrossRef][Medline]
Ho Y, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature (2002) 415:180–183.[CrossRef][Medline]
Ito T, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA (2001) 98:4569–4574.
Krogan N, et al. High-definition macromolecular composition of yeast RNA-processing complexes. Mol. Cell (2004) 13:225–239.[CrossRef][Web of Science][Medline]
Krogan N, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature (2006) 440:637–643.[CrossRef][Medline]
Li L, et al. Towards a theory of scale-free graphs: Definition, properties, and implications. Internet Mathe. (2006) 2:4.
Mewes H, et al. MIPS: analysis and annotations of proteins from whole genomes. Nucleic Acids Res. (2004) 32:D41–D44.
The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nat. Genet. (2000) 25:25–29.[CrossRef][Web of Science][Medline]
Uetz P, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature (2000) 403:623–627.[CrossRef][Medline]
This article has been cited by other articles:
![]() |
H. Huang and J. S. Bader Precision and recall estimates for two-hybrid screens Bioinformatics, February 1, 2009; 25(3): 372 - 378. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||








