Bioinformatics Advance Access originally published online on May 8, 2008
Bioinformatics 2008 24(13):1510-1515; doi:10.1093/bioinformatics/btn220

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Discerning static and causal interactions in genome-wide reverse engineering problems

Mattia Zampieri , Nicola Soranzo and Claudio Altafini *

SISSA-ISAS, International School for Advanced Studies, via Beirut 2-4, 34014 Trieste, Italy

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSION
 REFERENCES
 

Background: In recent years, devising methods for discovering gene regulatory mechanisms at a genome-wide level has become a fundamental topic in systems biology. The aim is to infer gene–gene interactions in an increasingly sophisticated and reliable way through the continuous improvement of reverse engineering algorithms that exploit microarray data.

Motivation: This work is inspired by several studies suggesting that coexpression is mostly related to ‘static’ stable binding relationships, like belonging to the same protein complex, rather than to other types of interactions of a more ‘causal’ and transient nature (e.g. transcription factor–binding site interactions). The aim of this work is to verify whether direct and conditional network inference algorithms (e.g. Pearson correlation for the former, partial Pearson correlation for the latter) are indeed useful in discerning static from causal dependencies in artificial and real gene networks (derived from Escherichia coli and Saccharomyces cerevisiae).

Results: Even in the regime of weak inference power in which we have to work, our analysis confirms the differences in the performances of the algorithms: direct methods are more robust in detecting stable interactions, while conditional ones are better for causal interactions, especially in the presence of combinatorial transcriptional regulation.

Contact: altafini{at}sissa.it

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
In the field of systems biology, using the information provided by high-throughput measurements to infer interactions between genes represents a first step towards a comprehensive understanding of a biological system in terms of gene functions, ‘partner genes’, conditions for activation and dynamical behavior. The reconstruction of a gene network (Bansal et al., 2007; De Jong, 2002; Gardner and Faith, 2005) is a very challenging problem, since biological systems are difficult to perturb and the available perturbations/experiments are typically far fewer than the dynamical variables composing the system (Faith et al., 2007). Several methods exist to reverse engineer collections of experimental data, relying on more or less sophisticated statistical analysis of gene expression profiles and modeling frameworks, like Bayesian networks (Husmeier, 2003; Jansen et al., 2003) and Boolean networks (Kauffman, 1969; Shmulevich et al., 2002), or linear and non-linear ordinary differential equations (ODEs) (Yeung et al., 2002).

We focus here on two other classes of algorithms, called relevance networks and graphical models. They are computationally more tractable than most of the methods mentioned above and can therefore be applied in a truly genome-wide context. They consist essentially of computing a two-point similarity measure between gene pairs, which is then used to weight the edges of a graph. The higher the weight, the more likely the two genes are to interact in some way. The similarity measures we used are Pearson correlation (P) (Butte and Kohane, 1999), mutual information (MI) (Butte and Kohane, 2000), partial Pearson correlation (CP) (de la Fuente et al., 2004), conditional mutual information (CMI) and graphical Gaussian model (GGM) (Schäfer and Strimmer, 2005). The first two metrics are based on direct gene–gene coexpression, while the remaining three perform a conditioning operation on the two-point measure. This conditioning can depend on a single third gene (CP and CMI) or on the remaining n–2 genes (GGM); see Soranzo et al. (2007) for details. Hereafter we will refer to the first two as ‘direct metrics’ and to the last three as ‘conditional metrics’.

Relevance networks and GGM have been extensively used in recent years (Ma et al., 2007) and their results have been validated experimentally, for example in Basso et al. (2005), where the analysis is based on a similarity index related to CMI (Margolin et al., 2006), or in Faith et al. (2007) where coexpression is used to investigate combinatorial regulation.

Of the various types of interactions between DNA, genes and gene products, we shall focus in this work on two categories, one describing the coparticipation of genes in a protein complex, the other the transcriptional activation of a gene due to its transcription factors (TFs). Macroscopically the first category can be considered as a more ‘static’ association where gene products have to be expressed in a constant stoichiometric ratio (Nomura, 1999; Planta, 1997), while the second is a cause–effect relationship between genes. Based on our current knowledge (Balaji et al., 2006; Spirin and Mirny, 2003; Yu et al., 2006, see also Fig. 1), the interaction networks representing protein complexes (PCs) and transcription factor–binding site (TF–BS) interactions are roughly characterizable by means of different ‘recurrent’ regulatory motifs, which for simplicity we call ‘dense modules’ and ‘causal modules’. The dense modules for PCs are undirected subgraphs in which all nodes are mutually connected. The modules for TF–BS, instead, are directed subgraphs with a ‘scale-free like’ connectivity that are sparse overall. In addition, in order to represent the combinatorial effect of multiple TFs on a target gene, the input degrees are normally higher than the output degrees.


Fig. 1. Regulatory modules. Schemes of the two regulatory motifs: the top drawing shows a dense module, where all nodes are mutually connected; the bottom one shows a causal module, i.e. a directed graph accounting for only a few feedback loops and multiregulated genes. The former represents a PC, the latter (multiple) TFs acting on their BSs. PCs can be characterized as sets of proteins that interact closely with each other. As a matter of fact, searching for highly connected subgraphs is a common predictor for PCs (Spirin and Mirny, 2003; Yu et al., 2006). Hence in our artificial network a dense subgraph represents a PC. On the other hand, a statistical description showing a non-uniform connectivity degree on an oriented and globally sparse graph emerges from the analysis of the known TF–BS interactions of E.coli and S.cerevisiae, see Figure 2. It is taken here as a paradigm for the TF–BS modules in our artificial network.

 

Fig. 2. Regulatory module size distribution. Log-scale distribution of PC size (A), and number of TFs per gene (B) for different organisms. In yeast, for example, the largest complex is the cytoplasmic ribosome, accounting for 81 genes, while in E.coli it is the flagellum complex, composed of 24 genes. Both distributions hint at an increase in the size of the regulatory motifs as the complexity of the organisms increases.

 
It is worth noticing that these two types of regulatory motifs can characterize the complexity of an organism. Going from a unicellular prokaryote (Escherichia coli) and eukaryote (Saccharomyces cerevisiae) to mammals (human, rat and mouse), the distribution of annotated protein complexes shows a heavier tail towards bigger complexes (Fig. 2A). The same happens for the combinatorial effect of multiple TFs (Fig. 2B; Lee and Rieger, 2007; Lee et al., 2002).

In this article, the two classes of similarity metrics cited above (direct and conditional) are compared with the aim of analyzing their ability to infer regulatory networks characterized by the above-mentioned topological structures, in a completely unsupervised manner. The five different similarity metrics are tested on one artificial and two real networks. The artificial network is meant to enable the evaluation under controlled conditions, like a well-defined topology and known kinetics governing the system (see Section 2). In the two cases of real data, the identification of true positive (TP) edges relies instead on the real physical networks of PC and TF–BS relationships collected from the literature. We choose two simple organisms, a prokaryote (E.coli) and a eukaryote (S.cerevisiae), in order to test the consistency of the two regulatory structures for the different algorithms. For these two organisms sufficiently many PCs and TF–BS interactions have been annotated, and large collections of gene-expression profiles can be gathered from online repositories.

The comparison of the inference power of the five algorithms shows that the gene interactions associated with coparticipation in the same protein complex are better detected by the direct methods, while those associated with the combinatorial effect of multiple TFs are better retrieved by the conditional metrics (in particular by the GGM). Apart from comparing the performances of the algorithms on the different topologies, we also aim at evaluating them on modules of different sizes (i.e. for larger PCs or with increasing numbers of TFs acting on the same BS). For this purpose it is convenient to rank the weights of each similarity matrix and look at the percentage of TPs (with respect to the total number of true edges) in the highest 1% of weights. This procedure allows us to make an unbiased and unsupervised comparison between different metrics. For the dense modules case, it is also possible to specify how well the reconstructed dense modules correspond to known PCs. If we do so by means of a clustering algorithm on the inferred graphs, we see that the direct metrics are indeed those allowing the most faithful PC reconstruction.


    2 METHODS
2.1 Artificial datasets
The artificial gene-expression datasets are generated by the reaction kinetics-based system of coupled non-linear continuous-time ODEs introduced in Mendes et al. (2003) and already used by us in Soranzo et al. (2007). A scale-free matrix of adjacencies, representing the causal module, is superimposed on a matrix of densely connected subsets of nodes representing the stable modules (see Fig. S1 in the Supplementary Material). The rate law for the mRNA synthesis of a gene is obtained by multiplying together the sigmoidal-like contributions of the genes identified as its inhibitors and activators. Consider the i-th row of the adjacency matrix A, i=1,...,n, and randomly assign a sign to each of its non-zero entries. Denote by $j_1,\dots,j_a$ the indexes with assigned positive values (activators of the gene $x_i$) and by $k_1,\dots,k_b$ the negative ones (inhibitors of $x_i$). The ODE for $x_i$ is then:

$$\frac{dx_i}{dt} = V_i \prod_{j \in \{j_1,\dots,j_a\}} \left(1 + \frac{x_j^{\nu_{i,j}}}{x_j^{\nu_{i,j}} + \theta_{i,j}^{\nu_{i,j}}}\right) \prod_{k \in \{k_1,\dots,k_b\}} \frac{\theta_{i,k}^{\nu_{i,k}}}{x_k^{\nu_{i,k}} + \theta_{i,k}^{\nu_{i,k}}} - \lambda_i x_i \tag{1}$$

where $V_i$ represents the basal rate of transcription, $\theta_{i,j}$ (respectively $\theta_{i,k}$) the activation (resp. inhibition) half-life, $\nu_{i,j}$ (resp. $\nu_{i,k}$) the activation (resp. inhibition) Hill coefficients (in our simulations: $\nu_{i,j}, \nu_{i,k} \in \{1, 2, 3, 4\}$), and $\lambda_i$ the degradation rate constants. In the multiple in silico experiments, the system is perturbed by means of random initial conditions, plus ‘gene knockouts’ (obtained by setting to 0 the expression of the selected gene in the corresponding differential equation). Gaussian measurement noise is added to corrupt the output. The number of experiments has been chosen to be comparable with the real cases, where we have a ratio experiments/genes of approximately one to six, while the number of complexes and their sizes have been sampled from a log-normal distribution with a maximum size of 50 genes per complex. This choice is consistent with what is observed in the real organisms and gives rise to a manageable number of nodes in the simulation of gene-expression profiles (2154).
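As an illustration of the rate law described above, the following is a minimal sketch rather than the simulator actually used for the datasets. It assumes the Mendes et al. (2003) form of the kinetics, in which each activator contributes a factor 1 + x^ν/(x^ν + θ^ν) and each inhibitor a factor θ^ν/(x^ν + θ^ν). The function names, the forward-Euler integrator, the step size and the three-gene toy network are all hypothetical choices made for the example.

```python
import numpy as np

def mrna_rate(x, i, activators, inhibitors, V, theta, nu, lam):
    """Time derivative of x[i]; theta and nu are indexed as [target, regulator]."""
    rate = V[i]
    for j in activators:                      # each activator multiplies the basal rate
        xj = x[j] ** nu[i, j]
        rate *= 1.0 + xj / (xj + theta[i, j] ** nu[i, j])
    for k in inhibitors:                      # each inhibitor scales it down
        tk = theta[i, k] ** nu[i, k]
        rate *= tk / (x[k] ** nu[i, k] + tk)
    return rate - lam[i] * x[i]               # first-order degradation

def simulate(x0, regulators, V, theta, nu, lam, knockout=None, dt=0.01, steps=5000):
    """Forward-Euler integration (an assumed, deliberately simple scheme).

    regulators[i] is a pair (activators, inhibitors) of index lists for gene i.
    A knockout pins the expression of one gene at 0 for the whole run.
    """
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        if knockout is not None:
            x[knockout] = 0.0
        dx = np.array([mrna_rate(x, i, *regulators[i], V, theta, nu, lam)
                       for i in range(len(x))])
        x = np.maximum(x + dt * dx, 0.0)      # expression levels stay non-negative
    if knockout is not None:
        x[knockout] = 0.0
    return x

# Toy network: gene 0 is unregulated, activates gene 1 and inhibits gene 2.
n = 3
regulators = [([], []), ([0], []), ([], [0])]
V, lam = np.ones(n), np.ones(n)
theta, nu = np.ones((n, n)), 2 * np.ones((n, n))

wild_type = simulate([0.3, 0.7, 0.2], regulators, V, theta, nu, lam)
knocked = simulate([0.3, 0.7, 0.2], regulators, V, theta, nu, lam, knockout=0)
```

In this toy setting the knockout of gene 0 removes both the activation of gene 1 and the inhibition of gene 2, so both targets relax to the basal level V/λ. Measurement noise could be added to the returned profiles with a Gaussian random generator.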

2.2 Data collected
We downloaded the M3D ‘Many Microbe Microarrays Database’ (build E-coli-v3-Build-1) (Faith et al., 2008) for E.coli (445 experiments for 4345 genes). For S.cerevisiae we compiled a collection of microarray experiments performed with cDNA chips (958 experiments for 6203 open reading frames). Both datasets were preprocessed and normalized prior to network inference.

The PC network for yeast was downloaded from the MPACT subsection of the CYGD database at MIPS (Güldener et al., 2006). To limit the high rate of false positives, only the complexes annotated from the literature were considered, and not those obtained from high-throughput experiments (which are labeled ‘550’ in the MIPS classification scheme). PC sizes for human, rat and mouse were downloaded from the CORUM database (Ruepp et al., 2008), and those for E.coli from the EcoCyc website (Karp et al., 1999). We obtained TF–BS networks from the RegulonDB database, version 5.6, for E.coli (Salgado et al., 2006), and from a recent collection (Balaji et al., 2006) for S.cerevisiae. See Supplementary Material for more details.

2.3 Similarity measures
Let m be the number of experiments available and n the number of genes. Assume $X_i$ and $X_j$, i,j=1,...,n, are random variables representing the genes, and $x_i(\ell)$, $x_j(\ell)$, $\ell=1,\dots,m$, their sample measurements. The matrices of edge weights are computed using the following five algorithms, see Soranzo et al. (2007) and references therein for details:

  • Pearson correlation (direct):

    $$R(X_i, X_j) = \frac{E[(x_i - \bar{x}_i)(x_j - \bar{x}_j)]}{\sqrt{v_i v_j}} \tag{2}$$

    where $\bar{x}_i$, $v_i$ and $\bar{x}_j$, $v_j$ are means and variances of $x_i$ and $x_j$ over the m experiments and E[·] denotes expectation.

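  • Partial Pearson correlation (conditional):

    $$R_{C1}(X_i, X_j) = \min_{k \neq i,j} \left| R(X_i, X_j \mid X_k) \right|, \qquad R(X_i, X_j \mid X_k) = \frac{R(X_i, X_j) - R(X_i, X_k)\, R(X_j, X_k)}{\sqrt{\left(1 - R^2(X_i, X_k)\right)\left(1 - R^2(X_j, X_k)\right)}} \tag{3}$$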

  • Graphical Gaussian model (conditional):

    $$R_{Call}(X_i, X_j) = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\, \omega_{jj}}} \tag{4}$$

    where $\Omega = (\omega_{ij})$ is $R^{-1}$ if $R^{-1}$ exists, and the small-sample estimate of Schäfer and Strimmer (2005) when R is not full rank. In practice, $R_{Call}$ is computed by means of the R package GeneNet version 1.0.1, available from CRAN (http://cran.r-project.org).

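  • Mutual information (direct):

    $$I(X_i; X_j) = \sum_{\varphi_i, \varphi_j \in H} p(\varphi_i, \varphi_j) \log \frac{p(\varphi_i, \varphi_j)}{p(\varphi_i)\, p(\varphi_j)} \tag{5}$$

    where $p(\varphi_i)$ is the probability mass function $p(\varphi_i) = \Pr(X_i = \varphi_i)$, $\varphi_i$ in the alphabet H, and likewise for the joint probability function $p(\varphi_i, \varphi_j)$.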

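  • Conditional mutual information (conditional):

    $$I_C(X_i; X_j) = \min_{k \neq i,j} I(X_i; X_j \mid X_k), \qquad I(X_i; X_j \mid X_k) = \sum_{\varphi_i, \varphi_j, \varphi_k \in H} p(\varphi_i, \varphi_j, \varphi_k) \log \frac{p(\varphi_i, \varphi_j \mid \varphi_k)}{p(\varphi_i \mid \varphi_k)\, p(\varphi_j \mid \varphi_k)} \tag{6}$$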
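The measures listed above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the code used in this work: the partial correlation is taken as the minimum over a single conditioning gene, the GGM weights use the plain inverse of the correlation matrix (so it assumes R is full rank and does not reproduce the small-sample estimator of GeneNet), and the mutual information uses an equal-width discretization with a hypothetical number of bins. All function names are ours.

```python
import numpy as np

def pearson(X):
    """Pearson correlation matrix; X has one row per gene, one column per experiment."""
    return np.corrcoef(X)

def partial_pearson_min(R):
    """First-order partial correlation, minimized in absolute value over the third gene k."""
    n = R.shape[0]
    out = np.full((n, n), np.inf)
    for k in range(n):
        rk = R[:, k]
        denom = np.sqrt(np.outer(1.0 - rk**2, 1.0 - rk**2))
        with np.errstate(divide="ignore", invalid="ignore"):
            pc = np.abs((R - np.outer(rk, rk)) / denom)
        pc[k, :] = np.inf                     # k must differ from both i and j
        pc[:, k] = np.inf
        out = np.minimum(out, pc)
    np.fill_diagonal(out, 0.0)
    return out

def ggm_partial(R):
    """Full-order partial correlations from the inverse correlation matrix (full rank assumed)."""
    omega = np.linalg.inv(R)
    d = np.sqrt(np.diag(omega))
    P = -omega / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

def mutual_information(xi, xj, bins=4):
    """Mutual information (in nats) of two profiles after equal-width binning."""
    joint, _, _ = np.histogram2d(xi, xj, bins=bins)
    p = joint / joint.sum()
    pi = p.sum(axis=1, keepdims=True)         # marginal of the first profile
    pj = p.sum(axis=0, keepdims=True)         # marginal of the second profile
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (pi @ pj)[nz])))

# Toy causal chain X0 -> X1 -> X2 observed in 500 experiments.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.5 * rng.normal(size=500)
x2 = x1 + 0.5 * rng.normal(size=500)
R = pearson(np.vstack([x0, x1, x2]))
CP = partial_pearson_min(R)
G = ggm_partial(R)
```

On this chain the direct metric assigns a large weight to the indirect pair (X0, X2), whereas conditioning on X1 brings it close to zero. With only three genes the single-gene and full-order partial correlations coincide, which gives a simple consistency check between the two conditional functions.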

2.4 Criteria in the comparison of the five algorithms
In order to compare the performances of the metrics of Section 2.3, we use two different types of criteria. The first is a standard precision/recall curve [see e.g. Soranzo et al. (2007) for a definition] computed on each of the five (full) matrices of similarity measures. The second criterion consists in testing the ability of the different algorithms to retrieve the two types of regulatory modules (causal and dense) as a function of their size. For this task it is useful to look only at the first percentile of edge weights (i.e. the top 1% of edges sorted by their weights). As inference is performed on the entire genome, this first percentile, i.e. 1% of the n(n–1)/2 possible gene pairs, corresponds to 23 188, 94 373 and 192 355 edges in the artificial, E.coli and S.cerevisiae network reconstructions, respectively (see Table S1 in the Supplementary Material for the corresponding numbers of true edges). As a matter of fact, choosing a threshold for the edges based on one of the commonly used parameters (edge weights, bootstrapping, area under the ROC curve, etc.) leads to very different pruning in the five inferred matrices (see Figure S1 in the Supplementary Material), making a comparison more troublesome. For the two types of regulatory structures (causal and dense) the true edges are binned according to the size of the module they belong to. The recall (i.e. the percentage of TPs over the total number of true edges) of the reconstructed network for each bin (of module size) is used to evaluate how the reconstructions vary with size (shown in Figs 4 and 6). More standard curves such as precision/recall (Fig. 3) or ROC curves (Figs S2 and S3) cannot show how the performance of the algorithms depends on the module size.
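The second criterion can be sketched as follows. This is a minimal illustration, assuming that edges are ranked by the absolute value of their weight and that each true edge is annotated with the size of the module it belongs to; the function names and the dictionary layout are hypothetical. The first function also reproduces the edge counts quoted above from the numbers of genes (2154, 4345 and 6203).

```python
import numpy as np

def top_percentile_count(n_genes, fraction=0.01):
    """Number of edges in the top `fraction` of all n(n-1)/2 gene pairs."""
    return round(fraction * n_genes * (n_genes - 1) / 2)

def recall_by_module_size(W, true_edges, bins, fraction=0.01):
    """Recall of true edges among the top-ranked edges, per module-size bin.

    W          : symmetric n-by-n matrix of edge weights
    true_edges : dict mapping a gene pair (i, j) with i < j to its module size
    bins       : list of (low, high) inclusive module-size ranges
    """
    n = W.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(-np.abs(W[iu, ju]), kind="stable")   # strongest edges first
    keep = order[:top_percentile_count(n, fraction)]
    selected = set(zip(iu[keep].tolist(), ju[keep].tolist()))
    recalls = []
    for low, high in bins:
        edges = [e for e, size in true_edges.items() if low <= size <= high]
        hits = sum(e in selected for e in edges)
        recalls.append(hits / len(edges) if edges else float("nan"))
    return recalls

# Toy example: 5 genes, 10 possible pairs, top 20% = 2 edges retained.
W = np.full((5, 5), 0.1)
W[0, 1] = W[1, 0] = 0.9
W[2, 3] = W[3, 2] = 0.8
true_edges = {(0, 1): 2, (2, 3): 5, (3, 4): 5}
recalls = recall_by_module_size(W, true_edges, bins=[(2, 2), (3, 10)], fraction=0.2)
```

Because the same number of top-ranked edges is retained for every metric, the per-bin recalls of different similarity matrices can be compared directly without choosing a metric-specific threshold.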


Fig. 3. Precision versus recall curves for PC and TF–BS networks. Top row: precision/recall curves of the five different similarity metrics in the PC reconstructions for the artificial (A), E.coli (B) and S.cerevisiae (C) datasets. In all three cases the two direct metrics (P and MI) seem to be performing better than the corresponding conditional metrics. The curves are very high in the artificial network case because the density of true PC edges is higher than in the two organisms, see Table S1. Bottom row: precision/recall curves of the five different similarity metrics in the TF–BS reconstructions for the artificial (D), E.coli (E) and S.cerevisiae (F) datasets. In absolute terms, the inference power is much lower than for PC. Notice, however, how the conditional metrics still give the best results (in E.coli, P and MI are performing slightly better than CP and CMI, but GGM is still outperforming all the others; compare also Fig. 6B).

 

Fig. 4. Dense modules and PCs. Recall (i.e. percentage of TPs in the first percentile of edges) for the five different similarity metrics for increasing size of the PCs, in: (A) artificial dataset, (B) E.coli and (C) S.cerevisiae. In all three cases, considering the percentage of TPs for the whole PC network, the two direct metrics can be ranked in the same order (P followed by MI) and perform better than the corresponding conditional metrics.

 
2.5 Clustering
Only the edges in the most significant percentile (i.e. the top 1%) of weights are retained, and the resulting graph is decomposed using a simple hierarchical clustering algorithm, with weighted average linkage as the cost of merging and a fixed number of clusters (300 in Fig. 5). This procedure allows us to identify the most connected components, which are then matched with the dense modules/PCs. This matching is fairly robust with respect to the choice of the number of clusters (compare the statistics for 300 and 500 clusters in Tables S2 and S3 of the Supplementary Material).
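A possible realization of this procedure with SciPy is sketched below. The text does not state how the retained weights are turned into the dissimilarities required by the clustering, so the conversion used here (one minus the normalized absolute weight, with discarded edges at the maximum distance) is our assumption, as are the function names. The `weighted` method of `scipy.cluster.hierarchy.linkage` implements weighted average linkage (WPGMA).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_top_edges(W, n_clusters, fraction=0.01):
    """Cluster genes using only the top `fraction` of edges of the symmetric matrix W."""
    n = W.shape[0]
    A = np.abs(W).astype(float)
    np.fill_diagonal(A, 0.0)
    iu = np.triu_indices(n, k=1)
    n_keep = max(1, round(fraction * len(iu[0])))
    cutoff = np.sort(A[iu])[-n_keep]
    A[A < cutoff] = 0.0                 # discard edges outside the top percentile
    D = 1.0 - A / A.max()               # assumed weight-to-distance conversion
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="weighted")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

def clusters_per_complex(labels, complexes):
    """For each complex (a list of gene indexes), count the clusters it is spread over."""
    return [len(set(labels[list(c)].tolist())) for c in complexes]

# Toy example: two dense modules of three genes each.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
labels = cluster_top_edges(W, n_clusters=2, fraction=0.4)
spread = clusters_per_complex(labels, [[0, 1, 2], [3, 4, 5], [0, 3]])
```

A complex that falls entirely into one cluster gets a count of 1, which corresponds to the darkest class in Fig. 5; larger counts indicate that the complex is split across clusters.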


Fig. 5. Clustering dense modules/PCs. For the artificial (A), E.coli (B) and S.cerevisiae (C) networks, the color scale represents the percentage of PCs belonging to a single cluster (darkest), two clusters, three clusters and more than three (lightest). P and MI almost always outperform the three conditional metrics (the only exception being GGM for E.coli).

 

    3 RESULTS
3.1 Artificial dataset
The procedure used to construct the artificial network is such that dense regulatory modules (of different sizes) are numerous enough to compare the inference power of the different algorithms in a statistically relevant manner. Our results (Fig. 4A) show that, as the size of the dense modules increases, conditional metrics perform worse than direct metrics. The clustering shows the same qualitative difference, and in fact the best results are obtained for the direct measures (P, MI). Figure 5 shows the percentages of complexes completely contained in one cluster, two clusters, three clusters and more than three clusters.

On the other hand, for the causal modules (Fig. 6A), the conditional metrics perform better than the direct ones for the largest modules. Notice how, for all five algorithms, the absolute performances drop dramatically when the number of TFs increases, as we expect due to the more complicated combinatorial effect.


Fig. 6. Combinatorial transcription regulation. Recall for TF–BS modules with increasing number of TFs on the same BS, in: artificial (A), E.coli (B) and S.cerevisiae (C) datasets. In all three plots the downward trend in the ability to recover causal modules is visible, and in all three the conditional measures seem to outperform the direct measures when combinatorial complexity increases.

 
As for the precision/recall curves of Figure 3, the qualitative difference between direct and conditional metrics in the two regulatory structures is substantially confirmed, although in the TF–BS curves (bottom row) the differences are minimal (precision is much lower than for PCs).

3.2 E.coli dataset
Owing to the different genome organization and architecture, in prokaryotes regulatory mechanisms are much simpler than in eukaryotes. Genes are organized in transcriptional units, with one promoter for many consecutive genes, a feature absent in monocistronic eukaryotic DNA. E.coli has only a few large complexes and also the combinatorial regulation of transcription is lower, so we expect the different algorithms to have more similar performances. We calculate the precision/recall curves of the five different metrics (Fig. 3B,E) and plot the percentage of TPs in the most significant percentile of edges, for increasing sizes of the PCs (Fig. 4B) and combinatoriality of TFs (Fig. 6B). PCs are identified slightly better by the two direct metrics, although the number of relatively large complexes is too low to have statistical significance.

The different performances emerging from the clustering (Fig. 5B) indicate that the highest correspondence between PCs and clusters is provided by GGM, followed by P and MI. Considering as an example the flagellum complex (accounting for 24 genes), if the clustering procedure is performed by means of P or MI, the complex belongs entirely to a single cluster, which also contains other genes functionally related to the flagellum, like chemotactic genes and other genes involved in flagellar biogenesis and motility. For CMI, CP and GGM, instead, this complex is split into 6, 8 and 2 clusters, respectively. Regarding TF–BS relationships, we expect the ability to recover true interactions to be inversely proportional to the multiplicity of TFs. This is particularly true for the algorithms performing well on low-multiplicity TFs (P, MI and GGM), while CMI shows a counterintuitive, slightly positive trend for multiregulated targets.

3.3 S.cerevisiae dataset
In S.cerevisiae, Figure 4C shows clearly that for small complexes the performances of conditional metrics are comparable with those of P and MI, up to a critical size above which the inference power of CMI and GGM remains almost constant while the direct metrics increase their percentage of TPs. The results are consistent with the ones obtained for the artificial data. Qualitatively the results on the two organisms are the same, although the percentages of TPs are higher in the simpler one (see also Fig. 3). In addition, the critical size of the dense modules at which conditional similarities start to fail is similar to the one obtained in the artificial network and in E.coli, suggesting an intrinsic peculiarity of such similarity metrics. The clustering performances (Fig. 5C) of the five algorithms are coherent with those for the E.coli and artificial networks, and once again better results are obtained for the P and MI metrics. If we move to the network of TF–BS (Fig. 6C), we immediately notice that all three conditional metrics perform better than the direct ones, although in absolute terms results are much worse than for E.coli. One reason for the low inference power regarding TF–BS could be that regulation is not just combinatorial but also combinatorially different in different environmental conditions. Another could be that TFs do not show the large variations in expression that can be seen for the corresponding regulated genes, but instead keep their expression at low basal levels (Fig. S6).


    4 DISCUSSION
Comparing genome-wide networks inferred by means of different similarity metrics is not a simple task because of the very low ratio of TPs to the total number of possible edges. Nevertheless, the results reported show that different reverse engineering algorithms indeed have performances which are tailored to different ‘characteristic’ regulatory modules. PCs are characterized by a very stable binding and give rise to a sort of post-transcriptional regulation, where gene products have to be expressed in a constant stoichiometric ratio and are mutually dependent on one another, features absent in cause–effect relationships such as transcriptional activation. For the network generated with the model and the two real ones, we tested the ability to recover dense modules/PCs and causal transcriptional modules of different sizes. Several important observations emerge from the results. The critical size of a dense module above which direct similarity measures begin to perform better than the corresponding conditional ones is between 10 and 20 on both artificial and real datasets. The dense modules that characterize PCs are better captured by direct similarity measures, especially for large dense modules. This is almost the same in both organisms, in spite of the different complexity and the low experiments/genes ratio. On the contrary, the conditional similarity measures are more suited to deal with causal dependencies such as TF–BS interactions, especially when the combinatorial complexity of the regulation increases. It is evident and predictable that the ability to recover TF–BS interactions is roughly inversely proportional to the number of TFs regulating a gene. At the same time, it is worth pointing out that conditional metrics are more robust in taming this multiplicity effect of TFs. Needless to say, the inference power of all the algorithms is higher in the simpler organism, for both PC and TF–BS networks. This reflects the more complex eukaryote regulation, as deducible also from Figure 2. Finally, it is worth remarking that although direct metrics are better at detecting ‘static’ interactions and conditional metrics at detecting causal ones, in absolute terms all algorithms are far more powerful at discovering the static than the causal gene–gene dependencies (as can be deduced by comparing the first and second rows of Fig. 3, or the recall percentages of Figs 4 and 6).


    5 CONCLUSION
The predictive power of a reverse engineering algorithm is clearly a function of several aspects. First of all, it depends on the complexity of the system and on the quality and quantity of the data. In addition, inference power depends on the type of interaction and the associated topology. Showing, as we do in this article, that the algorithms indeed yield different performances coherently with the features they are meant to extract from the data (direct for static and stable interactions, conditional for causal interactions) is already a significant and encouraging observation.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Olga Troyanskaya

Received on January 16, 2008; revised on April 17, 2008; accepted on May 1, 2008

    REFERENCES

    Balaji S, et al. Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast. J. Mol. Biol. (2006) 360:213–227.[CrossRef][Web of Science][Medline]

    Bansal M, et al. How to infer gene networks from expression profiles. Mol. Syst. Biol. (2007) 78. Article 3.

    Basso K, et al. Reverse engineering of regulatory networks in human B cells. Nat. Genet. (2005) 37:382–390.[CrossRef][Web of Science][Medline]

    Butte AJ, Kohane IS. Unsupervised knowledge discovery in medical databases using relevance networks. Proc. AMIA Symp. (1999) 711–715.

    Butte AJ, Kohane IS. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput. (2000) 418–429.

    De Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol. (2002) 9:67–103.[CrossRef][Web of Science][Medline]

    de la Fuente A, et al. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics (2004) 20:3565–3574.[Abstract/Free Full Text]

    Faith JJ, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. (2007) 5:54–66.[CrossRef][Web of Science]

    Faith JJ, et al. Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Res. (2008) 36:D866–D870. (Database issue).[Abstract/Free Full Text]

    Gardner TS, Faith JJ. Reverse-engineering transcriptional control networks. Phys. Life Rev. (2005) 2:65–88.[CrossRef]

    Güldener U, et al. MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res. (2006) 34:D436–D441. (Database issue).[Abstract/Free Full Text]

    Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics (2003) 19:2271–2282.[Abstract/Free Full Text]

    Jansen R, et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science (2003) 302:449–453.[Abstract/Free Full Text]

    Karp PD, et al. Eco Cyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res. (1999) 27:55–58.[Abstract/Free Full Text]

    Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. (1969) 22:437–467.[CrossRef][Web of Science][Medline]

    Lee D-S, Rieger H. Comparative study of the transcriptional regulatory networks of E. coli and yeast: structural characteristics leading to marginal dynamic stability. J. Theor. Biol. (2007) 248:618–626.[CrossRef][Web of Science][Medline]

    Lee TI, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science (2002) 298:799–804.

    Ma S, et al. An Arabidopsis gene network based on the graphical Gaussian model. Genome Res. (2007) 17:1614–1625.

    Margolin A, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics (2006) 7(Suppl. 1):S7.

    Mendes P, et al. Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics (2003) 19(Suppl. 2):ii122–ii129.

    Nomura M. Regulation of ribosome biosynthesis in Escherichia coli and Saccharomyces cerevisiae: diversity and common principles. J. Bacteriol. (1999) 181:6857–6864.

    Planta RJ. Regulation of ribosome synthesis in yeast. Yeast (1997) 13:1505–1518.

    Ruepp A, et al. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. (2008) 36:D646–D650. (Database issue).

    Salgado H, et al. RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res. (2006) 34:D394–D397. (Database issue).

    Schäfer J, Strimmer K. An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics (2005) 21:754–764.

    Shmulevich I, et al. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics (2002) 18:261–274.

    Soranzo N, et al. Comparing association network algorithms for reverse engineering of large-scale gene regulatory networks: synthetic versus real data. Bioinformatics (2007) 23:1640–1647.

    Spirin V, Mirny LA. Protein complexes and functional modules in molecular networks. Proc. Natl Acad. Sci. USA (2003) 100:12123–12128.

    Yeung MKS, et al. Reverse engineering gene networks using singular value decomposition and robust regression. Proc. Natl Acad. Sci. USA (2002) 99:6163–6168.

    Yu H, et al. Predicting interactions in protein networks by completing defective cliques. Bioinformatics (2006) 22:823–829.





