Bioinformatics Advance Access originally published online on August 5, 2004
Bioinformatics 2005 21(1):135-136; doi:10.1093/bioinformatics/bth458

Bioinformatics vol. 21 issue 1 © Oxford University Press 2005; all rights reserved.

Network structures and algorithms in Bioconductor

Vincent J. Carey 1,*, Jeff Gentry 2, Elizabeth Whalen 2 and Robert Gentleman 2

1 Channing Laboratory, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115, USA
2 Division of Biostatistics, Dana Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: In this paper, we review the central concepts and implementations of tools for working with network structures in Bioconductor. Interfaces to open source resources for visualization (AT&T Graphviz) and network algorithms (Boost) have been developed to support analysis of graphical structures in genomics and computational biology.

Availability: Packages graph, Rgraphviz, RBGL of Bioconductor (www.bioconductor.org).

Contact: stvjc{at}channing.harvard.edu


    INTRODUCTION
 
Network structures are ubiquitous in genomics and computational biology. The recent monograph of Collado-Vides and Hofestädt (2002) is devoted to genomic network modeling. A number of software initiatives in bioinformatics have targeted different aspects of network visualization and analysis. For recent examples and references, refer to Han et al. (2004) and the Reactome project (www.reactome.org). In this paper, we describe methods developed in the Bioconductor project (www.bioconductor.org) to provide network data structures and algorithms for use in R. The key principles of design and implementation are object-oriented programming, leveraging of existing open source software, and support for standardized representation and serialization. The integration of these components within R provides users with the software infrastructure needed to address bioinformatic problems related to graphs and networks.


    DESCRIPTION
 
graph package. The graph package is a collection of structures and methods for working with graphs (sets of nodes and edges) in R. The base virtual class graph has a single slot, edgemode, which may take the value ‘directed’ or ‘undirected’. Classes that extend graph include graphNEL (a graph represented by Node and Edge-List), distGraph [a graph represented by an R distance (dist) structure] and clusterGraph (a representation of clustered data).
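
The node and edge-list idea can be illustrated outside R. The following is a minimal sketch in Python, not the Bioconductor implementation: the class name NodeEdgeListGraph and its add_edge method are hypothetical and chosen only for illustration. It shows a node vector, one edge list per node, and an edgemode restricted to 'directed' or 'undirected'.

```python
from dataclasses import dataclass, field


@dataclass
class NodeEdgeListGraph:
    """Toy analogue of a node and edge-list graph (hypothetical, not graphNEL)."""
    nodes: list
    edgemode: str = "undirected"          # 'directed' or 'undirected'
    edge_lists: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.edgemode not in ("directed", "undirected"):
            raise ValueError("edgemode must be 'directed' or 'undirected'")
        # Every node gets an edge list, even if it has no neighbours.
        for node in self.nodes:
            self.edge_lists.setdefault(node, [])

    def add_edge(self, src, dst):
        """Record src -> dst; mirror the edge when the graph is undirected."""
        if src not in self.edge_lists or dst not in self.edge_lists:
            raise KeyError("both endpoints must be existing nodes")
        if dst not in self.edge_lists[src]:
            self.edge_lists[src].append(dst)
        if self.edgemode == "undirected" and src not in self.edge_lists[dst]:
            self.edge_lists[dst].append(src)


g = NodeEdgeListGraph(nodes=["a", "b", "c"], edgemode="directed")
g.add_edge("a", "b")
g.add_edge("b", "c")
print(g.edge_lists)  # {'a': ['b'], 'b': ['c'], 'c': []}
```

Storing the edge mode on the graph rather than on each edge mirrors the single edgemode slot described above: one flag decides whether an edge is mirrored into both edge lists.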

Generic methods have been defined for the base class, including acc (return a vector of names of the nodes accessible from a given node), complement, connComp (return a list of node names defining the connected components), degree, dfs (depth-first search), intersection and union. When appropriate, these methods return graph instances; otherwise, the generics define the ‘contract’ that must be followed to secure consistent behavior for instances of subclasses. As application requirements emerge, additional generics for the base class will be introduced.
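
To make the meaning of three of these operations concrete, here is a small Python sketch operating on a plain adjacency mapping. It is an illustration of the concepts only, not the R generics: the function names accessible, connected_components and degree are hypothetical, and the choice to exclude the starting node from the accessible set is an assumption of this sketch.

```python
from collections import deque


def accessible(adj, start):
    """Names of nodes reachable from `start` (excluding `start`), in BFS order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order


def connected_components(adj):
    """One list of node names per connected component of an undirected graph."""
    assigned = set()
    components = []
    for node in adj:
        if node in assigned:
            continue
        comp = [node] + accessible(adj, node)
        assigned.update(comp)
        components.append(comp)
    return components


def degree(adj):
    """Number of neighbours recorded for each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


# Undirected example: each edge appears in both edge lists.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(accessible(adj, "a"))        # ['b', 'c']
print(connected_components(adj))   # [['a', 'b', 'c'], ['d']]
print(degree(adj))                 # {'a': 1, 'b': 2, 'c': 1, 'd': 0}
```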

Rgraphviz package. Graph visualization depends upon layout, that is, fixing the locations of nodes, edges and labels on a plotting surface. Various classes of layout algorithms have been developed, and the AT&T Graphviz system is an open source implementation suited to rendering the large graphical structures encountered in telecommunications and complex software engineering tasks (North et al., 1998, http://www.graphviz.org). The Graphviz system defines an abstract data type agraph and a large collection of C utilities for populating and visualizing graph structures and associated textual information. The agraph structure is reflected in an R object, and layout and annotation information are used by the plot method for instances of class graph. Default renderings are fairly basic; users with significant enhancement needs (e.g. font changes, edge and node coloring) can save the layout information and use standard R text and line methods to create a customized visualization of a graphical structure.
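
Rgraphviz communicates with Graphviz through its C data structures, but the division of labour (the graph is described, Graphviz computes the layout) can also be illustrated with Graphviz's textual DOT format. The Python sketch below is an assumption-laden illustration, not part of Rgraphviz: the helper name to_dot is hypothetical, and node names are assumed not to contain double quotes.

```python
def to_dot(adj, directed=True, name="G"):
    """Serialize an adjacency mapping as Graphviz DOT text."""
    keyword, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{keyword} {name} {{"]
    for node in adj:
        lines.append(f'  "{node}";')
    seen = set()
    for src, nbrs in adj.items():
        for dst in nbrs:
            # An undirected edge stored in both edge lists is written once.
            key = (src, dst) if directed else tuple(sorted((src, dst)))
            if key in seen:
                continue
            seen.add(key)
            lines.append(f'  "{src}" {arrow} "{dst}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot({"a": ["b"], "b": []}))
```

The resulting text can be written to a file and passed to the Graphviz command-line tools (for example the dot program) to obtain node and edge coordinates or a rendered image.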

RBGL package. The Boost project is an open source implementation of C++ libraries focusing on STL methodologies with an emphasis on portability. The Boost Graph Library (Siek et al., 2001) is a C++ library of network data structures and algorithms. The RBGL package defines interfaces from R to BGL routines. At present, procedures have been interfaced for minimum spanning tree construction, shortest path discovery, depth-first search, topological sorting, edge connectivity measurement and connected component decomposition.
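
Two of the algorithm families listed above, topological sorting and shortest path discovery, can be sketched in a few lines of Python. These are textbook versions (Kahn's algorithm and Dijkstra's algorithm) written for illustration; they are not the BGL routines or the RBGL interface, and the function names are hypothetical.

```python
import heapq
from collections import deque


def topological_sort(adj):
    """Kahn's algorithm; raises ValueError if the directed graph has a cycle."""
    indegree = {node: 0 for node in adj}
    for nbrs in adj.values():
        for dst in nbrs:
            indegree[dst] += 1
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dst in adj[node]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if len(order) != len(adj):
        raise ValueError("graph contains a cycle")
    return order


def shortest_path_lengths(weighted_adj, source):
    """Dijkstra's algorithm; edge weights are assumed to be non-negative."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for dst, weight in weighted_adj[node].items():
            candidate = d + weight
            if candidate < dist.get(dst, float("inf")):
                dist[dst] = candidate
                heapq.heappush(heap, (candidate, dst))
    return dist


dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_sort(dag))  # ['a', 'b', 'c', 'd']

weighted = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 2.0, "d": 5.0},
            "c": {"d": 1.0}, "d": {}}
print(shortest_path_lengths(weighted, "a"))
# {'a': 0.0, 'b': 1.0, 'c': 3.0, 'd': 4.0}
```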


    EXAMPLES
 
Figure 1 illustrates the use of Graphviz layout and Rgraphviz markup capabilities to deliver an interactively navigable representation of the signal transduction pathway. The graph can be interrogated using R's standard identify function, for a variety of possible purposes. For example, a user may wish to subset a collection of microarray probes to retain only those annotated to some elements of this pathway.
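
The probe-subsetting use case mentioned above amounts to filtering an annotation table by membership in the pathway's node set. A minimal Python sketch follows; the probe identifiers, gene names and the function probes_in_pathway are all hypothetical placeholders, and real annotation would come from the relevant Bioconductor annotation resources.

```python
def probes_in_pathway(probe_to_gene, pathway_nodes):
    """Keep probes whose annotated gene is a node of the pathway graph."""
    members = set(pathway_nodes)
    return [probe for probe, gene in probe_to_gene.items() if gene in members]


# Hypothetical annotation table and pathway node names, for illustration only.
probe_to_gene = {
    "probe_1": "GENE_A",
    "probe_2": "GENE_B",
    "probe_3": "GENE_C",
    "probe_4": "GENE_A",
}
pathway_nodes = ["GENE_A", "GENE_C"]

selected = probes_in_pathway(probe_to_gene, pathway_nodes)
print(selected)  # ['probe_1', 'probe_3', 'probe_4']
```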



Fig. 1 A rendering of the signal transduction pathway using Rgraphviz. Red arcs indicate antagonistic relationships, green arcs indicate supportive relationships, blue text indicates membrane-associated components and purple text indicates biological processes.

 
Figure 2 illustrates the advantage of combining biological annotation resources, statistical graphics and network layout methods in R. Pie chart and legend computation is routine in R; the graphical structure of the Gene Ontology (Ashburner et al., 2000) is obtained from the Bioconductor GO package, and the layout is obtained from Rgraphviz. The figure clearly indicates non-homogeneity in the distribution of phenotypes across cellular component terms, and the graph can be interactively interrogated using R's standard identify function to obtain the terms associated with nodes of interest.



Fig. 2 GO node composition display. Nodes of the graph are elements of the GO cellular component subontology, and arcs are directed from specific to more general terms. Nodes are positioned in the plane using Graphviz ‘dot’ layout, and are occupied by pie charts indicating the relative frequencies of ALL/AF4, BCR and NEG samples in the set of differentially expressed genes annotated to the GO terms.

 

    DISCUSSION
 
Our novel integration of graph data types, graph layout and graph algorithm resources with a full-featured data analysis and visualization environment has proven highly productive in genomic and proteomic applications. New Bioconductor packages such as GOstats, GraphAT and apComplex make essential use of this infrastructure. As standard representation methods and curated data resources for genomic networks develop, we will enrich the methods and resources for working with these in Bioconductor.


    Acknowledgments
 
Support for the development of this software came from the High Tech Industry Multidisciplinary Research Fund at the Dana-Farber Cancer Institute and NIH grant no. 1R33 HG002708: ‘A Statistical Computing Framework for Genomic Data’.

Received on May 28, 2004; revised on July 12, 2004; accepted on July 27, 2004

    REFERENCES
 

    Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000) Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

    Collado-Vides, J., Hofestädt, R. (Eds.) (2002) Gene Regulation and Metabolism: Post-Genomic Computational Approaches. Cambridge, MA: MIT Press.

    Han, K., Ju, B.-H., Jung, H. (2004) WebInterViewer: visualizing and analyzing molecular interaction networks. Nucleic Acids Res., 32, W89–W95.

    North, S., Gansner, E., Ellson, J. (1998) http://www.graphviz.org.

    Siek, J.G., Lee, L.-Q., Lumsdaine, A. (2001) The Boost Graph Library: User Guide and Reference Manual. Reading, MA: Addison-Wesley.



