Bioinformatics Advance Access originally published online on August 5, 2004
Bioinformatics 2005 21(1):135-136; doi:10.1093/bioinformatics/bth458
Bioinformatics vol. 21 issue 1 © Oxford University Press 2005; all rights reserved.
Network structures and algorithms in Bioconductor
1 Channing Laboratory, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115, USA
2 Division of Biostatistics, Dana Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: In this paper, we review the central concepts and implementations of tools for working with network structures in Bioconductor. Interfaces to open source resources for visualization (AT&T Graphviz) and network algorithms (Boost) have been developed to support analysis of graphical structures in genomics and computational biology.
Availability: Packages graph, Rgraphviz, RBGL of Bioconductor (www.bioconductor.org).
Contact: stvjc{at}channing.harvard.edu
| INTRODUCTION |
|---|
Network structures are ubiquitous in genomics and computational biology. The recent monograph of Collado-Vides and Hofestädt (2002) is devoted to genomic network modeling. A number of software initiatives in bioinformatics have targeted different aspects of network visualization and analysis. For recent examples and references, refer to Han et al. (2004) and the Reactome project (www.reactome.org). In this paper, we describe methods developed in the Bioconductor project (www.bioconductor.org) to provide network data structures and algorithms for use in R. Key principles of design and implementation are object-oriented programming, leveraging of existing open source software, and support for standardized representation and serialization. The integration of these components within R provides users with the software infrastructure needed to address bioinformatic problems related to graphs and networks.
| DESCRIPTION |
|---|
graph package. The graph package is a collection of structures and methods for working with graphs (sets of nodes and edges) in R. The base virtual class graph has a single slot, edgemode, which may take the value directed or undirected. Classes that extend graph include graphNEL (a graph represented by a node and edge list), distGraph [a graph represented by an R distance (dist) structure] and clusterGraph (a representation of clustered data).
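To make the node-and-edge-list idea concrete, the following is a minimal sketch written in Python rather than R. The class name `NodeEdgeListGraph` and its fields are hypothetical and do not reproduce the graph package's API; only the edgemode concept and the node/edge-list representation are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class NodeEdgeListGraph:
    """Hypothetical Python analogue of a node-and-edge-list graph."""
    nodes: list
    edges: dict = field(default_factory=dict)  # node name -> list of neighbour names
    edgemode: str = "undirected"               # "directed" or "undirected"

    def __post_init__(self):
        if self.edgemode not in ("directed", "undirected"):
            raise ValueError("edgemode must be 'directed' or 'undirected'")
        # Every node gets a (possibly empty) adjacency list.
        adj = {n: [] for n in self.nodes}
        for src, targets in self.edges.items():
            for dst in targets:
                if src not in adj or dst not in adj:
                    raise ValueError(f"edge {src}-{dst} uses an unknown node")
                if dst not in adj[src]:
                    adj[src].append(dst)
                # An undirected graph stores each edge in both directions.
                if self.edgemode == "undirected" and src not in adj[dst]:
                    adj[dst].append(src)
        self.edges = adj

    def degree(self):
        """Number of neighbours per node (out-degree when directed)."""
        return {n: len(nbrs) for n, nbrs in self.edges.items()}


# Small undirected example with one isolated node.
g = NodeEdgeListGraph(nodes=["A", "B", "C", "D"],
                      edges={"A": ["B", "C"], "B": ["C"]})
```

Storing undirected edges in both adjacency lists keeps neighbour lookup symmetric, at the cost of recording each edge twice.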
Generic methods have been defined for the base class, including acc (return vector of names of accessible nodes relative to a given node), complement, connComp (return list of node names defining connected components), degree, dfs (depth-first search), intersection and union. When appropriate, these methods return graph instances; otherwise, the generics define the contract that must be followed to secure consistent behavior for instances of subclasses. As application requirements emerge, additional generics for the base class will be introduced.
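The behavior of three of these operations (accessible nodes, connected components and depth-first search) can be sketched in Python on a plain adjacency dictionary. The function names below are hypothetical and are not the R generics themselves; the sketch only illustrates what such methods are expected to return.

```python
def dfs_order(adj, start):
    """Return node names in depth-first discovery order from start."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbours in reverse so the first-listed neighbour is visited first.
        stack.extend(reversed(adj[node]))
    return order


def accessible(adj, start):
    """Names of nodes reachable from start, excluding start itself."""
    return [n for n in dfs_order(adj, start) if n != start]


def connected_components(adj):
    """List of sorted node-name lists, one per component (undirected graph)."""
    seen, comps = set(), []
    for node in adj:
        if node not in seen:
            comp = dfs_order(adj, node)
            seen.update(comp)
            comps.append(sorted(comp))
    return comps


# Undirected example: a triangle, a single edge and an isolated node.
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"],
       "D": ["E"], "E": ["D"], "F": []}
```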
Rgraphviz package. Graph visualization depends upon layout, fixing locations of nodes, edges and labels on a plotting surface. Various classes of layout algorithms have been developed, and the AT&T Graphviz system is an open source implementation suited to rendering large graphical structures encountered in telecommunications and complex software engineering tasks (North et al., 1998, http://www.graphviz.org). The Graphviz system defines an abstract data type agraph and a large collection of C utilities for populating and visualizing graph structures and associated textual information. The agraph structure is reflected in an R object, and layout and annotation information are used by the plot method for instances of class graph. Default renderings are fairly basic; users with significant enhancement needs (e.g. font changes, edge and node coloring) can save the layout information and use standard R text and line methods to create a customized visualization of a graphical structure.
RBGL package. The Boost project is an open source implementation of C++ libraries focusing on STL methodologies with an emphasis on portability. The Boost Graph Library (Siek et al., 2001) is a C++ library of network data structures and algorithms. The RBGL package defines interfaces from R to BGL routines. At present, procedures have been interfaced for minimum spanning tree construction, shortest path discovery, depth-first search, topological sorting, edge connectivity measurement and connected component decomposition.
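Two of the algorithm families listed above, shortest path discovery and topological sorting, are sketched below in plain Python. This is an illustration of the textbook algorithms (Dijkstra's and Kahn's), not the BGL implementation or the RBGL interface; the function names and the example graphs are hypothetical.

```python
import heapq
from collections import deque


def shortest_path_lengths(weighted_adj, source):
    """Dijkstra's algorithm on {node: {neighbour: weight}}; weights must be
    non-negative and every node must appear as a key."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for nbr, w in weighted_adj[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def topological_sort(adj):
    """Kahn's algorithm on a directed graph {node: [successors]};
    raises ValueError if the graph contains a cycle."""
    indeg = {n: 0 for n in adj}
    for targets in adj.values():
        for t in targets:
            indeg[t] += 1
    queue = deque(n for n in adj if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for t in adj[n]:
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    if len(order) != len(adj):
        raise ValueError("graph has a cycle; no topological order exists")
    return order


weighted = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
dag = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": []}
```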
| EXAMPLES |
|---|
Figure 1 illustrates the use of Graphviz layout and Rgraphviz markup capabilities to deliver an interactively navigable representation of the signal transduction pathway. The graph can be interrogated using R's standard identify function, for a variety of possible purposes. For example, a user may wish to subset a collection of microarray probes to retain only those annotated to some elements of this pathway.
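The probe-subsetting use case can be sketched as a simple filter, shown here in Python. The probe identifiers, gene labels and the function name are invented for illustration; in practice the annotation mapping would come from a platform-specific annotation resource.

```python
def probes_on_pathway(probe_to_gene, pathway_nodes):
    """Keep only probes whose annotated gene is a node of the pathway graph."""
    wanted = set(pathway_nodes)
    return [probe for probe, gene in probe_to_gene.items() if gene in wanted]


# Hypothetical annotation table and pathway node set.
probe_to_gene = {"probe_1": "geneA", "probe_2": "geneB",
                 "probe_3": "geneC", "probe_4": "geneA"}
pathway_nodes = ["geneA", "geneC"]

selected = probes_on_pathway(probe_to_gene, pathway_nodes)
```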
Figure 2 illustrates the advantage of combining biological annotation resources, statistical graphics and network layout methods in R. Pie chart and legend computation is routine in R, the graphical structure of the Gene Ontology (Ashburner et al., 2000) is obtained from the Bioconductor GO package, and layout was obtained from Rgraphviz. The figure clearly indicates non-homogeneity in the distribution of phenotypes across cellular component terms, and the graph can be interactively interrogated using R's standard identify function to obtain the terms associated with nodes of interest.
| DISCUSSION |
|---|
Our novel integration of graph data types, graph layout and graph algorithm resources with a full-featured data analysis and visualization environment has proven highly productive in genomic and proteomic applications. New Bioconductor packages such as GOstats, GraphAT and apComplex make essential use of this infrastructure. As standard representation methods and curated data resources for genomic networks develop, we will enrich the methods and resources for working with them in Bioconductor.
| ACKNOWLEDGMENTS |
|---|
Support for the development of this software came from the High Tech Industry Multidisciplinary Research Fund at the Dana-Farber Cancer Institute and NIH grant no. 1R33 HG002708: A Statistical Computing Framework for Genomic Data.
Received on May 28, 2004; revised on July 12, 2004; accepted on July 27, 2004
| REFERENCES |
|---|
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000) Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25-29.
Collado-Vides, J., Hofestädt, R. (Eds.) (2002) Gene Regulation and Metabolism: Post-Genomic Computational Approaches. MIT Press, Cambridge, MA.
Han, K., Ju, B.-H., Jung, H. (2004) WebInterViewer: visualizing and analyzing molecular interaction networks. Nucleic Acids Res., 32, W89-W95.
North, S., Gansner, E., Ellson, J. (1998) http://www.graphviz.org.
Siek, J.G., Lee, L.-Q., Lumsdaine, A. (2001) The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley, Reading, MA.