Bioinformatics Advance Access originally published online on December 1, 2007
Bioinformatics 2008 24(2):287-289; doi:10.1093/bioinformatics/btm578

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Automated manipulation of systems biology models using libSBML within Taverna workflows

Peter Li 1,*, Tom Oinn 2, Stian Soiland 3 and Douglas B. Kell 1

1 School of Chemistry and Manchester Centre for Integrative Systems Biology, Manchester Interdisciplinary Biocentre, University of Manchester, M1 7DN, 2 EMBL European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD and 3 School of Computing Science, University of Manchester, M13 9PL, UK

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Many data manipulation processes involve the use of programming libraries. Because such processes are used repeatedly, they can beneficially be automated. A convenient form of automation is the workflow, which also allows such processes to be shared within the community. The Taverna workflow system has been extended so that it can invoke Java classes and methods as tasks within workflows. These classes and methods are selected for use during workflow construction by a Java Doclet application called the API Consumer. The selection is stored as an XML file that enables Taverna to present the chosen subset of the API for use in composing workflows. The ability of Taverna to invoke Java classes and methods is demonstrated by a workflow in which we use libSBML to map gene expression data onto a metabolic pathway represented as an SBML model.

Availability: Taverna and the API Consumer application can be freely downloaded from http://taverna.sourceforge.net

Contact: peter.li{at}manchester.ac.uk

Supplementary information: Supplementary data and documentation are available from http://www.mcisb.org/software/taverna/libsbml/index.html

There are often processes involving the manipulation and analysis of biological data that we would wish to automate because they are invoked frequently and repetitively. This is particularly the case when the structure of such data adheres to standardized specifications that are supported by software tooling (Brazma et al., 2006; Strömbäck et al., 2007). One such specification is the Systems Biology Markup Language (SBML), which may be used to represent a biological system as a network of reactions (Hucka et al., 2003). Software libraries such as libSBML have been developed to read, write, manipulate and validate SBML files and data streams. libSBML (http://sbml.org/software/libsbml/) is implemented in C and C++ and also provides language bindings for, among others, Python, Matlab and Java.
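The following minimal sketch shows what reading an SBML file involves by listing the species in a model. It does not use libSBML. The libSBML Java binding relies on a native library, so the sketch substitutes the JDK's built-in DOM parser. The toy model `EXAMPLE_SBML` and the class `SbmlSpeciesReader` are hypothetical. In libSBML itself, the same information is reached through its `Model` and `Species` classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class SbmlSpeciesReader {

    // Hypothetical two-species toy model for illustration; not the glycolysis model from the paper.
    static final String EXAMPLE_SBML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<sbml xmlns=\"http://www.sbml.org/sbml/level2\" level=\"2\" version=\"1\">"
        + "<model id=\"toy\">"
        + "<listOfCompartments><compartment id=\"cell\"/></listOfCompartments>"
        + "<listOfSpecies>"
        + "<species id=\"s1\" name=\"HXK2\" compartment=\"cell\"/>"
        + "<species id=\"s2\" compartment=\"cell\"/>"
        + "</listOfSpecies>"
        + "</model></sbml>";

    /** Returns each species' name attribute, falling back to its id when no name is set. */
    static List<String> speciesNames(String sbmlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(sbmlXml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches <species> elements in any namespace, i.e. any SBML level/version.
            NodeList species = doc.getElementsByTagNameNS("*", "species");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < species.getLength(); i++) {
                Element s = (Element) species.item(i);
                String name = s.getAttribute("name"); // empty string when the attribute is absent
                names.add(name.isEmpty() ? s.getAttribute("id") : name);
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse SBML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(speciesNames(EXAMPLE_SBML)); // prints [HXK2, s2]
    }
}
```

The fallback to the `id` attribute is an assumption made here, because `name` is optional on SBML species. Which attribute a real workflow should match on depends on how the model was labeled.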

Workflow software such as Taverna may be used to automate processes applied to data in the life sciences (Oinn et al., 2004, 2006, 2007), and systems biology is a prime candidate for such automation via loosely coupled workflows (Kell, 2006a, b, 2007). A Taverna workflow consists of a predefined series of tasks performed by processors. Processors are available for accessing data and applications through a range of invocation mechanisms, including Web Services (http://www.w3.org/2002/ws/). Taverna consists of several modules, such as the workflow enactor engine and the workbench, which together allow one to construct and execute scientific workflows (Hull et al., 2006). This application note reports how Taverna can be used to write and enact workflows that manipulate SBML data by making direct use of the classes and functions in the libSBML programming library.

Taverna has been extended with a processor that is capable of invoking methods within Java classes. The set of methods available for use in workflows is configured using a Doclet (http://java.sun.com/j2se/1.4.2/docs/tooldocs/javadoc/overview.html) program called the API consumer. The API consumer presents a user interface for selecting the subset of an API's methods, such as those of libSBML, that is to be exposed to the Taverna workbench. This selection is stored as a definition in XML format. The definition can be imported into Taverna so that the selected classes and methods of the API appear as available services when a workflow is constructed. The definition file can also be distributed, together with the API implementation itself, to third-party workflow designers so that they can use the API as tasks within their own workflows. We illustrate this approach with a specific example in what follows.
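The core idea can be sketched with Java reflection: select a subset of an API in a definition file, then invoke those methods by name at run time. This is not the API consumer's code or its XML schema. The `<apiDefinition>` format, the class `ApiDefinitionSketch` and its helper methods are hypothetical. JDK classes (`java.lang.Math`, `java.lang.String`) stand in for libSBML so that the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class ApiDefinitionSketch {

    // Hypothetical definition format; the real API consumer XML schema is not reproduced here.
    static final String DEFINITION =
        "<apiDefinition>"
        + "<method class=\"java.lang.Math\" name=\"max\">"
        + "<param type=\"int\"/><param type=\"int\"/></method>"
        + "<method class=\"java.lang.String\" name=\"toUpperCase\"/>"
        + "</apiDefinition>";

    /** Loads the selected methods, keyed as "SimpleClassName.methodName". */
    static Map<String, Method> loadSelection(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("method");
            Map<String, Method> selected = new LinkedHashMap<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                Class<?> owner = Class.forName(entry.getAttribute("class"));
                NodeList params = entry.getElementsByTagName("param");
                Class<?>[] types = new Class<?>[params.getLength()];
                for (int j = 0; j < types.length; j++) {
                    types[j] = resolve(((Element) params.item(j)).getAttribute("type"));
                }
                Method m = owner.getMethod(entry.getAttribute("name"), types);
                selected.put(owner.getSimpleName() + "." + m.getName(), m);
            }
            return selected;
        } catch (ParserConfigurationException | SAXException | IOException
                 | ReflectiveOperationException ex) {
            throw new IllegalArgumentException("Invalid API definition", ex);
        }
    }

    /** Maps primitive type names to their Class objects; other names are loaded by name. */
    static Class<?> resolve(String type) throws ClassNotFoundException {
        switch (type) {
            case "int": return int.class;
            case "long": return long.class;
            case "double": return double.class;
            case "boolean": return boolean.class;
            default: return Class.forName(type);
        }
    }

    /** Invokes a selected method; for instance methods the first input is the receiver. */
    static Object invoke(Method m, Object... inputs) {
        try {
            if (Modifier.isStatic(m.getModifiers())) {
                return m.invoke(null, inputs);
            }
            return m.invoke(inputs[0], Arrays.copyOfRange(inputs, 1, inputs.length));
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Call to " + m.getName() + " failed", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, Method> services = loadSelection(DEFINITION);
        System.out.println(services.keySet());                                  // [Math.max, String.toUpperCase]
        System.out.println(invoke(services.get("Math.max"), 3, 7));             // 7
        System.out.println(invoke(services.get("String.toUpperCase"), "sbml")); // SBML
    }
}
```

Keying methods by class and method name alone is a simplification. Overloaded methods with the same name would collide, so a fuller version would also encode the parameter types in the key.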

A common and useful means of visualizing transcriptome data is to map them onto pathway diagrams (Chung et al., 2004; Dahlquist et al., 2002). This can be performed as an automated pipeline using a Taverna workflow with SBML-compliant tools. For example, diagrams of metabolic pathways can be rendered with microarray data so that the nodes corresponding to proteins are colored according to the expression levels of the genes that encode them. The microarray data may be stored in a database from which they are retrieved as part of the workflow. Such a workflow is shown in Figure 1A and in the Supplementary Material. It automatically edits an SBML model of the glycolysis pathway, adding gene expression data from the maxd database (Hancock et al., 2005) to the layout information embedded within the SBML model, which can then be visualized with CellDesigner (Funahashi et al., 2003). The workflow has two sub-workflows containing API consumer processors. These use libSBML methods to parse the names of proteins in the SBML file and to generate a new SBML file incorporating the mapped gene expression data (Fig. 1B). Beanshell processors (http://www.beanshell.org) can be used to provide application logic for further processing of the data in the SBML model. Here, Beanshell processors were used to determine how entities in the microarray data matched those in the SBML model. The match used information on how genes in the microarray data, identified by their Affymetrix probe set identifiers, mapped onto enzyme modifier species labeled with yeast gene names in the SBML model.
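The matching and coloring logic that the Beanshell processors supply can be sketched in plain Java, since BeanShell accepts standard Java syntax. The sketch makes several assumptions:

- A probe-set-to-gene lookup table is already available.
- The log2 expression ratios of all probe sets mapped to the same gene are averaged.
- The result is mapped to a red/green hex color.

The class `ExpressionMapper`, the averaging rule, the color scale and all identifiers and values in `main` are hypothetical. They are not the scripts or data used in the paper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExpressionMapper {

    /**
     * For each modifier species (labeled by gene name), averages the log2 ratios of all probe sets
     * mapped to that gene. Species with no matching probe set are omitted from the result.
     */
    static Map<String, Double> mapToSpecies(List<String> speciesGeneNames,
                                            Map<String, String> probeToGene,
                                            Map<String, Double> log2RatioByProbe) {
        Map<String, Double> sum = new LinkedHashMap<>();
        Map<String, Integer> count = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : log2RatioByProbe.entrySet()) {
            String gene = probeToGene.get(e.getKey());
            if (gene != null && speciesGeneNames.contains(gene)) {
                sum.merge(gene, e.getValue(), Double::sum);
                count.merge(gene, 1, Integer::sum);
            }
        }
        // Preserve the order of species in the model.
        Map<String, Double> result = new LinkedHashMap<>();
        for (String gene : speciesGeneNames) {
            if (sum.containsKey(gene)) {
                result.put(gene, sum.get(gene) / count.get(gene));
            }
        }
        return result;
    }

    /** Maps a log2 ratio to an RGB hex color: red for up, green for down, black near zero. */
    static String color(double log2Ratio, double saturation) {
        double t = Math.max(-1.0, Math.min(1.0, log2Ratio / saturation)); // clamp to [-1, 1]
        int level = (int) Math.round(Math.abs(t) * 255);
        int red = t > 0 ? level : 0;
        int green = t < 0 ? level : 0;
        return String.format("#%02X%02X00", red, green);
    }

    public static void main(String[] args) {
        // Hypothetical identifiers and values for illustration only.
        Map<String, String> probeToGene =
            Map.of("probe1_at", "GENE_A", "probe2_at", "GENE_A", "probe3_at", "GENE_B");
        Map<String, Double> ratios = Map.of("probe1_at", 1.0, "probe2_at", 3.0, "probe3_at", -2.0);
        List<String> species = List.of("GENE_A", "GENE_B", "GENE_C");
        Map<String, Double> mapped = mapToSpecies(species, probeToGene, ratios);
        // Prints "GENE_A 2.0 #FF0000" then "GENE_B -2.0 #00FF00"; GENE_C has no probe set.
        mapped.forEach((gene, v) -> System.out.println(gene + " " + v + " " + color(v, 2.0)));
    }
}
```

Averaging is only one way to combine several probe sets for one gene. Taking the maximum, or keeping each probe set separately, are equally plausible choices.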


Figure 1
Fig. 1. (A) Screenshot of the SBML microarray data mapping workflow, which contains two sub-workflows labelled extractGeneNames and writeSBML. (B) Screenshot of the writeSBML sub-workflow, expanded to show API consumer processors calling methods from libSBML. These libSBML methods, together with a Beanshell processor called writeAnnotation, are used to create a new SBML model containing the mapped microarray gene expression data.

 
Through the use of the API consumer, Taverna can use the functionality residing within Java classes and methods directly as workflow tasks, without the need to deploy the methods of the API as Web Services. This can be more suitable when the underlying operations are trivial, since the overhead of invoking them through a Web Services interface would then be impractical. Whilst we have shown the use of the API consumer in systems biology, it is a generic tool that can be used with other Java APIs, such as the Chemistry Development Kit (Steinbeck et al., 2003), enabling Taverna to be tailored to different scientific domains. Because the API consumer is generic, Taverna can work with SBML using new releases of libSBML as and when they become available, with no extra coding required. That said, using APIs can make workflows more difficult to compose. API methods are more fine-grained than Web Service operations, so composing workflows from them requires expert knowledge of both the API and Taverna. Whilst this results in more complex workflows, the complexity can be hidden from users within Taverna by using nested workflows (Fig. 1A). Moreover, once such workflows have been written, they can be saved and shared for use within the systems biology community, which is a great benefit.
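A small composite structure, in which a workflow is itself a processor, can illustrate how nesting hides fine-grained API calls. The `Processor` interface and the `Task` and `Workflow` classes below are hypothetical and far simpler than Taverna's data model. A single value flows between steps, with no named ports or iteration. They only show why a nested sub-workflow appears to its parent as a single step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class NestedWorkflowSketch {

    /** Hypothetical processor abstraction: one input value in, one output value out. */
    interface Processor {
        String name();
        Object run(Object input);
    }

    /** A fine-grained task, standing in for a single API method call. */
    static final class Task implements Processor {
        private final String name;
        private final Function<Object, Object> action;

        Task(String name, Function<Object, Object> action) {
            this.name = name;
            this.action = action;
        }

        public String name() { return name; }

        public Object run(Object input) { return action.apply(input); }
    }

    /** A workflow runs its processors in order and is itself a Processor, so it can be nested. */
    static final class Workflow implements Processor {
        private final String name;
        private final List<Processor> steps;

        Workflow(String name, Processor... steps) {
            this.name = name;
            this.steps = Arrays.asList(steps);
        }

        public String name() { return name; }

        public Object run(Object input) {
            Object value = input;
            for (Processor p : steps) {
                value = p.run(value); // the output of one step feeds the next
            }
            return value;
        }

        /** Lists top-level step names only; the internals of nested workflows stay hidden. */
        List<String> visibleSteps() {
            List<String> names = new ArrayList<>();
            for (Processor p : steps) {
                names.add(p.name());
            }
            return names;
        }
    }

    public static void main(String[] args) {
        // Two fine-grained tasks grouped into one nested sub-workflow.
        Workflow normalise = new Workflow("normaliseNames",
            new Task("trim", s -> ((String) s).trim()),
            new Task("upperCase", s -> ((String) s).toUpperCase()));
        Workflow top = new Workflow("main", normalise, new Task("tag", s -> "gene:" + s));
        System.out.println(top.visibleSteps()); // [normaliseNames, tag]
        System.out.println(top.run("  hxk2 ")); // gene:HXK2
    }
}
```

Calling `visibleSteps()` on the top-level workflow lists only `normaliseNames` and `tag`, even though `normaliseNames` contains two tasks.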


    ACKNOWLEDGEMENTS
We thank Prof. Hiroaki Kitano and Dr Akira Funahashi for very useful discussions. P.L. and D.B.K. thank the BBSRC for financial support, and D.B.K. acknowledges the financial support of the BBSRC and EPSRC in the Manchester Centre for Integrative Systems Biology (www.mcisb.org).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Chris Stoeckert

Received on October 11, 2007; revised on November 13, 2007; accepted on November 18, 2007

    REFERENCES

    Brazma A, et al. Standards for systems biology. Nat. Rev. Genet. (2006) 7:593–605.

    Chung HJ, et al. ArrayXPath: mapping and visualizing microarray gene-expression data with integrated biological pathway resources using Scalable Vector Graphics. Nucleic Acids Res. (2004) 32:W460–W464.

    Dahlquist KD, et al. GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat. Genet. (2002) 31:19–20.

    Funahashi A, et al. CellDesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO (2003) 1:159–162.

    Hancock D, et al. maxdLoad2 and maxdBrowse: standards-compliant tools for microarray experimental annotation, data management and dissemination. BMC Bioinformatics (2005) 6:264.

    Hucka M, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (2003) 19:524–531.

    Hull D, et al. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. (2006) 34:W729–W732.

    Kell DB. Metabolomics, modelling and machine learning in systems biology: towards an understanding of the languages of cells. The 2005 Theodor Bücher lecture. FEBS J. (2006a) 273:873–894.

    Kell DB. Systems biology, metabolic modelling and metabolomics in drug discovery and development. Drug Discov. Today (2006b) 11:1085–1092.

    Kell DB. The virtual human: towards a global systems biology of multiscale, distributed biochemical network models. IUBMB Life (2007) 59:689–695.

    Oinn T, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics (2004) 20:3045–3054.

    Oinn T, et al. Taverna: lessons in creating a workflow environment for the life sciences. Concurrency Comput. Pract. Exper. (2006) 18:1067–1100.

    Oinn T, et al. Taverna/myGrid: aligning a workflow system with the life sciences community. In: Taylor IJ, et al., eds. Workflows for e-Science: Scientific Workflows for Grids. (2007) Guildford: Springer. 300–319.

    Steinbeck C, et al. The Chemistry Development Kit (CDK): an open-source java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. (2003) 43:493–500.

    Strömbäck L, et al. A review of standards for data exchange within systems biology. Proteomics (2007) 7:857–867.

