Bioinformatics Advance Access originally published online on September 17, 2008
Bioinformatics 2008 24(22):2647-2649; doi:10.1093/bioinformatics/btn496
A toolkit for capturing and sharing FuGE experiments
1 School of Computer Science, University of Manchester, Oxford Road, Manchester and 2 Department of Pre-clinical Veterinary Science, Faculty of Veterinary Science, University of Liverpool, Liverpool, L69 7ZJ, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: The Functional Genomics Experiment Object Model (FuGE) supports modelling of experimental processes either directly or through extensions that specialize FuGE for use in specific contexts. FuGE applications commonly include components that capture, store and search experiment descriptions, where the requirements of different applications have much in common.
Results: We describe a toolkit that supports data capture, storage and web-based search of FuGE experiment models; the toolkit can be used directly on FuGE compliant models or configured for use with FuGE extensions. The toolkit is illustrated using a FuGE extension standardized by the proteomics standards initiative, namely GelML.
Availability: The toolkit and a demonstration are available at http://code.google.com/p/fugetoolkit
Contact: khalid.belhajjame{at}manchester.ac.uk
| 1 INTRODUCTION |
|---|
Experimental processes in the life sciences produce complex modelling requirements, which in the past have led to models that differ in terminology and level of detail. The Functional Genomics Experiment Object Model (FuGE) was developed to increase consistency in the modelling of these processes (Jones et al., 2007a). FuGE provides generic constructs that can be used to represent concepts from specific experimental activities. More importantly, these modelling constructs act as extension points that (community) users can use to create (standard) models that fit their experiments. For example, extensions of FuGE are being created to support modelling of proteomics experiments.
The models obtained by extending FuGE are often complex: they comprise large numbers of inter-related elements. Capturing and browsing experiments in these models can therefore be both laborious and time-consuming. This complexity is intrinsic to life science experiments, which often involve complex protocols and the combined use of multiple techniques. For example, the XML Schema (http://www.w3.org/XML/Schema) of GelML (http://www.psidev.info/index.php?q=node/254), a model that extends FuGE for specifying gel electrophoresis experiments, comprises 101 simple and complex types. In this article, we describe a toolkit for easing the specification and sharing of FuGE experimental processes. Using the toolkit, users are able to capture experiments in any extension of FuGE. The toolkit also provides a Web interface for exploring collected experiments using predefined canned queries.
| 2 TOOLKIT ARCHITECTURE |
|---|
Figure 1 illustrates the components that constitute the toolkit. Rather than implementing a new component for data capture, we make use of Pedro (Garwood et al., 2004). Pedro is a standalone Java application that, given an XML Schema document (specifying a FuGE extension in this case), generates a data entry tool for capturing experiments in that model. The data entry tool provides users with forms and assists them in filling these in; for example, sample values that a field accepts are displayed when they are available. Also, if a field represents a term from a controlled vocabulary, the user is provided with a drop-down list containing the legal terms that can be used as a value for that field. Once the experiment is specified, the user can export it as an XML document that validates against the XML Schema of the experiment model.
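The idea of deriving form widgets from a schema can be sketched in a few lines. The example below is only an illustration, not Pedro's implementation: it assumes that a controlled vocabulary is expressed as `xs:enumeration` facets in the schema, which may not be how Pedro obtains its terms, and the type name `StainKind`, the element names and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Toy schema: the type name "StainKind", its values and the element names
# are hypothetical; they are not taken from GelML or FuGE.
TOY_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="StainKind">
    <xs:restriction base="xs:string">
      <xs:enumeration value="coomassie"/>
      <xs:enumeration value="silver"/>
      <xs:enumeration value="fluorescent"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="stain" type="StainKind"/>
  <xs:element name="comment" type="xs:string"/>
</xs:schema>
"""


def enumerated_types(schema_text):
    """Map each named simple type to its list of enumerated values."""
    root = ET.fromstring(schema_text)
    choices = {}
    for simple_type in root.iter(XS + "simpleType"):
        name = simple_type.get("name")
        values = [e.get("value") for e in simple_type.iter(XS + "enumeration")]
        if name and values:
            choices[name] = values
    return choices


def field_widgets(schema_text):
    """Choose a drop-down or a free-text field for each top-level element.

    Simplification: the "type" attribute is compared as a plain string,
    so namespace-prefixed references to user-defined types are not resolved.
    """
    root = ET.fromstring(schema_text)
    choices = enumerated_types(schema_text)
    widgets = {}
    for element in root.findall(XS + "element"):
        type_name = element.get("type", "")
        if type_name in choices:
            widgets[element.get("name")] = ("dropdown", choices[type_name])
        else:
            widgets[element.get("name")] = ("text", [])
    return widgets
```

Calling `field_widgets(TOY_SCHEMA)` yields a drop-down with three legal terms for the `stain` field and a free-text widget for `comment`.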
The experiments specified using the data entry tool can be stored within a database system to allow their dissemination. For this purpose we use eXist, a freely available open source native XML database management system (http://exist.sourceforge.net). Users can browse stored experiments using the Data Access Web Application. This is a collection of Java Server Pages (http://java.sun.com/products/jsp) that interact with the eXist database system to answer users' requests. Specifically, users are provided with a set of canned queries. These are encoded as predefined XQuery (http://www.w3.org/TR/xquery) queries that the Data Access Web Application issues to the eXist database system for processing. The results of queries are XML documents that are rendered in a user-readable format using XML style sheets. Some style sheets were newly developed by the authors and others were adapted from the ISA-TAB project (Sansone et al., 2008).
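As a rough illustration of what a canned query amounts to, the sketch below keeps named XQuery strings in a registry and builds a request URL for each. It rests on several assumptions: the text does not say how the Data Access Web Application talks to eXist, so the use of eXist's HTTP REST interface (with its `_query` and `_howmany` parameters, as we understand them) is our choice for the example; the query names, the collection path `/db/experiments` and the element name `Gel` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical canned queries: the collection path and the element name
# "Gel" are illustrative and do not come from the toolkit or from GelML.
CANNED_QUERIES = {
    "list_gels": 'for $g in collection("/db/experiments")//Gel return $g',
    "count_gels": 'count(collection("/db/experiments")//Gel)',
}


def rest_query_url(base_url, collection, query_name, how_many=10):
    """Build a GET URL that submits one canned query to an XML database.

    Assumes an eXist-style REST interface in which "_query" carries the
    XQuery text and "_howmany" limits the number of returned items.
    Raises KeyError if the query name is not in the registry, so end
    users can only run queries that were configured in advance.
    """
    xquery = CANNED_QUERIES[query_name]
    params = urlencode({"_query": xquery, "_howmany": how_many})
    return f"{base_url.rstrip('/')}/{collection.strip('/')}?{params}"
```

For instance, `rest_query_url("http://localhost:8080/exist/rest", "/db/experiments", "count_gels")` returns a URL whose query string carries the URL-encoded XQuery text. Keeping the queries in a fixed registry reflects the division of labour described above: the person who configures the tool writes the XQuery, and the end user only picks a query by name.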
| 3 CAPTURING AND SHARING GELML EXPERIMENTS |
|---|
GelML is a FuGE extension developed by the Gel-based methods of analysis (PSI-Gel) working group of the proteomics standards initiative (PSI) to describe the process of gel electrophoresis, which is used to separate and quantify proteins within a proteomics workflow (Jones and Gibson, 2007b). The GelML model covers standard one- and two-dimensional gel electrophoresis. GelML also supports non-standard forms of gel electrophoresis using the generic modelling constructs provided by FuGE.
As a proof of concept, we used the toolkit to capture and browse GelML experiments. To do this, we used the XML Schema specifying the GelML model as an input to Pedro. This Schema was automatically generated from the GelML object model using the FuGE software toolkit (http://fuge.sourceforge.net). To ease the navigation of large GelML experiments, the data entry tool generated by Pedro displays GelML experiments as a tree, the nodes of which represent records that can be edited in a form within another panel. Specified experiments can be exported as XML documents and then stored within the eXist database. To allow users to browse specified GelML experiments, we defined a set of canned queries to retrieve: (i) general information regarding the conditions under which gel experiments were performed; (ii) information about gel locations (spots or bands) and (iii) information about gels. For example, Figure 2 shows the results returned when requesting information about available gels.
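The three categories of canned query can be mimicked on a toy document. The sketch below is an assumption-laden stand-in: it uses Python's ElementTree path expressions instead of XQuery on eXist, and the element and attribute names (`Experiment`, `Conditions`, `Gel`, `Location`) are invented for the example and are not the names defined by GelML.

```python
import xml.etree.ElementTree as ET

# Toy experiment document with hypothetical element and attribute names.
TOY_EXPERIMENT = """\
<Experiment name="demo-2D-gel">
  <Conditions temperature="20C" buffer="Tris-glycine"/>
  <Gel id="gel1" dimensions="2">
    <Location id="spot1" kind="spot"/>
    <Location id="spot2" kind="spot"/>
  </Gel>
  <Gel id="gel2" dimensions="1">
    <Location id="band1" kind="band"/>
  </Gel>
</Experiment>
"""

# One path expression per category of canned query described in the text.
CANNED_PATHS = {
    "conditions": "./Conditions",  # (i) general experimental conditions
    "locations": ".//Location",    # (ii) gel locations (spots or bands)
    "gels": "./Gel",               # (iii) gels
}


def run_canned(document_text, query_name):
    """Run one canned query and return the attributes of each matching node."""
    root = ET.fromstring(document_text)
    return [dict(node.attrib) for node in root.findall(CANNED_PATHS[query_name])]
```

Running `run_canned(TOY_EXPERIMENT, "gels")` returns one attribute dictionary per gel, which a web page could then render as a table.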
| 4 CONCLUSIONS |
|---|
A toolkit has been developed for capturing and sharing experimental processes in the life sciences. The toolkit is generic in that it supports experimental processes in any FuGE extension: we showed how it can be used for capturing and browsing GelML experiments. There are two categories of users: information technology experts, e.g. bioinformaticians, and end users, e.g. wet-lab scientists. Information technology experts are responsible for configuring the toolkit, mainly by specifying the queries used for browsing the experiments. Once configured, the toolkit automatically generates interfaces that end users can use to capture and browse experiments. End users do not have to be information technology experts; for example, they do not need to be familiar with XML or XQuery. A demonstration of the web interface generated by the toolkit for browsing GelML experiments is available from http://code.google.com/p/fugetoolkit. The potential beneficiaries of the toolkit are FuGE users, including community users who are developing FuGE extensions. For example, extensions to FuGE are being developed to support transcriptomics by MGED, proteomics by the PSI, metabolomics by the metabolomics standards initiative and flow cytometry by the Flow Informatics and Computational Cytometry Society. These communities, plus individual laboratories developing repositories that use FuGE, can deploy the toolkit both to refine requirements and to develop production or experimental database infrastructures.
| Funding |
|---|
BBSRC Tools and Resources Development Fund.
Conflict of Interest: none declared.
| ACKNOWLEDGEMENTS |
|---|
We would like to thank Neil Swainston from the University of Manchester for his help in configuring the Data Access Web Application.
| FOOTNOTES |
|---|
Associate Editor: Alfonso Valencia
Received on July 7, 2008; revised on August 26, 2008; accepted on September 16, 2008
| REFERENCES |
|---|
Jones AR, et al. The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat. Biotechnol. (2007a) 25:1127–1133.
Jones AR, Gibson F. GelML: Gel Markup Language (Version 1) specification. In: Proteomics Standards Initiative Recommendation: Final Document, PSI-GEL Working Group (2007b). Available at http://www.psidev.info/index.php?q=node/254 (last accessed September 1, 2008).
Garwood KL, et al. Pedro: a configurable data entry tool for XML. Bioinformatics (2004) 20:2463–2465.
Sansone SA, et al. The First RSBI (ISA-TAB) workshop: "Can a Simple Format Work for Complex Studies?" OMICS (2008) 12:143–149.

