Bioinformatics Advance Access originally published online on June 16, 2005
Bioinformatics 2005 21(16):3454-3455; doi:10.1093/bioinformatics/bti546

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Querying and computing with BioCyc databases

Markus Krummenacker (1), Suzanne Paley (1), Lukas Mueller (2), Thomas Yan (3) and Peter D. Karp (1,*)

(1) SRI International, Bioinformatics Research Group EK207, Menlo Park, CA 94025, USA
(2) Cornell University, Department of Plant Breeding and Genetics, Emerson Hall, Ithaca, NY 14853, USA
(3) Carnegie Institution of Washington, Department of Plant Biology, Stanford, CA 94305, USA

*To whom correspondence should be addressed.


    Abstract

Summary: We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database.

Availability: The Pathway Tools software and 20 BioCyc DBs in Tiers 1 and 2 are freely available to academic users; fees apply to some types of commercial use. For download instructions see http://BioCyc.org/download.shtml

Supplementary information: For more details on programmatic access to BioCyc DBs, see http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Contact: pkarp{at}ai.sri.com


    1 INTRODUCTION
BioCyc (see http://BioCyc.org/) is a collection of 161 Pathway/Genome Databases (PGDBs) that represent cellular networks and genome information in a structured manner, to allow powerful computational analysis and manipulation of data. The highly curated Tier 1 PGDBs at the core of BioCyc are the EcoCyc and MetaCyc DBs (Karp et al., 2002c,b). They contain many experimentally elucidated metabolic pathways from Escherichia coli and other organisms. BioCyc is viewed and edited through Pathway Tools (Karp et al., 2002a), a software environment we have developed to query, display and edit information about each pathway and its component reactions, compounds, enzymes, protein complexes, genes, operons and regulation at the substrate and transcriptional level. Additionally, the data objects support literature references, evidence codes and links to external databases. The BioCyc schema attempts to faithfully capture biological concepts and the cross-links among widely differing types of data. The PGDBs in Tiers 2 and 3 were computationally predicted by Pathway Tools. Tier 2 has undergone moderate curation, whereas the 139 DBs in Tier 3 have undergone no curation (note also that Tier 3 PGDBs are not yet available for programmatic access, but we expect they will be soon).

This article describes multiple methods that are exposed for querying BioCyc DBs programmatically. The same access mechanisms are available for the many PGDBs now being created by Pathway Tools users outside SRI, such as by TAIR for Arabidopsis thaliana (Mueller et al., 2003), and by SGD for Saccharomyces cerevisiae. These query methods will simplify the investigation of global questions about cellular networks.


    2 SCHEMA AND DATA FILES
BioCyc uses an object-oriented database called a Frame Representation System (FRS), the schema for which has been described previously (Karp, 2000); see also Appendix A of Paley et al. (2005). In short, every biological object (such as a compound or gene) is stored in a frame bearing a unique ID. A frame has slots, in which attributes and connections to other frames can be stored as values. Slots can store single or multiple values, and individual values can be annotated with comments or literature references. The frames are organized in a class hierarchy.
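The frame model described above can be pictured with a small in-memory sketch. It is not the Pathway Tools FRS or any part of its API: the class names `Frame` and `FrameStore`, their methods, and the toy frames and slot values are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One object: a unique ID, parent classes, and multi-valued slots."""
    frame_id: str
    parents: list = field(default_factory=list)  # IDs of parent class frames
    is_class: bool = False                       # class frame or instance frame
    slots: dict = field(default_factory=dict)    # slot name -> list of values
    notes: dict = field(default_factory=dict)    # (slot, value) -> annotation

    def add_value(self, slot, value, note=None):
        """Append a value to a slot and optionally annotate that value."""
        self.slots.setdefault(slot, []).append(value)
        if note is not None:
            self.notes[(slot, value)] = note

    def get_values(self, slot):
        """Return all values of a slot (empty list if the slot is unset)."""
        return self.slots.get(slot, [])


class FrameStore:
    """Toy frame store: frames indexed by ID and linked into a hierarchy."""

    def __init__(self):
        self.frames = {}

    def add(self, frame):
        self.frames[frame.frame_id] = frame
        return frame

    def is_a(self, frame_id, class_id):
        """True if class_id is reachable from frame_id via parent links."""
        stack = list(self.frames[frame_id].parents)
        seen = set()
        while stack:
            current = stack.pop()
            if current == class_id:
                return True
            if current in seen or current not in self.frames:
                continue
            seen.add(current)
            stack.extend(self.frames[current].parents)
        return False

    def all_instances(self, class_id):
        """IDs of instance frames that sit anywhere below class_id."""
        return sorted(fid for fid, frame in self.frames.items()
                      if not frame.is_class and self.is_a(fid, class_id))


# Hypothetical toy data, not taken from any BioCyc database.
store = FrameStore()
store.add(Frame("Chemicals", is_class=True))
store.add(Frame("Compounds", parents=["Chemicals"], is_class=True))
atp = store.add(Frame("ATP", parents=["Compounds"]))
atp.add_value("SYNONYMS", "adenosine triphosphate", note="illustrative comment")
atp.add_value("SYNONYMS", "adenosine 5'-triphosphate")

print(store.all_instances("Chemicals"))
print(atp.get_values("SYNONYMS"))
```

Storing every slot as a list keeps single-valued and multi-valued slots uniform, and keying annotations by (slot, value) mirrors the idea that individual values, rather than whole slots, carry comments or citations.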

Pathway Tools can export BioCyc PGDBs in several formats: (1) Column-delimited and attribute-value formats, which are described in detail online (http://brg.ai.sri.com/ptools/flatfile-format.html). These formats are attractive for import into spreadsheets or relational DBs, or for parsing by Perl scripts. (2) BioPAX (http://www.biopax.org/) format, which is an OWL RDF/XML-based format for exchange of pathway data. (3) SBML (http://www.sbml.org/) format, which is an XML-based format for capturing models of biochemical reaction networks.
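As an illustration of how an attribute-value export might be consumed by a script, the following minimal sketch parses such a file into dictionaries. The layout it assumes (one "ATTRIBUTE - VALUE" pair per line, "//" closing each record, "#" starting a comment) should be checked against the flat-file documentation cited above. The function name `parse_attribute_value` and the sample records are hypothetical.

```python
def parse_attribute_value(text):
    """Parse attribute-value records into a list of dicts.

    Assumed layout (verify against the flat-file documentation):
      - one "ATTRIBUTE - VALUE" pair per line
      - a line containing only "//" ends a record
      - lines starting with "#" are comments
    Every attribute maps to a list, because a slot may hold several values.
    """
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line or line.startswith("#"):
            continue
        if line == "//":
            if current:
                records.append(current)
            current = {}
            continue
        attribute, separator, value = line.partition(" - ")
        if not separator:
            continue  # not an attribute-value line; ignored in this sketch
        current.setdefault(attribute, []).append(value)
    if current:  # tolerate a missing final record terminator
        records.append(current)
    return records


# Hypothetical excerpt for illustration; not real BioCyc data.
SAMPLE = """\
# example records
UNIQUE-ID - ATP
TYPES - Compounds
SYNONYMS - adenosine triphosphate
SYNONYMS - adenosine 5'-triphosphate
//
UNIQUE-ID - EXAMPLE-ENZRXN-1
TYPES - Enzymatic-Reactions
ENZYME - EXAMPLE-PROTEIN-1
INHIBITORS-ALL - ATP
//
"""

records = parse_attribute_value(SAMPLE)
inhibited = [r["ENZYME"][0] for r in records
             if "ATP" in r.get("INHIBITORS-ALL", [])]
print(inhibited)
```

The sketch skips any line it does not recognize, so a production parser would need to handle whatever additional line types the format defines.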


    3 PROGRAMMATIC QUERYING
APIs in three languages provide direct, programmatic access to BioCyc DBs within Pathway Tools. The shared APIs are based upon the Generic Frame Protocol (GFP). The most commonly used GFP functions have been summarized (http://bioinformatics.ai.sri.com/ptools/gfp.html) and detailed documentation of GFP is available (http://www.ai.sri.com/~gfp/spec/paper/paper.html). Additional useful functions (http://bioinformatics.ai.sri.com/ptools/ptools-fns.html) retrieve complex relationships in PGDBs. SQL querying is possible through the BioWarehouse.

Due to space limitations, only a simple example can be given below, which is transliterated to three languages: LISP, Perl and SQL. The example query finds all enzymes for which ATP is an inhibitor.

3.1 LISP
Common LISP is the native programming language of Pathway Tools and thus provides the richest environment for queries. The API consists of the commonly used GFP functions plus the additional useful relations, as referred to above. Many LISP query examples are available (http://bioinformatics.ai.sri.com/ptools/examples.lisp).

(defun atp-inhibits ()
  ;; We check every instance of the class
  (loop for x in (get-class-all-instances '|Enzymatic-Reactions|)
        ;; We test for whether the INHIBITORS-ALL
        ;; slot contains the compound frame ATP
        when (member-slot-value-p x 'INHIBITORS-ALL 'ATP)
        ;; Whenever the test is positive, we collect
        ;; the value of the slot ENZYME. The
        ;; collected values are returned as a list,
        ;; once the loop terminates.
        collect (get-slot-value x 'ENZYME)))

;;; invoking the query:
(select-organism :org-id 'ECOLI)
(atp-inhibits)

3.2 PerlCyc
PerlCyc (http://www.arabidopsis.org/tools/aracyc/perlcyc/) is a Perl API that allows Perl programmers to query and update data within a running Pathway Tools server. The communication between Pathway Tools and Perl occurs through a UNIX socket, and so both programs need to be executed on the same machine.

use perlcyc;

my $cyc = perlcyc->new("ECOLI");
my @enzrxns = $cyc->get_class_all_instances("|Enzymatic-Reactions|");

## We check every instance of the class
foreach my $er (@enzrxns) {
    ## We test for whether the INHIBITORS-ALL
    ## slot contains the compound frame ATP
    my $bool = $cyc->member_slot_value_p($er, "Inhibitors-All", "Atp");
    if ($bool) {
        ## Whenever the test is positive, we collect
        ## the value of the slot ENZYME. The results
        ## are printed in the terminal.
        my $enz = $cyc->get_slot_value($er, "Enzyme");
        print STDOUT "$enz\n";
    }
}

3.3 JavaCyc
JavaCyc (http://www.arabidopsis.org/tools/aracyc/javacyc/) is a Java analog of PerlCyc. JavaCyc also communicates with Pathway Tools through a UNIX socket. The example query is available online (http://bioinformatics.ai.sri.com/ptools/example-javacyc.html).

3.4 SQL access via BioWarehouse
BioWarehouse is a DB integration project (http://bioinformatics.ai.sri.com/biowarehouse/) that allows multiple DBs, including BioCyc, SWISS-PROT, GenBank, NCBI Taxonomy and KEGG, to be loaded within a relational DBMS server. BioWarehouse supports SQL queries to BioCyc DBs, and allows cross-DB queries and validations to be performed. A detailed description of the BioWarehouse schema is beyond the scope of this Application Note.

select distinct DBID.xid
from DBID, Protein, EnzymaticReaction,
     EnzReactionInhibitorActivator, Chemical, DataSet
where DataSet.name = 'EcoCyc'
  and DataSet.wid = EnzymaticReaction.datasetwid
  and EnzymaticReaction.proteinwid = Protein.wid
  and EnzymaticReaction.wid =
      EnzReactionInhibitorActivator.enzymaticreactionwid
  and EnzReactionInhibitorActivator.compoundwid = Chemical.wid
  and EnzReactionInhibitorActivator.inhibitoractivate = 'I'
  and Chemical.name = 'ATP'
  and DBID.otherwid = Protein.wid
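
To see how the joins in this query narrow the result, the sketch below runs it against a toy in-memory SQLite database. The table and column names are those used in the query above; the reduced table definitions, the column types, and every row of data are assumptions made for illustration and do not reflect the actual BioWarehouse schema or contents.

```python
import sqlite3

# Reduced, hypothetical tables holding only the columns the query touches.
SCHEMA = """
CREATE TABLE DataSet (wid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Chemical (wid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Protein (wid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE EnzymaticReaction (
    wid INTEGER PRIMARY KEY, proteinwid INTEGER, datasetwid INTEGER);
CREATE TABLE EnzReactionInhibitorActivator (
    enzymaticreactionwid INTEGER, compoundwid INTEGER, inhibitoractivate TEXT);
CREATE TABLE DBID (otherwid INTEGER, xid TEXT);
"""

QUERY = """
select distinct DBID.xid
from DBID, Protein, EnzymaticReaction,
     EnzReactionInhibitorActivator, Chemical, DataSet
where DataSet.name = 'EcoCyc'
  and DataSet.wid = EnzymaticReaction.datasetwid
  and EnzymaticReaction.proteinwid = Protein.wid
  and EnzymaticReaction.wid =
      EnzReactionInhibitorActivator.enzymaticreactionwid
  and EnzReactionInhibitorActivator.compoundwid = Chemical.wid
  and EnzReactionInhibitorActivator.inhibitoractivate = 'I'
  and Chemical.name = 'ATP'
  and DBID.otherwid = Protein.wid
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented rows: three proteins, of which only the first should match.
conn.executemany("INSERT INTO DataSet VALUES (?, ?)",
                 [(1, "EcoCyc"), (2, "MetaCyc")])
conn.executemany("INSERT INTO Chemical VALUES (?, ?)",
                 [(10, "ATP"), (11, "ADP")])
conn.executemany("INSERT INTO Protein VALUES (?, ?)",
                 [(100, "protein A"), (101, "protein B"), (102, "protein C")])
conn.executemany("INSERT INTO EnzymaticReaction VALUES (?, ?, ?)",
                 [(1000, 100, 1), (1001, 101, 1), (1002, 102, 2)])
conn.executemany("INSERT INTO EnzReactionInhibitorActivator VALUES (?, ?, ?)",
                 [(1000, 10, "I"),   # ATP inhibits protein A (EcoCyc)
                  (1001, 10, "A"),   # ATP activates protein B
                  (1001, 11, "I"),   # ADP inhibits protein B
                  (1002, 10, "I")])  # ATP inhibits protein C (MetaCyc)
conn.executemany("INSERT INTO DBID VALUES (?, ?)",
                 [(100, "EXAMPLE-PROTEIN-A"),
                  (101, "EXAMPLE-PROTEIN-B"),
                  (102, "EXAMPLE-PROTEIN-C")])

rows = conn.execute(QUERY).fetchall()
print(rows)
conn.close()
```

In this toy data set, protein B is excluded because ATP appears only as an activator, and protein C is excluded because its enzymatic reaction belongs to a different data set, so only the identifier of protein A is returned.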


    Acknowledgments
 
We thank Jeremy Zucker for the SBML exporter and Thomas J. Lee for his SQL example. This work was supported by grants GM70065 and GM75742 from the NIH National Institute of General Medical Sciences.

Conflict of Interest: Krummenacker and Karp declare that they receive royalties from SRI licensing of BioCyc and Pathway Tools, and Paley declares that she receives royalties from SRI licensing of Pathway Tools.

Received on April 12, 2005; revised on June 14, 2005; accepted on June 14, 2005

    REFERENCES

    Karp, P.D. (2000) An ontology for biological function based on molecular interactions. Bioinformatics, 16, 269–285.

    Karp, P., Paley, S., Romero, P. (2002a) The Pathway Tools Software. Bioinformatics, 18, S225–S232.

    Karp, P., Riley, M., Paley, S., Pellegrini-Toole, A. (2002b) The MetaCyc database. Nuc. Acids Res., 30, 1, 59–61.

    Karp, P., Riley, M., Saier, M., Paulsen, I., Paley, S., Pellegrini-Toole, A. (2002c) The EcoCyc database. Nuc. Acids Res., 30, 1, 56–58.

    Mueller, L., Zhang, P., Rhee, S. (2003) AraCyc, a biochemical pathway database for Arabidopsis. Plant Physiol., 132, 453–460.

    Paley, S., Krummenacker, M., Pick, J., Green, M., Karp, P. (2005) Pathway Tools User's Guide version 9.0. Available from SRI International.





