Bioinformatics Advance Access originally published online on November 30, 2004
Bioinformatics 2005 21(7):1189-1193; doi:10.1093/bioinformatics/bti116
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Metabolic pathway analysis web service (Pathway Hunter Tool at CUBIC)
Cologne University BioInformatics Center (CUBIC) and Institute of Biochemistry Zülpicher Strasse 47, 50674 Köln, Germany
*To whom correspondence should be addressed.
| Abstract |
|---|
|
|
|---|
Motivation: Pathway Hunter Tool (PHT), is a fast, robust and user-friendly tool to analyse the shortest paths in metabolic pathways. The user can perform shortest path analysis for one or more organisms or can build virtual organisms (networks) using enzymes. Using PHT, the user can also calculate the average shortest path (Jungnickel, 2002 Graphs, Network and Algorithm. Springer-Verlag, Berlin), average alternate path and the top 10 hubs in the metabolic network. The comparative study of metabolic connectivity and observing the cross talk between metabolic pathways among various sequenced genomes is possible.
Results: A new algorithm for finding the biochemically valid connectivity between metabolites in a metabolic network was developed and implemented. A predefined manual assignment of side metabolites (like ATP, ADP, water, CO2 etc.) and main metabolites is not necessary as the new concept uses chemical structure information (global and local similarity) between metabolites for identification of the shortest path.
Availability: PHT is accessible at http://www.pht.uni-koeln.de
Contact: d.schomburg{at}uni-koeln.de
| INTRODUCTION |
|---|
|
|
|---|
With the advent of the omics era, more and more system-based approaches to biological functions are being developed. Metabolome analysis and metabolomics are gaining significance, as they help us to understand the complexity of the underlying cellular networks in organisms. The completion of a large number of genomes has made the comparative study of genomes possible at different levels. One way to gain a better understanding of the sequenced genomes is to analyse the underlying metabolic network and its topology in different genomes. Several databases provide information about metabolic pathways. We have used KEGG (Kanehisa et al., 2004) as the basic database for our analysis apart from BRENDA (Schomburg et al., 2004) and PROSITE (Hulo et al., 2004). A global view of the connectivity in metabolic pathways, and the contribution and usage of certain metabolites in these pathways is highly instructive. Shortest path analysis (Arita, 2004) is one of the most comprehensive methods to analyse a graph (Metabolic Pathways) at different levels (Reactions) in terms of local and global connectivity between metabolites. With the Pathway Hunter Tool (PHT) it is also possible to calculate statistical information (Barabasi and Oltvai, 2004) from the topology arising from the interacting molecules in order to capture the nature of connectivity.
| METHODS |
|---|
|
|
|---|
Whereas a number of well-established methods exist for the analysis of shortest paths in graphs, the situation in metabolic networks is a little more complicated. In the example using reactions given in Figure 1, a shortest path algorithm for metabolic pathways is required to follow the path of the thick lines with the result that there exists a path between phosphate and ATP via glucose-6-phosphate (thick line). However, there is no way to produce fructose from phosphate to fructose (thin line). In the third reaction of the scheme, the algorithm has to decide in which direction to go, depending on the starting point being either glucose or phosphate.
|
Therefore it is important to connect two metabolites in a reaction with respect to their structural similarity. We have used the fingerprint algorithm from the Chemistry Development Kit (CDK) (Steinbeck et al., 2003) to convert the 2-dimensional chemical structure information to a 1-dimesional binary stream as a fingerprint for faster similarity search (Whittle et al., 2003). Using the fingerprints, the similarity between two molecules was calculated using a normalized scoring function obtained by combination of the atomic mass value of the metabolites and the Tanimoto algorithm (Xue et al., 2003). This allows avoidance of the false connectivity in the metabolic pathway thus made the path search algorithm more robust.
In order to calculate the shortest path between two metabolites, the breadth-first-search (BFS) algorithm (Jungnickel, 2002) is used in PHT. Higher-order horn logic (HOHL) (Nadathur and Miller, 1990) has been used to satisfy the constraints. Our new algorithm automatically discriminates between side metabolites (like ATP, ADP, water, CO2, etc.) and main metabolites while finding the shortest path without the need to predefine those. Predefined exclusion of small metabolites in the metabolic pathway may lead to broken links in the network or longer connectivity. This implies that at each reaction step the algorithm should be able to decide which metabolite to choose for further connectivity in the pathway and which to skip.
| ALGORITHM |
|---|
|
|
|---|
In this section the new algorithm used in PHT to find the shortest path in the biochemical network is described.
1 Definition of the metabolite mapping scoring function
Let A be an educt and B a product metabolite and a and b the number of bits (calculated by the fingerprint algorithm from the CDK, Steinbeck et al., 2003) on on A and B metabolites, respectively, c the number of bits on in both A and B, d number of bits off in both A and B. Then we can define the equation in the form of set theory (Jech and Jech, 1997):
![]() |
![]() |
Once we are able to formulate the chemical structure (Whittle et al., 2003) in terms of set theory the next step was to develop a scoring scheme for the similarity between two metabolites. We have used the Tanimoto Coefficient (Willet et al., 1998) for this purpose. The structural similarity between two metabolites A and B can be defined as
![]() |
![]() |
The final score for top competing metabolites can be defined as:
![]() |
2 Local mapping metabolites in reactions
The derived scoring function was used to find a suitable mapping between substrate molecules and product molecules. We use a slightly modified form of game theory (http://www.gametheory.net/) in order to map the substrate to the product metabolite. The method consists of construction of a matrix with substrates as rows and products as columns with the score defined above as matrix elements. The score between any substrate or product whose extension is smaller than three bonds is set to zero. A substrate is mapped to a product when the score dominates all other scores in the present row or column respectively. By this procedure we keep track of the maximum structural similarity between two interacting metabolites. Figure 2 illustrates the outcome of our mapping procedure when applied to a reaction.
|
3 Shortest path between two metabolites
For the calculation of the shortest paths, the two biochemical criteria local and global structural similarity are used, where local similarity is defined as the similarity between two intermediate molecules and global similarity is defined as the amount of conserved structure found between the source metabolite and the destination metabolites after a series of reaction steps (Fig. 3).
|
The only potential drawback of this method is that not all metabolites in the metabolite databases have structures (e.g. macromolecules like proteins or nucleic acids, or generic molecules like an alcohol). In these cases the user may miss some connectivity due to lack of structural information. It is possible to cross-check this result by switching off the Atom Mapper (local similarity) and Atom Tracer (global similarity) options thereby performing the search on the ligand-number-based mapping obtained from the KEGG reaction database. On the other hand, the power and biochemical relevance of having local similarity and global similarity is very high. In the future we plan to provide non-standard structural information for these metabolites in order to allow the inclusion of such reactions.
Complexity of the algorithm
The shortest path between source and destination metabolite is the minimum number of reaction steps between them (Fig. 4). We consider the metabolic pathway in our system as a directed graph with all edges (reactions) sharing the same cost (here 1). Hence this does not lead us to an NP-complete problem as one can calculate the k-shortest path between two metabolites using the BFS (breadth first search) algorithm. HOHL (Nadathur and Miller, 1990) has been used to satisfy the constraints (similarity) with the BFS algorithm in order to calculate k-shortest paths between two metabolites (source and destination). This means that the runtime of the tool depends on the metabolites and reactions present in an organism. We are able to generate all possible k-shortest paths between two metabolites under given criteria of global and local similarity.
|
Program options
Presently PHT has four options.
- Find k-shortest path to convert one metabolite into another in a given network (organism-specific or general metabolic network).
- Find k-shortest paths from a substrate metabolite to all feasible metabolites in a given network (organism-specific or general).
- Find k-shortest path to a product metabolite from all feasible substrate metabolites in a given network (organism-specific or general).
- Statistical analysis of the metabolic pathways like average path length, diameter of the network, average node connectivity, loose ends in the network, hubs in a given network (organism-specific or general).
User defined constraints
There are sets of user-defined constraints, which can be used for an in-depth network analysis without affecting the biochemical/biological relevance.
- While traversing through the metabolic pathway it is possible to set the similarity measure score (Atom Mapper) between interacting molecules and to define the amount of structure change with respect to this reference molecule at each reaction step (Atom Tracer).
- By setting the Minimum path length and Maximum path length, the path between two metabolites in the network can be altered. For example, if the minimum path length is set to 6, then the algorithm will drop paths below it and report the next possible shortest path above or equal to 6, which is the shortest possible path under the given constraint.
- It is possible to choose via Metabolite, not via Metabolites and not via Enzymes options for use of a particular set of pathways.
- Under Build Virtual Organism it is possible to add one's own set of enzymes and perform further analysis. This is very useful for identification of the missing links in the network.
| RESULTS |
|---|
|
|
|---|
We performed a shortest path analysis (Fig. 4) in Escherichia coli K-12 between beta-D-glucose and pyruvate, which was found to be nine steps long. We considered global similarity and local similarity while traversing the path. The algorithm automatically identifies the correct connectivity between the metabolites at each reaction step.
We also performed a comparative study between the KEGG reaction reference map, Corynebacterium glutamicum, E.coli K-12 and Mycobacterium tuberculosis (Fig. 5). We were interested in finding the shortest path between D-erythrose 4-phosphate and chorismate, which was found to be in seven reaction steps in all these cases. Looking closely at Figure 5, it is clear that different pathways are possible to convert D-erythrose 4-phosphate to chorismate in the reference map, or in C.glutamicum, E.coli K-12 and M.tuberculosis (score 4 on the edge). Some organisms may use enzyme 1.1.99.25 (blue colour) to perform the same conversion (score 1 on the edge).
|
| OUTPUT FORMAT |
|---|
|
|
|---|
PHT generates three kinds of output:
- A Text-based output can be viewed immediately in the browsers and is supplied with hyperlinks to other database, like BRENDA, KEGG and PROSITE.
- A Graphical view of the output is generated for Metabolic Pathways and Enzyme connectivity as graph modeling language (GML) (http://www.infosun.fmi.uni-passau.de/Graphlet/GML/) files. These portable files can be saved on the client's system and can be viewed later in any dynamic layout software that can read the GML format (e.g. the yEd (http://www.yworks.com/products/yed/ graphical editor).
- An enzymeenzyme connectivity matrix, which can be used for pathway alignment and other studies. The reaction-organism matrix highlights the presence of reactions in organisms by binary 1 and 0 for absence.
| Acknowledgments |
|---|
The authors are grateful for financial support by the German Federal Ministry for Education and Research (BMBF).
Received on August 12, 2004; revised on October 21, 2004; accepted on October 21, 2004
| REFERENCES |
|---|
|
|
|---|
Arita, M. (2004) The metabolic world of Escherichia coli is not small. Proc. Natl Acad. Sci. USA, 101, 15431547
Barabasi, A.L. and Oltvai, Z.N. (2004) Network biology: understanding the cell's functional organization. Nat. Rev. Genet., 5, 101113[CrossRef][Web of Science][Medline].
Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P., Bairoch, A. (2004) Recent improvements to the PROSITE database. Nucleic Acids Res., 32, D134137
Jech, T.J. and Jech, T. Set Theory, (1997) , Berlin Springer-Verlag.
Jungnickel, D. Graphs, Network and Algorithm, (2002) , Berlin Springer Verlag.
Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, , pp. D277280
Nadathur, G. and Miller, D. (1990) Higher-order Horn clauses. JACM, 37, 777814[CrossRef].
Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., Schomburg, D. (2004) BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res., 32, D431433
Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., Willighagen, E. (2003) The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci., 43, 493500[CrossRef][Web of Science][Medline].
Whittle, M., Willett, P., Klaffke, W., van Noort, P. (2003) Evaluation of similarity measures for searching the dictionary of natural products database. J. Chem. Inf. Comput. Sci., 43, 449457[Medline].
Willet, P., Barnard, J.M., Downs, G.M. (1998) Chemical similarity searching. J. Chem. Inf. Comput. Sci., 38, 938996.
Xue, L., Godden, J.W., Stahura, F.L., Bajorath, J. (2003) Design and evaluation of a molecular fingerprint involving the transformation of property descriptor values into a binary classification scheme. J. Chem. Inf. Comput. Sci., 43, 11511157[Medline].
This article has been cited by other articles:
![]() |
F. J. Planes and J. E. Beasley An optimization model for metabolic pathways Bioinformatics, October 15, 2009; 25(20): 2723 - 2729. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Mithani, G. M. Preston, and J. Hein Rahnuma: hypergraph-based tool for metabolic pathway prediction and network comparison Bioinformatics, July 15, 2009; 25(14): 1831 - 1832. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. F. de Figueiredo, S. Schuster, C. Kaleta, and D. A. Fell Can sugars be produced from fatty acids? A test case for pathway analysis tools Bioinformatics, January 1, 2009; 25(1): 152 - 158. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. F. de Figueiredo, S. Schuster, C. Kaleta, and D. A. Fell Can sugars be produced from fatty acids? A test case for pathway analysis tools Bioinformatics, November 15, 2008; 24(22): 2615 - 2621. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Blum and O. Kohlbacher MetaRoute: fast search for relevant metabolic routes for interactive network navigation and visualization Bioinformatics, September 15, 2008; 24(18): 2108 - 2109. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. J. Planes and J. E. Beasley A critical examination of stoichiometric and path-finding approaches to metabolic pathways Brief Bioinform, September 1, 2008; 9(5): 422 - 436. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Brohee, K. Faust, G. Lima-Mendez, O. Sand, R. Janky, G. Vanderstocken, Y. Deville, and J. van Helden NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways Nucleic Acids Res., July 1, 2008; 36(suppl_2): W444 - W451. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. A. Rahman and D. Schomburg Observing local and global properties of metabolic pathways: 'load points' and 'choke points' in the metabolic networks Bioinformatics, July 15, 2006; 22(14): 1767 - 1774. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Vallenet, L. Labarre, Z. Rouy, V. Barbe, S. Bocs, S. Cruveiller, A. Lajus, G. Pascal, C. Scarpelli, and C. Medigue MaGe: a microbial genome annotation system supported by synteny results Nucleic Acids Res., January 10, 2006; 34(1): 53 - 65. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Arita Scale-Freeness and Biological Networks J. Biochem., July 1, 2005; 138(1): 1 - 4. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||












