Bioinformatics Advance Access published online on October 23, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl535
1 Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Motivation: Our knowledge of metabolism is far from complete, and the gaps in our knowledge are being revealed by metabolomic detection of small molecules not previously known to exist in cells. An important challenge is to determine the reactions in which these compounds participate, which can lead to the identification of gene products responsible for novel metabolic pathways. To address this challenge, we investigate how machine learning can be used to predict potential substrates and products of oxidoreductase-catalyzed reactions.
Results: We examined 1956 oxidation/reduction reactions in the KEGG database. The vast majority of these reactions (1626) can be divided into twelve subclasses, each of which is marked by a particular type of functional group transformation. For a given transformation, the local structures of reaction centers in substrates and products can be characterized by patterns. These patterns are not unique to reactants but are widely distributed among KEGG metabolites. To distinguish reactants from non-reactants, we trained classifiers (linear-kernel support vector machines) using negative and positive samples. The input to a classifier is a set of atomic features that can be determined from the 2D chemical structure of a compound. Depending on the subclass of reaction, the accuracy of prediction for positives (negatives) is 64 to 93% (44 to 92%) when asking if a compound is a substrate and 71 to 98% (50 to 92%) when asking if a compound is a product. Sensitivity analysis reveals that this performance is robust to variations of the training data. Our results suggest that metabolic connectivity can be predicted with reasonable accuracy from the presence or absence of local structural motifs in compounds and their readily calculated atomic features.
Availability: Classifiers reported here can be used freely for noncommercial purposes via a Java program available upon request.
Supplementary information: Supplementary Material is available at Bioinformatics online.
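The classification step described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, solver, or parameters, so everything below is an assumption made for illustration. Each compound is represented by two made-up atomic features of its reaction-center atom, the labels (+1 for reactant, -1 for non-reactant) are invented, and the linear-kernel SVM is trained by plain batch subgradient descent on the regularized hinge loss. Accuracy is reported separately for positives and negatives, mirroring how the abstract states its results.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Linear-kernel decision function: w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin (or misclassified) contribute.
            if yi * decision_value(w, b, xi) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def per_class_accuracy(w: Vector, b: float, X: List[Vector],
                       y: List[int]) -> Tuple[float, float]:
    """Return (accuracy on positives, accuracy on negatives)."""
    correct = {1: 0, -1: 0}
    total = {1: 0, -1: 0}
    for xi, yi in zip(X, y):
        pred = 1 if decision_value(w, b, xi) >= 0 else -1
        total[yi] += 1
        correct[yi] += int(pred == yi)
    return correct[1] / total[1], correct[-1] / total[-1]


# Hypothetical training data: two invented atomic features per compound.
train_X = [(0.9, 0.1), (0.8, 0.3), (1.0, 0.2), (0.7, 0.0),   # reactants (+1)
           (0.1, 0.9), (0.3, 0.8), (0.2, 1.0), (0.0, 0.7)]   # non-reactants (-1)
train_y = [1, 1, 1, 1, -1, -1, -1, -1]

# Hypothetical held-out compounds; the last negative resembles a reactant.
test_X = [(0.85, 0.15), (0.6, 0.2), (0.15, 0.75), (0.25, 0.9), (0.7, 0.3)]
test_y = [1, 1, -1, -1, -1]

w, b = train_linear_svm(train_X, train_y)
pos_acc, neg_acc = per_class_accuracy(w, b, test_X, test_y)
print(f"accuracy on positives: {pos_acc:.2f}, on negatives: {neg_acc:.2f}")
```

The last held-out negative is built to carry the same feature pattern as the reactants, so on this toy data it is scored as a positive and the accuracy for negatives comes out lower than for positives. This loosely echoes the abstract's observation that reaction-center patterns also occur widely among non-reactants.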
Received July 13, 2006
Revised September 28, 2006
Accepted October 12, 2006
Article
Prediction of oxidoreductase-catalyzed reactions based on atomic properties of metabolites
Fangping Mu 1, Pat J. Unkefer 2, Clifford J. Unkefer 2, and William S. Hlavacek 3 *
2 National Stable Isotope Resource, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
3 Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Abstract
Associate Editor: Golan Yona
