Bioinformatics Vol. 19 no. 6 2003
Pages 764-771
© 2003 Oxford University Press
Feature selection and transduction for prediction of molecular bioactivity for drug design
1 Max Planck Institute, Spemannstr. 38,
Tuebingen, 72076, Germany
2 BIOwulf Technologies, 305 Broadway,
New York, NY 10007, USA
Received on April 16, 2002; revised on October 31, 2002; accepted on November 10, 2002
Motivation: In drug discovery, a key task is to identify characteristics that separate active (binding) compounds from inactive (non-binding) ones. An automated prediction system can help reduce the resources necessary to carry out this task.
Results: Two methods for predicting molecular bioactivity for drug design are introduced and shown to perform well on a data set previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001. The data is characterized by very few positive examples, a very large number of features (describing three-dimensional properties of the molecules) and rather different distributions between training and test data. Two techniques are introduced specifically to tackle these problems: a feature selection method for unbalanced data and a classifier that adapts to the distribution of the unlabeled test data (a so-called transductive method). We show that both techniques improve identification performance and that using them in conjunction gives an improvement over using either technique alone. Our results suggest the importance of taking these characteristics of the data into account, which may also be relevant in other problems of a similar type.
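As a rough illustration of why feature selection on heavily unbalanced data benefits from treating the two classes asymmetrically, the sketch below scores binary features by rewarding each occurrence in a rare active compound and penalizing each occurrence in an inactive compound with a larger weight `lam`. The function names, the assumption of binary (0/1) features, and the default `lam=3.0` are hypothetical choices for this example; it is a minimal sketch, not a claim about the exact criterion used in the paper.

```python
from typing import List, Sequence


def unbalanced_feature_scores(
    X: Sequence[Sequence[int]], y: Sequence[int], lam: float = 3.0
) -> List[float]:
    """Hypothetical score for binary features: +1 for every active (y=1) example
    where the feature is on, -lam for every inactive example where it is on.
    lam > 1 penalizes features that fire on inactive compounds more strongly."""
    n_features = len(X[0])
    scores = [0.0] * n_features
    for row, label in zip(X, y):
        weight = 1.0 if label == 1 else -lam
        for j, value in enumerate(row):
            if value:
                scores[j] += weight
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring features, best first
    (ties broken by the lower feature index)."""
    return sorted(range(len(scores)), key=lambda j: (-scores[j], j))[:k]


# Toy data: 2 active and 3 inactive compounds, 4 binary features.
X = [
    [1, 1, 0, 1],  # active
    [1, 0, 1, 1],  # active
    [0, 1, 0, 1],  # inactive
    [0, 0, 1, 0],  # inactive
    [0, 1, 0, 0],  # inactive
]
y = [1, 1, -1, -1, -1]

scores = unbalanced_feature_scores(X, y, lam=3.0)
print(scores)                   # [2.0, -5.0, -2.0, -1.0]
print(select_top_k(scores, 2))  # [0, 3]
```

Feature 0 ranks first because it is on in both actives and in no inactive compound; feature 3 is also on in both actives but loses 3 points for its single inactive occurrence. With a symmetric weight (`lam=1.0`) the penalty for firing on the abundant inactive class would be much weaker.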
Availability: Matlab source code is available at http://www.kyb.tuebingen.mpg.de/bs/people/weston/kdd/kdd.html
Contact: jason.weston{at}tuebingen.mpg.de
Supplementary information: Supplementary material is available at http://www.kyb.tuebingen.mpg.de/bs/people/weston/kdd/kdd.html.
* To whom correspondence should be addressed.