Bioinformatics Vol. 19 no. 17 2003
pages 2199-2209
© 2003 Oxford University Press
A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function
School of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK
Received on February 26, 2003; revised on May 9, 2003; accepted on May 21, 2003
Motivation: The large volume of single nucleotide polymorphism data now available motivates the development of methods for distinguishing neutral changes from those which have real biological effects. Here, two different machine-learning methods, decision trees and support vector machines (SVMs), are applied for the first time to this problem. In common with most other methods, only non-synonymous changes in protein coding regions of the genome are considered.
Results: In detailed cross-validation analysis, both learning methods are shown to compete well with existing methods and to outperform them in some key tests. SVMs show better generalization performance, but decision trees have the advantage of generating interpretable rules with robust estimates of prediction confidence. It is shown that the inclusion of protein structure information produces more accurate methods, in agreement with other recent studies, and the effect of using predicted rather than actual structure is evaluated.
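The cross-validation protocol mentioned in the Results can be illustrated with a minimal sketch. It assumes plain k-fold splitting with pooled accuracy as the summary. The function names (`k_fold_indices`, `cross_validate`, `train_stump`), the single feature (a hypothetical relative solvent accessibility) and the toy data are illustrative assumptions. The one-split decision stump is only a stand-in for the decision tree and SVM learners compared in the paper, whose exact features and evaluation protocol are not given in this abstract.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (features, label): label 1 = deleterious, 0 = neutral (toy encoding, assumed)
Example = Tuple[List[float], int]
Model = Callable[[List[float]], int]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 reproducibly and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data: Sequence[Example],
                   train_fn: Callable[[Sequence[Example]], Model],
                   k: int = 5, seed: int = 0) -> float:
    """Pooled k-fold accuracy: every example is scored exactly once by a
    model that never saw it during training."""
    correct = 0
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train = [ex for i, ex in enumerate(data) if i not in held_out]
        model = train_fn(train)
        correct += sum(model(data[i][0]) == data[i][1] for i in test_idx)
    return correct / len(data)


def train_stump(train: Sequence[Example]) -> Model:
    """Hypothetical stand-in learner: a one-split decision tree on feature 0.

    Chooses the threshold t that maximizes training accuracy of the rule
    'predict deleterious (1) if x[0] <= t, else neutral (0)'.
    """
    def accuracy(t: float) -> int:
        return sum((1 if x[0] <= t else 0) == y for x, y in train)

    best_t = max(sorted({x[0] for x, _ in train}), key=accuracy)
    return lambda x: 1 if x[0] <= best_t else 0


# Toy data: feature 0 is a hypothetical relative solvent accessibility;
# buried positions (low values) are labelled deleterious here by construction.
data: List[Example] = [([0.05], 1), ([0.10], 1), ([0.15], 1), ([0.20], 1), ([0.25], 1),
                       ([0.60], 0), ([0.70], 0), ([0.80], 0), ([0.90], 0), ([0.95], 0)]

print(f"5-fold accuracy of the stump: {cross_validate(data, train_stump, k=5):.2f}")
```

Swapping `train_stump` for a decision-tree or SVM trainer would give the kind of side-by-side comparison the abstract describes. Pooled accuracy is just one possible summary; the performance measures used in the paper itself are not stated here.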
Availability: Software is available on request from the authors.
Contact: westhead{at}bmb.leeds.ac.uk
* To whom correspondence should be addressed.