Bioinformatics Advance Access published online on March 3, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti365
1 Department of Molecular Sciences, Center of Genomics and Bioinformatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
* To whom correspondence should be addressed.
Motivation: There has been great expectation that knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Nonsynonymous single-nucleotide polymorphisms (nsSNPs) that lead to an amino acid change in the protein product are of particular interest because they account for nearly half of the known genetic variations related to human inherited disease (Stenson et al., 2003). To help identify disease-associated nsSNPs among a large number of neutral nsSNPs, it is important to develop computational tools that predict an nsSNP's phenotypic effect.
Results: We prepared a training set based on the variant phenotypic annotation of the SwissProt database and focused our analysis on nsSNPs having homologous 3D structures. Structural environment parameters derived from the homologous 3D structure, as well as evolutionary information derived from the multiple sequence alignment, were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm (Ng and Henikoff, 2003), which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (e.g., with no fewer than 10 homologous sequences), the performance of our method is comparable to that of the SIFT algorithm, while for nsSNPs with insufficient evolutionary information (e.g., fewer than 10 homologous sequences), our method significantly outperforms the SIFT algorithm. These findings indicate that incorporating structural information is critical to achieving good prediction accuracy when sufficient evolutionary information is not available.
Availability: The code and curated dataset are available at http://compbio.utmem.edu/snp/.
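The stratified comparison described in the Results (nsSNPs with at least 10 homologous sequences versus fewer than 10) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the record layout, and the toy records are hypothetical, and plain accuracy is used only for illustration, since the abstract does not state which performance measure was reported. Only the threshold of 10 homologous sequences comes from the text.

```python
from collections import defaultdict

# Threshold quoted in the abstract: "no fewer than 10 homologous sequences".
MIN_HOMOLOGS = 10


def stratum(n_homologs, threshold=MIN_HOMOLOGS):
    """Label an nsSNP by how much evolutionary information is available."""
    return "sufficient" if n_homologs >= threshold else "insufficient"


def accuracy_by_stratum(records, threshold=MIN_HOMOLOGS):
    """Compute accuracy separately for each evolutionary-information stratum.

    records: iterable of (n_homologs, true_label, predicted_label) tuples.
    Returns a dict mapping stratum name to the fraction of correct predictions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for n_homologs, truth, predicted in records:
        key = stratum(n_homologs, threshold)
        total[key] += 1
        correct[key] += int(truth == predicted)
    return {key: correct[key] / total[key] for key in total}


# Hypothetical toy records, invented purely to exercise the function.
toy_records = [
    (25, "disease", "disease"),
    (14, "neutral", "neutral"),
    (10, "disease", "neutral"),
    (3, "neutral", "neutral"),
    (7, "disease", "disease"),
    (0, "disease", "neutral"),
]

result = accuracy_by_stratum(toy_records)
for name, value in sorted(result.items()):
    print(f"{name}: {value:.2f}")
```

Running the same function once per predictor (for example, once on the predictions of a structure-aware classifier and once on those of a sequence-only method) would give the kind of per-stratum comparison the abstract summarizes.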
Received October 20, 2004
Revised February 17, 2005
Accepted February 28, 2005
Article
Prediction of the phenotypic effects of nonsynonymous single nucleotide polymorphisms using structural and evolutionary information
Yan Cui, E-mail: ycui2{at}utmem.edu





