Bioinformatics Advance Access published online on June 18, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn311
Predicting Functional Regulatory Polymorphisms
Scripps Genomic Medicine and the Scripps Translational Sciences Institute, Scripps Health and Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037
*To whom correspondence should be addressed. Nicholas J. Schork, E-mail: nschork{at}scripps.edu
Abstract
Motivation: Limited availability of data has hindered the development of algorithms that can identify functionally meaningful regulatory single nucleotide polymorphisms (rSNPs). Given the large number of common polymorphisms known to reside in the human genome, identifying functional rSNPs via laboratory assays will be costly and time-consuming. Therefore, appropriate bioinformatics strategies for predicting functional rSNPs are necessary. Recent data from the ENCODE Project have significantly expanded the amount of available functional information relevant to noncoding regions of the genome and, importantly, led to the conclusion that many functional elements in the human genome are not conserved.
Results: In this manuscript, we describe how ENCODE data can be leveraged to probabilistically determine the functional and phenotypic significance of noncoding SNPs (ncSNPs). The method achieves excellent sensitivity (80%) and specificity (99%) based on a set of known phenotypically relevant and non-functional SNPs. In addition, we show through cross-validation analyses that our method is not overtrained.
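For readers less familiar with the two performance measures quoted above, the following is a minimal sketch of how sensitivity and specificity are computed from binary predictions. It is not the authors' code; the function name `sensitivity_specificity` and the toy labels are hypothetical and chosen only for illustration (1 stands for a functional SNP, 0 for a non-functional one).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Labels: 1 = functional SNP, 0 = non-functional SNP (an assumed encoding).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Need at least one positive and one negative example")

    sensitivity = tp / (tp + fn)  # fraction of functional SNPs recovered
    specificity = tn / (tn + fp)  # fraction of non-functional SNPs rejected
    return sensitivity, specificity


# Toy labels invented for illustration only; they are not data from the paper.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a cross-validation setting, the same calculation would be applied to predictions made on held-out folds, so that the reported figures reflect SNPs the classifier did not see during training.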
Availability: The software platforms used in our analyses are freely available (http://www.cs.waikato.ac.nz/ml/weka/). In addition, we provide the training dataset (Supplemental Table 3) and our predictions (Supplemental Table 6) in the supplementary material.
Associate Editor: Dr. Alex Bateman
Received on April 21, 2008; revised on June 6, 2008; accepted on June 12, 2008