Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows
1Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, 2Constella Group, Durham, NC 27713, 3Department of Environmental Sciences and Engineering, University of North Carolina, Chapel Hill, NC 27599 and 4Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Typical high-throughput genotyping techniques produce numerous missing calls that confound subsequent analyses, such as disease association studies. Common remedies for this problem include removing the affected markers and/or samples or, alternatively, imputing the missing data. On small marker sets, imputation is frequently based on a vote of the K-nearest-neighbor (KNN) haplotypes, but this technique is neither practical nor justifiable for large datasets.
Results: We describe a data structure that supports efficient KNN queries over arbitrarily sized, sliding haplotype windows, and evaluate its use for genotype imputation. The performance of our method enables exhaustive exploration over all window sizes and known sites in large (150K, 8.3M) SNP panels. We also compare the accuracy and performance of our methods with competing imputation approaches.
Availability: A free open source software package, NPUTE, is available at http://compgen.unc.edu/software, for non-commercial uses.
Contact: mcmillan{at}cs.unc.edu
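To make the KNN vote mentioned in the Motivation concrete, the following is a minimal brute-force sketch in Python. It is not the data structure described in the Results and is not taken from NPUTE; the names `impute_call`, `window_distance`, `half_window`, the `?` missing-call symbol, and the string encoding of haplotypes are all assumptions made for illustration. For one missing call, it compares the sample with every other sample over a window of flanking SNPs, counts mismatches, and takes a majority vote among the `k` closest samples.

```python
from collections import Counter

MISSING = "?"  # assumed symbol for a missing call (illustrative only)


def window_distance(a, b, lo, hi, skip):
    """Count mismatches between haplotypes a and b over SNPs lo..hi-1.

    The SNP being imputed (`skip`) and any missing calls are ignored.
    """
    return sum(
        1
        for j in range(lo, hi)
        if j != skip and a[j] != MISSING and b[j] != MISSING and a[j] != b[j]
    )


def impute_call(haplotypes, sample, snp, half_window, k=1):
    """Impute haplotypes[sample][snp] by a vote of the k nearest neighbors.

    Brute force: every other sample with a known call at `snp` is scanned,
    so one query costs O(samples * window length).
    """
    n_snps = len(haplotypes[sample])
    lo = max(0, snp - half_window)
    hi = min(n_snps, snp + half_window + 1)
    target = haplotypes[sample]

    candidates = []
    for idx, other in enumerate(haplotypes):
        if idx == sample or other[snp] == MISSING:
            continue  # a neighbor must have a known call at this SNP
        candidates.append((window_distance(target, other, lo, hi, snp), idx))

    if not candidates:
        return MISSING  # nothing to vote with

    candidates.sort()  # nearest first; ties broken by sample index
    votes = Counter(haplotypes[idx][snp] for _, idx in candidates[:k])
    return votes.most_common(1)[0][0]


# Toy panel: five samples, five SNPs, one missing call in sample 3.
panel = ["AACGT", "AACGT", "GGTCA", "AA?GT", "GGTCA"]
print(impute_call(panel, sample=3, snp=2, half_window=2, k=3))
```

Because each query rescans every sample across the whole window, this naive approach becomes expensive when repeated for every missing call and every candidate window size, which is the cost that an efficient sliding-window KNN structure is meant to avoid.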


