Bioinformatics Advance Access originally published online on November 15, 2007
Bioinformatics 2007 23(23):3178-3184; doi:10.1093/bioinformatics/btm496
Genome-wide selection of tag SNPs using multiple-marker correlation
Algorithm and Data Analysis, Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California, USA
To whom correspondence should be addressed.
Abstract
Motivation: The tag SNP approach is a valuable tool in whole-genome association studies, and a variety of algorithms have been proposed to identify an optimal tag SNP set. Currently, most tag SNP selection is based on two-marker (pairwise) linkage disequilibrium (LD). Recent literature has shown that multiple-marker LD also contains useful information that can further increase the genetic coverage of a tag SNP set. Tag SNP selection methods that incorporate multiple-marker LD are therefore expected to offer better genetic coverage and greater statistical power.
Results: We propose a novel algorithm that selects tag SNPs iteratively. In each iteration, the SNP that captures the most neighboring SNPs (through pairwise and multiple-marker LD) is selected as a tag SNP. We optimized the algorithm and its implementation so that the approach is feasible on typical present-day workstations. Benchmarked on HapMap release 21, our algorithm outperforms the standard pairwise LD approach in several respects. (i) For a fixed number of tag SNPs, it improves genetic coverage over its conventional pairwise counterpart (e.g. by 7.2% for 200K tag SNPs in HapMap CEU). (ii) For a fixed genetic coverage, it substantially reduces genotyping costs (e.g. a 34.1% saving in HapMap CEU at 90% coverage). (iii) Tag SNPs identified using multiple-marker LD port well across closely related ethnic groups, and (iv) they show higher statistical power in association tests than tag SNPs selected using conventional methods.
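The abstract describes the selection loop only at a high level. The following is a minimal greedy sketch of that idea in Python, assuming that pairwise r² values and, for each pair of SNPs, the set of SNPs predicted by their two-marker haplotype have already been computed. The function name `greedy_tag_selection`, the inputs `r2` and `multi_capture`, the 0.8 r² threshold and the toy data are all hypothetical illustrations, not the multiTag implementation, which may define capture and multiple-marker LD differently.

```python
def greedy_tag_selection(snps, r2, multi_capture, threshold=0.8):
    """Hypothetical greedy tag SNP selection sketch.

    snps: list of SNP identifiers.
    r2: dict mapping frozenset({a, b}) -> pairwise r^2 (assumed precomputed).
    multi_capture: dict mapping frozenset({t1, t2}) -> set of SNPs that the
        two-marker haplotype of t1 and t2 predicts (assumed precomputed).
    Returns the ordered tag list and the set of captured SNPs.
    """
    tags, captured = [], set()
    while len(captured) < len(snps):
        best, best_gain = None, set()
        for cand in snps:
            if cand in tags:
                continue
            gain = {cand}  # a tag SNP trivially captures itself
            # Pairwise LD: SNPs in strong LD with the candidate.
            for s in snps:
                if s != cand and r2.get(frozenset((cand, s)), 0.0) >= threshold:
                    gain.add(s)
            # Multiple-marker LD: SNPs captured jointly with an existing tag.
            for t in tags:
                gain |= multi_capture.get(frozenset((cand, t)), set())
            gain -= captured  # count only newly captured SNPs
            if len(gain) > len(best_gain):
                best, best_gain = cand, gain
        if best is None:
            break
        tags.append(best)
        captured |= best_gain
    return tags, captured


# Toy data (invented for illustration only).
snps = ["A", "B", "C", "D", "E"]
r2 = {frozenset(("A", "B")): 0.9, frozenset(("C", "D")): 0.85}
multi = {frozenset(("A", "C")): {"E"}}  # haplotype of A and C predicts E

tags, captured = greedy_tag_selection(snps, r2, multi)
pairwise_tags, _ = greedy_tag_selection(snps, r2, {})
print(tags, len(captured) / len(snps))  # genetic coverage as fraction captured
print(pairwise_tags)
```

On this toy input the multiple-marker version covers all five SNPs with two tags (`A`, `C`), whereas the pairwise-only run needs a third tag (`E`), which mirrors in miniature the kind of genotyping saving the abstract reports at fixed coverage.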
Availability: A software suite, multiTag, has been developed based on this algorithm. The program is freely available upon written request to the author at ke_hao{at}merck.com
Contact: ke_hao{at}163.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Martin Bishop
Present address: Rosetta Inpharmatics, a wholly owned subsidiary of Merck and Co. Inc., 401 Terry Ave. N., Seattle, WA, USA.
Received on May 21, 2007; revised on September 8, 2007; accepted on September 28, 2007