Bioinformatics Advance Access published online on November 4, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn572
How Frugal is Mother Nature with Haplotypes?
1Department of Computer Science, 2Department of Cardiology, 3Division of Biostatistics, 4Department of Biology, 5Department of Genetics, Washington University, St. Louis, MO, USA
*To whom correspondence should be addressed. Prof. Weixiong Zhang, E-mail: weixiong.zhang{at}wustl.edu
Abstract
Motivation: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first and most critical step is the ascertainment of a biologically sound model to be optimized. Many of the models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution.
Results: This paper examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that, for the data sets with known solutions, there are relatively few unique haplotypes, though not always the fewest possible. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and that this number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, and most of them are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties for haplotype inference models. At a higher level, and with broad applicability, this paper illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
Associate Editor: Dr. Alex Bateman
Received on July 29, 2008; revised on October 26, 2008; accepted on October 30, 2008