Bioinformatics Advance Access originally published online on July 1, 2004
Bioinformatics 2005 21(1):90-103; doi:10.1093/bioinformatics/bth388
Bioinformatics vol. 21 issue 1 © Oxford University Press 2005; all rights reserved.
HAPLORE: a program for haplotype reconstruction in general pedigrees without recombination
1 Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham AL 35294, USA
2 Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California 1042 W. 36th Place, Los Angeles, CA 90089, USA
3 Department of Epidemiology and Public Health, Yale University School of Medicine 60 College Street, New Haven, CT 06520, USA
*To whom correspondence should be addressed.
Motivation: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem.
Methods: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members will be excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the inferred haplotype configurations through the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained.
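The idea behind the first two steps can be illustrated with a minimal sketch. This is not the HAPLORE implementation: it assumes biallelic markers coded 0/1, a single father-mother-child trio, and no recombination, and the function names `compatible_pairs` and `eliminate_trio` are hypothetical. It enumerates every haplotype pair consistent with each genotype, then discards pairs that cannot be part of any configuration in which the child inherits one whole haplotype from each parent.

```python
from itertools import product


def compatible_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of per-marker allele pairs, e.g. [(0, 1), (1, 1)].
    None marks a missing marker (assumption: biallelic markers coded 0/1).
    """
    per_marker = []
    for g in genotype:
        if g is None:
            # Missing genotype: every ordered allele assignment is possible.
            options = [(a, b) for a in (0, 1) for b in (0, 1)]
        else:
            a, b = g
            # Heterozygous markers have two possible phases.
            options = [(a, b)] if a == b else [(a, b), (b, a)]
        per_marker.append(options)

    pairs = set()
    for combo in product(*per_marker):
        h1 = tuple(x[0] for x in combo)
        h2 = tuple(x[1] for x in combo)
        pairs.add(tuple(sorted((h1, h2))))  # unordered pair
    return pairs


def eliminate_trio(father, mother, child):
    """Keep only haplotype pairs that occur in at least one valid trio.

    A trio is valid when the child carries one intact haplotype from the
    father and one from the mother (no recombination).
    """
    keep_f, keep_m, keep_c = set(), set(), set()
    for f, m, c in product(father, mother, child):
        c1, c2 = c
        if (c1 in f and c2 in m) or (c2 in f and c1 in m):
            keep_f.add(f)
            keep_m.add(m)
            keep_c.add(c)
    return keep_f, keep_m, keep_c


# Two markers: the father is doubly heterozygous, so his phase is ambiguous.
father_pairs = compatible_pairs([(0, 1), (0, 1)])
mother_pairs = compatible_pairs([(0, 0), (0, 0)])
child_pairs = compatible_pairs([(0, 0), (0, 1)])

resolved_father, _, _ = eliminate_trio(father_pairs, mother_pairs, child_pairs)
print(resolved_father)  # {((0, 1), (1, 0))}
```

In this toy trio the father starts with two compatible pairs, and the child's genotype leaves only one of them. A general pedigree would require repeating such eliminations across all nuclear families until no further pairs can be removed; the frequency-estimation step is not covered by this sketch.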
Results: We test the effectiveness and the efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have a unique haplotype configuration. In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies.
Availability: The program can be downloaded from http://bioinformatics.med.yale.edu
Contact: hongyu.zhao{at}yale.edu
Received on July 28, 2003; revised on December 4, 2003; accepted on December 4, 2003