Bioinformatics Advance Access originally published online on February 10, 2005
Bioinformatics 2005 21(9):2128-2129; doi:10.1093/bioinformatics/bti282
PowerMarker: an integrated analysis environment for genetic marker analysis

Bioinformatics Research Center Campus Box 7566 North Carolina State University Raleigh, NC 27695-7566, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: PowerMarker delivers a data-driven, integrated analysis environment (IAE) for genetic data. The IAE integrates data management, analysis and visualization in a user-friendly graphical user interface. It accelerates the analysis lifecycle and enables users to maintain data integrity throughout the process. An ever-growing list of more than 50 different statistical analyses for genetic markers has been implemented in PowerMarker.
Availability: www.powermarker.net
Contact: powermarker{at}hotmail.com
| INTRODUCTION |
|---|
Fundamental and applied population genetics, quantitative genetics and human genetics depend heavily on the availability of genetic markers. Markers are used by population geneticists to investigate the origin, genetic diversity and population structure of alleles, by evolutionists to describe genetic relationships among species or populations and by geneticists to study linkage disequilibrium (LD) within or between genes. Markers are now widely used in the search for genes affecting human diseases by identifying statistical associations between the genetic markers and the traits of interest. Traditionally, these tasks have been performed by a variety of separate tools (Felsenstein, 1993; Lewis and Zaykin, 2001, a free program distributed over the internet from http://lewis.eeb.uconn.edu/lewishome/software.html), each of which has its own specific input and output formats. Such tools provide geneticists with the functionality needed to analyze their data. However, many of these tools, implemented as standalone programs, are not especially user-friendly and require users to spend considerable time on data preparation and/or result parsing. To the authors' knowledge, there is no publicly available package for genetic marker analysis that integrates the data management, statistical analysis and visualization aspects of genetic data analysis through a single user-friendly graphical interface. This note describes such an application, which allows scientists to perform genetic marker data analysis in an integrated analysis environment (IAE).
| OVERVIEW |
|---|
PowerMarker includes a powerful graphical interface (http://www.powermarker.net). The user interface allows the user to manage projects and a variety of data objects, to perform over 50 different analyses in more than 20 modules, and to manipulate and view all data objects. A project in PowerMarker consists of data objects (datasets, tables, etc.) and folders. Data objects in a project are organized by object type. Users can create folders and place shortcuts to data objects in a folder by simple drag-and-drop operations. Upon input, all data are serialized in a binary format to reduce storage demands and improve computational efficiency. Perhaps more importantly, serialized data are not human readable and are therefore not subject to casual editing, a feature useful for error reduction. PowerMarker can still produce human-readable datasets from the binary forms.
PowerMarker handles a variety of marker data, including both haplotypes and diplotypes. The gametic phase of diplotype data can be known or unknown. Examples of marker data include microsatellite data, single nucleotide polymorphism (SNP) data and restriction fragment length polymorphism (RFLP) data. PowerMarker does not require a specific input format such as NEXUS. Instead, it reads table-like formats directly. When importing a dataset, the user can choose one of two modes: the dataset wizard guides the user step-by-step through the import process, whereas the batch importer simultaneously imports multiple datasets that share the same format. PowerMarker can export datasets in a variety of formats.
| METHODS IMPLEMENTED IN POWERMARKER |
|---|
PowerMarker computes several summary statistics for each marker locus, including allele number, missing proportion, heterozygosity, gene diversity, polymorphism information content (PIC) and stepwise patterns for microsatellite data. Variances and confidence intervals of these statistics are estimated by non-parametric bootstrapping across different loci. Allele frequencies and genotype frequencies are estimated by simple counting, while haplotype frequencies are estimated by the widely used EM algorithm (Excoffier and Slatkin, 1995). The variances and confidence intervals of these frequencies are estimated by bootstrapping across individuals. Haplotype estimation in PowerMarker is highly optimized for SNP data by taking advantage of the binary feature of these markers. We also have efficient EM algorithms for trio families (Rohde and Fuerst, 2001).
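The per-locus statistics above can be illustrated with a minimal Python sketch. It estimates allele frequencies by counting, gene diversity as 1 − Σp_i², and PIC by the commonly used definition 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². The function names are hypothetical and this is not PowerMarker's code; in particular, PowerMarker's own estimators may apply sample-size corrections that are omitted here.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Estimate allele frequencies at one locus by simple counting."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def gene_diversity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """Polymorphism information content for one locus."""
    ps = list(freqs.values())
    homozygous = sum(p * p for p in ps)
    cross = sum(2.0 * (pi ** 2) * (pj ** 2) for pi, pj in combinations(ps, 2))
    return 1.0 - homozygous - cross


# Example: a locus with two equally frequent alleles.
freqs = allele_frequencies(["A", "A", "B", "B"])
print(gene_diversity(freqs), pic(freqs))
```

For a biallelic locus PIC is always smaller than gene diversity, because it additionally discounts matings whose offspring are uninformative.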
PowerMarker implements all common methods for testing Hardy-Weinberg equilibrium and linkage equilibrium, including χ² tests, likelihood ratio tests and exact tests. Most common measures of LD are calculated by PowerMarker, including D' and r². PowerMarker also creates several LD-distance plots directly in Microsoft Excel. The LD matrix constructed in PowerMarker can be viewed internally by its 2D viewer and exported as a graphics file for further editing.
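For two biallelic loci, the LD measures named above follow directly from the four haplotype frequencies. The sketch below uses the standard textbook definitions of D, D' and r²; the function name and argument layout are assumptions made for illustration, and D' is reported as an absolute value, which may differ from the convention PowerMarker uses.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return D, |D'| and r^2 for two biallelic loci from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-8:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A at locus 1
    p_B = p_AB + p_aB          # frequency of allele B at locus 2
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B       # raw disequilibrium coefficient
    if d > 0:
        d_max = min(p_A * (1.0 - p_B), (1.0 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1.0 - p_A) * (1.0 - p_B))
    return {"D": d, "D_prime": abs(d) / d_max, "r2": d * d / denom}


# Example: two loci in complete LD.
print(ld_measures(0.5, 0.0, 0.0, 0.5))
```

With unphased diplotype data the haplotype frequencies would first have to be estimated, for example by an EM algorithm, before these formulas apply.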
Differentiation among populations is often summarized using F-statistics. PowerMarker performs four different types of F-statistics analysis. Several data selection modules are included in PowerMarker for experimental design purposes. The goal of line selection is to choose a core set of lines with maximal gene diversity from a larger germplasm collection. We implemented a flexible simulated annealing algorithm to do the combinatorial optimization (K.Liu, Y.Xiang and S.V.Muse, submitted for publication). A unique feature of the algorithm is that general constraints can be incorporated in the algorithm. The idea underlying marker selection is to choose relatively uncorrelated markers from a larger pool considering linkage disequilibrium between markers. Phylogenetic analysis in PowerMarker covers four modules: the frequency module computing allele frequencies, the distance module computing 19 different distances based on allele frequencies or repeat patterns of microsatellite markers, the tree module computing trees from distances and the bootstrap module that generates a list of trees by bootstrapping across markers.
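The line selection idea can be sketched as a generic simulated annealing search. The code below is not the authors' algorithm, whose details are in the cited unpublished manuscript; it is a minimal illustration that assumes inbred lines represented as tuples with one allele per locus, uses mean gene diversity across loci as the objective, and omits the general constraints mentioned above. All names and default parameters are hypothetical.

```python
import math
import random
from collections import Counter


def mean_gene_diversity(lines):
    """Average over loci of 1 - sum(p_i^2), computed on the given lines."""
    n = len(lines)
    n_loci = len(lines[0])
    total = 0.0
    for j in range(n_loci):
        counts = Counter(line[j] for line in lines)
        total += 1.0 - sum((c / n) ** 2 for c in counts.values())
    return total / n_loci


def select_core_set(lines, k, n_steps=2000, t0=0.1, cooling=0.995, seed=0):
    """Pick k line indices with high mean gene diversity by simulated annealing."""
    rng = random.Random(seed)
    indices = list(range(len(lines)))
    current = rng.sample(indices, k)
    cur_score = mean_gene_diversity([lines[i] for i in current])
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(n_steps):
        outside = [i for i in indices if i not in current]
        if not outside:
            break
        # Propose swapping one selected line for one unselected line.
        cand = list(current)
        cand[rng.randrange(k)] = rng.choice(outside)
        score = mean_gene_diversity([lines[i] for i in cand])
        delta = score - cur_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = cand, score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
        temp *= cooling  # geometric cooling schedule
    return sorted(best), best_score


# Example: six lines typed at two loci, choose a core set of three.
lines = [("A", "X"), ("A", "X"), ("A", "X"), ("B", "Y"), ("C", "Z"), ("A", "X")]
core, score = select_core_set(lines, k=3)
print(core, score)
```

Constraints such as forcing particular lines into the core set could be added by restricting which positions the swap move is allowed to change.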
PowerMarker offers three methods for testing an association between a single marker and the affected status of individuals (must be binary): the allele case-control test, genotypic case-control test and the multiallelic trend test. An F-test is provided for quantitative traits. Haplotype trend regression (Zaykin et al., 2002) can be applied to both quantitative traits and binary traits. The coalescence simulation with hotspot recombination model of Posada and Wiuf (2003) is implemented in PowerMarker. With no hotspot defined, the simulation becomes the classical coalescence model with homogeneous recombination (Hudson, 1983; Hudson and Kaplan, 1990). PowerMarker's SNP identification tool identifies SNPs from sequence data. A variety of user-settable options are available for this tool. PowerMarker also includes modules for Mantel tests and contingency table analyses.
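As an illustration of the simplest of these tests, the following sketch carries out an allelic case-control test for a biallelic marker as a Pearson χ² test on the 2 × 2 table of allele counts. It is a generic textbook version, not PowerMarker's implementation; the function name and input layout are assumptions, and the P-value uses the identity P(χ²₁ > x) = erfc(√(x/2)) for one degree of freedom.

```python
import math


def allelic_case_control(case_counts, control_counts):
    """Pearson chi-square test on allele counts (n_A, n_a) in cases and controls.

    Returns the chi-square statistic and its P-value on 1 degree of freedom.
    """
    table = [list(case_counts), list(control_counts)]
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_tot)
    if 0 in row_tot or 0 in col_tot:
        raise ValueError("every row and column total must be positive")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Example: 30 A and 10 a alleles in cases, 20 A and 20 a in controls.
print(allelic_case_control((30, 10), (20, 20)))
```

Allele counts are obtained from genotype counts as n_A = 2 n_AA + n_Aa and n_a = 2 n_aa + n_Aa; treating alleles as independent observations implicitly assumes Hardy-Weinberg equilibrium.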
For details on using the software and explanations of the underlying algorithms, we refer readers to the manual and the references listed there. PowerMarker does not currently implement any linkage analyses, but these are planned for a future version.
| SPECIAL FEATURES |
|---|
PowerMarker offers several special features for visualization and data analysis:
- Two-dimensional (2D) plots and triangle plots: 2D plots are used extensively for visualizing linkage disequilibria results. The 2D plot module in PowerMarker provides a powerful editor for visualizing two-way tables. The resulting plot can be saved as a Windows Meta File (WMF) for further editing. Triangle plots are useful for characterizing population structure.
- Excel integration: Datasets and tables in PowerMarker can be opened in Excel by double-clicking in the internal viewers. PowerMarker also directly draws triangle plots in Excel.
- Multithreading batch system: all of the analyses and data manipulations in PowerMarker can support multiple datasets in a graphical user interface. Also, each analysis runs in its own thread, so it is possible to pause/resume or cancel an analysis without affecting the graphical interface or other running analyses. For computers with multiple CPUs, the multithreading system makes it possible to allocate multiple CPUs to the program.
| IMPLEMENTATION |
|---|
The PowerMarker package was written in Visual C# and runs under the Microsoft .NET framework. The numerical library in PowerMarker was written in Visual C++. Excel integration was implemented through DCOM. The software imposes no limit on the sample size or number of markers in a dataset, although some analyses, such as haplotype frequency estimation, are limited by available computer memory. The authors update the software frequently to support new analyses.
| Acknowledgments |
|---|
The authors gratefully acknowledge the John Doebley and Ed Buckler labs for their significant and continuing support. Input from Bruce Weir and Elizabeth Thompson is recognized and appreciated. The development of PowerMarker was partly supported by NSF grants DBI-0096033 and DEB-9996118.
| Footnotes |
|---|
Present address: GlaxoSmithKline, PO Box 13398, Five Moore Drive, Research Triangle Park, NC, USA
Received on May 12, 2004; revised on January 17, 2005; accepted on January 18, 2005
| REFERENCES |
|---|
Excoffier, L. and Slatkin, M. (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol. Biol. Evol., 12, 921-927.
Felsenstein, J. (1993) PHYLIP (Phylogeny Inference Package), version 3.5c. Department of Genetics, University of Washington, Seattle.
Hudson, R.R. (1983) Properties of a neutral allele model with intragenic recombination. Theor. Pop. Biol., 23, 183-201.
Hudson, R.R. and Kaplan, N. (1990) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics, 111, 147-164.
Lewis, P.O. and Zaykin, D. (2001) Genetic data analysis: computer program for the analysis of allelic data. Version 1.0.
Posada, D. and Wiuf, C. (2003) Simulating haplotype blocks in the human genome. Bioinformatics, 19, 289-290.
Rohde, K. and Fuerst, R. (2001) Haplotyping and estimation of haplotype frequencies for closely linked biallelic multilocus genetic phenotypes including nuclear family information. Hum. Mutat., 17, 289-295.
Zaykin, D.V., Westfall, P.H., Young, S.S., Karnoub, M.C., Wagner, M.J. and Ehm, M.G. (2002) Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum. Hered., 53, 79-91.