Bioinformatics Advance Access originally published online on June 9, 2005
Bioinformatics 2005 21(16):3445-3447; doi:10.1093/bioinformatics/bti529

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

PEDSTATS: descriptive statistics, graphics and quality assessment for gene mapping data

Janis E. Wigginton * and Gonçalo R. Abecasis

Center for Statistical Genetics, Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48103, USA

*To whom correspondence should be addressed.


    Abstract

Summary: We describe a tool that produces summary statistics and basic quality assessments for gene-mapping data, accommodating either pedigree or case-control datasets. Our tool can also produce graphic output in the PDF format.

Availability: http://www.sph.umich.edu/csg/abecasis/Pedstats/download/

Contact: wiggie{at}umich.edu

Supplementary information: http://www.sph.umich.edu/csg/abecasis/Pedstats/

A crucial first step in the analysis of gene mapping data is the careful description of the available data, including, for example, genotyping completeness and heterozygosities for genetic markers, and distributions and familial correlations for quantitative traits. Although a number of programs now provide some facilities for data checking or summary (Mukhopadhyay et al., 2005; Lange et al., 1988; Elston et al., 2004; O'Connell and Weeks, 1998), complete screening and summary of genetic data frequently involves the use of multiple programs and/or in-house tools. As the scale of the datasets available for analysis increases, this process can become particularly challenging. For example, with the advent of high-throughput single nucleotide polymorphism genotyping technologies, datasets will soon be available that include genotypes for hundreds of thousands or millions of markers for each individual. In addition, with the focus on uncovering the genetic basis of complex disease, it is likely that collaborative projects will collect samples with hundreds or thousands of phenotypes, each measured on thousands of individuals. We have developed PEDSTATS, a freely available utility for summarizing salient features and performing basic quality checks on gene-mapping data. Our utility can conveniently handle these very large datasets, and here we summarize its main features.

PEDSTATS runs on any platform where a modern C++ compiler is available, including those based on the Linux, UNIX, Windows and Mac OS X operating systems. It is a command-line utility that can produce both text output to the console and graphical output to a PDF file. Its major capabilities can be grouped into four areas: (1) checks of input formats and pedigree consistency, (2) checks and descriptions of genetic marker data, (3) checks and descriptions of quantitative traits and covariates and (4) descriptions of discrete traits. We describe each of these in turn below.

The first step in any analysis is the validation of input files. At this stage, common data-format errors such as missing or extraneous columns are reported. Next, the reported family structures are validated to ensure that all connecting individuals are present and that sex-codes are consistent for the various individuals. If desired, large pedigrees can be trimmed to remove uninformative individuals with no phenotype or genotype data, or separated into disconnected family units. A brief summary of the number of pedigrees, individuals and a distribution of individuals per family is produced. This information can be graphically summarized (Fig. 1A is an example summarizing the distribution of family sizes in one large dataset) and, optionally, includes counts for various types of relative pairs which can be further broken down by sex. Individuals with no phenotype or genotype information can be automatically removed and a new set of input files generated. PEDSTATS readily accepts files prepared for other packages we have developed, including those prepared for linkage analyses with Merlin (Abecasis et al., 2002), association analyses with QTDT (Abecasis et al., 2000) and relationship inference with GRR (Abecasis et al., 2001). Other popular formats, such as those used by the LINKAGE package (Lathrop et al., 1985) and by MENDEL and related tools (Lange et al., 1988) are also accommodated.
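The family-structure checks described above can be illustrated with a short Python sketch. It is not the PEDSTATS implementation: the record layout (family, individual, father, mother, sex), the codes '0' for an unknown parent and '1'/'2' for male/female, and the function names are all assumptions chosen for this example.

```python
from collections import Counter, namedtuple

# Hypothetical record layout: family id, person id, father id, mother id, sex.
# Assumed codes: "0" = parent unknown, "1" = male, "2" = female.
Person = namedtuple("Person", "fam pid father mother sex")


def check_pedigree(people):
    """Return a list of messages describing missing parents and sex-code conflicts."""
    problems = []
    by_id = {(p.fam, p.pid): p for p in people}
    for p in people:
        for role, parent_id, expected_sex in (
            ("father", p.father, "1"),
            ("mother", p.mother, "2"),
        ):
            if parent_id == "0":
                continue  # founder, or parent not recorded
            parent = by_id.get((p.fam, parent_id))
            if parent is None:
                problems.append(f"{p.fam}/{p.pid}: {role} {parent_id} is not in the file")
            elif parent.sex != expected_sex:
                problems.append(
                    f"{p.fam}/{p.pid}: {role} {parent_id} has sex code {parent.sex}"
                )
    return problems


def family_size_distribution(people):
    """Map family size -> number of families of that size."""
    sizes = Counter(p.fam for p in people)
    return Counter(sizes.values())


if __name__ == "__main__":
    demo = [
        Person("1", "1", "0", "0", "1"),
        Person("1", "2", "0", "0", "2"),
        Person("1", "3", "1", "2", "1"),
        Person("1", "4", "2", "9", "2"),  # father is female, mother is absent
    ]
    for message in check_pedigree(demo):
        print(message)
    print(dict(family_size_distribution(demo)))
```

A real validator would also need to handle loops, disconnected family units and format-specific conventions, which this sketch leaves out.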



Fig. 1 Examples of available graphical output. (A) provides information on the distribution of family sizes; (B) summarizes the observed genotype distribution and the exact distribution of heterozygotes conditional on observed allele counts; (C) provides information on the distribution of a quantitative trait; and (D) summarizes relative pair correlations. More detailed descriptions and examples are available on our website.

 
When verifying genetic marker data, PEDSTATS reports basic statistics like heterozygosity and genotyping completeness and can produce graphical summaries of allele and genotype frequencies. After automatic grouping of rare alleles, conformance of observed genotypes with Hardy–Weinberg equilibrium can be checked with a χ² test for multi-allelic markers or an exact test for bi-allelic markers (Wigginton et al., 2005). Results of Hardy–Weinberg tests, including an exact distribution for the number of heterozygotes in the sample, can be presented graphically (e.g. Fig. 1B). Mendelian inheritance checks for both autosomal and X-linked marker data are also carried out using a genotype elimination algorithm that finds all inconsistencies in pedigrees without loops (Lange and Goradia, 1987; O'Connell and Weeks, 1999). Verifying Mendelian consistency prior to analysis of genetic marker data can be a crucial step (Lange and Goradia, 1987; O'Connell and Weeks, 1998), since most genetic analysis programs do not model genotyping error explicitly (for an exception, see Sobel et al., 2002).
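As an illustration of the exact test for bi-allelic markers, the following Python sketch builds the distribution of heterozygote counts conditional on the observed allele counts and sums the probabilities of all outcomes no more likely than the observed one, in the spirit of Wigginton et al. (2005). It is a sketch under stated assumptions rather than the PEDSTATS source code: the function name `hwe_exact_pvalue` and its argument names are invented for this example.

```python
def hwe_exact_pvalue(n_het, n_hom1, n_hom2):
    """Two-sided exact Hardy-Weinberg P-value for a bi-allelic marker.

    Arguments are the observed counts of heterozygotes and of the two
    homozygous genotypes (hypothetical interface for this sketch).
    """
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        return 1.0
    rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele

    # probs[h] holds the unnormalised probability of h heterozygotes.
    probs = [0.0] * (rare + 1)

    # Start near the most likely heterozygote count; its parity must
    # match the parity of the rare-allele count.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Recurrence towards fewer heterozygotes.
    h, hom_r = mid, (rare - mid) // 2
    hom_c = n - h - hom_r
    while h >= 2:
        probs[h - 2] = probs[h] * h * (h - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        h, hom_r, hom_c = h - 2, hom_r + 1, hom_c + 1

    # Recurrence towards more heterozygotes.
    h, hom_r = mid, (rare - mid) // 2
    hom_c = n - h - hom_r
    while h <= rare - 2:
        probs[h + 2] = probs[h] * 4.0 * hom_r * hom_c / ((h + 2) * (h + 1))
        h, hom_r, hom_c = h + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    observed = probs[n_het]
    p_value = sum(p for p in probs if p <= observed) / total
    return min(1.0, p_value)


if __name__ == "__main__":
    # Two individuals, one homozygote of each type and no heterozygotes.
    print(hwe_exact_pvalue(0, 1, 1))
```

The normalised `probs` list is the exact distribution of heterozygote counts, which is the quantity plotted in panels such as Fig. 1B.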

For quantitative traits and covariates, PEDSTATS reports the range, mean and variance of the trait distribution along with the correlation between siblings. Several graphics, including histograms of the overall trait distribution and comparisons of distributions between males and females, can be generated (as illustrated in Fig. 1, Panel C, which summarizes the distribution of ‘Height’ in one large dataset). These can be helpful in detecting outliers as well as deviations from approximate normality, which is important for many quantitative trait analyses (Allison et al., 1999). Optionally, correlations for other relative pair types can be calculated and plotted (as illustrated in Fig. 1, Panel D, which summarizes the correlations between ‘Weight’ for different relative pairs) and stratified by sex, if desired. Correlations between relatives can provide information about the overall impact of genes on a particular trait. In the example, it is clear that the correlation of the variable ‘Weight’ for first degree relatives (in this case, parent–offspring and sibling pairs) is higher than for more distant relatives (half-sibling, avuncular, grandparent–grandchild and cousin pairs). When an age variable is present, we have implemented checks to ensure that values recorded for each individual are compatible with those of their ancestors, subject to user-specified minimum and maximum generation times.
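Two of the checks in this paragraph can be sketched in Python. The first function estimates a sibling correlation by entering every sibling pair in both orders, which is one common estimator and only an assumption about how such a correlation might be computed. The second flags parent–offspring age gaps outside a permitted range; it looks at parents only, not at more distant ancestors, and the default limits of 12 and 70 years are arbitrary placeholders for the user-specified generation times. Neither function is taken from PEDSTATS.

```python
from itertools import combinations


def sib_pair_correlation(sibships):
    """Correlation over all sibling pairs, each pair entered in both orders.

    sibships: iterable of lists of trait values, one list per sibship
    (hypothetical input layout). Returns None if it cannot be computed.
    """
    pairs = []
    for values in sibships:
        for a, b in combinations(values, 2):
            pairs.append((a, b))
            pairs.append((b, a))  # symmetric entry: sibling order is arbitrary
    if not pairs:
        return None
    n = len(pairs)
    mean = sum(x for x, _ in pairs) / n
    var = sum((x - mean) ** 2 for x, _ in pairs) / n
    if var == 0:
        return None
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / n
    return cov / var


def age_conflicts(ages, parents, min_gap=12, max_gap=70):
    """List (child, parent, gap) where the parent-child age gap is out of range.

    ages: dict person id -> age; parents: dict child id -> (father id, mother id).
    min_gap and max_gap stand in for user-specified generation times.
    """
    conflicts = []
    for child, parent_ids in parents.items():
        for parent in parent_ids:
            if child in ages and parent in ages:
                gap = ages[parent] - ages[child]
                if not (min_gap <= gap <= max_gap):
                    conflicts.append((child, parent, gap))
    return conflicts


if __name__ == "__main__":
    print(sib_pair_correlation([[170.0, 172.0], [160.0, 161.0, 159.0]]))
    print(age_conflicts({"kid": 10, "dad": 40, "mom": 15}, {"kid": ("dad", "mom")}))
```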

Finally, for discrete traits, PEDSTATS reports the proportion of phenotyped individuals and provides a breakdown of affected individuals. A summary of affected, unaffected and discordant pairs can also be produced, and may help guide decisions on whether a dataset contains sufficient information for an affected relative pair analysis to be carried out (Risch, 1990; Whittemore and Halpern, 1994). As with the other analysis options, discrete trait reports can be segregated by sex.
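The pair summary can be illustrated for the simplest case of sibling pairs. In this Python sketch the status codes 'A' (affected) and 'U' (unaffected), the use of None for an unknown phenotype, and the function name are assumptions made for the example; it does not reproduce PEDSTATS output and ignores other relative pair types.

```python
from collections import Counter
from itertools import combinations


def classify_sib_pairs(sibships):
    """Count affected, unaffected and discordant sibling pairs.

    sibships: iterable of lists of status codes, one list per sibship.
    Assumed codes: "A" affected, "U" unaffected, None unknown (skipped).
    """
    counts = Counter(affected=0, unaffected=0, discordant=0)
    for statuses in sibships:
        known = [s for s in statuses if s in ("A", "U")]
        for a, b in combinations(known, 2):
            if a == "A" and b == "A":
                counts["affected"] += 1
            elif a == "U" and b == "U":
                counts["unaffected"] += 1
            else:
                counts["discordant"] += 1
    return counts


if __name__ == "__main__":
    print(dict(classify_sib_pairs([["A", "A", "U"], ["U", "U", None]])))
```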

In addition to the ability to report statistics separately for different relative pairs and segregate results by sex, PEDSTATS can produce reports for individual families and allows various filters to be applied to input data prior to analysis. For example, all analyses can be restricted to affected individuals (for a specific trait) or to individuals with a minimal amount of genotype data.

We hope our tool will prove valuable to scientists hoping to discern important features of their data, and ease the burdensome task of verifying the consistency and integrity of input formats. Executables, source code and a web-based tutorial that explains input file format, implementation details and output for various tests are available from our website.


    Acknowledgments
 
This work was supported by research grants from the National Human Genome Research Institute and the National Eye Institute.

Conflict of Interest: none declared.

Received on April 28, 2005; revised on June 5, 2005; accepted on June 6, 2005

    REFERENCES

    Abecasis, G.R., et al. (2000) A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet., 66, 279–292.

    Abecasis, G.R., et al. (2001) GRR: graphical representation of relationship errors. Bioinformatics, 17, 742–743.

    Abecasis, G.R., et al. (2002) Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet., 30, 97–101.

    Allison, D.B., et al. (1999) Testing the robustness of the likelihood-ratio test in a variance-component quantitative-trait loci-mapping procedure. Am. J. Hum. Genet., 65, 531–544.

    Elston, R., Bailey-Wilson, J., Bonney, G., Tran, L., Keats, B., Wilson, A. (2004) SAGE Statistical Analysis for Genetic Epidemiology, Version 5.0.

    Lange, K. and Goradia, T.M. (1987) An algorithm for automatic genotype elimination. Am. J. Hum. Genet., 40, 250–256.

    Lange, K., et al. (1988) Programs for pedigree analysis: MENDEL, FISHER, and dGENE. Genet. Epidemiol., 5, 471–472.

    Lathrop, G.M., et al. (1985) Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am. J. Hum. Genet., 37, 482–498.

    Mukhopadhyay, N., et al. (2005) Mega2: data-handling for facilitating genetic linkage and association analyses. Bioinformatics, 21, 2556–2557.

    O'Connell, J.R. and Weeks, D.E. (1998) PedCheck: a program for identification of genotype incompatibilities in linkage analysis. Am. J. Hum. Genet., 63, 259–266.

    O'Connell, J.R. and Weeks, D.E. (1999) An optimal algorithm for automatic genotype elimination. Am. J. Hum. Genet., 65, 1733–1740.

    Risch, N. (1990) Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am. J. Hum. Genet., 46, 229–241.

    Sobel, E., et al. (2002) Detection and integration of genotyping errors in statistical genetics. Am. J. Hum. Genet., 70, 496–508.

    Whittemore, A.S. and Halpern, J. (1994) A class of tests for linkage using affected pedigree members. Biometrics, 50, 118–127.

    Wigginton, J.E., et al. (2005) A note on exact tests of Hardy–Weinberg equilibrium. Am. J. Hum. Genet., 76, 887–893.




