Bioinformatics Advance Access originally published online on June 22, 2007
Bioinformatics 2007 23(19):2633-2635; doi:10.1093/bioinformatics/btm308
TASSEL: software for association mapping of complex traits in diverse samples

1United States Department of Agriculture-Agricultural Research Service, 2Institute for Genomic Diversity, Cornell University, Ithaca, New York and 3Cisco Systems, RTP, NC, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Association analyses that exploit the natural diversity of a genome to map at very high resolutions are becoming increasingly important. In most studies, however, researchers must contend with the confounding effects of both population and family structure. TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) implements general linear model and mixed linear model approaches for controlling population and family structure. For result interpretation, the program allows linkage disequilibrium statistics to be calculated and visualized graphically. Database browsing and data importation are facilitated by integrated middleware. Other features include analyzing insertions/deletions, calculating diversity statistics, integrating phenotypic and genotypic data, imputing missing data and calculating principal components.
Availability: The TASSEL executable, user manual, example data sets and tutorial document are freely available at http://www.maizegenetics.net/tassel. The source code for TASSEL can be found at http://sourceforge.net/projects/tassel.
Contact: pjb39{at}cornell.edu
| 1 INTRODUCTION |
|---|
With advances in genotyping technology, including rapid increases in the number of genetic markers available for quantitative trait loci (QTL) studies (Churchill et al., 2004), association analysis has become a viable approach for the dissection of complex traits. A key issue in developing methods for analyzing association data is controlling false positives that arise from population and family structure. One widely used approach, structured association (Pritchard et al., 2000; Thornsberry et al., 2001), was first implemented in TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) to reduce the risk of false positives arising from population structure. More recently, a unified mixed model method was developed which improves on the previous method by integrating population structure and family relatedness within populations (Yu et al., 2006). TASSEL reflects these improvements and offers a variety of data manipulation and results displays. Plant, animal or human geneticists and breeders interested in performing association analysis will find this software useful.
| 2 ASSOCIATION TOOLS |
|---|
Because complex traits are usually controlled by multiple QTL, the primary goal of QTL mapping is to find associated markers for each QTL. To achieve this goal, many association studies fit markers into a linear model as fixed effects. Any QTL not associated with a marker will contribute to the residual error, thereby inflating the error term and reducing statistical power. To compound matters further, QTL not physically linked to a marker may cause spurious associations between phenotypes and that marker due to factors such as selection, population admixture or family structure. A structured association approach can partially correct for these false associations by using a Q-matrix of population membership estimates. Additional improvement can be made by incorporating multiple background QTL as a random effect in a mixed model, which takes into account covariances due to relatedness. The average relationship between individuals or lines can be estimated by kinship (K), calculated either from pedigrees or from a suitable number of random markers across the entire genome. A composite approach, Q + K, that combines information from both Q and K has been shown to be superior to using either Q or K alone (Yu et al., 2006).
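As an illustration of how a relationship matrix might be derived from random markers, the sketch below computes the proportion of marker alleles shared by each pair of inbred lines (identity by state). This is a deliberately simple stand-in and is not claimed to be the kinship estimator used by TASSEL; the function name `ibs_kinship`, the one-allele-per-marker coding and the use of `None` for missing calls are all assumptions made for the example.

```python
def ibs_kinship(genotypes):
    """Pairwise proportion of shared alleles between inbred lines.

    genotypes: one list per line, one allele per marker (hypothetical coding);
    None marks a missing call and is skipped for that pair of lines.
    """
    n = len(genotypes)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Keep only markers scored in both lines.
            pairs = [(a, b) for a, b in zip(genotypes[i], genotypes[j])
                     if a is not None and b is not None]
            if pairs:
                share = sum(a == b for a, b in pairs) / len(pairs)
            else:
                share = float("nan")  # no marker scored in both lines
            k[i][j] = k[j][i] = share
    return k


# Three hypothetical lines scored at four markers.
lines = [["A", "C", "G", "T"],
         ["A", "C", "G", "A"],
         ["G", "T", "A", "A"]]
for row in ibs_kinship(lines):
    print(row)
```

A shared-allele proportion is easy to compute but does not correct for allele frequencies, so it is best read as a similarity measure rather than as an unbiased kinship coefficient.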
The Q method for structured association analysis was implemented in TASSEL as a general linear model (GLM) function. Population membership estimates serve as covariates in the model and can be derived using programs such as STRUCTURE (Pritchard et al., 2000) or principal components analysis (PCA) (Zhao et al., 2007). For each marker-trait combination, GLM finds the ordinary least squares solution as described in Searle (1987). The model can include main effects, interactions, nested effects and covariates. The user can specify F-tests to be calculated, permutation tests to be run and estimates of model effect means to be output.
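The marker test in such a model can be sketched as a comparison of two ordinary least squares fits: a reduced model with an intercept and the Q covariates, and a full model that adds the marker. The code below is a minimal illustration under stated assumptions, not TASSEL's implementation: the helper names (`solve`, `rss`, `marker_f_test`), the numeric 0/1 marker coding and the full-rank design are all hypothetical choices for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(X, y):
    """Residual sum of squares of the least squares fit of y on the columns of X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def marker_f_test(y, marker, q):
    """F statistic for adding one numerically coded marker to intercept + Q."""
    reduced = [[1.0] + list(qi) for qi in q]
    full = [row + [float(g)] for row, g in zip(reduced, marker)]
    rss_reduced, rss_full = rss(reduced, y), rss(full, y)
    df_marker = 1
    df_error = len(y) - len(full[0])  # assumes the full design has full rank
    f = ((rss_reduced - rss_full) / df_marker) / (rss_full / df_error)
    return f, df_marker, df_error


# Hypothetical data: six lines, one Q covariate, one biallelic marker.
trait = [1.0, 2.1, 3.2, 3.9, 1.2, 4.1]
marker = [0, 0, 1, 1, 0, 1]
q = [[0.1], [0.9], [0.2], [0.8], [0.3], [0.7]]
print(marker_f_test(trait, marker, q))
```

The P-value would then be read from an F distribution with the returned degrees of freedom; that step is left out here because it needs a statistics library.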
The test of significance derived from an F-distribution assumes that the trait being analyzed has normally distributed residual error. When this is not the case, TASSEL provides two options. The first provides transformation functions that may produce roughly normal error terms, while the second uses a permutation test to generate P-values that are not distribution dependent. Algorithms for conducting such permutation tests are based on the formulation by Anderson and Ter Braak (2003).
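The idea of a distribution-free P-value can be illustrated with the simplest possible permutation scheme: shuffle the trait values across lines many times and count how often the shuffled association is at least as strong as the observed one. Note that this sketch permutes raw trait values for a single marker with no covariates; it is not the Anderson and Ter Braak procedure cited above. The function names, the squared-correlation statistic and the default of 999 permutations are assumptions made for the example.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation; assumes both x and y vary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(marker, trait, n_perm=999, seed=0):
    """Permutation P-value for a marker-trait association (no covariates)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = r_squared(marker, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if r_squared(marker, shuffled) >= observed - 1e-12:
            hits += 1
    # Counting the observed arrangement itself keeps the P-value above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: ten lines, a 0/1 coded marker and a trait.
marker = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
trait = [1.0, 1.2, 0.9, 1.1, 1.3, 3.0, 3.2, 2.9, 3.1, 3.3]
print(permutation_p_value(marker, trait))
```

Because the smallest attainable P-value is 1 / (n_perm + 1), the number of permutations sets the resolution of the test.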
The Q + K method was implemented in TASSEL as a mixed linear model (MLM) function. The statistical model can be described in Henderson's notation (Henderson, 1975) as follows:
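$$ y = X\beta + Zu + e $$

where y is the vector of phenotypic observations, β is the vector of fixed effects (including the marker and the population structure covariates, Q), u is the vector of random background genetic effects whose covariances reflect kinship (K), X and Z are the design matrices relating the observations to β and u, and e is the vector of random residuals.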
| 3 OTHER FEATURES |
|---|
In addition to providing association tools, TASSEL permits the analysis of diversity estimates such as average pairwise divergence (π) and segregating sites. Linkage disequilibrium is estimated by the standardized disequilibrium coefficient, D', as well as r2 and P-values. TASSEL also includes a variety of data extraction utilities and visualization tools, such as a sequence alignment viewer, extraction of SNPs and indels from alignments, a neighbor-joining cladogram and a variety of data graphing functions. TASSEL contains several useful data management functions. In addition to importing data in flat file formats, a data browser from the GDPC (Genomic Diversity and Phenotype Connection) project (Casstevens and Buckler, 2004) has been integrated into TASSEL to provide an interface to relational databases. GDPC can utilize multiple data sources, retrieve filtered data and export tab-delimited text files. TASSEL can merge data from different sources into a single analysis data set, impute missing data using a k-nearest-neighbor algorithm (Cover and Hart, 1967) and conduct PCA to reduce a set of correlated phenotypes. Some of these features are shown in Figure 1.
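The two linkage disequilibrium statistics named above have standard textbook definitions for a pair of biallelic loci, which the sketch below applies to phased haplotypes. It is an illustration only and not TASSEL's code: the function name `ld_stats`, the tuple-per-haplotype input format and the choice to reject monomorphic loci are assumptions, and the P-value calculation is omitted.

```python
def ld_stats(haplotypes):
    """Return (D', r^2) for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) pairs,
    one per phased haplotype (hypothetical input format).
    """
    n = len(haplotypes)
    a_ref, b_ref = haplotypes[0]  # alleles of the first haplotype serve as reference
    p_a = sum(1 for a, _ in haplotypes if a == a_ref) / n
    p_b = sum(1 for _, b in haplotypes if b == b_ref) / n
    p_ab = sum(1 for a, b in haplotypes if a == a_ref and b == b_ref) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical sample of four haplotypes in which only three of the
# four possible allele combinations occur.
sample = [("A", "C"), ("A", "C"), ("A", "T"), ("G", "T")]
print(ld_stats(sample))
```

In this sample D' reaches its maximum while r2 does not, which illustrates why the two statistics are usually reported together: D' reflects whether recombinant haplotypes are absent, whereas r2 also depends on how well the allele frequencies at the two loci match.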
| 4 IMPLEMENTATION |
|---|
This software package was developed in Java, making it compatible with multiple platforms (e.g. Windows, Mac and Linux). The package uses the standard PAL library (http://iubio.bio.indiana.edu/soft/molbio/java/pal/doc/), the COLT library (http://dsd.lbl.gov/~hoschek/colt/) and jFreeChart (http://www.jfree.org/jfreechart/). Database access is achieved by GDPC middleware (http://www.maizegenetics.net/gdpc).
| ACKNOWLEDGEMENTS |
|---|
This project is supported by the USDA-ARS and the National Science Foundation. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.
Received on March 26, 2007; revised on May 11, 2007; accepted on June 2, 2007
| REFERENCES |
|---|
Casstevens TM, Buckler ES. GDPC: connecting researchers with multiple integrated data sources. Bioinformatics (2004) 20:2839–2840.
Churchill G, et al. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet. (2004) 36:1133–1137.
Cover T, Hart P. Nearest neighbor pattern classification. Proc. IEEE Trans. Inform. Theory (1967) 13.
Hardy OJ, Vekemans X. SPAGEDi: a versatile computer program to analyze spatial genetic structure at the individual or population levels. Mol. Ecol. Notes (2002) 2:618–620.
Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics (1975) 31:423–447.
Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics (1982) 38:963–974.
Pritchard JK, et al. Association mapping in structured populations. Am. J. Hum. Genet. (2000) 67:170–181.
SAS Institute Inc. SAS/STAT software, version 9 (2002) Cary, NC, USA: SAS Institute, Inc.
Thornsberry JM, et al. Dwarf8 polymorphisms associate with variation in flowering time. Nat. Genet. (2001) 28:286–289.
Yu JM, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. (2006) 38:203–208.
Zhao K, et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet (2007) 3:e4. doi:10.1371/journal.pgen.0030004.
Figure 1 (partial caption): [...] (π). (d) LD plots with positional information.