Bioinformatics Advance Access originally published online on June 22, 2007
Bioinformatics 2007 23(19):2633-2635; doi:10.1093/bioinformatics/btm308

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

TASSEL: software for association mapping of complex traits in diverse samples

Peter J. Bradbury 1,†,*, Zhiwu Zhang 2,†, Dallas E. Kroon 2,†, Terry M. Casstevens 2, Yogesh Ramdoss 3 and Edward S. Buckler 1,2

1United States Department of Agriculture-Agricultural Research Service, 2Institute for Genomic Diversity, Cornell University, Ithaca, New York and 3Cisco Systems, RTP, NC, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Association analyses that exploit the natural diversity of a genome to map at very high resolutions are becoming increasingly important. In most studies, however, researchers must contend with the confounding effects of both population and family structure. TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) implements general linear model and mixed linear model approaches for controlling population and family structure. For result interpretation, the program allows linkage disequilibrium statistics to be calculated and visualized graphically. Database browsing and data importation are facilitated by integrated middleware. Other features include analyzing insertions/deletions, calculating diversity statistics, integrating phenotypic and genotypic data, imputing missing data and calculating principal components.

Availability: The TASSEL executable, user manual, example data sets and tutorial document are freely available at http://www.maizegenetics.net/tassel. The source code for TASSEL can be found at http://sourceforge.net/projects/tassel.

Contact: pjb39@cornell.edu


    1 INTRODUCTION
With advances in genotyping technology, including rapid increases in the number of genetic markers available for quantitative trait loci (QTL) studies (Churchill et al., 2004), association analysis has become a viable approach for the dissection of complex traits. A key issue in developing methods for analyzing association data is controlling false positives that arise from population and family structure. One widely used approach, structured association (Pritchard et al., 2000; Thornsberry et al., 2001), was first implemented in TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) to reduce the risk of false positives arising from population structure. More recently, a unified mixed model method was developed which improves on the previous method by integrating population structure and family relatedness within populations (Yu et al., 2006). TASSEL reflects these improvements and offers a variety of data manipulation and results displays. Plant, animal or human geneticists and breeders interested in performing association analysis will find this software useful.


    2 ASSOCIATION TOOLS
Because complex traits are usually controlled by multiple QTL, the primary goal of QTL mapping is to find associated markers for each QTL. To achieve this goal, many association studies fit markers into a linear model as fixed effects. Any QTL not associated with a marker will contribute to the residual error, thereby inflating the error term and reducing statistical power. To compound matters, QTL not physically linked to a marker may cause spurious associations between phenotypes and that marker due to factors such as selection, population admixture or family structure. A structured association approach can partially correct for these false associations by using a Q-matrix of population membership estimates. Additional improvement can be made by incorporating multiple background QTL as a random effect in a mixed model, which takes into account covariances due to relatedness. The average relationship between individuals or lines can be estimated by kinship (K), calculated either from pedigrees or from a suitable number of random markers across the entire genome. A composite approach, Q + K, that combines information from both Q and K has been shown (Yu et al., 2006) to be superior to either method used alone.

The Q method for structured association analysis was implemented in TASSEL as a general linear model (GLM) function. Population membership estimates serve as covariates in the model and can be derived using programs such as STRUCTURE (Pritchard et al., 2000) or principal components analysis (PCA) (Zhao et al., 2007). For each marker-trait combination, GLM finds the ordinary least squares solution as described in Searle (1987). The model can include main effects, interactions, nested effects and covariates. The user can specify F-tests to be calculated, permutation tests to be run and estimates of model effect means to be output.
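
As an illustration of the kind of test described above, the following minimal Python sketch fits a full model (intercept, one population-membership covariate and marker classes) and a reduced model (intercept and covariate only) by ordinary least squares, then forms the F statistic for the marker. It is not TASSEL's code: the function names (solve, rss, marker_f_test), the single covariate and the example data are all assumptions made for illustration, and the P-value lookup from the F-distribution is omitted.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no check is made).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(design, y):
    """Residual sum of squares of the OLS fit of y on the design matrix."""
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def marker_f_test(y, marker, q):
    """F statistic for a marker, adjusting for one population-membership covariate q."""
    classes = sorted(set(marker))
    # Reduced model: intercept + Q covariate.
    reduced = [[1.0, qi] for qi in q]
    # Full model: reduced model + one indicator per marker class except the first.
    full = [r + [1.0 if g == c else 0.0 for c in classes[1:]]
            for r, g in zip(reduced, marker)]
    rss_reduced, rss_full = rss(reduced, y), rss(full, y)
    df_marker = len(classes) - 1
    df_error = len(y) - len(full[0])
    f_stat = ((rss_reduced - rss_full) / df_marker) / (rss_full / df_error)
    return f_stat, df_marker, df_error


# Hypothetical data: eight lines, one biallelic marker, one Q covariate.
y = [5.1, 4.9, 6.2, 6.0, 7.1, 6.8, 5.0, 6.1]
marker = ["A", "A", "G", "G", "G", "G", "A", "G"]
q = [0.9, 0.8, 0.2, 0.3, 0.1, 0.2, 0.7, 0.4]
f_stat, df_marker, df_error = marker_f_test(y, marker, q)
print(f"F = {f_stat:.3f} on {df_marker} and {df_error} degrees of freedom")
```

Comparing nested models in this way keeps the covariate in both fits, so the F statistic reflects only the variation explained by the marker beyond that already attributed to population membership.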

The test of significance derived from an F-distribution assumes that the trait being analyzed has normally distributed residual error. When this is not the case, TASSEL provides two options. The first provides transformation functions that may produce roughly normal error terms, while the second uses a permutation test to generate P-values that are not distribution dependent. Algorithms for conducting such permutation tests are based on the formulation of Anderson and Ter Braak (2003).
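
The idea of a distribution-free P-value can be sketched in a few lines of Python. The example below is a deliberately simplified case, not the Anderson and Ter Braak formulation used by TASSEL: it handles a single marker with no covariates and shuffles the raw trait values, whereas models with covariates call for more careful permutation schemes. The function names, the number of permutations and the data are assumptions for illustration.

```python
import random


def f_oneway(y, groups):
    """One-way ANOVA F statistic for observations y labelled by groups."""
    labels = sorted(set(groups))
    n, k = len(y), len(labels)
    grand = sum(y) / n
    ss_between = ss_within = 0.0
    for lab in labels:
        vals = [v for v, g in zip(y, groups) if g == lab]
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * (mean - grand) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(y, marker, n_perm=999, seed=0):
    """Share of shuffled data sets whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_oneway(y, marker)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between trait and marker
        if f_oneway(shuffled, marker) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement itself so the P-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: trait values for eight lines and their marker alleles.
y = [5.1, 4.9, 5.0, 6.2, 6.0, 7.1, 6.8, 6.1]
marker = ["A", "A", "A", "G", "G", "G", "G", "G"]
print(permutation_p_value(y, marker))
```

Because the reference distribution is built from the data themselves, no assumption of normal residuals is needed; the cost is that the smallest attainable P-value is limited by the number of permutations.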

The Q + K method was implemented in TASSEL as a mixed linear model (MLM) function. The statistical model can be described in Henderson's notation (Henderson, 1975) as follows:


$$y = X\beta + Zu + e$$

where y is the vector of observations; β is an unknown vector containing fixed effects, including the genetic marker and population structure (Q); u is an unknown vector of random additive genetic effects from multiple background QTL for individuals or lines; X and Z are the known design matrices; and e is the unobserved vector of random residuals. Each marker allele is fit as a separate class, with heterozygotes fit as additional marker classes. The resulting marker effect is not decomposed into additive and dominance effects but simply tested for overall significance. The u and e vectors are assumed to be normally distributed with zero mean and variance of


$$\mathrm{Var}\begin{pmatrix} u \\ e \end{pmatrix} = \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix}$$

where $G = \sigma_a^2 K$ with $\sigma_a^2$ as the unknown additive genetic variance and K as the kinship matrix. TASSEL provides a function to estimate K from a set of random markers covering the whole genome. TASSEL also provides a function to import matrix K calculated externally from pedigrees by using SAS PROC INBREED (SAS, 2002) or from markers by using software packages such as SPAGeDi (Hardy and Vekemans, 2002). Homogeneous variance is assumed for the residual effect, making $R = \sigma_e^2 I$, where $\sigma_e^2$ is the unknown residual variance. The REML estimates of $\sigma_a^2$ and $\sigma_e^2$ are obtained through the expectation-maximization (EM) algorithm (Laird and Ware, 1982).
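
For readers less familiar with Henderson's notation, the standard consequences of this model can be written out as follows. This is textbook mixed-model algebra rather than a description of TASSEL's internal computations. Here G denotes the variance of u (the additive genetic variance times K), R denotes the variance of e (the residual variance times the identity), and the ratio δ is a symbol introduced only for this sketch; the second form assumes K is invertible.

```latex
% Marginal variance of the observations
V = \mathrm{Var}(y) = Z G Z' + R = \sigma_a^2\, Z K Z' + \sigma_e^2 I

% Henderson's mixed model equations for the fixed and random effects
\begin{pmatrix}
  X'R^{-1}X & X'R^{-1}Z \\
  Z'R^{-1}X & Z'R^{-1}Z + G^{-1}
\end{pmatrix}
\begin{pmatrix} \hat{\beta} \\ \hat{u} \end{pmatrix}
=
\begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix}

% With R = \sigma_e^2 I and G = \sigma_a^2 K, multiplying through by \sigma_e^2 gives
\begin{pmatrix}
  X'X & X'Z \\
  Z'X & Z'Z + \delta K^{-1}
\end{pmatrix}
\begin{pmatrix} \hat{\beta} \\ \hat{u} \end{pmatrix}
=
\begin{pmatrix} X'y \\ Z'y \end{pmatrix},
\qquad \delta = \sigma_e^2 / \sigma_a^2
```

The simplified form shows that, once the two variance components are estimated, the solution depends on them only through their ratio.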


    3 OTHER FEATURES
In addition to providing association tools, TASSEL permits the analysis of diversity estimates such as average pairwise divergence (π) and segregating sites. Linkage disequilibrium is estimated by the standardized disequilibrium coefficient, D', as well as r² and P-values. TASSEL also includes a variety of data extraction utilities and visualization tools, such as a sequence alignment viewer, extraction of SNPs and indels from alignments, neighbor-joining cladogram and a variety of data graphing functions.
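
The two linkage disequilibrium statistics named above have simple closed forms for a pair of biallelic loci. The Python sketch below computes them from phased two-locus haplotypes using the standard definitions; it is an illustration only, the function name ld_stats and the input encoding (two-character strings) are assumptions, and the associated P-values are not computed.

```python
def ld_stats(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic loci.

    Each haplotype is a two-character string such as "AB": the first
    character is the allele at locus 1, the second the allele at locus 2.
    """
    n = len(haplotypes)
    a1 = haplotypes[0][0]  # allele used as the reference at locus 1
    b1 = haplotypes[0][1]  # allele used as the reference at locus 2
    p_a = sum(h[0] == a1 for h in haplotypes) / n
    p_b = sum(h[1] == b1 for h in haplotypes) / n
    p_ab = sum(h[0] == a1 and h[1] == b1 for h in haplotypes) / n

    # Raw disequilibrium: observed haplotype frequency minus its expectation.
    d = p_ab - p_a * p_b

    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical sample in which only three of the four haplotypes occur.
sample = ["AB"] * 6 + ["Ab"] * 2 + ["ab"] * 2
print(ld_stats(sample))
```

The example sample illustrates why both statistics are reported: when one of the four possible haplotypes is absent, D' reaches its maximum while r² can remain well below 1.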

TASSEL contains several useful data management functions. In addition to importing data in flat file formats, a data browser from the GDPC (Genomic Diversity and Phenotype Connection) project (Casstevens and Buckler, 2004) has been integrated into TASSEL to provide an interface to relational databases. GDPC can utilize multiple data sources, retrieve filtered data and export tab-delimited text files. TASSEL can merge data from different sources into a single analysis data set, impute missing data using a k-nearest-neighbor algorithm (Cover and Hart, 1967) and conduct PCA to reduce a set of correlated phenotypes. Some of these features are shown in Figure 1.
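
A k-nearest-neighbor imputation of missing genotype calls can be sketched as follows. The source does not specify the distance measure or voting rule that TASSEL uses, so the choices here (proportion of mismatching sites as the distance, a majority vote among the k closest lines that have a call at the site, "N" as the missing-data code) and the function name knn_impute are assumptions for illustration.

```python
def knn_impute(genotypes, k=3, missing="N"):
    """Fill missing calls with the most common call among the k most similar lines.

    genotypes is a list of equal-length strings, one per line (individual),
    with one character per site.
    """
    def distance(a, b):
        # Proportion of mismatches over sites where both lines are scored.
        shared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
        if not shared:
            return float("inf")
        return sum(x != y for x, y in shared) / len(shared)

    imputed = [list(row) for row in genotypes]
    for i, row in enumerate(genotypes):
        if missing not in row:
            continue
        # Rank all other lines by distance to line i, using the original data.
        neighbors = sorted(
            (j for j in range(len(genotypes)) if j != i),
            key=lambda j: distance(row, genotypes[j]),
        )
        for site, call in enumerate(row):
            if call != missing:
                continue
            # Take the k nearest lines that actually have a call at this site.
            votes = [genotypes[j][site] for j in neighbors
                     if genotypes[j][site] != missing][:k]
            if votes:
                # Majority vote; ties go to the alphabetically first allele.
                imputed[i][site] = max(sorted(set(votes)), key=votes.count)
    return ["".join(r) for r in imputed]


# Hypothetical alignment: the third line has one missing call.
lines = ["ACGT", "ACGT", "ACNT", "TGCA", "TGCA"]
print(knn_impute(lines, k=2))
```

Distances are always computed on the original data, so the order in which lines are processed does not affect the result.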


Fig. 1. Illustration of features in TASSEL. (a) Main interface with data (default), analysis and result control panels. (b) Plotting trees to understand phylogenetic relationships. (c) Diversity estimates (silent, non-synonymous, indel and synonymous of gene, π and θ). (d) LD plots with positional information.

 

    4 IMPLEMENTATION
This software package was developed in Java, making it compatible with multiple platforms (e.g. Windows, Mac and Linux). The package uses the standard PAL library (http://iubio.bio.indiana.edu/soft/molbio/java/pal/doc/), the COLT library (http://dsd.lbl.gov/~hoschek/colt/) and jFreeChart (http://www.jfree.org/jfreechart/). Database access is achieved by GDPC middleware (http://www.maizegenetics.net/gdpc).


    ACKNOWLEDGEMENTS
This project is supported by the USDA-ARS and the National Science Foundation. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

†The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.

Received on March 26, 2007; revised on May 11, 2007; accepted on June 2, 2007

    REFERENCES

    Casstevens TM, Buckler ES. GDPC: connecting researchers with multiple integrated data sources. Bioinformatics (2004) 20:2839–2840.

    Churchill G, et al. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet. (2004) 36:1133–1137.

    Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans. Inform. Theory (1967) 13:21–27.

    Hardy OJ, Vekemans X. SPAGeDi: a versatile computer program to analyze spatial genetic structure at the individual or population levels. Mol. Ecol. Notes (2002) 2:618–620.

    Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics (1975) 31:423–447.

    Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics (1982) 38:963–974.

    Pritchard JK, et al. Association mapping in structured populations. Am. J. Hum. Genet. (2000) 67:170–181.

    SAS Institute Inc. SAS/STAT software, version 9 (2002) Cary, NC, USA: SAS Institute, Inc.

    Thornsberry JM, et al. Dwarf8 polymorphisms associate with variation in flowering time. Nat. Genet. (2001) 28:286–289.

    Yu JM, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. (2006) 38:203–208.

    Zhao K, et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet (2007) 3:e4. doi:10.1371/journal.pgen.0030004.





