Bioinformatics Advance Access originally published online on January 17, 2006
Bioinformatics 2006 22(5):626-627; doi:10.1093/bioinformatics/btk025
TAMAL: an integrated approach to choosing SNPs for genetic studies of human complex traits
1School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
2Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
3Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Investigators conducting studies of the molecular genetics of complex traits in humans often need a rational way to select a set of single nucleotide polymorphisms (SNPs) from the hundreds or thousands available for a candidate gene. Accomplishing this requires integrating genomic data from distributed databases, which is both time-consuming and error-prone. We developed the TAMAL (Technology And Money Are Limiting) web site to help identify promising SNPs for further investigation. For a given list of genes, TAMAL identifies SNPs that meet user-specified criteria (e.g. haplotype tagging SNPs or SNPs predicted to lead to amino acid changes) from current versions of online resources (i.e. HapMap, Perlegen, Affymetrix, dbSNP and the UCSC genome browser).
Availability: TAMAL is a platform-independent web-based application available free of charge at http://neoref.ils.unc.edu/tamal
Contact: pfsulliv{at}med.unc.edu
Supplementary information: http://neoref.ils.unc.edu/tamal/
| INTRODUCTION |
|---|
Investigators conducting studies of the molecular genetics of complex traits in humans often need a rational way to select a set of single nucleotide polymorphisms (SNPs) from the hundreds or thousands available for a candidate gene. For example, for a study of the genetics of type 2 diabetes mellitus, alcoholism or schizophrenia, an investigator may wish to genotype SNP markers comprehensively in dozens or even hundreds of candidate genes. With the completion of the initial sequencing of the human genome (Lander et al., 2001) and the considerable progress afforded by the International HapMap project (The International HapMap Consortium, 2003; Altshuler et al., 2005), many genes contain more SNPs than can be affordably genotyped. For example, the neuregulin-1 gene contains around 4000 SNPs, more than it is practical to genotype (even as genotyping costs continue to plummet). Our application provides a rational methodology for reducing the number of SNPs to evaluate while still capturing, directly or indirectly, a considerable portion of the genetic variation found in the genomic region.
Accomplishing this task for a set of dozens or hundreds of genes is currently time-consuming and error-prone because it requires integrating genomic data from disparate databases. We developed the TAMAL (Technology And Money Are Limiting) web-based application to help streamline the task of choosing SNPs for further investigation (see Fig. 1 for a screenshot of the TAMAL application).
| METHODS |
|---|
TAMAL is designed to be interactive, so that in addition to viewing suggested SNPs, the researcher can dynamically filter the results using any of the application's controls. In the left panel of Figure 1, the user inputs the standard gene name for a single gene or uploads a list of genes. The standard gene name is generally that approved by HUGO (http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl), e.g. COMT for catechol-O-methyltransferase. All genomic locations are per the hg16 UCSC build.
The middle panel shows the result of querying the TAMAL database for the gene(s) input by the user. Optionally, the user can also limit the search to the most 5'- and 3'-extent of the gene or extend the search by a specified number of bases in either direction (20 000 bases by default). The SNP set is limited to those with evidence of variation in any of the major SNP databases (dbSNP, HapMap, Perlegen and Affymetrix).
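The region search described above can be sketched roughly as follows. This is a minimal illustration, not TAMAL's actual implementation: the `Gene` and `Snp` records, the function names and the coordinates in the usage note are all hypothetical, and only the 20 000-base default and the four database names come from the text.

```python
from dataclasses import dataclass

DEFAULT_FLANK = 20_000  # default extension in bases, as stated in the text
MAJOR_DATABASES = frozenset({"dbSNP", "HapMap", "Perlegen", "Affymetrix"})


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with 5'-most and 3'-most coordinates."""
    name: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record; variation_evidence names the databases
    that report evidence of variation for this SNP."""
    rsid: str
    chrom: str
    pos: int
    variation_evidence: frozenset = frozenset()


def search_window(gene, flank=DEFAULT_FLANK):
    """Return the (low, high) search interval; flank=0 limits the search
    to the extent of the gene itself."""
    return max(0, gene.start - flank), gene.end + flank


def candidate_snps(gene, snps, flank=DEFAULT_FLANK):
    """Keep SNPs inside the window that have evidence of variation in
    at least one of the major databases."""
    low, high = search_window(gene, flank)
    return [
        s for s in snps
        if s.chrom == gene.chrom
        and low <= s.pos <= high
        and s.variation_evidence & MAJOR_DATABASES
    ]
```

For instance, calling `candidate_snps(Gene("DEMO1", "chr22", 100_000, 130_000), snps)` on a made-up gene would search positions 80 000 to 150 000 on the same chromosome, and passing `flank=0` would restrict the search to the gene itself.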
The right panel lists sets of criteria that can be used to filter the SNPs flexibly. At the top, the user can select the Gabriel method (Gabriel et al., 2002) or the TAGGER method of selecting haplotype tag SNPs from any or all of the four HapMap ancestry groups (The International HapMap Consortium, 2003) as determined by HaploView (Barrett et al., 2004). It is important to note that some genomic regions may not be amenable to this approach (Wall and Pritchard, 2003a,b). In the middle of the right panel, the user can select SNPs that lead to non-synonymous or synonymous amino acid changes, augmented with in silico prediction of functionality (Karchin et al., 2005), or that alter an intronic splice site. At the bottom, the user can select SNPs that occur in certain types of genomic features: SNPs that are in a predicted promoter (in silico prediction but with biological validation) (Trinklein et al., 2003), in a region of predicted regulatory potential (Blanchette et al., 2004) or in a predicted transfactor binding site (TRANSFAC v6.0, http://www.gene-regulation.com), along with SNPs that are in regions with conservation scores ≥99th percentile genome-wide for human-chimp-rat-mouse-chicken alignment via a hidden Markov model (Siepel and Haussler, 2003).
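The filtering step can be illustrated with a short sketch. It rests on two assumptions that the text does not confirm: that a SNP is kept when it meets any one of the selected criteria (a union rather than an intersection), and that the conservation threshold means a score at or above the genome-wide 99th percentile, computed here by the nearest-rank method. The flag names, record layout and function names are hypothetical.

```python
def percentile_cutoff(scores, pct=99):
    """Nearest-rank percentile of a collection of genome-wide scores.
    Integer arithmetic avoids floating-point rounding in the rank."""
    ordered = sorted(scores)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceil(pct * n / 100)
    return ordered[rank - 1]


def filter_snps(snps, selected, cutoff):
    """Return rsids of SNPs meeting at least one selected criterion.

    snps     -- list of dicts with hypothetical keys 'rsid', 'flags'
                (annotation labels) and 'conservation' (a score)
    selected -- set of criterion labels chosen by the user
    cutoff   -- conservation score treated as the 99th percentile
    """
    kept = []
    for snp in snps:
        flags = set(snp["flags"])
        # Assumption: "conserved" means a score at or above the cutoff.
        if snp["conservation"] >= cutoff:
            flags.add("conserved")
        if flags & selected:  # assumption: any one criterion suffices
            kept.append(snp["rsid"])
    return kept
```

If the application instead requires every selected criterion to hold, the membership test would become `selected <= flags`.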
The user can inspect the choice of SNPs by clicking on the down arrow next to a gene in the middle panel. This opens the UCSC genome browser in a separate window (inset in Fig. 1) so the user can inspect the SNP coverage and ensure that the SNPs selected are a reasonable subset of all those potentially available. Finally, at the lower edge of the middle panel, users can download the results as an Excel file (commonly used by researchers) or in XML format (for exchange with other applications).
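As a rough idea of what an XML export for exchange with other applications might look like, the sketch below serializes a SNP selection with Python's standard library. The element and attribute names (`gene`, `snp`, `criterion`, `rsid`, `position`) are invented for illustration; the schema TAMAL actually uses is not described in the text.

```python
import xml.etree.ElementTree as ET


def snps_to_xml(gene_name, snps):
    """Serialize selected SNPs for one gene as an XML string.

    snps -- list of dicts with hypothetical keys 'rsid', 'pos' and
            'criteria' (labels of the criteria each SNP satisfied)
    """
    root = ET.Element("gene", name=gene_name)
    for snp in snps:
        node = ET.SubElement(
            root, "snp", rsid=snp["rsid"], position=str(snp["pos"])
        )
        for label in snp["criteria"]:
            ET.SubElement(node, "criterion").text = label
    return ET.tostring(root, encoding="unicode")
```

A receiving application could read the result back with `ET.fromstring` and iterate over the `snp` elements.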
TAMAL is provided as a good faith effort to assist the human genetics community. No such tool should be considered as a foolproof black box. There are some genes that will be difficult to study with typical SNP methods, and there are additional databases for some genes that should be consulted (e.g. for genotyping members of the large CYP gene family). Nonetheless, provided that users remain cognizant of its limitations, TAMAL can greatly assist with rational SNP selection.
We will endeavor to update TAMAL on a quarterly basis to incorporate updates to the primary databases as well as new features.
| Acknowledgments |
|---|
We thank the Carolina Center for Exploratory Genetic Analysis for computational support (P20RR20751), and the Informatics and Visualization Laboratory (http://www.ils.unc.edu/bmh/ivlab) at the School of Information and Library Science for hosting this service. Funding to pay the Open Access publication charges was provided by the University of North Carolina at Chapel Hill's Open Access Publishing Fund.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Frank Dudbridge
Received on November 16, 2005; revised on December 20, 2005; accepted on December 23, 2005
| REFERENCES |
|---|
Altshuler, D., et al. (2005) A haplotype map of the human genome. Nature, 437, 1299-1320.
Barrett, J.C., et al. (2004) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263-265.
Blanchette, M., et al. (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res., 14, 708-715.
Gabriel, S.B., et al. (2002) The structure of haplotype blocks in the human genome. Science, 296, 2225-2229.
Karchin, R., et al. (2005) LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics, 21, 2814-2820.
Lander, E.S., et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860-921.
Siepel, A. and Haussler, D. (2003) Combining phylogenetic and hidden Markov models in biosequence analysis. In Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), Berlin, Germany, pp. 277-286.
The International HapMap Consortium. (2003) The International HapMap Project. Nature, 426, 789-796.
Trinklein, N.D., et al. (2003) Identification and functional analysis of human transcriptional promoters. Genome Res., 13, 308-312.
Wall, J.D. and Pritchard, J.K. (2003a) Assessing the performance of the haplotype block model of linkage disequilibrium. Am. J. Hum. Genet., 73, 502-515.
Wall, J.D. and Pritchard, J.K. (2003b) Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet., 4, 587-597.