Bioinformatics Advance Access originally published online on August 12, 2004
Bioinformatics 2005 21(1):124-127; doi:10.1093/bioinformatics/bth470
Bioinformatics vol. 21 issue 1 © Oxford University Press 2005; all rights reserved.
SNP Chart: an integrated platform for visualization and interpretation of microarray genotyping data
James Hogg, iCAPTURE Centre for Cardiovascular and Pulmonary Research, St Paul's Hospital, University of British Columbia, Vancouver, Canada V6Z 1Y6
*To whom correspondence should be addressed.
ABSTRACT
Summary: SNP Chart is a Java application for visualizing and interpreting microarray genotyping data, primarily data derived from arrayed primer extension-based chemistries. Spot intensity output files from microarray analysis tools are imported into SNP Chart, together with a multi-channel TIFF image of the original array experiment and a list of the single nucleotide polymorphisms (SNPs) actually being tested. Data from different and/or replicate probes that interrogate the same SNP, but are scattered across the array grid, can be reassembled into a single chart specific to that SNP. This allows quick and effective visualization and quality control of the data from multiple probes for the same SNP, which can then be easily interpreted and manually scored as a genotype.
Availability: http://www.snpchart.ca
Contact: stebbutt{at}mrl.ubc.ca
Supplementary information: A comprehensive manual describing SNP Chart is available at the above website, together with sample data files.
INTRODUCTION
An important aspect of the Human Genome Project is the massive governmental and industry-sponsored effort to develop a dense set of biallelic markers (single nucleotide polymorphisms, SNPs) throughout the human genome (Wang et al., 1998). This effort has been spurred by the realization that such a dense set of SNP markers could yield critical information for identifying the specific functional SNPs, and combinations of SNPs, that form the genetic basis of complex diseases (Risch and Merikangas, 1996). For research discovery purposes, a number of high-throughput genotyping technologies are available [e.g. MALDI-TOF Sequenom (Buetow et al., 2001), TaqMan (Livak et al., 1995) and Pyrosequencing (Ahmadian et al., 2000)], which have been engineered to genotype large numbers of individuals efficiently, one SNP at a time.
Genotyping microarrays are devices displaying hundreds or even thousands of specific oligonucleotide probes, precisely located on a small-format solid support. Because multiple probe sets can simultaneously interrogate multiple genetic markers (SNPs) from an individual, these array-based technologies have research and potentially clinical (patient-specific) applications. Microarray genotyping protocols include Affymetrix GeneChips (Kennedy et al., 2003), Tagged/ZipCode arrays [e.g. SBE-TAGS (Hirschhorn et al., 2000) and Illumina's bead-array system (Oliphant et al., 2002)] and arrayed primer extension (APEX; Kurg et al., 2000; Shumaker et al., 1996).
APEX is a re-sequencing method that combines the advantages of a highly parallel microarray with the discriminatory power of Sanger dideoxy terminator sequencing chemistry (Sanger et al., 1977). Research groups have developed APEX-based microarrays for a variety of genotyping and mutation detection assays, including thalassemia gene mutations (Chan et al., 2004; Gemignani et al., 2002), human chromosome 22 SNP markers (Dawson et al., 2002), SNPs in genes related to xenobiotic metabolism and DNA repair (Landi et al., 2003), and genome-wide SNPs (Tebbutt et al., in press).
Microarray image analysis tools are reasonably good at identifying array probe features (spots) and extracting the appropriate intensity values from multiple dye channels, but they are ultimately designed for gene expression studies, not for genotyping. To our knowledge, only one software package is specifically designed for microarray-based genotyping using arrayed primer extension. Genorama is proprietary image analysis software from Asper Biotech Ltd (www.asperbio.com) that detects all four fluorescent colours emitted by the dyes used in an APEX experiment and then automatically calls the base(s) incorporated at each probe spot. However, its scoring algorithm treats all probes equally and can sometimes give an erroneous call containing three bases. This is clearly wrong, because a diploid genotype consists of two bases, not three, so considerable inspection of the original array data may be required before a final genotype call can be made. Genorama is therefore a base-calling algorithm rather than a true genotyping algorithm. Furthermore, Genorama requires duplicate spots of the same probe to be positioned adjacently in the microarray grid, which limits how well the experimental design can compensate for problems such as random pin blockage during chip printing, localized hybridization failure and high local background.
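To make the distinction between base calling and genotyping concrete, the Java sketch below turns the set of bases called for one SNP into a diploid genotype, or a null call. It assumes that the two alleles of the biallelic SNP are known in advance and that base calls from all probes for that SNP have already been pooled. `GenotypeCaller`, `toGenotype` and `NULL_CALL` are hypothetical names; this is neither Genorama's nor SNP Chart's algorithm.

```java
import java.util.Set;

/**
 * Hypothetical illustration (not Genorama's or SNP Chart's algorithm):
 * converts the bases called for one biallelic SNP into a genotype or a null call.
 */
final class GenotypeCaller {

    /** Returned when the base calls cannot form a valid genotype. */
    static final String NULL_CALL = "NULL";

    /**
     * @param calledBases bases detected for this SNP, e.g. {'C', 'T'}
     * @param alleleA     first known allele of the SNP
     * @param alleleB     second known allele of the SNP
     * @return "CC", "CT", "TT" style genotype, or NULL_CALL
     */
    static String toGenotype(Set<Character> calledBases, char alleleA, char alleleB) {
        // A genotype holds at most two different bases: a three-base call is invalid.
        if (calledBases.isEmpty() || calledBases.size() > 2) {
            return NULL_CALL;
        }
        // Any base other than the SNP's two known alleles is also invalid.
        for (char base : calledBases) {
            if (base != alleleA && base != alleleB) {
                return NULL_CALL;
            }
        }
        boolean hasA = calledBases.contains(alleleA);
        boolean hasB = calledBases.contains(alleleB);
        if (hasA && hasB) {
            return "" + alleleA + alleleB;   // heterozygote
        }
        char only = hasA ? alleleA : alleleB;
        return "" + only + only;             // homozygote
    }

    public static void main(String[] args) {
        System.out.println(toGenotype(Set.of('C', 'T'), 'C', 'T'));      // CT
        System.out.println(toGenotype(Set.of('T'), 'C', 'T'));           // TT
        System.out.println(toGenotype(Set.of('C', 'G', 'T'), 'C', 'T')); // NULL
    }
}
```

A base caller would report the third case as "CGT"; a genotyping step has to reject it and send the SNP back for inspection.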
PROGRAM OVERVIEW
SNP Chart is a visualization tool written in Java. It is platform independent: it can run on any operating system (Windows, Linux, Unix, Novell, Mac OS or any other) that can host the Java run-time environment. The application's architecture follows the Model-View-Controller paradigm, is based on universal reusable components, and supports open standards. Functionality can be easily extended with plug-ins that integrate new data sources or perform new analyses, including statistical algorithms.
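The plug-in idea can be sketched in Java as an interface that the controller discovers by name, so that new analyses are added without changing the core application. The paper does not describe SNP Chart's actual plug-in API: `AnalysisPlugin`, `PluginRegistry` and `BrightProbeCounter` are hypothetical names, and the intensity map (probe ID to per-channel values) is an assumed data shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical plug-in contract; SNP Chart's real plug-in API is not given in the text. */
interface AnalysisPlugin {
    /** Short name shown to the user, e.g. in a menu. */
    String name();

    /** Runs an analysis over channel intensities keyed by probe ID. */
    String analyze(Map<String, double[]> intensitiesByProbe);
}

/** Minimal registry: the controller finds plug-ins by name, not by class. */
final class PluginRegistry {
    private final List<AnalysisPlugin> plugins = new ArrayList<>();

    void register(AnalysisPlugin plugin) {
        plugins.add(plugin);
    }

    List<String> names() {
        List<String> result = new ArrayList<>();
        for (AnalysisPlugin p : plugins) {
            result.add(p.name());
        }
        return result;
    }

    String run(String name, Map<String, double[]> data) {
        for (AnalysisPlugin p : plugins) {
            if (p.name().equals(name)) {
                return p.analyze(data);
            }
        }
        throw new IllegalArgumentException("No plug-in named " + name);
    }
}

/** Example plug-in: counts probes whose strongest channel exceeds a threshold. */
final class BrightProbeCounter implements AnalysisPlugin {
    private final double threshold;

    BrightProbeCounter(double threshold) {
        this.threshold = threshold;
    }

    @Override
    public String name() {
        return "bright-probes";
    }

    @Override
    public String analyze(Map<String, double[]> intensitiesByProbe) {
        int count = 0;
        for (double[] channels : intensitiesByProbe.values()) {
            double max = Double.NEGATIVE_INFINITY;
            for (double v : channels) {
                max = Math.max(max, v);
            }
            if (max > threshold) {
                count++;
            }
        }
        return count + " of " + intensitiesByProbe.size() + " probes above " + threshold;
    }

    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.register(new BrightProbeCounter(1000.0));
        Map<String, double[]> data = Map.of(
                "P001", new double[] {1500, 200, 100, 50},
                "P002", new double[] {300, 400, 100, 50});
        System.out.println(registry.run("bright-probes", data));
    }
}
```

Because the registry only depends on the interface, a statistical plug-in or a new data-source adapter could be registered the same way.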
Data can be stored in any database supporting the ANSI SQL-92 standard. Users can be authenticated either against an LDAP server or by a built-in component. The back-end database or the authentication mechanism can be changed by making the appropriate alterations to the configuration file. The default configuration for the enterprise version uses the enterprise-level IBM DB2 Universal Database and LDAP authentication.
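Switching back ends through a configuration file can be sketched with `java.util.Properties` and a JDBC URL: changing the database is then a one-line edit. The property keys (`db.url`, `auth.mode`, `auth.ldap.url`), the host names and the class `BackendConfig` are all hypothetical; SNP Chart's real configuration keys are not given in the text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical back-end settings; the property keys are illustrative only. */
final class BackendConfig {
    enum AuthMode { LDAP, BUILT_IN }

    final String jdbcUrl;    // any SQL-92 database reachable through a JDBC driver
    final AuthMode authMode;
    final String ldapUrl;    // null unless authMode == LDAP

    private BackendConfig(String jdbcUrl, AuthMode authMode, String ldapUrl) {
        this.jdbcUrl = jdbcUrl;
        this.authMode = authMode;
        this.ldapUrl = ldapUrl;
    }

    static BackendConfig load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = props.getProperty("db.url");
        if (url == null || !url.startsWith("jdbc:")) {
            throw new IllegalArgumentException("db.url must be a JDBC URL");
        }
        // Built-in authentication is the fallback when auth.mode is absent.
        AuthMode mode = AuthMode.valueOf(
                props.getProperty("auth.mode", "BUILT_IN").trim().toUpperCase(Locale.ROOT));
        String ldapUrl = props.getProperty("auth.ldap.url");
        if (mode == AuthMode.LDAP && ldapUrl == null) {
            throw new IllegalArgumentException("auth.mode=LDAP requires auth.ldap.url");
        }
        return new BackendConfig(url, mode, mode == AuthMode.LDAP ? ldapUrl : null);
    }

    public static void main(String[] args) {
        String file = String.join("\n",
                "db.url=jdbc:db2://dbhost:50000/SNPCHART",
                "auth.mode=LDAP",
                "auth.ldap.url=ldap://ldap.example.org:389");
        BackendConfig cfg = load(new StringReader(file));
        // The URL would then be handed to java.sql.DriverManager.getConnection(...).
        System.out.println(cfg.jdbcUrl + " / " + cfg.authMode);
    }
}
```

Pointing `db.url` at another JDBC-accessible SQL database, or dropping `auth.mode`, changes the back end without touching code, which is the property the paragraph above describes.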
For illustrative purposes, SNP Chart can be described as having three major functionalities: data import, data visualization with user genotype calling, and data export. These functionalities can be assigned separately to different users. Multichannel spot intensity data from genotyping microarrays are imported, along with colour TIFF images of the arrays themselves, SNP-specific oligonucleotide probe information, HTTP links to public database resources such as NCBI (http://www.ncbi.nih.gov/SNP/) and the SNP Consortium (http://snp.cshl.org/), and any other information deemed appropriate.
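As an illustration of the import step, the Java sketch below parses a tab-separated spot intensity file into records. The column layout (probe ID, dbSNP rs number, grid row, grid column, then A, C, G and T channel intensities) is an assumption made for this example; real exports from microarray analysis tools differ, and `Spot` and `SpotImporter` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** One spot from an intensity file; the A, C, G, T channel order is an assumption. */
final class Spot {
    final String probeId;
    final String rsNumber;
    final int gridRow;
    final int gridCol;
    final double[] channels;

    Spot(String probeId, String rsNumber, int gridRow, int gridCol, double[] channels) {
        this.probeId = probeId;
        this.rsNumber = rsNumber;
        this.gridRow = gridRow;
        this.gridCol = gridCol;
        this.channels = channels;
    }
}

/** Hypothetical importer for lines: probeId, rsNumber, row, col, A, C, G, T. */
final class SpotImporter {
    static final int FIELDS = 8;

    static List<Spot> parse(Reader source) {
        List<Spot> spots = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and header/comment lines
                }
                String[] f = line.split("\t");
                if (f.length != FIELDS) {
                    throw new IllegalArgumentException(
                            "line " + lineNo + ": expected " + FIELDS + " fields, got " + f.length);
                }
                double[] channels = new double[4];
                for (int i = 0; i < 4; i++) {
                    channels[i] = Double.parseDouble(f[4 + i]);
                }
                spots.add(new Spot(f[0], f[1], Integer.parseInt(f[2]), Integer.parseInt(f[3]), channels));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return spots;
    }

    public static void main(String[] args) {
        String file = "# probe\trs\trow\tcol\tA\tC\tG\tT\n"
                + "P001\trs1000\t3\t7\t120\t4500\t90\t3900\n";
        for (Spot s : parse(new StringReader(file))) {
            System.out.println(s.probeId + " -> " + s.rsNumber + " at (" + s.gridRow + "," + s.gridCol + ")");
        }
    }
}
```

Malformed lines raise an exception that names the line number, so a bad file is rejected rather than partially imported.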
Experimental data from a single sample array or multiple sample arrays can be viewed by selecting an individual SNP by its dbSNP rs number. A SNP-specific chart is generated for each array (Fig. 36, 1), displaying all spot features and channel intensity measurements from multiple types of probes [including APEX probes and allele-specific (AS) APEX probes (Gemignani et al., 2002)] that were originally scattered across the microarray grid but provide information on a single SNP. The colour TIFF image of the array can also be accessed in three views: the actual spot feature for a selected intensity data point, the sub-grid of the array, and the entire array grid (Fig. 36, 2). Prototype charts for the selected SNP, specific to validated genotypes (e.g. CC, CT, TT and NEGative control), can also be displayed (Fig. 36, 3), making it easy to classify the chart under review. A genotype call can be made in the Scoring and Genotyping panel (Fig. 36, 4). Automated genotype calling, based on the information displayed in the chart, is under development; it will allow faster analysis and reduce user subjectivity. Nevertheless, no automatic scoring algorithm is likely to be perfect for all SNPs, and the data visualization in SNP Chart provides a useful manual override in cases of null calls.
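The core of the SNP-specific chart is a regrouping step: spots are keyed by the rs number they interrogate, regardless of where they sit on the grid. A minimal Java sketch follows; `ChartSpot` and `SnpChartAssembler` are hypothetical names, and ordering each chart by probe ID is an assumed presentation choice, not necessarily what SNP Chart does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Minimal spot record for this sketch (hypothetical fields). */
final class ChartSpot {
    final String probeId;   // e.g. an APEX or allele-specific APEX probe
    final String rsNumber;  // dbSNP identifier the probe interrogates
    final int gridRow;
    final int gridCol;

    ChartSpot(String probeId, String rsNumber, int gridRow, int gridCol) {
        this.probeId = probeId;
        this.rsNumber = rsNumber;
        this.gridRow = gridRow;
        this.gridCol = gridCol;
    }
}

/** Hypothetical regrouping step: one chart per SNP, whatever the grid positions. */
final class SnpChartAssembler {
    static Map<String, List<ChartSpot>> bySnp(List<ChartSpot> spots) {
        Map<String, List<ChartSpot>> charts = new TreeMap<>(); // sorted by rs number
        for (ChartSpot s : spots) {
            charts.computeIfAbsent(s.rsNumber, k -> new ArrayList<>()).add(s);
        }
        // Order each chart by probe ID so replicate spots of one probe sit together;
        // List.sort is stable, so replicates keep their input order.
        for (List<ChartSpot> chart : charts.values()) {
            chart.sort(Comparator.comparing((ChartSpot s) -> s.probeId));
        }
        return charts;
    }

    public static void main(String[] args) {
        List<ChartSpot> spots = List.of(
                new ChartSpot("P2", "rs1000", 0, 0),
                new ChartSpot("P9", "rs2000", 5, 5),
                new ChartSpot("P1", "rs1000", 10, 3)); // far from its partner P2 on the grid
        bySnp(spots).forEach((rs, chart) -> {
            StringBuilder line = new StringBuilder(rs + ":");
            for (ChartSpot s : chart) {
                line.append(' ').append(s.probeId);
            }
            System.out.println(line);
        });
    }
}
```

Because grouping is by rs number rather than by grid adjacency, replicate probes can be deliberately spread across the array to guard against local printing or hybridization failures.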
The data export function allows data for single or multiple SNPs from selected experiments and samples to be downloaded to an Excel file. Three export formats are available: Intensities Export retrieves all data, including the channel intensities for each spot; Scorer Genotypes Export and Final Genotypes Export deliver the genotype calls without any associated spot or probe data, in a format accepted by genetic epidemiological analysis programs.
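A genotypes-only export can be sketched as a sample-by-SNP table written as comma-separated text, which Excel opens directly. The layout (one row per sample, one column per rs number) and the `NA` code for missing calls are assumptions for illustration; the text does not specify SNP Chart's exact export format, and `GenotypeExporter` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical genotype-only export: no spot or probe data, one row per sample. */
final class GenotypeExporter {
    static final String MISSING = "NA"; // assumed code for a missing or null call

    /** Builds CSV text with header "sample,rs...,rs..." and one row per sample. */
    static String toCsv(List<String> samples, List<String> snps,
                        Map<String, Map<String, String>> calls) {
        StringBuilder out = new StringBuilder("sample");
        for (String snp : snps) {
            out.append(',').append(snp);
        }
        out.append('\n');
        for (String sample : samples) {
            out.append(sample);
            Map<String, String> row = calls.getOrDefault(sample, Map.of());
            for (String snp : snps) {
                out.append(',').append(row.getOrDefault(snp, MISSING));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> calls = Map.of(
                "S1", Map.of("rs1000", "CT", "rs2000", "AA"),
                "S2", Map.of("rs1000", "TT"));
        System.out.print(toCsv(List.of("S1", "S2"), List.of("rs1000", "rs2000"), calls));
    }
}
```

Iterating over explicit sample and SNP lists, rather than over the maps, keeps the column order stable from one export to the next.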
To evaluate the accuracy of SNP Chart, we genotyped 12 Coriell DNA samples (http://coriell.umdnj.edu/) across 123 SNPs (Tebbutt et al., in press). We were able to compare our microarray-based data (scored using SNP Chart) against 1141 genotypes that had been determined by other research groups. Of these 1141 genotypes, 1124 were identical to our data; there was a single null call (0.1%) and there were 16 miscalls (1.4%), for a combined error rate of 1.5%.
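The reported percentages follow directly from the counts above, each taken as a fraction of the 1141 compared genotypes. The short Java check below reproduces them and confirms that the three categories account for every comparison; `ConcordanceSummary` is just a name for this example.

```java
import java.util.Locale;

/** Recomputes the concordance figures reported in the text. */
final class ConcordanceSummary {
    /** Percentage of count relative to total. */
    static double percent(int count, int total) {
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        int compared = 1141;
        int identical = 1124;
        int nullCalls = 1;
        int miscalls = 16;
        // Every compared genotype is identical, a null call, or a miscall.
        if (identical + nullCalls + miscalls != compared) {
            throw new IllegalStateException("counts do not add up");
        }
        // Locale.ROOT keeps '.' as the decimal separator on every system.
        System.out.println(String.format(Locale.ROOT, "null calls: %.1f%%", percent(nullCalls, compared)));
        System.out.println(String.format(Locale.ROOT, "miscalls:   %.1f%%", percent(miscalls, compared)));
        System.out.println(String.format(Locale.ROOT, "combined:   %.1f%%", percent(nullCalls + miscalls, compared)));
    }
}
```

Running it prints 0.1%, 1.4% and 1.5%: 1/1141 is about 0.088%, 16/1141 about 1.402%, and their sum 17/1141 about 1.490%.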
In summary, SNP Chart allows users to collect, store and query data from multiple array genotyping experiments and analyses. The software generates visual patterns of spot intensity values from multiple channels, for the multiple-probe set specific to a given SNP, that are easily interpretable as a specific genotype. Its advantage over existing array data display methods is that an entire multiple-probe set for a specific SNP can be viewed at once, which can be more informative than examining individual probes separately. The authors recognize that automated genotype calling is required to further enhance SNP Chart; nevertheless, the current software is a valuable tool for manual quality control of microarray genotyping data. SNP Chart could also be applied to expression array data, where multiple probes interrogate the same gene, or similar genes and/or gene pathways.
Acknowledgments
We would like to thank Jian Ruan for laboratory technical assistance, Kelly Burkett, Jian Qing He and Denise Daley for helpful comments and Peter Paré for continued support. This research was supported by the Canadian Institutes of Health Research, CANARIE, and the Michael Smith Foundation for Health Research.
Received on May 27, 2004; revised on July 13, 2004; accepted on August 8, 2004
REFERENCES
Ahmadian, A., Gharizadeh, B., Gustafsson, A.C., Sterky, F., Nyren, P., Uhlen, M., Lundeberg, J. (2000) Single-nucleotide polymorphism analysis by pyrosequencing. Anal. Biochem., 280, 103-110.
Buetow, K.H., Edmonson, M., MacDonald, R., Clifford, R., Yip, P., Kelley, J., Little, D.P., Strausberg, R., Koester, H., Cantor, C.R., Braun, A. (2001) High-throughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proc. Natl Acad. Sci. USA, 98, 581-584.
Chan, K., Wong, M.S., Chan, T.K., Chan, V. (2004) A thalassaemia array for Southeast Asia. Br. J. Haematol., 124, 232-239.
Dawson, E., Abecasis, G.R., Bumpstead, S., Chen, Y., Hunt, S., Beare, D.M., Pabial, J., Dibling, T., Tinsley, E., Kirby, S., et al. (2002) A first-generation linkage disequilibrium map of human chromosome 22. Nature, 418, 544-548.
Gemignani, F., Perra, C., Landi, S., Canzian, F., Kurg, A., Tonisson, N., Galanello, R., Cao, A., Metspalu, A., Romeo, G. (2002) Reliable detection of beta-thalassemia and G6PD mutations by a DNA microarray. Clin. Chem., 48, 2051-2054.
Hirschhorn, J.N., Sklar, P., Lindblad-Toh, K., Lim, Y.M., Ruiz-Gutierrez, M., Bolk, S., Langhorst, B., Schaffner, S., Winchester, E., Lander, E.S. (2000) SBE-TAGS: an array-based method for efficient single-nucleotide polymorphism genotyping. Proc. Natl Acad. Sci. USA, 97, 12164-12169.
Kennedy, G.C., Matsuzaki, H., Dong, S., Liu, W.M., Huang, J., Liu, G., Su, X., Cao, M., Chen, W., Zhang, J., et al. (2003) Large-scale genotyping of complex DNA. Nat. Biotechnol., 21, 1233-1237.
Kurg, A., Tonisson, N., Georgiou, I., Shumaker, J., Tollett, J., Metspalu, A. (2000) Arrayed primer extension: solid-phase four-color DNA resequencing and mutation detection technology. Genet. Test, 4, 1-7.
Landi, S., Gemignani, F., Gioia-Patricola, L., Chabrier, A., Canzian, F. (2003) Evaluation of a microarray for genotyping polymorphisms related to xenobiotic metabolism and DNA repair. BioTechniques, 35, 816-820, 822, 824817.
Livak, K.J., Flood, S.J., Marmaro, J., Giusti, W., Deetz, K. (1995) Oligonucleotides with fluorescent dyes at opposite ends provide a quenched probe system useful for detecting PCR product and nucleic acid hybridization. PCR Methods Appl., 4, 357-362.
Oliphant, A., Barker, D.L., Stuelpnagel, J.R., Chee, M.S. (2002) BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. BioTechniques, Suppl., 56-58, 6051.
Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science, 273, 1516-1517.
Sanger, F., Nicklen, S., Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl Acad. Sci. USA, 74, 5463-5467.
Shumaker, J.M., Metspalu, A., Caskey, C.T. (1996) Mutation detection by solid phase primer extension. Hum. Mutat., 7, 346-354.
Tebbutt, S.J., Burkett, K.M., He, J-Q, Ruan, J., Opushnyev, I.V., Tripp, B.W., Zeznik, J.A., Abara, C.O., Nelson, C.C., Walley, K.R. (2004) A microarray genotyping resource to determine population stratification in genetic association studies of complex disease. BioTechniques, in press.
Wang, D.G., Fan, J.B., Siao, C.J., Berno, A., Young, P., Sapolsky, R., Ghandour, G., Perkins, N., Winchester, E., Spencer, J., et al. (1998) Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science, 280, 1077-1082.