Bioinformatics Advance Access originally published online on September 17, 2004
Bioinformatics 2005 21(2):266-268; doi:10.1093/bioinformatics/bth486
Bioinformatics vol. 21 issue 2 © Oxford University Press 2005; all rights reserved.
SNPP: automating large-scale SNP genotype data management


1 Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University Changsha, Hunan 410081, China
2 The Key Laboratory of Biomedical Information Engineering of Ministry of Education and Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University Xi'an 710049, China
3 Department of Biomedical Sciences, Creighton University Omaha, NE 68131, USA
4 Osteoporosis Research Center, Creighton University Medical Center Omaha, NE 68131, USA
5 Department of Psychiatry, Yale University School of Medicine New Haven, CT 06516, USA
*To whom correspondence should be addressed at Osteoporosis Research Center, Creighton University Medical Center, 601 N. 30th St. Suite 6787, Omaha, NE 68131, USA.
| Abstract |
|---|
Summary: To manage high-throughput single nucleotide polymorphism (SNP) genotyping data efficiently, we developed a dynamic, general database management system, SNPP (SNP Processor). It provides several functions, including data import with comparison, Mendelian inheritance checking within pedigrees, and data compilation and export. Furthermore, SNPP can generate files for repeat genotyping and transform them into files that can be executed by a liquid handling system.
Availability: http://orclinux.creighton.edu/snpp/
Contact: lanjuanzhao{at}creighton.edu
| INTRODUCTION |
|---|
High-throughput single nucleotide polymorphism (SNP) genotyping is increasingly being applied in large-scale genetic studies, in which thousands of genotype data points are routinely generated. An efficient tool is needed to manage this volume of data for further analyses.
Data management in genetic studies can be automated with software packages such as GenoDB (Li et al., 2001) or GenoTool (Hampe et al., 2001). GenoDB was designed for microsatellite markers, and the differing characteristics of microsatellite and SNP markers make it less than ideal for efficient management of SNP genotype data. GenoTool is a component of an integrated system for TaqMan™-based SNP genotyping and is efficient for processing SNP data genotyped by the TaqMan™ method (Hampe et al., 2001). However, it is difficult to adapt genotype data generated by other SNP genotyping systems to the format required by GenoTool. In addition, GenoTool runs only on MS SQL Server and Windows-based platforms (Hampe et al., 2001), whereas some laboratories may have access to, and thus prefer, other operating systems (OS). Furthermore, GenoTool exports data only in the format used by the LINKAGE program, making it cumbersome to adapt the data for other genetic analysis software.
An efficient genotype database should assume that genotype data are not 100% accurate and that some data may need to be confirmed or re-obtained in repeat experiments (Li et al., 2001). The DNA samples that require repeat genotyping are irregularly distributed across different sample plates. Identifying and selecting these samples manually for repeat experiments is time consuming and error-prone. An automated liquid handling system, such as the Tecan Genesis series (Tecan Inc., Research Triangle Park, NC), can be used for large-scale SNP genotyping projects. However, there is currently no general software that allows liquid handling systems to pick specific samples from DNA plates for repeat experiments.
We thus developed SNP Processor (SNPP), a general, cross-platform, dynamic software package that can efficiently aid a liquid handling system in repeat experiments and manipulate large-scale SNP genotype data.
| FEATURES AND ALGORITHMS |
|---|
Data importing, comparing, searching, reviewing and editing
SNP Processor makes large-scale data import quick and easy. Raw SNP genotype data are loaded with duplicate data checking and comparison. This is useful because some DNA samples are usually duplicated in sample plates as a genotyping quality control measure. If duplicate data are found, the system alerts users to choose which data to import. SNPP provides graphical interfaces for its various functions, and users can review selected genotype data not only in a table format but also in a two-dimensional graphic format (right panel in Fig. 1). In the two-dimensional graph, the selected SNP genotype data are plotted according to the signal intensities of the two alleles and classed into three clusters corresponding to their genotype groups (i.e. AA, AB, BB). By visually inspecting the variability of signal intensity and the tightness of the data clustering, users can easily identify low quality SNP calls and improve genotype data quality by editing the results in SNPP, repeating experiments or taking other measures. Users can also compare two or three files automatically. The search function in SNPP helps users locate the data sources.
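The duplicate comparison described above can be illustrated with a small example. This is a minimal sketch rather than SNPP's actual code: the class name, the record layout (sample ID, SNP ID, genotype call) and the marker name `rs1` are hypothetical, and it assumes the case of interest is a duplicated sample whose two calls disagree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: flags duplicated samples whose genotype calls disagree. */
final class DuplicateCheckSketch {

    /**
     * Each record is {sampleId, snpId, genotype} (a hypothetical layout).
     * Returns the "sample/snp" keys that occur more than once with different calls.
     */
    static List<String> findConflicts(List<String[]> records) {
        Map<String, String> firstCall = new HashMap<>();
        List<String> conflicts = new ArrayList<>();
        for (String[] r : records) {
            String key = r[0] + "/" + r[1];
            // putIfAbsent returns the earlier call if this key was already seen.
            String previous = firstCall.putIfAbsent(key, r[2]);
            if (previous != null && !previous.equals(r[2]) && !conflicts.contains(key)) {
                conflicts.add(key);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"S001", "rs1", "AA"});
        records.add(new String[] {"S001", "rs1", "AB"}); // duplicate well, different call
        records.add(new String[] {"S002", "rs1", "BB"});
        records.add(new String[] {"S002", "rs1", "BB"}); // duplicate well, same call
        System.out.println(findConflicts(records));
    }
}
```

In a real import step, the conflicting keys would presumably be shown to the user, who then chooses which call to keep.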
Sample selection and data transformation for liquid handling systems
In practice, some DNA samples always need to be re-genotyped. SNPP can identify the DNA samples and SNPs that need repeat genotyping and compile them into a redo file. The redo file can then be transformed into files that a liquid handling system can recognize and execute. Users who need only the data transformation function can apply it directly to an imported redo file in the matching format.
The transformed files have been successfully tested on the Tecan Genesis RSP 150 liquid handling system in our lab. In principle, the files that SNPP produces should work on any liquid handling robotic system that is controlled by the Gemini software (Tecan Inc.).
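The step from a redo file to robot instructions can be sketched as a mapping from scattered source wells to consecutive wells on a fresh plate. The example below is an assumption-laden illustration: the class name, the 96-well column-first fill order and the comma-separated output layout are all hypothetical, and the output is not the Gemini worklist format, which is not described in this article.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: maps redo samples onto consecutive wells of a new 96-well plate. */
final class RedoWorklistSketch {

    /** Converts a 0-based index into a well name, filling columns first (A1..H1, A2..). */
    static String wellName(int index) {
        if (index < 0 || index >= 96) {
            throw new IllegalArgumentException("index outside a 96-well plate: " + index);
        }
        char row = (char) ('A' + index % 8);
        int column = index / 8 + 1;
        return row + String.valueOf(column);
    }

    /**
     * Each redo entry is {sourcePlate, sourceWell} (hypothetical layout).
     * Output lines are "sourcePlate,sourceWell,destinationWell".
     */
    static List<String> toTransferLines(List<String[]> redoEntries) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < redoEntries.size(); i++) {
            String[] e = redoEntries.get(i);
            lines.add(e[0] + "," + e[1] + "," + wellName(i));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> redo = new ArrayList<>();
        redo.add(new String[] {"Plate03", "C7"});
        redo.add(new String[] {"Plate11", "H2"});
        for (String line : toTransferLines(redo)) {
            System.out.println(line);
        }
    }
}
```

A redo list longer than 96 samples raises an exception in this sketch; a fuller version would start a second destination plate instead.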
Calculator
For convenience, SNPP provides a calculator that computes the total reagent volume needed to re-genotype an SNP marker, based on the number of repeat genotyping reactions required. Users can edit the reagent volume for each polymerase chain reaction (PCR), and the total reagent volume is updated accordingly.
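The arithmetic behind such a calculator is a per-reagent multiplication of the volume per PCR by the number of repeat reactions. The sketch below assumes exactly that and nothing more; the class name and reagent names are hypothetical, and any pipetting overage a laboratory might add is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: total reagent volume = volume per PCR x number of repeat reactions. */
final class ReagentCalculatorSketch {

    /** Returns the total volume (in microlitres) needed for each reagent, in input order. */
    static Map<String, Double> totalVolumes(Map<String, Double> perReactionMicrolitres,
                                            int reactions) {
        if (reactions < 0) {
            throw new IllegalArgumentException("reactions must not be negative");
        }
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : perReactionMicrolitres.entrySet()) {
            totals.put(e.getKey(), e.getValue() * reactions);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Double> perReaction = new LinkedHashMap<>();
        perReaction.put("buffer", 2.0);     // hypothetical reagent and volume
        perReaction.put("primer mix", 0.5); // hypothetical reagent and volume
        System.out.println(totalVolumes(perReaction, 24));
    }
}
```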
Mendelian inheritance check
For SNP genotype data obtained from families, consistency with Mendelian inheritance within families is necessary and provides a preliminary check for genotyping errors. SNPP automatically performs a Mendelian inheritance check for the selected markers. Based on the results, users can easily edit the raw SNP data in the database (e.g. mark inconsistent data for inclusion in files for repeat experiments) and update them. Users can then run a further Mendelian inheritance check that incorporates the new results from repeat experiments, and repeat this cycle until no inconsistent data are found under a specified missing data rate.
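For a single father-mother-child trio, the core of a Mendelian inheritance check can be sketched as follows. This is not SNPP's implementation: the class name is hypothetical, genotypes are assumed to be two-letter calls such as "AB" at an autosomal biallelic marker, and a trio with any missing call is treated as uninformative rather than inconsistent.

```java
/** Illustrative only: Mendelian consistency of one parent-parent-child trio at one SNP. */
final class MendelCheckSketch {

    /** A call is treated as missing unless it is exactly two characters long. */
    static boolean isMissing(String genotype) {
        return genotype == null || genotype.length() != 2;
    }

    /** True if the genotype carries the given allele. */
    static boolean has(String genotype, char allele) {
        return genotype.indexOf(allele) >= 0;
    }

    /**
     * The child is consistent if one of its alleles can come from the father
     * and the other from the mother. Trios with missing data pass by default.
     */
    static boolean isConsistent(String father, String mother, String child) {
        if (isMissing(father) || isMissing(mother) || isMissing(child)) {
            return true;
        }
        char c1 = child.charAt(0);
        char c2 = child.charAt(1);
        return (has(father, c1) && has(mother, c2))
            || (has(father, c2) && has(mother, c1));
    }

    public static void main(String[] args) {
        System.out.println(isConsistent("AA", "BB", "AB")); // one allele from each parent
        System.out.println(isConsistent("AA", "AA", "AB")); // B cannot come from either parent
    }
}
```

Applied to a whole pedigree, the same test would be run for every child with genotyped parents, and each failing trio would be a candidate for the repeat-genotyping file.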
Data compiling and export
SNPP allows users to easily compile and export the genotype data, along with any available phenotype data, in the formats required by different software packages, such as SOLAR, LINKAGE, QTDT, Genehunter or MEGA2. MEGA2 can in turn convert the files exported from SNPP into various other file formats for other linkage and genetic analyses (Mukhopadhyay et al., 1999).
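As an illustration of what one export target involves, the sketch below writes a single pedigree record in the column order commonly used for LINKAGE-style pedigree files: family ID, individual ID, father ID, mother ID, sex, affection status, then two numbered alleles per marker with 0 for missing. The class and method names are hypothetical, the A to 1 and B to 2 allele coding is an assumption, and the exact conventions should be checked against the LINKAGE documentation before use.

```java
/** Illustrative only: formats one individual as a LINKAGE-style pedigree line. */
final class LinkageExportSketch {

    /** Maps a two-letter call such as "AB" to "1 2"; anything else becomes "0 0" (missing). */
    static String encode(String genotype) {
        if (genotype == null || genotype.length() != 2) {
            return "0 0";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2; i++) {
            char allele = genotype.charAt(i);
            if (allele != 'A' && allele != 'B') {
                return "0 0";
            }
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(allele == 'A' ? '1' : '2');
        }
        return sb.toString();
    }

    /** Builds the space-separated record; founders use "0" as father and mother ID. */
    static String pedigreeLine(String family, String id, String father, String mother,
                               int sex, int affection, String... genotypes) {
        StringBuilder sb = new StringBuilder();
        sb.append(family).append(' ').append(id).append(' ')
          .append(father).append(' ').append(mother).append(' ')
          .append(sex).append(' ').append(affection);
        for (String g : genotypes) {
            sb.append(' ').append(encode(g));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Child 3 of family F1, male, phenotype unknown, two markers (second one missing).
        System.out.println(pedigreeLine("F1", "3", "1", "2", 1, 0, "AB", null));
    }
}
```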
| IMPLEMENTATION |
|---|
SNPP accepts two input file formats. One is the raw SNP genotype data obtained directly from Invader Analyzer (Third Wave Technologies, Madison, WI). The other is a general format, which can easily be generated from the data files of other major SNP genotyping systems, such as SNPstream (Bell et al., 2002). To keep the software easy to extend and modify in the future, an object-oriented design expressed in the Unified Modeling Language was applied. The client application is written in Java. The system and source code can be downloaded for free from the website (http://orclinux.creighton.edu/snpp/). SNPP supports the following commonly used databases: MySQL and MS Access. SNPP runs on various OS, such as Windows, Linux, Solaris and Mac OS.
| Acknowledgments |
|---|
The investigators were partially supported by Health Future Foundation, NIH, the State of Nebraska, US Department of Energy, CNSF, Huo YingDong Education Foundation, the Ministry of Education of China, Hunan Normal University and Xi'an Jiaotong University.
| Footnotes |
|---|
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Received on February 7, 2004; revised on July 27, 2004; accepted on August 13, 2004
| REFERENCES |
|---|
Bell, P.A., Chaturvedi, S., Gelfand, C.A., Huang, C.Y., Kochersperger, M., Kopla, R., Modica, F., Pohl, M., Varde, S., Zhao, R., et al. (2002) SNPstream UHT: ultra-high throughput SNP genotyping for pharmacogenomics and drug discovery. Biotechniques, 32, Suppl., S70-S77.
Hampe, J., Wollstein, A., Lu, T., Frevel, H.J., Will, M., Manaster, C., Schreiber, S. (2001) An integrated system for high throughput TaqMan based SNP genotyping. Bioinformatics, 17, 654-655.
Li, J.L., Deng, H., Lai, D.B., Xu, F., Chen, J., Gao, G., Recker, R.R., Deng, H.W. (2001) Toward high-throughput genotyping: dynamic and automatic software for manipulating large-scale genotype data using fluorescently labeled dinucleotide markers. Genome Res., 11, 1304-1314.
Mukhopadhyay, N., Almasy, L., Schroeder, M., Mulvihill, W.P., Weeks, D.E. (1999) Mega2, a data-handling program for facilitating genetic linkage and association analyses. Am. J. Hum. Genet., 65, A436.


