Bioinformatics Advance Access originally published online on September 17, 2004
Bioinformatics 2005 21(2):266-268; doi:10.1093/bioinformatics/bth486
Bioinformatics vol. 21 issue 2 © Oxford University Press 2005; all rights reserved.

SNPP: automating large-scale SNP genotype data management

Lan-Juan Zhao 1,3,4,†, Miao-Xin Li 1,†, Yan-Fang Guo 1, Fu-Hua Xu 3,4, Jin-Long Li 5 and Hong-Wen Deng 1,2,3,4,*

1 Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University Changsha, Hunan 410081, China
2 The Key Laboratory of Biomedical Information Engineering of Ministry of Education and Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University Xi'an 710049, China
3 Department of Biomedical Sciences, Creighton University Omaha, NE 68131, USA
4 Osteoporosis Research Center, Creighton University Medical Center Omaha, NE 68131, USA
5 Department of Psychiatry, Yale University School of Medicine New Haven, CT 06516, USA

*To whom correspondence should be addressed at Osteoporosis Research Center, Creighton University Medical Center, 601 N. 30th St. Suite 6787, Omaha, NE 68131, USA.


    Abstract

Summary: To manage high-throughput single nucleotide polymorphism (SNP) genotyping data efficiently, we developed a dynamic general database management system—SNPP (SNP Processor). It provides several functions, including data importing with comparison, Mendelian inheritance check within pedigrees, data compiling and exporting. Furthermore, SNPP may generate files for repeat genotyping and transform them into files that can be executed by a liquid handling system.

Availability: http://orclinux.creighton.edu/snpp/

Contact: lanjuanzhao{at}creighton.edu


    INTRODUCTION
High-throughput single nucleotide polymorphism (SNP) genotyping is increasingly being applied in large-scale genetic studies, in which thousands of data points are routinely generated. An efficient tool is needed to manage such large data output for further analyses.

Automation of data management may be achieved in genetic studies by using different software packages, such as GenoDB (Li et al., 2001) or GenoTool (Hampe et al., 2001). GenoDB is used for microsatellite markers. Because microsatellite and SNP markers have different characteristics, GenoDB is not ideal for efficient management of SNP genotype data. GenoTool is a component of an integrated system for TaqMan™-based SNP genotyping, and it is efficient for processing SNP data genotyped by the TaqMan™ method (Hampe et al., 2001). However, it is difficult to adapt the genotype data generated by other SNP genotyping systems to the format required by GenoTool. In addition, GenoTool runs only on MS SQL Server and Windows-based platforms (Hampe et al., 2001), whereas some laboratories may have access to, and thus prefer, other operating systems (OS). Furthermore, GenoTool exports data in a format that can only be used by the LINKAGE program, making it cumbersome to adapt the data for other genetic analysis software.

An efficient genotype database should assume that genotype data are not 100% accurate and that some data may need to be confirmed or re-obtained in repeat experiments (Li et al., 2001). The DNA samples that need repeating are irregularly distributed across different sample plates. Identifying and selecting these samples manually for repeat experiments is time consuming and error-prone. An automated liquid handling system, such as the Tecan Genesis series (Tecan Inc., Research Triangle Park, NC), can be used for large-scale SNP genotyping projects. However, there is currently no general software that enables a liquid handling system to pick specific samples from DNA plates for repeat experiments.

We thus developed SNP Processor (SNPP), a general, cross-platform, dynamic software package that can efficiently aid a liquid handling system in repeat experiments and manage large-scale SNP genotype data.


    FEATURES AND ALGORITHMS
Data importing, comparing, searching, reviewing and editing
SNP Processor makes large-scale data import quick and easy. Raw SNP genotype data are checked for duplicates and compared as they are loaded. This is useful because some DNA samples are usually duplicated in sample plates as a genotyping quality control measure. If duplicate data are found, the system alerts users to choose which data to import. Users can review selected genotype data not only in a table format but also in a two-dimensional graphic format (right panel in Fig. 1). In the two-dimensional graph, the selected SNP genotype data are plotted according to the signal intensities of the two alleles and classed into three clusters corresponding to their genotype groups (i.e. AA, AB, BB). By visually inspecting the variability of signal intensity and the tightness of the data clustering, users can easily identify low-quality SNP calls and improve genotype data quality by editing the results in SNPP, repeating experiments or taking other measures. Users can also compare two or three files automatically. The search function in SNPP helps users locate the data sources.



Fig. 1 The interface of SNPP (left panel) and the graphic review of the genotype data (right panel).
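The duplicate comparison performed at import can be illustrated with a minimal sketch. It assumes each imported call is a (sample, SNP, genotype) record; this record layout and the function name are hypothetical and do not reflect SNPP's actual schema or code.

```python
from collections import defaultdict

def find_duplicate_conflicts(records):
    """Group calls by (sample, SNP) and report duplicates whose calls disagree.

    `records` is an iterable of (sample_id, snp_id, genotype) tuples.
    This layout is an assumption for illustration only.
    """
    calls = defaultdict(set)
    for sample_id, snp_id, genotype in records:
        calls[(sample_id, snp_id)].add(genotype)
    # More than one distinct genotype for a key means the duplicates disagree.
    return {key: sorted(g) for key, g in calls.items() if len(g) > 1}

records = [
    ("S001", "rs1", "AA"),
    ("S001", "rs1", "AA"),  # concordant duplicate
    ("S002", "rs1", "AB"),
    ("S002", "rs1", "BB"),  # discordant duplicate
]
conflicts = find_duplicate_conflicts(records)
```

In this sketch only the discordant pair for S002 would be flagged, which corresponds to the point where a user would be asked to choose which call to import.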

 
Sample selection and data transformation for liquid handling systems
In practice, some DNA samples always need to be re-genotyped. SNPP can identify the DNA samples that need repeat genotyping for each SNP and compile them into a redo file. The redo file can then be transformed into files that a liquid handling system can recognize and execute. Users who only need the data transformation function can apply it directly to an imported redo file in the matching format.

The transformed files have been successfully tested on the Tecan Genesis RSP 150 liquid handling system in our lab. In principle, the files that SNPP generates for the liquid handling system should work on all liquid handling robotic systems that are controlled by the Gemini software (Tecan Inc.).
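The two steps described above, compiling a redo list and transforming it into a pick list, might look roughly like the following sketch. The dictionary keys, the missing-call codes and the CSV columns are all assumptions made for illustration; in particular, the output is a generic CSV and not the actual file format expected by Gemini.

```python
import csv
import io

def compile_redo(calls, failed=("", "NA")):
    """Return (plate, well, sample, snp) rows for calls that need repeating.

    A call is treated as failed when its genotype is one of the `failed`
    codes; these codes are hypothetical.
    """
    return [
        (c["plate"], c["well"], c["sample"], c["snp"])
        for c in calls
        if c["genotype"] in failed
    ]

def to_pick_list(redo_rows, dest_plate="REDO_01"):
    """Assign each redo row the next free well on one 96-well destination plate."""
    if len(redo_rows) > 96:
        raise ValueError("more than 96 samples; a second destination plate is needed")
    plate_rows = "ABCDEFGH"
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(
        ["source_plate", "source_well", "dest_plate", "dest_well", "sample", "snp"]
    )
    for i, (plate, well, sample, snp) in enumerate(redo_rows):
        # Fill the destination plate column by column: A1, B1, ..., H1, A2, ...
        dest_well = f"{plate_rows[i % 8]}{i // 8 + 1}"
        writer.writerow([plate, well, dest_plate, dest_well, sample, snp])
    return out.getvalue()

calls = [
    {"plate": "P1", "well": "A3", "sample": "S001", "snp": "rs1", "genotype": "AA"},
    {"plate": "P1", "well": "B7", "sample": "S002", "snp": "rs1", "genotype": ""},
    {"plate": "P2", "well": "H12", "sample": "S150", "snp": "rs1", "genotype": "NA"},
]
pick_list = to_pick_list(compile_redo(calls))
```

Keeping the two steps as separate functions mirrors the option described above of running the transformation alone on an existing redo file.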

Calculator
For convenience, SNPP provides a calculator that computes the total reagent volume needed to repeat genotyping for a SNP marker, based on the number of repeat reactions required. Users can edit the reagent volume for each polymerase chain reaction (PCR), and the total reagent volume is updated accordingly.
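The arithmetic behind such a calculator is a simple product of per-reaction volume and number of repeat reactions. A minimal sketch follows; the reagent names and volumes are made-up examples, and no pipetting overage is included because the text does not mention one.

```python
def total_reagent_volumes(n_repeats, per_reaction_ul):
    """Scale each per-reaction reagent volume (in microlitres) by the repeat count."""
    if n_repeats < 0:
        raise ValueError("n_repeats must be non-negative")
    return {reagent: vol * n_repeats for reagent, vol in per_reaction_ul.items()}

# Hypothetical per-reaction volumes in microlitres.
per_reaction = {"buffer": 2.5, "primer_mix": 0.5}
totals = total_reagent_volumes(12, per_reaction)
```

Editing a value in `per_reaction` and calling the function again corresponds to the way the total is updated when a user changes the volume for a single reaction.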

Mendelian inheritance check
For SNP genotype data obtained from families, consistency with Mendelian inheritance within families is necessary and provides a preliminary check for genotyping errors. SNPP automatically performs a Mendelian inheritance check for the selected markers. Based on the results, users can easily edit the raw SNP data in the database (e.g. mark inconsistent data for inclusion in files for repeat experiments) and update it. Users can then run the Mendelian inheritance check again, incorporating new results from repeat experiments, until no inconsistent data are found under a specified missing data rate.
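For a biallelic SNP, the core of a Mendelian check on a father-mother-child trio is that the child must carry one allele from each parent. The sketch below illustrates that rule only; it is not SNPP's algorithm, and the genotype encoding (two-character strings such as "AB", with None for a missing call) is an assumption.

```python
from itertools import product

def is_mendelian_consistent(father, mother, child):
    """Check whether a child's genotype can arise from the parental genotypes.

    Genotypes are two-character strings such as "AB"; None marks a missing
    call. Trios with a missing call cannot be tested and are not flagged.
    """
    if father is None or mother is None or child is None:
        return True
    # Every combination of one paternal and one maternal allele.
    possible = {"".join(sorted(a + b)) for a, b in product(father, mother)}
    return "".join(sorted(child)) in possible

def find_inconsistent(trios, genotypes):
    """Return the child IDs of trios that violate Mendelian inheritance."""
    return [
        child
        for father, mother, child in trios
        if not is_mendelian_consistent(
            genotypes.get(father), genotypes.get(mother), genotypes.get(child)
        )
    ]

genotypes = {"F1": "AA", "M1": "AB", "C1": "AB", "C2": "BB"}
trios = [("F1", "M1", "C1"), ("F1", "M1", "C2")]
flagged = find_inconsistent(trios, genotypes)
```

Here child C2 would be flagged, since a BB child cannot have an AA father. Such a flag does not say which of the three calls is wrong, which is why inconsistent data are sent for repeat genotyping rather than corrected automatically.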

Data compiling and export
SNPP allows users to easily compile and export the genotype data, along with the available phenotype data, in the format required by different software, such as SOLAR, LINKAGE, QTDT, Genehunter or MEGA2. The MEGA2 software can convert the export files from SNPP into various other file formats for other linkage and genetic analyses (Mukhopadhyay et al., 1999).
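As an illustration of what exporting to one of these formats involves, the sketch below writes one line of a LINKAGE-style pedigree file in the commonly used pre-makeped layout: pedigree ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female), affection status, then two allele codes per marker with 0 for missing. The mapping of alleles A and B to codes 1 and 2 and the function name are assumptions; this is not SNPP's export code.

```python
ALLELE_CODES = {"A": "1", "B": "2"}  # assumed numeric coding of the two alleles

def linkage_line(ped, ind, father, mother, sex, status, genotypes):
    """Format one individual as a whitespace-separated pedigree file line.

    `genotypes` holds one two-character call per marker (e.g. "AB"),
    or None for a missing call, which is written as "0 0".
    """
    fields = [ped, ind, father, mother, str(sex), str(status)]
    for g in genotypes:
        if g is None:
            fields += ["0", "0"]
        else:
            fields += [ALLELE_CODES[allele] for allele in g]
    return " ".join(fields)

line = linkage_line("1", "3", "1", "2", sex=1, status=0, genotypes=["AB", None])
```

Supporting another target program would mainly mean swapping the formatting function while keeping the same compiled data, which is the kind of separation the export feature relies on.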


    IMPLEMENTATION
SNPP accepts two input file formats. One is raw SNP genotype data obtained directly from Invader Analyzer (Third Wave Technologies, Madison, WI). The other is a general format, which can be easily generated from the data files of other major SNP genotyping systems, such as SNPstream (Bell et al., 2002). To keep the software easy to extend and modify in the future, we applied object-oriented design using the Unified Modeling Language. The client application is written in Java. The system and source code can be downloaded for free from the website (http://orclinux.creighton.edu/snpp/). SNPP works with the following commonly used databases: MySQL and MS Access. SNPP runs on various OS, such as Windows, Linux, Solaris and Mac OS.


    Acknowledgments
 
The investigators were partially supported by Health Future Foundation, NIH, the State of Nebraska, US Department of Energy, CNSF, Huo YingDong Education Foundation, the Ministry of Education of China, Hunan Normal University and Xi'An Jiao Tong University.


    Footnotes
 
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Received on February 7, 2004; revised on July 27, 2004; accepted on August 13, 2004

    REFERENCES

    Bell, P.A., Chaturvedi, S., Gelfand, C.A., Huang, C.Y., Kochersperger, M., Kopla, R., Modica, F., Pohl, M., Varde, S., Zhao, R., et al. (2002) SNPstream UHT: ultra-high throughput SNP genotyping for pharmacogenomics and drug discovery. Biotechniques, 32, Suppl., S70–S77.

    Hampe, J., Wollstein, A., Lu, T., Frevel, H.J., Will, M., Manaster, C., Schreiber, S. (2001) An integrated system for high throughput TaqMan based SNP genotyping. Bioinformatics, 17, 654–655.

    Li, J.L., Deng, H., Lai, D.B., Xu, F., Chen, J., Gao, G., Recker, R.R., Deng, H.W. (2001) Toward high-throughput genotyping: dynamic and automatic software for manipulating large-scale genotype data using fluorescently labeled dinucleotide markers. Genome Res., 11, 1304–1314.

    Mukhopadhyay, N., Almasy, L., Schroeder, M., Mulvihill, W.P., Weeks, D.E. (1999) Mega2, a data-handling program for facilitating genetic linkage and association analyses. Am. J. Hum. Genet., 65, A436.




