Bioinformatics Advance Access published online on September 29, 2008

Bioinformatics, doi:10.1093/bioinformatics/btn509
All versions of this article: 24/23/2665 (most recent); btn509v1

© The Author (2008). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Smarter Clustering Methods for SNP Genotype Calling

Yan Lin 1,3,*, George Tseng 1,2, Soo Yeon Cheong 1, Lora J.H. Bean 4, Stephanie L. Sherman 4 and Eleanor Feingold 1,2

1Department of Biostatistics, University of Pittsburgh.
2Department of Human Genetics, University of Pittsburgh.
3Department of Medicine, University of Pittsburgh.
4Department of Human Genetics, Emory University.

*To whom correspondence should be addressed. Dr. Yan Lin, E-mail: yal14{at}pitt.edu


Abstract

Motivation: Most genotyping technologies for single nucleotide polymorphism (SNP) markers use standard clustering methods to "call" the SNP genotypes. These methods are not always optimal in distinguishing the genotype clusters of a SNP because they do not take advantage of specific features of the genotype calling problem. In particular, when family data are available, pedigree information is ignored. Furthermore, prior information about the distribution of the measurements for each cluster can be used to choose an appropriate model-based clustering method and can significantly improve the genotype calls. One special problem that has never been discussed in the literature is the genotyping of trisomic individuals, such as individuals with Down syndrome. Calling trisomic genotypes is a more complicated problem, and the addition of external information becomes very important.

Results: In this article, we discuss the impact of incorporating external information into clustering algorithms to call the genotypes for both disomic and trisomic data. We also propose two new methods to call genotypes using family data. One is a modification of the K-means method and uses the pedigree information by updating all members of a family together. The other is a likelihood-based method that combines the Gaussian or beta mixture model with pedigree information. We compare the performance of these two methods and some other existing methods using simulation studies. We also compare the performance of these methods on a real dataset generated by the Illumina platform (www.illumina.com).
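To make the idea of "updating all members of a family together" concrete, the following is a minimal sketch of a family-aware K-means step. It is not the SNPCaller code and not necessarily the authors' algorithm. It assumes, purely for illustration, a one-dimensional signal per individual, father-mother-child trios only, three disomic genotype clusters coded 0 = AA, 1 = AB, 2 = BB, and a squared-distance cost. The names `family_kmeans`, `assign_trio` and `mendelian_trios` are hypothetical.

```python
from itertools import product

# B-allele counts that a parent with genotype g (0=AA, 1=AB, 2=BB) can transmit.
TRANSMIT = {0: (0,), 1: (0, 1), 2: (1,)}


def mendelian_trios():
    """List the (father, mother, child) genotype triples allowed by Mendelian inheritance."""
    return [
        (f, m, c)
        for f, m, c in product(range(3), repeat=3)
        if c in {a + b for a in TRANSMIT[f] for b in TRANSMIT[m]}
    ]


def assign_trio(signals, centers, configs):
    """Pick the allowed genotype triple with the smallest summed squared distance."""
    return min(
        configs,
        key=lambda cfg: sum((s - centers[g]) ** 2 for s, g in zip(signals, cfg)),
    )


def family_kmeans(trios, init_centers, n_iter=50, tol=1e-9):
    """Cluster 1-D signals of (father, mother, child) trios into AA/AB/BB.

    trios        : list of (father, mother, child) signal values (assumed input format)
    init_centers : starting centers for genotypes 0, 1, 2
    Returns (centers, list of genotype triples, one per trio).
    """
    configs = mendelian_trios()
    centers = list(init_centers)
    for _ in range(n_iter):
        # Assignment step: all members of a family are called jointly.
        calls = [assign_trio(t, centers, configs) for t in trios]
        # Update step: each center becomes the mean of the signals assigned to it.
        new_centers = list(centers)
        for g in range(3):
            members = [s for t, cfg in zip(trios, calls)
                       for s, gi in zip(t, cfg) if gi == g]
            if members:
                new_centers[g] = sum(members) / len(members)
        converged = max(abs(a - b) for a, b in zip(centers, new_centers)) < tol
        centers = new_centers
        if converged:
            break
    # Final assignment with the final centers.
    calls = [assign_trio(t, centers, configs) for t in trios]
    return centers, calls


# Toy data (made up for illustration): signal near 0 ~ AA, 0.5 ~ AB, 1 ~ BB.
trios = [
    (0.10, 0.12, 0.32),  # child is ambiguous on its own
    (0.90, 0.88, 0.91),
    (0.10, 0.90, 0.50),
    (0.50, 0.52, 0.11),
    (0.48, 0.90, 0.89),
]
centers, calls = family_kmeans(trios, init_centers=[0.1, 0.5, 0.9])
```

In the first toy trio the child's signal (0.32) is closer to the initial AB center than to the AA center, so an individual-by-individual assignment would start by calling it AB. Because both parents sit firmly in the AA cluster, the joint assignment calls the whole trio AA, which is the only Mendelian-consistent choice that keeps the parents' calls.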

Availability: The R code for the family-based genotype calling methods (SNPCaller) can be downloaded from http://watson.hgen.pitt.edu/register.

Contact: yal14{at}pitt.edu

Associate Editor: Prof. Dmitrij Frishman


Received on May 8, 2008; revised on September 23, 2008; accepted on September 23, 2008




