Bioinformatics Advance Access originally published online on June 3, 2009
Bioinformatics 2009 25(16):2074-2075; doi:10.1093/bioinformatics/btp344
Detecting SNPs and estimating allele frequencies in clonal bacterial populations by sequencing pooled DNA
1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, 2 Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN and 3 Laboratory of Gastrointestinal Pathogens, Centre for Infections, Health Protection Agency, 61 Colindale Avenue, London NW9 5HT, UK
*To whom correspondence should be addressed.
Abstract
Summary: Here, we present a method for estimating the frequencies of SNP alleles present within pooled samples of DNA using high-throughput short-read sequencing. The method was tested on real data from six strains of the highly monomorphic pathogen Salmonella Paratyphi A, sequenced both individually and as a pool. A variety of read-mapping and quality-weighting procedures were tested to determine the optimal parameters, which afforded ≥80% sensitivity of SNP detection and strong correlation with true SNP frequency at a poolwide read depth of 40×, declining only slightly at read depths of 20–40×.
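To make the idea of quality-weighted frequency estimation and its evaluation concrete, the following is a minimal Python sketch. It is not the authors' Perl script, and it does not reproduce their chosen parameters. It assumes a hypothetical weighting in which each read's base call counts in proportion to its probability of being correct, 1 − 10^(−Q/10) for Phred quality Q. It also assumes that the true frequency of a SNP in an equimolar pool of six strains is (number of carrier strains)/6. The function names `phred_weight`, `weighted_allele_frequency`, `sensitivity` and `pearson` are illustrative only.

```python
import math


def phred_weight(q):
    """Probability that a base call with Phred quality q is correct (assumed weighting)."""
    return 1.0 - 10 ** (-q / 10)


def weighted_allele_frequency(bases, quals, alt):
    """Quality-weighted fraction of reads supporting `alt` at one site.

    bases: base calls from all pooled reads covering the site
    quals: matching Phred base qualities
    alt:   the non-reference allele whose frequency is estimated
    """
    total = 0.0
    alt_weight = 0.0
    for base, q in zip(bases, quals):
        w = phred_weight(q)
        total += w
        if base == alt:
            alt_weight += w
    return alt_weight / total if total > 0 else 0.0


def sensitivity(called_sites, true_sites):
    """Fraction of true SNP sites that were detected in the pool."""
    true_set = set(true_sites)
    if not true_set:
        return 0.0
    return len(true_set & set(called_sites)) / len(true_set)


def pearson(xs, ys):
    """Pearson correlation between estimated and true SNP frequencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical site: SNP carried by 2 of 6 equally pooled strains (true frequency 2/6)
    bases = "AAAAGGAAAGAA"
    quals = [30] * len(bases)
    est = weighted_allele_frequency(bases, quals, "G")
    print(f"estimated {est:.3f} vs true {2 / 6:.3f}")
```

With a read depth of only a dozen reads per site, sampling noise alone can move the estimate well away from the true k/6 value. This is consistent with the abstract's observation that accuracy depends on poolwide read depth.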
Availability: The method is implemented in Perl and relies on the open-source software Maq for read mapping and SNP calling. The Perl script is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/pools/.
Contact: kh2{at}sanger.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Joaquin Dopazo
Received on March 19, 2009; revised on May 25, 2009; accepted on May 29, 2009