Bioinformatics Vol. 18 no. 1 2002
Pages 36-38
© 2002 Oxford University Press
Calculating the SNP-effective sample size from an alignment
1 Max-Planck-Institut für Chemische Ökologie, Carl-Zeiss-Promenade 10, D-07745 Jena, Germany
Received on May 11, 2001; revised on June 27, 2001; accepted on August 7, 2001
Motivation: The number of Single Nucleotide Polymorphisms (SNPs) detectable in an alignment is a function of the length and the number of the aligned sequences. The latter is called the sample size. However, a typical alignment, for instance one obtained by a BLAST search of a query sequence against an EST database, does not cover the query sequence evenly. Therefore, it is usually not clear what the actual sample size is.
Results: We present a method to calculate the effective sample size, called n_eff, for a given BLAST alignment. This method takes into account that multiple coverage contributes only logarithmically to the SNP yield of a given sequence stretch. We show that the effective sample size n_eff is usually much smaller than would be expected for a given amount of coverage and illustrate this with two typical examples.
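The logarithmic contribution of coverage can be illustrated with a minimal Python sketch. It is not the NEFF implementation, and the abstract does not state the formula used there. The sketch assumes the standard neutral-model result that the expected number of segregating sites in n sequences is proportional to a_n = 1 + 1/2 + ... + 1/(n-1), which grows like log(n), and it defines the effective sample size as the depth of a uniformly covered alignment with the same expected SNP yield. The function names and the linear interpolation between integer depths are hypothetical choices made for illustration.

```python
from typing import Sequence


def harmonic(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i (0 for n <= 1).

    Assumption: SNP yield at a site covered by n sequences is
    proportional to a_n, so a single sequence contributes nothing.
    """
    return sum(1.0 / i for i in range(1, n))


def effective_sample_size(coverage: Sequence[int]) -> float:
    """Depth of a uniform alignment with the same expected SNP yield.

    `coverage` holds the number of aligned sequences at each position
    of the query. The result is interpolated linearly between the two
    neighbouring integer depths (an illustrative choice).
    """
    if not coverage:
        raise ValueError("coverage must contain at least one position")
    # Mean per-site yield factor over the whole query.
    target = sum(harmonic(c) for c in coverage) / len(coverage)
    # Find the integer n with a_n <= target < a_{n+1}.
    n = 1
    while harmonic(n + 1) <= target:
        n += 1
    lo, hi = harmonic(n), harmonic(n + 1)
    return n + (target - lo) / (hi - lo)


if __name__ == "__main__":
    # Half of the query covered 20-fold, the other half 2-fold.
    uneven = [20] * 50 + [2] * 50
    print("mean coverage:", sum(uneven) / len(uneven))
    print("effective sample size:", effective_sample_size(uneven))
```

For this uneven example the mean coverage is 11, while the effective sample size under the stated assumptions falls between 5 and 6, because the deeply covered half adds comparatively little to the expected SNP yield.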
Availability: The algorithm is implemented in NEFF, a program written in FORTRAN90 that is accessible at http://soft.ice.mpg.de/neff. The source code, except for two subroutines protected by copyright, and a compiled LINUX executable can also be downloaded from this site.
Contact: twiehe{at}ice.mpg.de
* To whom correspondence should be addressed.