Bioinformatics Advance Access originally published online on April 14, 2008
Bioinformatics 2008 24(11):1406-1407; doi:10.1093/bioinformatics/btn136
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Analysing georeferenced population genetics data with Geneland: a new algorithm to deal with null alleles and a friendly graphical user interface

Gilles Guillot 1,2,*, Filipe Santos 3 and Arnaud Estoup 3

1Centre for Ecological and Evolutionary Synthesis, Department of Biology, University of Oslo, P.O Box 1066 Blindern, 0316 Oslo, Norway, 2Applied Mathematics Department, INRA, Paris and 3Centre de Biologie et de Gestion des Populations, INRA / IRD / CIRAD / Montpellier SupAgro, Campus international de Baillarguet, CS 30016, F-34988 Montferrier-sur-Lez cedex, France

*To whom correspondence should be addressed.


    ABSTRACT

Summary: We introduce a new algorithm to account for the presence of null alleles when inferring population clusters from individual multilocus genetic data. We show by simulation that null alleles can reduce the accuracy of these inferences if not properly accounted for, and that our algorithm significantly improves their accuracy.

Availability: The new algorithm is implemented in the program Geneland, which is freely available under the GNU public license as an R package on the Comprehensive R Archive Network. It now includes a fully clickable graphical interface. Information on how to get the software is available at folk.uio.no/gillesg/Geneland.html

Contact: gilles.guillot{at}bio.uio.no

Supplementary information: Details on the simulation study are available from folk.uio.no/gillesg/BioInformatics_Geneland


    1 INTRODUCTION
 
Bayesian clustering algorithms have become extremely useful tools for investigating the structure of population genetics data (Excoffier and Henkel, 2006), but the conclusions drawn from such algorithms can be markedly influenced by the presence of genotyping errors (Pompanon et al., 2005). A well-known source of such problems is the presence of null alleles, which arise from variation in the nucleotide sequences of flanking regions that prevents the primer from annealing to template DNA during PCR amplification of the microsatellite locus (Dakin and Avise, 2004). The presence of null alleles results in an excess of homozygous genotypes within a population compared with the proportion expected under Hardy-Weinberg Equilibrium (HWE) and Linkage Equilibrium (LE) (Callen et al., 1993; Paetkau et al., 1995). Although all population genetics clustering programs assume HWE and LE within the sought clusters, there is no study to date on the effect of null alleles on the accuracy of inferences made with such programs. In this note, we introduce a new statistical model and an MCMC step that explicitly take into account the putative presence of null allele(s) in the analysed dataset. We briefly illustrate how the presence of null alleles affects the accuracy of inferences with and without our null allele filtering scheme.
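The excess of homozygous genotypes can be made explicit with a short calculation. The block below is an illustrative derivation, not taken from the original note; it assumes a single locus at HWE, visible alleles $\alpha$ with frequencies $f_\alpha$, and a null allele with cumulative frequency $f_\nu$ (symbols chosen here for illustration).

```latex
% Apparent genotype frequencies at one locus under HWE with a null allele.
% A carrier of (alpha, null) is scored as the homozygote (alpha, alpha).
\begin{align*}
  P\big(\text{scored as } (\alpha,\alpha)\big)
      &= f_\alpha^2 + 2 f_\alpha f_\nu, \\
  P\big(\text{scored as } (\alpha,\beta)\big)
      &= 2 f_\alpha f_\beta \qquad (\alpha \neq \beta), \\
  P\big(\text{scored as missing}\big)
      &= f_\nu^2 .
\end{align*}
% Summing over visible alleles, with \sum_\alpha f_\alpha = 1 - f_\nu,
% the excess of apparent homozygotes is
\[
  \sum_\alpha 2 f_\alpha f_\nu = 2 f_\nu (1 - f_\nu).
\]
% Example: two visible alleles with f = 0.4 each and f_\nu = 0.2 give
% 0.32 true homozygotes but 0.32 + 0.32 = 0.64 apparent homozygotes.
```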


    2 METHODS
 
When the presence of null alleles is suspected, we distinguish between the observed genotypes, denoted by $z = (z_{i,l})$ (where the subscripts $i$ and $l$ refer to the individual and the locus, respectively), and the true unobserved genotypes, denoted by $y = (y_{i,l})$. For each locus, we introduce an extra fictional allele, denoted by $\nu_l$, coding for the putative presence of one or several null alleles whose cumulative frequency has to be estimated. The presence of null alleles is taken into account by jointly estimating $y$ and the other parameters of the model in an MCMC simulation. A generic step updating $y$ sequentially visits the genotypes of all individuals at all loci. If $z_{i,l}$ is a heterozygous genotype, there is no ambiguity and $y_{i,l} = z_{i,l}$. If $z_{i,l}$ consists of two missing values, there is no ambiguity either, as the true unobserved genotype necessarily consists of two null alleles and $y_{i,l} = (\nu_l, \nu_l)$. If $z_{i,l} = (\alpha, \alpha)$, there is an ambiguity: the true genotype could be either genuinely homozygous, $y_{i,l} = (\alpha, \alpha)$, or could be $y_{i,l} = (\alpha, \nu_l)$. We denote by $\theta$ the vector of all unknown quantities to be inferred (including $y$). The conditional probability of a genuine homozygote is


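\[
\pi\big(y_{i,l} = (\alpha,\alpha) \mid z_{i,l} = (\alpha,\alpha), \theta_{-y}\big) = \frac{f_{kl\alpha}^{2}}{f_{kl\alpha}^{2} + 2 f_{kl\alpha} f_{kl\nu_l}}
\]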

where $\theta_{-y}$ denotes the vector of all parameters except $y$, and $f_{klj}$ denotes the frequency of allele $j$ at locus $l$ in population $k$ (here, the population to which individual $i$ currently belongs). The full conditional probability of the presence of a null allele is $\pi(y_{i,l} = (\alpha,\nu_l) \mid z_{i,l} = (\alpha,\alpha), \theta_{-y}) = 1 - \pi(y_{i,l} = (\alpha,\alpha) \mid z_{i,l} = (\alpha,\alpha), \theta_{-y})$. The genotype $y_{i,l}$ is hence sampled randomly according to these two probabilities. The other steps of the Markov chain simulation are similar to those described in Guillot et al. (2005), except that the likelihood is built on $y$ instead of $z$.
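The update of a single genotype described above can be sketched as follows. This is a minimal illustration under HWE, not the Geneland source code: the function names, the `NULL` label and the dictionary of allele frequencies are hypothetical, and genotypes with only one missing allele, which the text does not discuss, are rejected.

```python
import random

NULL = "null"  # hypothetical label for the fictional null allele at a locus


def prob_true_homozygote(f_alpha, f_null):
    """P(true genotype is (alpha, alpha) | observed (alpha, alpha)) under HWE."""
    hom = f_alpha ** 2                 # genuine homozygote
    het_null = 2.0 * f_alpha * f_null  # heterozygote hiding a null allele
    return hom / (hom + het_null)


def update_true_genotype(observed, freqs, rng=random):
    """Sample the true genotype at one locus given the observed genotype.

    observed: tuple of two alleles, with None marking a missing allele.
    freqs: dict mapping allele -> frequency in the individual's current
           population, including an entry for NULL (assumed layout).
    """
    a, b = observed
    if a is None and b is None:
        # Double missing data: necessarily two null alleles.
        return (NULL, NULL)
    if a is None or b is None:
        # Case not covered by the description in the text.
        raise ValueError("partially missing genotypes are not handled")
    if a != b:
        # Heterozygous observation: no ambiguity.
        return (a, b)
    # Apparent homozygote: draw between (a, a) and (a, NULL).
    p_hom = prob_true_homozygote(freqs[a], freqs[NULL])
    return (a, a) if rng.random() < p_hom else (a, NULL)
```

For example, with an allele at frequency 0.4 and a null allele at frequency 0.1, an apparent homozygote is a genuine one with probability 0.16 / (0.16 + 0.08) = 2/3.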

To assess the benefit of this extra step, we produced data according to the model implemented for simulations in Geneland. Loosely speaking, it produces spatially organised panmictic populations. For each simulated dataset, we tampered with the genotypes of the initial dataset (i.e. the dataset without null alleles) in a way that mimics the presence of null alleles at various frequencies. For each resulting dataset, we inferred the number of populations K and the population membership of each individual. Details on the simulation study are given as Supplementary Material.
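One simple way to mimic null alleles in a simulated dataset is sketched below. The actual procedure is described in the Supplementary Material and may differ; this sketch assumes that a chosen set of alleles at each locus fails to amplify, and all names are hypothetical.

```python
def mask_null_alleles(genotype, null_alleles):
    """Return the genotype as it would be scored if `null_alleles` fail to amplify."""
    a, b = genotype
    a_null = a in null_alleles
    b_null = b in null_alleles
    if a_null and b_null:
        return (None, None)  # nothing amplifies: double missing data
    if a_null:
        return (b, b)        # only b is visible: apparent homozygote
    if b_null:
        return (a, a)        # only a is visible: apparent homozygote
    return (a, b)            # both alleles amplify: unchanged


def tamper_dataset(genotypes, null_alleles_by_locus):
    """Apply the masking to every individual (rows) at every locus (columns)."""
    return [
        [mask_null_alleles(g, null_alleles_by_locus[locus])
         for locus, g in enumerate(individual)]
        for individual in genotypes
    ]
```

Under this assumption, the frequency of null alleles in the tampered dataset is controlled by which alleles are declared null at each locus.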


    3 RESULTS AND DISCUSSION
 
Results are shown in Table 1. We found that inferences with Geneland are robust to the presence of a relatively small proportion of null alleles (i.e. <10%; Table 1). However, null alleles at higher proportions (e.g. 20%) substantially reduce the accuracy of inferences on the number of populations, with a systematic overestimation of $K$ (Table 1, line 7; $\hat{K}$ tends to be larger than $K$). We note that, even in the latter case, the accuracy of individual assignments (ERCAi) remains good, as most of the spurious populations inferred contain very few individuals.


Table 1. Accuracy in terms of inference of $K$ (percentage of runs where $\hat{K} \neq K$ and where $\hat{K} > K$), and in terms of individual assignment (error rate in coassignment)
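The error rate in coassignment is not defined in this note. One plausible reading, assumed here purely for illustration, is the fraction of pairs of individuals that are grouped together in the true partition but not in the inferred one, or vice versa. The function name and this definition are hypothetical.

```python
from itertools import combinations


def coassignment_error_rate(true_labels, inferred_labels):
    """Fraction of individual pairs whose co-membership status is wrongly inferred.

    Assumed definition (not taken from the paper): a pair counts as an error
    when it shares a population in exactly one of the two partitions.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    if not pairs:
        return 0.0
    errors = sum(
        (true_labels[i] == true_labels[j])
        != (inferred_labels[i] == inferred_labels[j])
        for i, j in pairs
    )
    return errors / len(pairs)
```

A pairwise measure of this kind does not depend on how populations are labelled, and it changes little when a spurious population contains very few individuals, which is consistent with the behaviour reported for ERCAi.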

 
With regard to the new statistical model and MCMC algorithm accounting for null alleles, we found that they efficiently restored the accuracy of inferences (Table 1, line 8). Incidentally, we observed that when one or several null alleles were simulated, their cumulative frequency at each locus was very accurately estimated (results not shown). We also observed that the use of the new statistical model and MCMC algorithm did not alter the accuracy of inferences when the dataset did not contain null alleles (Table 1, lines 1 and 2). Finally, we found that the extra algorithmic step accounting for null alleles had a negligible effect on computing times (an increase of only a few percent, depending on the thinning of the chain).

The resilience of Geneland to null alleles at frequencies up to 10% is fortunate with regard to previous studies, as the presence of null alleles at microsatellite loci has been reported frequently in PCR primer characterization and in population genetics studies (Dakin and Avise, 2004). This resilience can be explained by the fact that in our simulations (and in real datasets as well), null alleles occur spatially at random, so that the locations of individuals carrying null alleles do not display any spatial pattern. Therefore, although the presence of null alleles creates an excess of homozygous genotypes, this excess cannot be repaired by creating spurious populations while maintaining the geometric constraints of the spatial model on which Geneland is based. In agreement with this, when using Geneland under the non-spatial model option (which makes the prior model on population membership similar to that of Structure or BAPS in the non-spatial mode), we found that inferences became largely unreliable, with a systematic overestimation of the number of populations, even for low null allele frequencies; for instance, we obtained $\hat{K} \neq K$ in 24.3% of runs, $\hat{K} > K$ in 22.4% of runs and ERCAi = 8.03% when analysing the datasets with only 2% of null alleles. For mean null allele frequencies of 20% or more, the presence of null alleles becomes an issue even when the spatial model option is used. In this case, the accuracy of inferences is efficiently restored by the new statistical model and MCMC algorithm that we propose specifically for dealing with null alleles. In practice (i.e. when working with a real dataset), the presence of null alleles in the analysed dataset may often be suspected, but their proportions are unknown. Since the extra algorithmic step accounting for their putative presence restores the accuracy of inferences when null alleles are present and does not alter it when the dataset contains none, we recommend carrying out inferences with Geneland with this option enabled; it does not increase the computing time significantly.

The present algorithm, as well as the previously existing functionalities of Geneland, is now available through a graphical user interface (GUI). This GUI is written in Tcl/Tk through the R library tcltk. For the growing community of R users in population genetics (see, e.g. the related CRAN task view cran.r-project.org/web/views/Genetics.html), this new GUI should prove very useful, as it allows Geneland to be used without any knowledge of the R language.


    ACKNOWLEDGEMENTS
 
We thank J.M. Cornuet, J.F. Cosson, M. Fontaine, R. Leblois, J.M. Marin, F. Mortier, C.P. Robert and G. Roderick for comments at various stages of this work.

Funding: This work was financially supported by the French Agence Nationale de la Recherche grant No NT05-4-42230.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on March 27, 2008; revised on April 9, 2008; accepted on April 9, 2008

    REFERENCES
 

    Callen DF, et al. Incidence and origin of ‘null’ alleles in the (ac)n microsatellite markers. Am. J. Hum. Genet (1993) 52:922–927.

    Dakin EE, Avise JC. Microsatellite null alleles in parentage analysis. Heredity (2004) 93:504–509.

    Excoffier L, Henkel G. Computer programs for population genetics data analysis: a survival guide. Nat. Rev. Genet (2006) 7:745–758.

    Guillot G, et al. A spatial statistical model for landscape genetics. Genetics (2005) 170:1261–1280.

    Paetkau D, et al. Microsatellite analysis of population structure in Canadian polar bears. Mol. Ecol (1995) 4:347–354.

    Pompanon F, et al. Genotyping errors: causes, consequences and solutions. Nat. Rev. Genet (2005) 6:847–859.

