Bioinformatics Advance Access published online on October 29, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn552
Identifying molecular markers associated with classification of genotypes by External Logistic Biplots
1Centro de Biotecnología, Instituto de Estudios Avanzados (IDEA). Caracas, Venezuela. Email: jdemey{at}reacciun.ve.
2Departamento de Estadística, Universidad de Salamanca. Salamanca, España.
3Instituto Nacional de Investigaciones Agrícolas (INIA-CENIAP). Maracay, Venezuela.
*To whom correspondence should be addressed. Prof. Jhonny Demey, E-mail: jdemey{at}reacciun.ve, jhonny.demey{at}gmail.com
Abstract
Several molecular techniques have been used to characterize genetic diversity in genotypes, and they usually produce a binary data matrix. Cluster Analysis and Principal Coordinates Analysis are commonly used to classify genotypes from DNA molecular markers, even though neither method makes it straightforward to interpret which variables are responsible for the grouping. In this paper, we present a novel algorithm that combines Principal Coordinates Analysis (PCoA), Cluster Analysis (CA) and Logistic Regression (LR) to better interpret the variables (alleles or bands) associated with the classification of genotypes. Combining these three standard techniques with some new ideas about the geometry of the procedures allows the construction of an External Logistic Biplot (ELB), which helps to interpret the variables responsible for the classification or ordination. The method is applied to study the genetic diversity of four populations from Africa, Asia and Europe using the HapMap data.
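As a rough illustration of the idea, and not the authors' Matlab implementation, the following Python sketch runs PCoA on a binary band matrix and then regresses each band on the principal coordinates with logistic regression. The slopes of each fit give a direction for that band in the ordination plane. Several choices here are assumptions rather than details from the abstract: the simple matching distance, the toy data, the small ridge penalty, the plain gradient-ascent fitting, and all function names. The cluster analysis step is omitted; any clustering method could be applied to `coords`.

```python
import numpy as np

def simple_matching_distance(X):
    """Distance sqrt(1 - s), where s is the share of matching 0/1 entries.
    (Assumed coefficient; the abstract does not name one.)"""
    X = np.asarray(X, dtype=float)
    matches = X @ X.T + (1 - X) @ (1 - X).T
    return np.sqrt(np.clip(1 - matches / X.shape[1], 0, None))

def pcoa(D, k=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]          # indices of the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))

def fit_logistic(A, y, ridge=0.01, iters=5000):
    """Logistic regression of a 0/1 variable on coordinates A.
    Fitted by fixed-step gradient ascent with a small ridge penalty on the
    slopes (an assumption, added to keep estimates finite under separation).
    Returns [intercept, slope_1, ..., slope_k]."""
    Z = np.column_stack([np.ones(len(y)), A])
    penalty = ridge * np.ones(Z.shape[1])
    penalty[0] = 0.0                          # leave the intercept unpenalized
    # Step size 1/L, with L an upper bound on the curvature of the objective.
    step = 1.0 / (0.25 * np.linalg.norm(Z, 2) ** 2 + ridge)
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        beta += step * (Z.T @ (y - p) - penalty * beta)
    return beta

# Hypothetical toy data: 6 genotypes (rows) x 5 bands (columns).
# Bands 0 and 1 mark the first three genotypes, band 2 marks the last three,
# bands 3 and 4 are unrelated to the grouping.
X = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
])

D = simple_matching_distance(X)
coords = pcoa(D, k=2)                         # genotype coordinates
betas = np.array([fit_logistic(coords, X[:, j]) for j in range(X.shape[1])])
directions = betas[:, 1:]                     # one direction vector per band
```

In this sketch, bands whose slope vectors are long and point towards a group of genotypes are the ones most associated with that group, while bands unrelated to the grouping get short vectors along the first axis.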
Availability: The Matlab code for implementing the methods may be obtained from the web site: http://biplot.usal.es.
Supplementary Information: Supplementary Data are available at Bioinformatics online.
Associate Editor: Prof. Dmitrij Frishman
Received on April 26, 2008; revised on September 30, 2008; accepted on October 22, 2008