Bioinformatics Vol. 17 no. 9 2001
Pages 775-790
© 2001 Oxford University Press
A non-parametric approach to translating gene region heterogeneity associated with phenotype into location heterogeneity
Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
Received on November 20, 2000; revised on February 5, 2001; accepted on March 13, 2001
Motivation: The analysis of genetic data poses statistical problems in the form of high dimensionality with small sample sizes. The construction of a composite gene region (sequence pair) heterogeneity measure is one technique for reducing the dimensionality of the problem. This approach, however, is not without cost, since the contribution of individual locations to observed gene region differences between groups becomes entangled in the summary measure. This is problematic because it is of scientific interest to identify the locations that together depict phenotype.
Results: A method is proposed for relating observed gene region heterogeneity back to the location level. In the spirit of a factor analysis-type setting, the approach focuses on identifying a latent variable structure among locations to explain within- and between-group genetic differences associated with phenotype. Depending upon the weighting scheme used in constructing the gene region heterogeneity measure, the method can identify either the additive contribution of individual locations or the additive contribution of a group of locations to observed gene region heterogeneity. The approach is illustrated with clinical trial data, where the problem of altered HIV drug susceptibility is examined by characterizing location contributions to HIV protease gene region differences associated with a phenotypic treatment response.
Availability: GENE_ S (J.Kowalski, Harvard School of Public Health, Boston, MA 2001), a set of menu-driven functions for obtaining results developed in S-Plus (MathSoft, Inc. S-Plus 2000, Seattle, WA, 1999), is available from the author upon request.
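To make the idea of an additive location-level contribution concrete, the following is a minimal sketch and not the GENE_ S implementation. It assumes, purely for illustration, that sequence pair heterogeneity is a weighted count of mismatching locations; the mismatch indicator, the weights, and all function names are hypothetical. Under that assumption the mean pairwise heterogeneity within a set of sequences splits exactly into one term per location.

```python
from itertools import combinations


def location_mismatches(seq_a, seq_b):
    """Return a 0/1 mismatch indicator for each aligned location."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [int(a != b) for a, b in zip(seq_a, seq_b)]


def pair_heterogeneity(seq_a, seq_b, weights):
    """Weighted mismatch count for one sequence pair (illustrative measure)."""
    mismatches = location_mismatches(seq_a, seq_b)
    return sum(w * m for w, m in zip(weights, mismatches))


def location_contributions(sequences, weights):
    """Average weighted mismatch at each location over all sequence pairs.

    The returned values sum to the mean pairwise heterogeneity, so each
    entry is that location's additive share of the summary measure.
    """
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    totals = [0.0] * len(weights)
    for seq_a, seq_b in pairs:
        for k, m in enumerate(location_mismatches(seq_a, seq_b)):
            totals[k] += weights[k] * m
    return [t / len(pairs) for t in totals]


if __name__ == "__main__":
    # Toy aligned sequences; equal weights treat every location alike.
    toy = ["ACGT", "ACGA", "TCGA"]
    print(location_contributions(toy, [1.0, 1.0, 1.0, 1.0]))
```

Setting a weight to zero removes a location from the measure, and giving several locations a shared weight pattern is one simple way to picture a group of locations contributing jointly; the weighting schemes actually used by the method are not specified in this abstract.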