Machine Learning in Computational Biology
Identifying HLA supertypes by learning distance functions


1 School of Computer Science and Engineering, Israel
2 Interdisciplinary Center for Neural Computation, The Hebrew University of Jerusalem, Israel
*To whom correspondence should be addressed.
Abstract
Motivation: The development of epitope-based vaccines crucially relies on the ability to classify Human Leukocyte Antigen (HLA) molecules into sets that have similar peptide binding specificities, termed supertypes. In their seminal work, Sette and Sidney defined nine HLA class I supertypes and claimed that these provide an almost perfect coverage of the entire repertoire of HLA class I molecules.
HLA alleles are highly polymorphic and polygenic, and therefore experimentally classifying each of these molecules into supertypes is at present an impossible task. Recently, a number of computational methods have been proposed for this task. These methods are based on defining protein similarity measures, derived either from analysis of binding peptides or from analysis of the proteins themselves.
Results: In this paper we define both peptide-derived and protein-derived similarity measures, which are based on learning distance functions. The peptide-derived measure is defined using a peptide-peptide distance function, which is learned using information about known binding and non-binding peptides. The protein-derived similarity measure is defined using a protein-protein distance function, which is learned using information about alleles previously classified into supertypes by Sette and Sidney (1999). We compare the classifications obtained by these two complementary methods to previously suggested classification methods. In general, our results are in excellent agreement with the classifications suggested by Sette and Sidney (1999) and with those reported by Buus et al. (2004).
The main advantage of our proposed distance-based approach is that it makes use of two different and important immunological sources of information: HLA alleles, and peptides that are known to bind or not bind to these alleles. Since each of our distance measures is trained using a different source of information, their combination can provide a more confident classification of alleles into supertypes.
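As a rough illustration of how two independently trained distance measures might be combined, the following minimal sketch assigns an allele to the supertype of its nearest labeled reference allele under each measure, and reports a call only when both agree. This is not the authors' procedure: the nearest-neighbor rule, the agreement criterion, the function names, and all allele names and distance values below are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, Optional

# Maps each query allele to its distances from labeled reference alleles.
Distances = Dict[str, Dict[str, float]]


def nearest_supertype(dist_to_refs: Dict[str, float],
                      ref_supertype: Dict[str, str]) -> str:
    """Return the supertype of the reference allele at the smallest distance."""
    nearest_ref = min(dist_to_refs, key=dist_to_refs.get)
    return ref_supertype[nearest_ref]


def combined_call(allele: str,
                  peptide_dist: Distances,
                  protein_dist: Distances,
                  ref_supertype: Dict[str, str]) -> Optional[str]:
    """Return a supertype only if both distance measures agree (assumed rule)."""
    by_peptide = nearest_supertype(peptide_dist[allele], ref_supertype)
    by_protein = nearest_supertype(protein_dist[allele], ref_supertype)
    return by_peptide if by_peptide == by_protein else None


# Hypothetical placeholder data: two labeled references, two query alleles.
ref_supertype = {"ref_a": "S1", "ref_b": "S2"}
peptide_dist = {
    "query_x": {"ref_a": 0.2, "ref_b": 0.9},
    "query_y": {"ref_a": 0.4, "ref_b": 0.6},
}
protein_dist = {
    "query_x": {"ref_a": 0.1, "ref_b": 0.8},
    "query_y": {"ref_a": 0.7, "ref_b": 0.3},
}

for allele in ("query_x", "query_y"):
    print(allele, combined_call(allele, peptide_dist, protein_dist, ref_supertype))
```

In this toy setup, an allele on which the two measures disagree is left unassigned rather than forced into a supertype, which is one simple way to read "more confident classification"; other combination rules (for example, averaging the two distances) would be equally plausible under these assumptions.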
Contact: tomboy{at}cs.huji.ac.il; cheny{at}cs.huji.ac.il
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

