Bioinformatics Vol. 18 no. 9 2002
Pages 1257-1263
© 2002 Oxford University Press
A dissimilarity matrix between protein atom classes based on Gaussian mixtures
1 Department of Mathematics, University of Turku,
FIN-20014 Turku, Finland
2 Department of Biochemistry and Pharmacy,
Åbo Akademi University, PO Box 66, FIN-20521 Turku, Finland
3 Department of Mathematics, Linköping University,
S-581 83 Linköping, Sweden
Received on December 20, 2001; revised on March 4, 2002; accepted on March 11, 2002
Motivation: Previously, Rantanen et al. (2001; J. Mol. Biol., 313, 197-214) constructed a protein atom-ligand fragment interaction library embodying experimentally solved, high-resolution three-dimensional (3D) structural data from the Protein Data Bank (PDB). The spatial locations of protein atoms that surround ligand fragments were modeled with Gaussian mixture models, the parameters of which were estimated with the expectation-maximization (EM) algorithm. The validation analysis of this library gave strong indications that the protein atom classification, with its 24 classes, was too fine-grained and that reducing the number of classes would lead to improved predictions.
Results: Here, a dissimilarity (distance) matrix that is suitable for comparison and fusion of the 24 pre-defined protein atom classes has been derived. Jeffreys' distances between Gaussian mixture models are used as a basis to estimate dissimilarities between protein atom classes. The dissimilarity data are analyzed both with a hierarchical clustering method and, independently, with multidimensional scaling analysis. The results provide additional insight into the relationships between different protein atom classes. They give guidance on, for example, how to readjust the protein atom classification, and thus will help to improve protein-ligand interaction predictions.
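For illustration only, the following minimal sketch shows one way a Jeffreys' distance between two Gaussian mixtures could be estimated and collected into a dissimilarity matrix. It is not the authors' implementation. Jeffreys' distance is taken here as the symmetrized Kullback-Leibler divergence, J(f, g) = KL(f || g) + KL(g || f), which has no closed form for mixtures, so the sketch assumes a Monte Carlo estimate. The mixtures are one-dimensional for brevity (the library models 3D atom positions), and the three classes, their parameters, and all function names are hypothetical.

```python
import math
import random

# A mixture is a list of (weight, mean, standard deviation) components.
# One-dimensional components are an assumption made to keep the sketch short.

def mixture_logpdf(x, mix):
    """Log-density of a 1D Gaussian mixture at x, computed via log-sum-exp."""
    logs = [
        math.log(w) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * ((x - m) / s) ** 2
        for w, m, s in mix
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def sample_mixture(rng, mix):
    """Draw one sample: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in mix]
    _, m, s = rng.choices(mix, weights=weights)[0]
    return rng.gauss(m, s)

def kl_monte_carlo(p, q, rng, n):
    """Monte Carlo estimate of KL(p || q) using n samples drawn from p."""
    total = 0.0
    for _ in range(n):
        x = sample_mixture(rng, p)
        total += mixture_logpdf(x, p) - mixture_logpdf(x, q)
    return total / n

def jeffreys_distance(p, q, n=20000, seed=0):
    """Estimate J(p, q) = KL(p || q) + KL(q || p)."""
    rng = random.Random(seed)
    return kl_monte_carlo(p, q, rng, n) + kl_monte_carlo(q, p, rng, n)

def dissimilarity_matrix(mixtures):
    """Symmetric matrix of pairwise Jeffreys' distance estimates."""
    k = len(mixtures)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = jeffreys_distance(mixtures[i], mixtures[j])
    return d

# Three hypothetical atom classes: A and B are similar, C is distinct.
class_a = [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]
class_b = [(0.5, 0.2, 1.0), (0.5, 2.1, 1.0)]
class_c = [(1.0, 5.0, 1.0)]

D = dissimilarity_matrix([class_a, class_b, class_c])
for row in D:
    print(["%.3f" % v for v in row])
```

A matrix of this kind is the input that a hierarchical clustering method or multidimensional scaling would then operate on. Classes whose mixtures are close, such as the hypothetical A and B above, would be candidates for fusion.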
Contact: vira{at}utu.fi