Bioinformatics Advance Access originally published online on January 29, 2004
Bioinformatics 20(7) © Oxford University Press 2004; all rights reserved.
Measuring the similarity of protein structures by means of the universal similarity metric
1 Automated Scheduling, Optimisation and Planning Group, University of Nottingham, Nottingham, NG8 1BB, UK and 2 Department of Computer Science and Artificial Intelligence, E.T.S.I. Informatica, Universidad de Granada, 18071 Granada, Spain
Received on June 26, 2003; revised on October 13, 2003; accepted on October 22, 2003
Advance Access Publication January 29, 2004
Motivation: As an increasing number of protein structures become available, the need for algorithms that can quantify the similarity between protein structures increases as well. Thus, the comparison of protein structures, and their clustering according to a given similarity measure, is at the core of today's biomedical research. In this paper, we show how a Universal Similarity Metric (USM) inspired by algorithmic information theory can be used to calculate similarities between protein pairs. The method, besides being theoretically supported, is surprisingly simple to implement and computationally efficient.
Results: Structural similarity between proteins in four different datasets was measured using the USM. The sample employed represented alpha, beta, alpha-beta, TIM-barrel, globin and serpin protein types. The proposed metric allows for a correct measurement of similarity and classification of the proteins in the four datasets.
Availability: All the scripts and programs used for the preparation of this paper are available at http://www.cs.nott.ac.uk/~nxk/USM/protocol.html. On that web page the reader will find a brief description of how to use the various scripts and programs.
Supplementary information: The protein datasets used are collected at http://www.cs.nott.ac.uk/~nxk/USM/datasets.html. The calculated similarity values for the proteins used in this paper can be found at http://www.cs.nott.ac.uk/~nxk/USM/similar.html. The clustering of the datasets based on these similarity values can be found at http://www.cs.nott.ac.uk/~nxk/USM/clustering.html.
Contact: Natalio.Krasnogor{at}nottingham.ac.uk; dpelta{at}ugr.es
* To whom correspondence should be addressed.
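The claim that the USM is simple to implement can be illustrated with a minimal sketch. The USM is defined in terms of Kolmogorov complexity, which is not computable, so any implementation has to substitute an estimate. The sketch below assumes the widely used compression-based estimate (the normalized compression distance) with zlib as the compressor. The abstract does not state which compressor or which serialisation of the structures the authors used, so the function names `compressed_size` and `usm_estimate`, the choice of zlib, and the byte-string inputs are all assumptions made for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input.

    Assumption: compressed length stands in for the (uncomputable)
    Kolmogorov complexity K(data).
    """
    return len(zlib.compress(data, 9))


def usm_estimate(x: bytes, y: bytes) -> float:
    """Compression-based estimate of the USM between two byte strings.

    Uses the normalized compression distance
        (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size and xy is the concatenation.
    Values near 0 suggest similar inputs, values near 1 dissimilar ones.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def similarity_matrix(items: list[bytes]) -> list[list[float]]:
    """All-pairs estimates, e.g. as input to a clustering step."""
    return [[usm_estimate(a, b) for b in items] for a in items]
```

In use, each protein structure would first be serialised to bytes (hypothetically, a text encoding of its structural description), and the resulting matrix of pairwise values passed to a clustering routine. Because a real compressor is only an approximation, the estimate is not exactly symmetric and the distance of an object to itself is small but not exactly zero.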