Bioinformatics 20(Suppl. 1) © Oxford University Press 2004; all rights reserved.
Mining frequent patterns in protein structures: a study of protease families
1 Center for Computational Biology and Bioinformatics, Department of Molecular Genetics and Biochemistry, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA and 2 Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Received on January 15, 2004; accepted on March 1, 2004
Motivation: Analysis of protein sequence and structure databases usually reveals frequent patterns (FPs) associated with biological function. Data mining techniques generally consider the physicochemical and structural properties of amino acids and their microenvironment in the folded structures. Dynamics is not usually considered, although proteins are not static and their function relates to conformational mobility in many cases.
Results: This work describes a novel unsupervised learning approach to discover FPs in protein families, based on biochemical, geometric and dynamic features. Without any prior knowledge of functional motifs, the method discovers the FPs for each type of amino acid and identifies the conserved residues in three protease subfamilies: the chymotrypsin and subtilisin subfamilies of serine proteases and the papain subfamily of cysteine proteases. The catalytic triad residues are distinguished by their strong spatial coupling (high interconnectivity) to other conserved residues. Although the spatial arrangements of the catalytic residues in the two subfamilies of serine proteases are similar, their FPs are found to be quite different. The present approach appears to be a promising tool for detecting functional patterns in rapidly growing structure databases and for providing insights into the relationship among protein structure, dynamics and function.
Availability: Available upon request from the authors.
Contact: bahar{at}pitt.edu
* To whom correspondence should be addressed.