Bioinformatics Advance Access originally published online on January 5, 2008
Bioinformatics 2008 24(5):645-651; doi:10.1093/bioinformatics/btm641
Structural search and retrieval using a tableau representation of protein folding patterns
¹Department of Biochemistry and Molecular Biology and The Huck Institute for Genomics, Proteomics and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA, ²Department of Computer Science and Software Engineering and ³NICTA Victoria Laboratories, The University of Melbourne, Victoria 3010, Australia
*To whom correspondence should be addressed.
Abstract
Comparing and classifying folding patterns from a database of protein structures is crucial to understanding the principles of protein architecture, evolution and function. Current methods of searching for proteins with similar folding patterns are slow and computationally intensive. The sharp growth in the number of known protein structures poses severe challenges for methods of structural comparison. There is a need for methods that can search the database of structures accurately and rapidly.
We provide several methods to search for similar folding patterns using a concise tableau representation of proteins that encodes the relative geometry of secondary structural elements. Our first approach allows the extraction of identical and very closely related protein folding patterns in constant time (per hit). Next, we address the hard computational problem of extracting maximally similar subtableaux when comparing two tableaux. We solve the problem using quadratic and linear integer programming formulations and demonstrate their power to identify subtle structural similarities, especially when protein structures have diverged significantly. Finally, we describe a rapid and accurate method for comparing a query structure against a database of protein domains, TableauSearch. TableauSearch is rapid enough to search the entire structural database in seconds on a standard desktop computer. Our analysis of TableauSearch on many queries shows that the method is very accurate in identifying similarities of folding patterns, even between distantly related proteins.
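As a rough illustration of how identical folding patterns might be retrieved in constant time per hit, the sketch below discretizes the angles between pairs of secondary structural elements into coarse labels and uses the resulting tableau as a hash key. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how lookup is done, and the four-way angle coding, the function names (`quadrant_code`, `tableau_key`, `build_index`) and the toy domains are all hypothetical.

```python
from collections import defaultdict


def quadrant_code(angle_deg):
    """Map an inter-element angle in [-180, 180] to a coarse one-letter label.

    Hypothetical four-way discretization, for illustration only.
    """
    if -45 < angle_deg <= 45:
        return "P"  # roughly parallel
    if 45 < angle_deg <= 135:
        return "L"  # crossing in one sense
    if -135 < angle_deg <= -45:
        return "R"  # crossing in the other sense
    return "O"      # roughly antiparallel


def tableau_key(sse_types, angles):
    """Build a hashable key from element types and a symmetric angle matrix."""
    n = len(sse_types)
    codes = tuple(
        quadrant_code(angles[i][j])
        for i in range(n)
        for j in range(i + 1, n)  # upper triangle only; the matrix is symmetric
    )
    return (tuple(sse_types), codes)


def build_index(structures):
    """Group domain names by their discretized tableau."""
    index = defaultdict(list)
    for name, sse_types, angles in structures:
        index[tableau_key(sse_types, angles)].append(name)
    return index


# Toy data (invented): "E" = strand, "H" = helix, angles in degrees.
structures = [
    ("domA", ["E", "E", "H"], [[0, 170, 60], [170, 0, -50], [60, -50, 0]]),
    ("domB", ["E", "E", "H"], [[0, 160, 70], [160, 0, -60], [70, -60, 0]]),
    ("domC", ["H", "H", "H"], [[0, 20, 150], [20, 0, -100], [150, -100, 0]]),
]
index = build_index(structures)

# One dictionary lookup retrieves every domain with an identical tableau.
query_key = tableau_key(["E", "E", "H"], [[0, 175, 50], [175, 0, -90], [50, -90, 0]])
hits = index.get(query_key, [])
```

In this toy setup, domains whose angles differ slightly but fall in the same coarse bins share a key, so they are retrieved together; angles close to a bin boundary would need extra handling that the sketch does not attempt.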
Availability: A web server implementing TableauSearch is available from http://hollywood.bx.psu.edu/TabSearch
Contact: arun{at}bx.psu.edu, aml25{at}psu.edu
Supplementary information: Supplementary Data are available at Bioinformatics online.
Associate Editor: Keith Crandall
Received on November 14, 2007; revised on December 14, 2007; accepted on December 29, 2007