Bioinformatics Advance Access published online on March 16, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp149
Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts
1 The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden.
2 Department of Biophysics, Faculty of Physics, University of Warsaw, Warsaw, Poland.
3 UC Davis Genome Centre, UC Davis, USA.
4 Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden.
5 Stockholm Bioinformatics Center, Albanova, Stockholm University, 10691 Stockholm, Sweden
*To whom correspondence should be addressed. Mr. Patrik Björkholm, E-mail: pbh{at}sbc.su.se
Abstract
Motivation: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail.
Results: We propose a novel hidden Markov model based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 L predictions (L = sequence length), our hidden Markov models obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature.
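The "top 0.2 L" accuracy quoted above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name, the input layout (scored residue pairs plus a set of true contacts), and the sequence-separation cutoff are assumptions made for illustration. The abstract does not define the long-, medium- and short-range thresholds, so `min_sep` is left as a parameter and its default of 24 is only a commonly used convention.

```python
def top_fraction_accuracy(scored_pairs, true_contacts, seq_len,
                          fraction=0.2, min_sep=24):
    """Fraction of the top `fraction * L` predicted pairs that are true contacts.

    scored_pairs  : iterable of (i, j, score) predictions (hypothetical layout)
    true_contacts : set of (i, j) tuples with i < j
    seq_len       : sequence length L
    min_sep       : minimum |i - j| for a pair to be considered
                    (assumed default; the cutoff is not given in the abstract)
    """
    # Keep only pairs in the requested sequence-separation range.
    candidates = [(score, i, j) for (i, j, score) in scored_pairs
                  if abs(i - j) >= min_sep]
    # Rank the remaining predictions from highest to lowest score.
    candidates.sort(key=lambda t: t[0], reverse=True)
    # Select the top 0.2 * L predictions (at least one).
    n_top = max(1, int(fraction * seq_len))
    top = candidates[:n_top]
    if not top:
        return 0.0
    # Count selected pairs that appear in the set of true contacts.
    hits = sum(1 for _, i, j in top
               if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)
```

The denominator here is the number of predictions actually selected, so a protein with fewer than 0.2 L candidate pairs in the chosen range is scored on what it has; dividing by 0.2 L instead would be an equally defensible choice.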
Availability: http://predictioncenter.org/Services/FragHMMent/
Contact: torgeir.hvidsten{at}plantphys.umu.se
Supplementary information: Supplementary data available at Bioinformatics online.
Associate Editor: Prof. Anna Tramontano
Received on October 9, 2008; revised on February 24, 2009; accepted on March 14, 2009