Bioinformatics Advance Access published online on April 23, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp265
Augmented Training of Hidden Markov Models to Recognize Remote Homologs via Simulated Evolution
¹Department of Computer Science, Tufts University, Medford, MA, USA.
*To whom correspondence should be addressed. Anoop Kumar and Lenore Cowen, E-mail: anoop.kumar{at}tufts.edu, lenore.cowen{at}tufts.edu
Abstract
Motivation: Profile hidden Markov models (HMMs) are powerful and successful methods for recognizing homologous proteins, but they can break down when homology is too distant, because too little training data is available. We show that the performance of HMMs in this regime can be improved by using a simple simulated model of evolution to create an augmented training set.
Results: In two different remote protein homology detection tasks, we show that HMMs whose training is augmented with simulated evolution outperform HMMs trained only on real data. A mutation rate between 15 and 20 percent performs best, both for recognizing G-protein-coupled receptor (GPCR) proteins in different classes and for recognizing SCOP superfamily proteins from different families.
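The abstract does not describe the simulated evolution model itself. As a minimal sketch, assuming the simplest possible model (each residue is independently substituted with probability equal to the mutation rate, by an amino acid drawn uniformly from the other 19), training-set augmentation might look like the following. The function names `mutate` and `augment`, the `copies` parameter, and the uniform-substitution model are all hypothetical illustrations, not the authors' implementation; their model may, for example, weight substitutions differently.

```python
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Hypothetical point-mutation model (an assumption, not the paper's):
    each standard residue is replaced, with probability `rate`, by a
    different amino acid chosen uniformly at random. Characters outside
    AMINO_ACIDS (e.g. alignment gaps '-') are left untouched."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    out = []
    for aa in seq:
        if aa in AMINO_ACIDS and rng.random() < rate:
            # Pick uniformly among the 19 residues that differ from aa.
            out.append(rng.choice([a for a in AMINO_ACIDS if a != aa]))
        else:
            out.append(aa)
    return "".join(out)


def augment(training_seqs, rate=0.15, copies=5, seed=0):
    """Return the original sequences followed by `copies` simulated
    mutants of each one (hypothetical augmentation scheme)."""
    rng = random.Random(seed)
    augmented = list(training_seqs)
    for seq in training_seqs:
        for _ in range(copies):
            augmented.append(mutate(seq, rate, rng))
    return augmented


if __name__ == "__main__":
    seqs = ["ACDEFGHIKL", "MNPQ-RSTVWY"]
    for s in augment(seqs, rate=0.15, copies=2):
        print(s)
```

The augmented list would then stand in for the original training set when building the HMM. Leaving gap characters untouched keeps mutated copies of aligned sequences in register with the original alignment columns.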
Associate Editor: Prof. John Quackenbush
Received on November 19, 2008; revised on March 31, 2009; accepted on April 14, 2009