Bioinformatics Advance Access published online on February 5, 2004
Bioinformatics, doi:10.1093/bioinformatics/btg492
Bioinformatics © Oxford University Press 2004; all rights reserved
1 Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439
* To whom correspondence should be addressed. E-mail: rvilim{at}anl.gov.
Motivation: Methods that focus on secondary structures, such as Position Specific Scoring Matrices and Hidden Markov Models, have proved useful for assigning proteins to families. However, for assigning proteins to an attribute class within a family these methods may introduce more free parameters than are needed. There are fewer members and there is less variability among sequences within a family. We describe a method for organizing proteins in a family that exhibits up to an order of magnitude reduction in the number of parameters. The basis is the log odds ratio commonly used to measure similarity. We adapt this to characterize the sequence dissimilarities that give rise to attribute differentiation. This leads to the definition of Class Attribute Substitution Matrices (CLASSUM), a dual of the BLOSUM matrices.

Results: The method was applied to hierarchically classify sequences in the lambda and kappa subgroups of the immunoglobulin superfamily. Positions conferring class were identified based on the degree of amino acid variability at a position. The CLASSUM matrices computed for these positions classified better than 90% of test data correctly compared with 35-50% for BLOSUM-62. The expected value for a random matrix is 14%. The results suggest that family-specific data-derived substitution matrices can improve the resolution of automated methods that use generic substitution matrices for searching for and classifying proteins.
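For readers unfamiliar with the log odds ratio that the Motivation names as the basis of the method, the following is a minimal sketch of the conventional BLOSUM-style similarity score, s(a,b) = log2(q_ab / e_ab), where q_ab is the observed frequency of the residue pair and e_ab the frequency expected from the marginal residue frequencies. It is not the CLASSUM definition, which the abstract does not give; the function name `log_odds_scores` and the toy alignment columns are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_scores(columns):
    """Return log2 odds scores for unordered residue pairs.

    `columns` is a list of strings; each string is one alignment column,
    holding one residue per sequence (hypothetical input format).
    """
    # Count every unordered residue pair observed within each column.
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    # Observed pair frequencies q_ab.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Marginal residue frequencies: p_a = q_aa + sum over b != a of q_ab / 2.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    # Expected frequency is p_a * p_a for identical pairs, 2 * p_a * p_b otherwise.
    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log2(f / expected)
    return scores


# Toy columns (invented data): two conserved columns and one mixed column.
toy_scores = log_odds_scores(["AAAA", "AALL", "LLLL"])
```

In this toy input the identical pair A/A is seen more often than chance predicts and scores positive, while the mixed pair A/L scores negative. A dissimilarity-oriented matrix of the kind the abstract describes would presumably be built from differently chosen pair counts, but that construction is not specified here.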
Revised October 15, 2003
Accepted October 16, 2003
Fold-specific substitution matrices for protein classification
2 Computer Engineering Department, University of Cincinnati, Cincinnati, OH 45221
3 Computer Science Department, University of Illinois, Urbana, IL 61801