Bioinformatics Advance Access originally published online on April 8, 2004
Bioinformatics 2004 20(15):2345-2354; doi:10.1093/bioinformatics/bth245
Bioinformatics 20(15) © Oxford University Press 2004; all rights reserved.
Proteins associated with diseases show enhanced sequence correlation between charged residues
Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, USA
Received on October 21, 2003; revised on February 23, 2004; accepted on March 11, 2004
Advance Access Publication April 8, 2004
Motivation: The function of a protein, or of a network of interacting proteins, often involves communication between residues that are well separated in sequence. The classic example is the participation of distant residues in allosteric regulation. Bioinformatic and structural analysis methods have been introduced to infer which residues are correlated. Recently, increasing attention has been paid to identifying the sequence properties that determine the tendency of disease-related proteins (Aβ peptides, prion proteins, transthyretin, etc.) to aggregate and form fibrils. Motivated in part by the need to identify sequence characteristics that indicate a tendency to aggregate, we introduce a general method that probes covariations in charged residues along the sequence in a given protein family. The method computes the sequence correlation entropy (SCE) from the quenched probability P_{s_k}(i,j) of finding a residue pair at a given sequence separation, s_k, and allows us to classify protein families in terms of their SCE. Our general approach may be a useful way to obtain evolutionary covariations of amino acid residues at a genome-wide level.
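To make the idea of a pair probability at a fixed sequence separation concrete, here is a minimal Python sketch. It is not the definition used in the paper, which the abstract does not give. It assumes, purely for illustration, that pairs are classified by charge sign (D and E negative, K and R positive, His neutral), that the pair probability is estimated by simple counting over a family's sequences, and that the entropy is a Shannon entropy averaged over separations. The names `pair_probability`, `shannon_entropy` and `sce_sketch` are hypothetical.

```python
import math
from collections import Counter

# Assumption: only D/E (negative) and K/R (positive) count as charged.
CHARGE = {"D": "-", "E": "-", "K": "+", "R": "+"}


def pair_probability(sequences, separation):
    """Estimate the probability of each charge-sign pair found
    `separation` positions apart, by counting over all sequences."""
    counts = Counter()
    for seq in sequences:
        for pos in range(len(seq) - separation):
            a, b = seq[pos], seq[pos + separation]
            if a in CHARGE and b in CHARGE:
                counts[(CHARGE[a], CHARGE[b])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no charged pairs at this separation
    return {pair: n / total for pair, n in counts.items()}


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution given as a dict."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def sce_sketch(sequences, separations):
    """Illustrative stand-in for an SCE: mean pair entropy over separations."""
    entropies = [shannon_entropy(pair_probability(sequences, s))
                 for s in separations]
    return sum(entropies) / len(entropies)
```

Under these assumptions, a family whose charged residues recur in a few preferred sign patterns yields a peaked pair distribution and therefore a low entropy, which is the qualitative sense in which a low SCE signals strong charge correlation.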
Results: We use a combination of SCE and clustering based on principal component analysis to classify the protein families. From an analysis of 839 families, covering 500 000 sequences, we find that proteins with relatively low values of SCE are predominantly associated with various diseases. In several families, residues that give rise to peaks in P_{s_k}(i,j) are clustered in the three-dimensional structure. For the class of proteins with low SCE values, there are significant numbers of mixed charged-hydrophobic (CH) and charged-polar (CP) runs. Our findings suggest that low values of SCE and the presence of CH and/or CP runs may be indicative of disease association or a tendency to aggregate. Our results lead to the hypothesis that functions of proteins with similar SCE values may be linked. The hypothesis is validated with a few anecdotal examples. The present results also lead to the prediction that the overall charge correlations in proteins affect the kinetics of amyloid formation, a feature that is common to all proteins implicated in neurodegenerative diseases.
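As a rough illustration of what a mixed charged-hydrophobic (CH) or charged-polar (CP) run might look like in practice, the Python sketch below scans a sequence for stretches built only from two residue classes. The abstract does not define a run, so everything here is an assumption made for illustration: the residue classes, the minimum length of 4, the requirement that both classes be present, and the function name `mixed_runs`.

```python
import re

# Assumed residue classes; other classifications are equally defensible.
CHARGED = "DEKR"
HYDROPHOBIC = "AVLIMFW"
POLAR = "STNQ"


def mixed_runs(seq, other, min_len=4):
    """Return (start, run) for each maximal stretch of at least `min_len`
    residues drawn only from CHARGED and `other`, containing both classes."""
    pattern = re.compile(f"[{CHARGED}{other}]{{{min_len},}}")
    runs = []
    for match in pattern.finditer(seq):
        run = match.group()
        has_charged = any(c in CHARGED for c in run)
        has_other = any(c in other for c in run)
        if has_charged and has_other:
            runs.append((match.start(), run))
    return runs


if __name__ == "__main__":
    seq = "GGKLEVKAGGSSSSGGDSTK"  # made-up sequence, not from any real protein
    print("CH runs:", mixed_runs(seq, HYDROPHOBIC))
    print("CP runs:", mixed_runs(seq, POLAR))
```

Counting such runs per family, alongside an entropy-like score, is one way the association described above could be explored; the thresholds would need to be tuned against the paper's actual definitions.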
Contact: dimar{at}Glue.umd.edu; thirum{at}Glue.umd.edu
* To whom correspondence should be addressed.