Bioinformatics Advance Access originally published online on September 27, 2006
Bioinformatics 2006 22(22):2821-2822; doi:10.1093/bioinformatics/btl493
CAPS: coevolution analysis using protein sequences
Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Dublin, Ireland
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Coevolution Analysis using Protein Sequences (CAPS) is a Perl-based software package that identifies co-evolution between amino acid sites. Blosum-corrected amino acid distances are used to identify amino acid co-variation. The phylogenetic relationships among the sequences are used to remove phylogenetic and stochastic dependencies between sites. The 3D protein structure is used to identify the nature of the dependencies between co-evolving amino acid sites. User-friendly, interpretable output files are generated.
Availability: CAPS version 1 is available at http://bioinf.gen.tcd.ie/~faresm/software/caps/. Distribution versions for Linux/Unix, Mac OS X and Windows operating systems are available, including manual and example files.
Contact: faresm{at}tcd.ie
| INTRODUCTION |
|---|
Proteins are synthesised linearly in the cytosol of the cell and normally go through complex folding processes to acquire their final productive conformation. These folding processes bring into 3D proximity sites that are distant in the sequence. Amino acid sites functionally or structurally linked to other regions of the protein will be subject to stronger selective constraints because of the dramatic effects that changes at these sites have on the nearby regions of the protein. The evolution of amino acid sites is hence multi-factorial, depending on their intrinsic mutation rates and on the constraints imposed by their complex co-evolutionary networks (Fares, 2006). This brings into question the use of one codon site as the unit of selection, as has been previously shown (Hughes and Nei, 1989; Marin et al., 2001).
Co-evolution between amino acid sites can be detected using non-parametric (e.g. Korber et al., 1993; Tillier et al., 2006) as well as parametric methods (e.g. Fares and Travers, 2006; Pollock et al., 1999). When a phylogenetic tree and a 3D protein structure are provided, distinguishing functional co-evolution from phylogenetic and stochastic co-variation becomes more tractable (Fares and Travers, 2006). CAPS provides a mathematically simple and computationally feasible way to uncover the co-evolutionary networks between amino acid sites within a protein (Fig. 1). Briefly, CAPS identifies co-evolving amino acid site pairs (e and k) by measuring the correlated evolutionary variation at these sites. Evolutionary variation is measured using time-corrected Blosum values, (θek)ij, for the transition between two amino acids at a particular site when comparing sequence i to sequence j at sites e and k. The transition between two amino acids at each site is corrected by the divergence time of sequences (taxa) i and j, which is estimated as the mean number of substitutions per synonymous site between the two sequences being compared (Fares and Travers, 2006). Correlation of the mean variability is measured using the Pearson coefficient. Finally, the significance of the correlation coefficients is estimated by comparing the observed correlation coefficients to the distribution of re-sampled correlation coefficients. Only parsimony-informative sites (those presenting significant variability) are considered. Further, a step-down permutation procedure is applied to correct for multiple testing and non-independence of the data (Westfall and Young, 1993).
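The correlation and resampling steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CAPS implementation: the function names, the toy substitution score (standing in for a full Blosum matrix), the divergence times, the choice to model "time-corrected" as score divided by divergence time, and the simple shuffling null are all introduced here for illustration only.

```python
import itertools
import math
import random


def pairwise_variation(column, score, times):
    """Time-corrected score for every sequence pair (i, j) at one alignment column."""
    values = []
    for i, j in itertools.combinations(range(len(column)), 2):
        # Assumption: "time-corrected" is modelled as score / divergence time.
        values.append(score(column[i], column[j]) / times[(i, j)])
    return values


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_resamples=1000, seed=0):
    """Fraction of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


def toy_score(a, b):
    # Hypothetical stand-in for a Blosum lookup: 4 for identity, -1 otherwise.
    return 4.0 if a == b else -1.0


# Toy data: two alignment columns over five sequences, with made-up divergence times.
site_e = "AAGGV"
site_k = "LLMMI"
times = {(i, j): 0.1 * (j - i) for i, j in itertools.combinations(range(5), 2)}

var_e = pairwise_variation(site_e, toy_score, times)
var_k = pairwise_variation(site_k, toy_score, times)
r = pearson(var_e, var_k)
p = permutation_p_value(var_e, var_k)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

In this toy case the two columns change in the same sequence pairs, so their variation profiles are strongly correlated. A real analysis would additionally need the step-down multiple-testing correction mentioned above, which this sketch omits.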
A sub-program named CladesCAPS.pl removes phylogenetic co-evolution by running CAPS after eliminating user-specified phylogenetic clades. Finally, output files including the final set of functionally/structurally important sites are generated. When the crystal structure of the protein is available, CAPS also tests the significance of the distance between the amino acid sites identified as co-evolving, providing useful information about the type of co-evolution (e.g. functional or structural co-evolution).
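One way to think about a distance significance test of this kind is a resampling comparison against randomly chosen site pairs. The sketch below is only an assumption about how such a test could look; the source does not describe the procedure CAPS uses. The function names, the coordinates (a hypothetical straight chain of residues 3.8 Å apart) and the co-evolving pairs are invented for illustration.

```python
import math
import random


def mean_pair_distance(pairs, coords):
    """Mean Euclidean distance between the two sites of each pair."""
    return sum(math.dist(coords[e], coords[k]) for e, k in pairs) / len(pairs)


def proximity_p_value(pairs, coords, n_resamples=2000, seed=0):
    """Share of random pair sets that are at least as close as the observed set."""
    rng = random.Random(seed)
    sites = sorted(coords)
    observed = mean_pair_distance(pairs, coords)
    hits = 0
    for _ in range(n_resamples):
        # Draw the same number of pairs, each made of two distinct random sites.
        random_pairs = [tuple(rng.sample(sites, 2)) for _ in pairs]
        if mean_pair_distance(random_pairs, coords) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical coordinates: 40 residues placed on a line, 3.8 angstroms apart.
coords = {site: (3.8 * site, 0.0, 0.0) for site in range(1, 41)}
# Hypothetical co-evolving site pairs (residue numbers in the structure).
coevolving_pairs = [(5, 6), (10, 12), (20, 21)]

observed, p = proximity_p_value(coevolving_pairs, coords)
print(f"mean distance = {observed:.2f} A, resampling p = {p:.4f}")
```

A small p-value here would suggest that the co-evolving sites sit closer together than random pairs, which is the kind of evidence that could point towards structural rather than purely functional co-evolution.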
In addition to the implementation of the method previously published (Fares and Travers, 2006), CAPS also performs a preliminary analysis of compensatory mutations by testing the correlation in the hydrophobicity as well as in the molecular weight variations between co-evolving amino acids. Inter-protein co-evolution, in addition to the intra-molecular co-evolution analysis developed previously (Fares and Travers, 2006), is also an option in CAPS.
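The hydrophobicity comparison can be illustrated with a short sketch. The Kyte-Doolittle hydropathy scale is used here as an assumption (the source does not say which scale CAPS uses), and the function names and the two example columns are hypothetical. Molecular weights could be handled the same way by swapping in a different property table.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale, chosen for illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def property_correlation(column_e, column_k, scale):
    """Correlate a physico-chemical property between two alignment columns."""
    x = [scale[residue] for residue in column_e]
    y = [scale[residue] for residue in column_k]
    return pearson(x, y)


# Hypothetical columns: site e becomes more hydrophobic as site k becomes less so.
site_e = "GAVLI"
site_k = "ILVAG"
r_hydro = property_correlation(site_e, site_k, KYTE_DOOLITTLE)
print(f"hydrophobicity correlation = {r_hydro:.3f}")
```

A negative correlation of this kind is the pattern one might expect from compensatory changes, where a gain in hydrophobicity at one site is offset by a loss at the other.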
The emphasis in CAPS has been centred on four main points: sensitivity of the co-evolutionary analyses, automatic performance, accessibility and the ability to handle highly populated multiple sequence alignments. A protein-coding or amino acid multiple sequence alignment is required in one of the standard formats used in other programs (PHYLIP, MEGA or FASTA). The program generates an output file that summarises the results of the co-evolution analysis, including a table with all the parameters estimated. Several Excel-readable files are also generated for easier interpretation of the results. For each co-evolving pair of sites, the site location in the reference sequence for which the 3D structure is available is provided together with the site location in the alignment. Correlations of hydrophobicities and molecular weights for the pairs of co-evolving sites are also provided.
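As a rough illustration of the input side, the sketch below reads an aligned FASTA file and lists the parsimony-informative columns, using the standard definition (at least two character states, each present in at least two sequences). It is not taken from CAPS: the function names, the treatment of `-`, `?` and `X` as missing data, and the example alignment are assumptions made here.

```python
from collections import Counter


def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def parsimony_informative_columns(sequences, missing="-?X"):
    """1-based positions with >= 2 states that each occur in >= 2 sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    informative = []
    for col in range(length):
        counts = Counter(s[col] for s in sequences if s[col] not in missing)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative.append(col + 1)
    return informative


# Hypothetical four-sequence amino acid alignment.
example = """>seq1
MKVA
>seq2
MKLA
>seq3
MRVA
>seq4
MRLG
"""

alignment = parse_fasta(example)
columns = parsimony_informative_columns(list(alignment.values()))
print(columns)
```

Here column 1 is invariant and column 4 has a state seen in only one sequence, so only the two middle columns would be carried forward in an analysis restricted to parsimony-informative sites.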
The performance of the algorithm and the sensitivity of the method have been examined in several proteins (Fares and Travers, 2006). Although there is no limit on the length of the sequences, long and well-populated multiple sequence alignments (e.g. multiple sequence alignments containing 20 sequences or more) provide very accurate results. A limitation of CAPS is that the method does not account for recombination. We will upgrade CAPS in future versions to include other analyses, such as a more exhaustive identification of compensatory mutations (conditionally advantageous mutations) and the prediction of protein-protein interfaces.
| Acknowledgments |
|---|
We would like to thank beta testers of CAPS for identifying bugs. We are especially thankful to Dr David Posada for helpful comments and algorithm suggestions for CAPS. This work was supported by Science Foundation Ireland.
Conflict of Interest: none declared
| FOOTNOTES |
|---|
Associate Editor: Alfonso Valencia
Received on August 11, 2006; revised on September 12, 2006; accepted on September 19, 2006
| REFERENCES |
|---|
Fares, M.A. (2006) Computational and statistical methods to detect the various dimensions of protein evolution. Curr. Bioinform., 1, 207-217.
Fares, M.A. and Travers, S.A. (2006) A novel method for detecting intramolecular coevolution: adding a further dimension to selective constraints analyses. Genetics, 173, 9-23.
Hughes, A.L. and Nei, M. (1989) Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc. Natl Acad. Sci. USA, 86, 958-962.
Korber, B.T., et al. (1993) Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. Proc. Natl Acad. Sci. USA, 90, 7176-7180.
Marin, I., et al. (2001) Detecting changes in the functional constraints of paralogous genes. J. Mol. Evol., 52, 17-28.
Pollock, D.D., et al. (1999) Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol., 287, 187-198.
Tillier, E.R., et al. (2006) Codep: maximizing co-evolutionary interdependencies to discover interacting proteins. Proteins, 63, 822-831.
Westfall, P.H. and Young, S.S. (1993) Resampling-Based Multiple Testing. John Wiley & Sons, New York.