

Bioinformatics Advance Access originally published online on November 14, 2006
Bioinformatics 2007 23(2):245-246; doi:10.1093/bioinformatics/btl566

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Detecting protein dissimilarities in multiple alignments using Bayesian variable selection

Sinae Kim, Jerry Tsai 1, Ioannis Kagiampakis 1, Patricia LiWang 1 and Marina Vannucci 2,*

Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
1 Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, USA
2 Department of Statistics, Texas A&M University, College Station, TX, USA

*To whom correspondence should be addressed.


    ABSTRACT
 

Motivation: We present an application of Bayesian variable selection to the novel detection of sequence elements that confer negative design to protein structure and function. As an illustration, we compare the dimer interface of the CXCL8 chemokine family with those of the CCL4 and CCL2 chemokine families to discover the changes that disfavor the CXCL8 quaternary structure.

Results: In comparison with known experimental results, our method identifies evolutionarily conserved sequence changes in the CC families that inhibit CXCL8 quaternary structure. Therefore, we find positive selection of negative design elements. Furthermore, our approach predicts that a two-residue deletion conserved in the CCL4 chemokine family disfavors CXCL8 dimerization.

Availability: The Matlab code for the Bayesian variable selection is freely available at http://stat.tamu.edu/~mvannucci/webpages/codes.html

Contact: mvannucci@stat.tamu.edu


    1 INTRODUCTION
 
The purpose of this short note is to show how Bayesian variable selection can detect conserved dissimilarities in a comparison of two multiple sequence alignments. This application is one of the first computational approaches to address an important and subtle biological concept, that of evolutionarily conserved negative design elements: sequences that disfavor a particular function and/or interaction. For illustration purposes, we investigate the different dimerization forms exhibited by chemokines with the same protein fold. We aim to detect the conserved negative design elements in two CC chemokine families that inhibit them from forming the CXCL8 dimer. To identify these negative design elements, we frame the problem in terms of a classification model with variable selection and apply a Bayesian method developed by Vannucci and collaborators (Sha et al., 2004). We provide a brief description of how to use the suite of Matlab functions available on the web. In our analysis, we compare our method's results with experimental data and find that the positions identified are indicative of negative design. In addition, we predict a conserved two-residue deletion that disfavors CXCL8 dimer formation.

1.1 Chemokine dimerization interface
Chemokines are extracellular signaling proteins that play a general role in the innate and adaptive immune response, angiogenesis, cancer and wound healing. One family of chemokines possesses the same structural fold, a single α-helix over a three-stranded β-sheet, but exhibits different functions (Lodi et al., 1994). In mammals, these chemokines are characterized by four conserved cysteines and are categorized based on whether the first two cysteines are adjacent (CC), separated by one amino acid (CXC) or separated by three amino acids (CX3C). The quaternary structure of these chemokines is distinct for each group. For example, CXC chemokines dimerize using the first β-strand and the α-helix, whereas CC chemokines interact across their uncoiled, extended N-terminus (Lodi et al., 1994).

In this study, our aim is to identify the negative design sequences in the CC chemokine family that disfavor the formation of the CXCL8 dimer interaction. To do this, we consider a multiple sequence alignment (MSA) of the CXCL8 protein separately with two CC chemokine families. While MSA is commonly used to discover evolutionary conservation, here we want to computationally detect residues that are conserved within the CC chemokine family but are not conserved within the sequences contributing to the CXCL8. We therefore limit the sequence comparison to an evolutionarily related subset (clade) of the CC chemokine family and consider only those residues involved in the CXCL8 dimer interface. Based on the phylogenetic tree of the chemokine family (Huising et al., 2003), we selected the CCL4 and CCL2 clades, consisting of 15 and 23 sequences, respectively, to compare with the CXCL8 sequence (39 total). A structural analysis (Fischer et al., 2006) of the molecular, quaternary contacts in the CXCL8 interface identified 58 positions (input variables) involved in the dimer interaction. The results we report were obtained by using as input data the profiles obtained by quantifying the amino acid information with the corresponding hydrophobicity scores. Similar results were obtained with the log of the volume.
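The encoding of aligned residues as hydrophobicity profiles can be sketched as follows. This is only an illustration under stated assumptions: the note does not say which hydrophobicity scale was used, so the Kyte-Doolittle hydropathy values are assumed here, and coding gaps (or any unrecognized symbol) as 0.0 is likewise an assumption. The function name `encode_alignment` and the toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the note does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: gaps and unknown symbols are coded with a neutral value.
GAP_VALUE = 0.0


def encode_alignment(aligned_seqs, positions, scale=KD, gap_value=GAP_VALUE):
    """Return one row per sequence and one column per selected alignment position."""
    rows = []
    for seq in aligned_seqs:
        rows.append([scale.get(seq[p].upper(), gap_value) for p in positions])
    return rows


# Toy example (hypothetical sequences, 0-based interface positions).
toy_alignment = ["KEL-RC", "KAVIRC"]
toy_positions = [1, 2, 3]
X = encode_alignment(toy_alignment, toy_positions)
```

In the study itself the matrix would have 39 rows (sequences) and 58 columns (interface positions); how the authors treated gaps, such as the two-residue deletion in the CCL4 clade, is not stated here.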


    2 METHODS
 
2.1 Bayesian variable selection
In order to identify structural dissimilarities in protein alignments we have applied the well-known variable selection method for classification with probit models proposed by Sha et al. (2004). Latent variables are introduced to transform the model into a normal linear model. The number of explanatory variables, p, can greatly exceed the sample size. Bayesian variable selection is done via a binary vector with p entries that identifies the different sets of variables. The marginal posterior distribution of this binary vector is derived and Markov chain Monte Carlo algorithms are used to sample from its posterior distribution. The method allows the identification of sets of discriminating variables and the classification of future samples (via least squares or Bayesian model averaging).
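For readers who prefer symbols, the binary case can be written roughly as below. This is a sketch of the standard latent-variable probit formulation, not a transcription of Sha et al. (2004). The notation (z for the latent variables, α for the intercept, γ for the binary selection vector, X_γ for the columns of X selected by γ) is introduced here for illustration, and the zero-mean normal priors with variances h and H_γ are inferred from the hyperparameter description in Section 2.2. The last line follows from integrating α and β_γ out of the normal linear model.

```latex
\begin{align*}
y_i &= \begin{cases} 1, & z_i > 0 \\ 0, & z_i \le 0 \end{cases}
      \qquad i = 1, \dots, n, \\
z &= \alpha \mathbf{1}_n + X_\gamma \beta_\gamma + \varepsilon,
      \qquad \varepsilon \sim N(0, I_n), \\
\alpha &\sim N(0, h), \qquad \beta_\gamma \sim N(0, H_\gamma),
      \qquad \gamma \in \{0, 1\}^p, \\
z \mid \gamma &\sim N\!\left(0,\; I_n + h\, \mathbf{1}_n \mathbf{1}_n^{T}
      + X_\gamma H_\gamma X_\gamma^{T}\right).
\end{align*}
```

Because the marginal distribution of z given γ is available in closed form, a sampler only needs to move over γ (and z), which is consistent with the statement that inference integrates out the model parameters.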

2.2 Matlab suite
For our example we have used a Matlab suite, written by M. Vannucci with contributions by N. Sha (UT El Paso), recently made available on the web. The software was originally written for an application of Bayesian variable selection to classification problems involving microarray data. The implementation of the probit model is quite general and allows binary, multinomial and ordinal responses. The software also includes various parameter choices for the prior on the regression coefficients and on the binary selection vector. Inference is done efficiently by integrating out the model parameters. Posterior probabilities are calculated via fast QR updating. Users must pre-process the data by centering the training data and subtracting the training means from the future data (if any). Code and related documentation (bvsprob.tar) can be freely downloaded from the web page of the corresponding author.
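The pre-processing step can be sketched as follows. This is a minimal pure-Python illustration, not part of the Matlab suite; the helper names `column_means` and `center` and the toy numbers are hypothetical. The point it shows is that the future data are shifted by the training means, not by their own means.

```python
def column_means(rows):
    """Mean of each column of a list-of-lists data matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def center(rows, means):
    """Subtract the given column means from every row."""
    return [[x - m for x, m in zip(row, means)] for row in rows]


# Toy data: two training samples and one future sample, two variables each.
train = [[1.0, 4.0], [3.0, 8.0]]
future = [[2.0, 5.0]]

train_means = column_means(train)              # computed on training data only
train_centered = center(train, train_means)
future_centered = center(future, train_means)  # reuse the training means
```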

The main function in bvsprob.tar is bvsme_prob.m, which requires as input: the data, the hyperparameter specifications (the variance parameters of the prior on the intercept, h, and on the regression coefficients, Hg, and the parameter, w, of the Beta-Bernoulli prior on the binary indicator) and the Metropolis parameters (initial number of included variables, total number of MCMC iterations and burn-in). Guidelines for the choice of these parameters, together with a discussion of sensitivity issues, are given in Sha et al. (2004). The main function bvsme_prob.m returns as output: the list of all visited variable subsets and their relative posterior probabilities, the normalized ordered relative probabilities of distinct visited models and the marginal probabilities of inclusion of the single variables. Distinct visited models and their posterior probabilities can serve as input to miscler.m for least squares or Bayesian model averaging predictions, or to crosspred.m for cross-validation predictions.
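To make the three kinds of output concrete, the sketch below shows one common way such summaries can be derived from a list of visited models. It is written in Python for illustration and is not the code of bvsme_prob.m; how the Matlab suite computes these quantities internally is not described in this note. The function name `summarize_visited`, the input layout (pairs of variable subset and log relative posterior) and the toy numbers are all hypothetical.

```python
import math


def summarize_visited(visited, n_vars):
    """Normalize over distinct visited models and compute marginal inclusion.

    visited: list of (subset, log_relative_posterior) pairs, repeats allowed.
    Returns (models ordered by normalized probability, inclusion probabilities).
    """
    # Keep each distinct model once; repeated visits carry the same value.
    distinct = {}
    for subset, log_post in visited:
        distinct[frozenset(subset)] = log_post

    # Subtract the maximum before exponentiating for numerical stability.
    max_lp = max(distinct.values())
    weights = {m: math.exp(lp - max_lp) for m, lp in distinct.items()}
    total = sum(weights.values())
    probs = {m: w / total for m, w in weights.items()}

    ordered = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # Marginal inclusion of variable j: total probability of models containing j.
    inclusion = [0.0] * n_vars
    for model, p in probs.items():
        for j in model:
            inclusion[j] += p
    return ordered, inclusion


# Toy chain over three variables; model {0, 2} is visited twice.
visited = [
    ((0, 2), math.log(6.0)),
    ((2,), math.log(3.0)),
    ((0, 2), math.log(6.0)),
    ((1,), math.log(1.0)),
]
ordered, inclusion = summarize_visited(visited, n_vars=3)
```

In this toy example the distinct models receive normalized probabilities of about 0.6, 0.3 and 0.1, so variable 2, which appears in the two most probable models, has the highest marginal inclusion probability.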


    3 RESULTS
 
We first looked at two-group comparisons of CXCL8 versus CCL2, CXCL8 versus CCL4 and also CCL4 versus CCL2. We set the hyperparameters to Hg = I, h = 10^6 and w = 5, and ran MCMC chains with 50 000 iterations and 10 000 burn-in. Each chain took about 7 min on a desktop computer. The method identified four sites with high probability of being discriminatory of CXCL8 versus CCL2 and seven sites for CXCL8 versus CCL4. These are reported in Table 1. Cross-validated prediction errors were 0 and 1 for CXCL8 versus CCL2 and CXCL8 versus CCL4, respectively. When comparing CCL4 versus CCL2, positions corresponding to residue numbers 27 and 66 were identified with a 0 misclassification error. Finally, sites 26, 27, 38, 62 and 66 were identified in the three-group comparison of CXCL8, CCL2 and CCL4.



 
Table 1 Mutations inducing monomeric CXCL8

 
To assess the predictive ability of the Bayesian variable selection method in identifying negative design changes in the chemokine family, we have compared the identified mutational sites against experimental results. We concentrated on the CXCL8 sequence against the two sequence clades CCL4 and CCL2. Table 1 compares our results with those from the four mutational studies that have tried to inhibit the CXCL8 dimer while retaining the chemokine fold. The data show that there are more ways to push CXCL8 to a monomeric state and that some positions predicted by the Bayesian method nicely overlap with the experimental mutations creating a monomer, especially in residues from 25 and 27 that participate in the β-sheet interaction across the CXCL8 dimer. Also, positions predicted not to disrupt the CXCL8 dimer exhibit weak dimerization or no disruption of the dimer. We are currently in the process of experimentally testing the importance of the two-residue deletion at sites 26 and 27. Our approach also points to the second, internal β-strand and the beginning of the α-helix as important to the CXCL8 dimer interface, which will be tested. Overall, our comparison with experimental results indicates that the Bayesian variable selection method is able to identify sites of evolutionarily conserved negative design.


    Acknowledgments
 
M.V. is supported by NIH-R01HG003319 and NSF/DMS-0600416. J.T. has support from NIH/NCI-R25CA090801. P.L. and I.K. are supported by NIH-RO1AI47832, NIH-RO1AI070993 and a Welch Foundation Grant A1472.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on August 22, 2006; revised on November 6, 2006; accepted on November 7, 2006

    REFERENCES
 

    Fischer, T.B., et al. (2006) Assessing methods for identifying pair-wise atomic contacts across binding interfaces. J. Struct. Biol., 153, 103–112.

    Huising, M.O., et al. (2003) Molecular evolution of CXC chemokines: extant CXC chemokines originate from the CNS. Trends Immunol., 24, 307–313.

    Jin, H., et al. (2005) Investigation of CC and CXC chemokine quaternary state mutants. Biochem. Biophys. Res. Commun., 338, 987–999.

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Lodi, P.J., et al. (1994) High-resolution solution structure of the beta chemokine hMIP-1 beta by multidimensional NMR. Science, 263, 1762–1767.

    Lowman, H.B., et al. (1997) Monomeric variants of IL-8: effects of side chain substitutions and solution conditions upon dimer formation. Protein Sci., 6, 598–608.

    Lusti-Narasimhan, M., et al. (1996) A molecular switch of chemokine receptor selectivity. Chemical modification of the interleukin-8 Leu25 -> Cys mutant. J. Biol. Chem., 271, 3148–3153.

    Sha, N., et al. (2004) Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics, 60, 812–819.





