Bioinformatics Advance Access originally published online on March 6, 2007
Bioinformatics 2007 23(9):1099-1105; doi:10.1093/bioinformatics/btm073
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Andante: reducing side-chain rotamer search space during comparative modeling using environment-specific substitution probabilities

Richard E. Smith 1,*, Simon C. Lovell 1,†, David F. Burke 1, Rinaldo W. Montalvao 1 and Tom L. Blundell 1

1Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: The accurate placement of side chains in computational protein modeling and design involves the searching of vast numbers of rotamer combinations.

Results: We have applied the information contained within structurally aligned homologous families, in the form of χ angle conservation rules, to the problem of comparative modeling. This allows the accurate borrowing of entire side-chain conformations and/or the restriction to high probability rotamer bins. The application of these rules consistently reduces the number of rotamer combinations that need to be searched to trivial values and also reduces the overall side-chain root mean square deviation (rmsd) of the final model. The approach is complementary to current side-chain placement algorithms that use the decomposition of interacting clusters to increase the speed of the placement process.

Contact: res50@mole.bio.cam.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
The accurate selection of amino acid side-chain conformations is an ever-present challenge in the computer modeling of protein structures. The major limiting factor is that the number of rotamer combinations increases rapidly, resulting in an NP-hard optimization problem that quickly makes exhaustive searches computationally intractable (Pierce and Winfree, 2002). A variety of dedicated search algorithms has been devised to explore side-chain conformational space. These algorithms assume a correct, fixed backbone conformation and then try to solve the combinatorial optimization problem to find an accurate ensemble of side-chain conformations for the defined backbone by means of some test function. Assuming the backbone topology is accurate, the placement of side chains then becomes a search to identify the global minimum energy conformation for a set of side-chain conformations defined by the target protein's sequence. Approximating the global minimum energy conformation is made possible by the use of knowledge-based, representative libraries of statistically significant, low-energy side-chain conformations (De Maeyer et al., 1997; Dunbrack and Cohen, 1997; Dunbrack and Karplus, 1993; Lovell et al., 2000; Ponder and Richards, 1987), commonly referred to as rotamers.
There are numerous different search strategies that make use of these rotamer libraries, the most common of which are Monte Carlo methods (Holm and Sander, 1991), genetic algorithms (Tuffery et al., 1991), simulated annealing (Lee and Subbiah, 1991), mean field optimization (Delarue and Koehl, 1997; Vasquez, 1995), dead-end elimination algorithms (Goldstein, 1994; Lasters and Desmet, 1993; Lasters et al., 1995; Looger and Hellinga, 2001; Pierce et al., 2000), integer programming approaches (Kingsford et al., 2005), tree-searching algorithms (Bower et al., 1997; Canutescu et al., 2003; Gordon and Mayo, 1999) and graph theory approaches (Canutescu et al., 2003; Xie and Sahinidis, 2006; Xu, 2005). These approaches have various completion times, but a reduction in run time usually has a corresponding tradeoff in accuracy (Voigt et al., 2000).

In the context of comparative modeling, structural information from homologous templates can also be exploited in the side-chain placement problem. Sutcliffe et al. (1987) derived 20 × 20 rules for building side chains from template conformations based on an analysis of positions of atoms in substituted amino acids in families of homologous proteins. Summers et al. (Summers and Karplus, 1989; Summers et al., 1987) analyzed the conservation of χ angles at topologically equivalent positions in a small set of proteins and used the information to model side chains in rhizopuspepsin (Summers and Karplus, 1989). Ogata and Umeyama (1998) applied a similar approach to tightly clustered positions of superimposed homologues and investigated the influence of local space homology on the effects of χ angle conservation. Sali and Blundell (1993) modeled protein side chains by satisfaction of spatial restraints, making use of probability density functions of rotamer conformations derived from the structures of a large number of homologous families. In the work presented here, we utilize the structural environment information for amino acid residues contained in the HOMSTRAD (Mizuguchi et al., 1998; Stebbings and Mizuguchi, 2004) database to extend the χ angle conservation approach to substitution of residues classified in six structural environments and four percentage identity (PID) ranges for χ1, χ1+2 and χ1+2+3 angles. The usefulness of environment-specific propensities (Rice and Eisenberg, 1997; Shi et al., 2001) and substitution tables (Overington et al., 1990; Overington et al., 1992) has been successfully demonstrated in the areas of sequence-structure homology fold recognition (Rice and Eisenberg, 1997; Shi et al., 2001). Substitution probabilities have also been used for fragment ranking in homology modeling (Topham et al., 1993).
The goal of this work is to make extensive use of the χ angle conservation probabilities in order to contain the explosion of rotamer combinations in two separate ways. The program Andante systematically applies these probabilities as preliminary filters to ‘borrow’ (use exact parent χ angles) entire, high probability, side-chain conformations and/or then restrict possible rotamer solutions for non-borrowed positions to high probability χ angle bins. This results in a significant reduction in the rotamer search space during the rest of the side-chain placement process, principally because the size of the interacting clusters of amino acid side chains that must be solved is much smaller. The application of homologous template information results in models with reduced side-chain root mean square deviation (rmsd) to the original target; this can be seen by comparing the borrowed conformations with the equivalent rotameric solutions provided by SCWRL3.0 without information from homologues. The stand-alone SCWRL programs do have the facility to retain desired side-chain conformations (Bower et al., 1997; Canutescu et al., 2003). However, they do not have an automated pre-processing step to evaluate multiple homologous template information. In order to allow comparison, we also present results obtained by SCWRL3.0 when it is provided with borrowed side-chain information extracted by the Andante program.


    2 METHODS
2.1 Environment-specific χ angle conservation analysis
In order to derive borrowing/restricting rules for amino acid side-chain conformations, the structural alignments of 637 families contained in the HOMSTRAD database were analyzed for substitutions of residue A in a specific environment, E, to residue B in any environment. Six structural environments were defined by considering all combinations of three secondary structure states (helix, strand and coil) and two measures of solvent accessibility (buried or exposed). The structural environments were assigned using the Joy suite of programs (Mizuguchi et al., 1998). The analysis of χ angle conservation was carried out for the following residue types: all residues (excluding Ala and Gly) for χ1; DEFHIKLMNPQRWY for χ1+2; and EKMQR for χ1+2+3. Pro has four χ angles, but χ1 defines the other three. For Pro residues, there were no Pro–non-Pro substitutions observed with a high probability of χ1+2 or χ1 alone being conserved; therefore, building/restricting information is limited to Pro–Pro substitutions only. The analysis of χ1+2+3 for Glu, Gln and Met was performed to investigate whether entire conformations could be borrowed in certain circumstances. Arg and Lys were included in this group in order to establish whether restricting out to the χ3 angles could be performed. Only crystal structures with a resolution of 2.5 Å or better were examined. Substitutions to and from incomplete side chains and side chains with atoms having occupancy <1.0 were ignored. Only side chains with all atoms having a B-factor <40.0 were used. χ angles were considered conserved if found to be within ±30° of each other. The results were partitioned into four percentage identity ranges (0–19, 20–39, 40–59 and ≥60%), based on the structural alignments contained in HOMSTRAD. The limitation of a maximum of 90% PID for HOMSTRAD family members means each family can contribute to the statistics in each of the four PID ranges.
This means larger multimember families do not heavily bias the statistics in any single PID range. Corrections for two-fold, symmetric side chains and chemically equivalent positions were carried out in an analogous manner to that outlined in Summers and Karplus (1989) and Ogata and Umeyama (1998). No corrections were made for Leu or Val residues. The environment-specific χ angle conservation probabilities were calculated from the HOMSTRAD structural alignments as follows. For each alignment position, i, χ angle conservation (χ1, χ1+2, χ1+2+3 where applicable) substitutions were counted for residue type A in environment E, going to residue type B in any environment, $A_E \rightarrow B$. The reverse substitution was also counted, $B_{E'} \rightarrow A$, as substitution patterns are not necessarily symmetric. The counts were tabulated for four PID ranges (0–19, 20–39, 40–59, ≥60% PID). The probability of χ angle conservation, $P_c$ (Equation 1), was given by the total number of conserved χ angles counted for a given substitution, $N_{cons}(A_E \rightarrow B)$, divided by the raw total of substitutions observed for that given substitution, $N_{total}(A_E \rightarrow B)$:

$$P_c(A_E \rightarrow B) = \frac{N_{cons}(A_E \rightarrow B)}{N_{total}(A_E \rightarrow B)} \qquad (1)$$
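The counting procedure behind Equation 1 can be illustrated with a minimal Python sketch. The record layout (template residue, environment label, target residue, PID, template χ angles, target χ angles) and the function names are assumptions made for illustration and are not taken from Andante; the ±30° tolerance, the four PID ranges and the 50-observation minimum are the values stated in the text.

```python
from collections import defaultdict

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def pid_range(pid):
    """Map a percentage identity onto one of the four ranges used in the text."""
    if pid < 20:
        return "0-19"
    if pid < 40:
        return "20-39"
    if pid < 60:
        return "40-59"
    return "60+"

def conservation_probabilities(observations, tolerance=30.0, min_count=50):
    """Estimate P_c = conserved / total for every (A, E, B, PID range) key.

    Each observation is a hypothetical tuple:
    (res_a, env, res_b, pid, chis_a, chis_b), where chis_a and chis_b hold
    the chi angles being compared (one, two or three values each).
    """
    conserved = defaultdict(int)
    total = defaultdict(int)
    for res_a, env, res_b, pid, chis_a, chis_b in observations:
        key = (res_a, env, res_b, pid_range(pid))
        total[key] += 1
        # All compared chi angles must agree within the tolerance.
        if all(angle_diff(x, y) <= tolerance for x, y in zip(chis_a, chis_b)):
            conserved[key] += 1
    # Substitutions observed fewer than min_count times are discarded.
    return {k: conserved[k] / n for k, n in total.items() if n >= min_count}
```

Keeping the environment in the key only for the template residue mirrors the "A in environment E to B in any environment" definition; the reverse substitution is simply a second observation with the roles swapped.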
A total of 2046 crystal structures was used, giving rise to 831 616 observed substitutions. Substitutions with a low total observed count (<50 data points) were discarded. This produced 1832 χ conservation rules for which the probability of χ angle conservation was over 50%. There are 56 rules for χ1+2+3 (41 for buried template environment, 15 for exposed), 424 for χ1+2 only (217 for buried, 210 for exposed) and 1352 for χ1 only (586 for buried, 766 for exposed positions). The energy function used for van der Waals overlap clash checks is that of Bower et al. (1997) and Canutescu et al. (2003). The total fixed energy, $E^{rot}_{i,j}$, for any single rotamer, j, of residue i, is given by Equation 2,


Formula 2

(2)
$E^{bb}_{i,j}$ is the clash energy of the rotamer atoms with the backbone atoms of the target. This is calculated over the atoms of all residues other than i. Cβ atoms are considered to be part of the backbone. The threshold for tolerating backbone clashes was set to 12.0 kcal/mol. $E^{cons}_{i,j}$ is the clash energy with ‘borrowed’ (locked) rotamer conformations. The default threshold for tolerated clashes between borrowed conformations was set to 10.0 kcal/mol. $E^{logP}_{i,j}$ is a pseudo-energy term based on the log of the rotamer library probability of observing that rotamer conformation. These values are normalized against the probability of the highest library rotamer for a given residue.
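The three terms described above can be sketched in Python as follows. Because the image of Equation 2 is not available here, the plain unweighted sum and the form of the log-probability term (negative natural log of the probability relative to the most probable rotamer) are assumptions for illustration only; the two clash thresholds are the values quoted in the text, and all function names are hypothetical.

```python
import math

BACKBONE_CLASH_MAX = 12.0   # kcal/mol, backbone clash tolerance from the text
BORROWED_CLASH_MAX = 10.0   # kcal/mol, tolerance for clashes with borrowed rotamers

def log_p_term(p, p_max):
    """Assumed pseudo-energy: zero for the most probable library rotamer,
    increasingly positive for rarer ones."""
    return -math.log(p / p_max)

def fixed_energy(e_bb, e_cons, p, p_max):
    """Assumed form of the total fixed energy: an unweighted sum of the
    backbone clash, borrowed-rotamer clash and log-probability terms."""
    return e_bb + e_cons + log_p_term(p, p_max)

def within_clash_limits(e_bb, e_cons):
    """True if both clash terms are below the thresholds quoted in the text."""
    return e_bb <= BACKBONE_CLASH_MAX and e_cons <= BORROWED_CLASH_MAX
```

Normalizing against the most probable rotamer means the library term never rewards a rotamer, it only penalizes less common ones relative to the best choice for that residue.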

2.2 Program flow
2.2.1 Disulphide bond placement
Andante uses the Grade A definition of Sowdhamini et al. (1989) in order to build possible disulphide bonds. By definition, a Grade A disulphide bond meets the following criteria: for any pair of Cys residues, Cys i and Cys j, the Cα–Cα and Cβ–Cβ distances must be less than 7.0 and 4.0 Å, respectively; the dihedral angle across Cβi–Si–Sj–Cβj must be in the range |60–120°| with an Si–Sj distance between 1.8 and 2.2 Å; finally, both Cys residues must also have a χ1 in the range of |30–90°| or |150–180°|. In order to allow for errors in the modeling of the backbones, some flexibility in the Cys rotamers is allowed. For each Cys rotamer in the Penultimate Rotamer Library (Lovell et al., 2000), additional rotamers with χ1 shifted by +15° and –15° are created, giving a total of nine Cys rotamers. All rotamer combinations for all Cys–Cys pairs are searched. If the configuration is considered Grade A, a score (Equation 3) is assigned to that rotamer pair and that Cys–Cys pair.


Formula 3

(3)
where Rss is the Si–Sj distance, Xss is the dihedral angle across Cβi–Si–Sj–Cβj, ElogP is the normalized pseudo-energy attributed to each Cys rotamer based on the probability of it being observed, and Ebb is the backbone clash energy for the two rotamers of the pair. All rotamer combinations are tried for each Cys–Cys pair within the allowed Cα and Cβ cutoffs. The lowest scoring (closest to ideal) disulphide bond found from all pairs is constructed if it fits the Grade A criteria. The search is then repeated until no more disulphide bonds are found.
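The Grade A geometric test and the expansion of the Cys rotamer set can be sketched as below. The numeric limits are those given in the text, with the |60–120°| and |30–90°| notation read as bounds on the absolute value of the angle; the three library χ1 values used in the example are illustrative placeholders, and the scoring of Equation 3 is not reproduced because its form is not available here.

```python
from itertools import product

def wrap(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def chi1_ok(chi1):
    """chi1 must lie in |30-90| or |150-180| degrees."""
    a = abs(chi1)
    return 30.0 <= a <= 90.0 or 150.0 <= a <= 180.0

def is_grade_a(ca_dist, cb_dist, ss_dist, ss_dihedral, chi1_i, chi1_j):
    """Check the Grade A criteria for one Cys-Cys rotamer pair.

    Distances are in Angstroms, angles in degrees.
    """
    return (ca_dist < 7.0
            and cb_dist < 4.0
            and 1.8 <= ss_dist <= 2.2
            and 60.0 <= abs(ss_dihedral) <= 120.0
            and chi1_ok(chi1_i)
            and chi1_ok(chi1_j))

def expand_cys_rotamers(library_chi1, delta=15.0):
    """Add chi1 - delta and chi1 + delta variants of every library rotamer."""
    expanded = []
    for chi in library_chi1:
        expanded.extend([wrap(chi - delta), chi, wrap(chi + delta)])
    return expanded

# Illustrative chi1 values for three Cys library rotamers (assumed, not quoted).
rotamers = expand_cys_rotamers([62.0, -177.0, -65.0])
pairs = list(product(rotamers, rotamers))  # every rotamer pair for one Cys-Cys pair
```

With three library rotamers the expansion gives nine rotamers per Cys, so each candidate Cys–Cys pair requires 81 rotamer-pair evaluations, which is small enough to search exhaustively.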

2.2.2 Construction of high probability side-chain conformations (Borrowing phase)
The application of this type of homologous template information can be performed using any combination of five steps in the order described below. The first three steps involve the ‘borrowing’ of entire rotamer conformations from templates at positions having a high probability that enough χ angles to define the entire side chain are conserved. The final two steps involve restricting the possible rotamer solutions of the non-borrowed positions to library rotamers that have χ angles within 30° of a suitably high-scoring template position, as defined by the conservation probabilities. The number of χ angles used as restrictions depends on which rules are applied. The borrowing/restricting rules are applied in an order such that the larger, hydrophobic residues (set A: DFHILNYWP) are considered first, followed by set B (EQM), which is much less populated and has considerably fewer borrowing rules (see Table S1). Finally, the smaller residues of set C (STVC) are considered. The objective of using set A first is to correctly fill out a large volume of the core, as this has numerous downstream benefits (see Results section).

2.2.2.1 χ1+2 definable positions
The first borrowing step is to build side-chain conformations definable by two χ angles (set A). For each sequence-structure alignment position, i, the following occurs. Working in decreasing order of PID to the target, the observed substitution between template and target is determined. If both the template residue and the target residue are members of set A, and the environment-specific probability that the entire rotamer conformation is conserved exceeds a cutoff threshold, a substitution weight (Equation 4) is generated for that pair of residues.


Formula 4

(4)
where P12 is the probability that both χ1 and χ2 angles are conserved for the observed substitution of template residue, A, in environment, E, going to target residue, B. The substitution weight is variable depending on the substitution and the PID between the target and template sequences. Arbitrary weights of 0.6 and 0.4 are assigned to the PID and probability portions, respectively, of the substitution weight calculated by Equations 4–6. This gives preference to higher PID templates when multiple templates are used and identical substitutions exceed the cut-off probability. The entire rotamer conformation is then built on to the backbone template of the target using the necessary χ angles obtained from the template residue. If the conformation is compatible with the backbone and does not clash with another higher-scoring borrowed position, then it is retained. If the conformation does clash with another higher-scoring borrowed position, then the position with the lower weight is discarded. For each residue, the single highest scoring conformation from all corresponding template positions is retained.
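The template selection logic described above can be sketched in Python. The exact form of Equation 4 is not available here, so the linear combination below (0.6 × fractional PID plus 0.4 × conservation probability) is an assumption consistent with the stated weights rather than the published formula; the candidate tuple layout and the function names are likewise hypothetical.

```python
def substitution_weight(pid, p_conserved, w_pid=0.6, w_prob=0.4):
    """Assumed weight: PID (as a fraction) and conservation probability
    combined with the 0.6 / 0.4 weights mentioned in the text."""
    return w_pid * (pid / 100.0) + w_prob * p_conserved

def best_template(candidates, cutoff=0.70):
    """Pick the template with the highest weight for one target position.

    candidates: iterable of (template_name, pid, p_conserved) tuples.
    Templates whose conservation probability does not exceed the cutoff
    are not eligible for borrowing. Returns None if nothing qualifies.
    """
    eligible = [(substitution_weight(pid, p), name)
                for name, pid, p in candidates if p > cutoff]
    return max(eligible)[1] if eligible else None

# Example: three templates aligned at the same target position.
choice = best_template([("t1", 80.0, 0.72), ("t2", 35.0, 0.95), ("t3", 90.0, 0.50)])
```

In this example the 80% PID template is chosen over the 35% PID template even though the latter has the higher conservation probability, which is the preference for close homologues that the larger PID weight is meant to give. Clash checking against the backbone and against other borrowed positions would follow this step and is not shown.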

2.2.2.2 χ1+2+3 definable positions (Borrowing phase)
The above procedure is then repeated for residue positions definable by three χ angles (set B). These positions are assigned a weight in a similar fashion (Equation 5) to those positions defined by two χ angles.


Formula 5

(5)
where P123 is the probability that all χ1+2+3 angles are conserved for the observed substitution of template residue, A, in environment, E, going to target residue, B.

2.2.2.3 χ1 definable positions (Borrowing phase)
The above procedure (2.2.2.1) is then repeated for residue positions definable by one χ angle (set C). Here the substitution weight is given by Equation 6.


Formula 6

(6)
where P1 is the probability that the χ1 angle is conserved for the observed substitution. The highest scoring positions are built and checked for structural compatibility as described above. The scaling factor (10.0) ensures that residues with predicted χ1+2 and χ1+2+3 are retained over those where only χ1 is predicted. After the borrowing phase only the highest scoring side-chain conformation for any given residue is retained, though it can be selected from any template structure.

2.2.2.4 Restricting rotamer solutions to χ bins
A further use of the homologous template information is the application of the conservation probabilities to any non-borrowed positions. These are positions where the probability that all the χ angles necessary to define an entire side-chain conformation are conserved is below the cut-off threshold. However, there may be information that χ1 or χ1+2 is conserved for an observed substitution. Instead of considering all possible rotamer solutions for these positions, the substitution probabilities are reapplied and used to restrict possible library rotamer solutions to χ angle bins defined by the highest scoring template side-chain conformations. For non-borrowed or non-disulphide bonded positions, initial side-chain conformations and interacting clusters are defined in a manner analogous to that outlined by Bower et al. (1997). These clusters are searched using a standard simulated annealing protocol to determine the lowest energy ensemble for each cluster. For the jack-knife testing two rotamer libraries were used: the secondary structure-dependent library of Lovell (Lovell et al., 2000) and the backbone-dependent library of Dunbrack (Bower et al., 1997).
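The restricting step can be illustrated with a short Python sketch that keeps only the library rotamers whose leading χ angles fall within 30° of the template values. The rotamer tuples in the example are illustrative values rather than entries quoted from either library, and the behaviour when no rotamer survives is not specified in the text, so the sketch simply returns an empty list in that case.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def restrict_rotamers(library, template_chis, tolerance=30.0):
    """Keep rotamers whose first len(template_chis) chi angles are within
    the tolerance of the template side chain.

    library: list of tuples of chi angles, one tuple per rotamer.
    template_chis: (chi1,) for a chi1 rule or (chi1, chi2) for a chi1+2 rule.
    """
    n = len(template_chis)
    return [rot for rot in library
            if all(angle_diff(rot[k], template_chis[k]) <= tolerance
                   for k in range(n))]

# Illustrative two-chi rotamer set for one residue (values are assumptions).
library = [(-65.0, 175.0), (-177.0, 65.0), (62.0, 80.0), (-85.0, 65.0)]
chi1_only = restrict_rotamers(library, (-70.0,))
chi1_and_2 = restrict_rotamers(library, (-70.0, 170.0))
```

A χ1 rule leaves two of the four rotamers here, and a χ1+2 rule leaves one, which shows how each additional conserved angle shrinks the set passed on to the cluster search.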


    3 RESULTS AND DISCUSSION
As would be expected, the highest probabilities for conserved χ angles are for substitutions involving identical or chemically similar residues at PID ≥60% in buried environments. Specifically, FY/FY, W/W and H/H make up the majority of rules showing conservation probabilities >90% in this PID range. The lowest probabilities are observed for substitutions from exposed positions. For example, the lowest probabilities at χ1 were seen for P (exposed, helix)/E and P (exposed, coil)/E, both at <11% in the PID range 0–19%. At χ1+2, the lowest observed probabilities were for D (coil, exposed)/L (10.1%, PID range 0–19%) and N (coil, exposed)/R (10.2%, PID range 40–59%). An interesting point resulting from the way the rules were derived is that some rules could be considered family specific. For example, considering χ1 only, N (buried, helix)/H has a very high probability (~95%) of being conserved in the lowest PID range (0–19%). However, this is a rarely observed substitution only seen in three HOMSTRAD families. Extended listings (down to 10% probability) of all the calculated probabilities used for the jack-knife test are given in the Supplementary Material (see S_chi1.txt, S_chi12.txt, S_chi123.txt). A jack-knife test was performed on 14 HOMSTRAD families (Table 1). The families were selected on the basis that the target structure did not contain any co-factors, ligands or non-standard amino acids. All template information was taken from crystal structures with resolutions of 2.50 Å or better. Also, a minimum of three templates was required in order to make use of the information contained in multiple template structures. Borrowing and restricting thresholds (conserved χ probability) were set at 65 and 70%, and both the Dunbrack and Lovell rotamer libraries were used. Setting the cut-off probability threshold at 70% reduced the overall number of borrowing/restricting rules enforceable to 828 from the original 1832 (Table S1).
Twelve different combinations of the borrowing/restricting procedures were used to validate the borrowing rules and to investigate the overall effect the method has on the number of combinations of rotamers that need to be searched during the final selection process. Summarized results from the 70% cutoff and the Lovell rotamer library are listed below (Table 2). Full results for both 65 and 70% runs with both the Lovell and Dunbrack rotamer libraries are listed in the Supplementary Material (see Tables S2a and b). Side chains were positioned on the backbone coordinates of the original target structure. The results presented are divided into two categories. The first category was selected to estimate the effect of borrowing entire side-chain conformations on the final side-chain rmsd. The second category was chosen to estimate the reduction in the number of rotamer combinations that need to be searched as a result of using information derived from the template structures. Table 2 shows the effects of addition of homologous template information to the overall side-chain rmsd. For Andante, when only borrowing is enforced (Table 2, row 1 versus row 2) the overall side-chain rmsds produced by Andante are in all 14 cases reduced. Similar trends can be observed when any combination of borrowing or restricting using template information is applied. SCWRL3 results showed a reduction in overall side-chain rmsd in 11/14 cases (rows 7 and 8) when supplied with Andante borrowed side-chain conformations. Row 9 of Table 2 lists the side-chain rmsds of the borrowed positions (TEMPLATE) relative to the observed conformation in the target structure. In all cases, these are considerably lower than the overall side-chain rmsd. These values are further analysed in Tables S3a and S3b. Here, the rmsds of the borrowed conformations are compared to the equivalent rotameric solutions supplied by SCWRL3. 
For some cases, there is a considerable decrease in the rmsd of the TEMPLATE conformations to the observed conformations in both buried and exposed environments. For the aldo/keto reductase family target, 1ah4, the rmsd of the TEMPLATE side chains is reduced by 0.6 Å over 165 positions. A similar improvement is noted for the glutathione S-transferase family. Figure 1a shows the packing of a cluster of several large hydrophobic residues in 1ah4. This example shows that borrowing can lead to a more accurate result compared to Andante or SCWRL3 using different rotamer libraries. A similar example for the glutathione S-transferase family is shown in Figure 1b. Table S3a shows that for 11/14 test sets, the borrowed conformations found in buried positions showed a decreased rmsd compared to rotamer library solutions placed by SCWRL3 for the corresponding side-chain conformation observed in the actual target structure file. Also, 12/14 cases showed decreased rmsd for the equivalent exposed positions (Table S3b). This is encouraging, as one would hope to see a reduction of the incorrect packing of pockets of buried, hydrophobic residues using template information. This improvement might come as a result of borrowing an entire set of correct conformations or the selection of one correct conformation that forces other neighbouring conformations into a more accurate orientation. However, an increase in the accuracy of prediction of exposed residues without adding hydrogen bonding or solvation terms to the test function is a bonus. In four cases, the epidermal growth factor-like domain (1hcgb), glutathione S-transferase (1gta), SH3 domain (1shg) and ubiquitin conjugating enzyme (1u9a) families, borrowing of template conformations was very low (<20% of total). This can be attributed to a combination of three factors. First, the number of rules for borrowing decreases with PID (Table S1). For these four cases, the majority of templates have PIDs in the range 20–39%.
Second, as is to be expected, there are more borrowing rules for entire side-chain conformations from template residues in buried environments (Table S1). For the epidermal growth factor-like domain, glutathione S-transferase, SH3 domain and ubiquitin conjugating enzyme cases, the target structures have 88, 71, 82 and 75% residues that are exposed, respectively. This compares to the aspartic proteinase and aldo/keto reductase cases, which have 65 and 59% exposed residues, respectively, but had borrowed conformation percentages of 52 and 32%, respectively. The final factor is the amino acid composition of the target structure. A high percentage of Ala, Gly, Arg and Lys will reduce the amount of borrowing possible. There are no borrowing rules for complete conformations of Arg and Lys and no rules for Gln, Glu and Met (χ1+2+3) at lower PIDs. The targets for the epidermal growth factor-like domain (1hcgb), SH3 domain (1shg) and ubiquitin conjugating enzyme (1u9a) have 24, 27 and 28% composition of Gly, Ala, Lys and Arg, respectively. This compares to the aspartic proteinase (3app) and aldo/keto reductase (1ah4) cases, which have GAKR compositions of 20 and 22%, respectively. While no major improvement in the overall side-chain rmsd is seen in these cases, the general trend for lower rmsd for the borrowed conformations is still observed (Tables S3a and b). Table S5 explores the use of different templates within the family for modeling the target. The trend shown here is an expected one in that the majority of the borrowing comes from the highest PID template structure, although borrowing is still performed from more distant homologues, underlining the importance of using multiple family members as templates. A final point regarding the results from the 12 runs using Andante in various borrowing/restricting modes is that in 8/14 cases (see Tables S2a and b) Andante was able to provide the lowest rmsd model if all models produced and results using the Dunbrack rotamer library are considered.
The other issue of interest is the extent to which borrowing and restricting influence the size of interacting clusters. Typically, DEE calculations are used to eliminate rotamers and reduce the total number of rotamer combinations that need to be searched. Considering that no DEE is implemented in this version, the reduction in the total number of combinations searched can be quite dramatic while still improving rmsd values. Table 3 lists the values of the total combinations, total backbone compatible combinations and the total number of combinations searched for four targets when various borrowing options are enforced (results for all families are given in Table S5). Compared to cases where no borrowing/restricting rules are applied, simply enabling the borrowing mode reduces the number of backbone compatible rotamer combinations by an average of ~10^35 combinations over all families. The values designated ‘S’ in Table 3 are the total number of combinations searched for all identified clusters. In 13 of the 14 cases, the cluster combination totals are reduced to a trivial size that could be searched exhaustively. For the aldo/keto reductase and aspartic proteinase cases, massive reductions (10^96 and 10^55) in the number of combinations are observed. For three families (epidermal growth factor-like domain, profilin and tumor necrosis factor), the number of combinations is reduced to zero for some runs. Additionally, application of the restricting rules in concert with the borrowing rules further reduces the number of combinations searched (Table S5). If the restricting mode, without borrowing, is applied, similar reductions in the numbers of combinations searched are observed.
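The scale of these reductions follows from the fact that the total number of combinations is the product of the per-position rotamer counts, so every borrowed position removes a whole factor. A minimal Python sketch, using made-up counts (100 positions with 10 rotamers each, 40 of them borrowed) rather than figures from the benchmark, works in log10 to keep the numbers manageable:

```python
import math

def log10_combinations(rotamer_counts):
    """Return log10 of the product of per-position rotamer counts."""
    return sum(math.log10(n) for n in rotamer_counts)

def apply_borrowing(rotamer_counts, borrowed_positions):
    """A borrowed position is fixed, so it contributes exactly one conformation."""
    return [1 if i in borrowed_positions else n
            for i, n in enumerate(rotamer_counts)]

# Hypothetical protein: 100 positions, 10 candidate rotamers at each.
counts = [10] * 100
before = log10_combinations(counts)
after = log10_combinations(apply_borrowing(counts, set(range(40))))
print(f"10^{before:.0f} -> 10^{after:.0f}")
```

Restricting has the same multiplicative effect, except that a restricted position keeps a reduced count greater than one instead of being fixed.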


Table 1. The 14 HOMSTRAD families used for jack-knife test. The full HOMSTRAD family name, plus HOMSTRAD abbreviation. Target = HOMSTRAD structure modeled. Length = number of standard amino acids found in the target. Parent-target PID is listed after the template name

 

Table 2. Side-chain rmsd for Andante runs in various operating modes using the Penultimate Rotamer Library

 

Table 3. Listing of average numbers of rotamer combinations searched for all benchmark families when Andante is run in various operating modes

 

Fig. 1. Side-chain orientations for six positions for the aldo/keto reductase family target, 1ah4 (a), and for five positions for the glutathione S-transferase family target, 1gta (b). Blue shows observed orientations for the target structure. Red shows borrowed conformations from Andante. Black shows SCWRL3 orientations. Grey shows Andante results using the Penultimate Rotamer Library with no borrowing.

 

    4 CONCLUSIONS
In the work presented here, we have derived environment-specific {chi} angle conversion probabilities for use in comparative modeling. These probabilities were then applied in the form of borrowing/restricting rules during the side-chain selection portion of the process. There are several advantages to implementing the borrowing and restricting rules when using the approach of defining interacting clusters and then determining low energy conformations for those clusters. The most striking is the large reduction in the number of combinations that needs to be searched. The appropriate application of these rules is shown to select accurate conformations from homologous templates giving a decrease in rmsd at the borrowed positions relative equivalent rotameric solutions. Building the borrowed conformations also has a number of knock-on effects if applied to the graph theory-cluster solving approached used by SCWRL3 and other programs by means of reducing the number of combinations that need to be search to resolve the defined clusters. The borrowed/restricted positions prevent the growth of large clusters in three ways. First, borrowing entire side-chain conformations automatically reduces the total numbers of rotamer combinations possible. Second, borrowed positions disrupt the interaction graph; large numbers of rotamers from non-borrowed positions can be eliminated because they are incompatible with the extended backbone, thus restricting the growth of interacting clusters when the initial lowest energy, clashing residues are ‘spun-out’ to define interacting clusters. Third, restricting solutions to defined {chi} bins reduces the number of rotamers that are ‘spun out’, again reducing the number of new interactions that will be identified thus limiting the growth of such clusters. This will result in an increase in speed, as large numbers of smaller clusters can be resolved more quickly than a smaller number of larger clusters. 
The χ angle conservation probabilities may have some application in the area of protein redesign. Any sequence/rotamer combination evaluated on a fixed backbone will have a corresponding PID to the sequence of the protein that adopts the fold in question. Instead of randomly changing rotamers during the repacking phase, it may be possible to use information from the original template. One could attempt to apply the low-range (0–39% PID) restricting rules at χ1 to limit potential rotamer solutions to a χ1 bin similar to that observed in the original template. This may improve sampling by directing potential side-chain conformations into the same area of space occupied by the original template residue. Andante is part of the Orchestrar suite of comparative modeling programs (Tripos Inc., USA).
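As a rough illustration of the χ1 restriction suggested for redesign, the sketch below filters candidate rotamers to those whose χ1 falls in the same bin as the template residue. The three 120°-wide bins centred on +60°, 180° and -60°, the function names and the candidate rotamers are all assumptions made for this example; they are not taken from the Andante rules themselves.

```python
def chi1_bin(angle_deg):
    """Map a chi1 angle in degrees onto one of three 120-degree-wide bins.

    The bin boundaries (0, 120, 240 degrees) are an assumption for this sketch.
    """
    a = angle_deg % 360.0
    if a < 120.0:
        return "+60"
    if a < 240.0:
        return "180"
    return "-60"


def restrict_to_template_bin(candidates, template_chi1):
    """Keep only rotamers whose chi1 lies in the same bin as the template.

    candidates: list of tuples of chi angles, chi1 first (hypothetical format).
    """
    target = chi1_bin(template_chi1)
    return [rot for rot in candidates if chi1_bin(rot[0]) == target]


# Hypothetical candidate rotamers (chi1, chi2) and a template chi1 of -71.3.
candidates = [(62.0, 180.0), (-65.0, 175.0), (-68.0, -60.0), (178.0, 65.0)]
kept = restrict_to_template_bin(candidates, template_chi1=-71.3)
print(kept)
```

A repacking move set built this way samples only from `kept` at the restricted position, instead of drawing from the full rotamer library at random.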


    ACKNOWLEDGEMENTS
 
Many thanks to Matthias Keil of Tripos Inc. for technical assistance and to Tripos Inc. for funding.

Conflict of Interest: none declared.


    FOOTNOTES
 
†Present address: University of Manchester, Faculty of Life Sciences, Michael Smith Building, Oxford Road, Manchester, M14 9PT.

Associate Editor: Anna Tramontano

Received on July 10, 2006; revised on February 21, 2007; accepted on February 25, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Bower MJ, et al. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. J. Mol. Biol. (1997) 267:1268–1282.[CrossRef][Web of Science][Medline]

    Canutescu AA, et al. A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci. (2003) 12(9):2001–2014.[CrossRef][Web of Science][Medline]

    De Maeyer M, et al. All in one: a highly detailed rotamer library improves both accuracy and speed in the modelling of sidechains by dead-end elimination. Fold. Des. (1997) 2:53–66.[CrossRef][Web of Science][Medline]

    Delarue M, Koehl P. The inverse protein folding problem: self consistent mean field optimisation of a structure specific mutation matrix. Pac. Symp. Biocomput. (1997) 109–121.

    Dunbrack RL Jr, Cohen FE. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci. (1997) 6:1661–1681.[Web of Science][Medline]

    Dunbrack RL Jr, Karplus M. Backbone-dependent rotamer library for proteins. Application to side-chain prediction. J. Mol. Biol. (1993) 230:543–574.[CrossRef][Web of Science][Medline]

    Goldstein RF. Efficient rotamer elimination applied to protein side-chains and related spin-glasses. Biophys. J. (1994) 66:1335–1340.[Web of Science][Medline]

    Gordon DB, Mayo SL. Branch-and-terminate: a combinatorial optimization algorithm for protein design. Structure Fold Des. (1999) 7:1089–1098.[Medline]

    Holm L, Sander C. Database algorithm for generating protein backbone and side-chain co-ordinates from a C alpha trace application to model building and detection of co-ordinate errors. J. Mol. Biol. (1991) 218:183–194.[CrossRef][Web of Science][Medline]

    Kingsford CL, et al. Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics (2005) 21:1028–1036.[Abstract/Free Full Text]

    Lasters I, et al. Enhanced dead-end elimination in the search for the global minimum energy conformation of a collection of protein side chains. Protein Eng. (1995) 8:815–822.[Abstract/Free Full Text]

    Lasters I, Desmet J. The fuzzy-end elimination theorem: correctly implementing the side chain placement algorithm based on the dead-end elimination theorem. Protein Eng. (1993) 6:717–722.[Abstract/Free Full Text]

    Lee C, Subbiah S. Prediction of protein side-chain conformation by packing optimization. J. Mol. Biol. (1991) 217:373–388.[CrossRef][Web of Science][Medline]

    Looger LL, Hellinga HW. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J. Mol. Biol. (2001) 307:429–445.[CrossRef][Web of Science][Medline]

    Lovell SC, et al. The penultimate rotamer library. Proteins (2000) 40:389–408.[CrossRef][Web of Science][Medline]

    Mizuguchi K, et al. JOY: protein sequence-structure representation and analysis. Bioinformatics (1998a) 14:617–623.[Abstract/Free Full Text]

    Mizuguchi K, et al. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. (1998b) 7:2469–2471.[Web of Science][Medline]

    Ogata K, Umeyama H. The role played by environmental residues on sidechain torsional angles within homologous families of proteins: a new method of sidechain modeling. Proteins-Structure Function And Bioinformatics (1998) 31:355–369.[CrossRef]

    Overington J, et al. Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci. (1992) 1(2):216–226.[Web of Science][Medline]

    Overington J, et al. Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc. Biol. Sci. (1990) 241:132–145.[Abstract/Free Full Text]

    Pierce NA, et al. Conformational splitting: a more powerful criterion for dead- end elimination. J. Comput. Chem. (2000) 21:999–1009.[CrossRef][Web of Science]

    Pierce NA, Winfree E. Protein design is NP-hard. Protein Eng. (2002) 15:779–782.[Abstract/Free Full Text]

    Ponder JW, Richards FM. Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. J. Mol. Biol. (1987) 193:775–791.[CrossRef][Web of Science][Medline]

    Rice DW, Eisenberg D. A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. J. Mol. Biol. (1997) 267:1026–1038.[CrossRef][Web of Science][Medline]

    Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. (1993) 234:779–815.[CrossRef][Web of Science][Medline]

    Shi J, et al. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. (2001) 310:243–257.[CrossRef][Web of Science][Medline]

    Sowdhamini R, et al. Stereochemical modeling of disulfide bridges. Criteria for introduction into proteins by site-directed mutagenesis. Protein Eng. (1989) 3:95–103.[Abstract/Free Full Text]

    Stebbings LA, Mizuguchi K. HOMSTRAD: recent developments of the homologous protein structure alignment database. Nucleic Acids Res. (2004) 32:D203–D207.[Abstract/Free Full Text]

    Summers NL, et al. Analysis of side-chain orientations in homologous proteins. J. Mol. Biol. (1987) 196:175–198.[CrossRef][Web of Science][Medline]

    Summers NL, Karplus M. Construction of side-chains in homology modelling. Application to the C-terminal lobe of rhizopuspepsin. J. Mol. Biol. (1989) 210:785–811.[CrossRef][Web of Science][Medline]

    Summers NL, M. Karplus. Modeling of side chains, loops, and insertions in proteins. Meth. Enzymol. (1991) 202:156–204.[Web of Science][Medline]

    Sutcliffe MJ, et al. Knowledge based modelling of homologous proteins, Part II: rules for the conformations of substituted sidechains. Protein Eng. (1987) 1:385–392.[Abstract/Free Full Text]

    Topham CM, et al. Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. J. Mol. Biol. (1993) 229:194–220.[CrossRef][Web of Science][Medline]

    Tuffery P, et al. A new approach to the rapid determination of protein side chain conformations. J. Biomol. Struct. Dyn. (1991) 8:1267–1289.[Web of Science][Medline]

    Vasquez M. An evaluation of discrete and continuum search techniques for conformational-analysis of side-chains in proteins. Biopolymers (1995) 36:53–70.[CrossRef][Web of Science]

    Voigt CA, et al. Trading accuracy for speed: a quantitative comparison of search algorithms in protein sequence design. J. Mol. Biol. (2000) 299:789–803.[CrossRef][Web of Science][Medline]

    Xie W, Sahinidis NV. Residue-rotamer-reduction algorithm for the protein side-chain conformation problem. Bioinformatics (2006) 22:188–194.[Abstract/Free Full Text]

    Xu J. Rapid protein side-chain packing via tree decomposition. Lecture Notes in Computer Science (2005) 3500:423–439.[Web of Science]

