Skip Navigation


Bioinformatics Advance Access originally published online on November 8, 2005
Bioinformatics 2006 22(2):188-194; doi:10.1093/bioinformatics/bti763
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/2/188    most recent
bti763v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (5)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Xie, W.
Right arrow Articles by Sahinidis, N. V.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Xie, W.
Right arrow Articles by Sahinidis, N. V.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oxfordjournals.org

Residue-rotamer-reduction algorithm for the protein side-chain conformation problem

Wei Xie and Nikolaos V. Sahinidis *

Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign 600 South Mathews Avenue, Urbana, IL 61801, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROBLEM AND BACKGROUND
 3 METHODS
 4 ALGORITHM AND IMPLEMENTATION
 5 RESULTS AND DISCUSSION
 6 CONCLUSIONS
 REFERENCES
 

Motivation: The protein side-chain conformation problem is a central problem in proteomics with wide applications in protein structure prediction and design. Computational complexity results show that the problem is hard to solve. Yet, instances from realistic applications are large and demand fast and reliable algorithms.

Results: We propose a new global optimization algorithm, which for the first time integrates residue reduction and rotamer reduction techniques previously developed for the protein side-chain conformation problem. We show that the proposed approach simplifies dramatically the topology of the underlining residue graph. Computations show that our algorithm solves problems using only 1–10% of the time required by the mixed-integer linear programming approach available in the literature. In addition, on a set of hard side-chain conformation problems, our algorithm runs 2–78 times faster than SCWRL 3.0, which is widely used for solving these problems.

Availability: The implementation is available as an online server at http://eudoxus.scs.uiuc.edu/r3.html

Contact: nikos{at}uiuc.edu


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROBLEM AND BACKGROUND
 3 METHODS
 4 ALGORITHM AND IMPLEMENTATION
 5 RESULTS AND DISCUSSION
 6 CONCLUSIONS
 REFERENCES
 
The protein side-chain conformation problem calls for predicting protein side-chain structures when their backbone structures are known. This problem plays a very important role in the study of protein structure and function and has become a key problem in the context of many methods that predict protein folding (Bower et al., 1997). Design of novel protein sequences that fold into desired structures, which is of particular interest to the biotechnology and pharmaceutical industries, also requires to solve the side-chain conformation problem (Dahiyat and Mayo, 1997). As more complex proteins are to be predicted and designed using realistic biological models, the task of solving the side-chain conformation prediction problem becomes increasingly more challenging.

The protein side-chain amino acids interact energetically with the protein backbone as well as themselves. It is commonly assumed that each amino acid residue can take three-dimensional positions from among a finite set of statistically significant conformations known as rotamers (Ponder and Richards, 1987; Dunbrack and Karplus, 1993). It is also assumed that proteins fold in a way that corresponds to a global minimal energy conformation (GMEC), i.e. a conformation that minimizes the total energy of interaction between residues and between residues and the protein backbone. Then, the side-chain problem becomes a combinatorial optimization problem and can be solved in a number of ways. All existing algorithms for the side-chain problem can be divided into two categories: global optimization algorithms (Desmet et al., 1992; Lasters and Desmet, 1993; Goldstein, 1994; Gordon and Mayo, 1998; Pierce et al., 2000; Looger and Hellinga, 2001; Gordon et al., 2003) and heuristics (Lee and Levitt, 1991; Hellinga and Richards, 1994; Dahiyat and Mayo, 1994).

Global optimization algorithms yield conformations that are proven to be energetically minimal. Computational complexity results show that the side-chain problem is Formula (Pierce and Winfree, 2002) and remains so even when an approximate solution is sought within Formula from the optimum (Chazelle et al., 2004). Here, c is a constant, n is the number of residues and R is the average number of rotamers per residue. These computational complexity results suggest that any global optimization approach for the side-chain problem may, in the worst case, run in exponential time. Heuristics are faster than global optimization algorithms but do not provide any theoretical guarantee on the accuracy of their predicted conformation. The quality of heuristic solutions in practice degrades as problem size increases (Voigt et al., 2000).

Dead-End-Elimination (DEE), initially proposed by Desmet et al. (1992), was the first global optimization algorithm developed for the side-chain problem and has been widely used. DEE prunes parts of the search space via conformation domination, i.e. certain rotamers or combinations of rotamers are ruled out by proving the existence of better alternatives. Despite its apparent simplicity, DEE is very efficient in practice. Its efficiency deteriorates when more complex structures (usually in terms of the number of rotamers) are encountered and sophisticated potential energy functions are used. A number of complex DEE schemes have been proposed to be used alone (Looger and Hellinga, 2001; Gordon et al., 2003) or within a branch-and-bound framework (Gordon and Mayo, 1999).

It is possible to formulate the side-chain problem as a mixed-integer linear program (MILP) and use standard MILP solvers for solution (Eriksson et al., 2001; Althaus et al., 2002; Kingsford et al., 2005). Another approach to the side-chain problem uses graph-theoretic properties to decompose the underlining residue graph. Canutescu et al. (2003) decompose the original residue graph to a number of biconnected clusters and find the GMEC for the residues in these clusters. These authors also observed that residues with a single rotamer or a single neighbor can be eliminated from the residue graph. Their approach yields a GMEC and is widely used via the SCWRL 3.0 package. The computational time for this method is exponential in the size of the maximal biconnected clusters. Xu (2005) used a tree decomposition on the residue graph to find a GMEC. Since both the running time and space requirement of tree decomposition are exponential in the tree width, a heuristic was used to find a decomposition with a small tree width. In addition, Xu showed that residues with two neighbors can be eliminated from the residue graph. A tree decomposition procedure was also embedded in the adaptive dynamic programming algorithm developed by Leaver-Fay et al. (2005). This algorithm uses the vertex stiffness to adaptively change the number of rotamers to be considered and may miss the global optimum.

In this paper, we propose a novel algorithm that integrates residue and rotamer reduction. The key observation is that residue reduction and rotamer reduction enhance each other. Once some residues have been eliminated, the residue graph becomes topologically simpler and leads to more efficient rotamer reduction. Conversely, reduction of rotamers from residues may allow residues to be reduced from the residue graph. The paper builds on this observation to develop a highly efficient combination of residue and rotamer reduction techniques for solving the side-chain problem.

In Section 2, we provide a formal mathematical statement of the side-chain problem, describe the idea of residue graph and review the MILP formulation for the problem. The concept of a reduced residue graph is introduced in Section 3 as a generalization of the residue graph. We also describe existing residue and rotamer reduction techniques in this section as well as point out one improvement for rotamer reduction. In Section 4. we detail our residue-rotamer-reduction (R3) algorithm and its implementation. In Section 5, we compare R3 with MILP and SCWRL 3.0 on two sets of test problems from the literature. We find that R3 is faster than MILP by a factor of 10–100 and runs 2–78 times faster than SCWRL for the hardest of these problems. For all test problems, with up to 103 residues and 104 rotamers, the R3 algorithm determines the GMEC in a few seconds on a standard computer workstation. In addition, its predicted structures are of comparable accuracy to those of existing approaches.


    2 PROBLEM AND BACKGROUND
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROBLEM AND BACKGROUND
 3 METHODS
 4 ALGORITHM AND IMPLEMENTATION
 5 RESULTS AND DISCUSSION
 6 CONCLUSIONS
 REFERENCES
 
Let N = {1, 2, ... , n} be the set of amino acid residues and Formula be the set of rotamers for residue i isin N. We use Formula or (i,r) to denote rotamer r for residue i. The parameter cir denotes the self energy for rotamer Formula (interaction with the backbone), whereas pirjs denotes the interaction energy between rotamers Formula and Formula. The protein side-chain conformation prediction problem, or simply the side-chain problem, is to select one rotamer for each residue so that the GMEC is achieved for the overall energy:

Formula
In practice, many residues are too distant to interact, i.e. pirjs = 0 for certain Formula and Formula. The residue graph has been introduced to take advantage of this property (Canutescu et al., 2003). The set of rotamers Formula for all i isin N comprise the node set of this graph with edges (r,s) if and only if pirjs != 0. The set of residues that interact with residue i is called the neighbor list of residue i and denoted as Formula.

Kingsford et al. (2005) proposed the following MILP formulation for the side-chain problem:

Formula

Formula 1(1)

Formula 2(2)

Formula 2
Here, yir = 1 if and only if Formula 2 is the rotamer for residue i in a GMEC, whereas xirjs = 1 if and only if Formula 2 is the rotamer for residue i and Formula 2 is the rotamer for residue j. Constraint (1) requires each residue to take up exactly one rotamer position. Constraint (2) ensures that the contribution of the interaction between residues i and j to the objective function is pirjs if and only if rotamers (i, r) and (j, s) appear in GMEC.

When all pirjs ≤ 0 for some rotamer (i, r) and residue j, Kingsford et al. (2005) suggested adding the constraint

Formula 3(3)
to the above formulation. Constraint (3) holds because only negative pirjs terms will contribute to the overall energy in this case.


    3 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROBLEM AND BACKGROUND
 3 METHODS
 4 ALGORITHM AND IMPLEMENTATION
 5 RESULTS AND DISCUSSION
 6 CONCLUSIONS
 REFERENCES
 
The proposed approach relies on four important residue and rotamer manipulation techniques: the reduced residue graph, rotamer reduction, residue reduction and residue unification.

3.1 Reduced residue graph
According to the definition of the residue graph, two residues i and j are non-interacting if and only if they have no pairwise interactions:

Formula 4(4)

The idea of the interacting residue pair can be generalized to the {varepsilon}-interacting residue pair. Residues i and j form an {varepsilon}-interacting pair if Equation (5) holds for some predetermined positive {varepsilon}:

Formula 5(5)

If, instead of (4), criterion (5) is employed with {varepsilon} = 0, we will refer to the resultant graph as the reduced residue graph.

PROPOSITION 1. Two residues i and j can be treated as non-interacting if Equation (5) holds for {varepsilon} = 0.

PROOF. It is straightforward to see that residues i and j can be considered as non-interacting (i.e. pirjs = 0) without changing the GMEC, provided that we add Formula 5 to the global minimal energy in advance.

The node-edge sets of the reduced residue graph are subsets of those of the residue graph. Therefore, we use Formula 5 to denote the neighbor list of residue i in the reduced residue graph to distinguish it from the original residue graph. In practice, the difference between the reduced residue graph and the original residue graph may be very small for biologically interesting instances of the side-chain problem because non-zero pirjs values are rarely possible when Formula 5.

When {varepsilon} > 0, the definition of {varepsilon}-interacting pair is beneficial both in the context of local search for a good conformation and the derivation of a lower bound on the global minimum energy. By declaring all {varepsilon}-interacting pairs as non-interacting, one can simplify the residue graph considerably and facilitate the identification of a good conformation via search on the reduced graph. This conformation can then be used to facilitate the reduction steps discussed in the remainder of this section. On the other hand, we can underestimate pirjs of {varepsilon}-interacting pairs by their lower bounds Formula 5 (and therefore treat these pairs as non-interacting) to derive a lower bound on the GMEC.

3.2 Rotamer reduction
Rotamer reduction is usually done via DEE, although some new reduction techniques have recently been introduced based on lower bounding on the minimum energy (Gordon et al., 2003). The specific DEEs we employ come from the two following results:

PROPOSITION 2. (Goldstein, 1994) Rotamer Formula 5 can be eliminated if there exists a rotamer Formula 5 such that

Formula 6(6)

PROPOSITION 3. (Pierce et al., 2000) Consider rotamers (i, r) and (k, t) and assume that there exists some (i, s) satisfying

Formula 7(7)
Then, rotamers (i, r) and (k, t) cannot co-exist in the GMEC. Such a residue k is called a split residue. Furthermore, if such an (i, s) exists for any Formula 7, then rotamer (i, r) can be eliminated.

It is well known that Propositions 2 and 3 can be strengthened by considering only allowable rotamer pairs (i, r) and (j, s), i.e. those pairs not yet eliminated by Proposition 3, inside the minimization operator in the inequalities (6) and (7).

Criteria (6) and (7) can be simplified by the reduced residue graph as follows:

PROPOSITION 4.

  1. Inequality (6) can be replaced by

    Formula 7

  2. Inequality (7) can be replaced by

    Formula 7

for Formula 7.

When the reduced residue graph is sparse enough, Proposition 4 could lead to significant computational savings compared with Propositions 2 and 3.

3.3 Residue reduction
Residue reduction attempts to reduce some residues from the reduced residue graph without changing the GMEC. Four types of residue reduction techniques are proposed:

PROPOSITION 5. An isolated residue i, i.e. any i with Formula 7, can be eliminated from the reduced residue graph.

PROOF. An isolated residue incurs no interaction. Therefore, the rotamer Formula 7 must be in the GMEC.

The next three propositions were originally presented in the context of the residue graph. We present them in the context of the reduced residue graph. Their extension to the reduced residue graph is straightforward.

PROPOSITION 6. (Canutescu et al., 2003) A residue i with only one rotamer r can be eliminated from the reduced residue graph without changing GMEC provided that we add cir to the overall energy and set

Formula 7
in the resultant reduced residue graph.

PROPOSITION 7. (Canutescu et al., 2003) If a residue i has only one neighboring residue j, we can eliminate it from the reduced residue graph without changing GMEC if we set

Formula 7
in the resultant reduced residue graph.

PROPOSITION 8. (Xu, 2005) If a residue i has only two residue neighbors j and k, we can reduce it from the reduced residue graph without changing GMEC if we set

Formula 8(8)
We may have to add j to Formula 8 and k to Formula 8 if they are not neighbors before the reduction.

3.4 Residue unification
Residue unification is a technique used in the context of DEE in order to combine residues and present more opportunities for DEE pruning. In particular, any two residues i and j may be combined into a single residue k, which contains all allowable rotamer pairs for residues i and j.

PROPOSITION 9. (Desmet et al., 1994; Goldstein, 1994) Unification of any two residues does not change the GMEC.

Unification of two residues simplifies the topology of the reduced residue graph and may create opportunities for residue and rotamer reduction. Gordon et al. (2003) unify the pair of residues with the largest fraction of dead-ending rotamer pairs while restricting the size of the resulting super-residue. In our approach, we unify the residue pair with the least number of allowable rotamer pairs so that the resultant unified residue has a minimal number of rotamers.


    4 ALGORITHM AND IMPLEMENTATION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROBLEM AND BACKGROUND
 3 METHODS
 4 ALGORITHM AND IMPLEMENTATION
 5 RESULTS AND DISCUSSION
 6 CONCLUSIONS
 REFERENCES
 
The residue-rotamer-reduction (R3) algorithm for the side-chain problem incorporates all the techniques discussed in the previous section. A flowchart of R3 is given in Figure 1. We repeatedly execute residue and rotamer reductions, followed by limited use of residue unification, until all residues are eliminated. The philosophy behind the algorithm design is as follows:

  1. We start a new iteration of residue rotamer reduction whenever some residues or rotamers have been reduced.
  2. Other algorithms that use residue reduction usually involve DEE before residue reduction. We deem it more important to do residue reduction prior to rotamer reduction for three reasons. First, residue reduction takes less time than rotamer reduction. Second, residue reduction makes rotamer reduction faster. Finally, residue reduction may eliminate a large number of residues or even solve the problem entirely on its own.
  3. We minimize the use of residue unification by keeping this operation in the innermost loop of the algorithm. Residue unification may create residues with a very large set of rotamers, while residue reduction and rotamer reduction do not. Excessive application of residue unification exhausts memory very quickly, particularly for large and complex proteins.


Figure 1
View larger version (9K):
[in this window]
[in a new window]
 
Fig. 1 R3 algorithm flowchart.

 
PROPOSITION 10. The R3 algorithm terminates in a finite number of major iterations.

PROOF. The termination claim holds because we reduce the number of residues by at least one in each iteration.

To illustrate the importance of applying residue reduction in the outermost loop of the algorithm, we consider the problem of Figure 2. This example is taken from Canuteseu et al. (2003), where it was used to illustrate the biconnected cluster decomposition algorithm implemented in SCWRL 3.0. We do not show rotamers in the figure for the sake of convenience. Branch-and-bound on residues in every biconnected component may take a long time for some large components. However, by iterative application of R3, we can directly solve for the GMEC as follows. Residues F35 and D39 can be eliminated via Proposition 7 because they have W11 as their only neighbor. Now, we have two circles V10-R15-K17-Y21 and T30-Q25-W11-H29-M32 that are connected by an edge K17-T30. The circle on the left is reduced to a single residue K17 after the following steps of the algorithm. First, we eliminate V10 via Proposition 8 because only R15 and Y21 are its neighbors. Of course, we need to connect R15 and Y21 after elimination of V10. Now, R15 has only two neighbors Y21 and K17 and can be eliminated via Proposition 8. Since Y21 and K17 are connected already, we do not need to update their neighbor lists. Then, Y21 has only one neighbor K17 and can be eliminated via Proposition 7. Now, the left circle reduces to a single residue K17. Similarly, we can show that the right circle reduces to a single residue T30. Now the reduced residue graph is two connected residues K17 and T30 that can be eliminated in two steps via Propositions 7 and 5.


Figure 2
View larger version (12K):
[in this window]
[in a new window]
 
Fig. 2 Example for residue reduction.

 
Our experience indicates that side-chain problem instances with relatively simple reduced residue graph can be significantly simplified after residue reduction as happened in the above example.

In the remainder of this section, we first describe the data structures used to implement the reduced residue graph and the operation stack. The latter is used to store the necessary information to determine rotamers for the reduced residues. Finally, we describe the preprocessing routine that generates self- and pairwise-interaction energies.

4.1 Reduced residue graph
We use an adjacency list to model the reduced residue graph. For each residue, we only store the number of neighboring residues and an array of neighboring residue indices. This type of data structure saves memory and results in fast execution.

4.2 Operation stack
In the case of Propositions 5 and 6, the rotamer for the reduced residue in the GMEC is determined at the moment of residue reduction. For the operations of Propositions 7, 8 and 9, we need to somehow determine the rotamer for the reduced residue after the minimum energy of GMEC is found. In order to achieve this task, we use a stack to store the interim information of the residue to be reduced. Namely, we save a record of the following:

  1. The reduction operation on the residue to be reduced.
  2. The indices of residues involved in the reduction operation. We need to store two indices for the case of Proposition 7, and three indices for Propositions 8 and 9.
  3. The rotamer on the residue to be reduced in relation to the rotamers on the other involved residue(s). For example, let i be the residue to be reduced with two neighbors j and k. Then, for each possible rotamer combination (j, s) and (k, t), we must store rotamer (i, r) that achieves the minimum in (8).
The current record is inserted in the stack, the residue to be reduced is removed from the reduced residue graph and the energy terms are updated.

PROPOSITION 11. The records stored in the stack describe the GMEC.

PROOF. After calculation of the minimum energy of the GMEC, we have a list of rotamers in the GMEC for all residues that have been reduced in the case of Propositions 5 and 6. We retrieve records from the stack in LIFO order. It is guaranteed that GMEC rotamers on all involved residues in a record are known except for those on the reduced residue. Hence, we can determine the rotamer in the GMEC on the reduced residue.

Since this stack saves operation information, we call it the Operation Stack.

PROPOSITION 12. The operation stack stores at most n records.

PROOF. The conclusion is straightforward because n residues exist at the beginning and the number of residues in the reduced residue graph reduces by one when a record is created.

PROPOSITION 13. The R3 algorithm yields a GMEC after a finite number of major iterations.

PROOF. The claim holds because

  1. Rotamer reduction, residue reduction and residue unification do not change the energy of GMEC (Propositions 2, 3, 5, 6, 7, 8 and 9).
  2. The operation stack recovers the GMEC (Proposition 11).
  3. R3 terminates in finitely many steps (Proposition 10).

4.3 Preprocessing
Our implementation includes a preprocessing module that reads backbone atom coordinates, loads a rotamer library and generates energy terms cir and pirjs. This preprocessor incorporates a version of the backbone-dependent library (Dunbrack and Karplus, 1993). For each type of residue (i.e. one of the 20 amino acid residue types) and each set of backbone dihedral angles ({varphi},{psi}), we first sort the rotamers in order of decreasing probability. We then use only the minimal subset of consecutive rotamers from the top of this list with cumulative probability of at least 90%.

We generate self- and pairwise-energy terms similarly to Canuteseu et al. (2003). Each self-energy term cir consists of a library energy term

Formula 8
and a van der Waals interaction term. To compute the library term for each rotamer, we normalize the probability of this rotamer to the highest probability among the set of rotamers for a residue type with the same backbone dihedral angles. After preliminary computations, we decided to use a constant K = 6 in the implementation. The library energy term is assumed to count for the interaction of side-chain atoms with backbone atoms on this residue and immediate neighboring residues. In the van der Waals interaction term, we compute the pairwise interaction between each atom in the side chain and each atom in the backbone according to

Formula 8
Here, Rij is the van der Waals radius and r is the Euclidean distance for each atom pair. The pairwise interaction energy pirjs is the sum of the van der Waals interaction term between side-chain atom pairs of residues i and j. Here, Cß atoms are treated as backbone atoms. We do not include the effect of disulfide bonds in the current implementation to avoid introducing more complex energy terms.

Generating these energy terms takes a very large fraction of the running time. To reduce this computational effort, we have implemented cut-off criteria for computing the van der Waals energy. We compute the maximal distance from a Cß atom to a side-chain atom in all possible rotamers for the given residue and backbone dihedral angles. This distance can be regarded as the radius of the side chain and is denoted as scR(Residue). We compute the side-chain radii for all residues and store them. While computing the self energy for rotamers in residue i, we restrict our attention to all possible residues j whose C{alpha} is within a distance of scR(Residue(i)) + 5 (Å) from the Cß of residue i, where Residue(i) is the residue type of residue i. All other backbone atoms will be considered too distant to interact. When computing the pairwise interaction between residues i and j, we compute the distance between their Cß atoms. If this distance is larger than scR(Residue(i)) + scR(Residue(j)), we consider these two residues too distant to interact. Application of these two criteria reduces the computational time significantly. Our implementation is coded in C++ and uses routines from the BALL library (Kohlbacher and Lenhof, 2000) in our preprocessing module to read PDB files, to generate rotamers, to compute energy terms and to write predicted structures in PDB format.


    5 RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROBLEM AND BACKGROUND
 3 METHODS
 4 ALGORITHM AND IMPLEMENTATION
 5 RESULTS AND DISCUSSION
 6 CONCLUSIONS
 REFERENCES
 
R3 has been tested on two sets of problems from the literature. The first test set of 25 problems come from Xiang and Honig (2001). For proteins with several chains, we only chose residues in Chain A. This test set is relatively easy. The second set of five problems comes from Canutescu et al. (2003) and is considered hard. All computations were done on a workstation with a 3 GHz CPU and 1 GB RAM. GAMS/CPLEX 9.0 (Brooke et al., 1998) was used to solve the MILP. SCWRL 3.0 was installed on the same machine and run with flag -u to disable disulfide bond evaluation.

Table 1 presents comparative computational results between R3, MILP and SCWRL 3.0 for both sets of test problems. For each test protein, the first column of this table presents the PDB code, while N and R denote, respectively, the number of residues other than ALA and GLY, and the total number of rotamers on residues other than ALA and GLY. Tp in this table denotes the preprocessing time for R3 (in seconds). This time also equals the preprocessing time of MILP since MILP uses the same preprocessing techniques. The table also presents the solution time (Ts, in seconds) of R3 and MILP, and the total time (Tt, in seconds) for R3, MILP and SCWRL 3.0. The total time for R3 and MILP is the sum of the corresponding preprocessing and solution times. To compare R3 and MILP, we use the solution time and the total time as criteria. However, we have to use the total running time when comparing R3 and SCWRL 3.0 because SCWRL 3.0 uses its own preprocessing routine.


View this table:
[in this window]
[in a new window]
 
Table 1 CPU requirements and prediction accuracy for R3, MILP and SCWRL 3.0 on two sets of test problems

 
For the first set of test problems (upper part of Table 1), the R3 solution time is about 100 times less than the MILP solution time. The total R3 time is about half of the MILP total running time because this set of problems requires more preprocessing rather than solution time. The total CPU requirements of R3 are comparable with SCWRL for this set of test problems. The computational results for the second test set are listed in the lower part of Table 1. R3 is still faster than MILP by two orders of magnitude as far as solution time is concerned and is 2–4 times faster in terms of total time. The performance of SCWRL 3.0 is significantly worse than that for the first test set. In particular, SCWRL 3.0 uses 2–78 times more CPU time than R3 for this test set. The reason behind this behavior may be the presence of large biconnected clusters. However, such large biconnected clusters do not seem to affect the performance of R3.

Finally, it is important to test whether the rotamer library and potential function we use are biologically sensible. Over-simplified rotamer libraries and potential functions could yield unrealistic structures despite computational efficiency. For this reason, we report in Table 1 the accuracy of our predictions in terms of side-chain dihedral angles ({chi}1, {chi}2) and compare them with SCWRL 3.0. A predicted angle is considered correct if it is within 20° of the corresponding angle in the crystal structure. In order for {chi}1+2 to be correct, both {chi}1 and {chi}2 must be correct. Columns 7 and 8 in Table 1 show the percentage of correctly predicted {chi}1 and {chi}1+2 by R3, while columns 12 and 13 present the corresponding predictions by SCWRL 3.0. For the first test set, R3 gives values of ({chi}1, {chi}2) that average at 78 and 53%, while SCWRL 3.0 averages 80 and 58%. Since R3 and SCWRL use essentially the same type of preprocessing module, these small differences in the accuracy of the resulting predictions can be attributed to implementation differences that improve with fine tuning the software over time. Kingsford et al. (2005) used a different method to generate energy terms and solved the resultant problems as MILPs. Their predicted ({chi}1, {chi}2) for this set of problems averaged 80 and 51%, respectively. For the second test set, R3 gives ({chi}1, {chi}2) that average 72 and 46%, while SCWRL averages 76 and 51%. These results indicate that the prediction accuracy of R3 is comparable with that of Kingsford et al., 2005 and only slightly lower than SCWRL 3.0, suggesting that the preprocessing module we have developed in R3 is biologically sensible.


    6 CONCLUSIONS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROBLEM AND BACKGROUND
 3 METHODS
 4 ALGORITHM AND IMPLEMENTATION
 5 RESULTS AND DISCUSSION
 6 CONCLUSIONS
 REFERENCES
 
In this work, we have proposed a reduction-rotamer-reduction-based algorithm R3 that integrates both residue reduction and rotamer reduction, and efficiently solves the side-chain problem to global optimality. Our implementation of this algorithm is significantly faster than the existing MILP and SCWRL 3.0 approaches to the side-chain problem with comparable prediction accuracy.

Our computational experience indicates that residue reduction simplifies the topology of the reduced residue graph and often solves quickly some instances of the side-chain problem that were previously solved by branch-and-bound in exponential time. Residue reduction also expedites rotamer reduction. On the other hand, rotamer reduction restricts the interaction energy range between residue pairs and creates opportunities for residue reduction.

The computational experiments also suggest that our preprocessing algorithm takes considerably more time and yields slightly less accurate dihedral angle predictions than SCWRL. On the other hand, for the hardest problems that we solved, our overall computational time requirements are almost two orders of magnitude lower than those of SCWRL. This suggests that a combination of the techniques used in the SCWRL preprocessor with the solution algorithm of R3 would result in an approach that is much superior than both R3 and SCWRL. Future work should therefore concentrate on improving the R3 preprocessing algorithms. The most promising approach seems to be experimentation with different energy generation schemes. The BALL library facilitates this task, as it provides a programmer-friendly interface to molecular modeling. In particular, it is of significant interest to investigate how much the accuracy of R3 is affected when disulfide bonds are taken into full consideration.


    Acknowledgments
 
The authors are thankful to two anonymous referees for useful comments. This work was supported in part by the Joint NSF/NIGMS Initiative to Support Research in the Area of Mathematical Biology under NIH award GM072023.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Anna Tramontano

Received on August 11, 2005; revised on October 28, 2005; accepted on November 3, 2005

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROBLEM AND BACKGROUND
 3 METHODS
 4 ALGORITHM AND IMPLEMENTATION
 5 RESULTS AND DISCUSSION
 6 CONCLUSIONS
 REFERENCES
 

    Althaus, E., et al. (2002) A combinatorial approach to predict protein docking with flexible side chains. J. Comput. Biol, . 9, 597–612[CrossRef][ISI][Medline].

    Bower, M., et al. (1997) Prediction of protein side-chain rotamers from a backbone-dependent library: a new homologous modeling tool. J. Mol. Biol, . 267, 1268–1282[CrossRef][ISI][Medline].

    Brooke, A., Kendrick, D., Meeraus, A. GAMS—A User's Guide, . (1998) , Washington, DC GAMS Development Corporation.

    Canutescu, A.A., et al. (2003) A graph-theory algorithm for rapid protein side-chain prediction. Protein Eng, . 12, 2001–2014.

    Chazelle, B., et al. (2004) A semidefinite programming approach to side-chain positioning with new rounding strategies. INFORMS J. Comput, . 16, 308–392.

    Dahiyat, B.I. and Mayo, S.L. (1994) Protein design automation. Protein Sci, . 5, 895–903.

    Dahiyat, B.I. and Mayo, S.L. (1997) De novo protein design: fully automated sequence selection. Science, 278, 82–87[Abstract/Free Full Text].

    Desmet, J., et al. (1992) The dead-end elimization theorem and its use in protein side-chain positioning. Nature, 346, 539–542.

    Desmet, J., De Maeyer, M., Lasters, I. (1994) The ‘dead-end elimination’ theorem: a new approach to the side-chain packing problem. In Merz, K.M. and Le grand, S.M. (Eds.). The Protein Folding Problem and Tertiary Structure Prediction, , Boston, MA Brikhauser, pp. 307–337.

    Dunbrack, R. and Karplus, M. (1993) Backbone-dependent rotamer library for proteins—application to side-chain prediction. J. Mol. Biol, . 230, 543–574[CrossRef][ISI][Medline].

    Eriksson, O., et al. (2001) Side chain-positioning as an integer programming problem. LNCS, 2149, 128–141.

    Goldstein, R.F. (1994) Efficient rotamer elimination applied to protein side-chains and related spin-glasses. Biophys. J, . 66, 1335–1340[Abstract/Free Full Text].

    Gordon, D.B. and Mayo, S.L. (1998) Radical performance enhancements for combinatorial optimization algorithms based on the dead-end elimination theorem. J. Comp. Chem, . 19, 1505–1514[CrossRef].

    Gordon, D.B. and Mayo, S.L. (1999) Branch-and-terminate: a combinatorial optimization algorithm for protein design. Structure, 7, 1089–1098[Medline].

    Gordon, D.B., et al. (2003) Exact rotamer optimization for protein design. J. Comp. Chem, . 23, 232–243.

    Hellinga, H.W. and Richards, R.M. (1994) Optimal sequence selection in proteins of known structure by simulated evolution. Proc. Natl Acad. Sci. USA, 91, 5803–5807[Abstract/Free Full Text].

    Kingsford, C., et al. (2005) Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics, 21, 1028–1039[Abstract/Free Full Text].

    Kohlbacher, O. and Lenhof, H. (2000) BALL—rapid software prototyping in computational molecular biology. Bioinformatics, 16, 815–824[Abstract/Free Full Text].

    Lasters, J. and Desmet, J. (1993) The fuzzy-end elimination theorem—correctly implementing the side-chain placement algorithm based on the dead-end elimination theorem. Protein Eng, . 6, 717–722[Abstract/Free Full Text].

    Leaver-Fay, A., Kuhlman, B., Snoeyink, J. (2005) An adaptive dynamic programming algorithm for the side chain placement problem. In Altman, R.B., Dunker, A.K., Hunter, L., Jung, T.A., Klein, T.E. (Eds.). Proceedings of the Pacific Symposium on Biocomputing 2005, , Singapore World Scientific, pp. 16–27.

    Lee, C. and Levin, M. (1991) Accurate prediction of the stability and activity effects of site-directed mutagenesis on a protein core. Nature, 352, 448–451[CrossRef][Medline].

    Looger, L.L. and Hellinga, H.W. (2001) Generalized dead-end elimination algorithms make large-scale protein side-chain structure prodiction tractable: implications for protein design and structure genomics. J. Mol. Biol, . 307, 429–445[CrossRef][ISI][Medline].

    Pierce, N.A., et al. (2000) Conformational splitting: a more powerful criterion for dead-end elimination. J. Comp. Chem, . 21, 999–1009[CrossRef].

    Pierce, N.A. and Winfree, E. (2002) Protein Design in Formula 8. Protein Eng, . 15, 779–782[Abstract/Free Full Text].

    Ponder, J.W. and Richards, F.M. (1987) Tertiary templetes for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J. Mol. Biol, . 193, 775–792[CrossRef][ISI][Medline].

    Voigt, C.A., Gordon, D.B., Mayo, S.L. (2000) Trading accuracy for speed: a quantitative comparison of search algorithms in protein sequence desing. J. Mol. Biol, . 299, 789–803[CrossRef][ISI][Medline].

    Xiang, Z. and Honig, B. (2001) Extending the accuracy limits of prediction for side-chain conformations. J. Mol. Biol, . 311, 421–430[CrossRef][ISI][Medline].

    Xu, J. (2005) Rapid protein side-chain packing via tree-decomposition. Lecture Notes in Computer Science, 3500, 423–439[ISI].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Protein Sci.Home page
C. Hartmann, I. Antes, and T. Lengauer
IRECS: A new algorithm for the selection of most probable ensembles of side-chain conformations in protein models
Protein Sci., July 1, 2007; 16(7): 1294 - 1307.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
R. E. Smith, S. C. Lovell, D. F. Burke, R. W. Montalvao, and T. L. Blundell
Andante: reducing side-chain rotamer search space during comparative modeling using environment-specific substitution probabilities
Bioinformatics, May 1, 2007; 23(9): 1099 - 1105.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/2/188    most recent
bti763v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (5)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Xie, W.
Right arrow Articles by Sahinidis, N. V.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Xie, W.
Right arrow Articles by Sahinidis, N. V.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?