Skip Navigation

Bioinformatics 2007 23(13):i539-i548; doi:10.1093/bioinformatics/btm199
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Tapia, L.
Right arrow Articles by Amato, N. M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Tapia, L.
Right arrow Articles by Amato, N. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Kinetics analysis methods for approximate folding landscapes

Lydia Tapia , Xinyu Tang , Shawna Thomas and Nancy M. Amato *

Parasol Lab, Department of Computer Science, Texas A&M University, College Station, TX 77843, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Protein motions play an essential role in many biochemical processes. Lab studies often quantify these motions in terms of their kinetics such as the speed at which a protein folds or the population of certain interesting states like the native state. Kinetic metrics give quantifiable measurements of the folding process that can be compared across a group of proteins such as a wild-type protein and its mutants.

Results: We present two new techniques, map-based master equation solution and map-based Monte Carlo simulation, to study protein kinetics through folding rates and population kinetics from approximate folding landscapes, models called maps. From these two new techniques, interesting metrics that describe the folding process, such as reaction coordinates, can also be studied. In this article we focus on two metrics, formation of helices and structure formation around tryptophan residues. These two metrics are often studied in the lab through circular dichroism (CD) spectra analysis and tryptophan fluorescence experiments, respectively. The approximated landscape models we use here are the maps of protein conformations and their associated transitions that we have presented and validated previously.

In contrast to other methods such as the traditional master equation and Monte Carlo simulation, our techniques are both fast and can easily be computed for full-length detailed protein models. We validate our map-based kinetics techniques by comparing folding rates to known experimental results. We also look in depth at the population kinetics, helix formation and structure near tryptophan residues for a variety of proteins.

Availability: We invite the community to help us enrich our publicly available database of motions and kinetics analysis by submitting to our server: http://parasol.tamu.edu/foldingserver/

Contact: amato{at}cs.tamu.edu


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
As proteins fold to their native, functional state, they undergo critical conformational changes that effect their functionality. Some conformational changes are detrimental. For example, diseases such as Mad Cow disease and Alzheimer's disease are caused by misfolded proteins (Chiti and Dobson, 2006). Insight into the kinetics and detailed mechanics of the folding process will help explain critical information about the protein such as its function and why it misfolds.

In lab experiments, kinetic measurements are used frequently to quantify the folding process. Numerous lab experimental methods such as circular dichroism (CD spectra), fluorescence studies, hydrogen–deuterium exchange and pulse labeling give time-scale based views of the folding process (Roder et al., 2006). These measurements can be used to compare the kinetics of a group of proteins. For example, often the effects of mutations can be studied in detail through the comparison of kinetic metrics.

Simulating protein folding kinetics has been a difficult task performed on small structures through computationally expensive methods such as molecular dynamics or Monte Carlo simulations. Studies on larger proteins have recently been accomplished, but these simulations have only been done on limited proteins represented with coarse models.

In our previous work (Amato et al., 2003; Thomas et al., 2007), we studied protein folding through the application of a method that builds an approximate map of a protein's potential energy landscape. This map contains thousands of feasible folding pathways to the known native state enabling the study of global landscape properties. We obtained promising results for several proteins (Thomas et al., 2007). The pathways were validated by comparing secondary structure formation order with known experimental results. Our technique is not meant as a replacement to other simulation methods that may provide more detailed information about the folding process. Rather, it is a complementary technique to existing simulation methods.

This work introduces new methodologies for studying the kinetics of protein folding: map-based master equation (MME) and map-based Monte Carlo (MMC). These new techniques, MME and MMC, provide quantitative kinetic measurements such as relative folding rates and population kinetics that we could not obtain before from our maps. In contrast to other methods such as the traditional master equation and Monte Carlo simulation, our techniques are both fast and can easily be computed for full length and detailed protein models. We also show that these two new techniques facilitate the study of interesting metrics that describe the folding process, reaction coordinates. In this article we focus on two metrics: formation of helices and structure formation around tryptophan residues. These two metrics are often studied in the lab through CD spectra analysis and tryptophan fluorescence studies, respectively. We validate our techniques by comparing folding rates to known experimental results. We also look in depth at the population kinetics and the reaction coordinates for a variety of proteins in order to correlate our simulation results with the trends seen in experiment.

1.1 Related work
There are many different methods for studying protein folding kinetics. In this section we briefly introduce some of the methods, give insight into their strengths and weaknesses, and discuss the kinetics that each method provides.

1.1.1 Molecular dynamics
Molecular dynamics (MD) simulates the dynamics of the folding process using Newton's classical equations of motion. The forces applied are usually approximations computed using the first derivative of an empirical potential function. MD studies are highly realistic and help give insight into how proteins fold in nature. They also facilitate study of the underlying folding mechanism, provide folding pathways and identify intermediate folding states. While they give physically realistic simulations, these simulations come at a large computational cost. For example, it has taken months of supercomputer time to simulate a microsecond of a very small (36 residues) protein folding using molecular dynamics (Duan and Kollman, 1998)! Researchers are identifying ways to counteract the cost of MD simulations. For example, the Folding@Home distributed computing project computes MD simulations with a cluster of over 30 000 computers worldwide (Shirts and Oande, 2000).

1.1.2 Monte Carlo simulation
Monte Carlo simulation finds a single folding trajectory (Covell, 1992; Kolinski and Skolnick, 1994a). However, each run is computationally expensive because at each point in the conformation space search, complex kinetics and thermodynamics are simulated. Multiple runs are often done because the search is stochastic. Like molecular dynamics, Monte Carlo simulations provide highly realistic insight into the folding process.

1.1.3 Master equation kinetics
Folding kinetics have also been studied through a computation across the folding landscape. One way this has been done is through the use of lattice models that have enumerated the folding landscape, and then the master equation is computed for this landscape (Cieplak et al., 1998; Ozkan et al., 2002, 2003). One advantage of these approaches is that the transition state emerges from the dominate modes of the master equation solution. However, these models are very simplistic and do not represent real structures or sequences. Recent applications of the master equation have been able to study proteins with full structures (Weikl et al., 2004). However, the enumeration of the folding landscape is limited to the formation of contact clusters, which are groupings of nearby contacts as derived from the native state contact map.

1.1.4 Statistical mechanical methods
Statistical mechanical methods have also been successful in studying protein folding kinetics. These methods have provided estimates of the transition state ensemble, folding rates and {Phi}-values (Alm and Baker, 1999; Muñoz and Eaton, 1999). Only recently has this method been applied to larger protein structures of up to 349 residues (Das et al., 2005). However, these models use a very simplified energy function that depends only on the topology of the protein's native state and hence are not as accurate as the distance from the native state increases (as the protein unfolds).

1.1.5 SRS and Pfold
Stochastic roadmap simulation (SRS) samples motions and studies kinetics by modeling the folding energy landscape as a network of conformations where the connection between two conformations in the network reflects the transition probability between them. In early SRS work (Apaydin et al., 2001), the protein structure was modeled as a sequence of rigid secondary structure pieces, and the packing order of these elements was studied.

In recent work (Chiang et al., 2006), SRS was shown to identify the transition state ensemble, and it was used to compute folding rates and {Phi}-values. In order to identify the transition state ensemble, the conformation is modeled as a binary vector where each bit represents a sequence of five residues. The bit is set to 0 if the subsequence is non-native or 1 if it is native-like. All possible conformations and transitions (i.e. a single bit change) were enumerated in the model. To compute Pfold, the probability of folding, they perform random walks from every conformation until it reaches either the folded state or the unfolded state. Pfold for a given conformation is then the percentage of times a random walk from that conformation reaches the folded state before the unfolded state.Transitions are not allowed out of either the folded or the unfolded state.

In this model, Pfold helps identify the transition state ensemble. They use this ensemble to calculate relative folding rates and {Phi}-values. However, their model only contains a single unfolded state. Thus each conformation in their model does not represent the same volume of the energy landscape. In a more realistic model, it is unlikely that there will be a single, unique unstructured (‘unfolded’) state, thus making the Pfold calculation more difficult for use with more structurally accurate models.

1.1.6 Our contribution
The techniques introduced in this article, MME and MMC, provide quantitative kinetic measurements such as relative folding rates and population kinetics. Also, interesting reaction coordinates such as helix formation and structure around tryptophan residues can be monitored during the simulated folding process. In our previous work, we provided methods for building an approximate map of a protein's potential energy landscape (Amato et al., 2003; Thomas et al., 2007) and an RNA's folding landscape (Tang et al., 2007). We have published results from our approximate maps for proteins up to 148 residues easily built on a desktop PC (Thomas et al., 2007). These maps provide a framework for the MME and MMC techniques.

In contrast to other methods such as the traditional master equation and Monte Carlo simulation, MME calculation and MMC simulation are both fast and can easily be computed for full length and detailed protein models. The MME calculation gives insight into the folding rate, equilibrium distribution and transition states. The MMC simulation gives a stochastic view of the folding process and allows the computation of population kinetics.

2 Methods
Our roadmaps give an approximate view of the protein folding landscape. In the past, we have successfully extracted low-energy pathways, validated secondary structure formation order, and seen general and consistent trends in reaction coordinates such as native contacts present and RMSD. However, we had not been able to extract important kinetic measures such as folding rates and population kinetics. In this section, we begin with a summary of roadmap construction. After this, we introduce two new techniques that enable us to extract this information from our roadmaps: MME and MMC. Unlike traditional master equation calculation and Monte Carlo simulation, these techniques run very fast and can be applied to full-length detailed protein models.

The application of the MMC technique to the approximated landscape reduces the roadmap to a set of stochastic conformations and pathways. The benefit of this reduction is the ability to measure reaction coordinates, metrics that describe events during the time evolution of the folding process. In this section, we also present methods for using the MMC pathways to calculate two such reaction coordinates: helix formation and formation of structure around tryptophan residues.

2.1 Roadmaps for protein folding
In previous work (Amato et al., 2003), we introduced an approach to protein folding that is based on the probabilistic roadmap approach for motion planning (Kavraki et al., 1996). We applied our method to a large number of structures and were able to identify subtle differences in the known experimental secondary structure formation order for proteins with very similar structures (Song et al., 2003; Thomas et al., 2007).

Our method is simple and consists of two main steps: (1) sampling conformations in the landscape and (2) making transitions between sampled conformations. In the first step, conformations (nodes) are sampled on the folding landscape, with a bias to increase density near the known native state. In the second step, connections (edges) are made between sampled conformations with similar structure. Weights are assigned to directed edges to reflect the energetic feasibility of transitioning between the two endpoint conformations. This combination of nodes and weighted edges forms a roadmap that approximates the energy landscape. This roadmap encodes thousands of folding pathways. The most energetically feasible pathways in the roadmap can be extracted using these weights.

Connections between two nodes, q1 and q2, are labeled with edge weights that reflect the energetic feasibility of transitioning between them. This is done by first identifying all the intermediate nodes, Formula , that connect q1 to q2. For each pair of consecutive conformations ci and Formula , the probability Pi of transitioning from ci to Formula depends on the difference between their potential energies Formula :


Formula 1

(1)
This keeps the detailed balance between two adjacent states and enables the edge weight to be computed by summing the negative logarithms of the probabilities for all pairs of consecutive conformations in the sequence. With this edge weight definition, we can use simple graph search algorithms to extract the most energetically feasible pathways in the roadmap between two given states (e.g. from the unfolded state to the folded state).

Maps can be constructed incrementally through rounds of conformation generation and connection (Xie et al., 2006). We can stop the growth of the map when the features of the map become consistent (e.g. secondary structure formation order).

2.1.1 Protein model
We model the protein as an articulated linkage. Using a standard modeling assumption for proteins that bond angles and bond lengths are fixed (Sternberg, 1996), the only degrees of freedom in our model are the backbone's phi and psi torsional angles which are modeled as revolute joints with values in the range Formula .

2.1.2 Potential energy calculation.
Our method is flexible and allows any potential function to be used. In this article, we use a coarse potential function similar to Levitt (1983). We use a step function approximation of the van der Waals potential component and model side chains as spheres with zero dof. If any two spheres are too close (i.e. <2.4 Å during sampling and <1.0 Å during connection), a very high potential is returned. Otherwise, the potential is:


Formula 2

(2)
where Kd is 100 kJ/mol and Formula Å as in Levitt (1983). The first term represents constraints favoring known secondary structure through main-chain hydrogen bonds and disulfde bonds, and the second term is the hydrophobic effect. The hydrophobic effect is computed as follows: if two hydrophobic residues are within 6 Å of each other, then the potential is decreased by 20 kJ/mol.

2.2 Map-based master equation (MME)
The master equation calculation gives insight into the folding rate, the equilibrium distribution and transition states. However, it requires a detailed model of the possible conformations and their associated transitions. In the past, this has been done by enumerating landscapes—feasible only for small protein models or segments.

In this work, we develop a strategy for applying the master equation to the approximation of the folding landscape provided by our roadmaps. As we will show, our roadmaps provide a suitable framework to apply the master equation without requiring an enumeration of the conformation space. A major benefit of this is that the MME technique enables us to apply the master equation to much larger proteins than was possible before.

Master equation formalism has been developed for folding kinetics in a number of earlier studies (Kampen, 1992; Weikl et al., 2004). The stochastic process of folding is represented as a set of transitions among all n conformations (states). The time evolution of the population of each state, Pi(t), can be described by the following master equation:


Formula 3

(3)
where Formula denotes the transition rate from state i to state j. Thus, the change in population Pi(t) is the difference between transitions to state i and transitions from state i.

If we use an n-dimensional column vector p(t) = Formula Formula to denote the population of n conformational states, then we can construct an Formula matrix M to represent the transitions, where


Formula 4

(4)
The master equation can be represented in matrix form:


Formula 5

(5)
The solution to the master equation is:


Formula 6

(6)
where N is the matrix of eigenvectors Ni for the matrix M in Equation (4) and {Lambda} is the diagonal matrix of its eigenvalues {lambda}i. Pj(0) is the initial population of conformation j.

From Equation (6), we see that the eigenvalue spectrum is composed of n modes. If sorted by magnitude in ascending order, the eigenvalues include Formula and several small magnitude eigenvalues. Since all the eigenvalues are negative, the population kinetics will stabilize over time. The population distribution Formula will converge to the equilibrium Boltzmann distribution, and no mode other than the mode with the zero eigenvalue will contribute to the equilibrium. Thus the eigenmode with eigenvalue {lambda}0 = 0 corresponds to the stable distribution, and its eigenvector corresponds to the Boltzmann distribution of all conformations in equilibrium.

Similarly, we see that the large magnitude eigenvalues correspond to the fast folding modes, i.e. those modes which fold in a burst. Their contribution to the population will die away quickly. Similarly, the smaller the magnitude of the eigenvalue is, the more influence its corresponding eigenvector has on the global folding process. Thus, the global folding rates are determined by the slow modes.

For some folders (2-state folders), their folding rate is dominated by only one non-zero slowest mode. If we sort the eigen spectrum by ascending magnitude, there will be one other eigenvalue {lambda}1 in addition to eigenvalue {lambda}0 that is significantly smaller in magnitude than all other eigenvalues. This {lambda}1 corresponds to the folding mode that determines the global folding rate. We will refer to it as the master folding mode. Its corresponding eigenvector denotes its contribution to the population of each state. Hence, the large magnitude components of the eigenvector correspond to the states whose populations are most impacted by the master folding mode. These states are the transition states (Ozkan et al., 2002, 2003).

We apply the master equation formalism to our roadmaps by assigning each node in our roadmap to a row (and column) in the matrix M. The transition rates are computed directly from the edge weight: Formula . K0 is the constant coefficient adjusted according to experimental results. We will use MME to compute the relative folding rates for several proteins with known kinetics.

2.3 Map-based Monte Carlo (MMC)
Population kinetics provides information about the time evolution of different conformational populations. In our earlier work, we simply extracted the most energetically feasible paths in the roadmap to study the folding process. However, this does not mirror the stochastic folding process and cannot be used to determine the type of kinetic information that we are interested in here. In this article, we show how we can adapt Monte Carlo simulation and apply it directly to our roadmaps. Because the roadmap approximates the energy landscape, we can use the pathways computed by the MMC simulation to compute population kinetics.

Applying Monte Carlo simulation to our approximated landscape allows for the study of large protein structures with only a small computational cost. Previously, the size of the protein's conformational space limited the application of Monte Carlo techniques to small proteins [e.g. all-atom 56 residue protein (Shimada and Shakhnovich, 2002)]. However, our roadmap provides a pre-computed framework for this walk and greatly simplifies the computation required by Monte Carlo analysis.

In order to apply the Monte Carlo technique to our roadmap, we must ensure that the likelihood of transitioning from one neighbor to another is probabilistically biased by their Boltzmann transition probabilities. During roadmap construction, we compute edge weights that reflect the energetic feasibility to transition from one neighbor to another. We turn these edge weights into transition probabilities to perform the Monte Carlo simulation. One way to do this is to cluster the edge weights into disjoint buckets that reflect a grouping of edge weight qualities. After all edge weights are assigned a bucket, edge weights within a bucket are assigned a probability Qij reflecting their quality within the bucket. In doing so, the probability of each edge weight is assigned in a biased Gaussian fashion that favors clear discrimination of low edge weights, yet still can differentiate between edges of all weights. Then the probability to transition between two states, Pij can be calculated as:


Formula 7

(7)
where n is the number of outgoing edges from node i. This ensures the sum of all probabilities (including the self-transition probability) out of node i is one. Note that the transition probability is dependent on the number of outgoing edges from a node. Since during roadmap construction we only attempt connections between the k closest neighbors according to some distance metric, the out-degree for all nodes is roughly similar. Thus, this transition probability calculation is fair to all nodes in the roadmap and maintains the detailed balance.

2.3.1 Helix formation
The protein folding process can be monitored in the lab through the formation of local portions of the 3D structure of the protein. These local segments, commonly helices and strands, are the secondary structure of the protein. In the lab, the average formation of secondary structure can be measured through the technique of far-UV CD spectroscopy. At far-UV wavelengths (190–250 nm) the chromophore is the peptide bond and the resulting signal from CD spectroscopy appears when the peptide bond is located in a regular folded environment. It is common to monitor the formation of a specific type of secondary structure during the folding process by performing CD spectroscopy at a certain wavelength. One of the most common measurements is done at the wavelength of 220 nm where the formation of helices can be monitored.

There are many ways to measure helix formation in silico. In statistical mechanical simulations, the protein backbone is modeled by a sequence of dihedral angles, one angle between each pair of residues (Das et al., 2005). Helix formation has been measured from these simulations by summing the individual angle change between conformations. Unlike the single angle per residue model, our model consists of two angles that can be independently similar or dissimilar. Given this independence and a more complex protein model, we explored alternative ways of defining the formation of helices. Also unlike the statistical mechanical model, our pathways and conformations are extracted stochastically through the MMC technique.

In the results presented in this article, we used a measurement of helix formation that calculates the native contact formation in helices, H(t), as a function of time step, t, from the MMC simulation:


Formula 8

(8)
The contribution of a single contact, Formula , is equal to 1 if the residue pair (i, j) forms a native contact in the conformation at time step t. In order to compare results across proteins, the values of H(t) are normalized by the number of contacts at helices measured at the protein's native state, H(native). Thus, 1 represents the full formation of the helix structures in a conformation and 0 represents no helix structure formed.

2.3.2 Tryptophan structure formation
The protein folding process can also be studied in the lab by monitoring the fluorescence of certain amino acids. The fluorescence yield of these amino acids is determined by their local environment given the conformation of the protein. While all aromatic amino acids are known to fluoresce under certain conditions, the tryptophan residue is often favored for experiments because of its high fluorescence yield.

Even though tryptophan rarely occurs in proteins, it is common to mutate a protein to make fluorescence studies possible. Tryptophan can be introduced into the structure where fluorescence yield is optimized through site-directed mutagenesis. For example, they are often placed in the core of the protein and away from polar amino acids that detract from their yield.

In order to monitor the local environment of the tryptophan residues, we explore the effect of native contacts. As tryptophans are involved in native contacts, their local environment becomes more similar to the environment in the native state. At that native structure, we expect their fluorescence to be maximized. A similar approach was used in Das et al. (2005). However, unlike their method, our pathways and conformations are extracted stochastically through MMC.

In the results presented in this article, we use a measurement of tryptophan structure formation that calculates the native contact formation tryptophan residues, Trp(t), as a function of time step, t, from the MMC simulation:


Formula 9

(9)
The contribution of a single contact, Formula , is equal to 1 if the residue pair (i, j) forms a native contact in the conformation at time step t and either i or j is a tryptophan. This is a simple measure and could be modified for more complex local environments impacting fluorescence yield. In order to compare results across proteins, the values of Trp(t) are normalized by the number of contacts in the native state involving tryptophans, Trp(native). Thus, a value of 1 represents the full formation of the structure involving the tryptophan residues, and a value of 0 represents no tryptophan structure formed.


    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
In this section, we present results demonstrating how we can extract kinetics information from our roadmaps. We show that our MME can accurately compute the relative folding rates of protein G and two of its variants. Then we use our MMC simulation to investigate the folding population kinetics of the native state for several small proteins studied in our previous work (Thomas et al., 2007). When available, the helix formation and tryptophan contact formation calculated during the folding process of these proteins is also shown. It would be computationally prohibitive to apply the traditional Monte Carlo simulation or master equation calculation to these proteins and detailed protein model, hence we cannot compare to them.

3.1 Relative folding rates
One interesting protein to study is protein G [Fig. 1(c–inset)]. Protein G is a small two-state folder composed of a central {alpha}-helix and two ß-hairpins. Nauli et al. (2001) created two mutants of protein G to alter its folding behavior to switch the hairpin formation order while maintaining the same secondary and tertiary structure, NuG1 [Fig. 1(d–inset)] and NuG2 [Fig. 1(e–inset)]. They also show that these two mutants fold 100 times faster than protein G.


Figure 1
View larger version (32K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. (a) Eigenvalue comparison between protein G and mutants NuG1 and NuG2 computed by MME. NuG1 and NuG2 are experimentally known to fold 100 times faster than protein G (Nauli et al., 2001). (b) Running time of MME for protein G and mutants NuG1 and NuG2 as a function of roadmap size. MME scales linearly with the landscape model/map size. (c–e) Population kinetics from MMC for protein G and mutants NuG1 and NuG2. As with MME, the MMC results also indicate that the mutants fold faster than wild-type. Ribbon diagrams show mutated hairpin in wireframe.

 
We used our new MME to compute the relative folding rates of these two proteins on roadmaps that reached stable secondary structure formation order. In the results shown here, the potential values were normalized to fall between 0 and 1 for the fastest computation of the master equation solution. Fig. 1a shows the magnitudes of the five smallest eigenvalues. Recall that the smallest non-zero eigenvalues represent the rate-limiting barrier in the folding process. Therefore, they have the largest impact on the global folding rate. As seen in the magnitude of the second eigenvalue in Fig. 1a, protein G folds much slower than the two mutants, NuG1 and NuG2. Also, NuG1 and NuG2 fold at very similar rates. This matches what has been seen in lab experiments. While in previous work (Thomas et al., 2007) we were able to accurately identify the hairpin formation order of protein G and mutants NuG1 and NuG2, we were unable to study the change in folding rate.

We also studied the folding rate differences using population kinetics by MMC. Fig. 1c–e shows the population kinetics for the unfolded states and folded states for protein G, NuG1 and NuG2. As seen in Fig. 1d and e, the population of the native state of NuG1 and NuG2 rise very quickly. For example, the population of the native state is just under 60% by timestep 100. However, at the same timestep, the native state of protein G is only 20% populated (Fig. 1c). This contrast in the population of the native state between protein G and mutants NuG1 and NuG2 correlates with the faster folding rate of the mutants compared to the wild-type.

Fig. 1b shows the performance of MME for roadmaps ranging in size from 2000 to 15 000 nodes. The running time of MME scales linearly with roadmap size (i.e. the size of the landscape model). Thus, MME has an advantage over the traditional master equation solution. While the traditional master equation solution is usually applied to a fully enumerated landscape, MME is only computationally limited by the size of the approximated landscape model. Here we have shown that this approximated model can be a subset of the entire conformation space. This enables us to study larger proteins with more detailed models than can be handled by traditional techniques.

3.2 Structural folding kinetics
Here we study the folding process by computing the population kinetics of the native state with our new MMC simulation for several different proteins. Recall that a single roadmap encodes thousands of folding pathways. Previously, we extracted folding pathways by finding the most energetically feasible pathways in the roadmap. While this provided useful information about high level folding events such as the temporal ordering of secondary structure which we could validate against experiment, we could not use the deterministically extracted pathways to infer kinetic information. By instead extracting pathways stochastically using MMC, we can now compute population kinetics for different states. For example, we can compare the population kinetics of the unfolded state and the folded state.

We computed the population kinetics of several two-state folders studied in our previous work (Thomas et al., 2007) (see Table 1). In that work, we were able to produce roadmaps whose secondary structure formation order matched native state out-exchange experiments and pulse labeling experiments when available (Li and Woodward, 1999). We use the same roadmaps here, but are able to supplement our previous results by using MMC to compute the population kinetics of the folded state and the unfolded state. Table 1 also displays the MMC analysis time. In all cases, the analysis took <1 h on a 2.4 GHz desktop PC with 512 MB RAM.


View this table:
[in this window]
[in a new window]

 
Table 1. Proteins studied and MMC analysis time. (*tail, residues 1-8, of structure removed)

 
Figure 2 displays the results for several proteins studied. MMC was run for 500 iterations and 50 000 time steps. Our experience shows that this provided population kinetics with small variance. These proteins are similar in size (ranging from 53 to 86 residues) and varying secondary structure makeup. We study all {alpha} proteins, all ß proteins, and mixed {alpha} and ß proteins.


Figure 2
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Population kinetics from MMC simulations for proteins in Table 1 of varying structure: (a, b) {alpha}, (c) ß, (d, e) mixed.

 
Notice that the population kinetics of the native state for the all {alpha} proteins (Fig. 2a and b) shows a gradual growth at a constant rate. The all ß proteins (Fig. 2c) and mixed proteins (Fig. 2d and e), however, display a steep climb in their population kinetics and then plateau. We believe this is due to nucleation effects (e.g. that each native contact does not have the same probability of forming) present in structures containing ß-sheets. For example, a contact near the turn of a ß-hairpin (i.e. with lower effective contact order) has a greater probability to form early while more non-local native contacts such as those at the end of the hairpin have a lower probability to form early. Their formation probability increases as the protein folds/nucleates. This is commonly referred to as a ‘zipping’ process (Fiebig and Dill, 1993). Conversely, most contacts in an {alpha}-helix are local (i.e. have a low effective contact order) thus their formation probabilities are all similar and constant throughout the folding process.

In order to contrast the population kinetics of the folded state, we also studied the population kinetics of the unfolded ensemble (Fig. 2). For this study, we defined the unfolded ensemble as those states with few native contacts (relative to the number of contacts in the native state). There is a clear relationship between the kinetics of the unfolded state to that of the folded state. For example, in protein A (Fig. 2a), the population of the native state increases slowly as the population of the unfolded state ensemble decreases slowly. On the other hand, folding processes that reach folded equilibrium quickly also see a quick decrease in the population of the unfolded state ensemble.

A nice feature of the MMC technique is that it allows us to study stochastic events during the protein folding process. For the proteins studied above through population kinetics, we also examined the structural metrics of helix formation and formation of structure around tryptophan residues (see Fig. 3). From the combined information in these three plots, we can deduce characteristics of the folding process. In rest of this section, we compare the individual kinetic results produced by MMC to previous lab and simulation studies for each protein.


Figure 3
View larger version (14K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Reaction coordinates calculated from MMC simulations for proteins in Table 1 of varying structure: (a–c) {alpha} and (d–i) mixed. Tryptophan contact formation is not displayed for protein A because it does not contain any tryptophan residues. Note that mEGF (all ß) is not displayed because it lacks {alpha}-helices and does not contain any tryptophan residues in the folding core.

 
3.2.1 Protein A
The B domain of protein A, containing three {alpha}-helices, has been the focus of many experimental studies. It does not contain a tryptophan naturally, but has been mutated so that tryptophan fluorescence can be studied (Dimitriadis et al., 2005). It has also been studied by a lattice-based Monte Carlo technique (Kolinski and Skolnick, 1994b). However, this lattice model only used a coarse representation of the backbone carbon-{alpha}s to model the structure. In lab and simulation studies, protein A has demonstrated formation of helix structure followed by the packing of the helices in the final folded structure (Li and Woodward, 1999). Our population kinetics (Fig. 2a) and helix formation (Fig. 3a) plots show similar trends. While the folding process begins early on (as indicated by continual growth in helix formation beginning at time step 1), it takes at least 100 time steps for any conformation to reach the native state. This suggests that helices are formed before any conformation reaches a shape close to the native state, as seen in experiment.

3.2.2 ACBP
A similar process is observed in the other all {alpha} protein, acyl-coenzyme A binding protein (ACBP). This protein has five helices and two tryptophans in the core of the protein. The folding of ACBP has been studied in the lab through tryptophan fluorescence, and it has been shown that it is a fast, two-state folder (Kragelund et al., 1996). From our MMC kinetics, we see that ACBP exhibits similar properties as the other all {alpha} protein, protein A: continual formation of helix contacts (Fig. 3(b)) and reaching the native state after the formation of many helix contacts (Fig. 2b). However, since ACBP has two tryptophans in the core of the protein, we see a quick increase in the formation of these contacts (Fig. 3c) around the same time we see the native state beginning to be populated, around time step 100. This could correspond to the packing of the structure and the formation of long-range interactions in the core of the protein.

3.2.3 mEGF
Since the protein murine epidermal growth factor (mEGF) has no helical structure, we do not plot its helix formation. While it does have two tryptophans, they are on the tail of the protein and do not make substantial contacts with the rest of the protein.

3.2.4 Protein G
The B1 domain of protein G has been the focus of many lab studies from CD spectra analysis and tryptophan fluorescence (Nauli et al., 2001) to hydrogen exchange and pulse labeling experiments (Li and Woodward, 1999). Much of the focus on the folding process of protein G has been on the folding order of its two sets of strands. However, it is known that the helix forms before the final stages of the folding process (Li and Woodward, 1999). It is never the last secondary structure element to form. In our MMC results, we see a similar ordering. Figure 3d shows that the helix forms quickly and is 80% formed by time step 100. By this time step, <20% of the protein has reached a native like conformation (Fig. 1c). The tryptophan contact formation (Figure 3g) continues through the folding process with continual packing around the protein core (where the tryptophan is located).

3.2.5 RdCp and RdDv
Cp Rubredoxin (RdCp) and Dv Rubredoxin (RdDv) are two Rubredoxins from mesophilic organisms. While their population kinetics are similar (Fig. 2d and e), some small details can be elucidated from the reaction coordinates studied. For RdDv, that has been studied by high-temperature MD simulations (Lazaridis et al., 1997), we see two jumps in the population kinetics (~50% then 90% native-like). This could be due to the early packing of protein around the hydrophobic core, as seen in the continually increasing tryptophan structure formation (Fig. 3i). The single tryptophan is in the core of the protein. After the core is formed, the helix finishes making a final set of contacts (Fig. 3f). This corresponds with the second jump in the population kinetics to 90% native-like (Fig. 2e). The behavior of opening the helix loop and then unfolding the core was also seen in MD simulation (Lazaridis et al., 1997). RdCp was shown through tryptophan fluorescence and far-UV CD experiments to have a simple two-state kinetic and no known intermediate (Cavagnero et al., 1998). We also see this in our simulations. The helix formation (Fig. 3e) and tryptophan contact formation (Fig. 3h) show cooperative and continual growth until the native state is fully populated.


    4 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We proposed and explored new analysis tools to study protein folding kinetics: map-based master equation solution (MME) and map-based Monte Carlo simulation (MMC). With these new methods, we can compute folding rates and extract population kinetics of various states. We validated our folding rates against known experimental data. The MME approach was able to produce relative folding rates for three proteins, matching what has been seen in lab experiments. Our population kinetics were also able to identify clear kinetic differences in proteins of different structure. Through the combination of population kinetics and helix and tryptophan structure formation information, we are able to elucidate important characteristics in the folding process. For example, our results on Protein A show that helix structure forms early, before packing of core of the protein. This is also what has been seen in lab experiment. In another case, DvRD, the hydrophobic core is formed before the helices. This behavior was also seen in MD simulations.

An important benefit of these approaches is that it enables us to study the kinetics of much larger proteins than can be handled by traditional master equation methods or Monte Carlo simulation. We believe these new techniques are valuable tools for discovering important features of protein folding kinetics.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We would like to thank Annette Stowasser for her initial work and exploration of measuring tryptophan contact formation and its relation to fluorescence simulation. We would also like to thank Dr. Mauricio Lasagna of the Reinhart Lab at Texas A&M University for sharing his expertise of lab-based experimentation of tryptophan fluorescence.

This research supported in part by NSF Grants EIA-0103742, ACR-0081510, ACR-0113971, CCR-0113974, ACI-0326350, by the DOE, and by HP. L. T. supported in part by a NIH Molecular Biophysics Training Grant (T32GM065088) and previously supported by a Department of Education GAANN Fellowship. S. T. supported in part by a Department of Education GAANN Fellowship and previously supported by a NSF Graduate Research Fellowship and a P.E.O. Scholarship.

Conflict of Interest: none declared.


    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Alm E, Baker D. Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc. Natl Acad. Sci. USA, ( (1999) ) 96, : 11305–11310.[Abstract/Free Full Text].

    Amato NM, et al. Using motion planning to map protein folding landscapes and analyze folding kinetics of known native structures. J. Comput. Biol, ( (2003) ) 10, : 239–256. Special issue of Int. Conf. Comput. Molecular Biology (RECOMB) 2002.[CrossRef][ISI][Medline].

    Apaydin M, et al. Capturing molecular energy landscapes with probabilistic conformational roadmaps. In: Proc. IEEE Int. Conf. Robot. Autom. (ICRA), ( (2001) ) 932–939..

    Cavagnero S, et al. Unfolding mechanism of rubredoxin from pyrococcus furiosus. Biochemistry, ( (1998) ) 37, : 3377–3385.[CrossRef][Medline].

    Chiang T-H, et al. Predicting experimental quantities in protein folding kinetics using stochastic roadmap simulation. In: Proc. Int. Conf. Comput. Molecular Biology (RECOMB), ( (2006) ) 410–424..

    Chiti F, Dobson C. Protein misfolding, functional amyloid, and human disease. Annu. Rev. Biochem, ( (2006) ) 75, : 333–366.[CrossRef][ISI][Medline].

    Cieplak M, et al. Master equation approach to protein folding and kinetic traps. Phys. Rev. Lett, ( (1998) ) 80, : 3654–3657.[CrossRef][ISI].

    Covell D. Folding protein {alpha}-carbon chains into compact forms by Monte Carlo methods. Proteins: Struct. Funct. Genet, ( (1992) ) 14, : 409–420.[CrossRef][ISI][Medline].

    Das P, et al. Characterization of the folding landscape of monomeric lactose repressor: Quantitative comparison of theory and experiment. Proc. Natl. Acad. Sci. USA, ( (2005) ) 102, : 14569–14574.[Abstract/Free Full Text].

    Dimitriadis G, et al. Microsecond folding dynamics of the f13w g29a mutant of the b domain of staphylococcal protein a by laser-induced temperature jump. Proc. Natl. Acad. Sci. USA, ( (2005) ) 101, : 3809–3814.[CrossRef][ISI].

    Duan Y, Kollman P. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science, ( (1998) ) 282, : 740–744.[Abstract/Free Full Text].

    Fiebig KM, Dill KA. Protein core assembly processes. J. Chem. Phys, ( (1993) ) 98, : 3475–3487.[CrossRef].

    Kampen NGV. Stochastic Processes in Physics and Chemistry, ( (1992) ) North-Holland, New York, USA..

    Kavraki LE, et al. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans. Robot. Automat, ( (1996) ) 12, : 566–580.[CrossRef].

    Kolinski A, Skolnick J. Monte Carlo simulations of protein folding. Proteins Struct. Funct. Genet, ( (1994a) ) 18, : 338–352.[CrossRef][ISI][Medline].

    Kolinski A, Skolnick J. Monte Carlo simulations of protein folding II. application to protein a, rop, and crambin. Proteins Struct. Funct. Genet, ( (1994b) ) 18, : 353–366.[CrossRef][ISI][Medline].

    Kragelund BB, et al. Fast and one-step folding of closely and distantly related homologous proteins of a four-helix bundle family. J. Mol. Biol, ( (1996) ) 256, : 187–200.[CrossRef][ISI][Medline].

    Lazaridis T, et al. Dynamics and unfolding pathways of a hyperthermophilic and mesophilic rubredoxin. Protein Sci, ( (1997) ) 6, : 2589–2605.[Abstract].

    Levitt M. Protein folding by restrained energy minimization and molecular dynamics. J. Mol. Biol, ( (1983) ) 170, : 723–764.[ISI][Medline].

    Li R, Woodward C. The hydrogen exchange core and protein folding. Protein Sci, ( (1999) ) 8, : 1571–1591.[Abstract].

    Muñoz V, Eaton WA. A simple model for calculating the kinetics of protein folding from three dimensional structures. Proc. Natl Acad. Sci. USA, ( (1999) ) 96, : 11311–11316.[Abstract/Free Full Text].

    Nauli S, et al. Computer-based redesign of a protein folding pathway. Nat. Struct. Biol, ( (2001) ) 8, : 602–605.[CrossRef][ISI][Medline].

    Ozkan S, et al. Tranisition states and the meaning of {phi}-values in protein folding kinetics. Nat. Struct. Biol, ( (2001) ) 8, : 765–769.[CrossRef][ISI][Medline].

    Ozkan SB, et al. Fast-folding protein kinetics, hidden intermediates, and the sequential stabilization model. Protein Sci, ( (2002) ) 11, : 1958–1970.[Abstract/Free Full Text].

    Ozkan SB, Dill KA, Bahar I. Computing the transition state population in simple protein models. Biopolymers, ( (2003) ) 68, : 35–46.[CrossRef][ISI][Medline].

    Roder H, et al. Early events in protein folding explored by rapid mixing methods. Chem. Rev, ( (2006) ) 106, : 1836–1861.[CrossRef][ISI][Medline].

    Shimada J, Shakhnovich EI. The ensemble folding kinetics of protein g from an all-atom Monte Carlo simulation. Proc. Natl Acad. Sci. USA, ( (2002) ) 99, : 11175–11180.[Abstract/Free Full Text].

    Shirts M, Pande V. Screen savers of the world unite. Science, ( (2000) ) 290, : 1903–1904.[Abstract/Free Full Text].

    Song G, Thomas S, et al. A path planning-based study of protein folding with a case study of hairpin formation in protein G and L. In: Proc. Pacific Symposium of Biocomputing (PSB), ( (2003) ) 240–251..

    Sternberg MJ. Protein Structure Prediction, ( (1996) ) OIRL Press at Oxford University Press..

    Tang X, et al. Tools for simulating and analyzing RNA folding kinetics. In: Proc. Int. Conf. Comput. Molecular Biology (RECOMB), ( (2007) )..

    Thomas S, et al. Simulating protein motions with rigidity analysis. In: J. Comput. Biol, ( (2007) ) in press. Special issue of Int. Conf. Comput. Molecular Biology (RECOMB) 2006..

    Weikl T, et al. Coopertivity in two-state protein folding kinetics. Protein Sci, ( (2004) ) 13, : 822–829.[Abstract/Free Full Text].

    Xie D, et al. Incremental map generation (IMG). In: Proc. Int. Workshop on Algorithmic Foundations of Robotics (WAFR), ( (2006) )..


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Tapia, L.
Right arrow Articles by Amato, N. M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Tapia, L.
Right arrow Articles by Amato, N. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?