Bioinformatics Advance Access originally published online on May 23, 2006
Bioinformatics 2006 22(15):1809-1814; doi:10.1093/bioinformatics/btl198

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Identifying sequence regions undergoing conformational change via predicted continuum secondary structure

Mikael Bodén 1,* and Timothy L. Bailey 2

1 School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, Australia
2 Institute for Molecular Bioscience, The University of Queensland, QLD 4072, Australia

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accurate than categorical predictors for structurally ambivalent sequence regions, suggesting that such models are suited to characterize protein flexibility.

Results: We develop a computational method for identifying regions that are prone to conformational change directly from the amino acid sequence. The method uses the entropy of the probabilistic output of an 8-class continuum secondary structure predictor. Results for 171 unique amino acid sequences with well-characterized variable structure (identified in the ‘Macromolecular movements database’) indicate that the method is highly sensitive at identifying flexible protein regions, but false positives remain a problem. The method can be used to explore conformational flexibility of proteins (including hypothetical or synthetic ones) whose structure is yet to be determined experimentally.

Availability: The predictor, sequence data and supplementary studies are available at http://pprowler.itee.uq.edu.au/sspred/ and are free for academic use.

Contact: mikael{at}itee.uq.edu.au


    1 INTRODUCTION
Protein sequences adopt different conformations depending on environmental factors like temperature, pH and interactions. Some regions are more susceptible to changes than others and in many cases such flexibility is essential to the function of the peptide chain, e.g. enzyme catalysis, molecular recognition and binding (Krebs et al., 2003; Schlessinger and Rost, 2005; Berjanskii and Wishart, 2005). Knowledge of flexibility is essential to structure-based drug discovery (Carlson, 2002).

The present work aims to computationally identify regions that are prone to change conformation using only the amino acid sequence of the protein as input. A prediction method that does not require access to structural data has obvious advantages for protein engineering applications where one wants to explore and quickly characterize hypothetical peptide chains. Candidate sequences can be assessed in silico to verify the existence of conformational flexibility or, conversely, conformational stability, before designing and performing expensive and labour-intensive experiments.

Some recent data-driven studies have explored the use of so-called B-values produced by X-ray crystallography to elucidate flexibility (Yuan et al., 2005; Schlessinger and Rost, 2005). Several studies have taken the absence of structural coordinates in crystallographic protein data to indicate natively disordered sequence regions and constructed predictive models of such (Jones and Ward, 2003; Linding et al., 2003). Another approach is based on large-scale statistical analysis of variability of bond angles amongst short amino acid sequences (Kuznetsov and Rackovsky, 2003). Berjanskii and Wishart (2005) use chemical shift data to predict flexibility. Machine learning has previously been used to predict chemical shift from nuclear magnetic resonance (NMR) models (Meiler, 2003; Pons and Delsuc, 1999). See Krebs et al. (2002) and Alexandrov et al. (2005) for data-driven analyses of pairs of crystallographic structures based on motion vectors to characterize structural change in three dimensions. In relation to protein engineering, we note that the problem of identifying changes in conformational stability caused by specific point mutations has been addressed using machine learning techniques, providing a valuable but complementary insight into protein stability (Capriotti et al., 2004; Cheng et al., 2006).

It is well known that two regular folding patterns dominate peptide chains, the helix and the sheet (also called strand). Coil is an additional label used to denote any other folding pattern. Kabsch and Sander (1983) proposed the Dictionary of Secondary Structures of Proteins (DSSP) based on eight classes instead of three: 3₁₀-helix (G), alpha-helix (H), pi-helix (I), helix-turn (T), extended beta sheet (E), beta bridge (B), bend (S) and other/loop (C). Both descriptions (3- and 8-class) refer to a protein's secondary structure (as opposed to the full three-dimensional structure).

Akin to the pioneering work of Chou and Fasman (1974), we use conformational variability observed at the level of secondary structure as an indicator of conformational flexibility (indiscriminative of transitional and poorly ordered states). Proteins that assume more than one type of secondary structure are, in one sense, flexible in the regions of varying secondary structure. The computational method we propose attempts to identify such regions of variable secondary structure. Proteins may contain other flexible regions not captured by our definition of flexibility (e.g. regions that can move without changing their secondary structure). However, methods that successfully predict regions of variable secondary structure are identifying one important type of protein conformational flexibility.

Secondary structure has long been a prime target for computational prediction (Rost, 2001). In particular, machine learning approaches have been successful at identifying the secondary structure state from the amino acid sequence. However, most such predictors are developed under the assumption that protein structures are rigid, i.e. that each residue is associated with one secondary structure class alone. Indeed, most structures in databases are derived using X-ray crystallography, which provides limited insight into conformational variation [but cf. Schlessinger and Rost (2005) and Yuan et al. (2005)]. We will refer to secondary structure predictors that predict a single secondary structure class for each residue as categorical predictors.

Recently, a scheme called DSSPCONT based on the concept of continuum secondary structure was proposed to more accurately describe a conformation and, indirectly, to illustrate the flexibility of protein structure (Andersen et al., 2002; Carter et al., 2003). In contrast to the categorical DSSP, the DSSPCONT scheme capitalizes on the variation amongst individual NMR models and associates each residue with a probability distribution over all secondary structure classes. The scheme was calibrated using high-quality NMR models of protein structures. The variability among several NMR models produced for a single protein may reflect the inherent flexibility of the protein (Andersen et al., 2002; Berjanskii and Wishart, 2005). (On a cautionary note, some of the variability among NMR models may be an artefact of under-determined structural parameters.)

Previously, we used the data from DSSPCONT to develop and evaluate probabilistic machine learning models designed to predict the continuum secondary structure from amino acid sequence data alone (Bodén et al., 2006). The best model, a cascaded probabilistic neural network, is able to predict the continuum secondary structure quite accurately without requiring access to the tertiary or quaternary structure. In the current work, we use these continuum secondary structure predictors to predict conformational variability. Similar to how Compiani et al. (1998) identified folding initiation sites from sequence-to-secondary structure mappings, we compute the entropy of the secondary structure prediction for each residue. Entropy is an information theoretic measure of the uncertainty of a prediction. It is, therefore, natural to interpret the uncertainty in the continuum secondary structure prediction for a residue as an indication of its potential variability.

Some categorical secondary structure predictors deliver reliability indices with predictions that indicate the confidence the predictor has in its output (Rost and Sander, 1993). The Ambivalent Structure Predictor, ASP (Young et al., 1999), used the reliability indices of the PHD secondary structure predictor (Rost and Sander, 1993) to identify regions prone to conformational rearrangement in a selection of proteins. Their approach was similar to the one proposed here in that it used the uncertainty of a secondary structure prediction to predict conformational variability. Our approach differs from PHD/ASP in two important ways. First, we use a secondary structure predictor specifically trained to predict the probabilities of each structure class. Second, we use a theoretically well-understood measure of uncertainty (entropy) applied to the output of the structure predictor to predict variability.


    2 METHOD
2.1 Data
For this study, we created an annotated dataset of 171 protein sequences that exhibit conformational flexibility according to the Comprehensive Database of Macromolecular Movements (Echols et al., 2003) (MOLMOVDB) and that show variation at the secondary structure level according to the PDB (Protein Data Bank). (The MOLMOVDB consists of structures that are experimentally determined to exhibit conformational flexibility enabling a variety of protein motions.) We annotated each sequence in this dataset with a list of residue positions that have more than one secondary structure. This set is used exclusively as a test set for the previously developed continuum secondary structure predictor.

To construct this conformational variability test set, we retrieved all MOLMOVDB polypeptide sequences for which the PDB identifier was known (almost 1000 sequences, June 2005). Using PDBFINDER2 (Hooft et al., 1996), a compilation of all PDB entries annotated with the DSSP 8-class secondary structure description, we then determined that 292 of the sequences have at least two secondary structures that differ in at least one residue. We filtered out all sequences with fewer than 150 residues to avoid including spurious end effects (similar to Young et al., 1999). Finally, we removed sequences so that no pair of sequences had >20% similarity (within the test set, and between the test set and the training data used for training the continuum secondary structure predictor described in the next section). Our test set consists of the 171 sequences that remain after this filtering process.
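The length and redundancy filters described above can be illustrated with a short sketch. It is not the procedure actually used to build the test set: the greedy order, the function names and the `toy_identity` measure are assumptions made purely for illustration, and a real pipeline would use an alignment-based similarity.

```python
def toy_identity(a, b):
    """Hypothetical stand-in for a similarity measure: fraction of identical
    positions over the shorter sequence, without any alignment."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def filter_sequences(candidates, similarity, reference=(),
                     min_length=150, max_similarity=0.20):
    """Greedily keep sequences that are long enough and not too similar to
    any reference (e.g. training) sequence or any sequence already kept."""
    kept = []
    for seq in candidates:
        if len(seq) < min_length:
            continue  # too short: discarded to avoid spurious end effects
        others = list(reference) + kept
        if all(similarity(seq, other) <= max_similarity for other in others):
            kept.append(seq)
    return kept


# Toy usage with a reduced length cut-off so the example stays readable.
print(filter_sequences(["AAAA", "AAAT", "CCCC", "GG"], toy_identity, min_length=4))
```

A greedy pass depends on the order in which candidates are visited, so a different ordering may retain a different (but equally non-redundant) subset.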

For each of the sequences in our test set, we have at least two and, on average, about three 8-class secondary structures where at least one residue is assigned different classes in two or more of the structures. For each sequence, we use these 8-class secondary structure assignments to group each of its residues into two classes: non-variable or variable. A residue counts as variable if its 8-class secondary structure classification differs in at least one of the known conformations of the sequence. For most residues in our dataset, there is only a single class, but for residues in regions with conformational variation there is usually a range of possible states. Across the 171 sequences, 5830 residues (9.5%) are considered variable and 55782 (90.5%) are non-variable.
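The labelling rule can be sketched as follows, assuming the DSSP 8-class strings of the different conformations are already aligned position by position. The function name and the toy strings are illustrative only; the symbols "*" (variable) and "-" (non-variable) follow the convention of Fig. 1.

```python
def label_variable_residues(structures):
    """Return a string with '*' for variable and '-' for non-variable positions.

    `structures` is a list of equal-length strings of DSSP 8-class codes,
    one string per observed conformation of the same sequence.
    """
    if len(structures) < 2:
        raise ValueError("need at least two conformations")
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("all structure strings must have the same length")
    labels = []
    for states in zip(*structures):
        # A position is variable if more than one class is observed there.
        labels.append("*" if len(set(states)) > 1 else "-")
    return "".join(labels)


# Toy example with three made-up conformations of an 8-residue fragment.
conformations = ["CEEEHHHC", "CEEEHHHC", "CEETHHGC"]
print(label_variable_residues(conformations))  # prints ---*--*-
```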

As an example, Ras (4Q21, 6Q21) is a well-studied ‘molecular switch’ (Milburn et al., 1990) associated with cell growth/death and differentiation. It cycles between two main conformational states: an active GTP (guanosine triphosphate) and an inactive GDP (guanosine diphosphate) complexed form, but several intermediate states have also been proposed (Hall et al., 2002). The MOLMOVDB has several entries for Ras and the RasA59G mutant (the mutant was specifically derived for studying intermediate states). Searching the PDB for entries with identical amino acid sequences resulted in four structures for Ras (6Q21-A, 6Q21-B, 6Q21-C and 6Q21-D). The RasA59G mutant resulted in two structures (1LF5 and 1LF0). There are two main regions of conformational variation of which both are clearly seen at the level of secondary structure (Fig. 1).

2.2 Predicting conformational variability
Our approach is to first predict continuum secondary structure from amino acid sequence, and then to apply the entropy function to the predicted class distribution for each residue. Residues with entropy larger than a given threshold, T, are predicted to have variable conformation. Except as otherwise stated, we use the mean entropy of all residues in our conformation variability dataset as the threshold T.

Let the predicted continuum secondary structure for a given residue be

$$Y = (Y_1, Y_2, \ldots, Y_k)$$

where $Y_j$ is the probability that the residue is in the j-th secondary structure class, and k is the number of secondary structure classes: either 3 or 8 in this work. The entropy of a residue is defined as

$$H(Y) = -\sum_{j=1}^{k} Y_j \log_k Y_j$$

A high entropy (maximum is 1) indicates complete disorder: all secondary structure classes are equally probable. A low entropy (minimum is 0) indicates complete order: only one class is applicable. Note that, since Y is a probability vector, it has constraints

$$0 \le Y_j \le 1, \quad j = 1, \ldots, k$$

and

$$\sum_{j=1}^{k} Y_j = 1$$
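A minimal sketch of the entropy-based decision rule follows. It assumes that the entropy is normalized by taking logarithms to base k, which is consistent with the stated maximum of 1, and it uses the mean entropy of the residues passed in as the default threshold (the study uses the mean over the whole dataset). Function names are illustrative.

```python
import math


def normalized_entropy(probs):
    """Entropy of a class distribution with logarithm base k = len(probs),
    so that the result lies between 0 (one certain class) and 1 (uniform)."""
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two classes")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probs must be non-negative and sum to 1")
    # Terms with p == 0 contribute nothing (limit of p * log p as p -> 0).
    return -sum(p * math.log(p) / math.log(k) for p in probs if p > 0)


def predict_variable(distributions, threshold=None):
    """Flag residues whose entropy is strictly greater than the threshold T.

    If no threshold is given, the mean entropy of the supplied residues is used.
    """
    entropies = [normalized_entropy(y) for y in distributions]
    if threshold is None:
        threshold = sum(entropies) / len(entropies)
    flags = [h > threshold for h in entropies]
    return flags, entropies, threshold


# Toy 3-class example: a confident residue and a maximally uncertain one.
flags, entropies, T = predict_variable([[1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3]])
print(flags)
```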

To estimate the continuum secondary structure, Y, of each residue in a protein from its sequence, we use cascaded probabilistic neural network (CPNN) models that we previously described (Bodén et al., 2006). We have one model each for the 3- and 8-class continuum secondary structure prediction problems. Each of these models estimates Y for the central residue in a window of 15 residues. In common with virtually all current secondary structure predictors, the input to the CPNN model is the PSI-BLAST profile (Jones, 1999) generated using the subject protein as a query versus GenBank's non-redundant protein set (involving three iterations to determine the profile, all parameters in accordance with the protocol used by Jones). The profile implicitly incorporates information about sequence variability and the location of indels within a family of proteins, contributing greatly to the prediction of structure of the single sequence (Jones, 1999).
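The windowing of the profile can be sketched as below. The profile is taken to be a list of per-residue rows (e.g. 20 values per residue); zero-padding at the sequence termini is an assumption of this sketch, since the handling of window positions beyond the ends of the sequence is not specified here.

```python
def sliding_windows(profile, width=15):
    """Return one flattened input vector per residue, centred on that residue.

    `profile` is a list of rows (one row of profile values per residue).
    Positions outside the sequence are filled with zeros (an assumption).
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    n_cols = len(profile[0])
    pad_row = [0.0] * n_cols
    padded = [pad_row] * half + list(profile) + [pad_row] * half
    windows = []
    for i in range(len(profile)):
        rows = padded[i:i + width]  # rows i-half .. i+half of the original
        windows.append([value for row in rows for value in row])
    return windows


# Toy profile: 3 residues with 2 values each, window of 3 residues.
toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for vector in sliding_windows(toy_profile, width=3):
    print(vector)
```

With a 15-residue window over a 20-column profile, each input vector would hold 300 values under these assumptions.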

The design of the 3- and 8-class CPNN models we use as our continuum secondary structure predictors was optimized using 174 unique and redundancy-reduced NMR structures for proteins. There is no overlap exceeding 20% sequence identity between the 174 training sequences and our 171 sequence conformational variability test set. In our previous work, we measured the predictive accuracy of our CPNN models. The Kullback–Leibler divergence between the predicted probability distribution and the experimental distribution is 0.47 for the 3-class problem and 0.84 for the 8-class problem (smaller value is better). We also showed previously that the CPNN models can compete as categorical predictors of secondary structure. By forcing them to choose one class only, their accuracy of Q3 = 77.3 and Q8 = 62.8 is on par with the most accurate categorical predictors.
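For readers unfamiliar with the two accuracy measures, the following sketch shows how they could be computed per residue. The direction of the divergence (experimental relative to predicted), the use of the natural logarithm and the small `eps` guard are assumptions of this sketch; the original evaluation may differ in these details.

```python
import math


def kl_divergence(experimental, predicted, eps=1e-12):
    """Kullback-Leibler divergence D(experimental || predicted), natural log.

    Predicted probabilities are floored at `eps` to avoid division by zero.
    """
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(experimental, predicted) if p > 0)


def q_accuracy(observed_classes, predicted_distributions, classes):
    """Percentage of residues whose most probable predicted class matches the
    observed class (Q3 for three classes, Q8 for eight)."""
    correct = 0
    for observed, dist in zip(observed_classes, predicted_distributions):
        best = max(range(len(dist)), key=lambda j: dist[j])
        if classes[best] == observed:
            correct += 1
    return 100.0 * correct / len(observed_classes)


# Toy 3-class example with classes ordered as H, E, C.
predictions = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
print(q_accuracy("HEC", predictions, "HEC"))
print(kl_divergence([1.0, 0.0, 0.0], predictions[0]))
```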

For comparison, we also compute the so-called ASP scores from the prediction outputs of the PHD secondary structure predictor (Rost and Sander, 1993). The ASP score for a given residue is defined as

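[Equation 1: the ASP score expressed in terms of prH, prE and prC; the formula image is not available in this text version]   (1)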
where prH, prE and prC are the graded and normalized support (0–9) attached by PHD to helix, sheet and coil, respectively. ASP uses the average of the scores determined for a window of consecutive residues (Young et al., 1999).
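The window averaging step can be sketched independently of how the per-residue score is computed. The sketch below takes any list of per-residue scores and averages them over a centred window; truncating the window at the sequence ends is an assumption, as is the function name.

```python
def window_average(scores, width):
    """Average per-residue scores over a centred window of `width` residues.

    Near the termini the window is truncated to the residues that exist
    (an assumption of this sketch). A width of 1 returns the scores unchanged.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    averaged = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        averaged.append(sum(scores[lo:hi]) / (hi - lo))
    return averaged


print(window_average([0, 3, 6], 3))  # prints [1.5, 3.0, 4.5]
```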


    3 RESULTS
We tested two conformational variability predictors: one using the 8-class continuum secondary structure model and one using the 3-class model. We presented each sequence in our test set to the continuum secondary structure model and computed and recorded the entropy of the output for each residue. We calculated the mean entropy for all residues in all sequences and used this as our ‘variability threshold’, T. Each variable residue in our dataset with entropy greater than T increases the number of ‘true positives’, tp. Non-variable residues in the dataset with entropy over the threshold add to fp, the count of ‘false positives’. Similar definitions apply for the numbers of true negatives, tn, and ‘false negatives’, fn. Using the 8-class model, the average residue-level sensitivity [defined as tp/(tp + fn)] over the 171 sequences in the dataset is 0.76. The residue-level specificity [defined as tn/(tn + fp)], however, is only 0.45. The precision [defined as tp/(tp + fp)] is 0.13. Using the 3-class model, the specificity improves to 0.54 but the sensitivity drops to 0.62 (precision is 0.05).
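The counting scheme and the three measures defined above can be written out as a short sketch. It pools all residues passed in, whereas the figures reported here are averaged over sequences; the function names and the toy data are illustrative.

```python
def confusion_counts(entropies, is_variable, threshold):
    """Count tp, fp, tn and fn when residues with entropy strictly greater
    than the threshold are predicted to be variable."""
    tp = fp = tn = fn = 0
    for entropy, variable in zip(entropies, is_variable):
        predicted = entropy > threshold
        if predicted and variable:
            tp += 1
        elif predicted and not variable:
            fp += 1
        elif not predicted and variable:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Sensitivity, specificity and precision; NaN when a ratio is undefined."""
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


# Toy data: five residues, two of which are annotated as variable.
counts = confusion_counts([0.9, 0.8, 0.2, 0.7, 0.1],
                          [True, False, False, True, False], threshold=0.5)
print(counts)  # prints (2, 1, 2, 0)
print(rates(*counts))
```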

We performed a receiver operating characteristic (ROC) analysis by monitoring performance over the 171 sequence dataset for all possible values of T (Fig. 2 and Table 1). It is clear that the predictor using the 8-class model is superior—its ROC curve lies above that of the 3-class model for all choices of T. Using the 8-class model, the Matthews correlation coefficient (MCC) (Baldi et al., 2000) was maximized (0.130) using a threshold of ~0.35. The area under the ROC (AROC) for the whole 171-sequence set is 0.64. By comparison, the optimal MCC using the 3-class continuum secondary structure predictor is only 0.07. The AROC value for the 3-class model over the 171 sequences is 0.61.
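As a sketch of this analysis, the ROC curve can be traced by sweeping T over every distinct entropy value, with the area obtained by the trapezoidal rule and the MCC computed from the confusion counts at each threshold. This is an illustration under the decision rule "entropy greater than T", not the code used to produce Fig. 2 or Table 1.

```python
import math


def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs for all thresholds,
    predicting positive when score > threshold."""
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative")
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        tp = sum(1 for s, label in zip(scores, labels) if s > t and label)
        fp = sum(1 for s, label in zip(scores, labels) if s > t and not label)
        points.append((fp / negatives, tp / positives))
    return points  # runs from (0, 0) to (1, 1)


def area_under_roc(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
toy_labels = [True, True, False, True, False]
print(area_under_roc(roc_points(toy_scores, toy_labels)))
```

An area of 0.5 corresponds to a random ranking of residues and 1.0 to perfect separation of variable from non-variable residues.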

To illustrate how the current method compares with PHD/ASP in spite of possible overlap with PHD training data, we randomly selected six sequences in the test set (1FBT, 1CBU, 1RX2, 1A5B-B, 1FQB and 1E24) and presented these to the PHD online predictor. The ASP score was then computed on the basis of the outputs. To remove any bias in the threshold selection, we determined the AROC for each prescribed window size (Young et al., 1999). The AROC values for the six sequences are 0.66, 0.66 and 0.59 for window sizes 1, 5 and 21, respectively. The corresponding AROC using the entropy of the 8-class continuum secondary structure predictions on the selected six sequences is 0.69.

PHD is an early secondary structure model and may provide ASP with sub-optimal predictions (the accuracy of prediction usually improves as a trivial effect of more and better training data). Another explanation of the improvement shown by our method is that it is based on 8-class secondary structure predictions, whereas ASP is based only on helix, sheet and coil (i.e. 3-class, cf. Fig. 2).

We replaced the entropy function with the ASP score function in conjunction with our 3-class CPNN model (with the appropriate Yi values being substituted for the corresponding prx in Equation 1). The ROC curve for predicting conformational variability on our dataset using the ASP function was essentially identical to that using the entropy function.

We also explored whether training the CPNN models on continuum structure data rather than categorical data actually improves their ability to predict conformational variability. Neural networks with identical architectures to that of the continuum secondary structure predictor were therefore trained on categorical data. For the categorical predictor, the target output of the most likely class was set to 1 and the rest to 0. The categorical training data were derived from the same 174 training sequences used for the continuum models. The categorical predictor was shown to exhibit an overall prediction accuracy in predicting secondary structure comparable with that of the predictor trained on probabilities (Kullback–Leibler divergence was 0.87 and 0.84, respectively, for the 8-class problem), but with worse accuracy in regions with high structural ambivalence (1.10 and 0.98, respectively, for residues with an entropy >0.5) (Bodén et al., 2006). Surprisingly, using the category-trained predictors in place of the continuum-trained predictors yields very similar prediction rates for conformational variability. The entropy on top of the category-trained 8-class predictor has a best MCC of 0.128 and an AROC of 0.63, essentially equal to the values obtained with the entropy on top of the continuum secondary structure predictor (Table 1).
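The conversion of a continuum target into a categorical one can be sketched as follows. Ties are broken in favour of the first class in the list, which is an assumption of this sketch; the function name is illustrative.

```python
def to_categorical_target(distribution):
    """Set the most likely class to 1 and all other classes to 0.

    Ties are resolved in favour of the first maximal entry (an assumption).
    """
    best = max(range(len(distribution)), key=lambda j: distribution[j])
    return [1.0 if j == best else 0.0 for j in range(len(distribution))]


# A structurally ambivalent residue loses its graded information.
print(to_categorical_target([0.2, 0.5, 0.3]))  # prints [0.0, 1.0, 0.0]
```

The example shows what the category-trained networks never see during training: the 0.5/0.3 split between the two competing classes.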

3.1 Ras: a detailed look at conformational switching
Schlessinger and Rost used Ras (Human) as a case study for evaluating their protein B-value predictor PROFBVAL (Schlessinger and Rost, 2005). Looking at switch II (which is responsible for binding GTP), they report that PROFBVAL singled out one of the critical residues and three neighbours involved in ‘switch II’ of Ras (residue 60 in the sequence we analysed, Fig. 1).

Using our method, we note that switch II is recognized well—all the residues that change secondary structure (61–74) are identified as positives. The sequence for Ras (6Q21) resulted in the entropy shown in Figure 3 (top). We note that the entropy of the predicted continuum secondary structure is not specific to regions that are known to be flexible. However, each known flexible region results in an entropy ‘hump’. Notably, the end points of such regions sometimes constitute ‘near misses’ close to transitions between high and low entropy.

For comparison we include the PHD/ASP scores in Figure 3. It is clear that PHD/ASP misses some of the regions in 6Q21 known to change conformation—even when various window sizes are tried. When using the entropy of the predicted continuum secondary structure the AROC is 0.76 for 6Q21. The corresponding values for the ASP-based predictions are inferior at 0.71, 0.68 and 0.66 for window sizes 1, 5 and 21, respectively.

3.2 β-propeller folds: a detailed look at conformational stability
The previous tests are biased in that we try to identify known ‘positives’. All tested sequences involve regions that are known to exhibit a degree of flexibility, i.e. change conformation. Similar to Schlessinger and Rost, we therefore turn to a known ‘negative’: a sequence that is known to contain rigid regions (Fülöp and Jones, 1999). β-propeller folds occur in several structures. The fold has a very symmetrical structure, utilizing four-stranded, anti-parallel beta sheets, packing face-to-face, to form a very rigid tunnel. In Figure 4 the entropy generated from predicting the 8-class continuum secondary structure of 1TBG is shown, with the stable beta sheets marked. 1TBG has seven blades, each consisting of four strands. Not all of the residues involved in the tunnel are predicted below the entropy threshold (again the mean entropy is used) but one can clearly discern that these residues are much less likely to change conformation compared with their neighbours.


    4 CONCLUSIONS
The present study demonstrates a natural application of the continuum secondary structure. In contrast to previous methods where the reliability of prediction is exploited, the prediction of continuum secondary structure incorporates the probability of conformation.

The method presented herein represents a step towards automatic recognition of regions with conformational flexibility from sequence data alone. The method is based on the entropy of the predicted class probabilities. With a high sensitivity, few regions are missed by the method. However, as the low specificity illustrates, several additional analyses will be required to disqualify a large number of false positives, e.g. by utilizing predicted B-factors (Schlessinger and Rost, 2005; Yuan et al., 2005), sequence disorder (Jones and Ward, 2003; Linding et al., 2003) and amino acid properties like charge and hydrophobicity (high charge and low hydrophobicity correlate with natively unfolded structures, Uversky et al., 2000). As judged from representative examples, the method compares favorably with the Ambivalent Structure Predictor (Young et al., 1999), which similarly uses a score based on graded outputs of a secondary structure predictor.

We note that the accuracy for identifying structural ambivalence is higher if based on the 8-class rather than the 3-class secondary structure prediction. As suggested by a reviewer, with eight classes, more subtle structural changes can be expressed and captured. In contrast, ambiguities in the assignment of 3-class secondary structure indicate only significant structural changes. Surprisingly, we also observe that there is no significant advantage of using a neural network model trained using the continuum secondary structure rather than one trained using categorical class data, even though the former method results in better class prediction for structurally ambivalent residues. Hence, using our method, a more accurate secondary structure prediction seen specifically for regions with structural ambivalence does not necessarily translate into a better discrimination between structurally flexible and stable residues.


Fig. 1 The amino acids for positions 30–89 in 6Q21 are presented using their one-letter codes (top row). The amino acid sequence is presented to the continuum secondary structure predictor. The DSSP secondary structure states for the four recorded conformations are listed, aligned with their position (middle four rows). Letter codes are the standard DSSP codes specified previously, except C, which is replaced here by white space. The target classification of each position as non-variable, -, or variable, *, is also supplied (bottom row).

 


Fig. 2 The ROC curve for the set of 171 sequences for the method based on an 8-class and on a 3-class continuum secondary structure prediction. The area under ROC is 0.64 and 0.61, respectively.

 


Table 1 Prediction performance on the 171-sequence test set when using entropy on top of specified secondary structure predictor

 


Fig. 3 Graph 1: the entropy of the predicted continuum secondary structure of Ras (6Q21) is plotted as a solid line. The mean entropy predicted by our CPNN model is plotted as a dotted line (0.49). High entropy corresponds to high flexibility. The AROC (varying the threshold) is 0.76. Graph 2–4: the scores produced by PHD/ASP for each residue in 6Q21, for a window of five residues and for a window of 21 residues, respectively. The recommended cut-off scores are plotted as dotted lines (Young et al., 1999). Low score means high structural ambivalence (or low reliability of secondary structure prediction). The AROC for all possible threshold values is 0.71, 0.68 and 0.66, respectively. In all graphs, the regions that are subject to conformational change are marked with a shaded background.

 


Fig. 4 The entropy of the predicted continuum secondary structure of 1TBG, a seven-bladed beta-propeller structure, is plotted as a solid line. The mean entropy is plotted as a dotted line (0.49). Low entropy corresponds to high rigidity. The beta sheet regions that are known to be rigid are marked with a shaded background. Note that the 28 regions correspond to the seven blades each with four strands.

 

    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on April 11, 2006; revised on May 10, 2006; accepted on May 18, 2006

    REFERENCES

    Alexandrov, V., et al. (2005) Normal modes for predicting protein motions: a comprehensive database assessment and associated web tool. Protein Sci, . 14, 633–643[CrossRef][Web of Science][Medline].

    Andersen, C.A.F., et al. (2002) Continuum secondary structure captures protein flexibility. Structure, 10, 175–184[Medline].

    Baldi, P., et al. (2000) Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16, 412–424[Abstract/Free Full Text].

    Berjanskii, M. and Wishart, D. (2005) A simple method to predict protein flexibility using secondary chemical shifts. J. Am. Chem. Soc, . 127, 14970–14971[CrossRef][Web of Science][Medline].

    Bodén, M., et al. (2006) Prediction of protein continuum secondary structure with probabilistic models. BMC Bioinformatics, 7, 68[CrossRef][Medline].

    Capriotti, E., et al. (2004) A neural-network-based method for predicting protein stability changes upon single point mutations. Bioinformatics, 20, i63–68[Abstract].

    Carlson, H.A. (2002) Protein flexibility is an important component of structure-based drug discovery. Curr. Pharm. Des, . 8, 1571–1578[CrossRef][Web of Science][Medline].

    Carter, P., et al. (2003) DSSPcont: continuous secondary structure assignments for proteins. Nucleic Acids Res, . 31, 3293–3295[Abstract/Free Full Text].

    Cheng, J., et al. (2006) Prediction of protein stability changes for single-site mutations using support vector machines. Proteins, 62, 1125–1132[CrossRef][Web of Science][Medline].

    Chou, P.Y. and Fasman, G.D. (1974) Prediction of protein conformation. Biochemistry, 13, 222–245[CrossRef][Medline].

    Compiani, M., et al. (1998) An entropy criterion to detect minimally frustrated intermediates in native proteins. Proc. Natl Acad. Sci. USA, 95, 9290–9294[Abstract/Free Full Text].

    Echols, N., et al. (2003) MolMovDB: analysis and visualization of conformational change and structural flexibility. Nucleic. Acids Res, . 31, 478–482[Abstract/Free Full Text].

    Fülöp, V. and Jones, D.T. (1999) ß propellers: structural rigidity and functional diversity. Curr. Opin. Struct. Biol, . 9, 715–721[CrossRef][Web of Science][Medline].

    Hall, B.E., et al. (2002) The structural basis for the transition from Ras-GTP to Ras-GDP. Proc. Natl Acad. Sci. USA, 99, 12138–12142[Abstract/Free Full Text].

    Hooft, R.W.W., et al. (1996) The PDBFINDER database: a summary of PDB, DSSP and HSSP information with added value. Comput. appl. biosci, . 12, 525–529[Abstract/Free Full Text].

    Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol, . 292, 195–202[CrossRef][Web of Science][Medline].

    Jones, D.T. and Ward, J.J. (2003) Prediction of disordered regions in proteins from position specific score matrices. Proteins, 53, 573–578.

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Krebs, W.G., et al. (2002) Normal mode analysis of macromolecular motions in a database framework: developing mode concentration as a useful classifying statistic. Proteins, 48, 682–695.

    Krebs, W.G., et al. (2003) Tools and databases to analyze protein flexibility: approaches to mapping implied features onto sequences. Methods Enzymol., 374, 544.

    Kuznetsov, I.B. and Rackovsky, S. (2003) On the properties and sequence context of structurally ambivalent fragments in proteins. Protein Sci., 12, 2420–2433.

    Linding, R., et al. (2003) Protein disorder prediction: implications for structural proteomics. Structure, 11, 1453–1459.

    Meiler, J. (2003) Proshift: protein chemical shift prediction using artificial neural networks. J. Biomol. NMR, 26, 25–37.

    Milburn, M.V., et al. (1990) Molecular switch for signal transduction: structural differences between active and inactive forms of protooncogenic ras proteins. Science, 247, 939–945.

    Pons, J. and Delsuc, M. (1999) Rescue: an artificial neural network tool for the NMR spectral assignment of proteins. J. Biomol. NMR, 15, 15–26.

    Rost, B. (2001) Protein secondary structure prediction continues to rise. J. Struct. Biol., 134, 204–218.

    Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584–599.

    Schlessinger, A. and Rost, B. (2005) Protein flexibility and rigidity predicted from sequence. Proteins, 61, 115–126.

    Uversky, V.N., et al. (2000) Why are ‘natively unfolded’ proteins unstructured under physiologic conditions? Proteins, 41, 415–427.

    Young, M., et al. (1999) Predicting conformational switches in proteins. Protein Sci., 8, 1752–1764.

    Yuan, Z., et al. (2005) Prediction of protein B-factor profiles. Proteins, 58, 905–912.




This article has been cited by other articles:


A. E. Kister and J. C. Phillips
A stringent test for hydrophobicity scales: Two proteins with 88% sequence identity but different structure and function
PNAS, July 8, 2008; 105(27): 9233–9237.

