Bioinformatics Advance Access originally published online on October 27, 2004
Bioinformatics 2005 21(7):981-987; doi:10.1093/bioinformatics/bti080

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

A new approach to prediction of short-range conformational propensities in proteins

Dominik Gront * and Andrzej Kolinski

Faculty of Chemistry, Warsaw University, Pasteura 1, 02-093 Warsaw, Poland

*To whom correspondence should be addressed.


    Abstract

Motivation: Knowledge-based potentials are valuable tools for protein structure modeling and for evaluating the quality of structure predictions obtained by a variety of methods. Such potentials could be significantly enhanced by properly exploiting the evolutionary information encoded in related protein sequences. The new potentials could be valuable components of threading algorithms, ab-initio protein structure prediction, comparative modeling and structure modeling based on fragmentary experimental data.

Results: A new potential for scoring local protein geometry is designed and evaluated. The approach is based on the similarity of short protein fragments measured by an alignment of their sequence profiles. Sequence specificity of the resulting energy function has been compared with the specificity of simpler potentials using gapless threading and the ability to predict specific geometry of protein fragments. Significant improvement in threading sensitivity and in the ability to generate sequence-specific protein-like conformations has been achieved.

Availability: http://www.biocomp.chem.uw.edu.pl

Contact: dgront{at}chem.uw.edu.pl


    INTRODUCTION
A short-range potential is an energy function that evaluates the probability of a certain local conformation of a protein with a given amino acid sequence. The potentials proposed here depend on the amino acid identities and their sequence context, on the distance between two Cα atoms and, in some cases, on the predicted secondary structure. The idea of using local sequence similarity for scoring protein structures is not new and has been exploited in different applications, whose details vary. For instance, local sequence similarity can be used as a criterion for selecting short structural fragments as building blocks (Simons et al., 1997) in a fold assembly procedure. It can also be used to derive short-range distance restraints (Skolnick et al., 2003) that support subsequent threading refinements or restrained ab-initio folding (Kolinski et al., 2001).

In the present work we provide a systematic derivation of a set of short-range potentials for protein threading, fold evaluation and ab-initio structure assembly algorithms. Results obtained from databases with different levels of sequence similarity, with and without support from a given or predicted secondary structure, are compared, and the ability of the designed potentials to predict the precise geometry of short protein fragments is evaluated. The method is relatively simple and is based on a careful analysis of the sequence–structure relationship using profile-to-profile alignments (Gribskov et al., 1987). The potentials take the form of energy histograms and can easily be implemented in various applications. Such potentials are necessarily protein dependent. Therefore, a detailed prescription for their derivation is provided, and full datasets for a number of example cases are available via our homepage (http://www.biocomp.chem.uw.edu.pl).


    MATERIALS AND METHODS
Input databases
To perform the computations described in this work, a custom database was prepared. It contains protein structures combined with their sequence profiles. Each residue in a protein is described by its Cα coordinates and by a sequence profile column of 20 numbers, giving the probability of occurrence of each amino acid type at the corresponding position of a multiple sequence alignment. The profiles were generated with PSIBLAST (Altschul et al., 1997) (number of iterations: 7; cutoff e-value for including a hit in the profile: 1e-7). Two non-redundant protein structure databases were used: the first contained proteins with sequence similarity below 30%, the second proteins with sequence similarity below 90%. Both sets were extracted from the PISCES database (Wang and Dunbrack, 2003).

The backbone coordinates of some proteins in the databases contain gaps, i.e. some amino acids are missing. Therefore, local structural properties such as the $r_{i,i+k}$ distances between alpha carbons were calculated only for continuous fragments, i.e. when $r_{i,i+1} = 3.8 \pm 0.5$ Å for all residues in a fragment. Structurally, each such unbroken subchain was treated as a separate protein chain. However, these gaps were not taken into account in the PSIBLAST search: entire sequences (possibly with some amino acids missing) were used to generate the sequence profiles. Because PSIBLAST results are not significant for short fragments (20 amino acids and shorter), we decided not to use each separate subchain as PSIBLAST input. Gaps in the query sequence may result in gaps in the multiple sequence alignment. The resulting profile was then cut into fragments matching the structural subchains.

Comparison of profiles
Only short fragments of sequence profiles were compared. Their length was fixed and equal to L. Depending on the distance range of the particular potential, the optimal values of L were found to be 17, 18 and 19 for the $r_{i,i+2}$, $r_{i,i+3}$ and $r_{i,i+4}$ distances, respectively. In the simplest case, the profile comparison score can be written as a sum of scores for aligning the corresponding columns of profiles 1 and 2:

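$$\mathrm{SP} = \sum_{i=1}^{L} \mathrm{score}(C_{1,i}, C_{2,i}) \tag{1a}$$
where $C_{j,i}$ is the column corresponding to the i-th sequence position in the j-th profile. The similarity score $\mathrm{score}(C_{1,i}, C_{2,i})$ for two columns from two profiles is defined as follows: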

$$\mathrm{score}(C_{1,i}, C_{2,i}) = \sum_{k} \sum_{l} C_{1,i,k}\, C_{2,i,l}\, M(k,l) \tag{1b}$$
where k and l are amino acid types ($k, l \in \{\mathrm{ALA}, \mathrm{GLY}, \ldots\}$), $C_{p,i,k}$ is the probability of occurrence of the k-th amino acid type at the i-th position of the p-th profile, and M(k,l) is the similarity score for amino acids k and l. We used the BLOSUM62 similarity matrix.
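To make Eqs. (1a) and (1b) concrete, the sketch below scores two profile fragments column by column. It is a minimal illustration, not the authors' code: profile columns are assumed to be Python dicts mapping residue type to probability, and a hypothetical three-letter alphabet (Ala, Gly, Leu) with the corresponding BLOSUM62 entries stands in for the full 20-type alphabet and matrix.

```python
# Hypothetical 3-residue alphabet standing in for the 20 amino acid types.
ALPHABET = ("A", "G", "L")

# BLOSUM62 entries for Ala, Gly and Leu only (the full 20 x 20 matrix is omitted).
M = {
    ("A", "A"): 4, ("A", "G"): 0, ("A", "L"): -1,
    ("G", "A"): 0, ("G", "G"): 6, ("G", "L"): -4,
    ("L", "A"): -1, ("L", "G"): -4, ("L", "L"): 4,
}


def column_score(c1, c2):
    """Eq. (1b): sum over residue types a, b of C1[a] * C2[b] * M(a, b).

    Each column is a dict mapping residue type -> probability; residue
    types missing from a column are treated as probability 0.
    """
    return sum(
        c1.get(a, 0.0) * c2.get(b, 0.0) * M[(a, b)]
        for a in ALPHABET
        for b in ALPHABET
    )


def profile_score(p1, p2):
    """Eq. (1a): gapless sum of column scores over the L aligned positions."""
    if len(p1) != len(p2):
        raise ValueError("profile fragments must have the same length L")
    return sum(column_score(c1, c2) for c1, c2 in zip(p1, p2))


# Two length-2 profile fragments.
p1 = [{"A": 1.0}, {"A": 0.5, "G": 0.5}]
p2 = [{"A": 1.0}, {"L": 1.0}]
print(profile_score(p1, p2))  # 4.0 + (0.5 * -1 + 0.5 * -4) = 1.5
```

Because each column score is an expectation of M over the two column distributions, a profile column that is a single residue with probability 1 reduces Eq. (1b) to a plain matrix lookup.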

It should be noted that the raw score given by Eqs. (1a) and (1b) depends strongly on the fragment length and its amino acid composition. It was therefore normalized as a z-score (Panchenko, 2003):

$$z_{\mathrm{SP}} = \frac{\mathrm{SP} - \langle \mathrm{SP} \rangle}{\sigma(\mathrm{SP})} \tag{2}$$
The mean score <SP> and its standard deviation σ(SP) have to be estimated over all permutations of the columns in both profiles. Thus, <SP> is the average alignment score for two profiles of a given length and amino acid composition, regardless of the amino acid order (column order in the profiles). Consequently, <SP> was calculated as follows:

$$\langle \mathrm{SP} \rangle = \frac{1}{L} \sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{score}(C_{1,i}, C_{2,j}) \tag{3}$$
σ(SP) was calculated in a similar manner. Because of this non-local scoring, standard alignment tools such as local sequence alignment (Smith and Waterman, 1981) could not be used: the values <SP> and σ(SP) depend not only on the entire aligned fragment but also on the amino acid pair being aligned at a given step of the dynamic programming algorithm. The present approach therefore assumes a constant fragment length L, which also greatly reduces the computational cost.
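The sketch below illustrates the permutation statistics behind Eq. (2) on a tiny example. It is an assumption-laden illustration, not the authors' procedure: it takes a precomputed matrix `S[i][j] = score(C_{1,i}, C_{2,j})`, enumerates every column permutation to get the exact mean and standard deviation, and checks that the mean equals the closed form (1/L)·Σ_i Σ_j S[i][j], since each column pair is aligned in a fraction 1/L of all permutations. Exhaustive enumeration needs L! terms, so for L = 17 the standard deviation would have to be obtained by other means (the source does not say how).

```python
import itertools
import math


def permutation_stats(S):
    """Exact mean and standard deviation of the gapless score over all column permutations.

    S[i][j] is the column-pair score score(C_{1,i}, C_{2,j}) from Eq. (1b).
    Only feasible for very small L (L! permutations).
    """
    L = len(S)
    scores = [
        sum(S[i][p[i]] for i in range(L))
        for p in itertools.permutations(range(L))
    ]
    mean = sum(scores) / len(scores)
    var = sum((x - mean) ** 2 for x in scores) / len(scores)
    return mean, math.sqrt(var)


def closed_form_mean(S):
    """Average over permutations: each pair (i, j) is aligned in 1/L of them."""
    return sum(sum(row) for row in S) / len(S)


def z_sp(S):
    """Eq. (2): z-score of the actual (diagonal) alignment against permuted ones."""
    sp = sum(S[i][i] for i in range(len(S)))
    mean, sd = permutation_stats(S)
    return 0.0 if sd == 0 else (sp - mean) / sd


S = [[3, 0, 0], [0, 3, 0], [0, 0, 3]]
print(permutation_stats(S), closed_form_mean(S), z_sp(S))  # (3.0, 3.0) 3.0 2.0
```

The closed-form mean is cheap for any L; only the standard deviation makes the exhaustive version impractical.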

Comparison of pairs of sequences
The new short-range potentials proposed in this work heavily rely on the sequence profiles. However, in order to evaluate the effect of evolutionary information on the specificity of the designed potentials, the same calculations (for prediction of the local distances in proteins) have been conducted with single protein sequences. The scoring formulas were very similar to those for scoring profiles:

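$$\mathrm{SS} = \sum_{i=1}^{L} \mathrm{score}(s_{1,i}, s_{2,i}) \tag{4a}$$

$$\mathrm{score}(s_{1,i}, s_{2,i}) = M(s_{1,i}, s_{2,i}) \tag{4b}$$
where $s_{j,i}$ denotes the i-th amino acid in the j-th sequence. We did not derive potentials based on (single) sequence similarity.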
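A single sequence can be viewed as a profile whose columns each put probability 1 on one residue. The sketch below, using the same hypothetical three-residue subset of BLOSUM62 as an illustration, checks that the sequence score of Eqs. (4a) and (4b) then coincides with the profile score of Eqs. (1a) and (1b). The helper names are hypothetical.

```python
# BLOSUM62 entries for Ala, Gly and Leu only (illustrative subset).
M = {
    ("A", "A"): 4, ("A", "G"): 0, ("A", "L"): -1,
    ("G", "A"): 0, ("G", "G"): 6, ("G", "L"): -4,
    ("L", "A"): -1, ("L", "G"): -4, ("L", "L"): 4,
}


def sequence_score(s1, s2):
    """Eqs. (4a)-(4b): gapless sum of M(s_{1,i}, s_{2,i})."""
    if len(s1) != len(s2):
        raise ValueError("sequence fragments must have the same length L")
    return sum(M[(a, b)] for a, b in zip(s1, s2))


def one_hot(seq):
    """Turn a sequence into a 'profile' with probability 1 on each observed residue."""
    return [{aa: 1.0} for aa in seq]


def profile_score(p1, p2):
    """Eqs. (1a)-(1b) for dict-based profile columns."""
    return sum(
        c1[a] * c2[b] * M[(a, b)]
        for c1, c2 in zip(p1, p2)
        for a in c1
        for b in c2
    )


print(sequence_score("AGL", "AGG"))                   # 4 + 6 - 4 = 6
print(profile_score(one_hot("AGL"), one_hot("AGG")))  # 6.0
```

This is why the profile-based comparison can be read as a strict generalization of the sequence-based one: the extra sensitivity comes entirely from the non-degenerate column distributions.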

Short-range statistical potentials
The general idea of the design of the short-range potential follows our previous work (Kolinski et al., 1999; Kolinski and Skolnick, 1998). These potentials were extensively tested in various applications of the reduced protein models, from comparative modeling to ab-initio folding (Kolinski et al., 1999; Kolinski and Skolnick, 1998; Boniecki et al., 2003; Kolinski, 2004). For the reader’s convenience it is briefly outlined below.

Potential functions R13, R14 and R15 were derived for three types of short-range distances: between the i-th and (i + 2)-th alpha carbons (called r13), between the i-th and (i + 3)-th (called r14), and between the i-th and (i + 4)-th alpha carbons (called r15). (By the convention used in this work, rij denotes a distance and Rij the potential corresponding to the rij distance.) Denoting by $\mathbf{r}_i$ the coordinate vector of the i-th Cα and by $\mathbf{v}_i$ the unit vector along the virtual Cα–Cα bond from the i-th to the (i + 1)-th Cα, the distances mentioned above are defined as follows:

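$$r_{13} = \left|\mathbf{r}_{i+2} - \mathbf{r}_i\right|, \qquad r^{*}_{14} = \left|\mathbf{r}_{i+3} - \mathbf{r}_i\right| \operatorname{sgn}\!\left[(\mathbf{v}_i \times \mathbf{v}_{i+1}) \cdot \mathbf{v}_{i+2}\right], \qquad r_{15} = \left|\mathbf{r}_{i+4} - \mathbf{r}_i\right| \tag{5}$$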
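The short-range distances can be computed directly from a Cα trace. The sketch below is an illustration under stated assumptions: coordinates are 3-tuples in Å, and the chirality of r14 is taken as the sign of the triple product (v_i × v_{i+1}) · v_{i+2} of consecutive virtual-bond vectors, positive for right-handed turns. The test geometry (radius 2.3 Å, rise 1.5 Å, 100° per residue) is a hypothetical idealized α-helix, not data from the paper.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def short_range_distances(ca, i):
    """Return (r13, r14*, r15) for residue i from a list of C-alpha coordinates."""
    r13 = _norm(_sub(ca[i + 2], ca[i]))
    r14 = _norm(_sub(ca[i + 3], ca[i]))
    r15 = _norm(_sub(ca[i + 4], ca[i]))
    # Virtual-bond vectors v_i, v_{i+1}, v_{i+2}; normalizing them would not change the sign.
    v0 = _sub(ca[i + 1], ca[i])
    v1 = _sub(ca[i + 2], ca[i + 1])
    v2 = _sub(ca[i + 3], ca[i + 2])
    handedness = _dot(_cross(v0, v1), v2)
    r14_star = r14 if handedness >= 0 else -r14  # negative = left-handed
    return r13, r14_star, r15


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Approximate right-handed alpha-helix C-alpha trace (illustrative geometry)."""
    return [
        (radius * math.cos(math.radians(turn_deg * k)),
         radius * math.sin(math.radians(turn_deg * k)),
         rise * k)
        for k in range(n)
    ]


helix = ideal_helix(5)
mirror = [(-x, y, z) for x, y, z in helix]  # mirror image: handedness flips
print(short_range_distances(helix, 0))   # r14* > 0
print(short_range_distances(mirror, 0))  # same magnitudes, r14* < 0
```

Mirroring the chain leaves r13 and r15 unchanged but flips the sign of r14*, which is exactly the information the signed 24-bin r14* histogram is meant to capture.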
Statistics for each potential were generated from the non-redundant structural databases described above. For the r13 statistics, the histograms contained 8 bins from 0 to 8 Å; for r14, 24 bins from –12 to 12 Å; and for r15, 16 bins from 0 to 16 Å. Negative values of the r14 distance denote left-handed conformations of three successive Cα backbone vectors, while positive values denote right-handed ones. The geometry of 1–3 fragments (two consecutive virtual Cα bonds) has no chirality, and the definition of chirality for 1–5 fragments is somewhat ambiguous. Thus, only the chirality of the 1–4 fragments is treated explicitly; it is denoted by the symbol ‘*’ in the abbreviation r14*. The potentials were then calculated from the histograms as follows:

$$E_k = -\ln\frac{n_k}{n_0} \tag{6}$$
where $E_k$ denotes the value of the potential for the k-th bin of the histogram, $n_k$ is the number of observations in the k-th bin and $n_0$ is the expected number of observations in the k-th bin. The expected values are easy to calculate:

$$n_0 = \frac{N_0}{K} \tag{7}$$
where $N_0$ is the total number of observations in the histogram and K is the number of bins (8, 24 or 16, depending on the potential). To make all the potentials complete, their maximum value was set to an arbitrary 2.0, and this value was also assigned to all empty bins of the distance histograms.
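The conversion from distance histograms to energies can be sketched as follows. This is a minimal illustration assuming E_k = -ln(n_k/n_0) with a uniform reference n_0 = N_0/K, and reading the 2.0 "maximum" as a cap that is also applied to empty bins; the bin layouts are the ones given above, while the clipping of out-of-range distances into the edge bins is an added assumption.

```python
import math

# (lower edge, upper edge, number of bins) for each distance type, from the text.
BINNING = {"r13": (0.0, 8.0, 8), "r14*": (-12.0, 12.0, 24), "r15": (0.0, 16.0, 16)}


def bin_index(value, lo, hi, nbins):
    """Map a distance to a 1-A bin; out-of-range values are clipped to the edge bins (assumption)."""
    width = (hi - lo) / nbins
    k = int((value - lo) // width)
    return min(max(k, 0), nbins - 1)


def potential_from_counts(counts, e_max=2.0):
    """E_k = -ln(n_k / n_0) with n_0 = N_0 / K; empty bins and values above e_max get e_max."""
    n0 = sum(counts) / len(counts)
    energies = []
    for n_k in counts:
        if n_k == 0:
            energies.append(e_max)
        else:
            energies.append(min(-math.log(n_k / n0), e_max))
    return energies


print(bin_index(5.43, *BINNING["r13"]))    # 5
print(bin_index(-5.05, *BINNING["r14*"]))  # 6 (left-handed side)
print(potential_from_counts([0, 5, 15, 0]))
```

In the statistical potentials, one such histogram is collected for every amino acid pair and secondary structure class, so each table is small and the cap keeps sparsely populated bins from producing extreme energies.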

All the potentials depend on the identities of two amino acids (see Fig. 1):

  1. The R13 potential for the i-th residue depends on the identity of the i-th and the (i + 2)-th amino acid.
  2. The R14 potential for the i-th residue depends on the identity of the (i + 1)-th and the (i + 2)-th amino acid.
  3. The R15 potential for the i-th residue depends on the identity of the (i + 2)-th and the (i + 4)-th amino acid.



Fig. 1 Definitions of the r13 (at the top), r14 and r15 distances (at the bottom of the picture).

 
The potentials could be also made specific to the local secondary structure:
  1. for R13: when amino acids i, i + 1 and i + 2 are helical, the fragment is assigned as a helix. When all three are in a beta sheet, the fragment is assigned as beta-type. In all other cases it is treated as a coil.
  2. for R14: when amino acids i, i + 1, i + 2 and i + 3 are helical, the fragment is assigned as a helix. When all four are in a beta sheet, the fragment is assigned as beta-type. In all other cases it is treated as a coil.
  3. for R15: when amino acids i, i + 1, i + 2, i + 3 and i + 4 are helical, the fragment is assigned as a helix. When all five are in a beta sheet, the fragment is assigned as beta-type. In all other cases it is treated as a coil.
The secondary structure was assigned by the DSSP program (Kabsch and Sander, 1983) using the reduced three-letter code. A separate set of potentials was derived for each type of secondary structure. Thus, for each distance type there are 1200 (20 × 20 × 3) different cases. Statistics for all cases were collected separately and then transformed into potentials. Consequently, the effects of known or predicted secondary structure can be incorporated into algorithms employing these potentials.
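The residue-pair rules and the fragment secondary structure rules above together select one of the 20 × 20 × 3 tables for each distance type. The sketch below shows one way to compute that table key. It assumes the DSSP assignment has already been reduced to the letters H, E and C (the exact 8-to-3 reduction is not specified here), and the names `PAIR_OFFSETS`, `SS_SPAN` and `table_key` are hypothetical.

```python
import itertools

# Residue offsets (relative to i) whose identities select the table, as listed above.
PAIR_OFFSETS = {"R13": (0, 2), "R14": (1, 2), "R15": (2, 4)}
# Number of consecutive residues, starting at i, that define the fragment class.
SS_SPAN = {"R13": 3, "R14": 4, "R15": 5}


def fragment_class(ss, i, span):
    """'H' if residues i..i+span-1 are all helical, 'E' if all strand, otherwise 'C'."""
    segment = ss[i:i + span]
    if len(segment) < span:
        raise ValueError("fragment extends past the chain end")
    if all(c == "H" for c in segment):
        return "H"
    if all(c == "E" for c in segment):
        return "E"
    return "C"


def table_key(seq, ss, i, potential):
    """(residue a, residue b, fragment class) selecting the histogram for residue i."""
    a, b = PAIR_OFFSETS[potential]
    return seq[i + a], seq[i + b], fragment_class(ss, i, SS_SPAN[potential])


print(table_key("ALGAL", "HHHHH", 0, "R15"))  # ('G', 'L', 'H')
print(table_key("ALGAL", "HHEHH", 0, "R13"))  # ('A', 'G', 'C')

# 20 amino acids x 20 amino acids x 3 classes = 1200 tables per distance type.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
print(len(list(itertools.product(AMINO_ACIDS, AMINO_ACIDS, "HEC"))))  # 1200
```

Keeping the class in the key lets the same lookup serve known (DSSP) or predicted (e.g. PSIPRED) secondary structure, as long as both are mapped to the same three letters.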

Short-range, protein-dependent (sequence similarity-based) potentials
The use of sequence profiles instead of single sequences greatly improves the sensitivity of sequence comparisons. The related assumption that local structural similarity follows local sequence similarity is employed in several secondary structure prediction methods, such as PSIPRED (Jones, 1999) or PHD (Rost and Sander, 1993).

In the present work, short-range potentials that have to be derived separately for each protein sequence are designed and evaluated. The statistics account only for profiles (of known protein structures) that are locally similar to the sequence profile of the query protein (the protein for which the potential is calculated). Each observation is weighted by the local similarity score. The procedure for calculating the R13 potential is detailed below as an example.

Consider the r13 distance between the i-th and the (i + 2)-th residues. A profile fragment of length L from the query protein, centered on the (i + 1)-th residue, is compared with all profile fragments of the same length in the database. In our case L = 17; therefore the i-th, (i + 1)-th and (i + 2)-th residues occupy positions 8, 9 and 10 of the fragment of interest.
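The window bookkeeping for the R13 case can be sketched as below. This is a hypothetical helper, assuming 0-based indexing, so that 1-based positions 8, 9 and 10 of a 17-residue window become 0-based offsets 7, 8 and 9. For the r14 (L = 18) and r15 (L = 19) windows the appropriate offset is not given here and would have to be chosen analogously.

```python
def query_window(profile, i, L=17, offset=7):
    """Length-L window in which residue i sits at 0-based position `offset`.

    With L = 17 and offset = 7, residues i, i+1, i+2 fall on 1-based
    positions 8, 9, 10. Returns None when the window would run past a chain end.
    """
    start = i - offset
    if start < 0 or start + L > len(profile):
        return None
    return profile[start:start + L]


def database_windows(profile, L=17):
    """All length-L windows of a database profile, with their start positions.

    For a window starting at s, the paired observation would be the r13
    distance between database residues s + 7 and s + 9.
    """
    for start in range(len(profile) - L + 1):
        yield start, profile[start:start + L]


# Stand-in "profile": just residue indices, so positions are easy to check.
profile = list(range(30))
w = query_window(profile, 10)
print(w[7], w[8], w[9])                       # 10 11 12
print(query_window(profile, 5))               # None (too close to the N-terminus)
print(len(list(database_windows(profile))))   # 14
```

Each query window is then scored against every database window with the z-normalized profile score, and the corresponding database distances become candidate observations for the query residue's histogram.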

To further improve the potentials, the secondary structure information can be used. A term scoring similarity between the predicted secondary structure for a query protein and the secondary structure of a protein in the structural database is added.

(8)
is the similarity score between the secondary structures of two residues:

(9)
For the central part of the fragment, i.e. i ∈ [6, 10], ε = 0.16; for the remaining positions ε = 0.08.

Separate histograms were generated for each residue of the query protein, except for the nine-residue fragments at the N- and C-termini of the sequence. Only observations of the r13 distance with zSPT (or zSP) greater than a threshold value SMIN were included. The average score (zSPT or zSP) was also calculated for each bin of the histograms. The homology potentials were then calculated in a similar fashion as the simple statistical potentials. The main difference was that the number of hits in a bin of a given histogram was weighted by the average profile similarity score Si for that bin:

(10)

(10a)

For some regions of the query sequence there are no, or very few, locally similar profiles. In such cases the profile-based potential cannot be calculated, or the result would be unreliable. Therefore, the profile-based potentials were determined only where the number of hits in the histogram (denoted N0 in the formulas above) was higher than a threshold value NMIN. For the remaining positions along the sequence, the corresponding entries of the simple statistical potentials were used, i.e. entries derived from the same database (PDB90 or PDB30) and with the same level of secondary structure information.

Potentials for scoring r14* and r15 distances were derived in an analogous fashion. The best values of the cut-off parameters were found by optimization based on the gapless threading test described below. When secondary structure-based scoring was not applied, SMIN = 1.0; otherwise SMIN = 1.5. In all cases NMIN = 5.0. Five observations per histogram may appear to be too few. However, it should be noted that for our profile-based potentials all observations fall into one, or at most two, bins.
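The hit-collection and fallback rules can be sketched as below for r13. This is an illustration only: it implements the filtering by SMIN, the per-bin average score and the NMIN test described above, but it deliberately stops short of the weighted energy of Eqs. (10) and (10a), whose exact form is not reproduced here. The `hits` input format, pairs of (window z-score, database r13 distance), is an assumption.

```python
def collect_hits(hits, s_min, n_min=5, lo=0.0, hi=8.0, nbins=8):
    """Histogram of database r13 values whose window z-score exceeds s_min.

    hits: iterable of (z_score, r13) pairs, one per database window compared
    with the query window. Returns (counts, average z-score per bin, use_profile),
    where use_profile is False when the statistical potential should be used instead.
    """
    width = (hi - lo) / nbins
    counts = [0] * nbins
    z_sums = [0.0] * nbins
    for z, r in hits:
        if z <= s_min:
            continue  # only observations with z > S_MIN are included
        k = min(max(int((r - lo) // width), 0), nbins - 1)
        counts[k] += 1
        z_sums[k] += z
    averages = [s / c if c else 0.0 for s, c in zip(z_sums, counts)]
    # Profile-based entry only when the number of hits exceeds N_MIN.
    return counts, averages, sum(counts) > n_min


hits = [(2.0, 5.4), (1.2, 5.5), (0.5, 3.0), (3.0, 5.6), (2.5, 5.3), (1.6, 5.45), (1.8, 6.2)]
print(collect_hits(hits, s_min=1.0)[2])  # True: 6 hits above S_MIN = 1.0
print(collect_hits(hits, s_min=1.5)[2])  # False: only 5 hits, fall back
```

The example also shows the behavior noted above: the accepted hits crowd into one or two neighboring bins, so even a handful of observations yields a sharply peaked histogram.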

Gapless threading procedure
To calculate the gapless threading score (Sippl and Weitckus, 1992) for a pair of proteins, the shorter one was threaded through the longer one. For each relative position of the first protein in the second, the short-range energy of the first sequence in the structure of the second protein was calculated, as well as the energy of the second sequence in the first structure. The minimum energy was recorded for each ‘sequence–structure’ pair. Let $E_{i,j}$ denote the energy of the i-th sequence in the j-th structure. Then the mean z-score for threading the sequences through all the structures of the test set is calculated as follows:

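$$\bar{z}_{\mathrm{seq}} = \frac{1}{N} \sum_{k=1}^{N} \frac{\langle E_{k,i} \rangle_i - E_{k,k}}{\sigma_i(E_{k,i})} \tag{11}$$
where $\langle E_{k,i} \rangle_i$ is the average energy calculated for the k-th sequence in all the structures and $\sigma_i(E_{k,i})$ is the corresponding standard deviation. The mean z-score for threading all the sequences through a structure is calculated in a similar manner: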

$$\bar{z}_{\mathrm{str}} = \frac{1}{N} \sum_{k=1}^{N} \frac{\langle E_{i,k} \rangle_i - E_{k,k}}{\sigma_i(E_{i,k})} \tag{12}$$
The set of proteins used for the gapless threading test contained only the continuous-chain proteins from the PDB30 database (N = 1308 structures). Each protein from the set was threaded through all the remaining proteins, except those whose length differed from that of the query by 80 or more residues.
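The threading protocol can be sketched end to end on toy data. Everything energy-related here is hypothetical: a per-position lookup table of (residue, secondary structure state) pairs stands in for the short-range potentials, and the z-score is taken as (average energy minus native energy) divided by the population standard deviation, so that higher values mean a more specific potential. The sliding of the shorter chain along the longer one, the minimum over offsets and the 80-residue length filter follow the description above.

```python
import statistics

# Hypothetical per-position energies; unlisted (residue, state) pairs score 0.
TOY_ENERGY = {("A", "H"): -1.0, ("G", "C"): -1.0}


def fragment_energy(seq, struct):
    """Toy short-range energy of an equal-length sequence fragment in a structure fragment."""
    return sum(TOY_ENERGY.get((a, s), 0.0) for a, s in zip(seq, struct))


def threading_energy(seq, struct):
    """Minimum gapless energy over all offsets; the shorter chain slides along the longer."""
    n, m = len(seq), len(struct)
    if n <= m:
        return min(fragment_energy(seq, struct[o:o + n]) for o in range(m - n + 1))
    return min(fragment_energy(seq[o:o + m], struct) for o in range(n - m + 1))


def energy_matrix(seqs, structs, max_len_diff=80):
    """E[i][j] = energy of sequence i in structure j; None for pairs removed by the length filter."""
    N = len(seqs)
    E = [[None] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i == j or abs(len(seqs[i]) - len(structs[j])) < max_len_diff:
                E[i][j] = threading_energy(seqs[i], structs[j])
    return E


def mean_z(E, by_sequence=True):
    """Mean over proteins of (average energy - native energy) / standard deviation."""
    N = len(E)
    zs = []
    for k in range(N):
        row = [E[k][j] if by_sequence else E[j][k] for j in range(N)]
        row = [e for e in row if e is not None]
        sd = statistics.pstdev(row)
        if sd > 0:
            zs.append((statistics.mean(row) - E[k][k]) / sd)
    return statistics.mean(zs)


seqs = ["AAAA", "GGGG", "AAGG"]
structs = ["HHHH", "CCCC", "HHCC"]  # structure i is the native of sequence i
E = energy_matrix(seqs, structs)
print(round(mean_z(E), 4), round(mean_z(E, by_sequence=False), 4))
```

Computing `E[i][j]` for every ordered pair covers both directions described above: sequence i in structure j and, separately, sequence j in structure i.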


    RESULTS
Dependence between the local sequence similarity and the local structure similarity
For each pair of proteins from the PDB30 database, every sequence fragment of the first protein was compared with every fragment of the second protein. Because of the low sequence similarity within the database, no protein was compared with a close homologue. For all possible pairs of sequence windows, zSS, zSP and the difference between the r13 distances in the two structural fragments were calculated.

The collected statistics can be illustrated as a two-dimensional histogram. Figure 2 shows the number of counts for a given zSS score and the absolute value of the difference between the r13 values in the compared fragments; this difference measures the prediction error. Figure 3 presents similar statistics for the local profile-based comparison. In both cases a score below 1.4 is not statistically significant: any r13 error in the range 0–2.5 Å is almost equally probable. Comparison of Figures 2 and 3 shows that using sequence profiles (in contrast to sequences alone) yields a significant fraction of high-scoring hits with a very low r13 prediction error. Histograms for the r14 and r15 distances look very similar to those for r13. A local profile similarity score above a certain threshold usually implies significant local structural similarity, and a much larger number of high-scoring (and structurally very similar) fragments was detected with the profile-based approach. Thus, the homology potentials are expected to be much more specific.



Fig. 2 Correlation between the zSS score and the r13 error for the sequence comparisons.

 


Fig. 3 Correlation between the zSP score and the r13 error for the profile–profile comparison.

 
Summary of the profile-based potentials
To assess the influence of different factors on the quality of the derived potentials, several variants of the R13, R14 and R15 potentials were calculated. The set of test proteins contained only non-broken protein chains from the PDB30 and PDB90 databases.

To assess the quality of potentials computed from data that lack a homology relationship with the query sequence, we used the PDB30 set (non-broken proteins and fragments; see ‘Input databases’ for details). The query protein was always removed from the source set. These (low-homology) potentials are intended for ab-initio simulations.

By contrast, many sequences have homologous proteins of already known structure. To model this case, we used PDB90 as the source dataset.

When secondary structure information is ignored, the profile-based potentials cover 14.6% of all residues (the fraction of statistically significant hits) when derived from the PDB30 database, and 37.9% when derived from the PDB90 database. When secondary structure information is included, these fractions rise to 68% and 76.7%, respectively. A detailed comparison of the results is given in Tables 1 and 2.


Table 1 Homology potentials do not cover the whole protein

 

Table 2 Average percentages of the residues having the native distances in the global minimum of the R13 potential

 
All calculations described above were conducted with the known secondary structure assigned by DSSP. To check how predicted secondary structure influences our potentials, we repeated the threading test (threading through all structures from the PDB30 set) for a set of 37 proteins randomly selected from the PDB30 database. PSIPRED (Jones, 1999) was used for secondary structure prediction.

Simple statistical potentials derived from both the PDB30 and PDB90 datasets were used as the reference baseline in the evaluation of the profile-based potentials, using gapless threading tests. Before the actual tests, threading was also used to optimize the algorithm's parameters L, NMIN and SMIN. Table 3 summarizes the relative performance of the various potentials.


Table 3 Comparison between the profile-based and the simple statistical potentials in the gapless threading test

 
The z-scores for the simple statistical potentials appear to be very low. Nevertheless, these (and similar) potentials perform very well in ab initio folding simulations and in threading calculations. The explanation is that the local conformational stiffness of polypeptide chains and the formation of secondary structure are cooperative phenomena. Thus, the specificity for longer sequence fragments could be significantly higher than might be expected from the separate entries of the potential. The z-scores for the profile-based potentials are much higher, implying higher specificity. Predicted secondary structure is almost as good as the exact secondary structure in improving the quality of the potentials. Simple tests based on the prediction of local distances (see Fig. 4) also show a qualitative advantage of the profile-based approach. We expect that the potentials provided here will become a valuable tool for protein structure prediction.



Fig. 4 Dependence between the local energy, calculated for five-residue fragments and dRMSD from native: (a) the statistical potential, (b) the profile-based potential. In both cases the potentials were derived with known secondary structure from PDB90 database. The gray scale is proportional to the number of counts.

 
Test of the profile-based short-range potential in the ab-initio loop modeling
The simple statistical potentials (used here as a baseline for evaluating the profile-based potentials) have previously been used in various applications of protein modeling with a reduced representation of conformational space. It has been shown that reduced models with a knowledge-based force field (in which statistical short-range potentials were essential components) allow much more accurate modeling of protein fragments (or loops) than is possible with more standard molecular modeling tools (Boniecki et al., 2003; Kolinski, 2004). In this way, the range of applicability of comparative modeling could be significantly expanded. Recently, we used these statistical potentials in a well-controlled test of the applicability of reduced models to loop modeling (Kolinski, 2004), demonstrating very good geometrical fidelity of the resulting models. The results were compared with a very similar test of comparative modeling done recently by Fiser and Sali (2003) for the new version of MODELLER. Here the same experiment is repeated using the profile-based potentials instead of the simple statistical ones. All other conditions of the computational experiment are exactly the same as in our previous work, i.e. the same test set of proteins, simulation technique and remaining components of the force field.

The test set contains five small globular proteins of various structural classes. Their secondary structure was assigned with DSSP from the PDB structures. Regular elements of secondary structure (helices, H, and extended fragments of β-sheets, E) were used as the template for loop modeling, and the remaining portions of the structures were treated as unknown. Random starting conformations of the loops were generated in the same fashion as in the previous experiment with the simple statistical potential of short-range interactions. Loop optimization was done using the CABS reduced representation (Boniecki et al., 2003; Kolinski, 2004) and a Replica Exchange Monte Carlo sampling protocol. Entire structures were optimized, although the core (template) part was kept near the starting conformation by a set of strong native-like distance restraints. The lowest-energy structures were selected for the final evaluation. Details of the CABS model, its force field and the sampling can be found in our recent publications (Boniecki et al., 2003; Kolinski, 2004).

The results of the loop modeling are compared in Table 4, with the data from the previous work given in parentheses. The structures generated with the profile-based short-range potentials are consistently more accurate. In two cases (2fdx and 2gb1) the improvement in model quality was qualitative. Thus, the new potentials have higher predictive power not only for the regular fragments of protein structures; they also improve loop predictions. In all cases the conservative PDB30 versions of the potentials were used; the PDB90 potentials would be expected to be at least as accurate.


Table 4 Comparison of the performance of the sequence similarity-based potential with the simple statistical potentials in the loop modeling of globular proteins

 

    CONCLUSION
We derived and compared various short-range interaction potentials, using multiple sequence alignments to identify related proteins and profile–profile local alignments to score the sequence-to-structure compatibility of short fragments. The geometry of these fragments was described by a set of short-range distances between alpha carbons, and the resulting statistical potentials were stored as energy histograms. The potentials were tested in gapless threading, in their ability to predict the geometry of short fragments of the protein backbone and in a conservative test of their application to comparative modeling. We have demonstrated that a higher level of sequence similarity in the structural database, as well as known (or predicted) secondary structure, increases the specificity and sensitivity of the potentials. Interestingly, the new potentials also work well in the loop regions of protein structures. Example data for a set of proteins are available on our homepage (http://www.biocomp.chem.uw.edu.pl). The algorithms for deriving the potentials for large sets of proteins are available upon request. Future applications of the new potentials include refinement of threading alignments, homology modeling with a reduced representation of protein conformational space and ab initio structure prediction for globular proteins.


    Acknowledgments
 
This work was partially supported by grant no PBZ-KBN-088/P04/2003. We would like to express our thanks to Anna Oleksy for critical reading of this manuscript.

Received on June 8, 2004; revised on September 14, 2004; accepted on October 4, 2004

    REFERENCES

    Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Boniecki, M., Rotkiewicz, P., Skolnick, J., Kolinski, A. (2003) Protein fragment reconstruction using various modeling techniques. J. Comput. Aided Mol. Des., 17, 725–738.

    Fiser, A. and Sali, A. (2003) ModLoop: automated modeling of loops in protein structures. Bioinformatics, 19, 2500–2501.

    Gribskov, M., McLachlan, A.D., Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins. Proc. Natl Acad. Sci., USA, 84, 4355–4358.

    Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195–202.

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Kolinski, A. (2004) Protein modeling and structure prediction with a reduced representation. Acta Biochim. Pol., 51, 349–371.

    Kolinski, A., Betancourt, M.R., Kihara, D., Rotkiewicz, P., Skolnick, J. (2001) Generalized comparative modeling (GENECOMP): a combination of sequence comparison, threading, and lattice modeling for protein structure prediction and refinement. Proteins, 44, 133–149.

    Kolinski, A., Rotkiewicz, P., Ilkowski, B., Skolnick, J. (1999) A method for the improvement of threading-based protein models. Proteins, 37, 592–610.

    Kolinski, A. and Skolnick, J. (1998) Assembly of protein structure from sparse experimental data: an efficient Monte Carlo Model. Proteins, 32, 475–494.

    Panchenko, A.R. (2003) Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res., 31, 683–689.

    Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584–599.

    Simons, K.T., Kooperberg, C., Huang, E., Baker, D. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol., 268, 209–225.

    Sippl, M.J. and Weitckus, S. (1992) Detection of native-like models for amino acid sequences of unknown three-dimensional structure in a data base of known protein conformations. Proteins, 13, 258–271.

    Skolnick, J., Zhang, Y., Arakaki, A.K., Kolinski, A., Boniecki, M., Szilagyi, A., Kihara, D. (2003) TOUCHSTONE: a unified approach to protein structure prediction. Proteins, 53, (Suppl. 6), 469–479.

    Smith, T.F. and Waterman, M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.

    Wang, G. and Dunbrack, R.L., Jr. (2003) PISCES: a protein sequence culling server. Bioinformatics, 19, 1589–1591.

