Skip Navigation


Bioinformatics Advance Access originally published online on November 13, 2007
Bioinformatics 2008 24(2):272-275; doi:10.1093/bioinformatics/btm564
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/2/272    most recent
btm564v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Ángyán, A. F.
Right arrow Articles by Gáspári, Z.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Ángyán, A. F.
Right arrow Articles by Gáspári, Z.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Fast protein fold estimation from NMR-derived distance restraints

Annamária F. Ángyán 1, András Perczel 1,2, Sándor Pongor 3,4 and Zoltán Gáspári 1,*

1Institute of Chemistry, Eötvös Loránd University, 2MHAS-ELTE Protein Modelling Group, Pázmány Péter sétány 1/A, 1117 Budapest, Hungary, 3Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy and 4Bioinformatics Group, Biological Research Centre, Hungarian Academy of Sciences, Temesvári körút 62, 6701 Szeged, Hungary

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Summary: PRIDE-NMR is a fast novel method to relate known protein folds to NMR distance restraints. It can be used to obtain a first guess about a structure being determined, as well as to estimate the completeness or verify the correctness of NOE data.

Availability: The PRIDE-NMR server is available at http://www.icgeb.org/pride

Contact: szpari{at}chem.elte.hu

Supplementary information: Description of the server and details of the tests presented can be found at http://www.icgeb.org/pride


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 
The main bottleneck in protein structure determination with NMR spectroscopy is structure calculation by using–primarily NOE-based–structural restraints derived from the acquired spectra. The length and outcome of this multi-step process depends heavily on the quality and quantity of the spectral data and also on the reliability of the resonance assignment. Although there are many approaches to speed up structure calculation, capable of yielding a protein structural model of acceptable quality (Herrmann et al., 2002; Rieping et al., 2007), thorough investigation of a chosen protein may require manual intervention by the researcher in order to separate valid experimental information from artifacts. Moreover, automated or semi-automated methods work best with high quality spectra not accessible for all proteins and conditions of interest. Information about secondary and tertiary structure can be obtained by analyzing chemical shifts (Cavalli et al., 2007) and residual dipolar couplings if a homolog with known 3D coordinates is available (Annila et al., 1999; Delaglio et al., 2000). However, as high-quality NMR structure determination relies primarily on NOE-based restraints, a fast method capable to relate known folds to the obtained NOE data set could be of valuable help for the NMR spectroscopist. Furthermore, even when structure determination is straightforward, an independent test of the validity of the obtained fold could be desirable.

Here we report the development of a conceptually simple and fast method, PRIDE-NMR, able to select folds compatible with a given set of NOE data. The name and concept comes from the fast protein fold comparison procedure, PRIDE (PRobability of IDEntity, Carugo and Pongor, 2002), which is based on the comparison of C{alpha}–C{alpha} distance distributions.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 
PRIDE-NMR compares the distributions of short interproton distances (obtained from NMR experiments or back-calculated from 3D coordinates) within the widely defined protein backbone (amide H, H{alpha} and Hβ atoms). The number of distance restraints or close H–H pairs is represented as a histogram with bins corresponding to the sequential separation of the participating residues (Fig. 1). The two histograms are compared with contingency analysis. Histograms are normalized to 100% and bins containing <5% of the total data are combined successively with the next ones to ensure than no values below 5% are used (described in detail in Carugo and Pongor, 2002). As in PRIDE, the resulting score (0 ≤ PRIDE-NMR Score ≤1) can be interpreted as the probability of the two data sets representing the same fold. The exclusion of side-chain hydrogen atoms beyond the β position renders the method largely independent of the sequences of the proteins compared.


Figure 1
View larger version (32K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Graphical representation of the PRIDE-NMR method.

 
Two approaches were introduced in order to increase the sensitivity: the first one is a score weighting with a given power (1, 2 or 3) of the ratio of the lengths of the proteins compared:


Formula

The second one is a filter that discards hits differing in length from the query by more than a chosen percentage. The size of the target protein (number of residues) is a piece of sequence-independent information known to the NMR-spectroscopist.

To set up a web server capable of relating NMR distance data sets to a wide range of known folds, we used the SCOP database (Murzin et al., 1995), which contains X-ray structures and also proteins with less than 40 residues, typically accessible to NMR structure determination. We used the 95% sequence similarity filtered subset of SCOP (also used in the Protein Classification Benchmark Database, Sonego et al., 2007). Hydrogen atoms were placed on all structures using the pdb2gmx program of the GROMACS molecular dynamics package (van der Spoel et al., 2001) with the OPLS-AA force field (Kaminski et al., 2001). Minor modifications were made to enable handling of structures with missing atomic coordinates. H–H distance distributions were calculated with distance cutoffs of 5, 6 and 7 Å and averaging the positions of protons in alanine methyl groups. The server uses a database of precalculated distance distributions (back-calculated from 3D structures) and accepts distance restraints in X-PLOR/CNS format (Brünger et al., 1998). The user is allowed to choose the weighting and/or length filtration mode as well as the cutoff distance(s) for the database distributions (for multiple distances, the averaged scores are calculated). The server was implemented in Perl and C++ and is integrated into the PRIDE2 interface at http://www.icgeb.org/pride (Gáspári et al., 2005).


    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 
Owing to the relatively low computational demands of the implementation, the PRIDE-NMR server is extremely fast, yielding results in the order of a second. This speed allows for multiple runs with adjusting the parameters of the query to explore the relationship of an NOE data set to the folds in the SCOP database.

As a first test, five members from each of three protein families represented in SCOP with deposited NMR distance restraint sets were used: the ubiquitin-related (SCOP d.15.1.1; Table 1), the SH3-domain (SCOP b.34.2.1) and PMP inhibitor (SCOP g.4.1.1) families. Even NOE data sets for which the corresponding structure is not represented in the 95% sequence similarity filtered SCOP list yield good results, i.e. the method is able to find related structures in the database (e.g. 1d3z finds 9 relatives in the ubiquitin family in the first 10 hits). This shows the general applicability of the method. It is clear that the relative number of restraints has a profound, but not decisive effect on the hits: the number of positive hits usually increases with the number of restraints, but the relationship is more complex. According to the principles of the PRIDE-NMR method, the most important factor is how well the restraints represent the structure, which generally, but not always improves with the increasing number of NOE restraints (compare 1g6j with 1p1a). Note that other types of restraints might also be used simultaneously for NMR structure determination, thus our results do not directly reflect the ‘quality’ of the database structures.


View this table:
[in this window]
[in a new window]

 
Table 1. Summary of PRIDE-NMR search with the ubiquitin test set

 
Similar results were obtained for the SH3 domain and PMP inhibitor families (see the PRIDE-NMR web site) with only one protein not yielding any positive hits among the first 10 and displaying the general but not exclusive correspondence between the number of positive hits and average number of restraints per residue.

For a more comprehensive test, a set of another 40 proteins with available NMR distance restraints, covering a wide range of folds (each classified differently at the fourth level of the SCOP hierarchy, representing 40 families and 37 superfamilies, for a complete list of the domains and results, see the PRIDE-NMR web site) was selected. These domains all have relatives (domains in the same family) in the database, have an average number of intrabackbone restraints per residue above 1, and are of varying lengths (24–182 residues). A test with criteria similar to that performed by Novotny et al. (2003) to assess protein fold comparison servers was performed: hits in the same Superfamily (third level of classification) as the query were considered positive and the first 100 hits were monitored excluding self-hits. However, we note that in our case the usual procedures to asses server performance should be used with care as the data in our server database and the input NMR-based distances do not correspond to each other on a one-to-one basis (i.e. the input data set is not a subset of the server database as could be for, e.g. a protein fold comparison method).

Best results were obtained using a cutoff distance of 5 Å or the averaged scores calculated for 5 and 6 Å (Table 2): in these cases, the method resulted at least one positive hit within the first 100 for 100% and 97% of the queries, respectively. This success rate is comparable to those reported for the best protein comparison servers (using a different set of proteins, Novotny et al., 2003). However, having a single positive hit among the first 100 is clearly not sufficient for quick structure estimation. Again, the quality of NMR distance data (which cannot be expected to be uniform in our data set) is prevalent, and thus, the exact position of the first true positive (if known) in the hit list can be used to assess the completeness of NOE data (see below).


View this table:
[in this window]
[in a new window]

 
Table 2. Performance of the PRIDE-NMR method with the 40-protein test set

 
Tests were also run using randomly truncated data sets: the number of distances was decreased by reducing the contents of randomly selected bins until the desired percentage of data remained. This procedure was applied both to the back-calculated (from 3D domain structures) and NOE data sets using the same 40 domains as above and repeating the random truncation 10 times for each data set (Fig. 2).


Figure 2
View larger version (20K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Performance of the method with randomly truncated back-calculated (cutoff distances 5 and 6 Å) and NOE data sets (averaged scores for cutoffs of 5 and 6 Å). Columns represent the fraction of the query data set yielding at least one positive hit in the 100 (compare to data in Table 2).

 
Interestingly, a database search with the back-calculated distance distributions yields considerably worse results than using NOE-based data even without truncation (~88% positive hits compared to 97–100%; Fig. 2). This finding is especially surprising given that the back-calculated distance distributions (calculated using a distance cutoff of 6 Å) contain about an order of magnitude more data than the NMR restraint sets. [We note that this can be regarded as a respectable performance compared to the best protein fold comparison servers (Novotny et al., 2003) and does not point to serious classification errors.] In our view, this may partly be explained by the insufficient ability of H–H distance distributions to represent the folds (e.g. the PRIDE method uses 28 sets of C{alpha}–C{alpha} distance distributions; Carugo and Pongor, 2002; Gáspári et al., 2005). On the other hand, it seems that NOE data sets, although sparse because of experimental errors and protein dynamics, represent the ‘quintessence’ a protein fold (i.e. they incorporate the most important H–H distances). The improvement caused by multiple cutoffs is in conceptual agreement with the dynamic nature of proteins and with the notion that a single conformer cannot fulfill all restraints simultaneously (Lindorff-Larsen et al., 2004). The scarcity of NMR distance data also underlines the importance of sophisticated structure calculation methods with reliable force fields to obtain high quality, biologically relevant structural models (Richter et al., 2007).

In summary, PRIDE-NMR is a simple method yielding results well within a minute. We foresee the following application areas:

Obtaining a first guess about the fold before structure calculation, complementing sequence and chemical shift information.
Estimating the completeness of NOE data (i.e. whether or not the available restraints are sufficient for structure determination) in cases when the target structure is related to a known one.
Detecting of errors in resonance assignment if the target structure has known homolog(s).

With PRIDE-NMR, these checks can be performed routinely and multiple times during structure determination, allowing avoidance of futile calculations with erroneous or incomplete NOE data sets.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 
Funding Grants from the Hungarian Scientific Research Fund (OTKA F68079, TS049812, T046994) and the International Centre for Genetic Engineering and Biotechnology (Hun-04-03), as well as a János Bolyai Research Fellowship (to Z.G.) are acknowledged.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on August 23, 2007; accepted on November 6, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Annila A, et al. Recognition of protein folds via dipolar couplings. J. Biomol. NMR (1999) 14:223–230.[CrossRef][Web of Science]

    Brünger AT, et al. Crystallography & NMR System: a new software suite for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr. (1998) 54:905–921.[CrossRef][Medline]

    Carugo O, Pongor S. Protein fold similarity estimated by a probabilistic approach based on C{alpha}-C{alpha} distance comparison. J. Mol. Biol. (2002) 315:887–898.[CrossRef][Web of Science][Medline]

    Cavalli A, et al. Protein structure determination from NMR chemical shifts. Proc. Natl Acad. Sci. USA (2007) 104:9615–9620.[Abstract/Free Full Text]

    Delaglio F, et al. Protein structure determination using molecular fragment replacement and NMR dipolar couplings. J. Am. Chem. Soc. (2000) 122:2142–2143.[CrossRef][Web of Science]

    Gáspári Z, et al. Efficient recognition of folds in protein 3D structures by the improved PRIDE algorithm. Bioinformatics (2005) 21:3322–3323.[Abstract/Free Full Text]

    Herrmann T, et al. Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J. Biomol. NMR (2002) 24:171–189.[CrossRef][Web of Science][Medline]

    Kaminski GA, et al. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. (2001) 105:6474–6487.[Web of Science]

    Lindorff-Larsen K, et al. Simultaneous determination of protein structure and dynamics. Nature (2004) 433:128–132.

    Murzin AG, et al. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. (1995) 247:536–540.[CrossRef][Web of Science][Medline]

    Novotny M, et al. Evaluation of protein fold comparison servers. Proteins (2003) 233:260–270.

    Richter B, et al. The MUMO (minimal under-restraining minimal over-restraining) method for the determination of native state ensembles of proteins. J. Biomol. NMR (2007) 37:117–135.[CrossRef][Web of Science][Medline]

    Rieping W, et al. ARIA2: automated NOE assignment and data integration in NMR structure calculation. Bioinformatics (2007) 23:381–382.[Abstract/Free Full Text]

    Sonego P, et al. A protein classification benchmark collection for machine learning. Nucleic Acids Res. (2007) 35:D232–D236.[Abstract/Free Full Text]

    van der Spoel D, et al. GROMACS: fast, flexible and free. J. Comput. Chem. (2005) 26:1701–1718.[CrossRef][Web of Science][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/2/272    most recent
btm564v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Ángyán, A. F.
Right arrow Articles by Gáspári, Z.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Ángyán, A. F.
Right arrow Articles by Gáspári, Z.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?