

Bioinformatics Advance Access originally published online on November 22, 2006
Bioinformatics 2007 23(3):381-382; doi:10.1093/bioinformatics/btl589

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ARIA2: Automated NOE assignment and data integration in NMR structure calculation

Wolfgang Rieping 1,†, Michael Habeck 1,†, Benjamin Bardiaux 1,2, Aymeric Bernard 1, Thérèse E. Malliavin 1 and Michael Nilges 1,*

1 Unité de Bioinformatique structurale, CNRS URA 2185, Institut Pasteur, 25-28 rue du docteur Roux 75015 Paris, France
2 Laboratoire de Biochimie Théorique, CNRS UPR 9080, Institut de Biologie Physico-Chimique, 13 rue P. et M. Curie 75005, Paris, France

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 ITERATIVE NOE ASSIGNMENT
 DATA INTEGRATION
 GRAPHICAL USER INTERFACE
 IMPLEMENTATION
 REFERENCES
 

Summary: Modern structural genomics projects demand integrated methods for the interpretation and storage of nuclear magnetic resonance (NMR) data. Here we present version 2.1 of our program ARIA (Ambiguous Restraints for Iterative Assignment) for automated assignment of nuclear Overhauser enhancement (NOE) data and NMR structure calculation. We report on recent developments, most notably a graphical user interface and the incorporation of the object-oriented data model of the Collaborative Computing Project for NMR (CCPN). The CCPN data model defines a storage model for NMR data, which greatly facilitates the transfer of data between different NMR software packages.

Availability: A distribution with the source code of ARIA 2.1 is freely available at http://www.pasteur.fr/recherche/unites/Binfs/aria2

Contact: nilges@pasteur.fr


    INTRODUCTION
The assignment of nuclear Overhauser enhancement (NOE) peaks is the most time-consuming step in the analysis of nuclear magnetic resonance (NMR) data and structure calculation. Though several programs exist that facilitate a manual analysis of spectra, the NOE assignment is tedious due to the large number of assignment possibilities, peak overlap and potential artifacts in the spectra. Therefore, a widely employed approach to NMR structure determination is to calculate a structural model from the experimental data by using programs for automated assignment, such as CANDID/CYANA (Herrmann et al., 2002), AUTOSTRUCTURE (Montelione et al., 2000) or ARIA (Ambiguous Restraints for Iterative Assignment; Nilges et al., 1997). Subsequently, the model is validated against the original data and, if necessary, refined using additional assignments derived with the aid of the model structure. ARIA uses an iterative protocol and the concept of ambiguous distance restraints (ADR) (Nilges, 1995) to automatically assign NOE cross-peaks.

Most software packages use proprietary formats for data storage, which need to be inter-converted for transferring data between different applications. This usually requires manual intervention and can lead to a loss of information since the data conversion is often incomplete. The Collaborative Computing Project for NMR (CCPN) data model (Fogh et al., 2005) alleviates these problems by defining a storage model that integrates all information emerging in a structure determination project in a common framework. This includes details on the molecular system, experimental data such as chemical shifts, NOEs, or residual dipolar couplings, as well as the results of a calculation, most importantly the assigned spectra and the 3D coordinates of the structures. We have incorporated the CCPN data model into ARIA to enable spectroscopists to use existing NMR computer programs in a very efficient way. Version 2.1 of ARIA has several other new features, which we summarize in this article.


    ITERATIVE NOE ASSIGNMENT
ARIA assigns NOE cross-peaks by first deriving all possible assignments for each peak: a list of chemical shifts is matched against frequency ‘windows’ centered on the position of the peak. Peak volumes are converted into distance restraints by using the isolated spin pair approximation, which relates the volume to the inverse sixth power of the distance between the two interacting spins. Ambiguous assignments are converted into ADRs, so that all assignment possibilities contribute to the target distance. However, most of the assignments are inconsistent, and thus cannot be fulfilled simultaneously by a single structure. ARIA therefore runs an iterative protocol to identify wrong assignments and noise peaks: each iteration begins by correcting the restraint list, filtering out unlikely assignments and noise peaks (cf. Nilges et al., 1997, for details). Based on the filtered restraint list, a new structure ensemble is calculated, which is analyzed in the next iteration.
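The quantities named in this paragraph can be illustrated with a minimal Python sketch. It is not taken from the ARIA source code: the function names, the atom labels, the tolerance values and the calibration constant are assumptions chosen for illustration. The effective distance of an ambiguous restraint is computed as the inverse-sixth-power sum over all candidate pairs, following the ADR concept (Nilges, 1995). Ranking candidates by their relative contribution is shown only as one plausible way to spot unlikely assignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    atom: str    # hypothetical atom label, for illustration only
    ppm: float   # assigned chemical shift


def match_window(shifts, position, tolerance):
    """Atoms whose shift lies within +/- tolerance of one peak coordinate."""
    return [s.atom for s in shifts if abs(s.ppm - position) <= tolerance]


def candidate_assignments(shifts, peak, tolerances):
    """All atom pairs compatible with a 2D peak position (f1, f2)."""
    first = match_window(shifts, peak[0], tolerances[0])
    second = match_window(shifts, peak[1], tolerances[1])
    return [(a, b) for a in first for b in second if a != b]


def volume_to_distance(volume, calibration):
    """Isolated spin pair approximation: V = C * d**-6, so d = (C / V)**(1/6)."""
    return (calibration / volume) ** (1.0 / 6.0)


def adr_distance(distances):
    """Effective distance of an ambiguous restraint: (sum_i d_i**-6)**(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def contributions(distances):
    """Relative weight of each candidate pair in the inverse-sixth-power sum."""
    total = sum(d ** -6 for d in distances)
    return [d ** -6 / total for d in distances]


# Assumed toy data: four shifts and one cross-peak with 0.02 ppm windows.
shifts = [Shift("HA", 4.20), Shift("HB", 4.22), Shift("HN", 8.10), Shift("HG", 1.05)]
pairs = candidate_assignments(shifts, peak=(8.11, 4.21), tolerances=(0.02, 0.02))
print(pairs)

# Distances of the candidate pairs in a trial structure (assumed values, in Angstrom).
trial = [2.5, 6.0]
print(adr_distance(trial), contributions(trial))
```

Because of the inverse sixth power, the effective distance is dominated by the shortest candidate distance and is never larger than it, which is why a single correct assignment can satisfy an ambiguous restraint even when the other candidates are far apart.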

The simplified treatment of non-bonded forces and missing solvent contacts during structure calculation can result in artifacts, such as unrealistic side-chain packing and unsatisfied hydrogen bond donors or acceptors. To reduce such artifacts, we refine the structures in explicit water (Linge et al., 2003), which also leads to a considerable improvement of structural quality (Nederveen et al., 2005). Finally, the water-refined ensemble is validated using several computer programs. We employ WHATIF to compare each conformer with a typical structure found in a database of high-resolution X-ray structures. The program PROCHECK analyzes the local fold in terms of Ramachandran statistics, and the software PROSA can detect errors in the global fold of a protein. Several reports summarize the validation results and provide information on the assignments and restraints used for the calculation.


    DATA INTEGRATION
Experience with earlier versions of the program showed that many problems occurring at later stages of a calculation are due to misformatted or inconsistent input files. To facilitate data validation, we have developed a new data format based on the extensible markup language (XML). ARIA defines XML formats to describe molecular systems (we follow the IUPAC recommendation), chemical shifts, and NOE cross-peaks, and uses the CcpNmr FormatConverter (Vranken et al., 2005) to convert more than 20 proprietary data formats into ARIA XML.
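To illustrate why an XML-based format helps catch misformatted input before a calculation starts, the following Python sketch validates a small chemical shift list using only the standard library. The element and attribute names (`shift_list`, `shift`, `atom`, `residue`, `value`) are invented for this example and are not ARIA's actual XML format; the function `validate_shift_list` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical shift list: the tag and attribute names are assumptions
# made for illustration, not ARIA's real XML schema.
EXAMPLE = """<shift_list>
  <shift atom="HA" residue="12" value="4.20" error="0.02"/>
  <shift atom="HN" residue="12" value="eight" error="0.02"/>
  <shift atom="HB" value="1.95" error="0.02"/>
</shift_list>"""

REQUIRED = ("atom", "residue", "value")


def validate_shift_list(xml_text):
    """Return a list of problems; an empty list means the input passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The parser itself rejects input that is not well-formed.
        return [f"not well-formed XML: {exc}"]

    problems = []
    for index, element in enumerate(root.iter("shift"), start=1):
        missing = [name for name in REQUIRED if name not in element.attrib]
        if missing:
            problems.append(f"shift {index}: missing attribute(s) {', '.join(missing)}")
            continue
        try:
            float(element.attrib["value"])
        except ValueError:
            problems.append(
                f"shift {index}: value {element.attrib['value']!r} is not a number"
            )
    return problems


for problem in validate_shift_list(EXAMPLE):
    print(problem)
```

Here, the second entry is reported for its non-numeric value and the third for its missing residue number, so both errors surface at load time rather than at a later stage of the calculation.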

The program also offers the option to retrieve this information, along with other restraints, directly from a CCPN project. A CCPN project can be created manually by using, for example, the FormatConverter. The recommended approach, however, is to employ NMR analysis programs that support the data model directly, such as the software CcpNmr Analysis (Vranken et al., 2005). That way, the user can seamlessly launch an ARIA calculation, without prior data conversion (Fig. 1). Furthermore, ARIA automatically exports the result of a calculation, mainly the assigned peak lists, restraint lists, and the structure ensembles, to a CCPN project. This simplifies the submission of the calculation results to the databases PDB or BMRB, and enables the user, for example, to access the assignments directly from within CcpNmr Analysis, or to validate the restraints by using programs such as QUEEN (Nabuurs et al., 2003).


Fig. 1 Workflow in ARIA. A GUI simplifies the project setup and provides functionality to analyze the generated assignments. The CCPN data model is used for data import, and export of assigned spectra and calculated structures. It also simplifies information transfer to other NMR analysis programs, such as CcpNmr Analysis, and the submission of the results to the databases.

 

    GRAPHICAL USER INTERFACE
A new graphical user interface (GUI) replaces the HTML web-form used in previous versions of ARIA to simplify and streamline the setup of a calculation (Fig. 1). The GUI enables the user to modify all relevant program parameters, such as the iterative protocol and the simulated annealing schedule, the shape of the restraining potentials, as well as names and locations of the data files. To load data from a CCPN project, the user specifies the location of the project and the internal name that identifies the respective data set within it.


    IMPLEMENTATION
ARIA comes as a software library written in the object-oriented programming language Python. The modular design makes it easy for the user to extend and modify the program. The GUI is based on the graphics libraries Tcl/Tk and Tix, interfaced by Python. We use the program CNS (Bruenger et al., 1998) and a simulated annealing (SA) strategy (Nilges et al., 1997) to perform the structure calculation. In principle, the open design of ARIA facilitates the use of structure calculation engines other than CNS. Force field parameters and topology files (version 5.3 of the PARALLHDG parameters), as well as the SA protocol, are part of the distribution. ARIA has been tested extensively on different Linux environments, and also runs on SGI machines and Mac OS X. At the time of writing, more than 500 users worldwide exchange their know-how on a mailing list accessible at http://groups.yahoo.com/group/aria-discuss


    Acknowledgments
 
The authors thank W. Boucher, R. Fogh, T. Stevens, W. Vranken, and E. D. Laue for their support in incorporating the CCPN data model. This work was supported by EU grants QLG2-CT-2000-01313 and QLG2-CT-2002-00988. W.R. thanks the European Molecular Biology Organization for financial support.

Conflict of Interest: none declared.


    FOOTNOTES
 
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Present address: Wolfgang Rieping, Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK

Present address: Michael Habeck, Max Planck Institute for Developmental Biology, Spemannstrasse 35 and Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany

Associate Editor: Dmitrij Frishman

Received on September 11, 2006; revised on November 16, 2006; accepted on November 16, 2006

    REFERENCES

    Bruenger, A.T., et al. (1998) Crystallography and NMR system (CNS): a new software suite for macromolecular structure determination. Acta Crystallogr. D, 54, 905–921.

    Fogh, R.H., et al. (2005) A framework for scientific data modeling and automated software development. Bioinformatics, 21, 1678–1684.

    Herrmann, T., et al. (2002) Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol., 319, 209–227.

    Linge, J.P., et al. (2003) Refinement of protein structures in explicit solvent. Proteins Struct. Funct. Genet., 50, 496–506.

    Montelione, G.T., et al. (2000) Protein NMR spectroscopy in structural genomics. Nat. Struct. Biol., 7, S982–S985.

    Nabuurs, S.B., et al. (2003) Quantitative evaluation of experimental NMR restraints. J. Am. Chem. Soc., 125, 12026–12034.

    Nederveen, A.J., et al. (2005) RECOORD: a REcalculated COORdinates Database of 500+ proteins from the PDB using restraints from the BioMagResBank. Proteins, 59, 662–672.

    Nilges, M. (1995) Calculation of protein structures with ambiguous distance restraints. Automated assignment of ambiguous NOE crosspeaks and disulphide connectivities. J. Mol. Biol., 245, 645–660.

    Nilges, M., et al. (1997) Automated NOESY interpretation with ambiguous distance restraints: the refined NMR solution structure of the pleckstrin homology domain from spectrin. J. Mol. Biol., 269, 408–422.

    Vranken, W.F., et al. (2005) The CCPN data model for NMR spectroscopy: development of a software pipeline. Proteins, 59, 687–696.




