Bioinformatics Advance Access originally published online on April 12, 2005
Bioinformatics 2005 21(12):2925-2926; doi:10.1093/bioinformatics/bti437
STARS: statistics on inter-atomic distances and torsion angles in protein secondary structures
Department of Biological Sciences and Department of Chemistry, National University of Singapore, 14 Science Drive 4, Singapore 117543
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: A graphics package has been developed for computing statistics on interatomic distances and torsion angles in protein secondary structures (STARS) from a protein crystal structure database. It provides both graphical and text views of the distributions of distances and angles for atoms located in 10 types of protein secondary structure. STARS will facilitate assignment of ambiguous NOESY peaks, structure determination by nuclear magnetic resonance, structure validation and comparison of protein folds.
Availability: All data, documents and executable files can be downloaded freely from http://stars.zhengyuhome.com. The software runs on Windows systems without compilation or installation.
Contact: dbsydw{at}nus.edu.sg
| INTRODUCTION |
|---|
Structure determination by nuclear magnetic resonance (NMR) and structure validation involve estimation of interatomic distances and dihedral angles. The atom–atom distances are often derived from nuclear Overhauser effects (NOEs), whereas dihedral angles are derived from J-coupling constants and chemical shifts. Assigning each NOE peak in NOE spectroscopy (NOESY) to a specific pair of atoms is a challenging task even for a small protein because of the chemical shift degeneracy of different protons. Knowledge of interatomic distances for atoms located in each type of secondary structure facilitates the assignment of NOEs that are ambiguous because of chemical shift degeneracy. Such assignment relies on secondary structures, which can be predicted with fair accuracy from chemical shifts or from the amino acid sequence by computational techniques alone. Conversely, if some of the NOE assignments are available (e.g. sequential NOEs), the distance knowledge also helps in determining protein secondary structures. Similarly, knowledge of dihedral angles for different types of secondary structure is very useful for deriving structural constraints from J-coupling constants. It can also be used to build internal motional models based on experimental J-coupling data. Besides applications to NMR, information about interatomic distances and torsion angles may be used to validate protein structures and compare protein folds.
Statistics on distances and dihedral angles are often derived from many known protein structures, and it is tedious to obtain this information from a large number of proteins. To the best of our knowledge, no tool is available for computing such statistics, although many tools can calculate distances and dihedral angles for a single protein structure at a time. Here, we present a software tool for statistics on interatomic distances and dihedral angles in protein secondary structures (STARS). STARS provides highly interactive visualization of statistical results. Its friendly window-based interface makes it extremely easy to use.
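As a rough illustration of what pooling statistics over many structures involves, the sketch below merges per-protein lists of distances into one sample and reports its size, mean, standard deviation and a fixed-width histogram. The function names, the bin width and the input layout are assumptions made for this example; they do not describe how STARS itself is implemented.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def histogram(values, bin_width):
    """Count values in fixed-width bins, keyed by the lower edge of each bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(b * bin_width, 10): n for b, n in sorted(counts.items())}


def pooled_summary(per_protein_values, bin_width=0.5):
    """Pool values (e.g. distances in angstroms) from several proteins.

    per_protein_values: one list of measurements per protein (hypothetical layout).
    """
    pooled = [v for values in per_protein_values for v in values]
    return {
        "n": len(pooled),
        "mean": mean(pooled),
        "sd": pstdev(pooled),  # population standard deviation of the pooled sample
        "hist": histogram(pooled, bin_width),
    }


if __name__ == "__main__":
    # Two hypothetical proteins contributing distances for the same atom pair.
    print(pooled_summary([[2.1, 2.4], [2.6, 3.9]]))
```

Pooling before binning weights every measurement equally, so larger proteins contribute more to the distribution; whether that is desirable depends on the application.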
| OVERVIEW OF STARS |
|---|
Composition of database
With the aid of CullPDB (Hobohm et al., 1992), a non-redundant database of protein crystal structures was generated by extracting structural data from the Protein Data Bank. Hydrogen atoms were added using MOLMOL (Koradi et al., 1996). Proteins selected for our database meet the following criteria:
- sequence identity <20%,
- resolution ≤1.6 Å and R-factor ≤0.25 and
- residue number >50, with no non-standard amino acids or chain breaks.
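The per-structure part of these criteria can be expressed as a simple filter. The sketch below is a minimal illustration only: the `ChainEntry` record and its field names are hypothetical, and treating the resolution and R-factor cutoffs as inclusive upper bounds is an assumption. The sequence-identity criterion is a pairwise property of the whole set and is handled by the culling step, so it is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class ChainEntry:
    """Hypothetical summary record for one protein chain (not a STARS data type)."""
    pdb_id: str
    resolution: float              # angstroms
    r_factor: float
    n_residues: int
    has_nonstandard_residue: bool
    has_chain_break: bool


def passes_quality_criteria(entry, max_resolution=1.6, max_r_factor=0.25,
                            min_residues=50):
    """Return True if the chain meets the per-structure selection criteria."""
    return (
        entry.resolution <= max_resolution      # assumed inclusive cutoff
        and entry.r_factor <= max_r_factor      # assumed inclusive cutoff
        and entry.n_residues > min_residues     # strictly more than 50 residues
        and not entry.has_nonstandard_residue
        and not entry.has_chain_break
    )


if __name__ == "__main__":
    # Made-up entries used only to exercise the filter.
    candidates = [
        ChainEntry("xxxA", 1.2, 0.18, 120, False, False),
        ChainEntry("yyyA", 2.0, 0.20, 300, False, False),
    ]
    print([e.pdb_id for e in candidates if passes_quality_criteria(e)])
```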
Definition
The definitions and identifiers of amino acids, atoms and torsion angles used in STARS comply with the 1998 IUPAC recommendations (Markley et al., 1998). Secondary structure and chirality were assigned automatically for all proteins in the database using the DSSP method (Kabsch and Sander, 1983). In line with the common preference of biologists, however, β-sheets were subdivided into three types. In total, 10 types of secondary structure were defined: α-helix, 3₁₀-helix, π-helix, antiparallel β-sheet, parallel β-sheet, the combination of these two sheets, turn, bend, β-bridge and random coil. To obtain statistics on atom–atom distances and torsion angles, only the relative positions of atoms in a protein chain are required. When the first and second atoms are located at residues i and i + n, respectively (where i is a positive integer and n is an integer), the relative position of the second atom with respect to the first one is denoted as n. The definition of residues i, J, K, j and k in a β-sheet is shown in Figure 1. The relative positions of second atoms in residues J + n, K + n, j + n and k + n with respect to the first atom in residue i are referred to as J + n, K + n, j + n and k + n, respectively.
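The two quantities being tabulated can be computed directly from atomic coordinates. The following sketch shows one standard way to do so: the Euclidean distance between two atoms, and the torsion angle defined by four atoms, with a sign convention in which cis is 0°, trans is 180° and a clockwise rotation viewed along the central bond is positive. The helper names are illustrative and are not taken from STARS.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(a, b)  # available from Python 3.8


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the interval (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal to the plane p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four points forming a right-angle twist about the z axis.
    print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(distance((0, 0, 0), (3, 4, 0)))
```

For a relative position n, the first atom would be taken from residue i and the second from residue i + n before calling these functions.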
User interface
The STARS interface is intentionally uncluttered (Fig. 2). The main window is shown in Figure 2a. Users can define the number of proteins used in the statistics and let the program select proteins randomly. Alternatively, the structures can be selected manually from a selection window (Fig. 2d) in which proteins can be sorted by name, resolution, chain length or R-value. The statistics can be computed over all residues, or over the residues in one or more specific secondary structures selected by the user (Fig. 2a). The relative position(s) of the second atom(s) with respect to the first atom can be specified by a single expression (e.g. −2 or J − 1), a series of expressions (e.g. −2, 0, 1, J − 1, K + 1), a range of numbers (e.g. −2 ∼ 2 or j − 2 ∼ j + 2), or a combination of different expressions (e.g. −3, −1 ∼ 1, k − 2 ∼ k + 1). If some of the specified atoms are not located in the selected secondary structure(s), the output will not contain the distances or angles involving these atoms. The statistics can be obtained in single mode (Fig. 2a and b) or batch mode (Fig. 2c). Since the batch mode uses a parallel-processing algorithm, it is ∼10 times faster than the single mode for obtaining the same amount of information. With a job editor, jobs can be created, saved, loaded, edited, sorted, deleted or moved easily in the job list, and submitted at the user's convenience. The statistical results are displayed in a 3D color-bar-style chart in the result analysis window (Fig. 2d). Almost all features of the chart can be reset by users in terms of color, zoom, mark, label, rotation, range, grid, etc. The software allows users to view, compare, select, sort, save or load statistics through a result display window. All data files are saved in a common ASCII format that can be read by a normal text editor. A detailed manual is accessible by clicking the help button in the main window or pressing the F1 key.
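To make the idea of relative-position expressions concrete, here is a small parser sketch that expands a comma-separated specification into explicit (anchor, offset) pairs. It assumes a hypothetical textual syntax: plain integers are offsets from residue i, the letters J, K, j and k may be followed by a signed offset, and `~` separates the two ends of a range. The syntax that STARS actually accepts may differ, and the function names are illustrative only.

```python
import re

# One term: either an anchor letter with an optional signed offset, or a plain integer.
_TERM = re.compile(r"^(?:([JKjk])([+-]\d+)?|([+-]?\d+))$")


def parse_term(text):
    """Parse a single term such as '-2', 'J-1' or 'k+1' into (anchor, offset)."""
    cleaned = text.replace("\u2212", "-").replace(" ", "")  # accept Unicode minus
    match = _TERM.match(cleaned)
    if match is None:
        raise ValueError(f"cannot parse term: {text!r}")
    letter, letter_offset, plain = match.groups()
    if letter is not None:
        return letter, int(letter_offset or 0)
    return "i", int(plain)  # plain numbers are relative to residue i


def parse_positions(expression, range_separator="~"):
    """Expand a comma-separated expression into a list of (anchor, offset) pairs."""
    positions = []
    for item in expression.split(","):
        if range_separator in item:
            start_text, end_text = item.split(range_separator, 1)
            (a1, start), (a2, end) = parse_term(start_text), parse_term(end_text)
            if a1 != a2 or start > end:
                raise ValueError(f"invalid range: {item.strip()!r}")
            positions.extend((a1, n) for n in range(start, end + 1))
        else:
            positions.append(parse_term(item))
    return positions


if __name__ == "__main__":
    print(parse_positions("-3, -1~1, k-2~k+1"))
```

Expanding ranges up front keeps the later statistics loop simple, since it only has to iterate over explicit residue offsets.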
| Acknowledgments |
|---|
This research was supported by a grant from the Biomedical Research Council (BMRC) and Agency for Science, Technology and Research, A*Star of Singapore.
Received on February 28, 2005; revised on March 28, 2005; accepted on April 5, 2005
| REFERENCES |
|---|
Hobohm, U., et al. (1992) Selection of representative protein data sets. Protein Sci., 1, 409–417.
Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.
Koradi, R., et al. (1996) MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graphics, 14, 51–55.
Markley, J., et al. (1998) Recommendations for the presentation of NMR structures of proteins and nucleic acids. Pure Appl. Chem., 70, 117–142.

