Bioinformatics Advance Access originally published online on January 19, 2007
Bioinformatics 2007 23(5):637-638; doi:10.1093/bioinformatics/btl679
SMotif: a server for structural motifs in proteins
1School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, 2National Centre for Biological Sciences, Bangalore 560 065, India and 3National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: SMotif is a server that identifies important structural segments, or motifs, in one or more given protein structures, based on conservation of both sequence and important structural features such as solvent inaccessibility, secondary structural content, hydrogen bonding pattern and residue packing. The server also reports three-dimensional orientation patterns of the identified motifs in terms of inter-motif distances and torsion angles. These motifs may form the common core and can therefore be employed to design and rationalize protein engineering and folding experiments.
Availability: The SMotif server is available at http://caps.ncbs.res.in/SMotif/index.html.
Contact: chakraba{at}mail.nih.gov, mini{at}ncbs.res.in or EPNSugan{at}ntu.edu.sg
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
Previous studies (Farber and Petsko, 1990; Kannan et al., 2001) have pointed to a small number of structural elements that are required for retention of the fold and function of a protein. Although subsequences forming similar substructures do not always show high sequence similarity, these common substructures contain conserved key amino acid positions and have important implications for protein folding (Friedberg and Margalit, 2002). Sequence-based representations, however, are only an approximation to the underlying structural and functional information. Motifs identified at the three-dimensional structure level therefore provide significant and reliable information.
Here we present a web server, SMotif, that identifies a set of important structural segments, or motifs, for one or more given protein structures, based on conservation of both sequence and important structural features (Chakrabarti et al., 2003; Chakrabarti and Sowdhamini, 2004). Such motifs among structurally aligned proteins are recognized by the conservation of amino acid preference and solvent inaccessibility, and are then examined for the conservation of other important structural features such as secondary structural content, hydrogen-bonding pattern and residue packing. Spatial orientations of the motifs, in terms of inter-motif distances and torsion angles, are also examined. These motifs may form the common core by maintaining a particular spatial pattern when compared across different proteins belonging to the same family or superfamily. Such motifs can also be employed to design and rationalize protein engineering and folding experiments.
| 2 METHODOLOGY |
|---|
2.1 Identification of structural motifs
Structural motifs are identified by the presence of at least three consecutive solvent-buried (inaccessible) residues that have high amino acid exchange scores. Conservation of further structural parameters, namely secondary structural content, hydrogen bonding and residue packing (Ooi number; Nishikawa and Ooi, 1986), is also examined among the structurally aligned proteins. The SMotif server identifies structural motifs following the same principle as described in Chakrabarti et al. (2003).
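The run-detection step described above can be sketched as follows. This is a minimal illustration and not the SMotif implementation: the function name `find_motifs` is hypothetical, and it assumes each alignment position has already been reduced to two boolean flags (buried, and high exchange score), a reduction the text does not spell out.

```python
from typing import List, Tuple


def find_motifs(buried: List[bool],
                conserved: List[bool],
                min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end inclusive, for every run of at
    least `min_len` consecutive alignment positions that are flagged both
    buried and conserved.

    How a position earns each flag across several aligned proteins is an
    assumption left to the caller.
    """
    motifs: List[Tuple[int, int]] = []
    n = min(len(buried), len(conserved))
    run_start = None  # index where the current qualifying run began
    for i in range(n):
        if buried[i] and conserved[i]:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                motifs.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last position.
    if run_start is not None and n - run_start >= min_len:
        motifs.append((run_start, n - 1))
    return motifs
```

Runs shorter than `min_len` are discarded, so two buried positions flanked by exposed ones do not form a motif.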
In the SMotif algorithm, solvent accessibility is measured using the PSA program from the JOY4.0 suite (Mizuguchi et al., 1998). Residues with an accessible surface area of less than 7% are treated as solvent buried, or inaccessible. At every alignment position, all possible pairs of proteins and their observed amino acids are scored using a standard 20 × 20 substitution matrix (Johnson and Overington, 1993) derived from structure-based sequence alignments of homologous protein families. The SSTRUC program, part of the JOY4.0 suite, is used to identify secondary structural positions. The HBOND program, also part of the JOY4.0 suite, is used to identify hydrogen bonds. Residue packing is measured in terms of the Ooi number, which gives the number of residues surrounding the Cα atom of each residue in a protein. Higher Ooi numbers suggest that the residue is in a well-packed environment.
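The per-position measurements in this section can be illustrated with a short sketch. It is not the JOY or SMotif code. Several details are assumptions made for the example: the pair scores at an alignment position are averaged (the text says only that all pairs are scored), gaps written as `-` are skipped, and the Ooi number uses an 8 Å radius, which the text does not specify. The substitution matrix is passed in as a plain dictionary, so any toy values used with it are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def is_buried(accessibility_percent: float, cutoff: float = 7.0) -> bool:
    """Treat a residue as solvent buried when its accessibility is below 7%."""
    return accessibility_percent < cutoff


def column_exchange_score(residues: Sequence[str],
                          matrix: Dict[Tuple[str, str], float]) -> float:
    """Mean substitution score over all protein pairs at one alignment position.

    Averaging is an assumption; the matrix may store each pair in either order.
    """
    present = [r for r in residues if r != "-"]  # skip gaps
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]
    return total / len(pairs)


def ooi_numbers(ca_coords: Sequence[Coord], radius: float = 8.0) -> List[int]:
    """Count, for each residue, the other C-alpha atoms within `radius` angstroms.

    The 8 A default is an assumed value, not taken from the text.
    """
    counts = []
    for i, p in enumerate(ca_coords):
        n = sum(1 for j, q in enumerate(ca_coords)
                if j != i and math.dist(p, q) <= radius)
        counts.append(n)
    return counts
```

The quadratic neighbour count is adequate for single protein chains; a spatial grid would be the usual choice for much larger inputs.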
2.2 Input options
Structural motifs are identified from an alignment submitted by the user. Alternatively, users can upload only a protein sequence or structure, in which case homologous protein sequences and structures are retrieved by running PSI-BLAST (Altschul et al., 1997) against the SWISSPROT sequence database (Apweiler et al., 1997) and a structure database (PDB; Berman et al., 2000). Homologous structures are superimposed using the program STAMP (Russell and Barton, 1994), and the resulting structural alignment is used to identify motif regions.
2.3 Output options
2.3.1 Display of structural motifs on alignment
Structural motifs are projected on the alignment using different color codes for visual clarity. Important structural features are also marked on the alignment and provided as an additional output file.
2.3.2 3D graphical display of structural motifs
Interactive 3D views of the structural motifs on the individual and superposed protein structures are displayed for better understanding and visualization.
2.3.3 Spatial orientation patterns of the motifs
Spatial orientations are represented in terms of inter-motif distances and angular orientations of the identified motif regions. Each structural motif is converted into a vector representation, and the distances and virtual torsion angles between all possible pairs of motifs are calculated using standard vector algebra.
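The vector algebra involved can be sketched as below. The text does not state how a motif is turned into a vector, so this example makes its own assumptions: the motif vector runs from the first to the last Cα atom, the inter-motif distance is taken between the vector midpoints, and the virtual torsion angle is the dihedral defined by the four points end 1, midpoint 1, midpoint 2, end 2. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def midpoint(ca_coords: Sequence[Vec]) -> Vec:
    """Midpoint of the assumed motif vector (first to last C-alpha)."""
    first, last = ca_coords[0], ca_coords[-1]
    return scale((first[0] + last[0], first[1] + last[1], first[2] + last[2]), 0.5)


def inter_motif_distance(m1: Sequence[Vec], m2: Sequence[Vec]) -> float:
    """Distance between the midpoints of two motif vectors."""
    return math.dist(midpoint(m1), midpoint(m2))


def virtual_torsion(m1: Sequence[Vec], m2: Sequence[Vec]) -> float:
    """Dihedral angle in degrees over end1 -> mid1 -> mid2 -> end2."""
    p0, p1, p2, p3 = m1[-1], midpoint(m1), midpoint(m2), m2[-1]
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    if norm == 0.0:
        raise ValueError("motif midpoints coincide; torsion is undefined")
    b1 = scale(b1, 1.0 / norm)
    # Project the two motif directions onto the plane perpendicular to b1.
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(b2, scale(b1, dot(b2, b1)))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))
```

Under these conventions, two motifs whose vectors are parallel give an angle of 0°, and perpendicular vectors give a magnitude of 90°; the sign depends on the handedness of the chosen point order.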
| 3 RESULTS |
|---|
The SMotif algorithm has been benchmarked against alignments of proteins that are related at the superfamily level. About 52 such structural alignments are considered as a test set for which structural motifs have previously been identified by careful manual intervention (Chakrabarti and Sowdhamini, 2004). Results from the benchmarking study (see Supplementary Materials for details) suggest high sensitivity (75%) and accuracy (82%) for the SMotif algorithm. The web server can extract structurally important regions of protein folds rapidly and effectively. For example, the average sequence identity between the members of the Transglutaminase superfamily can be as low as 9.5% (over the full-length alignment) and 11% (over the structural-motif regions), yet the SMotif server can still extract the structurally conserved regions (shown in Fig. SM1, Supplementary Materials).
| 4 CONCLUSIONS |
|---|
Structural motifs identified on the basis of conservation of important structural properties, such as solvent inaccessibility, secondary structure content, hydrogen-bonding interactions and compactness of residues, can provide useful information about the homologous core of similar protein structures. SMotif provides a fast and interactive interface to identify and visualize such important structural segments and can therefore be a useful tool to design and rationalize protein engineering and folding experiments.
| ACKNOWLEDGEMENTS |
|---|
We thank Shameer Khadar for his help in installing the SMotif server. G.P. and P.N.S. acknowledge the financial support offered by A*Star (Agency for Science, Technology and Research). S.C. acknowledges the Intramural Research Program of the National Library of Medicine at NIH/DHHS. R.S. acknowledges the National Centre for Biological Sciences (TIFR) for infrastructural support. Funding to pay the Open Access publication charges was provided by the Wellcome Trust, UK, as part of the Senior Research Fellowship of R.S.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Dmitrij Frishman
Received on October 27, 2006; revised on December 15, 2006; accepted on January 5, 2007
| REFERENCES |
|---|
Apweiler R, et al. Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT, TREMBL. (1997) Proceedings of the 5th International Conference on ISMB. 33–43.
Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.
Berman HM, et al. The protein data bank. Nucleic Acids Res. (2000) 28:235–242.
Chakrabarti S, Sowdhamini R. Regions of minimal structural variation among members of protein domain superfamilies: application to remote homology detection and modeling using distant relationships. FEBS Lett. (2004) 569:31–36.
Chakrabarti S, et al. SMoS: a database of structural motifs of protein superfamilies. Protein Eng. (2003) 16:791–793.
Farber GK, Petsko GA. The evolution of alpha/beta barrel enzymes. Trends Biochem. Sci. (1990) 15:228–234.
Friedberg I, Margalit H. Persistently conserved positions in structurally similar, sequence dissimilar proteins: roles in preserving protein fold and function. Protein Sci. (2002) 11:350–360.
Johnson MS, Overington JP. A structural basis for sequence comparisons. An evaluation of scoring methodologies. J. Mol. Biol. (1993) 233:716–738.
Kannan N, et al. Clusters in alpha/beta barrel proteins: implications for protein structure, function, and folding: a graph theoretical approach. Proteins (2001) 43:103–112.
Mizuguchi K, et al. JOY: protein sequence-structure representation and analysis. Bioinformatics (1998) 14:617–623.
Nishikawa K, Ooi TJ. Radial locations of amino acid residues in a globular protein: correlation with the sequence. J. Biochem. (Tokyo) (1986) 100:1043–1047.
Russell RB, Barton GJ. Structural features can be unconserved in proteins with similar folds. An analysis of side-chain to sidechain contacts secondary structure and accessibility. J. Mol. Biol. (1994) 244:332–350.