Bioinformatics Advance Access originally published online on March 31, 2005
Bioinformatics 2005 21(12):2832-2838; doi:10.1093/bioinformatics/bti420

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

A comprehensive and non-redundant database of protein domain movements

Guoying Qi 1, Richard Lee 1 and Steven Hayward 1,2,*

1School of Computing Sciences, University of East Anglia Norwich, NR4 7TJ, UK
2School of Biological Sciences, University of East Anglia Norwich, NR4 7TJ, UK

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 IMPLEMENTATION AND RESULTS
 4 DISCUSSION
 REFERENCES
 

Motivation: The current DynDom database of protein domain motions is a user-created database that suffers from selectivity and redundancy. The aim of the analysis presented here was to overcome both these limitations and to produce both a comprehensive and a non-redundant description of domain movements from structures stored in the current protein data bank.

Results: A multi-step procedure is applied that starts with grouping proteins in the structural databank into families based on sequence similarity. Multiple sequence alignment, conformational clustering and a dimensional clustering method based on the Gram–Schmidt algorithm are applied to members of each family to remove dynamic redundancy in their domain movements. Representative domain movements are described in terms of domains, hinge axes and hinge-bending residues using the DynDom program. The results show that within an average family of 11.5 members, there are on average only 1.31 different domain movements indicating a high redundancy in the movements these structures represent. This verifies earlier findings that domain movements are usually highly controlled. Despite the removal of this considerable redundancy, the process has resulted in double the number of domain movements stored in the user-created database. The data are organized in a relational database with a web-interface.

Availability: The database can be browsed and searched at http://www.cmp.uea.ac.uk/dyndom

Contact: sjh@cmp.uea.ac.uk


    1 INTRODUCTION
Domain movements form an important category of functional movements in proteins and occur in proteins carrying out various tasks. They have been found in binding proteins, enzymes, signalling proteins, molecular machines, etc. The term ‘domain’ has various meanings but when one talks of domain movements, it is a part of a protein that can be defined on the basis of movement of the part without reference to structure. These ‘dynamic domains’ may coincide with definitions based on structure alone, but are in a sense more useful because they are based on real conformational change that may relate to function. In order to assign dynamic domains, two or more conformations are needed as determined by X-ray crystallography or Nuclear Magnetic Resonance experiments. The DynDom program (Hayward and Berendsen, 1998; Hayward and Lee, 2002) implements a methodology based on rigid-body kinematics that is able to interpret conformational change between two structures in terms of a model of a protein comprising rigid domains connected by flexible interdomain-bending regions. In other words if the conformational change in a protein can be described as the movement of its parts as quasi-rigid bodies (the dynamic domains) it will be interpreted in terms of this model. The DynDom database of protein domain motions (Lee et al., 2003) is a database and web application that allows users to select pairs of PDB files that may reveal a domain movement for input to the DynDom program. If the run is successful then the results are automatically loaded into the database along with all other successful runs. The web-interface to the database allows one to browse the results of all successful runs and make simple searches. The current DynDom database allows us to tap the knowledge of individual specialists around the world to help populate a single database. 
This is useful for the database acting as a repository of information on protein domain movements, but it has two major disadvantages for our intended use of the database: understanding the structural mechanisms of domain movements in proteins. First, we can never be sure that it is comprehensive, and second, much of the data is redundant. The redundancy is easily appreciated in the example of the domain movement in horse or human liver alcohol dehydrogenase as represented by the currently available X-ray structures. There are currently 10 open monomer structures (the enzyme is a dimer) and 62 closed monomer structures, so pairing each open structure with each closed structure could give 10 × 62 = 620 successful DynDom runs. However, although there may be some slight variation in the details, the results of these different runs are all basically the same in terms of their domain movement. In its present form the database gives us no way of controlling this, and all 620 runs could be loaded into it. In this sense our database is redundant. Because it contains only the pairs that users choose to submit, it is not comprehensive either.

Here we describe our attempt to create a comprehensive and non-redundant database of protein domain movements based on the DynDom program.


    2 METHODS
2.1 Grouping protein structures into families
Nuclear Magnetic Resonance theoretical models, non-protein structures and proteins shorter than 40 amino acids were removed from our copy of the PDB (September 2004 release) to form a ‘working set’ of structures. The sequences from this working set, as found in the ATOM field of the PDB files, were then extracted and written in FASTA format. Structures with sequences that had 10% or more of their residues annotated as unknown were also removed from the working set. Originally, the program CD-HIT (Li et al., 2001) was used to group sequences into ‘families’. When it was run at 90% sequence identity, obvious members of a particular family were occasionally not included in that family even though they had at least 90% sequence identity with the representative. We surmised that this comes from the word filtering method used to speed up the program. To overcome this we wrote our own program, which is very similar to CD-HIT but does not include any word filtering. This program first chooses the protein with the longest sequence as a representative, and then performs pairwise sequence alignments to determine a list of all other proteins with sequences that have a 90% or greater sequence identity with the representative. Sequences that covered <50% of the representative sequence in the alignment were removed from the list. All the remaining proteins were assigned to a single family identified by the representative. The next longest sequence not in this family is then chosen as the next representative and the process is repeated to exhaustion. The program FASTA (Pearson and Lipman, 1988) was used for the pairwise sequence alignments.
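
The greedy grouping described above can be sketched as follows. This is a minimal illustration and not the authors' program: the function names are hypothetical, and `toy_align` is a crude ungapped stand-in for the FASTA pairwise alignment, used only so that the example runs on its own. The 90% identity and 50% coverage thresholds are taken from the text.

```python
from typing import Callable, Dict, List, Tuple


def toy_align(rep: str, seq: str) -> Tuple[float, float]:
    """Hypothetical stand-in for FASTA: best ungapped placement of seq on rep.

    Returns (sequence identity over seq, fraction of rep covered by seq).
    Assumes len(seq) <= len(rep), which holds because the representative
    is always the longest remaining sequence.
    """
    best = 0
    for offset in range(len(rep) - len(seq) + 1):
        matches = sum(a == b for a, b in zip(rep[offset:], seq))
        best = max(best, matches)
    return best / len(seq), len(seq) / len(rep)


def group_into_families(
    seqs: Dict[str, str],
    align: Callable[[str, str], Tuple[float, float]] = toy_align,
    min_identity: float = 0.90,
    min_coverage: float = 0.50,
) -> Dict[str, List[str]]:
    """Greedy grouping: the longest unassigned sequence becomes a representative."""
    # Longest first; ties broken by name so the result is deterministic.
    remaining = sorted(seqs, key=lambda name: (-len(seqs[name]), name))
    families: Dict[str, List[str]] = {}
    while remaining:
        rep = remaining[0]
        members, rest = [rep], []
        for name in remaining[1:]:
            identity, coverage = align(seqs[rep], seqs[name])
            if identity >= min_identity and coverage >= min_coverage:
                members.append(name)
            else:
                rest.append(name)
        families[rep] = members
        remaining = rest  # repeat to exhaustion with the unassigned sequences
    return families


example = {
    "A": "ACDEFGHIKLMNPQRSTVWY",
    "B": "ACDEFGHIKLMNPQRSTVWA",  # 95% identical to A
    "C": "ACDEF",                 # identical but covers only 25% of A
    "D": "WWWWWWWWWW",            # unrelated
}
families = group_into_families(example)
```

In this toy input, B joins the family represented by A, while C is rejected by the coverage filter and D by the identity filter, so each of them ends up as its own single-member family.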

2.2 Multiple sequence alignment within families
Our aim is to quantify internal conformational differences between structures within a single family based on the root mean-square deviation (RMSD) as given by least-squares best-fits. In order to perform a least-squares best-fit, equivalent residues amongst the different members within a family need to be assigned. This was achieved by doing a multiple sequence alignment of the sequences within each family using the program ClustalW (Thompson et al., 1994). All residues in all sequences aligned with a gap were removed so that we were left with sets of aligned blocks (Fig. 1). This ensured that we were dealing with the same number of atoms for all family members in subsequent RMSD calculations. In a small number of families, large portions of sequences would be removed by this process because the family contained proteins with sequences of appreciably different lengths. In order to overcome this, these families were divided into subfamilies based upon the average sequence length of the family. Those with lengths exceeding the average length by more than 30% were assigned to a subfamily of ‘long’ sequences, those with lengths within 30% of the average were assigned to a subfamily of ‘average’ sequences and those with lengths more than 30% below the average were assigned to a subfamily of ‘short’ sequences. The blocks of aligned residues were then determined within each subfamily. After the multiple sequence alignment was performed, the three terminal residues at each terminus were removed. This ensured that uninteresting terminal fluctuations were not included in our RMSD calculations. Let N denote the number of equivalent residues within a family or subfamily.



Fig. 1 After multiple alignment of the sequences, sequence segments common to all members of the family were determined as indicated by the grey blocks in the figure. The subsequent RMSD calculations were performed using residues from these blocks only, the residue equivalencies for least-squares best-fit routine being determined from these alignments. This ensured that we were considering the same number of atoms in our RMSD calculations for all family members.
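
The block extraction of Section 2.2 can be sketched as below. The helper names are hypothetical. The sketch assumes that the alignment is given as equal-length strings with `-` for gaps, that the three-residue trimming is applied to the retained gap-free columns, and that the length classes are the 30% bands around the family's average length.

```python
from typing import List


def gap_free_columns(aligned: List[str], trim: int = 3) -> List[int]:
    """Indices of alignment columns with no gap in any sequence.

    The first and last `trim` retained columns are dropped, mirroring the
    removal of three residues at each terminus (assumed interpretation).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    cols = [i for i in range(length) if all(seq[i] != "-" for seq in aligned)]
    return cols[trim:len(cols) - trim] if trim else cols


def length_class(length: int, mean_length: float, tol: float = 0.30) -> str:
    """Assign a sequence to the 'long', 'average' or 'short' subfamily."""
    if length > (1 + tol) * mean_length:
        return "long"
    if length < (1 - tol) * mean_length:
        return "short"
    return "average"


alignment = [
    "MKTAYIAKQR-QISFV",
    "MKTAYIAKQRAQISFV",
    "-KTAYIAKQRAQISFV",
]
equivalent = gap_free_columns(alignment)  # columns used for RMSD fitting
n_equivalent = len(equivalent)            # plays the role of N in the text
```

Every family member contributes exactly the residues in these columns, which is what guarantees equal atom counts in the later RMSD calculations.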

 
2.3 Removal of external movement
Within each family or subfamily a single structure was selected. Then all other members were fitted to this structure using a least-squares best-fit routine applied to the Cα atoms over equivalent residues as determined by the multiple sequence alignment. This process removes external displacement caused by the translation and rotation of the whole structures, leaving differences as contributions due to changes in internal conformation. These internal conformational changes occur in a 3N – 6 dimensional space.
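
A least-squares best-fit of this kind can be sketched with the standard SVD-based (Kabsch) superposition. The text does not say which routine was used, so this is an assumed stand-in; it relies on NumPy, and the function names are illustrative.

```python
import numpy as np


def superpose(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Least-squares fit of `mobile` (N x 3) onto `target` (N x 3).

    Returns the fitted coordinates of `mobile`. Rows are equivalent
    C-alpha atoms in the same order in both arrays.
    """
    mob_centroid = mobile.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    mob_c = mobile - mob_centroid            # remove translation
    tgt_c = target - tgt_centroid
    h = mob_c.T @ tgt_c                      # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(u @ vt))       # guard against a reflection
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob_c @ rotation + tgt_centroid   # remove rotation


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean-square deviation between two N x 3 coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


# A rigidly rotated and translated copy should fit back with RMSD ~ 0,
# i.e. purely external movement is removed completely.
rng = np.random.default_rng(0)
target = rng.normal(size=(30, 3))
theta = 0.7
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
mobile = target @ rot_z + np.array([5.0, -2.0, 1.0])
fitted = superpose(mobile, target)
```

Any RMSD that remains after such a fit reflects internal conformational change only.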

2.4 Window averaging to create ‘window-averaged structures’
In order to make our analysis specific to domain movements, a sliding window 21 residues in length was moved along each structure, and the coordinates of the centre of mass of each 21-residue segment were calculated. This was done for every structure to create a set of ‘window-averaged structures’. Local conformational changes, perhaps due to sequence variation at certain sites, or local flexibility, will be averaged out from these structures whereas non-local conformational changes such as those due to domain movements will not be.
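
A minimal sketch of the window averaging, assuming one coordinate per residue (for example the Cα atom) and equal masses, so that the centre of mass of a segment is the plain mean of its coordinates. Both assumptions go beyond what the text states.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def window_average(coords: Sequence[Point], window: int = 21) -> List[Point]:
    """Centres of all contiguous `window`-residue segments.

    A structure of n residues yields n - window + 1 averaged points.
    """
    averaged: List[Point] = []
    for start in range(len(coords) - window + 1):
        segment = coords[start:start + window]
        centre = tuple(sum(p[k] for p in segment) / window for k in range(3))
        averaged.append(centre)
    return averaged


# 25 residues spaced 1 Å apart along x give 5 window-averaged points.
chain = [(float(i), 0.0, 0.0) for i in range(25)]
smoothed = window_average(chain)
```

Because each output point averages 21 residues, a displacement confined to one or two residues moves it only slightly, whereas a rigid shift of a whole domain moves it fully.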

2.5 Conformational clustering
The RMSDs between all pairs of window-averaged structures within each family were calculated. This resulted in M(M – 1)/2 RMSD values for a family of M members. Then a single-linkage clustering algorithm was used based on these RMSDs to cluster the conformations (represented as points in the 3N – 6 dimensional space) up to a cut-off of 0.5 Å. In the single-linkage algorithm if the minimum distance between any two points belonging to two different clusters is less than the cut-off, then the two clusters are merged. We used this algorithm to group very similar conformations. It is known that many conformations from solved protein structures are very similar and would therefore form ‘tight’ clusters separated from others in the space. Representative conformations were selected from tight clusters that had a diameter (defined as the distance between the most separated points of the cluster) ≤0.5 Å. The representative was chosen to be the one with the highest resolution, or if two structures were of the same resolution, then the one with the lowest number of unknown residues. Single-linkage clustering can result in quite ‘extended’ clusters. It was felt that representatives should not be chosen from these clusters. Therefore clusters with diameters >0.5 Å were not assigned representatives. Figure 2 illustrates a possible result. All the conformations within these extended clusters and only the representative conformations from the tight clusters were considered for the next step in the process.



Fig. 2 Illustration of the single-linkage clustering algorithm. The RMSDs between all window-averaged structures of members of a family or subfamily were calculated to determine a ‘distance’ matrix. The single-linkage clustering algorithm was used to cluster structures at a cut-off of 0.5 Å. Each structure is represented by a point in this figure. The clusters were classified into two types, ‘tight’ and ‘extended’, indicated here by being enclosed by a circle in the case of the former, and an ellipse in the case of the latter. Tight clusters were represented by a single structure chosen to be the one with the highest resolution, and are indicated as dark points in the figure. Extended clusters were not assigned representatives and all structures within an extended cluster were passed on to the dimensional clustering process. Therefore all structures corresponding to dark points were passed on to the next stage in the process.
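
The conformational clustering of Section 2.5 can be sketched as follows. Single-linkage clustering at a fixed cut-off is equivalent to finding connected components of the graph that links every pair closer than the cut-off, which is how this sketch implements it. The function names and the toy distance matrix are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def single_linkage(dist: Sequence[Sequence[float]], cutoff: float = 0.5) -> List[List[int]]:
    """Merge clusters whenever any cross-cluster pair is closer than `cutoff`."""
    parent = list(range(len(dist)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            if dist[i][j] < cutoff:
                parent[find(i)] = find(j)
    clusters: Dict[int, List[int]] = {}
    for i in range(len(dist)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def diameter(cluster: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Distance between the two most separated members of a cluster."""
    return max(dist[i][j] for i in cluster for j in cluster)


def pick_representative(cluster, resolution, n_unknown) -> int:
    """Highest resolution (smallest value in Å); ties go to fewest unknown residues."""
    return min(cluster, key=lambda i: (resolution[i], n_unknown[i]))


# Toy example: conformations placed on a line so RMSD is just |x_i - x_j|.
positions = [0.0, 0.1, 0.2, 3.0, 3.4, 3.8, 4.2, 9.0]
dist = [[abs(a - b) for b in positions] for a in positions]
clusters = single_linkage(dist)
tight = [c for c in clusters if diameter(c, dist) <= 0.5]
extended = [c for c in clusters if diameter(c, dist) > 0.5]
```

Here the chain of points from 3.0 to 4.2 is joined link by link into one cluster of diameter 1.2 Å, which is the ‘extended’ case: every member is kept for the next step, while each tight cluster is reduced to one representative.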

 
2.6 Dimensional clustering
Given M' conformations (M' ≤ M due to the clustering described above) there are M'(M'–1)/2 possible pairs of conformations, many of which would show a similar conformational change. The purpose of the following process is to find a set of pairs of structures that represent the total conformational freedom within each family or subfamily. Within the concept of representing conformations as points in a high-dimensional space, different movements are those that occur in perpendicular spaces. For example, if one considers three points, and if they all fall on the same line, then any of the three pairs would represent the same movement although their extents would be different. However, if they do not fall on the same line, then two different movements are involved.

One would normally create a coordinate system within the plane defined by the three points and the two axes would represent the two different movements. However, we want to avoid the use of unreal structures that combine movements between actual structures. This would be the case if we were to apply principal component analysis for example.

In order to use actual structures to represent the movements, we have developed the following method based upon the Gram–Schmidt process. It is applied to the window-averaged structures that are passed on from the conformational clustering process. Dimensional clustering is illustrated in Figure 3. First, the pair of conformations, labelled 1 and 2, with the largest RMSD is chosen as the ‘first dimension’. All conformations within 0.5 Å of the line 1–2, are ‘associated’ with the first dimension. An associated conformation can be approximated by applying part of the movement 1–2 to conformation 1. Conformation 3 is chosen as the conformation furthest from the line 1–2, but not associated with it. The second dimension is selected as line 2–3 or line 1–3, according to which had the angle with line 1–2 that was closest to the perpendicular. In Figure 3 this is 1–3. All conformations within 0.5 Å of the plane 1–2–3 are associated with these three conformations and can be created by merely combining parts of the movements of 1–2 and 1–3 applied to conformation 1. Conformation 4 is chosen as the conformation furthest from the plane 1–2–3 but not associated with it. The third dimension was selected as the line 1–4, 2–4 or 3–4, again according to which had the angle with the plane 1–2–3 that was closest to the perpendicular. In Figure 3 this is 3–4. Again all points within 0.5 Å of the hyperplane defined by 1–2–3–4 are associated with this space. The process is repeated until exhaustion. Instead of M' (M'–1)/2 pairs, this process results in maximally M' – 1 pairs. It results in a set of pairs of conformations that represent different movements within a family or subfamily. Associated points can be represented as linear combinations of vectors that join the representative points of the space they are associated with. For example a point associated with the first pair 1–2 might be half way between these two points when projected onto the line connecting them. 
Thus this conformation could be represented as 0.5 v_{1–2}, as, for example, point b in Figure 3. A point associated with the full three-dimensional space might be represented as 0.2 v_{1–2} + 0.9 v_{1–3} + 0.4 v_{3–4}, as, for example, point e in the figure. These coefficients will be called ‘projection values’.



Fig. 3 Illustration of the ‘dimensional clustering’ process used. A number of points representing window-averaged structures are distributed in a 3-dimensional space. The distance between points 1 and 2 is the largest and this is chosen as the first dimension. All points, here points a and b, that are within 0.5 Å of this line are ‘associated’ with the movement represented by 1–2. The point furthest from the line 1–2 is then sought, which is point 3 in this case. The second dimension is either 1–3 or 2–3 whichever is the most perpendicular to the 1–2 line; in this case 1–3. All points, here points c and d, that are within 0.5 Å of the plane defined by 1–2–3 are associated with the movements defined by 1–2 and 1–3. Then, the furthest point from the plane 1–2–3 is sought, which is point 4 in this case. The third dimension is 3–4 because this is most perpendicular to the plane 1–2–3. Finally all points, here e and f, that are within 0.5 Å of the hyperplane defined by 1–2–3–4 are associated with the movements defined by 1–2, 1–3 and 3–4.
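
The dimensional clustering of Section 2.6 can be sketched as below. This is an interpretation of the description, not the authors' code. It assumes that each conformation is a point whose Euclidean distances already equal the RMSDs, that ‘within 0.5 Å of the line’ refers to the infinite line (or plane, or hyperplane) through the representatives, and that at least two conformations are distinct. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sub(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def norm(a: Vector) -> float:
    return math.sqrt(sum(x * x for x in a))


def residual(v: Vector, basis: List[List[float]]) -> List[float]:
    """Part of v perpendicular to an orthonormal basis (the Gram-Schmidt step)."""
    r = list(v)
    for e in basis:
        c = sum(x * y for x, y in zip(r, e))
        r = [x - c * y for x, y in zip(r, e)]
    return r


def dimensional_clustering(
    points: Sequence[Vector], cutoff: float = 0.5
) -> Tuple[List[Tuple[int, int]], Dict[int, int]]:
    """Return (representative pairs, {associated point: number of dimensions})."""
    n = len(points)
    # First dimension: the pair of conformations with the largest separation.
    a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda ij: norm(sub(points[ij[0]], points[ij[1]])))
    origin = points[a]
    first = sub(points[b], origin)
    basis = [[x / norm(first) for x in first]]
    reps, dims = [a, b], [(a, b)]
    associated: Dict[int, int] = {}
    while True:
        # Distance of every unplaced point from the current line/plane/hyperplane.
        off = {i: norm(residual(sub(points[i], origin), basis))
               for i in range(n) if i not in reps and i not in associated}
        for i, d in off.items():
            if d < cutoff:
                associated[i] = len(dims)
        far = {i: d for i, d in off.items() if d >= cutoff}
        if not far:
            return dims, associated
        new = max(far, key=far.get)  # furthest unassociated conformation
        # Partner whose joining line is closest to perpendicular (largest sine).
        partner = max(reps, key=lambda r: far[new] / norm(sub(points[new], points[r])))
        dims.append((partner, new))
        reps.append(new)
        perpendicular = residual(sub(points[new], origin), basis)
        basis.append([x / far[new] for x in perpendicular])


conformations = [
    (0.0, 0.0, 0.0), (10.0, 0.0, 0.0),  # most separated pair
    (5.0, 0.2, 0.0),                    # close to the line 0-1
    (2.0, 4.0, 0.0), (5.0, 2.0, 0.3),
    (2.0, 4.0, 3.0), (4.0, 2.0, 1.5),
]
pairs, assoc = dimensional_clustering(conformations)
```

For these seven toy points the sketch returns three representative pairs instead of the 21 possible pairs, and the third pair joins the two most recently added representatives, much as line 3–4 does in Figure 3.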

 
2.7 Analysis of domain movements
The processes described above were used to select representative pairs for input to the DynDom program for analysis of their domain movements. For this process the original structures were used (not those edited by the multiple sequence alignment or window-averaged structures) and residue equivalents from the two structures required by the DynDom program for the purpose of superposition were assigned on the basis of a pairwise sequence alignment. The results of the DynDom program together with the linear combinations that place all the associated structures within the conformational space form the basic data on domain movements within each family or subfamily. These data are integrated into the existing database and web application (Lee et al., 2003).


    3 IMPLEMENTATION AND RESULTS
3.1 General results
3.1.1 Families
All 22 349 PDB accession codes together with chain identifiers were determined for the working set and put in a ‘chain list’ 50 346 in length. After grouping sequences based on their sequence identity, 7657 families were generated. The removal of single-member families reduced the number to 5842. Dividing some families into subfamilies resulted in a total of 5987 families or subfamilies (referred to collectively as ‘families’ from here onwards) with more than one chain. The distribution of families against number of members is shown in Figure 4.



Fig. 4 The distribution of the number of members within each family. The average number of members in each family is 11.5. However, leaving out the large family of T4 lysozyme brings this average down to 8.0.

 
3.1.2 Clustering
The clustering algorithm was applied to these 5987 families. Due to technical problems that arose because of missing atoms, 14 families were not included in this analysis. Clusters were classified as tight, or extended as described in the Methods section. The average number of conformations in a tight cluster is about 5. However, some tight clusters contain a very large number of members. For example, all 264 members of the streptavidin family are in one single tight cluster with a diameter of 0.345 Å. Approximately two-thirds of families, namely 4148, have a single tight cluster and therefore a single representative conformation. No domain movement analysis was performed for these families. This left 1825 remaining families. Overall, 60% of conformations could be removed from the subsequent analysis by choosing representatives.

In all, 646 families had extended clusters. They are comparatively rare as out of a total of 8147 clusters, only 676 are extended. The largest extended cluster comes from the bacteriophage T4 lysozyme. It has an extended cluster with 423 members and the largest RMSD between any two members is 2.27 Å.

3.1.3 Dimensional clustering
Dimensional clustering was performed on the remaining 1825 families. Most families (1455) have only one dimension. The highest number of dimensions is 11 for calmodulin. The average number of dimensions is 1.31. The average number of family members in these 1825 families is 11.5. However, this number drops to 8.0 if the large family (431 members) of T4 lysozyme is omitted. This low average number of dimensions compared to the average number of family members is a measure of the average redundancy in the data: a family of 11 members could in principle show 10 different motions. The average number of dimensions is close to 1 because most domain movements are highly controlled. For example, a protein with two hinges that are separated in space undergoes a controlled domain closure about an axis passing through the two hinges, with other domain movements being restricted by comparison. This has been referred to previously as a ‘door closing motion’ (Hayward, 1999).

3.1.4 Number of DynDom results
The 1825 families gave 2396 representative pairs. However, only 1246 pairs belonging to 951 families resulted in a successful DynDom run (not all pairs of structures result in a successful analysis of their conformational change in terms of domain movements due to the strict criteria used by the DynDom program). Despite the removal of redundancy, the analysis has resulted in more than twice the number of successful DynDom runs stored in the user-created database.

3.1.5 Organisation of database and web-interface
The data arising from this analysis have been organized within a relational database that extends the original DynDom database (Lee et al., 2003). The data from this non-redundant analysis are merged with the redundant data from the user-initiated DynDom runs. Additional tables that relate to protein families, the conformational clustering and the dimensional clustering were specifically created to store the results from the non-redundant analysis. The web-interface allows browsing of families. Links from the family list access pages that give details on the conformational clustering and the dimensional clustering for each individual family. Figure 5 shows a screen-shot of this page for P58 killer cell inhibitory receptor.



Fig. 5 Screen-shot of the family webpage for P58 killer cell inhibitory receptor. The ‘protein family details’ table gives general information on numbers of family members, clusters, etc. The ‘clusters’ table gives details of the clustering process. In this case there are five clusters: three single-member tight clusters, one further tight cluster with two members, and an extended cluster with three members. The ‘dimensions’ tables give the pairs of structures that make up the representative dimensions. The first dimension, v1, is defined by the movement from 1nkr in tight 3 to 2dl2_A in tight 4. The second dimension, v2, is defined by the movement from 2dl2_A to 1m4k_A in tight 1. Links are provided to the results page of the DynDom run for the pair concerned. The final two tables show the associated chains for dimension 1, and dimensions 1 and 2. In the first table there are two associated chains that come from the extended cluster. They are 0.52 and 0.71 of the way between 1nkr and 2dl2_A. In the second table, the chains associated with dimensions 1 and 2 are shown. Here the structure 1b6u is defined as 0.79v1+0.13v2. This means that from the structure 1nkr it combines 0.79 of the movement from 1nkr to 2dl2_A and 0.13 of the movement of 1nkr to 1m4k_A. The same applies to 2dli_A.

 
The interface allows simple searches on protein name or PDB accession code. Such searches result in a list of all families that match the query being presented. Links from each individual family record access pages such as that of Figure 5.

3.2 Specific results of interest
3.2.1 Alcohol dehydrogenase
As mentioned in the introduction, there are 72 monomer structures of horse or human liver alcohol dehydrogenase. The conformational clustering algorithm resulted in two tight clusters. These tight clusters represent the open and closed conformations of the enzyme. The closed domain conformations are from the holoenzyme, which is bound to the coenzyme NAD. The open domain conformations correspond to the apoenzyme structure where the enzyme is either unliganded or bound to inhibitors that do not induce domain closure because of the lack of crucial closure-inducing interactions with the enzyme (Hayward, 2004). The representative structures are 1ju9_A for the open and 1n8k_A for the closed. Figure 6 shows the result of the DynDom analysis of the movement between these two structures, which reveals an 8.8° rotation of one domain relative to the other. A comparison of this result with those in the redundant user-created database shows that this pair has a very similar domain movement to those in the user-created database in terms of the domain decomposition, location and orientation of the hinge axis, and the angle through which the domains rotate.



Fig. 6 Open structure of liver alcohol dehydrogenase, 1ju9_A, coloured according to its movement in going to the closed structure 1n8k_A. The blue domain is the co-enzyme binding domain; the red, the catalytic domain; and the green the interdomain bending regions. The view is looking down the hinge axis located at the cross-hairs. The movement between these two structures represents the domain movement amongst 72 monomer structures.

 
3.2.2 Bacteriophage T4 lysozyme
T4 lysozyme is the family with the largest number of members at 431. It has six clusters, five of which are tight containing a total of eight structures. The remaining 423 structures belong to the single extended cluster. The dimensional clustering analysis yielded five dimensions of which three produced a DynDom result. The conformational change represented by the first two dimensions appears to be caused by insertion mutants (Vetter et al., 1996). In all, 410 out of the 431 structures are associated with these two dimensions.

3.2.3 Calmodulin
Calmodulin has 65 members which reduce to 19 clusters, 14 of which are tight and five of which are extended. The dimensional clustering process yielded 11 dimensions, the largest number of dimensions for any family. Of these 11, nine were successfully analysed by DynDom (the other two dimensions resulted in failed DynDom runs due to missing atoms). This large number of dimensions reflects the flexibility that exists between the two calcium-binding domains afforded by a single interdomain-connecting region.


    4 DISCUSSION
A new database, aimed at being both comprehensive and non-redundant, has been constructed based on a multi-step analysis of domain movements in proteins. The first step grouped proteins into ‘families’ based on sequence similarity. This approach was chosen for its simplicity and speed. It represents a compromise between two extreme approaches, one that ensures that all conformational change arises from the intrinsic flexibility and not from sequence variations by demanding 100% sequence identity between structures in a group, and the other that assumes that as long as the overall backbone structure is locally more or less the same, sequence variations can be ignored. The former has the drawback of excluding many structures from a group with trivial sequence variations. The latter has the drawback of ignoring the possible influence of large sequence variations on intrinsic flexibility despite backbone structures being largely similar. The latter also has a major practical drawback. Grouping structures based on structure alone would require us to solve an extremely difficult and as yet unsolved problem. Structural alignment assigns proteins to the same family based on similarity in the relative positions of sets of atoms. This would mean that even structures of the same protein that are different due to intrinsic flexibility would be automatically assigned to different families. This is particularly a problem in domain proteins where the domain movement usually produces a conformational change that spans the whole protein. In structural databases, such as SCOP (Murzin et al., 1995), CATH (Orengo et al., 1997) and FSSP (Holm and Sander, 1996), this problem is largely circumvented by restricting the structural alignments to domains themselves. This restriction precluded our use of these databases for our purpose of analysing domain movements. Our approach represents a reasonable compromise between the two extreme approaches.

The pairwise sequence alignment method used to group sequences into families required the use of a cut-off. This cut-off was set to 90%. A lower cut-off of 40% was originally used but resulted in proteins with rather diverged sequences being assigned to the same family. It was then found that in some cases, contributions to the RMSD in the conformational clustering procedure derived from differences in sequence rather than in intrinsic flexibility of the family.

The dimensional clustering approach we have used has a major advantage over principal component analysis, which may immediately come to mind as a way of approaching this problem. By using real structures to represent movements rather than unreal ones constructed from eigenvectors from a principal component analysis, we will be able to study the precise interactions with ligands that often cause domain movements (Hayward, 2004).

A cut-off of 0.5 Å was used in both the conformational and dimensional clustering processes. This cut-off was based on the finding for the user-created database that there are very few successful DynDom runs when the RMSD between the two structures is <0.5 Å. Reducing this parameter has the effect of increasing the number of clusters and dimensions, thus increasing the number of movements that are classified as different. Increasing this parameter will have the opposite effect and increase the number of movements classified as similar. Results can be judged by comparing domain decompositions and screw axes of movements that are classified to be different. A similarity in these quantities would suggest that this parameter is set too low, since movements that are essentially the same would then have been split. Although no systematic investigation of this parameter was undertaken, results suggest that 0.5 Å is a good cut-off to use.

An important result is the average number of dimensions, 1.31, compared to the average number of family members, 11.5. This is in some sense a measure of the redundancy in the data and surely reflects the fact that most domain movements in proteins have a functional purpose and are therefore highly controlled.

Despite the removal of the considerable redundancy, the analysis has resulted in twice as many successful DynDom results as are stored in the user-created database. Many of these results will be unknown, and certainly this will be the first time that they have been brought together in a single database. The database can therefore act as a single repository for known domain movements and will be of help to experts on particular proteins. However, our primary purpose in constructing this database is to categorize and understand protein domain movements. Previously, domain movements have been categorized into hinge and shear movements (Gerstein et al., 1994; Gerstein and Krebs, 1998). Preliminary analyses suggest possible categories, based on domain movements, that have little direct relation to these hinge and shear categories. Calmodulin, which has a large variety in its domain movements, has a single linker between the domains, whereas alcohol dehydrogenase, which has one main movement, has three interdomain bending regions. This suggests a ‘controlled domain movement’ category for alcohol dehydrogenase and an ‘uncontrolled domain movement’ category for calmodulin. The structures of the interdomain bending regions are of great interest and will also form a basis for categorization of domain proteins. In an earlier study, two bending regions at the termini of neighbouring strands were found to be a common motif in domain proteins (Hayward, 1999). A more recent study has shown that for liver alcohol dehydrogenase and the catalytic subunit of cAMP protein kinase, this ‘double-hinged’ β-sheet is involved in driving domain closure by interacting with the closure-inducing ligand (Hayward, 2004). It is expected that future work on categorizing the domain movements along these lines will help us to understand mechanisms of domain closure in proteins. The work presented is therefore an important but preliminary step towards this goal.


    Acknowledgments
 
We thank Dr Kenji Mizuguchi for helpful discussions.

Received on February 4, 2005; revised on March 18, 2005; accepted on March 30, 2005

    REFERENCES

    Gerstein, M. and Krebs, W. (1998) A database of macromolecular movements. Nucleic Acids Res., 26, 4280–4290.

    Gerstein, M., et al. (1994) Structural mechanisms for domain movements in proteins. Biochemistry, 33, 6739–6749.

    Hayward, S. (1999) Structural principles governing domain motions in proteins. Proteins, 36, 425–435.

    Hayward, S. (2004) Identification of specific interactions that drive ligand-induced closure in five enzymes with classic domain movements. J. Mol. Biol., 339, 1001–1021.

    Hayward, S. and Berendsen, H.J.C. (1998) Systematic analysis of domain motions in proteins from conformational change: new results on citrate synthase and T4 lysozyme. Proteins, 30, 144–154.

    Hayward, S. and Lee, R.A. (2002) Improvements in the analysis of domain motions in proteins from conformational change: DynDom version 1.50. J. Mol. Graph. Model., 21, 181–183.

    Holm, L. and Sander, C. (1996) The FSSP database: fold classification based on structure–structure alignment of proteins. Nucleic Acids Res., 24, 206–209.

    Lee, R.A., et al. (2003) The DynDom database of protein domain motions. Bioinformatics, 19, 1290–1291.

    Li, W., et al. (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 17, 282–283.

    Murzin, A.G., et al. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.

    Orengo, C.A., et al. (1997) CATH: a hierarchic classification of protein domain structures. Structure, 5, 1093–1108.

    Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

    Thompson, J.D., et al. (1994) Clustal-W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

    Vetter, I.R., et al. (1996) Protein structural plasticity exemplified by insertion and deletion mutants in T4 lysozyme. Prot. Sci., 5, 2399–2415.



