Bioinformatics Advance Access originally published online on July 28, 2006
Bioinformatics 2006 22(20):2574-2576; doi:10.1093/bioinformatics/btl413
Hierarchical classification of hydrolases catalytic sites
School of Enzymology, Department of Chemistry, M.V. Lomonosov Moscow State University, Moscow, 119992, Russia
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: A universal ontology of catalytic sites is required to systematize enzyme catalytic sites and their evolution, as well as the relations between catalytic sites and protein families, organisms and chemical reactions. Here we present a hierarchical classification of hydrolase catalytic sites. The web-accessible database provides information on the catalytic sites, protein folds, EC numbers and source organisms of the enzymes, and includes software for analyzing and visualizing the relations between them.
Availability: http://www.enzyme.chem.msu.ru/hcs/
Contact: gariev{at}hotmail.com
| INTRODUCTION |
|---|
Research on the evolution of enzymes and their catalytic sites draws significant attention (Bartlett et al., 2003). This is partly due to the growing amount of data available for analysis, as knowledge of protein sequences and three-dimensional (3D) structures is growing exponentially. Another reason is the significance of these studies for the rational design of biocatalysts: one may expect that understanding the laws of enzyme evolution will make these efforts more successful.
Only a few amino acid residues of a protein, which form its catalytic site, are directly involved in chemical catalysis. The number of known enzymes far exceeds the number of distinct catalytic sites, since homologous proteins tend to preserve their catalytic site residues while their substrate specificity varies. Moreover, unrelated enzymes often acquire identical catalytic sites during evolution, a phenomenon known as convergent evolution. Divergent evolution, in which members of one enzyme family have different catalytic sites or reaction mechanisms, is less common. Despite deep knowledge of many individual enzymes, more general questions remain unanswered, e.g. how many different catalytic sites are known, which organisms use enzymes with a given type of catalytic site, and which sites are capable of catalyzing a given reaction.
To detect the evolution of a catalytic site, including both convergent and divergent events, two questions must be answered: are the two given enzymes homologous, and are their catalytic sites identical? Protein homology can be determined by sequence comparison and, for more distant relatives, by comparison of protein folds. Computer-accessible databases of related protein sequences (Bateman et al., 2004; Tatusov et al., 2003) and of protein folds, CATH (Pearl et al., 2005) and SCOP (Lo Conte et al., 2002), are available. However, comparison of catalytic sites is complicated because the data are scattered throughout the literature.
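The two questions can be read as a small decision procedure over a pair of enzymes. The sketch below is illustrative only and is not part of the described database: it assumes that homology can be approximated by a shared fold identifier and that catalytic sites are compared by class identifier. The names `Enzyme`, `fold`, `site_class` and `evolutionary_relation`, the labels "conserved" and "unrelated", and the class "C.01" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    fold: str        # hypothetical fold identifier (e.g. a SCOP or CATH code)
    site_class: str  # hypothetical catalytic site class identifier


def evolutionary_relation(a: Enzyme, b: Enzyme) -> str:
    """Answer the two questions for a pair of enzymes.

    Homology is approximated by sharing a fold identifier, and catalytic
    sites are compared by class identifier (both are simplifications).
    """
    homologous = a.fold == b.fold
    same_site = a.site_class == b.site_class
    if homologous and same_site:
        return "conserved"   # related enzymes that kept the same site
    if homologous:
        return "divergent"   # related enzymes with different sites
    if same_site:
        return "convergent"  # unrelated enzymes with the same site
    return "unrelated"


x = Enzyme("enzyme X", "fold-1", "S.01.01")
y = Enzyme("enzyme Y", "fold-2", "S.01.01")
z = Enzyme("enzyme Z", "fold-1", "C.01")
print(evolutionary_relation(x, y))  # convergent
print(evolutionary_relation(x, z))  # divergent
```

Sharing a fold is only evidence of homology, not proof, so a real analysis would combine it with sequence comparison.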
Some efforts have been made to systematize this information. The Catalytic Site Atlas (Porter et al., 2004) provides annotations of catalytic residues extracted from the literature. However, automated comparison of catalytic sites based on these annotations is not straightforward. First, different authors may apply different thresholds for including a residue in the set of catalytic residues, so enzymes with essentially similar catalytic sites may be annotated with different numbers of catalytic residues. Second, even if the amino acid composition of two sites is identical, their catalytic mechanisms, geometry and residue roles may differ. The MACiE database (Holliday et al., 2005) complements the Catalytic Site Atlas by providing animations of the catalytic mechanism for a number of selected enzymes. The Enzyme Catalytic-mechanism Database, EzCatDB (Nagano, 2005), provides a hierarchical classification of catalytic mechanisms by reaction type, reactive groups of substrates, details of the catalytic mechanism and types of catalytic residues. The reported mechanisms and catalytic sites for a given substrate or reaction are easy to find, but identical catalytic sites are scattered over several classes if their substrates differ. The Structure-Function Linkage Database (Pegg et al., 2006) classifies enzymes according to a common partial reaction that they catalyze, and it is helpful for finding an enzyme for a particular substrate structure or reaction.
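The first difficulty, differing inclusion thresholds, can be shown with a toy comparison. The two residue sets below are invented for illustration and are not taken from the Catalytic Site Atlas; `exact_match` and `shared_residues` are hypothetical helpers. An exact set comparison reports the two annotations as different sites even though they share the same triad.

```python
# Two hypothetical annotations of the same Ser-His-Asp triad: the second
# source used a looser threshold and also lists an oxyanion-hole Gly.
annotation_a = {"Ser", "His", "Asp"}
annotation_b = {"Ser", "His", "Asp", "Gly"}


def exact_match(x: set, y: set) -> bool:
    """Naive test: sites count as identical only if the residue sets are equal."""
    return x == y


def shared_residues(x: set, y: set) -> set:
    """Residues listed by both annotations."""
    return x & y


print(exact_match(annotation_a, annotation_b))              # False
print(sorted(shared_residues(annotation_a, annotation_b)))  # ['Asp', 'His', 'Ser']
```

Comparing residue types alone cannot address the second difficulty either: two sites with identical sets may still work by different mechanisms.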
To provide a means of analyzing and comparing catalytic sites independently of substrate specificity or protein family, we present a database of known catalytic sites and a classification scheme that overcomes the difficulties described above. Currently, only hydrolases are included in the classification, since they are the most studied and most abundant enzymes.
| ORGANIZATION OF HIERARCHY |
|---|
Any classification that divides objects into a set of unrelated classes is limited in one of two ways. If the number of classes is small, each class contains many objects and their subtle differences are lost. If the number of classes is large, several classes contain related objects, and generalized comparisons are hindered. To overcome these shortcomings, we propose a hierarchical classification.
The catalytic site of a subclass refines that of its base class, i.e. it contains all residues of the base class plus some additional ones. Three simple rules are applied:
- Residues are ranked by their importance for catalysis. The residue that forms a covalent bond with the substrate during the reaction takes priority. The more distant a residue is from the reaction center, the lower its priority.
- If a catalytic site includes metal ion(s), it is classified according to the type and the number of ions.
- If two catalytic sites show identical composition but are known to have different catalytic mechanisms, they belong to different classes.
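The first rule amounts to a sort key. The sketch below assumes each residue carries a flag for covalent bonding and a distance from the reaction center; the `SiteResidue` type, its fields and the distance values are hypothetical, and the metal-ion and mechanism rules are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteResidue:
    label: str        # e.g. "Ser195" (illustrative)
    covalent: bool    # forms a covalent bond with the substrate during the reaction
    distance: float   # hypothetical distance from the reaction center, in angstroms


def rank_residues(residues: list) -> list:
    """Order residues by priority under the first rule.

    The covalently bonding residue comes first (False sorts before True,
    hence `not r.covalent`); the rest follow by increasing distance.
    """
    return sorted(residues, key=lambda r: (not r.covalent, r.distance))


site = [
    SiteResidue("Asp102", False, 6.0),
    SiteResidue("His57", False, 3.0),
    SiteResidue("Ser195", True, 3.5),
]
print([r.label for r in rank_residues(site)])  # ['Ser195', 'His57', 'Asp102']
```

The covalent residue outranks His57 even though its invented distance is larger, because the covalent flag is compared first.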
Following these rules, enzymes with a Ser-His-Asp catalytic triad are organized as follows: class S, serine hydrolases; class S.01, serine hydrolases with a Ser-His dyad; and, finally, class S.01.01, hydrolases with a Ser-His-Asp/Glu triad.
A partial tree of serine hydrolases is given in Figure 1. The depth of the hierarchy is not limited, but in practice it does not exceed four levels. A protein can belong to any class, not necessarily a terminal one.
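The refinement principle suggests a simple way to place a site in the tree: start at the root class and descend while some subclass's defining residues are still all present. The sketch below encodes only the S, S.01 and S.01.01 example from the text; the `CLASSES` table and the helper names are hypothetical, and the sketch ignores metal ions and mechanism differences. Stopping when no subclass fits reproduces the property that a protein may sit at a non-terminal class.

```python
from collections import Counter
from typing import Optional

# Hypothetical fragment of the serine hydrolase tree; each class lists the
# residues that define it, and a subclass repeats its parent's residues.
CLASSES = {
    "S": Counter({"Ser": 1}),
    "S.01": Counter({"Ser": 1, "His": 1}),
    "S.01.01": Counter({"Ser": 1, "His": 1, "Asp/Glu": 1}),
}


def normalize(residues: list) -> Counter:
    """Count residue types, merging Asp and Glu as the S.01.01 class allows."""
    return Counter("Asp/Glu" if r in ("Asp", "Glu") else r for r in residues)


def children(class_id: str) -> list:
    """Direct subclasses: identifiers exactly one level below class_id."""
    depth = class_id.count(".") + 1
    return [c for c in CLASSES
            if c.startswith(class_id + ".") and c.count(".") == depth]


def contains(site: Counter, required: Counter) -> bool:
    """True if the site has at least every residue that defines the class."""
    return all(site[res] >= n for res, n in required.items())


def classify(site: Counter, root: str = "S") -> Optional[str]:
    """Descend from the root to the most specific class that the site refines."""
    if not contains(site, CLASSES[root]):
        return None
    current = root
    while True:
        deeper = [c for c in children(current) if contains(site, CLASSES[c])]
        if not deeper:
            return current  # may be a non-terminal class
        current = deeper[0]  # assumes sibling classes do not overlap


print(classify(normalize(["Ser", "His", "Glu"])))  # S.01.01
print(classify(normalize(["Ser", "His"])))         # S.01
print(classify(normalize(["Cys", "His"])))         # None
```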
| CONTENT |
|---|
The two main objects in the database are catalytic site classes and the individual enzymes linked to them.
For each class, the following information is provided: manually drawn schemes of the catalytic mechanism; an example image of the catalytic site, which can be viewed in 3D with the Jmol or Chime browser plug-in; a list of linked enzymes; and distribution trees of the enzymes' properties, as discussed below.
If several mechanisms have been proposed in the literature for a given class, all of them are displayed. References to the original publications and short manual annotations are provided. In case of uncertainty, our policy is to include only the residues and catalytic steps that are consistently supported and to refine them as new data become available. Consequently, a class may later be moved down to a more specific position in the hierarchy tree.
Every enzyme is annotated with its name, EC number, source organism, amino acid sequence and fold according to the SCOP and CATH databases. Since a protein can contain several domains, only the catalytic domain is listed, defined as the domain in which the catalytic site residues are located. The protein annotation includes links to the UniProt and ENZYME (Bairoch et al., 2005), NCBI Taxonomy, CATH, SCOP and PDB (Berman et al., 2000) databases.
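The catalytic-domain definition can be sketched as a lookup: pick the domain whose residue range contains every catalytic residue. This is a hypothetical record layout, not the database schema; `Domain`, `EnzymeRecord` and `catalytic_domain` are invented names, only some annotation fields are modelled, and domains are simplified to contiguous residue ranges (real SCOP or CATH domains can be discontinuous).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Domain:
    label: str  # hypothetical domain label
    start: int  # first residue number (inclusive)
    end: int    # last residue number (inclusive)


@dataclass
class EnzymeRecord:
    name: str
    ec_number: str
    organism: str
    domains: list = field(default_factory=list)
    catalytic_residues: list = field(default_factory=list)

    def catalytic_domain(self) -> Optional[Domain]:
        """Return the domain containing every catalytic residue, if there is one."""
        if not self.catalytic_residues:
            return None
        for d in self.domains:
            if all(d.start <= r <= d.end for r in self.catalytic_residues):
                return d
        return None


rec = EnzymeRecord(
    name="example hydrolase", ec_number="3.1.1.-", organism="example organism",
    domains=[Domain("D1", 1, 120), Domain("D2", 121, 300)],
    catalytic_residues=[150, 210, 262],
)
print(rec.catalytic_domain().label)  # D2
```

Returning `None` when the residues span several domains leaves such cases for manual inspection rather than guessing.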
The principal annotation methods are direct literature mining and automated template-based searching for catalytic sites in protein structures. The searches used the Jess program (Barker and Thornton, 2003). Structure analysis was chosen because it is the only method capable of detecting different catalytic sites in homologous proteins. Manual annotation was necessary for catalytic sites containing only one residue and for enzymes whose PDB structures all contain mutations in catalytic residues. Identifying the natural cofactor of metal-dependent hydrolases also required examining the literature.
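To illustrate what a structural template search does, the sketch below looks for atoms of the right residue and atom types whose pairwise distances agree with a template within a tolerance. This is a deliberately naive brute-force version written for illustration; it is not Jess, which uses a more efficient constraint-based algorithm (Barker and Thornton, 2003). The function name, atom-record layout, coordinates and distances are all hypothetical.

```python
import itertools
import math

# Atom record: (residue name, atom name, residue number, (x, y, z)).


def find_matches(atoms, template, distances, tol=1.0):
    """Brute-force search for atom tuples that fit a geometric template.

    atoms:     list of atom records from one structure
    template:  list of (residue name, atom name), one per template position
    distances: dict mapping (i, j) template-position pairs to target distances
    tol:       allowed deviation from each target distance, in angstroms
    Returns the residue numbers of every matching tuple.
    """
    candidates = [[a for a in atoms if (a[0], a[1]) == pos] for pos in template]
    hits = []
    for combo in itertools.product(*candidates):
        if len({a[2] for a in combo}) < len(combo):
            continue  # each template position must map to a distinct residue
        if all(abs(math.dist(combo[i][3], combo[j][3]) - d) <= tol
               for (i, j), d in distances.items()):
            hits.append(tuple(a[2] for a in combo))
    return hits


# Toy structure with invented coordinates and a decoy serine.
atoms = [
    ("SER", "OG", 195, (0.0, 0.0, 0.0)),
    ("HIS", "NE2", 57, (3.0, 0.0, 0.0)),
    ("ASP", "OD1", 102, (5.8, 0.0, 0.0)),
    ("SER", "OG", 214, (20.0, 0.0, 0.0)),
]
template = [("SER", "OG"), ("HIS", "NE2"), ("ASP", "OD1")]
distances = {(0, 1): 3.0, (1, 2): 2.8}
print(find_matches(atoms, template, distances))  # [(195, 57, 102)]
```

The cost of `itertools.product` grows with the product of candidate counts, which is why practical tools prune partial matches early.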
The current release of the database contains 1160 hydrolases divided into 87 classes; 950 of them were found automatically and 210 were classified manually. In addition, 220 proteins annotated as hydrolases in UniProt or PDB are placed in special classes: they either act as parts of protein complexes and lack catalytic residues themselves, or only the structures of their non-catalytic domains are known. Currently, about half of the catalytic sites are confirmed by data from the literature. By tracking the literature, we are able to detect new mechanisms proposed for a member of a class and to reinvestigate the other members if necessary. We plan to update the information quarterly and to include new proteins and classes.
| SOFTWARE |
|---|
The web-accessible software consists of search and visualization modules. A protein can be found by its name, database (UniProt or PDB) code, EC number, catalytic site class, fold or organism. Either a fully qualified or a partial identifier can be used for searching; e.g. EC number 3.4.-.- selects all proteases. The same principle applies to the catalytic site class, fold and organism, each of which is treated as a protein property with a hierarchical classification.
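Partial-identifier search can be sketched as a level-by-level comparison of dotted identifiers. The sketch assumes every property is encoded as a dotted path, as EC numbers and catalytic site classes are here; organisms or folds would first need such an encoding (an assumption, not the site's implementation). `matches` is a hypothetical helper.

```python
def matches(query: str, identifier: str) -> bool:
    """True if a dotted hierarchical identifier falls under a (partial) query.

    Levels are compared one by one; '-' in the query matches any value, and
    a query with fewer levels than the identifier matches all its descendants.
    """
    q_levels = query.split(".")
    id_levels = identifier.split(".")
    if len(q_levels) > len(id_levels):
        return False
    return all(q in ("-", i) for q, i in zip(q_levels, id_levels))


print(matches("3.4.-.-", "3.4.21.4"))  # True
print(matches("3.4.-.-", "3.1.1.3"))   # False
print(matches("S.01", "S.01.01"))      # True
print(matches("3.4.2", "3.4.21.4"))    # False: level-wise, not a string prefix
```

Comparing whole levels rather than string prefixes avoids false hits such as "3.4.2" matching "3.4.21.4".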
Once a group of proteins is selected, a distribution tree similar to that in Figure 1 can be created on the fly for each of their properties. Class pages link to precompiled trees; for non-terminal classes that have proteins, two options are available: selecting only the proteins directly linked to the class, or the proteins of the class and all its subclasses. To find convergent evolution events, one can analyze the tree of protein folds for proteins with a given catalytic site. To detect divergent evolution, one should construct a tree of catalytic sites for proteins of a single fold. As an example, Figure 2 shows the distribution of folds for enzymes with a Cys-His-Asp triad. At least five evolutionary lineages can be detected.
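A distribution tree can be sketched as counts aggregated over every level of a dotted identifier, and the two selection options as a subtree filter versus an exact filter. The protein IDs and class assignments below are hypothetical, as are all function names; this illustrates the idea rather than the site's code.

```python
from collections import Counter


def ancestors(class_id: str) -> list:
    """All levels of a dotted identifier: 'S.01.01' -> ['S', 'S.01', 'S.01.01']."""
    parts = class_id.split(".")
    return [".".join(parts[:k]) for k in range(1, len(parts) + 1)]


def distribution_tree(assignments: dict) -> Counter:
    """Count proteins per node; each protein also counts toward every ancestor."""
    counts = Counter()
    for class_id in assignments.values():
        counts.update(ancestors(class_id))
    return counts


def select(assignments: dict, class_id: str, include_subclasses: bool = True) -> list:
    """Proteins linked to class_id directly, or to it and all its subclasses."""
    if include_subclasses:
        return sorted(p for p, c in assignments.items() if class_id in ancestors(c))
    return sorted(p for p, c in assignments.items() if c == class_id)


# Hypothetical protein-to-class assignments.
assignments = {"P1": "S.01.01", "P2": "S.01.01", "P3": "S.01", "P4": "C.01"}
tree = distribution_tree(assignments)
print(tree["S"], tree["S.01"], tree["S.01.01"], tree["C"])  # 3 3 2 1
print(select(assignments, "S.01"))                            # ['P1', 'P2', 'P3']
print(select(assignments, "S.01", include_subclasses=False))  # ['P3']
```

Passing fold identifiers instead of class identifiers for proteins that share one catalytic site would give the kind of fold distribution used to look for convergent evolution.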
The data uniformity model described above allows one to use any protein property to search and compile a distribution tree.
| Acknowledgments |
|---|
The authors are grateful to Dr Jonathan Barker for the Jess source code and to Prof. Alexander Nemukhin for the valuable discussion of the manuscript.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: T Charlie Hodgman
Received on March 13, 2006; revised on July 7, 2006; accepted on July 25, 2006
| REFERENCES |
|---|
Bairoch, A., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154-D159
Barker, J.A. and Thornton, J.M. (2003) An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis. Bioinformatics, 19, 1644-1649
Bartlett, G.J., et al. (2003) Catalysing new reactions during evolution: economy of residues and mechanism. J. Mol. Biol., 331, 829-860
Bateman, A., et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138-D141
Berman, H.M., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242
Pearl, F., et al. (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res., 33, D247-D251
Lo Conte, L., et al. (2002) SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res., 30, 264-267
Holliday, G.L., et al. (2005) MACiE: a database of enzyme reaction mechanisms. Bioinformatics, 21, 4315-4316
Nagano, N. (2005) EzCatDB: the Enzyme Catalytic-mechanism Database. Nucleic Acids Res., 33, D407-D412
Pegg, S.C., et al. (2006) Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. Biochemistry, 45, 2545-2555
Porter, C.T., et al. (2004) The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res., 32, D129-D133
Tatusov, R.L., et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4, 41

