Bioinformatics Advance Access originally published online on July 28, 2006
Bioinformatics 2006 22(20):2574-2576; doi:10.1093/bioinformatics/btl413
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

Hierarchical classification of hydrolases catalytic sites

Igor A. Gariev * and Sergey D. Varfolomeev

School of Enzymology, Department of Chemistry, M.V. Lomonosov Moscow State University, Moscow, 119992, Russia

*To whom correspondence should be addressed.


    ABSTRACT

Summary: A universal ontology of catalytic sites is required to systematize enzyme catalytic sites and their evolution, as well as the relations between catalytic sites and protein families, organisms and chemical reactions. Here we present a classification of hydrolase catalytic sites based on a hierarchical organization. The web-accessible database provides information on the catalytic sites, protein folds, EC numbers and source organisms of the enzymes, and includes software for the analysis and visualization of the relations between them.

Availability: http://www.enzyme.chem.msu.ru/hcs/

Contact: gariev{at}hotmail.com


    INTRODUCTION
Research on the evolution of enzymes and their catalytic sites draws significant attention (Bartlett et al., 2003). This is partly due to the increase in data available for analysis, since knowledge of protein sequences and three-dimensional (3D) structures is growing exponentially. Another reason is the significance of these studies for the rational design of biocatalysts. One may expect that understanding the laws of enzyme evolution will make these efforts more successful.

Only a few amino acid residues of a protein (the catalytic site) are directly involved in chemical catalysis. The number of known enzymes far exceeds the number of catalytic sites, since homologous proteins tend to preserve the residues of their catalytic sites, while substrate specificity may vary. Moreover, unrelated enzymes often acquire identical catalytic sites during evolution, a phenomenon known as convergent evolution. Cases of divergent evolution, in which members of an enzyme family have different catalytic sites or reaction mechanisms, are less common. Despite deep knowledge about many individual enzymes, more general questions remain unanswered, e.g. how many different catalytic sites are known; what organisms use enzymes with a catalytic site of a certain type; what sites are capable of catalyzing a given reaction.

To detect the evolution of a catalytic site, including both convergent and divergent events, two questions must be answered: are the two given enzymes homologous, and are their catalytic sites identical? Protein homology can be determined by sequence comparison and, for more distant relatives, by comparison of protein folds. Computer-accessible databases of related protein sequences (Bateman et al., 2004; Tatusov et al., 2003) and of protein folds, CATH (Pearl et al., 2005) and SCOP (Lo Conte et al., 2002), are available. However, comparison of catalytic sites is complicated because the data are scattered through the literature.

Some efforts have been undertaken to systematize this information. The Catalytic Site Atlas (Porter et al., 2004) provides annotations of the catalytic residues extracted from the literature. However, automated comparison of catalytic sites based on these annotations is not straightforward. First, authors may apply different thresholds when deciding whether to include a residue in the set of catalytic residues. Therefore, enzymes with essentially similar catalytic sites may be annotated with different numbers of catalytic residues. Second, even if the amino acid composition of two sites is identical, the catalytic mechanisms, geometry and roles of the residues may be different. The MACiE database (Holliday et al., 2005) complements the Catalytic Site Atlas by providing animations of the catalytic mechanism for a number of selected enzymes. The Enzyme Catalytic-mechanism database (Nagano, 2005) provides a hierarchical classification of catalytic mechanisms by reaction type, reactive groups of substrates, details of the catalytic mechanism and types of the catalytic residues. The reported mechanisms and catalytic sites for a given substrate or reaction can easily be found, but identical catalytic sites are scattered over several classes if their substrates are different. The Structure–Function Linkage database (Pegg et al., 2006) classifies enzymes according to a common ‘partial reaction’ that the enzymes catalyze, and it is helpful in finding an enzyme for a particular substrate structure or for a reaction to be performed.

To provide a means for the analysis and comparison of catalytic sites independent of substrate specificity or protein family, we present a database of the known catalytic sites and a classification scheme that overcomes these difficulties. Currently only hydrolases are included in the classification, since they are the most studied and abundant enzymes.


    ORGANIZATION OF HIERARCHY
Any classification that divides objects into a set of unrelated classes is limited in one of two ways. If the number of classes is small, then each class contains many objects and their subtle differences are lost. If the number of classes is large, then several classes contain related objects, and generalized comparisons are hindered. To overcome these shortcomings, a hierarchical classification is proposed.

A catalytic site of a subclass refines that of its base class, i.e. it contains all residues of the base class and some additional ones. Three simple rules are applied:

  1. Residues are ranked by their importance for catalysis. The residue forming a covalent bond with the substrate in the reaction takes priority. The more distant a residue is from the reaction center, the lower its priority.
  2. If a catalytic site includes metal ion(s), it is classified according to the type and the number of ions.
  3. If two catalytic sites show identical composition but are known to have different catalytic mechanisms, they belong to different classes.

Thus, enzymes with the catalytic triad Ser–His–Asp are organized as follows: class S, serine hydrolases; class S.01, serine hydrolases with a Ser–His dyad; and, finally, class S.01.01, hydrolases with a Ser–His–Asp/Glu triad.

A partial tree of serine hydrolases is given in Figure 1. The depth of the hierarchy is not limited, but in practice does not exceed four. A protein can belong to any class, not necessarily to a terminal one.


Fig. 1 Partial classification tree for serine hydrolases.
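
The class codes and the refinement rule can be illustrated with a minimal Python sketch. It is not the database's own implementation: the function names and the SITE_RESIDUES table are hypothetical, and the sketch assumes that each class stores its residues as a list ordered by catalytic priority, so that a subclass simply extends the list of its base class.

```python
def ancestors(code, sep="."):
    """Return the class code preceded by all of its base classes, most general first."""
    parts = code.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical residue lists, ordered by priority (rule 1). Only the three
# classes named in the text are shown; "Asp/Glu" stands for either residue.
SITE_RESIDUES = {
    "S": ["Ser"],
    "S.01": ["Ser", "His"],
    "S.01.01": ["Ser", "His", "Asp/Glu"],
}


def refines(subclass, base):
    """True if `subclass` keeps every residue of `base`, in order, and adds more."""
    sub, top = SITE_RESIDUES[subclass], SITE_RESIDUES[base]
    return len(sub) > len(top) and sub[: len(top)] == top


print(ancestors("S.01.01"))       # ['S', 'S.01', 'S.01.01']
print(refines("S.01.01", "S.01"))  # True
```

Because a protein may be linked to a non-terminal class, the list returned by ancestors gives every class whose protein list it would appear in when subclasses are included.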

 

    CONTENT
The two main objects in the database are the classes of catalytic sites and the individual enzymes linked to them.

For each class the following information is provided: manually drawn schemes of the catalytic mechanism; an example image of the catalytic site, which can be viewed in 3D with the Jmol or Chime browser plug-in; a list of linked enzymes; and distribution trees of the enzymes' properties, as discussed below.

Several mechanisms are displayed for a given class if more than one has been proposed in the literature. References to the original publications, as well as short manual annotations, are provided. In case of uncertainty, the policy is to include only the residues and catalytic steps that are consistent and to refine them as new data become available. Therefore, a class can later be moved down to a more specific position in the hierarchy tree.

Every enzyme is annotated with its name, EC number, source organism, amino acid sequence and its fold according to the SCOP and CATH databases. Since a protein can contain several domains, only the catalytic one is mentioned, which is defined as the domain where the residues of the catalytic site are located. The protein annotation includes links to the UniProt and ENZYME (Bairoch et al., 2005), NCBI Taxonomy, CATH, SCOP and PDB (Berman et al., 2000) databases.

The principal methods of annotation are direct literature mining and automated template-based search for catalytic sites in protein structures. The Jess program (Barker and Thornton, 2003) was used for searching. Structure analysis was chosen because it is the only method capable of detecting different catalytic sites in homologous proteins. Manual annotation was necessary for catalytic sites containing only one residue and for enzymes whose PDB structures all contain mutations in catalytic residues. Identification of the natural cofactor of metal-dependent hydrolases also required an examination of the literature data.

The current release of the database contains 1160 hydrolases divided into 87 classes; 950 of them were found automatically and 210 were classified manually. In addition, there are 220 proteins that are annotated as hydrolases in the UniProt or PDB databases but either act as part of protein complexes and lack catalytic residues, or have known structures for non-catalytic domains only. These are also placed in special classes. Currently, about half of the protein catalytic sites are confirmed by data from the literature. By continuing to follow the literature, we are able to detect new mechanisms proposed for a member of a class and to reinvestigate the other members if necessary. We plan to update the information on a quarterly basis and to include new proteins and classes.


    SOFTWARE
The web-accessible software consists of search and visualization modules. A protein can be found by its name, database (UniProt or PDB) code, EC number, catalytic site class, fold or organism. A fully qualified or partial identifier can be used for searching; e.g. EC number 3.4.-.- can be used to select all proteases. The same principle applies to the catalytic site class, fold and organism, since each of them is treated as a protein property with a hierarchical classification.
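
Searching by a partial identifier can be sketched as prefix matching on the levels of a hierarchical code. The Python below is only an illustration under stated assumptions: the function names, the record layout and the two example records are hypothetical and do not describe the actual web software.

```python
def matches(identifier, pattern, sep=".", wildcard="-"):
    """True if `identifier` lies at or below the node named by `pattern`.

    Trailing wildcard levels (as in EC number 3.4.-.-) are ignored, so the
    pattern acts as a prefix compared level by level, not character by character.
    """
    id_parts = identifier.split(sep)
    pat_parts = pattern.split(sep)
    while pat_parts and pat_parts[-1] == wildcard:
        pat_parts.pop()
    return id_parts[: len(pat_parts)] == pat_parts


def select(proteins, prop, pattern):
    """Return the records whose hierarchical property `prop` matches `pattern`."""
    return [p for p in proteins if matches(p[prop], pattern)]


# Hypothetical records; the field names are assumptions made for this example.
proteins = [
    {"name": "trypsin", "ec": "3.4.21.4", "site": "S.01.01"},
    {"name": "acetylcholinesterase", "ec": "3.1.1.7", "site": "S.01.01"},
]

print([p["name"] for p in select(proteins, "ec", "3.4.-.-")])  # ['trypsin']
print(len(select(proteins, "site", "S.01")))                    # 2
```

Comparing whole levels rather than raw string prefixes avoids false hits such as a pattern S.1 matching a class S.10.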

Once a group of proteins is selected, a distribution tree similar to that in Figure 1 can be created on the fly for each of their properties. Links to precompiled trees are provided from the class pages; for non-terminal classes with proteins, two options are available: to select only the proteins directly linked to that class, or to select the proteins of that class and all its subclasses. To find events of convergent evolution of a catalytic site, a tree of protein folds can be analyzed for proteins with a certain catalytic site. To detect cases of divergent evolution, a tree of catalytic sites should be constructed for proteins of one fold. As an example, Figure 2 shows the distribution of folds for enzymes with a Cys–His–Asp triad. At least five evolutionary lineages can be detected.


Fig. 2 Distribution of folds for hydrolases with Cys–His–Asp triad.

 
The data uniformity model described above allows one to use any protein property to search and compile a distribution tree.
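
A distribution tree for any hierarchical property can be sketched by counting, for every node, the selected proteins at or below it. The Python below is a minimal illustration rather than the software behind the web site; the function names are hypothetical and the fold codes are invented placeholders, not real SCOP or CATH assignments.

```python
from collections import Counter


def distribution_tree(codes, sep="."):
    """Count, for every node of the hierarchy, the codes at or below that node."""
    counts = Counter()
    for code in codes:
        parts = code.split(sep)
        for i in range(1, len(parts) + 1):
            counts[sep.join(parts[:i])] += 1
    return counts


def render(counts, sep="."):
    """Return indented text lines, each parent listed before its children."""
    lines = []
    # Sorting by the list of levels yields a pre-order traversal of the tree
    # (levels are compared as strings, so "10" sorts before "2").
    for node in sorted(counts, key=lambda c: c.split(sep)):
        depth = node.count(sep)
        lines.append("  " * depth + f"{node} ({counts[node]})")
    return lines


# Invented fold codes for three proteins that share one catalytic site class.
folds = ["a.1.1", "a.1.2", "b.3.1"]
tree = distribution_tree(folds)
print("\n".join(render(tree)))

# Number of distinct top-level branches in the fold tree.
print(len({code.split(".")[0] for code in folds}))  # 2
```

In this sketch, more than one top-level branch among the folds of proteins sharing a catalytic site class would be a crude hint of convergent evolution; the same two functions applied to catalytic site codes of proteins sharing one fold would give the tree used to look for divergent cases.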


    Acknowledgments
 
The authors are grateful to Dr Jonathan Barker for the Jess source code and to Prof. Alexander Nemukhin for the valuable discussion of the manuscript.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: T Charlie Hodgman

Received on March 13, 2006; revised on July 7, 2006; accepted on July 25, 2006

    REFERENCES

    Bairoch, A., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.

    Barker, J.A. and Thornton, J.M. (2003) An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis. Bioinformatics, 19, 1644–1649.

    Bartlett, G.J., et al. (2003) Catalysing new reactions during evolution: economy of residues and mechanism. J. Mol. Biol., 331, 829–860.

    Bateman, A., et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.

    Berman, H.M., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–42.

    Pearl, F., et al. (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res., 33, D247–D251.

    Lo Conte, L., et al. (2002) SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res., 30, 264–267.

    Holliday, G.L., et al. (2005) MACiE: a database of enzyme reaction mechanisms. Bioinformatics, 21, 4315–4316.

    Nagano, N. (2005) EzCatDB: the Enzyme Catalytic-mechanism Database. Nucleic Acids Res., 33, D407–D412.

    Pegg, S.C., et al. (2006) Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. Biochemistry, 45, 2545–2555.

    Porter, C.T., et al. (2004) The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res., 32, D129–D133.

    Tatusov, R.L., et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4, 41.

