Bioinformatics Advance Access originally published online on September 18, 2007
Bioinformatics 2007 23(20):2660-2664; doi:10.1093/bioinformatics/btm411
The DOMON domains are involved in heme and sugar recognition
National Center for Biotechnology Information, National Library of Medicine and National Institutes of Health, Bethesda, MD 20894, USA
*To whom correspondence should be addressed.
ABSTRACT
We expand the functionally uncharacterized DOMON domain superfamily and identify several novel families, including the first prokaryotic representatives. Using several computational tools, we show that the domain is involved in ligand binding, acting as either a heme- or a sugar-binding domain. We present evidence that the DOMON domain, along with the DM13 domain, comprises a novel electron-transfer system potentially involved in the oxidative modification of animal cell-surface proteins. Other novel versions might function as sugar sensors in the histidine kinases of bacterial two-component systems.
Contact: aravind{at}ncbi.nlm.nih.gov or aravind{at}mail.nih.gov
Supplementary information: Supplementary data are available at Bioinformatics online and also at ftp://ftp.ncbi.nih.gov/pub/aravind/domon/.
1 INTRODUCTION
The DOMON (dopamine β-monooxygenase N-terminal) domain, also called DoH, was originally identified in several secreted or cell-surface proteins from plants and animals (Aravind, 2001; Ponting, 2001). It usually occurs fused to other domains, such as the Cu-ascorbate-dependent monooxygenase domains associated with catecholamine metabolism (as in the eponymous enzyme in which it was found), cytochrome b561, and adhesion modules such as EGF, reelin and SEA. This β-strand-rich domain was predicted to adopt a β-sandwich-like fold and, based on its domain-architectural contexts, to mediate protein–protein interactions (Aravind, 2001; Ponting, 2001). However, there is little or no experimental evidence supporting such a function, and the domain's biochemical role remains poorly characterized. The explosive growth of sequence and structure databases often provides new leads that help uncover the functions of previously enigmatic domains. Using sensitive sequence and structural analyses, we show that DOMON is widely distributed beyond animals and plants, in fungi, various protists, bacteria and archaea. Based on the contextual and structural information gleaned from these newly identified forms, we show that DOMON domains are predominantly heme- or sugar-binding domains. We also show that the DOMON superfamily has been widely utilized in various contexts involving redox reactions, potentially as a direct participant in electron transfer.
2 RESULTS
2.1 Identification of the extended DOMON superfamily
To obtain a more complete picture of the evolution and functions of DOMON domains, we searched the NCBI non-redundant (NR) protein database and a locally compiled database of unfinished eukaryotic genomes using PSI-BLAST, starting from an input position-specific score matrix representing all previously identified DOMON domains (see Supplementary Material for a detailed description of materials and methods). These searches recovered novel DOMON homologs in diverse protists such as ciliates, oomycetes, diatoms and Naegleria. Further transitive searches also retrieved the N-terminal cytochrome domain of the fungal cellobiose dehydrogenases (CDH), as well as bacterial proteins from diverse taxa, such as the ethylbenzene dehydrogenase γ subunit (EDHγ), the C-terminal domain of certain NirT proteins and the carbohydrate-binding domain family 9 (CBD9) domains of xylanases and bacterial extracellular cellulases. For example, PSI-BLAST searches of NR initiated with the DOMON domain of human dopamine β-monooxygenase (DM; gi: 30474, region 50–166) recovered, within eight iterations and with significant E-values (E < 0.001), several novel representatives in ciliates, Dictyostelium and prokaryotes such as the archaeon Thermococcus and the bacterium Roseobacter, in addition to previously reported versions. The search also retrieved the cytochrome domain of the fungal CDH (PDB: 1D7B; E < 10⁻¹⁵). Further searches initiated with the DOMON-containing Tetrahymena protein TTHERM_00460560 (gi: 118371105) retrieved several bacterial proteins, such as the C-terminal region of a NirT homolog from Colwellia (gi: 71278993), which in turn recovered EDHγ (PDB: 2ivf, chain C). Finally, PSI-BLAST searches with EDHγ recovered, with significant E-values, the CBD9 domains of bacterial xylanases and cellulases (PDB: 1I82, chain A) and a set of related proteins from fungi, ciliates and Dictyostelium (e.g. Gibberella zeae FG07921.1). Using the DALI program, we also searched the PDB with the CDH, EDHγ and CBD9 domains as queries. These searches usually recovered one another, as well as the C-terminal domain of bacterial glucodextranase, as best hits. The above structures are classified under the CBD9 superfamily of the immunoglobulin (Ig) fold in the SCOP database (http://scop.mrc-lmb.cam.ac.uk/scop/).
All retrieved sequences were classified into distinct families by first clustering them with the BLASTCLUST program (Supplementary Material) and then further combining the clusters on the basis of uniquely shared sequence features and shared domain architectures. This yielded at least nine distinct protein families: classical DOMON (of which the fungal CDH is a member), EDHγ-cytochrome domain, CbsA/cytochrome b558/556, Hahella HCH_03667-like, NirT C-terminal-like, Shewanella SO2192-like, CBD9-like, Gibberella zeae FG07921.1-like and glucodextranase C-terminal domain-like families (Table 1 in Supplementary Material). Hereinafter, we refer to this unified monophyletic assemblage of domains as the DOMON superfamily.
A comprehensive multiple alignment of the superfamily shows that the Ig-like β-sandwich of the DOMON superfamily has 10–11 strands and shares several unique structural features (Figs 1 and 2). These include: (1) a common ligand-binding interface; (2) several strands beyond the core seven strands of the classical Ig-like fold, including two additional N-terminal strands and an extra strand in the ligand-binding sheet; and (3) a characteristic long loop between strands 5 and 6 of the conserved core that folds against the ligand-binding β-sheet and provides an interface for ligand contact. Several other β-sandwich domains have previously been identified as carbohydrate-binding (e.g. cellulose-binding domains II and III) or heme-binding (e.g. cytochrome f) modules (http://scop.mrc-lmb.cam.ac.uk/scop/). However, the DOMON domain differs from all of them in the position and specific mode of ligand interaction, as well as in the number of strands in the β-sandwich. These differences suggest that the specific ligand-binding features of the DOMON superfamily arose after its divergence from the generic group of ligand-binding β-sandwich domains.
The defining conserved sequence features of the DOMON superfamily (Figs 1 and 2) include: (1) multiple hydrophobic residues that contribute to the hydrophobic core of the β-sandwich strands, and small residues at the boundaries of strands and loops; (2) a strongly conserved charged residue (usually arginine or lysine) at the end of strand 9, whose strong conservation, despite its not binding the ligand, suggests a structural role such as stabilizing the loop between strands 9 and 10 or mediating conformational changes; and (3) a polar residue (usually histidine, lysine or arginine) that interacts with or coordinates the ligand.
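The transitive search strategy described above, in which newly recovered homologs are reused as queries until no new members appear, can be sketched as a breadth-first traversal. This is a minimal illustration, not a re-implementation of PSI-BLAST: the `search` callback is a hypothetical stand-in for one complete PSI-BLAST run, and the toy hit table (sequence ids and E-values) is invented. Only the E < 0.001 inclusion cutoff comes from the text.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (sequence id, E-value)


def transitive_search(
    seed: str,
    search: Callable[[str], List[Hit]],
    e_cutoff: float = 1e-3,
) -> Dict[str, str]:
    """Breadth-first transitive homology search.

    Every previously unseen hit with an E-value below `e_cutoff` is accepted
    and queued as a new query. Returns a map from each recovered sequence to
    the query that first recovered it (the seed maps to itself).
    """
    found_by: Dict[str, str] = {seed: seed}
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for hit, evalue in search(query):
            if evalue < e_cutoff and hit not in found_by:
                found_by[hit] = query
                queue.append(hit)
    return found_by


def trace(found_by: Dict[str, str], member: str) -> List[str]:
    """Reconstruct the chain of queries leading from the seed to `member`."""
    path = [member]
    while found_by[path[-1]] != path[-1]:
        path.append(found_by[path[-1]])
    return path[::-1]


# Hypothetical hit table standing in for PSI-BLAST runs (ids and E-values invented).
TOY_HITS: Dict[str, List[Hit]] = {
    "seed": [("A", 1e-10), ("B", 0.5)],
    "A": [("C", 1e-6), ("seed", 1e-10)],
    "B": [("E", 1e-30)],  # never queried, because B itself fails the cutoff
    "C": [("D", 1e-4)],
}

members = transitive_search("seed", lambda q: TOY_HITS.get(q, []))
print(sorted(members))      # ['A', 'C', 'D', 'seed']
print(trace(members, "D"))  # ['seed', 'A', 'C', 'D']
```

`trace` recovers the kind of chain reported in the text, where one query recovers a homolog that in turn recovers the next family.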
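The first, automatic step of the family classification can be sketched as single-linkage clustering: two sequences end up in the same cluster if they are linked directly or through intermediates by pairwise matches that pass the cutoffs. BLASTCLUST clusters by single linkage over pairwise matches that satisfy score and length-coverage thresholds; the sketch below reproduces only the linkage step with a union-find structure, assumes pairwise identity and coverage were computed elsewhere, and uses hypothetical sequence ids and hypothetical 0.3/0.6 cutoffs rather than the settings used in the study.

```python
from typing import Dict, Iterable, List, Tuple

# (id_a, id_b, fractional identity, fractional alignment coverage)
Pair = Tuple[str, str, float, float]


def single_linkage(
    ids: Iterable[str],
    pairs: Iterable[Pair],
    min_identity: float = 0.3,  # hypothetical cutoff
    min_coverage: float = 0.6,  # hypothetical cutoff
) -> List[List[str]]:
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairs:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for i in parent:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise results; s1 and s3 are linked only through s2.
PAIRS: List[Pair] = [
    ("s1", "s2", 0.50, 0.90),
    ("s2", "s3", 0.40, 0.80),
    ("s3", "s4", 0.20, 0.90),  # identity too low: no link
    ("s4", "s5", 0.35, 0.70),
]
print(single_linkage(["s1", "s2", "s3", "s4", "s5"], PAIRS))
# [['s1', 's2', 's3'], ['s4', 's5']]
```

In the study, such clusters were further combined using shared sequence features and domain architectures; that judgement step is not captured here.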
2.2 Deciphering the DOMON domain's function: evidence from sequence and structure
The above characterization of the DOMON superfamily led to the identification of experimentally characterized versions that bind different ligands. Of these, EDHγ and CDH bind a single heme moiety, NirT-C binds a di-heme cofactor and CBD9 binds either glucose or cellobiose (Devreese et al., 2000; Hallberg et al., 2000; Kloer et al., 2006; Notenboom et al., 2001). Four members of the superfamily, representing diverse families, have crystal structures, three of which contain a bound soluble ligand (Fig. 2). In all these structures, the ligand is bound in a strikingly similar fashion, in a comparable pocket formed by one of the sheets of the β-sandwich (Fig. 2). The presence of a similarly bound ligand, together with the conservation of the structural features required to maintain the binding pocket (Figs 1 and 2), suggests that binding a soluble ligand is likely to be the conserved function of this domain. To understand the shared ligand-binding features of the superfamily, we extracted all ligand-interacting residues from the available structures and compared them with the conservation pattern seen in the multiple alignment.
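The comparison of ligand-contacting residues with alignment conservation can be illustrated with a simple per-column score. The text does not state which conservation measure was used; this sketch swaps in Shannon entropy (lower entropy means a more conserved column), and the four-sequence alignment and the ligand-contact column indices are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy in bits of one alignment column; gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return float("nan")
    n = len(residues)
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def entropy_profile(alignment: List[str]) -> List[float]:
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([row[i] for row in alignment]) for i in range(length)]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical toy alignment and ligand-contact columns (0-based).
ALIGNMENT = ["AMKVHL", "GMRIHV", "SMKLHA", "TMEVHI"]
LIGAND_COLUMNS: Set[int] = {1, 4}

profile = entropy_profile(ALIGNMENT)
contact = mean([h for i, h in enumerate(profile) if i in LIGAND_COLUMNS])
other = mean([h for i, h in enumerate(profile) if i not in LIGAND_COLUMNS])
print(profile)         # [2.0, 0.0, 1.5, 1.5, 0.0, 2.0]
print(contact, other)  # 0.0 1.75
```

A real alignment would also need sequence weighting and a gap policy; the sketch only checks whether ligand-contact positions are less variable than the rest.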
As suggested by previous studies, the two heme-binding versions, EDHγ and CDH, contain a conserved methionine in the curved loop between conserved strands 5 and 6, which directly ligates the heme (Hallberg et al., 2000; Kloer et al., 2006). These heme-binding versions also share a histidine or lysine residue at the beginning of the terminal strand that directly contacts the ligand. The primary ligand-contacting residue in the sugar-binding CBD9 family is a conserved arginine that occupies the same position as the heme-interacting H/K residue of the above versions and contacts the polar groups of the sugar moiety. The CBD9 family also contains a unique conserved tryptophan in a large insert in the last strand, which was previously shown to stack against the sugar (Notenboom et al., 2001; Figs 1 and 2). At least six of the nine families, namely DOMON, EDHγ-like, cytochrome b558/556-like, HCH_03667-like, NirT C-terminus-like and Gibberella zeae FG07921.1-like, have the conserved ligand-contacting histidine or lysine at the base of the terminal strand. Of these, most members of the DOMON, EDHγ-like and cytochrome b558/556-like families also contain the methionine in the insert between strands 5 and 6, strongly supporting a heme-binding function for these versions. Although they lack the methionine, the corresponding inserts of the NirT C-terminus and HCH_03667 families contain conserved histidine residues that could provide an alternative heme ligand (Fig. 1). Some members of the NirT C-terminus family also contain a conserved insert between strands 1 and 2 with two cysteine and histidine residues that might contribute to binding a second heme. The functionally obscure Gibberella zeae FG07921.1-like proteins have a conserved tyrosine in place of the methionine in the insert. Given that tyrosine has been observed as a heme ligand in unrelated heme-binding Ig-fold proteins, it might functionally substitute for the methionine (Fig. 1). The glucodextranase C-terminal domain family shares with the CBD9 family both the conserved arginine at the base of the terminal strand and the distinctive terminal-strand insert carrying the conserved tryptophan, suggesting a similar sugar-binding function. The Shewanella SO2192-like family is also related to the sugar-binding versions.
Despite the common ligand, very few of the other residues lining the binding site in the DOMON domains of CDH and EDHγ are conserved across both the known and predicted heme-binding versions. Likewise, few residues beyond those described above are shared by the binding sites of the known or predicted sugar-binding forms. At least for the heme-binding forms, this might indicate that the only other constraint on the ligand-binding site is the maintenance of its general hydrophobicity, with the poorly conserved ligand-contacting residues only generically complementing the primary residues through non-specific contacts. It is, however, possible that these differences in the binding pocket have a more subtle effect on the redox properties of the bound heme. Comparisons of the carbohydrate-binding versions suggest that, at least in some cases, such differences might translate into differences in the bound ligand. Nevertheless, the presence of at least one shared polar ligand-contacting position at the base of the terminal strand in both the sugar- and heme-binding versions supports an ancestral ligand-binding role for the DOMON superfamily.
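The residue-level reasoning above amounts to a small decision rule over three diagnostic positions. The sketch below encodes it in deliberately crude form, as an illustration rather than a classifier used in the study: the `KeySites` fields, the category labels and the example inputs are hypothetical, and a real assignment would rest on the full alignment and structural context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeySites:
    loop_5_6: Optional[str]       # residue in the strand 5-6 insert (None if not conserved)
    terminal_base: Optional[str]  # residue at the base of the terminal strand
    terminal_trp: bool            # conserved Trp in an insert in the terminal strand?


def predict_ligand(s: KeySites) -> str:
    """Map the diagnostic positions described in the text to a predicted ligand class."""
    if s.terminal_base in ("H", "K"):
        # Met is the canonical second heme ligand; His or Tyr may substitute for it.
        if s.loop_5_6 in ("M", "H", "Y"):
            return "heme"
        return "heme?"  # only one candidate heme ligand is conserved
    if s.terminal_base == "R" and s.terminal_trp:
        return "sugar"  # CBD9-like: Arg contacts the sugar, Trp stacks against it
    return "unknown"


# Hypothetical example inputs loosely modeled on the families discussed above.
examples = {
    "classical DOMON-like": KeySites("M", "H", False),
    "FG07921.1-like": KeySites("Y", "H", False),
    "CBD9-like": KeySites(None, "R", True),
    "DM-like (both heme ligands lost)": KeySites(None, None, False),
}
for name, sites in examples.items():
    print(f"{name}: {predict_ligand(sites)}")
```

The "unknown" outcome for the DM-like input reflects the point made in the text that such domains keep the pocket but lose the canonical heme ligands, so their ligand cannot be assigned from these positions alone.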
2.3 Deciphering the DOMON domain's function: evidence from contextual information
Contextual information in the form of domain architectures and gene neighborhoods is often used to gain functional insights into poorly characterized domains and domain families. Most proteins of the DOMON superfamily are secreted and contain a signal peptide. Among the heme-binding versions, the greatest architectural diversity is seen in the eukaryotic members of the classical DOMON family. One predominant architectural theme is association with cytochromes or with enzymatic domains whose activity involves redox or electron-transfer reactions. Thus, DOMON is fused to (1) a transmembrane cytochrome b561 domain in several proteins from diverse eukaryotes (Supplementary Material) and in the bacterial HCH_03667-like proteins; (2) Cu-ascorbate-dependent monooxygenases in dopamine β-monooxygenase-like proteins of animals, chlorophyte algae and diatoms; (3) cytochrome b5-like ferric reductases in ciliates, Phytophthora and Naegleria; (4) various Rossmann-fold FAD- and NAD-binding oxidoreductase domains, as in the fungal CDH and other oxidoreductases from Phytophthora and Naegleria; and (5) a cytochrome c domain in certain NirT proteins (e.g. Colwellia; gi: 71278993) (Gross et al., 2005). In several cases, the DOMON domain is fused to multiple cytochrome or Rossmann-fold oxidoreductase domains in the same polypeptide (Fig. 2, Supplementary Material). Interestingly, an archaeal DOMON-containing protein (Methanosaeta Mthe_1462) is fused to a ferritin-family iron-binding domain.
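Architecture-based inference of this kind starts from a tally of which domains are fused to DOMON in the same polypeptide. The sketch below shows that tally under stated assumptions: the domain labels are shorthand for the fusions listed above, each architecture is included once purely for illustration (these are not real counts), domain order is not used, and grouping partners into a "redox" class is an assumption made for the example.

```python
from collections import Counter
from typing import Iterable, List, Set


def fusion_partners(architectures: Iterable[List[str]], focal: str = "DOMON") -> Counter:
    """Count, once per protein, every domain fused to `focal` in the same polypeptide."""
    partners: Counter = Counter()
    for arch in architectures:
        if focal in arch:
            partners.update({d for d in arch if d != focal})
    return partners


# Illustrative architectures, one per fusion type named in the text (not real counts).
ARCHITECTURES = [
    ["DOMON", "cyt_b561"],
    ["DOMON", "Cu_monooxygenase"],
    ["DOMON", "cyt_b5_ferric_reductase"],
    ["DOMON", "FAD_NAD_oxidoreductase"],  # e.g. fungal CDH
    ["cyt_c", "DOMON"],                   # e.g. NirT, with DOMON C-terminal
    ["DOMON", "ferritin"],                # e.g. archaeal Mthe_1462
    ["EGF", "SEA"],                       # no DOMON: ignored
]
# Hypothetical grouping of partners into a redox/electron-transfer class.
REDOX: Set[str] = {"cyt_b561", "Cu_monooxygenase", "cyt_b5_ferric_reductase",
                   "FAD_NAD_oxidoreductase", "cyt_c"}

partners = fusion_partners(ARCHITECTURES)
redox_hits = sum(n for d, n in partners.items() if d in REDOX)
print(f"{redox_hits} of {sum(partners.values())} fusion partners are redox-related")
# 5 of 6 fusion partners are redox-related
```

With real data, the same tally over all DOMON-containing proteins would show which partner classes dominate; the toy ratio here carries no biological meaning.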
Experimental studies on the fungal CDH have shown that the heme bound by the DOMON domain takes part in electron transfer with the flavin cofactor of the oxidoreductase domain during the oxidation of cellobiose or cello-oligosaccharides (Stoica et al., 2006). The animal SDR2-like proteins, which are fused to a cytochrome b561 domain, have been shown to function as ferric reductases (Vargas et al., 2003). Together, these observations suggest that the heme-binding versions of the DOMON domain are cytochromes mediating electron transfer in redox reactions. This prediction offers an important functional clue for the several animal extracellular matrix proteins in which the DOMON domain is fused to adhesion modules such as EGF, reelin, trypsin-inhibitor and SEA domains, as well as for the poly-DOMON proteins (Aravind, 2001). An examination of these proteins shows that at least one copy of the DOMON domain in them contains the conventional heme ligands, namely the methionine and the histidine/lysine. This suggests that, rather than being passive extracellular structural proteins, they are likely to function as cytochromes in as yet unidentified redox reactions, potentially related to protein hydroxylation or oxidative cross-linking. In many of these animal proteins, the DOMON domain co-occurs with the DM13 domain, which is also predicted to adopt a β-strand-rich fold. Interestingly, the DM13 domain contains a nearly invariant cysteine that could participate in a redox reaction, either as a free thiol or by binding a prosthetic group such as heme. The DOMON domains of some members of the dopamine β-monooxygenase family, such as MOXD1, contain a conventional heme-binding pocket, suggesting that they function as cytochromes supplying electrons for the monooxygenase reaction. The DOMON domains of vertebrate DM and of the arthropod and nematode tyramine β-hydroxylases lack the heme-coordinating methionine and histidine.
However, their ligand-binding pocket is predicted to remain intact (Fig. 1), suggesting that they either bind an unknown ligand or bind heme weakly, which might be critical for the properties of these enzymes.
The DOMON domains of the Shewanella SO2192-like family are all extracellular/periplasmic domains of receptor histidine kinases (Fig. 2). These kinases are encoded in predicted operons that also contain a neighboring gene for a protein with an HTH domain fused to a receiver domain. Together, these proteins are likely to constitute a two-component system that potentially senses environmental sugars via the DOMON domain. The CBD9 and glucodextranase C-terminal-type DOMON domains are found in proteins fused to other sugar-binding domains, S-layer homology domains and sugar transferases, or are encoded in operons with possible sugar transporters, consistent with a role in polysaccharide metabolism.
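Operon predictions of this kind commonly rest on gene-neighborhood evidence such as same-strand orientation and short intergenic distances. The sketch below groups adjacent same-strand genes separated by at most `max_gap` base pairs; the 100 bp cutoff, the gene names and all coordinates are hypothetical, and the text does not state which criteria were actually used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (1-based), independent of strand
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def predict_operons(genes: List[Gene], max_gap: int = 100) -> List[List[str]]:
    """Group neighboring same-strand genes separated by <= max_gap bp (overlaps allowed)."""
    operons: List[List[str]] = []
    prev: Optional[Gene] = None
    for gene in sorted(genes, key=lambda g: g.start):
        if (prev is not None and gene.strand == prev.strand
                and gene.start - prev.end - 1 <= max_gap):
            operons[-1].append(gene.name)  # extend the current operon
        else:
            operons.append([gene.name])    # start a new operon
        prev = gene
    return operons


# Hypothetical locus: a DOMON histidine kinase next to an HTH-receiver regulator.
GENES = [
    Gene("hk_DOMON", 1000, 3170, "+"),
    Gene("rr_HTH_receiver", 3190, 3900, "+"),
    Gene("unrelated", 4500, 5200, "+"),
    Gene("opposite_strand", 5260, 6000, "-"),
]
print(predict_operons(GENES))
# [['hk_DOMON', 'rr_HTH_receiver'], ['unrelated'], ['opposite_strand']]
```

A fixed distance cutoff is the simplest choice; conservation of the same gene pair across several genomes is generally stronger support for a functional link.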
3 EVOLUTIONARY HISTORY AND CONCLUSIONS
Our investigations show that, in addition to eukaryotes, the DOMON superfamily is widely distributed in phylogenetically diverse bacteria and sporadically in archaea. The domain shows its greatest diversity in the bacterial superkingdom, in terms of both the number of distinct families and domain architectures. This is consistent with the extraordinary diversity of other β-sandwich domains seen in bacteria. These observations suggest a possible bacterial origin for this domain, followed by dispersal through lateral transfer to eukaryotes and certain archaea. The divergence of the domain into heme- and sugar-binding versions also appears to have occurred in bacteria. At least three distinct families of the DOMON superfamily have been transferred to eukaryotes, of which the classical DOMON and the Gibberella zeae FG07921.1-like families are present in a wide range of eukaryotes, suggesting an early transfer. Likewise, we identified several bacterial versions (Supplementary Material) of the DM13 domain, previously known only from animals. Thus, the DM13 domain and the classical DOMON family, both ultimately of bacterial origin, appear to have proliferated in animal extracellular proteins. We believe that identifying the DOMON domain as a cytochrome or a sugar-binding domain will help clarify its biochemical properties. In particular, we hope that it will aid the exploration of a hitherto unknown, predicted electron-transfer system possibly involved in modifying animal extracellular proteins.
ACKNOWLEDGEMENT
The authors acknowledge the Intramural Research Program of the NLM, National Institutes of Health, USA, for funding their research.
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: John Quackenbush
Received on June 13, 2007; revised on July 19, 2007; accepted on August 8, 2007
REFERENCES
Aravind L. (2001) DOMON: an ancient extracellular domain in dopamine beta-monooxygenase and other proteins. Trends Biochem. Sci., 26, 524–526.
Devreese B, et al. (2000) Primary structure characterization of a Rhodocyclus tenuis diheme cytochrome c reveals the existence of two different classes of low-potential diheme cytochromes c in purple phototropic bacteria. Arch. Biochem. Biophys., 381, 53–60.
Gross R, et al. (2005) Site-directed modifications indicate differences in axial haem c iron ligation between the related NrfH and NapC families of multihaem c-type cytochromes. Biochem. J., 390, 689–693.
Hallberg BM, et al. (2000) A new scaffold for binding haem in the cytochrome domain of the extracellular flavocytochrome cellobiose dehydrogenase. Structure, 8, 79–88.
Kloer DP, et al. (2006) Crystal structure of ethylbenzene dehydrogenase from Aromatoleum aromaticum. Structure, 14, 1377–1388.
Notenboom V, et al. (2001) Crystal structures of the family 9 carbohydrate-binding module from Thermotoga maritima xylanase 10A in native and ligand-bound forms. Biochemistry, 40, 6248–6256.
Ponting CP. (2001) Domain homologues of dopamine beta-hydroxylase and ferric reductase: roles for iron metabolism in neurodegenerative disorders? Hum. Mol. Genet., 10, 1853–1858.
Stoica L, et al. (2006) Direct electron transfer – a favorite electron route for cellobiose dehydrogenase (CDH) from Trametes villosa. Comparison with CDH from Phanerochaete chrysosporium. Langmuir, 22, 10801–10806.
Vargas JD, et al. (2003) Stromal cell-derived receptor 2 and cytochrome b561 are functional ferric reductases. Biochim. Biophys. Acta, 1651, 116–123.

