Bioinformatics Advance Access originally published online on July 28, 2006
Bioinformatics 2006 22(20):2459-2462; doi:10.1093/bioinformatics/btl414
SANTA domain: a novel conserved protein module in Eukaryota with potential involvement in chromatin regulation
Centre for Advanced Research in Environmental Genomics (CAREG), Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Since packaging of DNA in the chromatin structure restricts the accessibility for regulatory factors, chromatin remodeling is required to facilitate nuclear processes such as gene transcription, replication, and genome recombination. Many conserved non-enzymatic protein domains have been identified that contribute to the activities of multiprotein remodeling complexes. Here we identified a novel conserved protein domain in Eukaryota whose putative function may be in regulating chromatin remodeling. Since this domain is associated with a known SANT domain in several vertebrate proteins, we named it the SANTA (SANT Associated) domain. Sequence analysis showed that the SANTA domain is approximately a 90 amino acid module and likely composed of four central β-sheets and three flanking α-helices. Many hydrophobic residues exhibited high conservation along the domain, implying a possible function in protein-protein interactions. The SANTA domain was identified in mammals, chicken, frog, fish, sea squirt, sea urchin, worms and plants. Furthermore, a phylogenetic tree of SANTA domains showed that one plant-specific duplication event happened in the Viridiplantae lineage.
Contact: trudeauv{at}uottawa.ca
Supplementary Information: Supplementary Figure S1 for this paper is available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
In the nucleus of eukaryotic cells, DNA is wrapped around octamers of histone proteins to form nucleosomes, which further assemble into higher-order chromatin structure. This type of packaging compacts the entire genomic DNA inside the nucleus; however, it also prevents access to DNA for many regulatory proteins essential for transcription, replication, DNA repair, recombination and chromosome segregation. Therefore, a dynamic change of chromatin structure (chromatin remodeling) is both necessary and fundamental to the function and regulation of these nuclear genetic processes. Two distinct classes of enzymes have been implicated in these processes: ATP-dependent chromatin remodeling enzymes (e.g. SWI2/SNF2 ATPase) (Flaus and Owen-Hughes, 2004) and the histone-modifying enzymes that catalyze the chemical modification of histone tails (e.g. acetylation, methylation and ubiquitination) (Strahl and Allis, 2000). Usually, these enzymes act together with many associated subunits in the context of large multiprotein complexes. Moreover, diverse complexes are not functionally independent, but rather cooperate to regulate chromatin structure (Narlikar et al., 2002).
Many conserved non-enzymatic protein modules or domains have been identified and characterized among these large remodeling complexes, such as the SANT domain (Aasland et al., 1996; Boyer et al., 2004), chromodomains (Koonin et al., 1995; Brehm et al., 2004), bromodomains (Dhalluin et al., 1999), and the PHD finger (Bienz, 2005). It has been proposed that these non-enzymatic domains contribute substantially to chromatin remodeling, based on the following findings. First, these domains commonly bind histones, with specificity for particular modification states of the histone tails. For example, bromodomains selectively bind lysine-acetylated histones (Dhalluin et al., 1999), chromodomains bind lysine-methylated histones (Brehm et al., 2004), and the SANT domain binds unacetylated histones (Boyer et al., 2004). Second, they mediate interactions between remodeling proteins and other regulators such as transcription factors. For example, SANT domains in diverse proteins can interact with the transcription factor Sp1 (Ding et al., 2004), the Brf1 subunit of RNA polymerase (Kassavetis et al., 2006), and several nuclear receptors (Wang et al., 2006). Third, some domains (e.g. chromodomains) possess DNA-binding and RNA-binding activities (Brehm et al., 2004). Thus, non-enzymatic domains mediate complex interactions among remodeling proteins, histones, DNA, RNA and other regulators, which facilitate the assembly of multiprotein complexes and the targeting of substrate recognition during chromatin remodeling.
In addition to well-studied domains, novel remodeling-related domains continue to be identified, including SWIRM (Aravind and Iyer, 2002), DBINO (Bakshi et al., 2004), and Epc-N (Perry, 2006), which shed light on new mechanisms of chromatin regulation. In the present study, we describe a previously uncharacterized domain that we propose is involved in regulating chromatin remodeling. One important characteristic of this domain is that it is associated with the known SANT domain in a set of vertebrate proteins; we therefore named it the SANTA (SANT Associated) domain.
| 2 RESULTS AND DISCUSSION |
|---|
2.1 Identification and characterization of the SANTA domain
The SANTA domain was initially identified as a conserved region (residues 321 to 408) in one zebrafish protein sequence (GI: 54792766), which also harbors one known SANT domain (Pfam: PF00249, also named Myb-like DNA-binding domain) in its C-terminal region. The conservation was further illustrated by PSI-Blast searching (Altschul et al., 1997) with strict criteria (E-value < 0.0005) against the protein non-redundant database at the National Center for Biotechnology Information (NCBI) and by Blast searching in the Ensembl database (Hubbard et al., 2005). Furthermore, the existence of the domain sequences was confirmed by extracting mRNA or EST data from the NCBI nucleotide database (zebrafish GI: 54792765, mouse GI: 31044420, human GI: 15620864, sea squirt GI: 23575791, Caenorhabditis elegans GI: 71988494, Arabidopsis thaliana GI: 18406213, Oryza sativa GI: 32987831 and O.sativa GI: 32996314). The N- and C-terminal boundaries were fixed based on the multiple alignment of the full sequences of SANTA-containing proteins.
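The strict E-value criterion applied in the searches can be illustrated with a minimal sketch. It assumes that hits have been exported as tab-separated text with a subject identifier in the first column and the E-value in the second; that layout, the function name `filter_hits`, and the sample rows are hypothetical and are not taken from the study.

```python
E_VALUE_CUTOFF = 0.0005  # threshold stated in the text


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, e_value) pairs whose E-value is below the cutoff.

    Assumes each line is "subject_id<TAB>e_value"; this layout is hypothetical.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, e_value_text = line.split("\t")[:2]
        e_value = float(e_value_text)
        if e_value < cutoff:
            kept.append((subject_id, e_value))
    return kept


# Made-up rows for illustration only.
example_rows = [
    "hit_A\t1e-30",
    "hit_B\t0.002",
    "hit_C\t4e-4",
]
significant = filter_hits(example_rows)
print(significant)
```

Only the rows below the cutoff are retained; the cutoff is a parameter so that a looser or stricter threshold can be tried without changing the function.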
Many homologous sequences for the SANTA domain are present in mammals, fish, sea squirt, sea urchin, worms and plants. Sequence analysis shows it is approximately a 90 amino acid module (Fig. 1). The putative secondary structure is likely composed of four central β-sheets and three flanking α-helices at its termini (H-EEEE-HH) (H for α-helix and E for β-sheet), which is consistently supported by three popular structural prediction programs with distinct algorithms. Many hydrophobic residues exhibited high conservation along the domain and fell into several characteristic motifs, such as the N-terminal LxxW motif, the central SxxIxxR and TxSGxxxxLxG motifs, and the C-terminal FxxGFPxxW motif. The conserved residues are positionally consistent within these secondary structures, suggesting a critical role of these residues in structure and function. For example, several conserved glycine residues most likely contribute to forming the turn structures located between sheet3 and sheet4, and between helix2 and helix3.
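As an illustration of how the motifs named above could be located in a candidate sequence, the following sketch translates each motif into a regular expression in which "x" matches any residue. The function names and the toy sequence are assumptions made for this example; the sequence is synthetic and is not a real SANTA domain.

```python
import re

# Motifs as written in the text; "x" stands for any residue.
MOTIFS = ["LxxW", "SxxIxxR", "TxSGxxxxLxG", "FxxGFPxxW"]


def motif_to_regex(motif):
    """Translate a motif such as "LxxW" into a compiled lookahead regex.

    The lookahead makes overlapping occurrences visible to finditer.
    """
    return re.compile("(?=" + motif.replace("x", ".") + ")")


def find_motifs(sequence, motifs=MOTIFS):
    """Map each motif to the 0-based start positions where it matches."""
    return {
        motif: [m.start() for m in motif_to_regex(motif).finditer(sequence)]
        for motif in motifs
    }


# Synthetic sequence assembled so that each motif occurs once (hypothetical).
toy_sequence = (
    "AA" + "LKEW" + "AA" + "SAAIAAR" + "AA" + "TASGAAAALAG" + "AA" + "FAAGFPAAW" + "AA"
)
print(find_motifs(toy_sequence))
```

A plain pattern match of this kind only flags candidate positions; it does not replace the profile-based searches and alignments described in the text.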
Interestingly, the basic structural composition of H-EEEE-H has been described in the recently identified PATAN domain (Makarova et al., 2006). However, careful comparison of the PATAN and SANTA domains showed that the SANTA domain differs in the relative locations of its secondary structure elements and in having two distinct C-terminal α-helices. Furthermore, no similar structures were found in our searches of the 3D-Jury (Ginalski et al., 2003) and GeneSilico (Kurowski and Bujnicki, 2003) meta-servers, which use various protein fold recognition methods, implying that the structure of the SANTA domain potentially defines a novel fold.
2.2 Domain architectures and putative function of the SANTA domain
Domain architectures (Fig. 2) show that in mammal, frog, chicken, and fish proteins, the SANTA domain is generally associated with the SANT domain (Pfam: PF00249). In one A.thaliana protein (GI: 15217905), the SANTA domain is coupled with a KIP1-like protein domain (Pfam: PF07765). In addition, unique SANTA domains were present in the proteins of other species such as sea squirt, sea urchin, worms, plants, and gray short-tailed opossum (Monodelphis domestica), suggesting that SANTA defines a novel protein family.
The association of the SANTA and SANT domains in many vertebrate proteins implies that these two domains are functionally related. The SANT domain is a conserved protein module found in a number of chromatin remodeling proteins with multiple activities such as DNA-binding, histone tail binding, and protein-protein interactions (Aasland et al., 1996; Boyer et al., 2004; Wang et al., 2006). The SANT domain appears to represent a central module characteristic of chromatin regulation, since it is more broadly represented among remodeling proteins than other chromatin-related domains (e.g. SWIRM domain, bromodomain and PHD finger) (Boyer et al., 2004). Also, the SANT domain usually cooperates with other chromatin-related domains to constitute diverse sets of homologous proteins in different species. Here, the SANTA domain is similarly associated with SANT domains in many vertebrate proteins, implying a putative role in chromatin remodeling. Furthermore, the highly conserved hydrophobic residues among SANTA domains (Fig. 1) suggest a function in protein-protein interactions rather than in binding charged nucleotides. The KIP1-like protein domain is another known SANTA-coupled module, which is conserved among some plant proteins. Although little functional information is available for this domain, one KIP1-containing protein is known to be involved in signaling regulation through interaction with the kinase domain of PRK1, a receptor-like kinase in plant pollen (Skirpan et al., 2001).
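The notion of hydrophobic residues being conserved along an alignment can be made concrete with a small sketch that reports, for each alignment column, the fraction of sequences carrying a hydrophobic residue. The residue grouping, the 0.8 threshold, the function names and the toy alignment are all assumptions chosen for illustration; they are not the criteria used to produce Fig. 1.

```python
HYDROPHOBIC = set("AVILMFWC")  # one common grouping; other definitions exist


def hydrophobic_fraction(alignment):
    """Return, per column, the fraction of sequences with a hydrophobic residue.

    `alignment` is a list of equal-length aligned sequences; gaps ("-") count
    as non-hydrophobic.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    fractions = []
    for column in zip(*alignment):
        hydrophobic_count = sum(1 for residue in column if residue in HYDROPHOBIC)
        fractions.append(hydrophobic_count / len(alignment))
    return fractions


def conserved_hydrophobic_columns(alignment, threshold=0.8):
    """Return 0-based indices of columns at or above the threshold."""
    return [
        index
        for index, fraction in enumerate(hydrophobic_fraction(alignment))
        if fraction >= threshold
    ]


# Toy alignment invented for illustration; these are not SANTA sequences.
toy_alignment = [
    "LKEWS",
    "IRDWT",
    "VK-FS",
    "LQEYG",
]
print(conserved_hydrophobic_columns(toy_alignment))
```

Lowering the threshold admits columns in which a minority of sequences carry a polar or aromatic substitution, so the choice of cutoff and residue grouping directly shapes which positions are reported as conserved.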
The proposed chromatin remodeling function of the SANTA domain is further supported by published functional information about SANTA-containing proteins. Yeast two-hybrid screening of a human colon carcinoma cell line indicates that a human protein (GI: 42415492) harboring both SANTA and SANT domains interacts with the transcription factor Sp1 (Gunther et al., 2000), suggesting that the SANTA domain participates in transcriptional regulation, a process requiring chromatin remodeling. Similarly, a recent paper described an interaction between a SANT domain and Sp1 in regulating human MI-ER1 expression (Ding et al., 2004). Moreover, expression information from EST or mRNA data indicates that SANTA-containing proteins may have biological functions in the zebrafish embryonic inner ear (Coimbra et al., 2000), in preimplantation development of mouse embryos (Ko et al., 2000), and in human brain (Nagase et al., 2001).
2.3 Unique evolutionary history of SANTA domains
Phylogenetic analysis of SANTA domains offers insight into unique characteristics of these domains. Overall, there are six evolutionary lineages in which we could identify SANTA domains: Tetrapoda, Teleostei, Urochordata, Nematoda, Echinodermata, and Viridiplantae (Fig. 3). Interestingly, the SANTA domains in many proteins from the two closely related lineages Tetrapoda and Teleostei are coupled with the SANT domain, whereas SANTA domains in the more ancient lineages are not. This implies that the association between the SANT and SANTA domains arose relatively recently in evolution. There is one exception. In the gray short-tailed opossum (M.domestica) protein, the SANTA domain is not associated with a SANT domain (Fig. 2), although it belongs to the Tetrapoda lineage (Supplementary Figure S1).
It is well documented that ancient fish-specific whole genome duplication events occurred after the divergence of the teleost and tetrapod lineages. This resulted in multiple copies of many genes, for example, the Hox gene family (Amores et al., 1998) and the Midkine growth factor (Winkler et al., 2003). However, in the case of SANTA domains only one homolog was identified in each published genome within the Teleostei and other Metazoa lineages, suggesting that homologous SANTA domains may have been lost during evolution. Interestingly, we identified two SANTA homologs in each of the genomes of O.sativa and A.thaliana, which belong to the Viridiplantae lineage. They have different chromosome (Chr) locations: At1 and At2 are located on A.thaliana Chr 1 and Chr 5, respectively, whereas Os1 and Os2 are located on O.sativa Chr 4 and Chr 1, respectively. The phylogenetic relationships among the plant homologs further suggest that the SANTA domain may have undergone one plant-specific duplication event after the divergence of the Metazoa and Viridiplantae lineages. This kind of plant-specific duplication event is consistent with the proposed function of the SANTA domain in chromatin regulation, since Shiu et al. (2005) recently reported that transcription factor families, but not other genes, underwent dramatic expansion in plants when compared to other eukaryotes.
| 3 CONCLUSION |
|---|
One hallmark of chromatin regulation is that many conserved non-enzymatic domains among remodeling complexes play crucial roles in the assembly of multiprotein remodeling complexes and in targeting substrate recognition. Discovering new domains and defining their functions will improve our understanding of the molecular mechanisms of chromatin remodeling. Here we described a previously uncharacterized conserved SANTA (SANT Associated) domain, which we speculate is involved in regulating chromatin remodeling. Sequence and structural analysis showed that the SANTA domain defines a unique protein family characterized by a structural fold of H-EEEE-HH with putative protein-protein interaction activity. Phylogenetic analysis showed that the SANTA domain underwent one plant-specific duplication event in the Viridiplantae lineage. The proposed function of the SANTA domain in chromatin remodeling is experimentally supported by evidence that a human SANTA-containing protein interacts with the transcription factor Sp1. The diverse proteins we identified that contain the novel SANTA domain are largely uncharacterized. Defining the exact biological role of the SANTA domains will require comprehensive functional analysis such as mutational and structural assessments.
| Acknowledgments |
|---|
We would like to thank Dr. Stephane Aris-Brosou and Susanna C. Wiens for critical reading of this manuscript and for helpful comments. The financial support from the University of Ottawa International Scholarship program (to D.Z.), and the NSERC Discovery and Strategic programs (to V.L.T.) is gratefully appreciated.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alex Bateman
Received on June 8, 2006; revised on July 21, 2006; accepted on July 25, 2006
| REFERENCES |
|---|
Aasland, R., et al. (1996) The SANT domain: a putative DNA-binding domain in the SWI-SNF and ADA complexes, the transcriptional co-repressor N-CoR and TFIIIB. Trends Biochem. Sci., 21, 87-88.
Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Amores, A., et al. (1998) Zebrafish hox clusters and vertebrate genome evolution. Science, 282, 1711-1714.
Aravind, L. and Iyer, L.M. (2002) The SWIRM domain: a conserved module found in chromosomal proteins points to novel chromatin-modifying activities. Genome Biol., 3, 8.
Bakshi, R., et al. (2004) In silico characterization of the INO80 subfamily of SWI2/SNF2 chromatin remodeling proteins. Biochem. Biophys. Res. Commun., 320, 197-204.
Bateman, A., et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138-D141.
Bienz, M. (2005) The PHD finger, a nuclear protein-interaction domain. Trends Biochem. Sci., 2006, 31, 35-40.
Boyer, L.A., et al. (2004) The SANT domain: a unique histone-tail-binding module? Nat. Rev. Mol. Cell Biol., 5, 158-163.
Brehm, A., et al. (2004) The many colours of chromodomains. Bioessays, 26, 133-140.
Coimbra, R.S., et al. (2000) A subtracted cDNA library from the zebrafish (Danio rerio) embryonic inner ear. Genome Res., 2002, 12, 1007-1011.
Dhalluin, C., et al. (1999) Structure and ligand of a histone acetyltransferase bromodomain. Nature, 399, 491-496.
Ding, Z., et al. (2004) The SANT domain of human MI-ER1 interacts with Sp1 to interfere with GC box recognition and repress transcription from its own promoter. J. Biol. Chem., 279, 28009-28016.
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792-1797.
Felsenstein, J. (1989) PHYLIP-Phylogeny Inference Package (Version 3.2). Cladistics, 5, 164-166.
Flaus, A. and Owen-Hughes, T. (2004) Mechanisms for ATP-dependent chromatin remodelling: farewell to the tuna-can octamer? Curr. Opin. Genet. Dev., 14, 165-173.
Ginalski, K., et al. (2003) 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics, 19, 1015-1018.
Goodstadt, L. and Ponting, C.P. (2001) CHROMA: consensus-based colouring of multiple alignments for publication. Bioinformatics, 17, 845-846.
Gunther, M., et al. (2000) A set of proteins interacting with transcription factor Sp1 identified in a two-hybrid screening. Mol. Cell Biochem., 210, 131-142.
Hubbard, T., et al. (2005) Ensembl 2005. Nucleic Acids Res., 33, D447-D453.
Kassavetis, G.A., et al. (2006) Mapping the principal interaction site of the Brf1 and Bdp1 subunits of S. cerevisiae TFIIIB. J. Biol. Chem., 281, 14321-14329.
Ko, S.H., et al. (2000) Large-scale cDNA analysis reveals phased gene expression patterns during preimplantation mouse development. Development, 127, 1737-1749.
Koonin, E.V., et al. (1995) The chromo superfamily: new members, duplication of the chromo domain and possible role in delivering transcription regulators to chromatin. Nucleic Acids Res., 23, 4229-4233.
Kumar, S., et al. (2004) MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform., 5, 150-63.
Kurowski, M.A. and Bujnicki, J.M. (2003) GeneSilico protein structure prediction meta-server. Nucleic Acids Res., 31, 3305-3307.
Letunic, I., et al. (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res., 30, 242-244.
Lin, K., et al. (2005) A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics, 21, 152-159.
Makarova, K., et al. (2006) Cyanobacterial response regulator PatA contains a conserved N-terminal domain (PATAN) with an alpha-helical insertion. Bioinformatics, 22, 1297-1301.
Marchler-Bauer, A., et al. (2005) CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res., 33, D192-D196.
McGuffin, L.J., et al. (2000) The PSIPRED protein structure prediction server. Bioinformatics, 16, 404-405.
Nagase, T., et al. (2001) Prediction of the coding sequences of unidentified human genes. XXI. The complete sequences of 60 new cDNA clones from brain which code for large proteins. DNA Res., 8, 179-187.
Narlikar, G.J., et al. (2002) Cooperation between complexes that regulate chromatin structure and transcription. Cell, 108, 475-487.
Perry, J. (2006) The Epc-N domain: a predicted protein-protein interaction domain found in select chromatin associated proteins. BMC Genomics, 7, 6.
Poirot, O., et al. (2003) Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments. Nucleic Acids Res., 31, 3503-3506.
Pollastri, G., et al. (2002) Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, 47, 228-235.
Shiu, S., et al. (2005) Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol., 139, 18-26.
Simossis, V.A. and Heringa, J. (2005) PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Res., 33, W289-W294.
Skirpan, A.L., et al. (2001) Isolation and characterization of kinase interacting protein 1, a pollen protein that interacts with the kinase domain of PRK1, a receptor-like kinase of petunia. Plant Physiol., 126, 1480-1492.
Strahl, B.D. and Allis, C.D. (2000) The language of covalent histone modifications. Nature, 403, 41-45.
Wang, L., et al. (2006) Histone deacetylase-associating Atrophin proteins are nuclear receptor corepressors. Genes Dev., 20, 525-530.
Winkler, C., et al. (2003) Functional divergence of two zebrafish midkine growth factors following fish-specific gene duplication. Genome Res., 13, 1067-1081.


