Bioinformatics Advance Access originally published online on October 14, 2004
Bioinformatics 2005 21(6):699-702; doi:10.1093/bioinformatics/bti065
Published by Oxford University Press 2004.
OCRE: a novel domain made of imperfect, aromatic-rich octamer repeats
Département de Biologie Structurale, LMCP, CNRS UMR7590, Universités Paris 6 & Paris 7 Case 115, 4 place Jussieu, 75252 Paris Cedex 05, France
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: In this study, we describe a novel domain, OCRE, which is shared by the recently identified angiogenic factor VG5Q and a specific family of RNA-binding motif proteins.
The OCRE domain is characterized by a 5-fold, imperfectly repeated octameric sequence, which includes a triplet of often-conserved aromatic amino acids predicted to form a β-strand and in which the slightly modified fifth repeat might act as a repeat terminator.
Although the function of this domain remains to be elucidated, the domain architecture of OCRE-containing proteins and experimental data suggest a role in RNA metabolism and/or in signalling pathways activated by the tumor necrosis factor superfamily of cytokines.
Contact: Isabelle.Callebaut{at}lmcp.jussieu.fr
| INTRODUCTION |
|---|
VG5Q was recently reported as the first susceptibility gene for Klippel-Trenaunay syndrome (KTS), a disorder characterized by diverse effects in the vascular system (Tian et al., 2004). Susceptibility to vascular defects typical of KTS is increased either by higher expression of the gene due to chromosomal translocation, or by a mutant protein which is assumed to be hyperactive. The VG5Q protein acts as an angiogenic factor, since gene expression at high levels promotes blood vessel growth. The VG5Q protein is secreted when vessel formation is initiated, binds to the surface of endothelial cells and interacts with TWEAK, a member of the tumor necrosis factor (TNF) superfamily that induces angiogenesis in vivo.
In search of information helping us to understand the molecular mechanism by which VG5Q promotes angiogenesis, we investigated further the domain architecture of this protein and identified a novel domain, which is shared by other proteins involved in RNA metabolism. This domain was referred to as OCRE, after OCtamer REpeat (see below).
| RESULTS AND DISCUSSION |
|---|
Identification of the OCRE domain
The VG5Q protein [GenBank identifier (gi) 45708564] contains a forkhead-associated (FHA) domain and a G-patch motif (Fig. 1). We further analyzed its domain architecture using hydrophobic cluster analysis (HCA) (Callebaut et al., 1997), a two-dimensional (2D) approach that is well suited to analyzing the 2D texture of proteins. We identified a distinct globular domain (residues 200-261; approximately one-third hydrophobic amino acids, gathered into clusters whose positions mainly correspond to those of regular secondary structures), flanked by non-hydrophobic sequences.
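The cluster-based reading of a sequence described above can be illustrated with a minimal one-dimensional sketch. It is not the HCA implementation used in this study: HCA works on a 2D helical plot, whereas this sketch only approximates cluster boundaries along the linear sequence. The hydrophobic alphabet (V, I, L, F, M, Y, W), the rule that a proline or four or more consecutive non-hydrophobic residues closes a cluster, and the function names are assumptions made for illustration.

```python
# Residues treated as strongly hydrophobic (assumed alphabet for this sketch).
HYDROPHOBIC = frozenset("VILFMYW")


def hydrophobic_fraction(seq):
    """Fraction of residues in `seq` that belong to the hydrophobic alphabet."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def hydrophobic_clusters(seq, gap=4):
    """Return (start, end, pattern) tuples for hydrophobic clusters.

    A cluster is closed by a proline or by `gap` or more consecutive
    non-hydrophobic residues. Coordinates are 1-based and inclusive;
    `pattern` is a binary string (1 = hydrophobic, 0 = other).
    """
    seq = seq.upper()
    clusters = []
    start = last = None  # first and most recent hydrophobic index of the open cluster

    def close(first, final):
        pattern = "".join(
            "1" if aa in HYDROPHOBIC else "0" for aa in seq[first:final + 1]
        )
        clusters.append((first + 1, final + 1, pattern))

    for i, aa in enumerate(seq):
        if aa == "P":
            # Proline always breaks an open cluster.
            if start is not None:
                close(start, last)
                start = last = None
        elif aa in HYDROPHOBIC:
            # Too many non-hydrophobic residues since the last hit: new cluster.
            if start is not None and i - last - 1 >= gap:
                close(start, last)
                start = None
            if start is None:
                start = i
            last = i
    if start is not None:
        close(start, last)
    return clusters
```

On the synthetic string `GSYYYDSGAAAAPLLG` (not a real OCRE sequence) this reports a three-residue cluster `111` at positions 3 to 5, the kind of cluster that the text associates with β-strands, and a second cluster after the proline.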
PSI-BLAST search (Altschul et al., 1997) of the non-redundant database (2 010 419 sequences) at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) using this domain as query identified statistically significant similarities to other proteins (BLASTP 2.2.9, inclusion E-value of 0.005; convergence after five iterations; 74 significant hits). Reciprocal searches using all the retrieved sequences as queries yielded several additional sequences, for a total of 100 hits. A representative set of aligned sequences is shown in Figure 2 and is accessible in the EMBL-Align database under the accession number ALIGN_000772. In particular, a potential ortholog of the VG5Q protein, sharing a similar domain architecture, was also retrieved in the pathogenic fungus Cryptococcus neoformans (CNBE2600; Fig. 2), in the twilight zone of several PSI-BLAST searches (E = 0.067 using VG5Q as query). The significance of this similarity was further assessed using HCA.
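For readers who wish to run a comparable search today, the sketch below assembles a command line for the `psiblast` program of the NCBI BLAST+ suite, using the inclusion E-value (0.005) and iteration count (five) reported above. The original search used BLASTP 2.2.9, so the BLAST+ executable is an assumption, as are the file names, the locally installed `nr` database and the helper name `psiblast_command`.

```python
def psiblast_command(query_fasta, db="nr", iterations=5,
                     inclusion_evalue=0.005, out="ocre_psiblast.txt"):
    """Build an argument list for the BLAST+ `psiblast` executable.

    `query_fasta` and `out` are hypothetical file names; `db` must point to
    a locally formatted protein database.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(inclusion_evalue),
        "-out", out,
    ]


if __name__ == "__main__":
    cmd = psiblast_command("vg5q_200_261.fasta")
    print(" ".join(cmd))
    # To actually run the search (requires BLAST+ and a local database):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it keeps the parameters inspectable. Results from a current database will differ from the 74 hits reported here, since the non-redundant database has grown considerably since 2004.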
In several of the retrieved sequences, the domain is clearly separated from the rest of the protein by non-globular sequences or is located at the very N-terminus, thereby allowing a clear definition of its N- and C-terminal limits. This domain is found in a variety of eukaryotic species, including animals, higher plants, the parasites Plasmodium falciparum and Plasmodium yoelii and the fungal pathogen C. neoformans, indicating a conserved and presumably important function.
The domain identified here was named OCRE as it consists of a repeated sequence of eight residues, organized around a triplet of conserved aromatic amino acids (Fig. 2), which might form a β-strand structure, as suggested by the high association of the corresponding hydrophobic cluster with this kind of regular secondary structure [65% β versus 15% α on a representative set of 3242 clusters built with only three successive hydrophobic amino acids; Hennetin et al. (2003) and our unpublished data] and by other methods of secondary structure prediction (Cuff et al., 1998). The aromatic triplet (populated on average by 81% aromatic amino acids in the three inner repeats and 62% in the outer ones) is generally preceded by two tiny amino acids and followed by an acidic residue. The repeated character of the sequences shown here, which belong to the OCRE family, had already been noted on an ad hoc basis for some members of the specific family of RNA-binding proteins (Drabkin et al., 1999; Inoue et al., 1996) (see below). The OCRE domain strictly consists of five repeats, the fifth being more divergent and appearing to initiate repeat termination (Fig. 2).
Octamer repeats of the OCRE family are distinct from other octamer repeats that share some marginal sequence similarities with the OCRE sequences (e.g. a mouse hypothetical protein, gi 51708554) or are sequence-unrelated [e.g. repeats of animal prions and ice nucleation proteins (Gazit, 2002) and of the C-terminus of histone H1 (Bharath et al., 2002)]. None of these repeats is limited to five copies, and none possesses terminator sequences similar to the degenerate last repeat of the OCRE domains.
Fold recognition methods (Kelley et al., 2000; Shi et al., 2001) did not readily assign an already known structure to the OCRE domain. However, the repeated character of the octamer suggests an equally repeated 3D architecture, most probably based on β-strands. Right-handed β-helices might account for some of the most striking features of the OCRE domains. Indeed, the triangular kidney-shaped right-handed β-helices are composed of successive rungs of 22-25 amino acids in which, at each corner of stretches of 8 amino acids, glycines are frequently encountered (Jenkins and Pickersgill, 2001). Clusters of three successive hydrophobic amino acids are also frequently found in these structures, centered on β-strands (e.g. the Aspergillus niger pectin lyase, pdb 1QCX; amino acids 227-234, 258-265, 266-273). Moreover, loops of varying size can protrude at the corners, a situation which is consistent with the insertions found in RBM6, and to a lesser extent in F28A23.100 (Fig. 2). The first three OCRE repeats might thus constitute a 24 amino acid rung, followed by two-thirds of another rung (fourth and fifth repeats). This unusually limited β-helix might allow the side chains of the numerous aromatic amino acids (mainly tyrosine) to interact almost freely with each other on the surface of the domain through π-π and CH···π interactions as well as through H-bonds. This structure might thus constitute a specific interaction area, in which the conserved aspartic acids and serines (Fig. 2) could be engaged in locking interactions. Alternatively, the OCRE domains might fold as a two-sheet β-roll (two-and-a-half rungs), similar to that encountered in alkaline protease (pdb 1KAP; amino acids 336-379). However, the possibility of another, and possibly new, fold for the OCRE domains cannot be ruled out. Sequence composition of the N-terminus of the OCRE domains, before the first repeat, suggests the presence of a helix of variable length, which might complete the domain together with the C-terminus of the fifth repeat.
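The repeat signature described above (two tiny residues, an aromatic triplet, then an acidic residue, recurring every eight positions) can be expressed as a simple pattern scan. The sketch below is deliberately strict and is not the method used to delineate the domain: the residue classes (tiny = G, A, S; aromatic = F, Y, W; acidic = D, E) and the function names are assumptions, and since the triplet is only about 81% aromatic in the inner repeats and the fifth repeat is divergent, real OCRE domains will often match only in part.

```python
import re

TINY = "GAS"        # assumed "tiny" residue class
AROMATIC = "FYW"    # assumed aromatic residue class
ACIDIC = "DE"       # acidic residues

# Lookahead so that overlapping candidate cores are all reported.
CORE = re.compile(rf"(?=([{TINY}]{{2}}[{AROMATIC}]{{3}}[{ACIDIC}]))")


def find_cores(seq):
    """Return (1-based start, matched core) for every strict core match."""
    return [(m.start() + 1, m.group(1)) for m in CORE.finditer(seq.upper())]


def core_spacings(hits):
    """Distances between consecutive core starts; 8 suggests octamer periodicity."""
    starts = [pos for pos, _ in hits]
    return [b - a for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    # Synthetic sequence for illustration only, not a real OCRE domain.
    demo = "GSYYYDPQ" * 3
    hits = find_cores(demo)
    print(hits)
    print(core_spacings(hits))
```

A spacing of 8 between consecutive cores is consistent with the octamer periodicity, whereas larger spacings could reflect loop insertions such as those noted for RBM6. A profile or position-specific scoring approach would tolerate the imperfect repeats better than this exact-match pattern.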
Domain architecture of the OCRE proteins
Searching the domain databases [SMART (Letunic et al., 2004) and Pfam (Bateman et al., 2004)] using the retrieved sequences as queries revealed a limited number of domain combinations within the OCRE family (Fig. 1). The OCRE domain of the VG5Q protein family is associated with an FHA domain, a signalling module that is found in a variety of proteins and which preferentially binds to phospho-threonine residues (Durocher and Jackson, 2002). Most of the OCRE domains, including that of VG5Q, are included in proteins with G-patch motifs, which are found in several RNA-associated proteins and have been suggested to play a role in RNA binding (Aravind and Koonin, 1999). Most of these G-patch-containing proteins also have RNA-recognition motifs (RRMs) and specific zinc fingers, and belong to a specific family of RNA-binding motif (RBM) proteins (Fig. 1). This family includes two proteins, human RBM5 and human RBM6, the genes of which are adjacent in the lung cancer tumor suppressor locus 3p21.3 (Timmer et al., 1999). The RBM5 gene (also known as LUCA-15) was suggested to act as an oncogene, since a direct correlation has been shown between up-regulation of LUCA-15 RNA and HER-2/neu oncogene overexpression (Oh et al., 1999). LUCA-15 was also proposed to play an important role in the control of apoptosis (Sutherland et al., 2001). The sequence of human RBM6 is identical to that of the lung cancer antigen NY-LU-12, whose gene was identified by screening a lung cancer cDNA expression library with autologous patient antisera (Gure et al., 1998), and to that of DEF-3, the gene of which was isolated by positional cloning from a small-cell lung carcinoma in 3p21.3 and, in parallel, found to be differentially expressed during myelopoiesis (Drabkin et al., 1999). The RBM5/RBM6 family also includes rat RBM10 (S1-1 protein), a nucleus-extracted protein that binds to RNA homopolymers (Inoue et al., 1996).
Functional prediction for the OCRE proteins and OCRE domain
The architecture of the different proteins containing the OCRE domain, including RRMs and/or G-patch (Fig. 1), suggests a role in RNA processing. Several lines of evidence strengthen this hypothesis. First, recombinant proteins containing the RRMs of RBM5/LUCA-15 and RBM6/DEF-3 are able to specifically bind poly(G) RNA tracts in vitro (Drabkin et al., 1999; Edamatsu et al., 2000). Second, the rat RBM10/S1-1 protein binds homopolymer RNA in vitro [especially poly(G) and poly(U)] and associates with heterogeneous nuclear ribonucleoproteins (hnRNPs) in the nucleus (Inoue et al., 1996). Besides this probable role in RNA metabolism, the putative tumour suppressor property of RBM5/LUCA-15 was proposed to be associated with its ability to modulate apoptosis (for a review see Sutherland et al., 2001). Overexpression of RBM5 suppressed cell proliferation both by inducing apoptosis and by extending the G1 phase of the cell cycle (Mourtada-Maarabouni et al., 2003). This modulation appears to be mediated by different cytokines of the TNF superfamily, such as TRAIL, but also TNF-α and FAS (Rintala-Maki and Sutherland, 2004). Thus, RBM5/LUCA-15 enhances multiple receptor-initiated death signals.
The association of OCRE domains with RNA-binding modules, which have in common the presence of conserved aromatic amino acids, suggests that the OCRE domain itself could be a novel RNA-binding module. As in RRMs, these aromatic amino acids (principally tyrosine and phenylalanine) might be directly involved in nucleic acid binding (Birney et al., 1993). This hypothesis raises the possibility that the angiogenic VG5Q protein, which is cytoplasmic but also secreted upon angiogenesis, could also be involved in RNA metabolism. This is further supported by the fact that the G-patch motif, which is found at the C-terminus of the VG5Q protein, is also mainly found in proteins involved in RNA metabolism (Aravind and Koonin, 1999). Another common feature of different OCRE proteins is their participation in signalling pathways activated by the TNF superfamily of cytokines, in particular those involved in the stimulation of cell growth and angiogenesis (VG5Q) (Tian et al., 2004) and in apoptosis (LUCA-15) (Rintala-Maki and Sutherland, 2004). Whether or not the OCRE domain could play a direct role in these mechanisms remains to be determined.
The delineation of the OCRE domain should allow directed investigations relative to the identification of its potential ligand(s) as well as to the understanding of its structure and biological role.
| Acknowledgments |
|---|
I.C. and J.P.M. are supported by the CEA-LRC 27V. The support from the CNRS program Protéomique et Génie des Protéines is also acknowledged.
Received on September 6, 2004; revised on October 8, 2004; accepted on October 8, 2004
| REFERENCES |
|---|
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. (1997) Gapped-BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Aravind, L. and Koonin, E. (1999) G-patch: a new conserved domain in eukaryotic RNA-processing proteins and type D retroviral polyproteins. Trends Biochem. Sci., 24, 342-344.
Bateman, A., Coin, L., Durbin, R., Finn, R., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E., et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138-D141.
Bharath, M., Ramesh, S., Chandra, N., Rao, M. (2002) Identification of a 34 amino acid stretch within the C-terminus of histone H1 as the DNA-condensing domain by site-directed mutagenesis. Biochemistry, 41, 7617-7627.
Birney, E., Kumar, S., Krainer, A. (1993) Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors. Nucleic Acids Res., 21, 5803-5816.
Callebaut, I., Labesse, G., Durand, P., Poupon, A., Canard, L., Chomilier, J., Henrissat, B., Mornon, J.P. (1997) Deciphering protein sequence information through hydrophobic cluster analysis (HCA): current status and perspectives. Cell Mol. Life Sci., 53, 621-645.
Cuff, J., Clamp, M., Siddiqui, A., Finlay, M., Barton, G. (1998) Jpred: a consensus secondary structure prediction server. Bioinformatics, 14, 892-893.
Drabkin, H., West, J., Hotfilder, M., Heng, Y., Erickson, P., Calvo, R., Dalmau, J., Gemmill, R., Sablitzky, F. (1999) DEF-3(g16/NY-LU-12), an RNA binding protein from the 3p21.3 homozygous deletion region in SCLC. Oncogene, 18, 2589-2597.
Durocher, D. and Jackson, S. (2002) The FHA domain. FEBS Lett., 513, 58-66.
Edamatsu, H., Kaziro, Y., Itoh, H. (2000) LUCA15, a putative tumour suppressor gene encoding an RNA-binding nuclear protein, is downregulated in ras-transformed cells. Genes Cells, 5, 849-858.
Gazit, E. (2002) Global analysis of tandem octapeptide repeats: the significance of the aromatic-glycine motif. Bioinformatics, 18, 880-883.
Gure, A., Altorki, N., Stockert, E., Scanlan, M., Old, L., Chen, Y. (1998) Human lung cancer antigens recognized by autologous antibodies: definition of a novel cDNA derived from the tumor suppressor gene locus on chromosome 3p21.3. Cancer Res., 58, 1034-1041.
Hennetin, J., LeTuan, K., Canard, L., Colloc'h, N., Mornon, J.-P., Callebaut, I. (2003) Non intertwined binary patterns of hydrophobic/non hydrophobic amino acids are considerably better markers of regular secondary structures than nonconstrained binary patterns. Proteins, 51, 236-244.
Inoue, A., Takahashi, K., Kimura, M., Watanabe, T., Morisawa, S. (1996) Molecular cloning of a RNA binding protein, S1-1. Nucleic Acids Res., 24, 2990-2997.
Jenkins, J. and Pickersgill, R. (2001) The architecture of parallel beta-helices and related folds. Prog. Biophys. Mol. Biol., 77, 111-175.
Kelley, L., MacCallum, R., Sternberg, M. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol., 299, 499-520.
Letunic, I., Copley, R., Schmidt, S., Ciccarelli, F., Doerks, T., Schultz, J., Ponting, C., Bork, P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res., 32, D142-D144.
Mourtada-Maarabouni, M., Sutherland, L., Meredith, J., Williams, G. (2003) Simultaneous acceleration of the cell cycle and suppression of apoptosis by splice variant delta-6 of the candidate tumour suppressor LUCA-15/RBM5. Genes Cells, 8, 109-119.
Oh, J., Grosshans, D., Wong, S., Slamon, D. (1999) Identification of differentially expressed genes associated with HER-2/neu overexpression in human breast cancer cells. Nucleic Acids Res., 27, 4008-4017.
Rintala-Maki, N. and Sutherland, L. (2004) LUCA-15/RBM5, a putative tumour suppressor, enhances multiple receptor-initiated death signals. Apoptosis, 9, 475-484.
Shi, J., Blundell, T.L., Mizuguchi, K. (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol., 310, 243-257.
Sutherland, L., Lerman, M., Williams, G., Miller, B. (2001) LUCA-15 suppresses CD95-mediated apoptosis in Jurkat T cells. Oncogene, 20, 2713-2719.
Tian, X., Kadaba, R., You, S., Liu, M., Timur, A., Yang, L., Chen, Q., Szafranski, P., Rao, S., Wu, L., et al. (2004) Identification of an angiogenic factor that when mutated causes susceptibility to Klippel-Trenaunay syndrome. Nature, 427, 640-645.
Timmer, T., Terpstra, P., van den Berg, A., Veldhuis, P., Ter Elst, A., Voutsinas, G., Hulsbeek, M., Draaijers, T., Looman, M., Kok, K., Naylor, S.L., Buys, C.H. (1999) A comparison of genomic structures and expression patterns of two closely related flanking genes in a critical lung cancer region at 3p21.3. Eur. J. Hum. Genet., 7, 478-486.