Bioinformatics Advance Access originally published online on April 21, 2006
Bioinformatics 2006 22(18):2189-2191; doi:10.1093/bioinformatics/btl123
G8: a novel domain associated with polycystic kidney disease and non-syndromic hearing loss
1 Key Laboratory of Protein Chemistry and Developmental Biology of Education Committee, College of Life Sciences, Hunan Normal University, Changsha, People's Republic of China
2 State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Handan Road 220, Shanghai 200433, People's Republic of China
3 The Sainsbury Laboratory, Norwich NR4 7UH, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: We report a novel protein domain, G8, which contains five repeated β-strand pairs and is present in several disease-related proteins, such as PKHD1, KIAA1199 and TMEM2, as well as in other uncharacterized proteins. Most G8-containing proteins are predicted to be integral membrane or secreted proteins. The G8 domain may be involved in extracellular ligand binding and catalysis. Missense mutations in the two G8 domains of the human PKHD1 protein have been reported to yield a less stable protein and are associated with autosomal recessive polycystic kidney disease, indicating the importance of the domain structure. G8 is also present in the N-terminal region of some proteins related to non-syndromic hearing loss, such as KIAA1199 and TMEM2. The discovery of the G8 domain should aid research on the structure and function of these proteins and may benefit the development of novel therapeutics.
Contact: liangsp{at}hunnu.edu.cn
| INTRODUCTION |
|---|
Here we report a novel domain, named G8, that contains eight conserved glycine residues and consists of five β-strand pairs. This domain is found in the human disease-associated proteins PKHD1, KIAA1199 and TMEM2, and in some other uncharacterized proteins.
The PKHD1 protein (also known as fibrocystin or polyductin) is a large (447 kDa) membrane protein involved in autosomal recessive polycystic kidney and hepatic disease. It is abundant in fetal-kidney collecting ducts but absent from the kidneys of some patients with autosomal recessive polycystic kidney disease. Its predicted structure suggests that it is an integral membrane receptor with extracellular protein-interaction sites and intracellular phosphorylation sites (Ward et al., 2002); it may interact with extracellular protein ligands and transduce intracellular signals to the nucleus (Wilson, 2004).
KIAA1199, one of the inner-ear-specific genes, is expressed in cochlear and vestibular tissues. The KIAA1199 protein may be essential for auditory function, and mutant forms of it may cause non-syndromic hearing loss (Abe et al., 2003). Recently, upregulation of the KIAA1199 gene was reported to be associated with cellular mortality (Michishita et al., 2006).
Human TMEM2 is expressed in the cochlea and in a variety of other tissues. It lies within the DFNB7-DFNB11 locus, a region linked to autosomal recessive non-syndromic hearing loss (ARNSHL), but no disease-causing mutations were found in the TMEM2 coding region (Scott et al., 2000).
Identification of the G8 domain should help our understanding of the structure/function of these related proteins and benefit the development of novel therapeutics.
| METHODS |
|---|
While analyzing the sequences of the KIAA1199 protein and its homologs, we found that they contain an N-terminal glycine-rich region that did not match any entry in the Pfam 19.0 (Finn et al., 2006) or SMART 5.0 (Letunic et al., 2006) databases. We searched the NCBI non-redundant protein database (http://www.ncbi.nih.gov/blast/) with PSI-BLAST (Altschul et al., 1997), using an inclusion threshold of 0.05 and residues 44–170 of human KIAA1199 (gi|38638698) as the query. The search converged after five iterations and retrieved 98 non-redundant protein sequences in total. A multiple sequence alignment and a phylogenetic tree of 26 distinct proteins (32 sequences) were generated using ClustalX (Thompson et al., 1997) with manual adjustment. The alignment was colored using Chroma (Goodstadt and Ponting, 2001) (Fig. 1).
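The convergence criterion of an iterative search such as PSI-BLAST can be illustrated with a minimal sketch. It assumes a hypothetical `search` callable that, given the set of currently included sequence identifiers, returns a mapping from hit identifiers to E-values. Real PSI-BLAST also rebuilds a position-specific scoring matrix from the included hits at every round and can drop previously included sequences; this sketch models neither and only shows when the iteration stops.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical interface: takes the current included set, returns {hit_id: e_value}.
SearchFn = Callable[[Set[str]], Dict[str, float]]


def iterate_until_converged(
    search: SearchFn,
    query_id: str,
    inclusion_threshold: float = 0.05,
    max_iterations: int = 20,
) -> Tuple[Set[str], int]:
    """Repeat the search until no new hit passes the inclusion threshold.

    Returns the final included set and the number of rounds performed.
    """
    included: Set[str] = {query_id}
    for round_number in range(1, max_iterations + 1):
        hits = search(included)
        # Only hits below the E-value threshold are added for the next round.
        new_hits = {sid for sid, e in hits.items() if e < inclusion_threshold} - included
        if not new_hits:
            return included, round_number  # converged: nothing new was included
        included |= new_hits
    return included, max_iterations  # stopped without converging


# Toy, made-up similarity network: each included sequence "reveals" the hits listed for it.
TOY_HITS: Dict[str, Dict[str, float]] = {
    "q": {"a": 1e-10},
    "a": {"b": 0.01},
    "b": {"c": 0.2},  # 0.2 is above 0.05, so "c" is never included
}


def toy_search(profile: Set[str]) -> Dict[str, float]:
    """Return the best (lowest) E-value for every hit reachable from the profile."""
    hits: Dict[str, float] = {}
    for member in profile:
        for sid, e in TOY_HITS.get(member, {}).items():
            hits[sid] = min(e, hits.get(sid, float("inf")))
    return hits


if __name__ == "__main__":
    included, rounds = iterate_until_converged(toy_search, "q")
    print(sorted(included), rounds)  # ['a', 'b', 'q'] 3
```

In the toy run the search stops at round 3, the first round in which no new hit falls below the 0.05 threshold.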
The region was named the G8 domain because it contains eight conserved glycine residues. To predict the secondary structure of G8, the profile of the alignment was submitted to the Jpred server (http://www.compbio.dundee.ac.uk/~www-jpred/submit.html) (Cuff et al., 1998). Taxonomic distribution was determined by searching all available genome and protein databases at GenBank using TBLAST (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi).
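Identifying conserved glycine positions amounts to scanning the columns of a multiple alignment. The sketch below is illustrative only: the 0.8 occupancy cut-off, the treatment of gaps as non-glycine and the four-sequence toy alignment are assumptions, not the criteria or data behind Fig. 1.

```python
from typing import List


def conserved_glycine_columns(alignment: List[str], min_fraction: float = 0.8) -> List[int]:
    """Return 0-based indices of columns whose glycine fraction is >= min_fraction.

    Gaps ('-') and any other character count as non-glycine.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for col in range(width):
        glycines = sum(1 for row in alignment if row[col].upper() == "G")
        if glycines / len(alignment) >= min_fraction:
            conserved.append(col)
    return conserved


# Made-up four-sequence alignment for illustration only.
TOY_ALIGNMENT = [
    "AGLVG-KI",
    "SGIVGAKL",
    "AGLTGSRV",
    "PGVIDAKI",
]

if __name__ == "__main__":
    print(conserved_glycine_columns(TOY_ALIGNMENT))        # [1]
    print(conserved_glycine_columns(TOY_ALIGNMENT, 0.75))  # [1, 4]
```

Lowering the cut-off to 0.75 also admits column 4, where one sequence has D instead of G. The number of "conserved" glycines reported therefore depends on the cut-off chosen.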
| RESULTS AND DISCUSSION |
|---|
The G8 domain is about 120 amino acid residues long. Secondary structure prediction suggests that it contains 10 β-strands and 1 helix. The strands are separated by conserved glycine residues and contain some conserved hydrophobic residues. Closer examination of the alignment showed that the G8 domain is composed of five β-strand pairs (Fig. 1). Each repeat has a sequence resembling hX(0–3)hX(1–3)GX(1–11)hX(1–3)h, where X is any residue and h is a hydrophobic residue. Based on the structural prediction, the conserved glycine and hydrophobic residues may be important for correct folding of the G8 domain: the glycine residues allow rotation of the backbone, and interactions among hydrophobic residues on the β-strands and the helix contribute to structural stability. The alignment also highlights some potentially functionally important residues, such as K2038, H2040 and T2048 in human PKHD1 (gi|22213548). These highly conserved polar residues cluster in the C-terminal part of the G8 domain and may form the core of its active site.
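The repeat pattern, read as hX(0–3)hX(1–3)GX(1–11)hX(1–3)h, translates directly into a regular expression. In this minimal sketch the hydrophobic class h is assumed to be A, V, I, L, M, F and W (the text does not define it), and the pattern sits inside a lookahead so that overlapping candidates starting at different positions are all reported. It is a plain pattern scan for illustration, not a substitute for profile-based searches such as PSI-BLAST.

```python
import re
from typing import List, Tuple

# Assumed hydrophobic class for 'h'; the source does not list the residues.
HYDROPHOBIC = "AVILMFW"
H = f"[{HYDROPHOBIC}]"

# hX(0-3) hX(1-3) G X(1-11) hX(1-3) h, wrapped in a lookahead so that
# overlapping candidates starting at different positions are all found.
G8_REPEAT = re.compile(
    rf"(?=({H}.{{0,3}}{H}.{{1,3}}G.{{1,11}}{H}.{{1,3}}{H}))"
)


def find_repeat_candidates(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for every candidate repeat."""
    return [(m.start(), m.group(1)) for m in G8_REPEAT.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_repeat_candidates("VLKGDAVTI"))  # [(0, 'VLKGDAVTI')]
```

Because X matches any residue and the spacer ranges are wide, a scan like this reports many candidates in arbitrary sequences. It is mainly useful for delimiting repeats inside a region already identified as G8.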
The G8 domain is widely distributed: it is found in proteins from various animals (from Strongylocentrotus purpuratus to Homo sapiens), lower eukaryotes (such as Dictyostelium discoideum and Tetrahymena thermophila) and bacteria (such as the alpha-proteobacterium Nitrobacter hamburgensis, the gamma-proteobacterium Hahella chejuensis and the green non-sulfur bacterium Chloroflexus aurantiacus), but it is absent from plants, viruses and archaea. Many G8-containing proteins are integral membrane proteins with signal peptides and/or transmembrane (TM) segments; others, which lack a TM segment, may be secreted (Fig. 2).
Several other protein domains frequently co-occur with the G8 domain. These include the IPT/TIG domain (SMART: SM00429; Ig-like, plexins, transcription factors domain), the GG domain (a domain in KIAA1199, FAM3, POMGnT1 and TMEM2 proteins, with two well-conserved glycine residues) (Guo et al., 2006) and the PbH1 domain (SMART: SM00710; parallel beta-helix repeats domain). IPT/TIG domains are found in cell-surface receptors such as Met and Ron, as well as in intracellular transcription factors, and play roles in the control of cell dissociation, motility and invasion of extracellular matrices, as well as in DNA binding (Collesi et al., 1997). The GG domain is widely present in eukaryotic proteins and in T4 phage gp35 proteins. It was predicted to be structurally important in the long tail fibers of T4 (Guo et al., 2006), which are responsible for host-cell recognition, initial attachment to susceptible bacteria and infection (Dickson, 1973). Known functions of the PbH1 domain include binding extracellular proteins and catalyzing polysaccharide hydrolysis (Bedford and Leder, 1999). Based on the functions of G8-associated domains and proteins, it is reasonable to predict that G8 may be involved in extracellular ligand binding and catalysis.
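Statements about domain co-occurrence come from tallying domain architectures. A minimal sketch, assuming each protein is represented as a list of domain names (for example, as exported from a SMART or Pfam annotation); the protein names and architectures in the example are hypothetical placeholders, not real annotations.

```python
from collections import Counter
from typing import Dict, List


def co_occurring_domains(architectures: Dict[str, List[str]], anchor: str = "G8") -> Counter:
    """Count, per domain type, the proteins that contain it together with `anchor`.

    Each protein is counted at most once per domain type, even if the domain repeats.
    """
    counts: Counter = Counter()
    for domains in architectures.values():
        if anchor in domains:
            counts.update(set(domains) - {anchor})
    return counts


# Hypothetical architectures for illustration only.
TOY_ARCHITECTURES = {
    "protein_1": ["G8", "G8", "PbH1", "IPT/TIG"],
    "protein_2": ["GG", "G8", "PbH1"],
    "protein_3": ["G8", "GG"],
    "protein_4": ["IPT/TIG"],  # lacks G8, so it is ignored
}

if __name__ == "__main__":
    for domain, n in sorted(co_occurring_domains(TOY_ARCHITECTURES).items()):
        print(domain, n)
```

Counting each domain type once per protein keeps tandem repeats (such as the two G8 copies in protein_1) from inflating the tally.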
Many G8-containing proteins have been associated with diseases such as polycystic kidney disease and non-syndromic hearing loss. The PKHD1 protein contains two G8 domains that probably arose by tandem duplication, given the high sequence identity (28%) between the two copies. Nine missense mutations in the G8 domains of human PKHD1 (gi|22213548) have been reported to be associated with autosomal recessive polycystic kidney disease (ARPKD): D1942G, G1971D, E1995G, I1998T and V2032L in the first G8 domain, and D2761Y, L2772P, S2861G and Y2863C in the second (Fig. 1) (Bergmann et al., 2004; Rossetti et al., 2003; Ward et al., 2002). These data suggest that substitution of a conserved glycine residue (G1971D) or of hydrophobic residues (I1998T, V2032L and L2772P) might disrupt the proper conformation of the domain and thus lead to loss of normal G8 function. To date, no disease-causing mutation has been observed in the G8 domain of KIAA1199 or TMEM2. The role of the G8 domain in non-syndromic hearing loss remains unknown.
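The reasoning about the nine ARPKD mutations can be made explicit by parsing their one-letter substitution codes and grouping them by the wild-type residue that is replaced. This is a simplified sketch: it classifies by residue type only and assumes A, V, I, L, M, F and W as the hydrophobic class, whereas the argument in the text also rests on those positions being conserved in the alignment, which the code does not check.

```python
import re
from typing import List, NamedTuple

# Assumed hydrophobic residue class (not defined in the source).
HYDROPHOBIC = set("AVILMFW")

# The nine PKHD1 G8-domain missense mutations listed in the text.
PKHD1_G8_MUTATIONS: List[str] = [
    "D1942G", "G1971D", "E1995G", "I1998T", "V2032L",  # first G8 domain
    "D2761Y", "L2772P", "S2861G", "Y2863C",            # second G8 domain
]

_CODE = re.compile(r"([A-Z])(\d+)([A-Z])")


class Substitution(NamedTuple):
    ref: str       # wild-type residue
    position: int  # residue number in the protein
    alt: str       # substituted residue


def parse_substitution(code: str) -> Substitution:
    """Parse a one-letter missense code such as 'G1971D'."""
    match = _CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a one-letter missense code: {code!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def classify(code: str) -> str:
    """Group a substitution by the type of its wild-type residue only."""
    ref = parse_substitution(code).ref
    if ref == "G":
        return "glycine lost"
    if ref in HYDROPHOBIC:
        return "hydrophobic substituted"
    return "other"


if __name__ == "__main__":
    for code in PKHD1_G8_MUTATIONS:
        print(code, classify(code))
```

Applied to the nine mutations, this rule flags G1971D as a glycine substitution and I1998T, V2032L and L2772P as substitutions of hydrophobic residues, the same grouping as in the text. V2032L keeps a hydrophobic side chain, so a type-only rule cannot by itself explain why that change would be damaging.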
| CONCLUSIONS |
|---|
In summary, the G8 domain is widely distributed and is present in both animal and bacterial proteins, including some hereditary disease-related proteins such as PKHD1, KIAA1199 and TMEM2. It contains five repeated β-strand pairs. Structural and domain-architecture analysis indicates that the G8 domain may be involved in extracellular ligand binding and catalysis. Mutations in the G8 domains of human PKHD1 are associated with ARPKD. The discovery of the G8 domain should aid research on the structure and function of related proteins and may benefit the development of novel therapeutics.
| Acknowledgments |
|---|
The authors thank Dr Alex Bateman (Wellcome Trust Sanger Institute, UK) and Dr Jingchu Luo (Peking University, China) for suggestions and comments on the manuscript. This work was supported by grants from the National 973 Project of China (2001CB5102), the National Natural Science Foundation of China (30430170, 90408017) and the Human Liver Proteomics Project.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alex Bateman
Received on September 20, 2004; revised on January 7, 2005; accepted on January 18, 2005
| REFERENCES |
|---|
Abe, S., et al. (2003) Mutations in the gene encoding KIAA1199 protein, an inner-ear protein expressed in Deiters' cells and the fibrocytes, as the cause of nonsyndromic hearing loss. J. Hum. Genet., 48, 564–570.
Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Bedford, M.T. and Leder, P. (1999) The FF domain: a novel motif that often accompanies WW domains. Trends Biochem. Sci., 24, 264–265.
Bergmann, C., et al. (2004) PKHD1 mutations in families requesting prenatal diagnosis for autosomal recessive polycystic kidney disease (ARPKD). Hum. Mutat., 23, 487–495.
Collesi, C.S.M., et al. (1997) A splicing variant of the RON transcript induces constitutive tyrosine kinase activity and an invasive phenotype. Mol. Cell. Biol., 16, 5518–5526.
Cuff, J.A., et al. (1998) JPred: a consensus secondary structure prediction server. Bioinformatics, 14, 892–893.
Dickson, R.C. (1973) Assembly of bacteriophage T4 tail fibers. IV. Subunit composition of tail fibers and fiber precursors. J. Mol. Biol., 79, 633–647.
Finn, R.D., et al. (2006) Pfam: clans, web tools and services. Nucleic Acids Res., 34, D247–D251.
Goodstadt, L. and Ponting, C.P. (2001) CHROMA: consensus-based colouring of multiple alignments for publication. Bioinformatics, 17, 845–846.
Guo, J., et al. (2006) GG: a domain involved in phage LTF apparatus and implicated in human MEB and non-syndromic hearing loss diseases. FEBS Lett., 580, 581–584.
Letunic, I., et al. (2006) SMART 5: domains in the context of genomes and networks. Nucleic Acids Res., 34, D257–D260.
Michishita, E., et al. (2006) Upregulation of the KIAA1199 gene is associated with cellular mortality. Cancer Lett., 239, 71–77.
Rossetti, S., et al. (2003) A complete mutation screen of PKHD1 in autosomal-recessive polycystic kidney disease (ARPKD) pedigrees. Kidney Int., 64, 391–403.
Scott, D.A., et al. (2000) Refining the DFNB7-DFNB11 deafness locus using intragenic polymorphisms in a novel gene, TMEM2. Gene, 246, 265–274.
Thompson, J.D., et al. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res., 25, 4876–4882.
Ward, C.J., et al. (2002) The gene mutated in autosomal recessive polycystic kidney disease encodes a large, receptor-like protein. Nat. Genet., 30, 259–269.
Wilson, P.D. (2004) Polycystic kidney disease. N. Engl. J. Med., 350, 151–164.

