Bioinformatics Advance Access originally published online on October 5, 2007
Bioinformatics 2007 23(24):3400-3402; doi:10.1093/bioinformatics/btm476
Identification and visualization of cage-shaped proteins
¹State Key Laboratory of CAD & CG and ²College of Life Sciences, Zhejiang University, Hangzhou 310058, China
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Cage-shaped proteins, with their distinctive structure, have potential applications in biomedicine and nanotechnology. We developed a program, CSPro (Cage-Shaped Protein), for efficient identification of cage-shaped proteins based on quaternary structure. CSPro reveals the cage-shaped feature more clearly and quickly than traditional visualization tools. Using CSPro, we searched the full Protein Data Bank (PDB) and retrieved three types of proteins with notably large central cavities. CSPro can also be used in molecular simulation to validate whether the quaternary structure of a protein is cage shaped.
Availability: http://www.cad.zju.edu.cn/home/humin
Contact: humin{at}cad.zju.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
The quaternary structural arrangement of a protein is typically responsible for complex cellular functions. Many quaternary structures have been determined experimentally and deposited in the Protein Data Bank (PDB) (Berman et al., 2000), enabling researchers to study the global structural features of proteins comprehensively. One interesting problem is how to explore, automatically, the global shape features of the quaternary structures available in the large PDB. Owing to the complexity of quaternary structure, few computational approaches are currently available. Structural visualization tools such as VMD (Humphrey et al., 1996), MOLSCRIPT (Kraulis, 1991) and RasMol (see http://www.rasmol.org) let users inspect quaternary structure conveniently in multiple display modes. However, one important shape feature of some proteins, a large hollow core such as that observed in the metal transport protein (PDB ID 1qgh) (Ilari et al., 2000), is not clearly shown by traditional display modes. In Figure 1A and B, the spatial arrangement of the peptide chains can be seen plainly, but the vacant space inside the protein body is hidden by the exterior components. The hidden hollow core may be observed by displaying the cavities of the protein with LSMS (Can et al., 2006) or CAVITY SEARCH (Ho and Marshall, 1990), or by visualizing the electrostatic potential of the protein with GRASP (Nicholls and Honig, 1991), but these methods require user intervention. A simple and efficient visualization tool for displaying the cage shape of proteins would therefore be valuable to the research community.
Ferritins, apoferritins and DPS (DNA-binding proteins from starved cells) are all cage-shaped proteins involved in essential cellular events, including iron homeostasis regulation and redox stress protection (Ilari et al., 2000; Yang et al., 2000). These proteins may also be useful for synthesizing nanoparticles (Tsukamoto et al., 2005). In addition, other types of protein complexes have been shown to be cage shaped, for instance, the chaperonin nanocage required for protein folding (Ellis, 2006; Tang and Chang, 2006) and the proteasome complex required for protein degradation (Meng et al., 1999). Disorders of protein folding and protein degradation remain two major concerns in medicine. It would therefore be useful to identify cage-shaped proteins automatically when analyzing and mining large-scale structural datasets.
We developed a tool, CSPro (Cage-Shaped Protein), that automatically identifies and visualizes the shape features of a protein based on its quaternary structure. In our algorithm, a cage-shaped protein is defined as one that contains a large central cavity, or hollow core.
Using the molecular surface model of the protein (Lee and Richards, 1971; Liang et al., 1998), we transform the structural data in PDB format into a uniform voxel representation (Stouch and Jurs, 1986). This discrete spatial representation makes our algorithm independent of the number of atoms in the protein. If a voxel is occupied by an atom, its value is set to 1; otherwise, its value is set to 0. A binary 3D image is thus created. Taking the voxel at a corner of the image as the seed, we find the zero-valued voxels connected to it (Kong and Rosenfeld, 1989; Kronheimer, 1992) and reset their values to –1, which marks the background voxels of the image. The image now contains three parts: (1) cavities, composed of voxels labeled 0; (2) background regions, with voxels labeled –1; and (3) the protein body, with voxels labeled 1 (Fig. 2A). Among the cavities, we focus on the one covering the protein centroid, and for the protein body, we are interested only in its actual surface (Fig. 2B).
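The voxelization and background-labeling steps can be sketched as follows. This is an illustrative sketch, not the CSPro implementation: the function names `voxelize` and `label_background`, the single shared atom radius, the one-voxel empty margin and the use of 6-connectivity are all assumptions made for the example (the article uses a molecular surface model and does not state its connectivity).

```python
from collections import deque
from itertools import product

# 6-connected neighbourhood (face-sharing voxels); an assumed choice of connectivity.
STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def voxelize(atoms, n=32, radius=1.8):
    """Build an n x n x n binary image: 1 if a voxel centre lies inside an atom sphere, else 0.

    `atoms` is a list of (x, y, z) centres; one shared `radius` is an assumption.
    """
    lo = [min(a[i] for a in atoms) - radius for i in range(3)]
    hi = [max(a[i] for a in atoms) + radius for i in range(3)]
    # Cubic voxels, with a one-voxel empty margin so every corner is background.
    step = max(h - l for l, h in zip(lo, hi)) / (n - 2)
    grid = [[[0] * n for _ in range(n)] for _ in range(n)]
    reach = int(radius / step) + 1  # how many voxels an atom sphere can extend
    for atom in atoms:
        centre = [int((atom[i] - lo[i]) / step) + 1 for i in range(3)]
        for offset in product(range(-reach, reach + 1), repeat=3):
            v = [c + o for c, o in zip(centre, offset)]
            if not all(0 <= k < n for k in v):
                continue
            # Voxel centre expressed in the original coordinate system.
            p = [lo[i] + (v[i] - 0.5) * step for i in range(3)]
            if sum((p[i] - atom[i]) ** 2 for i in range(3)) <= radius ** 2:
                grid[v[0]][v[1]][v[2]] = 1
    return grid


def label_background(grid, seed=(0, 0, 0)):
    """Relabel the empty (0) voxels connected to `seed` as background (-1); return how many."""
    n = len(grid)
    if grid[seed[0]][seed[1]][seed[2]] != 0:
        return 0
    grid[seed[0]][seed[1]][seed[2]] = -1
    queue, count = deque([seed]), 1
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in STEPS:
            u, v, w = x + dx, y + dy, z + dz
            if 0 <= u < n and 0 <= v < n and 0 <= w < n and grid[u][v][w] == 0:
                grid[u][v][w] = -1
                count += 1
                queue.append((u, v, w))
    return count
```

After `label_background` runs, any voxel still labeled 0 is enclosed by the protein body and therefore belongs to a cavity. The cost of the flood fill depends only on the grid size, which is consistent with the independence from atom count noted above.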
Let Pc be the voxel located at the protein centroid. If Pc is labeled 0, we choose Pc as the seed and find its connected region in the image; this region is the central cavity of the protein. The volume of the central cavity can be estimated efficiently by counting the voxels inside the cavity. Similarly, the protein volume can be computed by finding all of the connected voxels that are labeled 1. We then compute the ratio of the volume of the central cavity to that of the protein, denoted by R. If R is greater than a threshold T (for example, T = 0.08), the protein has a notable cage-shaped feature. The threshold T can be set by users in CSPro, which makes it possible to screen proteins with different R values.
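A minimal sketch of the ratio test, assuming a cubic grid that already carries the three labels described above (–1 background, 0 cavity, 1 protein). The names `region_size`, `cage_ratio`, `is_cage` and the synthetic `hollow_cube` shape are hypothetical, and two simplifications are assumed: 6-connectivity, and a protein volume taken as the count of all voxels labeled 1 rather than one connected component.

```python
from collections import deque

# 6-connected neighbourhood; an assumed choice of connectivity.
STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def region_size(grid, seed, label):
    """Count the voxels carrying `label` that are connected to `seed` (0 if seed differs)."""
    n = len(grid)
    if grid[seed[0]][seed[1]][seed[2]] != label:
        return 0
    seen, queue = {seed}, deque([seed])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in STEPS:
            p = (x + dx, y + dy, z + dz)
            if (all(0 <= c < n for c in p) and p not in seen
                    and grid[p[0]][p[1]][p[2]] == label):
                seen.add(p)
                queue.append(p)
    return len(seen)


def cage_ratio(grid):
    """Return R = central-cavity volume / protein volume, both measured in voxels."""
    n = len(grid)
    body = [(x, y, z) for x in range(n) for y in range(n) for z in range(n)
            if grid[x][y][z] == 1]
    if not body:
        return 0.0
    # Pc: the voxel at the centroid of the protein voxels.
    pc = tuple(round(sum(v[i] for v in body) / len(body)) for i in range(3))
    cavity = region_size(grid, pc, 0)  # 0 when Pc does not lie in a cavity
    return cavity / len(body)


def is_cage(grid, threshold=0.08):
    """Apply the R > T test; T = 0.08 is the example value given in the text."""
    return cage_ratio(grid) > threshold


def hollow_cube(n=9, wall=2):
    """Synthetic test shape (not real protein data): a cube shell around an empty core."""
    grid = [[[-1] * n for _ in range(n)] for _ in range(n)]
    for x in range(1, n - 1):
        for y in range(1, n - 1):
            for z in range(1, n - 1):
                inside = all(1 + wall <= c < n - 1 - wall for c in (x, y, z))
                grid[x][y][z] = 0 if inside else 1
    return grid
```

For the default `hollow_cube()`, the cavity holds 27 voxels and the shell 316, so R = 27/316 (about 0.085) and `is_cage` returns `True` at T = 0.08 but `False` at T = 0.1.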
According to the theorem on discrete multidimensional Jordan surfaces, a closed surface separates the discrete space into two parts, a connected inside and a connected outside (Herman, 1992). There are two Jordan surfaces in Figure 2B: the outer surface, which is the boundary between the protein and the background, and the inner surface, which is the boundary between the central cavity and the protein. By rendering the outer and inner surfaces simultaneously, we can illustrate the cage-shaped feature clearly (Fig. 1C and E), where the blue portion shows the large central cavity and the green portion represents the outer surface of the protein. The striking shape, which is closely linked to function, is now visible. However, some cage-shaped proteins have tunnels penetrating the outer surface (such as PDB ID 1fnt) or a deep hole on the surface (such as PDB ID 1svt). In these cases, the inner surface disappears (see the Supplementary Figures), and we show only the outer surface of the detected protein.
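One way to approximate the two surfaces on a labeled voxel grid is to collect the protein voxels that share a face with the background and those that share a face with the central cavity. The sketch below assumes this voxel-level approximation (a Jordan surface proper is a set of voxel faces), assumes 6-connectivity, and assumes the central cavity has already been relabeled with a distinct value; the label 2 and the names `surface_voxels` and `cage_surfaces` are hypothetical.

```python
# 6-connected neighbourhood; an assumed choice of connectivity.
STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

BACKGROUND = -1
PROTEIN = 1
CENTRAL_CAVITY = 2  # hypothetical label for the cavity grown from the centroid voxel


def surface_voxels(grid, touching):
    """Return the protein voxels that share a face with a voxel labelled `touching`."""
    n = len(grid)
    found = set()
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if grid[x][y][z] != PROTEIN:
                    continue
                for dx, dy, dz in STEPS:
                    u, v, w = x + dx, y + dy, z + dz
                    if (0 <= u < n and 0 <= v < n and 0 <= w < n
                            and grid[u][v][w] == touching):
                        found.add((x, y, z))
                        break
    return found


def cage_surfaces(grid):
    """Return (outer, inner) surface voxel sets; inner is empty if no closed cavity exists."""
    outer = surface_voxels(grid, BACKGROUND)
    inner = surface_voxels(grid, CENTRAL_CAVITY)
    return outer, inner
```

When a tunnel or deep hole connects the core to the exterior, the background flood fill reaches the core and labels it –1, so no voxel carries the central-cavity label and `inner` comes back empty. This matches the case described above in which only the outer surface is shown.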
CSPro runs at interactive rates. We conducted experiments on a PC with a 3.00 GHz Pentium 4 CPU and 2 GB of main memory. Identifying and displaying a cage-shaped protein takes <1 s at a 3D image resolution of 64 × 64 × 64. Using CSPro, we searched the structural data of all proteins in the PDB (see http://www.rcsb.org/pdb) released by the end of 2006. Three types of quaternary structures with notable cage-shaped features were retrieved. The volume (V) of the delineated cages and the R values were evaluated. Some representative retrieved proteins are listed in Table 1. The maximum volume ratio of ferritin reaches 31%. The beneficial properties of cage-shaped proteins, including their large spatial capacity and the stability of their quaternary structure, make them attractive candidates as carriers of pharmacological molecules.
This article makes two contributions: (1) an efficient algorithm for automatic identification of cage-shaped protein complexes and (2) a fast visualization tool for exhibiting the cage-shaped feature of quaternary structures.
CSPro provides a complementary means of analyzing and visualizing the structural features of complicated protein complexes. It can be used for validation, for example to confirm whether a synthetic protein has a large hollow core. We believe that CSPro will assist in finding more cage-shaped proteins in a wide range of species.
| ACKNOWLEDGEMENT |
|---|
This work was supported by the NSFC Key Project under grant No. 60533050 and the NSFZJC project under grant No. R304098.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Anna Tramontano
Received on June 24, 2007; revised on September 4, 2007; accepted on September 17, 2007
| REFERENCES |
|---|
Berman HM, et al. The Protein Data Bank. Nucl. Acids Res (2000) 28:235–242.
Can T, et al. Efficient molecular surface generation using level-set methods. J. Mol. Graph. Model (2006) 25:442–454.
Ellis RJ. Protein folding inside the cage. Nature (2006) 442:360–362.
Herman GT. Discrete multidimensional Jordan surfaces. CVGIP-Graph. Model. Image Process (1992) 54:507–515.
Ho CMW, Marshall GR. Cavity search: an algorithm for the isolation and display of cavity-like binding regions. J. Comput. Aided. Mol. Des (1990) 4:337–354.
Humphrey W, et al. VMD – Visual Molecular Dynamics. J. Mol. Graph (1996) 14:33–38.
Ilari A, et al. The dodecameric ferritin from Listeria innocua contains a novel intersubunit iron-binding site. Nat. Struct. Biol (2000) 7:38–43.
Kong TY, Rosenfeld A. Digital topology: introduction and survey. Comput. Vision Graph (1989) 48:357–393.
Kraulis PJ. MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Cryst (1991) 24:946–950.
Kronheimer EH. The topology of digital images. Topol. Appl (1992) 46:279–303.
Lee B, Richards FM. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol (1971) 55:379–400.
Liang J, et al. Analytical shape computation of macromolecules: I. molecular area and volume through alpha shape. Proteins (1998) 33:1–17.
Meng L, et al. Epoxomicin, a potent and selective proteasome inhibitor, exhibits in vivo antiinflammatory activity. Proc. Natl Acad. Sci. USA (1999) 96:10403–10408.
Nicholls A, Honig B. A rapid finite difference algorithm, utilising successive over relaxation to solve the Poisson–Boltzmann equation. J. Comput. Chem (1991) 12:435–445.
Stouch TR, Jurs PC. A simple method for the representation, quantification, and comparison of the volumes and shapes of chemical compounds. J. Chem. Inf. Comput. Sci (1986) 26:4–12.
Tang Y-C, Chang H-C. Structural features of the GroEL-GroES nano-cage required for rapid folding of encapsulated protein. Cell (2006) 125:903–914.
Tsukamoto R, et al. Synthesis of Co3O4 nanoparticles using the cage-shaped protein, Apoferritin. Bull. Chem. Soc. Jpn (2005) 78:2075–2081.
Yang XE, et al. Iron oxidation and hydrolysis reactions of a novel ferritin from Listeria innocua. Biochem. J (2000) 349:783–786.

