Bioinformatics Advance Access originally published online on October 28, 2004
Bioinformatics 2005 21(7):1288-1290; doi:10.1093/bioinformatics/bti101
hp-DPI: Helicobacter pylori Database of Protein Interactomes - embracing experimental and inferred interactions

Division of Biostatistics and Bioinformatics, National Health Research Institutes #128, Sec. 2 Yaun-Chio-Yun Rd, Taipei 115, Taiwan
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: We incorporated a statistical model into our protein interaction database to validate two-hybrid assays of Helicobacter pylori and to predict putative protein interactions not yet discovered experimentally. To present the large number of experimental and inferred protein interaction network maps, the H.pylori Database of Protein Interactomes (hp-DPI) was developed with a succinct yet comprehensive visualization tool integrated with annotation from GenBank, GO, and KEGG. hp-DPI is first built with, but not limited to, H.pylori protein interactions and is expected to include the protein interactions of other organisms in the future.
Availability: hp-DPI can be accessed at http://www.dpi.nhri.org.tw/hp/
Contact: cylin{at}nhri.org.tw
| INTRODUCTION |
|---|
Domains are recognized as functional blocks of compact protein structure, typically with a hydrophobic core (Copley et al., 2002), and are usually evolutionarily conserved. They are therefore employed by current protein domain databases, such as Pfam (Bateman et al., 2002), PROSITE (Falquet et al., 2002), PRINTS-S (Attwood et al., 2000), Prodom (Fabian et al., 1997), and SMART (Letunic et al., 2004), to describe possible functions of a predicted novel gene product. Building on this view, Pawson and Nash (2003) proposed that cell regulatory systems are assembled through protein interaction domains, which are frequently arranged in a cassette-like fashion within regulatory proteins and contribute versatile functions to a protein. This study uses recent knowledge of protein domains as a strategy to predict the protein interaction network from domain interactions.
Helicobacter pylori is a human pathogen found in the gastric mucus layer or attached to the gastric epithelium. It is recognized as a causative agent of gastric diseases ranging from gastritis and peptic ulcer disease to cancer. Besides its high infection rate of 40-60% of the world population, the correlation between chronic lesions caused by infection and the incidence of gastric cancer has drawn enormous scientific effort toward elucidating the virulence and pathogenesis of H.pylori.
Our task in this study is to establish an integrated online service that combines an experimental and predicted H.pylori protein interaction database with comprehensive annotations. Protein interactions are assessed from their domain interactions, using protein domain scanning tools to define specific protein-domain relationships and a statistical method to estimate the probability of domain associations. These domain interactions are then employed to infer putative interacting partners among all annotated ORFs from the H.pylori genome.
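The inference step described above can be illustrated with a minimal sketch. It assumes that a pair of ORFs is proposed as a putative interaction whenever at least one domain from each protein forms an associated domain pair; the function name `infer_interactions`, the ORF labels, and the domain labels are hypothetical and are not taken from hp-DPI.

```python
from itertools import combinations


def infer_interactions(protein_domains, associated_domain_pairs):
    """Return protein pairs that share at least one associated domain pair.

    protein_domains: dict mapping a protein ID to the set of its domain IDs.
    associated_domain_pairs: set of frozensets, each holding two domain IDs
    judged to be associated (assumed to come from a separate scoring step).
    """
    inferred = set()
    # Enumerate every unordered pair of distinct proteins once.
    for p, q in combinations(sorted(protein_domains), 2):
        if any(
            frozenset((a, b)) in associated_domain_pairs
            for a in protein_domains[p]
            for b in protein_domains[q]
        ):
            inferred.add((p, q))
    return inferred


# Hypothetical toy data, for illustration only.
protein_domains = {
    "ORF_A": {"D1"},
    "ORF_B": {"D2", "D3"},
    "ORF_C": {"D4"},
}
associated_domain_pairs = {frozenset(("D1", "D3"))}
putative = infer_interactions(protein_domains, associated_domain_pairs)
```

This sketch ignores self-interactions and treats every associated domain pair as equally informative; a real system would likely weight pairs by their association measure.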
Similar analysis tools and visualization systems are available. BIND (Bader et al., 2003) incorporates a map viewer called SPREY, which draws maps only from single IDs, allows no aliases, and attaches no gene annotation. JDIP (Xenarios et al., 2002), a stand-alone Java application for DIP, works in a similar way. Detailed annotation is also absent from other network viewing systems, such as VisAnt (Hu et al., 2004), Osprey (Breitkreutz et al., 2003), InterViewer3 (Ju and Han, 2003), Pajek (Batagelj and Mrvar, 2001), and Tulip (David, 2001). The only other interaction database for H.pylori is the PIMRider® H.pylori database (footnote 1) of Hybrigenics S.A. This database contains the same experimental data set as the H.pylori Database of Protein Interactomes (hp-DPI), manual annotation that is only partially complete, and a network viewing system. As a commercial package, it requires a licensing procedure before access.
hp-DPI, on the other hand, provides a convenient, user-friendly interface integrated with rapid graphical network maps plus instant and comprehensive gene annotation. Together with its predictive power, we believe that the potential interacting partners of biomedically targeted proteins provided by this service can shed light on effective treatment of H.pylori-related human diseases.
System implementation
To provide a user-friendly graphical interface, hp-DPI is built on the so-called LAMP system (Linux Mandrake 9.1, Apache 2.0, MySQL 4.0, and PHP 4.0). Annotation of each protein and domain in hp-DPI comes from GenBank (NC_000915), GO (Camon et al., 2003), and KEGG (Kanehisa et al., 2004).
The H.pylori proteome data we used in this approach is based on the data published by Tomb et al. (1997) in which 1716 domains were identified from 1496 proteins by InterProScan (version 3.2) and InterPro member-database (release 6.1, Zdobnov and Apweiler, 2001); 1462 interactions from a recent study of two-hybrid analysis (Rain et al., 2001) were introduced as observed interactions.
Measures of association for all possible domain-domain pairs are adopted from the log-odds value introduced by Sprinzak and Margalit (2001) to detect pairs of domains that are over-represented relative to pairs occurring at random. The observed frequency of a domain pair is calculated from the experimental protein interaction data and divided by the product of the frequencies of each domain throughout the experimental data. By selecting an appropriate threshold as a reference, every domain-level association measure is dichotomized to serve as a predictor of protein interactions.
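As a rough illustration of this kind of measure, the sketch below computes a log-odds score for each domain pair and then dichotomizes the scores at a threshold. It is not the hp-DPI implementation: the base-2 logarithm, the way frequencies are normalized (pair frequency over interacting protein pairs, domain frequency over proteins appearing in the interaction data), and the names `log_odds_scores` and `dichotomize` are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product


def log_odds_scores(interactions, protein_domains):
    """Score each domain pair as log2(P_ab / (P_a * P_b)).

    P_ab: fraction of observed interacting protein pairs containing the
          domain pair (assumed normalization).
    P_a, P_b: fraction of proteins in the interaction data that carry
          each domain (assumed normalization).
    """
    pair_counts = Counter()
    for p, q in interactions:
        # Count each domain pair at most once per interacting protein pair.
        pairs = {
            frozenset((a, b))
            for a, b in product(protein_domains[p], protein_domains[q])
        }
        pair_counts.update(pairs)

    proteins = {p for pair in interactions for p in pair}
    domain_counts = Counter(d for p in proteins for d in protein_domains[p])

    n_interactions = len(interactions)
    n_proteins = len(proteins)
    scores = {}
    for pair, count in pair_counts.items():
        members = tuple(pair)
        a, b = members[0], members[-1]  # also works when a domain pairs with itself
        p_ab = count / n_interactions
        p_a = domain_counts[a] / n_proteins
        p_b = domain_counts[b] / n_proteins
        scores[pair] = math.log2(p_ab / (p_a * p_b))
    return scores


def dichotomize(scores, threshold):
    """Keep only the domain pairs whose score reaches the chosen threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


# Hypothetical toy data, for illustration only.
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]
protein_domains = {"P1": {"A"}, "P2": {"B"}, "P3": {"A"}, "P4": {"C"}}
scores = log_odds_scores(interactions, protein_domains)
predictors = dichotomize(scores, threshold=2.0)
```

In this toy data the pair A-C appears in two of three interactions and scores higher than A-B, so only A-C survives a threshold of 2.0. The threshold value itself is arbitrary here.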
Features in hp-DPI
Targets in hp-DPI can be searched by ORF, locus, or full-text keyword, and each query immediately returns a report in an output table. Through a pull-down menu in the output table, the user chooses whether the graphical interaction map of a targeted protein shows experimental data only or also inferred interactions at a preferred statistical threshold.
In the graphical map, an annotation box containing detailed GO, KEGG, and GenBank annotation for each protein node pops up on mouse-over, so that users do not need to search for annotations through complicated table listings. The same pop-up is available for edges, where it shows the estimated probability of the specific interaction. With the help of GO and KEGG, new targets of interest can be identified by similar protein function or by the pathway in which they reside. Functions of hypothetical proteins can be proposed in the same way.
The most common problem in constructing a protein network map is the clutter caused by numerous nodes and edges crowded within a limited window. One of our solutions is to draw upward-curved, downward-curved, and self-interaction loop edges in addition to basic straight edges, which reduces the complexity of a network map. Three different edge patterns indicate different strengths of interaction based on the association measures. Protein interaction relationships in a map can extend up to three levels.
An important use of our prediction model, the association measures, is to examine the validity of protein interactions derived from two-hybrid analyses. Although the reliability of interactions observed in two-hybrid assays has been questioned, the ease of the assay makes it currently irreplaceable. Scrutinizing the association measure of each observed interaction reveals contradictory cases in which the probability estimated by hp-DPI is not equal to one (e.g. the interaction between gppA and HP0042 was observed, but its association measure was estimated as 0.5). Therefore, our statistical model not only generates inferred interactions but also functions as an examining tool that points out possible false positive results from two-hybrid experiments.
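Limiting a map to three levels around a target protein amounts to a depth-limited breadth-first search over the interaction graph. The following is a minimal sketch under that assumption; the function name `neighborhood` and the node labels are hypothetical, and nothing here reflects how hp-DPI actually builds its maps.

```python
from collections import deque


def neighborhood(edges, start, max_level=3):
    """Return {protein: level} for proteins within max_level steps of start."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    levels = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if levels[node] == max_level:
            continue  # do not expand beyond the outermost level
        for neighbor in adjacency.get(node, ()):
            if neighbor not in levels:
                levels[neighbor] = levels[node] + 1
                queue.append(neighbor)
    return levels


# Hypothetical chain A-B-C-D-E plus a self-interaction on A.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "A")]
levels = neighborhood(edges, "A")
```

Here protein E lies four steps from A and is therefore left out of the map, while the self-interaction on A adds no new node.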
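This validation use can be sketched as a simple filter over the observed interactions. The example assumes that each observed protein pair already has an estimated association measure and that values below a chosen cutoff mark the pair for closer inspection; the function name `flag_questionable`, the cutoff, and the ORF_X/ORF_Y pair are hypothetical, while the gppA-HP0042 value of 0.5 is the one quoted in the text.

```python
def flag_questionable(observed, association, cutoff=1.0):
    """List observed interactions whose association measure is below cutoff.

    observed: iterable of (protein, protein) tuples from the experiment.
    association: dict mapping frozenset({protein, protein}) to a measure.
    Pairs without an estimated measure are skipped rather than flagged.
    """
    flagged = []
    for a, b in observed:
        score = association.get(frozenset((a, b)))
        if score is not None and score < cutoff:
            flagged.append((a, b, score))
    return flagged


observed = [("gppA", "HP0042"), ("ORF_X", "ORF_Y")]
association = {
    frozenset(("gppA", "HP0042")): 0.5,  # value quoted in the text
    frozenset(("ORF_X", "ORF_Y")): 1.0,  # hypothetical pair
}
flagged = flag_questionable(observed, association)
```

A flagged pair is only a candidate false positive; the low score may also reflect sparse domain annotation rather than an experimental error.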
| CONCLUSIONS |
|---|
hp-DPI extends existing databases of H. pylori protein interactions by incorporating association measures to infer putative interactions among proteins from domain interactions. It provides a convenient, user-friendly interface integrated with rapid graphical network maps plus instant gene annotation. By combining experimental and inferred interactions, hp-DPI can help filter out false experimental data, suggest functions for ORFs that lack functional annotation, and offer candidates that narrow the scope of further high-throughput screening and validation experiments for drug targets. Specific protein-protein and protein-ligand interactions are central to most biological processes and are the focus of many avenues of research to develop small molecule-based therapies that will disrupt these essential interactions. Hence, we expect that a tool such as hp-DPI can reduce the time and cost of developing anti-H. pylori drugs. In the near future, we anticipate applying this platform to other organisms with large interaction data sets.
| Acknowledgments |
|---|
This research was partially supported by National Health Research Institutes, Taiwan, Bioinformatics Core Laboratory BS-092-PP-05 and National Science Council, Taiwan, and National Science and Technology Program for Genomic Medicine NSC 92-3112-B-400-007-Y.
| Footnotes |
|---|
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
1 Available at http://www.pim.hybrigenics.com/pimriderext/common/AnnotationRules.jsp
Received on June 24, 2004; revised on September 17, 2004; accepted on October 17, 2004
| REFERENCES |
|---|
Attwood, T.K., Croning, M.D., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordis, P., Selley, J.N., Wright, W. (2000) PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res., 28, 225-227.
Bader, G.D., Betel, D., Hogue, C.W. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res., 31, 248-250.
Batagelj, V. and Mrvar, A. (2001) Pajek - analysis and visualization of large networks. In Graphic Drawing. Lecture Notes in Comput. Sci., Vol. 2265, pp. 477-478.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., Sonnhammer, E.L. (2002) The Pfam protein families database. Nucleic Acids Res., 30, 276-280.
Breitkreutz, B.J., Stark, C., Tyers, M. (2003) Osprey: a network visualization system. Genome Biol., 4, R22.
Camon, E., Magrane, M., Barrell, D., Binns, D., Fleischmann, W., Kersey, P., Mulder, N., Oinn, T., Maslen, J., Cox, A., Apweiler, R. (2003) The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res., 13, 662-672.
Copley, R.R., Doerks, T., Letunic, I., Bork, P. (2002) Protein domain analysis in the era of complete genomes. FEBS Lett., 513, 129-134.
David, A. (2001) Graph Drawing. In Tulip, P. (Ed.). Lecture Notes in Comput. Sci., Vol. 2265, pp. 435-437.
Fabian, P., Murvai, J., Hatsagi, Z., Vlahovicek, K., Hegyi, H., Pongor, S. (1997) The SBASE protein domain library, release 5.0: a collection of annotated protein sequence segments. Nucleic Acids Res., 25, 240-243.
Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C.J., Hofmann, K., Bairoch, A. (2002) The PROSITE database, its status in 2002. Nucleic Acids Res., 30, 235-238.
Hu, Z., Mellor, J., Wu, J., DeLisi, C. (2004) VisANT: an online visualization and analysis tool for biological interaction data. BMC Bioinformatics, 5, 17.
Ju, B.H. and Han, K. (2003) Complexity management in visualizing protein interaction networks. Bioinformatics, 19 (Suppl 1), i177-i179.
Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, D277-D280.
Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P., Bork, P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res., 32, D142-D144.
Pawson, T. and Nash, P. (2003) Assembly of cell regulatory systems through protein interaction domains. Science, 300, 445-452.
Rain, J.C., Selig, L., De Reuse, H., Battaglia, V., Reverdy, C., Simon, S., Lenzen, G., Petel, F., Wojcik, J., Schachter, V., Chemama, Y., Labigne, A., Legrain, P. (2001) The protein-protein interaction map of Helicobacter pylori. Nature, 409, 211-215.
Sprinzak, E. and Margalit, H. (2001) Correlated sequence-signatures as markers of protein-protein interaction. J. Mol. Biol., 311, 681-692.
Tomb, J.F., White, O., Kerlavage, A.R., Clayton, R.A., Sutton, G.G., Fleischmann, R.D., Ketchum, K.A., Klenk, H.P., Gill, S., Dougherty, B.A., et al. (1997) The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature, 388, 539-547.
Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., Eisenberg, D. (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res., 30, 303-305.
Zdobnov, E.M. and Apweiler, R. (2001) InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics, 17, 847-848.
