Bioinformatics Vol. 17 no. 4 2001
Pages 359-363
© 2001 Oxford University Press
Original Paper
Mining literature for protein-protein interactions
1 Molecular Biology Institute, UCLA-DOE Laboratory of Structural Biology & Molecular Medicine, University of California at Los Angeles, PO Box 951570, Los Angeles, CA 90095-1570, USA
2 Protein Pathways Inc., 1145 Gayley Avenue, Ste. 304, Los Angeles, CA 90024, USA
3 Institute of Cellular and Molecular Biology, Department of Chemistry and Biochemistry, University of Texas at Austin, 2500 Speedway, Austin, TX 78712, USA
Received on August 3, 2000; revised on November 16, 2000; accepted on November 22, 2000
Motivation: A central problem in bioinformatics is how to capture information from the vast current scientific literature in a form suitable for analysis by computer. We address the special case of information on protein-protein interactions, and show that the frequencies of words in Medline abstracts can be used to determine whether or not a given paper discusses protein-protein interactions. For those papers determined to discuss this topic, the relevant information can be captured for the Database of Interacting Proteins. Furthermore, suitable gene annotations can also be captured.
Results: Our Bayesian approach scores Medline abstracts for probability of discussing the topic of interest according to the frequencies of discriminating words found in the abstract. More than 80 discriminating words (e.g. complex, interaction, two-hybrid) were determined from a training set of 260 Medline abstracts corresponding to previously validated entries in the Database of Interacting Proteins. Using these words and a log likelihood scoring function, 2000 Medline abstracts were identified as describing interactions between yeast proteins. This approach now forms the basis for the rapid expansion of the Database of Interacting Proteins.
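The abstract does not state the exact form of the scoring function, so the following is only a minimal sketch of one way a word-frequency log likelihood score could work, assuming a naive-Bayes-style log ratio with add-one smoothing. The function names (`tokenize`, `word_log_odds`, `score_abstract`), the pseudocount, and the four toy training sentences are all hypothetical illustrations, not the authors' implementation or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic words, allowing internal hyphens
    # so that terms such as "two-hybrid" stay in one piece.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def word_log_odds(positive_docs, background_docs, pseudocount=1.0):
    # Per-word log ratio of smoothed frequencies in the positive
    # (interaction) set versus the background set.
    # ASSUMPTION: add-one smoothing; the abstract does not specify any.
    pos = Counter(w for d in positive_docs for w in tokenize(d))
    bg = Counter(w for d in background_docs for w in tokenize(d))
    vocab = set(pos) | set(bg)
    pos_total = sum(pos.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    return {
        w: math.log((pos[w] + pseudocount) / pos_total)
        - math.log((bg[w] + pseudocount) / bg_total)
        for w in vocab
    }


def score_abstract(text, weights):
    # Sum the log odds of every known word occurrence in the abstract;
    # words never seen in training contribute nothing.
    return sum(weights.get(w, 0.0) for w in tokenize(text))


# Hypothetical toy training data (not taken from the paper).
positive_docs = [
    "Two-hybrid analysis shows that the proteins form a complex",
    "The interaction between the kinase and its substrate was confirmed by two-hybrid assay",
]
background_docs = [
    "The gene sequence was determined and the expression pattern analysed",
    "Growth of yeast cells on glucose medium was measured",
]

weights = word_log_odds(positive_docs, background_docs)
interaction_score = score_abstract(
    "A two-hybrid screen reveals a complex interaction", weights
)
unrelated_score = score_abstract(
    "Glucose medium supports growth of yeast cells", weights
)
print(interaction_score, unrelated_score)
```

In this sketch, an abstract with a positive total score uses words that are relatively more frequent in the interaction training set than in the background set. In practice one would restrict `weights` to the most discriminating words and choose the decision threshold on held-out abstracts.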
Contact: marcotte{at}icmb.utexas.edu; ixenario{at}mbi.ucla.edu; david{at}mbi.ucla.edu
* These authors contributed equally to this work.