Bioinformatics Vol. 19 Suppl. 1 2003
Pages i180-i182
© 2003 Oxford University Press
GENIA corpus: a semantically annotated corpus for bio-textmining
1 CREST, Japan Science and Technology Corporation,
Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
2 Department of Computer Science, University of Tokyo,
Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
Received on January 6, 2003; accepted on February 20, 2003
Motivation: Natural language processing (NLP) methods are regarded as being useful to raise the potential of text mining from biological literature. The lack of an extensively annotated corpus of this literature, however, causes a major bottleneck for applying NLP techniques. GENIA corpus is being developed to provide reference materials to let NLP techniques work for bio-textmining.
Results: GENIA corpus version 3.0 consisting of 2000 MEDLINE abstracts has been released with more than 400 000 words and almost 100 000 annotations for biological terms.
Availability: GENIA corpus is freely available at http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA
Keywords: Text Mining, Information Extraction, Corpus, Natural Language Processing, Computational Molecular Biology
* To whom correspondence should be addressed.

