Bioinformatics Advance Access published online on October 15, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn534
BioCaster: detecting public health rumors with a Web-based text mining system
1 National Institute of Informatics, ROIS, Tokyo, 101-8430, Japan.
2 PRESTO, Japan Science and Technology Corporation, Tokyo, 101-8430, Japan.
3 Department of Anthropology, Lehman College, CUNY, New York, 10468-1589, USA.
4 National Institute of Genetics, ROIS, Mishima, 411-8540, Japan.
5 University of Science, Vietnam National University at HCMC, Vietnam.
6 NECTEC and the Department of Computer Engineering, Kasetsart University, Bangkok, Thailand.
7 Okayama University, Okayama, 700-8530, Japan.
8 National Institute of Infectious Disease, Tokyo, 162-8640, Japan.
*To whom correspondence should be addressed. Dr. Nigel Collier, E-mail: collier{at}nii.ac.jp
Abstract
Summary: BioCaster is an ontology-based text mining system for detecting and tracking the distribution of infectious disease outbreaks from linguistic signals on the Web. The system continuously analyzes documents from over 1,700 RSS feeds, classifies them for topical relevance, and plots them onto a Google map using geocoded information. The background knowledge for bridging the gap between layman's terms and formal coding systems is contained in the freely available BioCaster ontology, which includes information in eight languages focused on the epidemiological role of pathogens as well as geographical locations with their latitudes and longitudes. The system consists of four main stages: topic classification, named entity recognition (NER), disease/location detection, and event recognition. Higher-order event analysis is used to detect more precisely specified warning signals, which can then be sent to registered users as email alerts. Topic classification and entity recognition are evaluated on a gold-standard corpus of annotated news articles.
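The four-stage flow described above (topic classification, NER, disease/location detection, event recognition, followed by alerting) can be illustrated with a minimal Python sketch. This is not BioCaster's implementation: the real system relies on its ontology and trained classifiers, whereas here every stage is a naive dictionary or keyword lookup. The term lists, coordinates, cue words, and all function names (`classify_topic`, `recognize_entities`, `normalize`, `recognize_events`, `alerts_for`) are hypothetical and chosen only to show how the stages feed into one another.

```python
from dataclasses import dataclass

# Hypothetical mini-ontology: layman or foreign-language terms -> canonical disease name.
DISEASE_TERMS = {
    "bird flu": "avian influenza",
    "h5n1": "avian influenza",
    "gripe aviar": "avian influenza",
    "dengue fever": "dengue",
}

# Hypothetical gazetteer: place name -> (latitude, longitude), rounded.
GAZETTEER = {
    "jakarta": (-6.2, 106.8),
    "hanoi": (21.0, 105.8),
}

# Hypothetical cue words standing in for a trained topic classifier.
OUTBREAK_CUES = ("outbreak", "cases", "infected", "died")


@dataclass(frozen=True)
class Event:
    disease: str
    location: str
    lat: float
    lon: float


def classify_topic(text: str) -> bool:
    """Stage 1 (stand-in): keep only documents that look like outbreak reports."""
    lowered = text.lower()
    return any(cue in lowered for cue in OUTBREAK_CUES)


def recognize_entities(text: str) -> tuple[list[str], list[str]]:
    """Stage 2 (stand-in): find disease and place mentions by substring lookup."""
    lowered = text.lower()
    diseases = [term for term in DISEASE_TERMS if term in lowered]
    places = [name for name in GAZETTEER if name in lowered]
    return diseases, places


def normalize(diseases: list[str], places: list[str]):
    """Stage 3: map mentions to canonical disease names and geocode places."""
    canonical = sorted({DISEASE_TERMS[d] for d in diseases})
    located = [(p, *GAZETTEER[p]) for p in sorted(set(places))]
    return canonical, located


def recognize_events(text: str) -> list[Event]:
    """Stage 4: pair every detected disease with every detected location."""
    if not classify_topic(text):
        return []
    canonical, located = normalize(*recognize_entities(text))
    return [Event(d, name, lat, lon) for d in canonical for (name, lat, lon) in located]


def alerts_for(events: list[Event], watched_diseases: set[str]) -> list[Event]:
    """Select events matching a registered user's watch list (e.g. for an email alert)."""
    return [e for e in events if e.disease in watched_diseases]


if __name__ == "__main__":
    doc = "Health officials confirmed new H5N1 cases near Jakarta; bird flu has infected poultry."
    for e in alerts_for(recognize_events(doc), {"avian influenza"}):
        print(f"ALERT: {e.disease} at {e.location} ({e.lat}, {e.lon})")
```

In this sketch, both "H5N1" and "bird flu" collapse to the single canonical label "avian influenza", which mirrors in miniature the role the abstract assigns to the ontology: bridging everyday wording and formal categories before events are placed on a map.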
Availability: The BioCaster map and ontology are freely available via a web portal at http://www.biocaster.org
Contact: collier{at}nii.ac.jp
Supplementary information: XXXX
Associate Editor: Dr. Jonathan Wren
Received on June 11, 2008; revised on October 7, 2008; accepted on October 9, 2008

