Bioinformatics Advance Access originally published online on October 27, 2004
Bioinformatics 2005 21(5):694-695; doi:10.1093/bioinformatics/bti087
MedKit: a helper toolkit for automatic mining of MEDLINE/PubMed citations
Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA
*To whom correspondence should be addressed.
Summary: MEDLINE/PubMed is one of the most important information sources for bioinformatics text mining. However, there remain limitations in working with MEDLINE/PubMed citations. For example, PubMed imposes an upper limit of 10 000 on the number of PMIDs or citations that can be downloaded, and MEDLINE files are too large for most off-the-shelf XML parsers. We developed a Java package, MedKit, to work around these limitations and to provide other useful functionality, e.g. random sampling. Its four modules (querier, sampler, fetcher and parser) can work independently or be pipelined in various combinations. It can be used as a stand-alone GUI application or integrated into other text-mining systems. Text-mining researchers and others may download and use the toolkit free of charge for non-commercial purposes.
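One way to stay under a per-request ceiling such as the 10 000 limit mentioned above is to split a long PMID list into consecutive batches and process each batch separately. The following is a minimal sketch of that idea only. The class `PmidBatcher`, the method `split` and the constant `PUBMED_LIMIT` are hypothetical names chosen for illustration and are not part of MedKit's actual API; how MedKit itself handles the limit is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of MedKit's actual API.
final class PmidBatcher {
    // Per-request ceiling quoted in the summary (assumed to apply per batch).
    static final int PUBMED_LIMIT = 10_000;

    // Splits ids into consecutive batches of at most batchSize elements,
    // preserving the original order.
    static <T> List<List<T>> split(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> pmids = new ArrayList<>();
        for (int i = 1; i <= 25_000; i++) {
            pmids.add(i); // placeholder identifiers, not real PMIDs
        }
        List<List<Integer>> batches = split(pmids, PUBMED_LIMIT);
        System.out.println(batches.size() + " batches"); // prints "3 batches"
    }
}
```

Each batch could then be handed to a download step in turn; the sketch deliberately leaves out any network access.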
Availability: http://metnetdb.gdcb.iastate.edu/medkit
Contact: berleant{at}iastate.edu