Bioinformatics Advance Access published online on November 15, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm557
Text processing through Web services: Calling Whatizit
¹European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, U.K.
*To whom correspondence should be addressed. Dr. Dietrich Rebholz-Schuhmann, E-mail: rebholz{at}ebi.ac.uk
Abstract
Motivation: Text-mining (TM) solutions are developing into efficient services for researchers in the biomedical research community. Such solutions have to scale with the growing number and size of resources (e.g. available controlled vocabularies), with the amount of literature to be processed (e.g. about 17 million documents in PubMed) and with the demands of the user community (e.g. different methods for fact extraction). These demands drive the development of server-based solutions that can be accessed programmatically.
Whatizit is a suite of modules that analyse text, e.g. users' own documents, scientific publications or Medline abstracts, for the information it contains. Each module identifies terms and links them to the corresponding entries in bioinformatics databases, such as UniProtKB/Swiss-Prot entries and Gene Ontology concepts. Other modules identify selected annotation types, such as the set produced by the EBIMed analysis pipeline for proteins. For Medline abstracts, Whatizit offers access to the EBI's in-house Medline installation via PMID or term query. For large volumes of users' own text, the server can be operated in a streaming mode.
Availability: http://www.ebi.ac.uk/webservices/whatizit
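The abstract describes modules that identify terms in text and link them to database entries, plus a streaming mode for large inputs. The Python below is a minimal sketch of the general idea of dictionary-based term tagging, assuming a small hypothetical dictionary; it is explicitly not Whatizit's own implementation or its Web service API. `TERM_INDEX`, `annotate` and `annotate_stream` are hypothetical names, and "streaming" is approximated here simply as lazily processing one document at a time.

```python
import re
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Tuple

# Hypothetical toy dictionary: lowercase surface term -> (resource, identifier).
# Illustrative only; this is not Whatizit's vocabulary or output format.
TERM_INDEX: Dict[str, Tuple[str, str]] = {
    "p53": ("UniProtKB", "P04637"),
    "apoptosis": ("GO", "GO:0006915"),
    "cell cycle": ("GO", "GO:0007049"),
}


class Annotation(NamedTuple):
    start: int        # character offset where the match begins
    end: int          # character offset just past the match
    text: str         # matched text as it appears in the input
    resource: str     # database the term is linked to
    identifier: str   # entry identifier in that database


def build_pattern(index: Dict[str, Tuple[str, str]]) -> "re.Pattern[str]":
    # Longest terms first, so a multi-word term wins over a shorter prefix term.
    terms = sorted(index, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # \b restricts matches to word boundaries; matching ignores case.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def annotate(text: str,
             index: Dict[str, Tuple[str, str]] = TERM_INDEX,
             pattern: Optional["re.Pattern[str]"] = None) -> List[Annotation]:
    """Find dictionary terms in text and link each hit to its database entry."""
    if pattern is None:
        pattern = build_pattern(index)
    result: List[Annotation] = []
    for m in pattern.finditer(text):
        # Dictionary keys are lowercase, so normalise the matched text.
        resource, identifier = index[m.group(0).lower()]
        result.append(Annotation(m.start(), m.end(), m.group(0), resource, identifier))
    return result


def annotate_stream(docs: Iterable[str],
                    index: Dict[str, Tuple[str, str]] = TERM_INDEX) -> Iterator[List[Annotation]]:
    """Hypothetical 'streaming' helper: compile once, then tag documents lazily."""
    pattern = build_pattern(index)
    for doc in docs:
        yield annotate(doc, index, pattern)


if __name__ == "__main__":
    sentence = "Loss of p53 function impairs apoptosis and alters the cell cycle."
    for a in annotate(sentence):
        print(a.text, a.resource, a.identifier)
```

Sorting the terms longest-first makes the regular-expression alternation prefer a multi-word entry such as "cell cycle" over any shorter entry that is its prefix. Because `annotate_stream` is a generator, memory use does not grow with the number of documents processed. A real tagger would also have to handle spelling variants and ambiguous terms, which this sketch ignores.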
Associate Editor: Dr. Jonathan Wren
Received on July 10, 2007; revised on October 2, 2007; accepted on November 4, 2007