Bioinformatics Advance Access published online on July 26, 2006

Bioinformatics, doi:10.1093/bioinformatics/btl405
© 2006 The Author(s)
Received May 15, 2006
Revised July 20, 2006
Accepted July 21, 2006

Article

Bio-ontology and text: bridging the modeling gap

Carol Friedman 1 *, Tara Borlawsky 1, Lyudmila Shagina 1, H. Rosie Xing 2, and Yves A. Lussier 3

1 Department of Biomedical Informatics, Columbia University, New York, USA, 10032
2 Departments of Pathology and of Radiation Oncology, The University of Chicago, 5841 South Maryland Ave, Chicago, IL, USA, 60637
3 Department of Biomedical Informatics, Columbia University, New York, USA, 10032; Department of Medicine, Center for Biomedical Informatics, The University of Chicago, 5841 South Maryland Ave, Chicago, IL, USA, 60637

* To whom correspondence should be addressed.
Carol Friedman, E-mail: Friedman{at}dbmi.columbia.edu


   Abstract

Motivation: Natural language processing (NLP) techniques are increasingly being used in biology to automate the capture of new biological discoveries in text, which are being reported at a rapid rate. Yet, information represented in NLP data structures is classically very different from information organized with ontologies as found in model organisms or genetic databases. To facilitate the computational reuse and integration of information buried in unstructured text with that of genetic databases, we propose and evaluate a translational schema that represents a comprehensive set of phenotypic and genetic entities, as well as their closely related biomedical entities and relations as expressed in natural language. In addition, the schema connects different scales of biological information, and provides mappings from the textual information to existing ontologies, which are essential in biology for integration, organization, dissemination, and knowledge management of heterogeneous phenotypic information. A common comprehensive representation for otherwise heterogeneous phenotypic and genetic datasets, such as the one proposed, is critical for advancing systems biology because it enables acquisition and reuse of unprecedented volumes of diverse types of knowledge and information from text.

Results: A novel representational schema, PGschema, was developed that enables translation of phenotypic, genetic, and their closely related information found in textual narratives to a well-defined data structure comprising phenotypic and genetic concepts from established ontologies along with modifiers and relationships. Evaluation for coverage of a selected set of entities showed that 90% of the information could be represented (95% confidence interval: 86%-93%; n=268). Moreover, PGschema can be expressed automatically in an XML format by using natural language techniques to process the text. To our knowledge, we are providing the first evaluation of a translational schema for NLP that contains declarative knowledge about genes and their associated biomedical data (e.g. phenotypes).
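The reported coverage figure (90%, 95% confidence interval 86%-93%, n=268) can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state which interval method the authors used, nor the exact number of entities covered. It assumes a count of 241 out of 268 (about 90%) and uses the Wilson score interval, one common choice for binomial proportions. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumption: 241 of 268 entities were representable (about 90%).
# The abstract gives only the percentage, not the raw count.
low, high = wilson_interval(241, 268)
print(f"coverage: {241 / 268:.0%}, 95% CI: {low:.0%}-{high:.0%}")
```

Under these assumptions the interval rounds to the range given in the abstract, which suggests the reported figures are consistent with a count near 241, though it does not confirm the method actually used.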

Availability: http://zellig.cpmc.columbia.edu/PGschema.


Associate Editor: Martin Bishop


This article has been cited by other articles:


Y. A. Lussier and Y. Liu
Computational Approaches to Phenotyping: High-Throughput Phenomics
Proceedings of the ATS, January 1, 2007; 4(1): 18 - 25.


