Bioinformatics Advance Access originally published online on October 10, 2008
Bioinformatics 2008 24(23):2760-2766; doi:10.1093/bioinformatics/btn502
A bioinformatics analysis of the cell line nomenclature
1National Center for Integrative Biomedical Informatics and the Center for Computational Medicine and Biology, 2Department of Psychiatry and 3Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Cell lines are used extensively in biomedical research, but the nomenclature describing cell lines has not been standardized. The problems are both linguistic and experimental. Many ambiguous cell line names appear in the published literature. Users of the same cell line may refer to it in different ways, and cell lines may mutate or become contaminated without the knowledge of the user. As a first step towards rationalizing this nomenclature, we created a cell line knowledgebase (CLKB) with a well-structured collection of names and descriptive data for cell lines cultured in vitro. The objectives of this work are: (i) to assist users in extracting useful information from biomedical text and (ii) to highlight the importance of standardizing cell line names in biomedical research. This CLKB contains a broad collection of cell line names compiled from ATCC, Hyper CLDB and MeSH. In addition to names, the knowledgebase specifies relationships between cell lines. We analyze the use of cell line names in biomedical text. Issues include ambiguous names, polymorphisms in the use of names and the fact that some cell line names are also common English words. Linguistic patterns associated with the occurrence of cell line names are analyzed. Applying these patterns to the literature identifies only a small number of additional cell line names. Annotation of microarray gene expression studies is used as a test case. The CLKB facilitates data exploration and comparison of different cell lines in support of clinical and experimental research.
Availability: The web ontology file for this cell line collection can be downloaded at http://www.stateslab.org/data/celllineOntology/cellline.zip.
Contact: dstates{at}umich.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Alex Bateman
Received on March 26, 2008; revised on August 10, 2008; accepted on September 19, 2008