Bioinformatics, Vol 14, 575-582, Copyright © 1998 by Oxford University Press
J Macauley, H Wang and N Goodman
MOTIVATION: Integration of molecular biology databases remains limited in
practice despite its practical importance and considerable research effort.
The complexity of the problem is such that an experimental approach is
mandatory, yet this very complexity makes it hard to design definitive
experiments. This dilemma is common in science, and one tried-and-true
strategy is to work with model systems. We propose a model system for this
problem, namely a database of genes integrating diverse data across
organisms, and describe an experiment using this model.

RESULTS: We
attempted to construct a database of human and mouse genes integrating data
from GenBank and the human and mouse genome-databases. We discovered
numerous errors in these well-respected databases: approximately 15% of
genes are apparently missing from the genome-databases; links between the
sequence and genome-databases are missing for another 5-10% of the cases;
about a third of likely homology links are missing between the
genome-databases; 10-20% of entries classified as 'genes' are apparently
misclassified. By using a model system, we were able to study the problems
caused by anomalous data without having to face all the hard problems of
database integration.

CONTACT: nat@jax.org
ARTICLES
A model system for studying the integration of molecular biology databases
The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA.