Bioinformatics Vol. 17 no. 7 2001
Pages 587-601
© 2001 Oxford University Press
Extending traditional query-based integration approaches for functional characterization of post-genomic data
1 Department of Bioinformatics,
GlaxoSmithKline, King of Prussia, PA, USA
2 Data Management Systems, Gene Logic Inc.,
Berkeley, CA, USA
Received on December 23, 2000; revised on February 28, 2001; accepted on March 6, 2001
Motivation: Identifying and characterizing regions of functional interest in genomic sequence requires full, flexible query access to an integrated, up-to-date view of all related information, irrespective of where it is stored (within an organization or across the Internet) and of its format (traditional database, flat file, web site, results of runtime analysis). Wide-ranging multi-source queries often return unmanageably large result sets, so non-traditional approaches are needed to exclude extraneous data.
Results: Target Informatics Net (TINet) is a readily extensible data integration system developed at GlaxoSmithKline (GSK), based on the Object-Protocol Model (OPM) multidatabase middleware system of Gene Logic Inc. Data sources currently integrated include the Mouse Genome Database (MGD) and Gene Expression Database (GXD), GenBank, SwissProt, PubMed, GeneCards, the results of runtime BLAST and PROSITE searches, and GSK proprietary relational databases. Special-purpose class methods used to filter and augment query results include regular expression pattern-matching over BLAST HSP alignments and retrieval of partial sequences derived from primary structure annotations. All data sources and methods are accessible through an SQL-like query language or a GUI, so when new investigations arise, no programming beyond query specification is required. The power and flexibility of this approach are illustrated by integrated queries such as: (1) find homologs in genomic sequence to all novel genes cloned and reported in the scientific literature within the past three months that are linked to the MeSH term "neoplasms"; (2) using a neuropeptide precursor query sequence, return only HSPs where the target genomic sequences conserve the G[KR][KR] motif at the appropriate points in the HSP alignment; and (3) of the human genomic sequences annotated with exon boundaries in GenBank, return only those with valid putative donor/acceptor sites and start/stop codons.
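The filters in queries (2) and (3) rest on simple sequence checks. The Python sketch below illustrates that kind of logic under stated assumptions; it is not TINet's implementation, and every name in it (`HSP`, `motif_sites_conserved`, `filter_hsps`, `has_valid_gene_structure`) is hypothetical. It assumes each HSP is available as a pair of equal-length aligned strings (query and translated target, with `-` for gaps), that an HSP "conserves" the G[KR][KR] motif when every motif occurrence in the aligned query is matched by G[KR][KR] in the target at the same alignment columns, and that exon coordinates are 1-based, inclusive, on the plus strand, and together span the coding sequence. Splice sites are checked only against the canonical GT...AG rule.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Motif from query (2): glycine followed by two basic residues (K or R).
MOTIF = re.compile(r"G[KR][KR]")
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class HSP:
    """Hypothetical HSP record: equal-length aligned strings, '-' marks gaps."""
    query_aln: str
    subject_aln: str


def motif_sites_conserved(hsp: HSP, motif: re.Pattern = MOTIF) -> bool:
    """True if the aligned query contains the motif and the subject matches it at every site."""
    if len(hsp.query_aln) != len(hsp.subject_aln):
        raise ValueError("aligned query and subject must have equal length")
    # Alignment columns (start, end) where the motif occurs in the aligned query.
    sites = [m.span() for m in motif.finditer(hsp.query_aln)]
    if not sites:
        return False
    # The subject must match the same motif in exactly those columns.
    return all(motif.fullmatch(hsp.subject_aln[s:e]) for s, e in sites)


def filter_hsps(hsps: Sequence[HSP], motif: re.Pattern = MOTIF) -> List[HSP]:
    """Keep only HSPs whose target conserves every query motif site."""
    return [h for h in hsps if motif_sites_conserved(h, motif)]


def has_valid_gene_structure(genomic: str, exons: Sequence[Tuple[int, int]]) -> bool:
    """Check start/stop codons and canonical GT...AG introns.

    exons: sorted (start, end) pairs, 1-based inclusive, plus strand only.
    """
    seq = genomic.upper()
    # Concatenate exon segments into the putative coding sequence.
    cds = "".join(seq[start - 1:end] for start, end in exons)
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    if not cds.startswith("ATG") or cds[-3:] not in STOP_CODONS:
        return False
    # Each intron lies between one exon's end and the next exon's start.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = seq[prev_end:next_start - 1]
        if len(intron) < 4 or not (intron.startswith("GT") and intron.endswith("AG")):
            return False
    return True
```

Because motif occurrences are located in the gapped query string, a motif interrupted by an alignment gap is not counted in this sketch. A fuller version might also handle minus-strand genes, non-canonical (for example GC-AG) introns, and in-frame stop codons.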
Availability: Freely available to non-profit educational and research institutions. Usage by commercial entities requires a license agreement.
Contact: barbara_eckman@sbphrd.com
* To whom correspondence should be addressed.

