Bioinformatics, Vol 15, 106-110, Copyright © 1999 by Oxford University Press

Establishing a method of vector contamination identification in database sequences

GA Seluja, A Farmer, M McLeod, C Harger and PA Schad
National Center for Genome Resources, 1800-A Old Pecos Trail, Santa Fe, NM 87505, USA.

MOTIVATION: The nucleotide sequence databases are invaluable tools for both
the private and the academic research communities, with uses ranging from
the retrieval of sequences to homology searching. The databases face several
issues related to data quality, such as the existence of sequencing
artifacts and errors. We investigated a major source of these errors, i.e.
the presence of vector-contaminated sequences.

RESULTS: Using a panel of 180 vector polylinker sequences, we found 3029
vector-matching sequences (0.36%) in GenBank Release 95-96, with an average
vector-matching length of 72 nucleotides. The number of vector-contaminated
sequences has been growing with the database; however, the percent
contamination has remained approximately constant at an average of 0.28%
from 1982 to 1996.

AVAILABILITY: Access to the database of vector polylinker sequences via
sequence similarity searching is available at http://seqsim.ncgr.org/vector/

CONTACT: gas@molinfo.com
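
The kind of screening summarized in the abstract (compare each database sequence against a panel of vector polylinker sequences, then report the number of matching sequences, the percent contamination, and the average matching length) can be illustrated with a minimal sketch. The abstract does not specify the search method, so the sketch below substitutes a simple longest-exact-substring comparison on both strands for a real similarity search. The function names, the 20-nucleotide threshold, and the toy sequences are all hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher

# Complement table for unambiguous DNA bases only (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_exact_match(seq, vector):
    """Length of the longest substring shared by seq and vector."""
    matcher = SequenceMatcher(None, seq, vector, autojunk=False)
    return matcher.find_longest_match(0, len(seq), 0, len(vector)).size


def screen(sequences, vector_panel, min_match=20):
    """Map sequence id -> best match length for sequences that share an
    exact match of at least min_match bases with any panel entry,
    checking both strands. min_match=20 is an arbitrary illustrative value."""
    hits = {}
    for seq_id, seq in sequences.items():
        seq = seq.upper()
        best = 0
        for vector in vector_panel:
            vector = vector.upper()
            for probe in (vector, reverse_complement(vector)):
                best = max(best, longest_exact_match(seq, probe))
        if best >= min_match:
            hits[seq_id] = best
    return hits


def summarize(hits, total):
    """Return (hit count, percent of total, mean match length)."""
    percent = 100.0 * len(hits) / total if total else 0.0
    mean_len = sum(hits.values()) / len(hits) if hits else 0.0
    return len(hits), percent, mean_len


# Toy data, invented for illustration; not real vector or GenBank sequences.
toy_vector = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
toy_sequences = {
    "clean": "TTTTTTTTTTCCCCCCCCCCAAAAAAAAAAGGGGGGGGGG",
    "contaminated": "GGGGG" + toy_vector[:25] + "TTTTT",
    "rc_contaminated": "CCCCC" + reverse_complement(toy_vector)[:22] + "AAAAA",
}

toy_hits = screen(toy_sequences, [toy_vector])
count, percent, mean_len = summarize(toy_hits, len(toy_sequences))
print(f"{count} vector-matching sequences ({percent:.2f}%), "
      f"mean match length {mean_len:.1f} nt")
```

Exact substring matching misses contamination that contains sequencing errors, so a production screen would more plausibly rely on a similarity search tool that tolerates mismatches; the sketch is only meant to show how the three reported statistics relate to one another.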
This article has been cited by other articles:

B. Lee and G. Shin. CleanEST: a database of cleansed EST libraries. Nucleic Acids Res., October 2, 2008; (2008) gkn648v1.

C. Liang, Y. Liu, L. Liu, A. C. Davis, Y. Shen, and Q. Q. Li. Expressed Sequence Tags With cDNA Termini: Previously Overlooked Resources for Gene Annotation and Transcriptome Exploration in Chlamydomonas reinhardtii. Genetics, May 1, 2008; 179(1): 83-93.

I. S. Kohane. Bioinformatics and Clinical Informatics: The Imperative to Collaborate. J. Am. Med. Inform. Assoc., September 1, 2000; 7(5): 512-516.