Bioinformatics, Vol 14, 349-356, Copyright © 1998 by Oxford University Press
I Anderson and A Brass
MOTIVATION: Searching DNA sequences against a DNA database is an essential
element of sequence analysis. However, few systematic studies have been
carried out to determine when a match between two DNA sequences has
biological significance and this is limiting the use that can be made of
DNA searching algorithms.

RESULTS: A test set of DNA sequences has been
constructed consisting of artificially evolved and real sequences. This set
has been used to test various database searching algorithms (BLAST, BLAST2,
FASTA and Smith-Waterman) on a subset of the EMBL database. The results of
this analysis have been used to determine the sensitivity and coverage of
all of the algorithms. Guidelines have been produced which can be used to
assess the significance of DNA database search results. The Smith-Waterman
algorithm was shown to have the best coverage, but the worst sensitivity,
whereas the default BLASTN algorithm (word length set to 11) was shown to
have good sensitivity, but poor coverage. A sensible compromise between
speed, sensitivity and coverage can be obtained using either the FASTA or
BLAST (word length set to 6) algorithms. However, analysis of the results
also showed that no algorithm works well when the length of the probe
sequence is <200 bases. In general, matches can accurately be identified
between coding regions of DNA sequences when there is >35% sequence
identity between the corresponding proteins. Searching a DNA sequence
against a DNA sequence database can, therefore, be a useful tool in
sequence analysis.

AVAILABILITY: The test sets used are available via
anonymous ftp from mbisg2.sbc.man.ac.uk in the directory
/pub/cabios/testdata/.

CONTACT: I.Anderson@stud.man.ac.uk; abrass@man.ac.uk
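
As an illustration of the guidelines stated in the abstract, the following minimal Python sketch computes percent identity between two already-aligned protein sequences and checks a search against the two reported conditions (a probe of at least 200 bases and more than 35% identity between the corresponding proteins). The function names, the gap character and the choice to count identity only over ungapped columns are assumptions made for this example; the abstract does not specify how identity was calculated.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the denominator is the number of ungapped columns. Other
    conventions (e.g. full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def within_reported_range(probe_length: int, protein_identity: float) -> bool:
    """Hypothetical helper: True when the probe is at least 200 bases long and
    protein identity exceeds 35%, the two conditions named in the abstract."""
    return probe_length >= 200 and protein_identity > 35.0


if __name__ == "__main__":
    identity = percent_identity("MKT-AL", "MRTQAL")
    print(identity, within_reported_range(250, identity))
```

Such a check is only a coarse filter: it says whether a search falls inside the range where the study found matches could be identified accurately, not whether any individual match is significant.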
Searching DNA databases for similarities to DNA sequences: when is a match significant?
School of Biological Sciences, University of Manchester, 2.205 Stopford Building, Oxford Road, Manchester M13 9PT, UK. I.Anderson@stud.man.ac.uk