Bioinformatics, Vol 14, 349-356, Copyright © 1998 by Oxford University Press


ARTICLES

Searching DNA databases for similarities to DNA sequences: when is a match significant?

I Anderson and A Brass
School of Biological Sciences, University of Manchester, 2.205 Stopford Building, Oxford Road, Manchester M13 9PT, UK. I.Anderson@stud.man.ac.uk

MOTIVATION: Searching DNA sequences against a DNA database is an essential element of sequence analysis. However, few systematic studies have been carried out to determine when a match between two DNA sequences has biological significance and this is limiting the use that can be made of DNA searching algorithms.

RESULTS: A test set of DNA sequences has been constructed consisting of artificially evolved and real sequences. This set has been used to test various database searching algorithms (BLAST, BLAST2, FASTA and Smith-Waterman) on a subset of the EMBL database. The results of this analysis have been used to determine the sensitivity and coverage of all of the algorithms. Guidelines have been produced which can be used to assess the significance of DNA database search results. The Smith-Waterman algorithm was shown to have the best coverage, but the worst sensitivity, whereas the default BLASTN algorithm (word length set to 11) was shown to have good sensitivity, but poor coverage. A sensible compromise between speed, sensitivity and coverage can be obtained using either the FASTA or BLAST (word length set to 6) algorithms. However, analysis of the results also showed that no algorithm works well when the length of the probe sequence is <200 bases. In general, matches can accurately be identified between coding regions of DNA sequences when there is >35% sequence identity between the corresponding proteins. Searching a DNA sequence against a DNA sequence database can, therefore, be a useful tool in sequence analysis.

AVAILABILITY: The test sets used are available via anonymous ftp from mbisg2.sbc.man.ac.uk in the directory /pub/cabios/testdata/

CONTACT: I.Anderson@stud.man.ac.uk; abrass@man.ac.uk
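The contrast the abstract draws between BLASTN word lengths of 11 and 6 can be illustrated with a toy example. The sketch below is not the BLAST or FASTA implementation and is not taken from the paper; it only shows the general idea of exact-word seeding, where a search can start only from a word that matches exactly in both sequences. The function name `seed_hits` and the two short sequences are hypothetical, chosen so that the query differs from the subject at two positions eight bases apart.

```python
def seed_hits(query, subject, word_length):
    """Return (query_pos, subject_pos) pairs where an exact word matches.

    Toy illustration of exact-word seeding; real tools add extension,
    scoring and statistics on top of this step.
    """
    # Index every word of the given length in the subject sequence.
    index = {}
    for j in range(len(subject) - word_length + 1):
        index.setdefault(subject[j:j + word_length], []).append(j)

    # Look up every word of the query in that index.
    hits = []
    for i in range(len(query) - word_length + 1):
        for j in index.get(query[i:i + word_length], []):
            hits.append((i, j))
    return hits


# Hypothetical sequences: QUERY is SUBJECT with substitutions at
# positions 7 and 15 (0-based), so 22 of 24 bases are identical.
SUBJECT = "GATTACAGGCTCATCGAAGTCCTA"
QUERY = "GATTACATGCTCATCTAAGTCCTA"

if __name__ == "__main__":
    for w in (11, 6):
        hits = seed_hits(QUERY, SUBJECT, w)
        print(f"word length {w}: {len(hits)} seed hit(s)")
```

With these inputs, every 11-base window of the query spans at least one substitution, so a word length of 11 yields no seed at all, whereas a word length of 6 still finds exact words in the unchanged stretches. Shorter words find more diverged matches but also produce more seeds to examine, which is one way to read the speed and coverage trade-off described in the abstract.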


