Bioinformatics Advance Access published online on September 19, 2007

Bioinformatics, doi:10.1093/bioinformatics/btm420
© The Author (2007). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Effect of the mutation rate and background size on the quality of pathogen identification

Chris Reed 1, Viacheslav Fofanov 2, Catherine Putonti 1,3, Sergei Chumakov 4, Tom Slezak 5 and Yuriy Fofanov 1,3,*

1Department of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, Houston, TX USA 77204.
2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX USA 77005.
3Department of Biology and Biochemistry, University of Houston, Science and Research Bldg 2, Houston, TX USA 77204.
4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jal. Mexico 44430.
5Computations Department, Lawrence Livermore National Laboratory, 7000 East Ave. L-174, Livermore, CA USA 94550

*To whom correspondence should be addressed. Yuriy Fofanov, E-mail: yfofanov{at}bioinfo.uh.edu


Abstract

Motivation: Genomic-based methods have significant potential for fast and accurate identification of organisms, or even genes of interest, in complex environmental samples (air, water, soil, food, etc.), especially when the target organism cannot be isolated for a variety of reasons. Despite this potential, the presence of unknown, variable, and usually large quantities of background DNA can cause interference that results in false positive outcomes.

Results: To estimate how the genomic diversity of the background (the total length of all the different genomes present in the background), the target length, and the target mutation rate affect the probability of misidentification, we introduce a mathematical definition of the quality of an individual signature in the presence of a background. The definition is based on the signature’s length and the number of mismatches needed to transform the signature into the closest subsequence present in the background. In conjunction with a probabilistic framework, this definition allows one to predict the minimal signature length required to identify the target in the presence of backgrounds of different sizes, as well as the effect of the target’s mutation rate on the quality of its identification. The model assumptions and predictions were validated using both Monte Carlo simulations and real genomic data. The proposed model can be used to determine appropriate signature lengths for various combinations of target and background genome sizes. It also predicts that no genomic signature will be able to identify the target if its mutation rate is greater than 5%.
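As an illustration only, the mismatch-based notion of signature quality described above can be sketched in a few lines of Python. This is not the authors' implementation: the function names `min_mismatches` and `expected_near_matches` are hypothetical, the search is a brute-force scan suited to toy inputs, and the second function assumes a background of independent, uniformly distributed bases, which real genomes do not satisfy.

```python
from math import comb


def min_mismatches(signature: str, background: str) -> int:
    """Smallest Hamming distance between the signature and any
    same-length window of the background (brute-force scan)."""
    n = len(signature)
    if n == 0 or len(background) < n:
        raise ValueError("background must be at least as long as a non-empty signature")
    best = n
    for start in range(len(background) - n + 1):
        window = background[start:start + n]
        dist = sum(a != b for a, b in zip(signature, window))
        if dist < best:
            best = dist
            if best == 0:  # an exact copy exists; cannot do better
                break
    return best


def expected_near_matches(sig_len: int, background_len: int, max_mismatches: int) -> float:
    """Expected number of background windows within max_mismatches of a
    fixed signature, ASSUMING i.i.d. uniform bases (an illustrative
    simplification, not the model from the paper)."""
    # Probability that one random window differs in at most max_mismatches positions.
    p = sum(
        comb(sig_len, i) * 0.75 ** i * 0.25 ** (sig_len - i)
        for i in range(max_mismatches + 1)
    )
    windows = max(background_len - sig_len + 1, 0)
    return windows * p


if __name__ == "__main__":
    # Toy sequences chosen for illustration only.
    print(min_mismatches("ACGT", "TTACGTTT"))
    print(expected_near_matches(20, 5_000_000, 2))
```

Under this sketch, a larger minimal mismatch count means the signature is farther from everything in the background, and the expected count of near matches grows with background length and shrinks with signature length, which is the qualitative trade-off the abstract describes.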

Contact: yfofanov{at}bioinfo.uh.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Dr. Chris Stoeckert


Received on May 10, 2007; revised on August 10, 2007; accepted on August 13, 2007
