Bioinformatics Advance Access originally published online on May 14, 2004
Bioinformatics 2004 20(15):2401-2410; doi:10.1093/bioinformatics/bth258
Bioinformatics 20(15) © Oxford University Press 2004; all rights reserved.
Algorithms for sequence analysis via mutagenesis
1 Department of Mathematics, 2 Australian Genome Research Facility and 3 Institute for Molecular Bioscience, University of Queensland, St. Lucia Qld. 4072, Australia
Received on January 18, 2004; revised on March 11, 2004; accepted on March 23, 2004
Advance Access Publication May 14, 2004
Motivation: Despite many successes of conventional DNA sequencing methods, some DNAs remain difficult or impossible to sequence. Unsequenceable regions occur in the genomes of many biologically important organisms, including humans. Such regions range in length from tens to millions of bases, and may contain valuable information such as the sequences of important genes. The authors have recently developed a technique, known as sequence analysis via mutagenesis (SAM), that renders a wide range of problematic DNAs amenable to sequencing. This paper presents a number of algorithms for analysing and interpreting data generated by this technique.
Results: The essential idea of SAM is to infer the target sequence from the sequences of mutants derived from the target. We describe three algorithms used in this process. The first algorithm predicts the number of mutants that will be required to infer the target sequence with a desired level of accuracy. The second algorithm infers the target sequence itself, using the mutant sequences. The third algorithm assigns a quality value to each inferred base. The algorithms are illustrated using mutant sequences generated in the laboratory.
Availability: Software will be made available upon request.
Contact: j.keith1@mailbox.uq.edu.au
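To make the idea of inferring a target from its mutants concrete, the following is a minimal sketch and not the algorithms of the paper, which the abstract does not specify. It assumes the mutant sequences are already aligned to equal length, takes a simple column-wise majority vote, and attaches a Phred-style score based on a smoothed disagreement rate. The function name `infer_target`, the `max_quality` cap, and the scoring rule are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def infer_target(mutants, max_quality=60):
    """Infer a consensus from pre-aligned, equal-length mutant sequences.

    Illustrative assumption: independent mutations are rare at any one
    position, so the most common base in each column is taken as the
    target base. This is a plain majority vote, not the paper's method.
    """
    if not mutants:
        raise ValueError("need at least one mutant sequence")
    length = len(mutants[0])
    if any(len(m) != length for m in mutants):
        raise ValueError("mutant sequences must be aligned to equal length")

    consensus = []
    qualities = []
    n = len(mutants)
    for column in zip(*mutants):
        # Most common base in this column; ties go to the base seen first.
        base, votes = Counter(column).most_common(1)[0]
        # Smoothed fraction of dissenting mutants (add-one smoothing keeps
        # the estimate above zero when every mutant agrees).
        error = (n - votes + 1) / (n + 2)
        # Phred-style score, capped at max_quality (a hypothetical scale).
        quality = min(max_quality, round(-10 * math.log10(error)))
        consensus.append(base)
        qualities.append(quality)
    return "".join(consensus), qualities


if __name__ == "__main__":
    seq, quals = infer_target(["ACGT", "ACGA", "TCGT", "ACGT"])
    print(seq, quals)
```

In this toy scheme a column where all mutants agree scores higher than one with a dissenting base, and the scores rise as more mutants are added, which loosely mirrors why the number of mutants matters for accuracy.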