Bioinformatics Vol. 18 no. 90001 2002
Pages S294-S302
© 2002 Oxford University Press
Efficiently detecting polymorphisms during the fragment assembly process
Informatics Research, Celera Genomics, 45 W. Gude Dr., Rockville MD 20850, USA
Received on January 24, 2002; revised on March 28, 2002; accepted on March 28, 2002
Motivation: Current genomic sequence assemblers assume that the input data is derived from a single, homogeneous source. However, recent whole-genome shotgun sequencing projects have violated this assumption, resulting in input fragments covering the same region of the genome whose sequences differ due to polymorphic variation in the population. While single-nucleotide polymorphisms (SNPs) do not pose a significant problem to state-of-the-art assembly methods, these methods do not handle insertion/deletion (indel) polymorphisms of more than a few bases.
Results: This paper describes an efficient method for detecting sequence discrepancies due to polymorphism that avoids resorting to global use of more costly, less stringent affine sequence alignments. Instead, the algorithm uses graph-based methods to determine the small set of fragments involved in each polymorphism and performs more sophisticated alignments only among fragments in that set. Results from the incorporation of this method into the Celera Assembler are reported for the D. melanogaster, H. sapiens, and M. musculus genomes.
Availability: The method described herein does not constitute a stand-alone software application, but is laid out in sufficient detail to be implemented as a component of any genomic sequence assembler.
Contact: daniel.fasulo{at}celera.com
Keywords: whole-genome assembly; shotgun sequencing; polymorphism.
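
To illustrate why affine alignments tolerate multi-base indels where a uniform per-base gap penalty does not, the following is a minimal sketch of a global alignment score with affine gap costs (the standard three-matrix dynamic program), applied only to a small candidate set of fragments. It is not the Celera Assembler implementation. The function names `affine_score` and `align_candidates`, the scoring parameters, and the example fragments are all assumptions made for illustration, and the graph-based step that selects the candidate set is not shown because the abstract does not specify it.

```python
from itertools import combinations

NEG_INF = float("-inf")


def affine_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with affine gaps (illustrative parameters).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Open a new gap from M or from the other gap state, or extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


def align_candidates(fragments, **scoring):
    """Score every pair in a small candidate set (hypothetical helper).

    `fragments` maps a fragment name to its sequence. The set is assumed
    to have been chosen already by some upstream detection step.
    """
    return {
        (name_a, name_b): affine_score(seq_a, seq_b, **scoring)
        for (name_a, seq_a), (name_b, seq_b) in combinations(fragments.items(), 2)
    }


# Two made-up fragments that differ by a 4-base insertion.
candidates = {"frag1": "ACGTACGT", "frag2": "ACGTGGGGACGT"}
affine = align_candidates(candidates)
# Setting gap_extend equal to gap_open reduces the model to a linear gap cost.
linear = align_candidates(candidates, gap_open=-3, gap_extend=-3)
print(affine, linear)
```

Under these assumed parameters the 4-base indel is charged as one gap event in the affine model, whereas the linear model charges the full penalty for each inserted base, so the same pair of fragments scores markedly lower. Confining the quadratic-time computation to a small candidate set is what keeps the extra cost manageable in this sketch.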