Bioinformatics Advance Access originally published online on February 12, 2004
Bioinformatics 20(10) © Oxford University Press 2004; all rights reserved.
Gap statistics for whole genome shotgun DNA sequencing projects
Genome Sequencing Center, Washington University School of Medicine, Box 8501, 4444 Forest Park Blvd., Saint Louis, MO 63108 USA
Received on May 19, 2003; revised on December 12, 2003; accepted on January 5, 2004
Advance Access Publication February 19, 2004
Motivation: Investigators use gap estimates for DNA sequencing projects. Standard theories assume that sequences are independently and identically distributed, which leads to appreciable under-prediction of gaps.
Results: Using a statistical scaling factor and data from 20 representative whole genome shotgun projects, we construct regression equations that relate coverage to a normalized gap measure. For prokaryotic genomes this measure does not correlate with sequence coverage, whereas eukaryotic genomes show a strong correlation if the chaff is ignored. Gaps decrease at an exponential rate only about one-third of that predicted by theory alone. Case studies suggest that the departure from theory can largely be attributed to assembly difficulties in repeat-rich genomes, but bias and coverage anomalies are also important when repeats are sparse. Such factors cannot readily be characterized a priori, which suggests upper limits on the accuracy of gap prediction. We also find that the diminishing coverage probability discussed in other studies is a theoretical artifact that does not arise for the typical project.
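As a rough illustration of what a one-third decay rate means in practice, the sketch below assumes that the expected gap count scales as exp(-k·c) with coverage c. Setting k = 1 recovers the exponential decay of the classical Lander-Waterman estimate N·e^(-c) (i.i.d. reads, no minimum-overlap correction); k = 1/3 is a hypothetical stand-in for the slower decay reported above and is not the authors' regression equations. The function name `gap_reduction_factor` is illustrative only.

```python
import math


def gap_reduction_factor(c_from: float, c_to: float, rate: float = 1.0) -> float:
    """Factor by which the expected gap count shrinks when coverage rises
    from c_from to c_to, assuming gaps scale as exp(-rate * c).

    rate = 1.0 matches the idealized Lander-Waterman decay N * exp(-c);
    rate = 1/3 is a hypothetical slower decay, used for illustration only.
    The prefactor N cancels in the ratio, so it is not needed here.
    """
    return math.exp(rate * (c_to - c_from))


# Compare raising coverage from 6x to 9x under both assumed decay rates.
theory = gap_reduction_factor(6.0, 9.0)            # exp(3), about 20.1
empirical = gap_reduction_factor(6.0, 9.0, 1 / 3)  # exp(1), about 2.72
print(f"6x -> 9x: theory {theory:.1f}-fold fewer gaps, "
      f"one-third rate {empirical:.2f}-fold fewer gaps")
```

Under these assumptions, raising coverage from 6x to 9x cuts the gap count about 20-fold in theory but less than 3-fold at the slower rate, which shows how an idealized model can appreciably under-predict the gaps that remain.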
Contact: mwendl{at}wustl.edu
* To whom correspondence should be addressed.