Bioinformatics Advance Access published online on February 12, 2004
Bioinformatics, doi:10.1093/bioinformatics/bth120
Bioinformatics © Oxford University Press 2004; all rights reserved
1 Genome Sequencing Center, Washington University School of Medicine, Box 8501, 4444 Forest Park Blvd., Saint Louis, MO 63108, USA
* To whom correspondence should be addressed. E-mail: mwendl{at}wustl.edu.
Motivation: Investigators utilize gap estimates for DNA sequencing projects. Standard theories assume sequence is independently and identically distributed, leading to appreciable under-prediction of gaps.

Results: Using a statistical scaling factor and data from 20 representative whole genome shotgun projects, we construct regression equations which relate coverage to a normalized gap measure. Prokaryotic genomes do not correlate to sequence coverage, while eukaryotes show strong correlation if chaff is ignored. Gaps decrease at an exponential rate of only about one-third of that predicted via theory alone. Case studies suggest that departure from theory can largely be attributed to assembly difficulties for repeat-rich genomes, but bias and coverage anomalies are also important when repeats are sparse. Such factors cannot be readily characterized a priori, suggesting upper limits on the accuracy of gap prediction. We also find that diminishing coverage probability discussed in other studies is a theoretical artifact that does not arise for the typical project.
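To make the one-third figure concrete, the following is a minimal Python sketch, not the regression equations of the paper. It assumes that the i.i.d. theory takes the classical Lander-Waterman form, in which a normalized gap count decays as exp(-c) with coverage c, and that the empirical trend decays as exp(-c/3). The function names and the unit prefactor are hypothetical; the fitted regression coefficients are not stated in this abstract.

```python
import math

# Assumption: the empirical decay rate is one-third of the theoretical rate,
# as stated in the abstract. The prefactor is set to 1 for illustration only.
EMPIRICAL_RATE_RATIO = 1.0 / 3.0


def theoretical_gap_fraction(coverage: float) -> float:
    """Normalized gap count under the assumed i.i.d. model: exp(-c)."""
    return math.exp(-coverage)


def empirical_gap_fraction(coverage: float,
                           rate_ratio: float = EMPIRICAL_RATE_RATIO) -> float:
    """Normalized gap count with a reduced decay rate: exp(-rate_ratio * c)."""
    return math.exp(-rate_ratio * coverage)


def underprediction_factor(coverage: float) -> float:
    """Ratio of the empirical to the theoretical normalized gap count."""
    return empirical_gap_fraction(coverage) / theoretical_gap_fraction(coverage)


if __name__ == "__main__":
    # Tabulate both curves at a few illustrative coverage depths.
    for c in (2.0, 4.0, 6.0, 8.0):
        print(f"c={c:.0f}  theory={theoretical_gap_fraction(c):.5f}  "
              f"empirical={empirical_gap_fraction(c):.5f}  "
              f"ratio={underprediction_factor(c):.1f}")
```

Under these assumptions the ratio of the two curves grows as exp(2c/3), so at 6-fold coverage the i.i.d. model would under-predict the normalized gap measure by a factor of exp(4), roughly 55.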
Revised December 12, 2003
Accepted January 5, 2004
Gap statistics for whole genome shotgun DNA sequencing projects