Bioinformatics Advance Access published online on March 4, 2008

Bioinformatics, doi:10.1093/bioinformatics/btn074
All versions of this article: 24/8/1035 (most recent); btn074v1

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Consensus Generation and Variant Detection by Celera Assembler

Gennady Denisov*, Brian Walenz, Aaron L. Halpern, Jason Miller, Nelson Axelrod, Samuel Levy and Granger Sutton

J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850

*To whom correspondence should be addressed. Dr. Gennady Denisov, E-mail: gdenisov{at}jcvi.org


Abstract

Motivation: We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles and inconsistent with any of the aligned sequence reads. Our new algorithm takes a dynamic windowing approach. It detects alleles by simultaneously processing portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles, and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human (Levy et al. 2007). It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms.

Results: Applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the 2,033,311 detected regions of sequence variation. In 33,269 of the 460,373 detected regions longer than 1 bp, it fixes errors in which the original Celera Assembler consensus algorithm constructed a mosaic haploid representation of a diploid locus. Using an optimized procedure calibrated against 1,506,344 known SNPs, it detects 438,814 new heterozygous SNPs with a false positive rate of 12%.
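
The contrast between column-by-column consensus and the windowed approach can be illustrated with a toy example. The sketch below is not the Celera Assembler implementation. It assumes reads are already aligned to a two-column window, uses a hypothetical `.` marker for columns a read does not cover, and uses a hypothetical `min_support` threshold to decide when an allele counts as confirmed. Quality values, alignment gaps, and phasing of adjacent variants are left out.

```python
from collections import Counter

GAP = "."  # hypothetical marker: the read does not cover this column


def column_consensus(reads):
    """Majority base in each column, taken independently of other columns.

    Assumes every column is covered by at least one read.
    """
    consensus = []
    for column in zip(*reads):
        bases = [base for base in column if base != GAP]
        consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


def window_alleles(reads, min_support=2):
    """Group reads spanning the whole window by their sequence in it.

    A sequence seen in at least `min_support` spanning reads is treated
    as a confirmed allele (the threshold is an assumption of this sketch).
    """
    spanning = [read for read in reads if GAP not in read]
    counts = Counter(spanning)
    return [allele for allele, n in counts.most_common() if n >= min_support]


# Two alleles, AC and GT, plus two reads that cover only one column each.
reads = ["AC", "AC", "GT", "GT", "A.", ".T"]

mosaic = column_consensus(reads)
alleles = window_alleles(reads)

print(mosaic)   # AT
print(alleles)  # ['AC', 'GT']
```

Here the column-wise vote yields `AT`, a mosaic that matches neither allele and none of the reads spanning the window, while grouping the spanning reads recovers both `AC` and `GT` as separate consensus sequences.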

Associate Editor: Prof. John Quackenbush


Received on November 3, 2007; revised on January 31, 2008; accepted on February 22, 2008



This article has been cited by other articles:

T. Rausch, S. Koren, G. Denisov, D. Weese, A.-K. Emde, A. Doring, and K. Reinert
A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads
Bioinformatics, May 1, 2009; 25(9): 1118-1124.

N. Axelrod, Y. Lin, P. C. Ng, T. B. Stockwell, J. Crabtree, J. Huang, E. Kirkness, R. L. Strausberg, M. E. Frazier, J. C. Venter, et al.
The HuRef Browser: a web resource for individual human genomics
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D1018-D1024.

J. R. Miller, A. L. Delcher, S. Koren, E. Venter, B. P. Walenz, A. Brownley, J. Johnson, K. Li, C. Mobarry, and G. Sutton
Aggressive assembly of pyrosequencing reads with mates
Bioinformatics, December 15, 2008; 24(24): 2818-2824.


