Bioinformatics Advance Access originally published online on January 9, 2009
Bioinformatics 2009 25(4):458-464; doi:10.1093/bioinformatics/btp010
Profiling model T-cell metagenomes with short reads
1BC Cancer Agency, Michael Smith Genome Sciences Centre, 675 West 10th Avenue, Vancouver, BC V5Z 1L3 Canada and 2BC Cancer Agency, Deeley Research Centre, 2410 Lee Ave, Victoria, BC V8R 6V5 Canada
*To whom correspondence should be addressed.
Abstract
Motivation: T-cell receptor (TCR) diversity in peripheral blood has not yet been fully profiled at sequence-level resolution. Each T-cell clonotype expresses a unique receptor generated by somatic recombination of TCR genes, and the enormous potential for T-cell diversity makes repertoire analysis challenging. We developed a sequencing approach and assembly software (immuno-SSAKE, or iSSAKE) for profiling T-cell metagenomes using short reads from massively parallel sequencing platforms.
Results: Models of sequence diversity for the TCR β-chain CDR3 region were built using empirical data and used to simulate, at random, distinct TCR clonotypes at 1–20 p.p.m. Using simulated TCRβ (sTCRβ) sequences, we randomly created 20 million 36 nt reads having 1–2% random error, 20 million 42 or 50 nt reads having 1% random error and 20 million 36 nt reads with 1% error modeled on real short read data. Reads aligning to the end of known TCR variable (V) genes and having consecutive unmatched bases in the adjacent CDR3 were used to seed iSSAKE de novo assemblies of CDR3. With assembled 36 nt reads, we detect over 51% and 63% of rare (1 p.p.m.) clonotypes using a random or modeled error distribution, respectively. We detect over 99% of more abundant clonotypes (6 p.p.m. or higher) using either error distribution. Longer reads improve sensitivity, with assembled 42 and 50 nt reads identifying 82.0% and 94.7% of rare 1 p.p.m. clonotypes, respectively. Our approach illustrates the feasibility of complete profiling of the TCR repertoire using new massively parallel short read sequencing technology.
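The read simulation described above can be illustrated with a minimal sketch. It is not the authors' simulation code: the function names, the uniform choice of read start position, and the substitution-only error model are assumptions made for illustration, and the error distribution modeled on real short read data is not reproduced here.

```python
import random

BASES = "ACGT"

def sample_read(template, read_len, error_rate, rng):
    """Cut one read from a random position and add random substitution errors.

    Hypothetical helper: each base is replaced by a different base with
    probability `error_rate` (uniform, position-independent error).
    """
    if len(template) < read_len:
        raise ValueError("template is shorter than the requested read length")
    start = rng.randrange(len(template) - read_len + 1)
    read = list(template[start:start + read_len])
    for i, base in enumerate(read):
        if rng.random() < error_rate:
            read[i] = rng.choice([b for b in BASES if b != base])
    return "".join(read)

def simulate_reads(templates, n_reads, read_len=36, error_rate=0.01,
                   weights=None, seed=0):
    """Draw `n_reads` reads from `templates`.

    `weights` (optional) sets the relative abundance of each template;
    when omitted, templates are sampled uniformly.
    """
    rng = random.Random(seed)
    chosen = rng.choices(templates, weights=weights, k=n_reads)
    return [sample_read(t, read_len, error_rate, rng) for t in chosen]
```

Calling `simulate_reads(sequences, 1000, read_len=50, error_rate=0.01)` on a list of simulated TCRβ sequences would, under these assumptions, return 1000 reads of 50 nt with roughly 1% substituted bases.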
Availability: ftp://ftp.bcgsc.ca/supplementary/iSSAKE
Contact: rwarren{at}bcgsc.ca
Supplementary information: Supplementary methods and data are available at Bioinformatics online.
Associate Editor: Alfonso Valencia
Received on September 26, 2008; revised on November 28, 2008; accepted on January 1, 2009
This article has been cited by other articles:
J. D. Freeman, R. L. Warren, J. R. Webb, B. H. Nelson, and R. A. Holt. Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing. Genome Res., October 1, 2009; 19(10): 1817-1824.