Bioinformatics Advance Access originally published online on February 10, 2004
Bioinformatics 20(8) © Oxford University Press 2004; all rights reserved.
Correction of sequence-based artifacts in serial analysis of gene expression
Genzyme Corporation, Framingham, MA 01701-9322, USA
Received on May 19, 2003; revised on October 27, 2003; accepted on October 18, 2003
Motivation: Serial Analysis of Gene Expression (SAGE) is a powerful technology for measuring global gene expression through rapid generation of large numbers of transcript tags. Beyond their intrinsic value in differential gene expression analysis, SAGE tag collections afford abundant information on the size and shape of the sample transcriptome and can accelerate novel gene discovery. These latter SAGE applications are facilitated by the enhanced method of Long SAGE. A characteristic of sequencing-based methods, such as SAGE and Long SAGE, is the unavoidable occurrence of artifact sequences resulting from sequencing errors. By virtue of their low, random incidence, such tag errors have minimal impact on differential expression analysis. However, to fully exploit the value of large SAGE tag datasets, it is desirable to account for and correct tag artifacts.
Results: We present estimates for occurrences of tag errors and an efficient error correction algorithm. Error rate estimates are based on a stochastic model that includes the polymerase chain reaction (PCR) and sequencing error contributions. The correction algorithm, SAGEScreen, is a multi-step procedure that addresses ditag processing, estimation of empirical error rates from highly abundant tags, grouping of similar-sequence tags and statistical testing of observed counts. We apply SAGEScreen to Long SAGE libraries and compare error rates for several processing scenarios. Results with simulated tag collections indicate that SAGEScreen corrects 78% of recoverable tag errors and reduces the occurrences of singleton tags.
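To make the last two steps (grouping of similar-sequence tags and statistical testing of observed counts) more concrete, the following is a minimal Python sketch. It is not the SAGEScreen implementation, whose details are not given here. It assumes that "similar" means a single-base mismatch, that a fixed per-base error rate (`per_base_error`, a hypothetical parameter) stands in for the empirically estimated rates, and that a simple Poisson tail test stands in for the paper's statistical test. Ditag processing and error-rate estimation are omitted, and all function names are illustrative.

```python
from math import exp


def hamming_one_neighbors(tag, alphabet="ACGT"):
    """Yield every sequence that differs from `tag` at exactly one position."""
    for i, base in enumerate(tag):
        for alt in alphabet:
            if alt != base:
                yield tag[:i] + alt + tag[i + 1:]


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def screen_tags(counts, per_base_error=0.01, alpha=0.05):
    """Merge a low-count tag into a more abundant one-mismatch neighbor
    when its count is plausible as error derived from that neighbor.

    `per_base_error` and `alpha` are assumed values for illustration only.
    """
    corrected = dict(counts)
    # Visit candidate parent tags from most to least abundant.
    for parent in sorted(counts, key=counts.get, reverse=True):
        if parent not in corrected:
            continue  # already merged into a more abundant tag
        # Expected copies of one specific substitution at one position.
        lam = counts[parent] * per_base_error / 3
        for child in hamming_one_neighbors(parent):
            child_count = corrected.get(child)
            if child_count is None or child_count >= counts[parent]:
                continue
            # If a count this large is not surprising under the error
            # model, treat the child as an artifact of the parent.
            if poisson_tail(child_count, lam) >= alpha:
                corrected[parent] += child_count
                del corrected[child]
    return corrected


if __name__ == "__main__":
    library = {
        "AAAAAAAAAA": 1000,  # abundant tag
        "AAAAAAAAAC": 2,     # plausible single-base error of the tag above
        "AAAAAAAAAG": 50,    # too abundant to be explained by error
        "CCCCCCCCCC": 1,     # singleton with no abundant neighbor
    }
    print(screen_tags(library))
```

In this sketch a singleton is absorbed only when it lies one mismatch from a sufficiently abundant tag, so unrelated low-count tags are left untouched; this mirrors the idea that only some tag errors are recoverable.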
Availability: The SAGEScreen software is available for academic users from the first author.
Contact: slava.akmaev{at}genzyme.com
* To whom correspondence should be addressed.