Bioinformatics Advance Access originally published online on February 10, 2004

Bioinformatics 20(8) © Oxford University Press 2004; all rights reserved.

Correction of sequence-based artifacts in serial analysis of gene expression

Viatcheslav R. Akmaev* and Clarence J. Wang

Genzyme Corporation, Framingham, MA 01701-9322, USA

Received on May 19, 2003; revised on October 27, 2003; accepted on October 18, 2003
Advance Access Publication February 10, 2004

Motivation: Serial Analysis of Gene Expression (SAGE) is a powerful technology for measuring global gene expression through the rapid generation of large numbers of transcript tags. Beyond their intrinsic value in differential gene expression analysis, SAGE tag collections afford abundant information on the size and shape of the sample transcriptome and can accelerate novel gene discovery. These latter applications are facilitated by the enhanced method of Long SAGE. A characteristic of sequencing-based methods such as SAGE and Long SAGE is the unavoidable occurrence of artifact sequences resulting from sequencing errors. Because such tag errors occur at a low, random rate, they have minimal impact on differential expression analysis. However, to fully exploit the value of large SAGE tag datasets, it is desirable to account for and correct tag artifacts.

Results: We present estimates of the occurrence of tag errors and an efficient error correction algorithm. Error rate estimates are based on a stochastic model that includes the contributions of polymerase chain reaction (PCR) and sequencing errors. The correction algorithm, SAGEScreen, is a multi-step procedure that addresses ditag processing, estimation of empirical error rates from highly abundant tags, grouping of similar-sequence tags and statistical testing of observed counts. We apply SAGEScreen to Long SAGE libraries and compare error rates for several processing scenarios. Results with simulated tag collections indicate that SAGEScreen corrects 78% of recoverable tag errors and reduces the occurrence of singleton tags.
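
To make the idea of grouping similar-sequence tags and testing observed counts concrete, the following is a minimal sketch in Python. It is not the SAGEScreen implementation, whose details the abstract does not give. It assumes a single per-base error rate with uniform substitutions, considers only tags that differ by one base, and uses a simple binomial tail test; the function names, the 0.01 error rate, the 0.05 threshold and the toy 10-base tags are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import exp, lgamma, log, log1p

BASES = "ACGT"


def tag_error_probability(per_base_error, tag_length):
    """Chance that a tag contains at least one wrong base,
    assuming independent errors at every position (an assumption)."""
    return 1.0 - (1.0 - per_base_error) ** tag_length


def neighbors(tag):
    """Yield every tag that differs from `tag` at exactly one position."""
    for i, base in enumerate(tag):
        for alt in BASES:
            if alt != base:
                yield tag[:i] + alt + tag[i + 1:]


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    cdf = 0.0
    for i in range(k):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        cdf += exp(log_term)
    return max(0.0, 1.0 - cdf)


def correct_tags(counts, per_base_error=0.01, alpha=0.05):
    """Fold rare tags into a more abundant one-base neighbor when their
    count is consistent with errors alone (hypothetical decision rule)."""
    corrected = Counter(counts)
    # Visit rare tags first so their counts can cascade upward.
    for tag, count in sorted(counts.items(), key=lambda kv: kv[1]):
        parent = max(
            (n for n in neighbors(tag) if counts.get(n, 0) > count),
            key=lambda n: counts[n],
            default=None,
        )
        if parent is None:
            continue
        # Probability that one read of the parent yields this exact neighbor:
        # one specific substitution, all other bases read correctly.
        p_specific = (per_base_error / 3.0) * (1.0 - per_base_error) ** (len(tag) - 1)
        n_reads = counts[parent] + count
        # A large tail probability means errors alone explain the count.
        if binomial_tail(count, n_reads, p_specific) > alpha:
            corrected[parent] += corrected.pop(tag)
    return corrected


toy_counts = {
    "AAAAAAAAAA": 1000,  # abundant tag
    "AAAAAAAAAC": 2,     # rare one-base neighbor, plausibly an artifact
    "AAAAAAAAAG": 200,   # one-base neighbor, too abundant to be an artifact
    "CCCCCCCCCC": 50,    # unrelated tag
}
result = correct_tags(toy_counts)
print(dict(result))
print(tag_error_probability(0.01, 10))
```

Under these assumptions the rare neighbor is merged into the abundant tag, while the neighbor with 200 counts is kept as a separate tag because errors alone cannot plausibly explain it. The total tag count is preserved, since counts are reassigned rather than discarded.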

Availability: The SAGEScreen software is available for academic users from the first author.

Contact: slava.akmaev{at}genzyme.com

* To whom correspondence should be addressed.






