Bioinformatics Vol. 19 no. 4 2003
Pages 443-448
© 2003 Oxford University Press
Can transcriptome size be estimated from SAGE catalogs?
Laboratory of Cardiovascular Science, Gerontology Research Center, National Institute on Aging, NIH, 5600 Nathan Shock Drive, Baltimore, MD 21224, USA
Received on June 21, 2002; revised on September 26, 2002; accepted on October 1, 2002
Motivation: SAGE (Serial Analysis of Gene Expression) can be used to estimate the number of unique transcripts in a transcriptome. A simple estimator that corrects for sequencing and sampling errors was applied to a SAGE library (137 832 tags) obtained from mouse embryonic stem cells, and also to Monte Carlo simulated libraries generated using assumed distributions of true expression levels consistent with the data.
Results: When the corrected data themselves were taken as the underlying model of ground truth, the estimator converged to the true value (53 535) only after counting 300 000 simulated tags, more than twice the number in the experiment. The SAGE data could also be well fit by a Monte Carlo model based on a truncated inverse-square distribution of expression levels, with 130 000 true transcripts and 10^6 samples needed for convergence. We conclude that the size of a transcriptome is ill-determined from SAGE libraries of even moderately large size. To obtain a valid estimate, one must sample a number of tags inversely proportional to the lowest abundance level, which is not known a priori. This constrains the design of SAGE experiments intended to determine biological complexity.
Availability: The homemade software used for this analysis was not designed for general or production use, but the authors will be happy to share the Fortran source code with interested parties.
Contact: sternm{at}grc.nia.nih.gov
* To whom correspondence should be addressed.
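The sampling argument in the Results can be illustrated with a minimal Python sketch. It is not the authors' Fortran code and does not reproduce their estimator or its sequencing-error correction. It only assumes a truncated inverse-square distribution of expression levels and shows how the number of distinct transcripts seen grows with the number of tags drawn. All function names and parameter values (2000 transcripts, abundance levels between 1 and 1000) are hypothetical choices for illustration.

```python
import random
from itertools import accumulate


def inverse_square_abundances(n_transcripts, x_min, x_max):
    """Relative abundances whose levels follow a density ~ x**-2 on [x_min, x_max].

    Levels are placed at mid-point quantiles of the truncated distribution
    (an illustrative assumption) so the result is deterministic.
    """
    a, b = 1.0 / x_min, 1.0 / x_max
    # Inverse CDF of the truncated x**-2 density: x(u) = 1 / (a - u * (a - b)).
    levels = [
        1.0 / (a - (i + 0.5) / n_transcripts * (a - b))
        for i in range(n_transcripts)
    ]
    total = sum(levels)
    return [x / total for x in levels]  # probabilities summing to 1


def expected_unique(probs, n_tags):
    """Expected number of distinct transcripts among n_tags independent tags."""
    return sum(1.0 - (1.0 - p) ** n_tags for p in probs)


def simulate_unique(probs, n_tags, rng):
    """One Monte Carlo library: draw n_tags tags, count distinct transcripts."""
    cum = list(accumulate(probs))
    draws = rng.choices(range(len(probs)), cum_weights=cum, k=n_tags)
    return len(set(draws))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    probs = inverse_square_abundances(n_transcripts=2000, x_min=1.0, x_max=1000.0)
    for n_tags in (1_000, 10_000, 100_000):
        print(n_tags,
              round(expected_unique(probs, n_tags)),
              simulate_unique(probs, n_tags, rng))
    # Tags needed before even the rarest transcript is almost surely seen
    # scale as a multiple of 1 / (lowest relative abundance).
    print("tags for near-complete coverage ~", round(10 / min(probs)))
```

Under these assumptions a transcript with relative abundance p is missed in N tags with probability (1 - p)^N, roughly exp(-Np), so the count of distinct transcripts stops growing only once N is several times 1/p for the rarest transcript. This is the inverse proportionality to the lowest abundance level stated in the Results.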