Bioinformatics Advance Access published online on June 9, 2004
Bioinformatics, doi:10.1093/bioinformatics/bth342
Bioinformatics © Oxford University Press 2004; all rights reserved
Article
EST clustering error evaluation and correction
1 Department of Statistics, Northwestern University, Evanston, IL 60208; Department of Statistics, Pennsylvania State University, University Park, PA 16802
2 Department of Statistics, Pennsylvania State University, University Park, PA 16802
3 Department of Biology, Pennsylvania State University, University Park, PA 16802
* To whom correspondence should be addressed. E-mail: jzwang{at}northwestern.edu.
Revised May 13, 2004
Accepted May 18, 2004

Abstract
Motivation: The gene expression intensity information conveyed by EST data can be used to infer important cDNA library properties, such as gene number and expression patterns (Audic and Claverie, 1997; Stekel et al., 2000). However, EST clustering errors, which often greatly inflate estimates of the number of unique genes obtained, have become a major obstacle in such analyses. The structure of EST clustering errors, the relationship between clustering error and clustering criteria, and possible error-correction methods need to be investigated systematically.
Results: We identify and quantify two types of clustering error, Type I and Type II, in EST clustering with the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a single cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is below 1.5% for both 5' and 3' EST clustering, the Type I error rate for 5' ESTs is about 10 times that for 3' ESTs (30% vs 3%). An over-stringent identity rule, for example P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that about 80% of the Type I error in 5' EST clustering is due to insufficient overlap among sibling ESTs (ISO error). We propose a novel statistical approach that corrects ISO error and thereby provides more accurate estimates of the true gene cluster profile.
Availability: The methods developed in this paper are automated in ESTstat, a web-based tool available at http://cwdg5.bio.psu.edu/eststat. Supplementary data are available at the same website.
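To make the Type I and Type II error definitions concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the authors' method and does not model CAP3: sibling ESTs are idealized as exact, equal-length reads at known start positions on their transcript; reads of the same gene are linked when they overlap by at least a hypothetical `min_overlap`; and error rates are measured by pair counting (Type I: same-gene pairs split across clusters; Type II: co-clustered pairs from different genes). The paper's exact error-rate definitions may differ, and all function names here are hypothetical.

```python
from itertools import combinations


def cluster_siblings_by_overlap(starts, read_len, min_overlap):
    """Single-linkage clustering of one gene's ESTs by overlap length.

    Toy model (assumption): every EST is an exact read of length read_len
    starting at starts[k] on the same transcript. For equal-length reads
    sorted by start, the clusters are exactly the runs of consecutive reads
    that overlap by >= min_overlap, so one sorted pass suffices.
    Returns a cluster index per EST, in input order.
    """
    labels = [0] * len(starts)
    if not starts:
        return labels
    order = sorted(range(len(starts)), key=lambda k: starts[k])
    current = 0
    for prev, nxt in zip(order, order[1:]):
        overlap = read_len - (starts[nxt] - starts[prev])
        if overlap < min_overlap:
            current += 1  # insufficient overlap: start a new cluster
        labels[nxt] = current
    return labels


def cluster_library(reads_by_gene, read_len, min_overlap):
    """Cluster each gene's ESTs separately (so no Type II error can arise
    in this toy) and return parallel lists of true genes and cluster ids."""
    true_gene, cluster = [], []
    for gene, starts in reads_by_gene.items():
        for lab in cluster_siblings_by_overlap(starts, read_len, min_overlap):
            true_gene.append(gene)
            cluster.append(f"{gene}:{lab}")
    return true_gene, cluster


def pairwise_error_rates(true_gene, cluster):
    """Pair-counting error rates (hypothetical definitions).

    Type I: fraction of same-gene EST pairs placed in different clusters.
    Type II: fraction of co-clustered EST pairs that come from different genes.
    """
    same_gene = split = co_clustered = merged = 0
    for i, j in combinations(range(len(true_gene)), 2):
        g_same = true_gene[i] == true_gene[j]
        c_same = cluster[i] == cluster[j]
        if g_same:
            same_gene += 1
            if not c_same:
                split += 1
        if c_same:
            co_clustered += 1
            if not g_same:
                merged += 1
    type1 = split / same_gene if same_gene else 0.0
    type2 = merged / co_clustered if co_clustered else 0.0
    return type1, type2


if __name__ == "__main__":
    # Gene A's siblings start at scattered positions; gene B's start close together.
    reads = {"A": [0, 100, 700, 750], "B": [0, 50]}
    genes, clusters = cluster_library(reads, read_len=500, min_overlap=40)
    t1, t2 = pairwise_error_rates(genes, clusters)
    print(len(set(clusters)), round(t1, 3), t2)  # prints: 3 0.571 0.0
```

With 500-base reads and `min_overlap = 40`, gene A's four siblings break into two clusters because the reads starting at 100 and 700 do not overlap at all, so the toy library shows three clusters for two genes: an insufficient-overlap split that inflates the apparent gene count. Raising `min_overlap` splits the siblings further, a loose analogue of the over-stringent rules discussed above; passing a single merged cluster to `pairwise_error_rates` instead produces a pure Type II error.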