Bioinformatics Vol. 17 no. 12 2001
Pages 1093-1104
© 2001 Oxford University Press
DNA sequence quality trimming and vector removal
1 Department of Zoology and Genetics, Department of Computer Science, Iowa State University, Ames, IA 50011, USA
2 Department of Bioinformatics, The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
Received on March 8, 2001; revised on June 11, 2001; accepted on June 12, 2001
Motivation: Most sequence comparison methods assume that the data being compared are trustworthy, but this is not the case with raw DNA sequences obtained from automatic sequencing machines. Nevertheless, sequence comparisons need to be done on them in order to remove vector splice sites and contaminants. This step is necessary before other genomic data processing stages can be carried out, such as fragment assembly or EST clustering. A specialized tool is therefore needed to solve this apparent dilemma.
Results: We have designed and implemented a program that specifically addresses the problem. This program, called LUCY, has been in use since 1998 at The Institute for Genomic Research (TIGR). During this period, many rounds of experience-driven modifications were made to LUCY to improve its accuracy and its ability to deal with extremely difficult input cases. We believe we have finally obtained a useful program which strikes a delicate balance among the many issues involved in the raw sequence cleaning problem, and we wish to share it with the research community.
Availability: LUCY is available directly from TIGR (http://www.tigr.org/softlab). Academic users can download LUCY after accepting a free academic use license. Business users may need to pay a license fee to use LUCY for commercial purposes.
Contact: Questions regarding the quality assessment module of LUCY should be directed to Michael Holmes (mholmes{at}tigr.org). Questions regarding other aspects of LUCY should be directed to Hui-Hsien Chou (hhchou{at}iastate.edu).