
Bioinformatics Vol. 17 no. 12 2001
Pages 1093-1104
© 2001 Oxford University Press

DNA sequence quality trimming and vector removal

Hui-Hsien Chou 1 and Michael H. Holmes 2

1 Department of Zoology and Genetics, Department of Computer Science, Iowa State University, Ames, IA 50011, USA
2 Department of Bioinformatics, The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA

Received on March 8, 2001; revised on June 11, 2001; accepted on June 12, 2001

Motivation: Most sequence comparison methods assume that the data being compared are trustworthy, but this is not the case with raw DNA sequences obtained from automatic sequencing machines. Nevertheless, sequence comparisons need to be done on them in order to remove vector splice sites and contaminants. This step is necessary before other genomic data processing stages can be carried out, such as fragment assembly or EST clustering. A specialized tool is therefore needed to solve this apparent dilemma.

Results: We have designed and implemented a program that specifically addresses the problem. This program, called LUCY, has been in use since 1998 at The Institute for Genomic Research (TIGR). During this period, many rounds of experience-driven modifications were made to LUCY to improve its accuracy and its ability to deal with extremely difficult input cases. We believe we have finally obtained a useful program which strikes a delicate balance among the many issues involved in the raw sequence cleaning problem, and we wish to share it with the research community.

Availability: LUCY is available directly from TIGR (http://www.tigr.org/softlab). Academic users can download LUCY after accepting a free academic use license. Business users may need to pay a license fee to use LUCY for commercial purposes.

Contact: Questions regarding the quality assessment module of LUCY should be directed to Michael Holmes (mholmes@tigr.org). Questions regarding other aspects of LUCY should be directed to Hui-Hsien Chou (hhchou@iastate.edu).






