Bioinformatics Advance Access originally published online on October 18, 2005
Bioinformatics 2005 21(24):4414-4415; doi:10.1093/bioinformatics/bti709
Evaluating and improving cDNA sequence quality with cQC
1Department of Plant Sciences, University of Arizona, Tucson, AZ 85721-0036, USA
2Department of Computer Science, University of Arizona, Tucson, AZ 85721-0077, USA
*To whom correspondence should be addressed.
Summary: Errors are prevalent in cDNA sequences, but the extent to which sequence collections differ in the frequencies and types of errors has not been investigated systematically. cDNA quality control, or cQC, was developed to evaluate the quality of cDNA sequence collections and to revise those sequences that differ from a higher-quality genomic sequence. After removing rRNA, vector, bacterial insertion sequence and chimeric cDNA contaminants, small-scale nucleotide discrepancies were found in 51% of cDNA sequences from one Arabidopsis cDNA collection, 89% from a second Arabidopsis collection and 75% from a rice collection. These errors created premature termination codons in 4% and 42% of cDNA sequences in the respective Arabidopsis collections and in 7% of the rice cDNA sequences.
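The summary does not say how discrepancies or premature termination codons are detected. Purely as an illustration, the sketch below assumes a cDNA has already been aligned to the spliced, genome-derived transcript (two equal-length rows, `-` for gaps, both rows starting at the start codon). It tallies small-scale substitutions and indels, and flags a cDNA whose first in-frame stop codon lies upstream of the genome-derived stop. The function names are hypothetical; this is not the cQC implementation.

```python
from typing import Dict, Optional

# Hypothetical sketch, not cQC's code. Assumes a pre-computed alignment of a
# cDNA row against a genome-derived (spliced) transcript row, equal length,
# '-' marking gaps, with both rows beginning at the start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
GAP = "-"


def count_discrepancies(cdna_aln: str, genomic_aln: str) -> Dict[str, int]:
    """Tally substitutions, insertions and deletions in two aligned rows."""
    if len(cdna_aln) != len(genomic_aln):
        raise ValueError("aligned rows must have equal length")
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    for c, g in zip(cdna_aln.upper(), genomic_aln.upper()):
        if c == g:
            continue
        if g == GAP:
            counts["insertion"] += 1  # extra base present only in the cDNA
        elif c == GAP:
            counts["deletion"] += 1  # genomic base missing from the cDNA
        else:
            counts["substitution"] += 1
    return counts


def stop_column(row: str) -> Optional[int]:
    """Alignment column of the first in-frame stop codon in a row, or None."""
    cols = [j for j, ch in enumerate(row) if ch != GAP]  # columns of real bases
    seq = "".join(row[j] for j in cols).upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return cols[i]  # map ungapped index back to alignment column
    return None


def has_premature_stop(cdna_aln: str, genomic_aln: str) -> bool:
    """True if the cDNA's first stop lies upstream of the genome-derived stop."""
    c_stop = stop_column(cdna_aln)
    g_stop = stop_column(genomic_aln)
    if c_stop is None:
        return False
    return g_stop is None or c_stop < g_stop
```

For example, with the genome-derived row `ATGAAACCCGGGTAA`, the cDNA row `ATGTAACCCGGGTAA` carries one substitution that turns the second codon into `TAA`, so it is counted as a discrepancy and flagged as a premature stop. Comparing positions in alignment columns rather than raw sequence indices keeps the test meaningful when indels shift the two sequences relative to each other.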
Availability: A web-based version of cQC, source code and revised cDNA collections are available at http://genomics.arizona.edu/software/cQC/
Contact: raj{at}ag.arizona.edu
Supplementary information: Further text, tables and figures are available at the above website or on Bioinformatics online.
Received on May 27, 2005; revised on September 1, 2005; accepted on October 6, 2005