Bioinformatics Advance Access originally published online on February 26, 2004
Bioinformatics 20(11) © Oxford University Press 2004; all rights reserved.
An a posteriori strategy for enhancing gene discovery in anonymous cDNA microarray experiments
1 CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra ACT 2601, 2 CSIRO Plant Industry and Cotton R&D Corporation and 3 CSIRO Plant Industry and CRC for Sustainable Rice Production, GPO Box 1600, Canberra ACT 2601, Australia
Received on September 19, 2003; revised on December 15, 2003; accepted on January 25, 2004
Advance Access Publication February 26, 2004
Motivation: Because of the high cost of sequencing, the bulk of gene discovery is performed using anonymous cDNA microarrays. Though the clones on such arrays are easier and cheaper to construct and use than those on unigene and oligonucleotide arrays, they are present in proportion to the expression activity of their corresponding genes in the tissue being examined. The associated redundancy carries over into any pool of potentially interesting, differentially expressed clones identified in a microarray experiment for subsequent sequencing and investigation. An a posteriori sampling strategy is proposed to enhance gene discovery by reducing the impact of this redundancy in the identified pool.
Results: The proposed strategy exploits the fact that individual genes that are highly expressed in a tissue are more likely to be present as a number of spots in an anonymous library and, as a direct consequence, are also likely to give higher fluorescence intensity responses when present in a probe in a cDNA microarray experiment. Consequently, spots that respond with low intensities will have a lower redundancy and so should be sequenced in preference to those with the highest intensities. The proposed method, which formalizes how the fluorescence intensity of a spot should be assessed, is validated using actual microarray data for which the sequences of all the clones in the identified pool had been previously determined. For such validations, the concept of a repeat plot is introduced. The repeat plot is also used to visualize and examine different measures for characterizing fluorescence intensity. In addition, as confirmatory evidence, sequencing the clones of a pool in order from the lowest to the highest intensity, with all the sequences known, is compared graphically with sequencing them in random order. The results establish that, in general, the opportunity for gene discovery is enhanced by avoiding the pooling of different biological libraries (because their construction will have involved different hybridization episodes) and by concentrating on the clones with lower fluorescence intensities.
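The comparison between low-to-high and random sequencing described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clone pool, gene labels and intensity values are invented toy data, the raw intensity value stands in for the formal intensity measure developed in the paper, and the function names (`discovery_curve`, `low_to_high`, `mean_random_curve`) are hypothetical.

```python
import random

def discovery_curve(gene_ids):
    """Cumulative count of distinct genes after sequencing each clone in the given order."""
    seen = set()
    curve = []
    for gene in gene_ids:
        seen.add(gene)
        curve.append(len(seen))
    return curve

def low_to_high(clones):
    """Return gene labels of (gene, intensity) pairs ordered from lowest to highest intensity."""
    return [gene for gene, _ in sorted(clones, key=lambda c: c[1])]

def mean_random_curve(clones, trials=200, seed=0):
    """Average discovery curve over repeated random sequencing orders (seeded for repeatability)."""
    rng = random.Random(seed)
    genes = [gene for gene, _ in clones]
    totals = [0] * len(genes)
    for _ in range(trials):
        rng.shuffle(genes)
        for i, count in enumerate(discovery_curve(genes)):
            totals[i] += count
    return [t / trials for t in totals]

# Toy pool (assumption): the highly expressed gene "A" appears as several
# high-intensity spots, while rarer genes appear once with low intensity.
clones = [
    ("A", 9.1), ("A", 8.7), ("A", 9.4), ("A", 8.9),
    ("B", 6.2), ("B", 6.0),
    ("C", 3.1), ("D", 2.5), ("E", 2.0),
]

ordered_curve = discovery_curve(low_to_high(clones))  # [1, 2, 3, 4, 4, 5, 5, 5, 5]
random_curve = mean_random_curve(clones)

for n, (ordered, rand) in enumerate(zip(ordered_curve, random_curve), start=1):
    print(f"clones sequenced: {n}  low-to-high: {ordered}  random (mean): {rand:.2f}")
```

In this toy pool, four distinct genes are found after sequencing the four lowest-intensity clones, whereas a random order spends part of its early effort on repeated copies of gene "A". Real pools need not be this clean, which is why the paper validates the strategy on data with known sequences.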
Contact: Bob.Anderssen@csiro.au