Bioinformatics Advance Access originally published online on December 15, 2005
Bioinformatics 2006 22(5):523-526; doi:10.1093/bioinformatics/btk003

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

HideNseek, a post-genome approach to locate transgenes exemplified in Arabidopsis thaliana

Guojun Yang† and Timothy C. Hall*

Institute of Developmental and Molecular Biology and Department of Biology, Texas A&M University, College Station, TX 77843, USA

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: Determination of transgene location is essential for investigating the effects of position on transgene expression levels and facilitates cloning of the resident gene affected by insertion. Currently used PCR-based approaches for determination of transgene location are relatively complicated and often fail when the transgene is duplicated, rearranged or fragmented. HideNseek is a new bioinformatics tool that allows computation of transgene locations, provided that a suitable genomic restriction enzyme digestion profile is available. Since the new approach is not based on the terminal sequences of the transgene insert, it is less sensitive to transgene duplication, rearrangement or fragmentation. HideNseek has been tested experimentally and by in silico simulation. The experimental example provided here shows that this simple approach is feasible, permitting rapid location of transgenes with little bench work.

Availability: available on request from the authors.

Contact: tim@idmb.tamu.edu

Supplementary data: HideNseek input and output examples, experimental procedures and figures showing experimental results are provided as supplementary files: Supplementary material 1, 2, 3 and Supplementary figures (Figs 1 and 2), respectively. Supplementary data is available at Bioinformatics online.


    INTRODUCTION
 
Widely used technologies such as Agrobacterium-mediated T-DNA transformation (Bechtold and Pelletier, 1998; Hiei et al., 1994) and transposon-mediated transformation (Aarts et al., 1993; Garza et al., 1991; Loukeris et al., 1995) are effective for inserting foreign genetic material into host genomes. Identifying the genomic location of an insertion is important not only for cloning the resident gene affected by the insertion but also for understanding how position effects contribute to variation in transgene expression levels. Consequently, various PCR methods have been developed to determine transgene locations (Balzergue et al., 2001; Cottage et al., 2001; Jones and Winistorfer, 1993; Lagerstrom et al., 1991; Ochman et al., 1988; Parker et al., 1991; Rohan et al., 1990). Although these methods are productive, the many factors that affect PCR efficacy frequently lead to inconsistent results. Additionally, transgenic DNA insertions are often truncated, duplicated or rearranged, and these events frequently affect the bordering host sequences, rendering many PCR methods inaccurate. To circumvent these issues, we developed a bioinformatics approach that uses a genomic blot restriction fragment profile (p-blot). Fragment sizes calculated from the p-blot are fed into a program (named HideNseek) that determines the transgene insertion locus computationally by comparison with in silico profiles derived from the known genome sequence. The p-blot approach can also detect transgene rearrangement events and predict actual insertion sizes. An example is provided in which the T-DNA insertion locus in the Arabidopsis genome was identified using HideNseek.


    ALGORITHM AND APPROACH
 
The HideNseek approach is based on the fact that a unique fragment profile can be generated in silico for any locus, provided a sufficient number of restriction enzymes is used. The location of a transgene can be determined by identification of an in silico profile that matches the experimental (p-blot) profile (Fig. 1A). As shown in the following equation, the location (l) of a transgene resides inside the overlapping region of the fragments obtained with different enzymes:

$$l \in \bigcap_{I} \left( E_{I1} \cup E_{I2} \cup \cdots \cup E_{Iz} \right) \qquad (1)$$
where $E_{I1}$ through $E_{Iz}$ are the positions of the fragments from Enzyme I whose sizes match those determined experimentally from a genomic DNA blot (Southern, 1975) profile. To evaluate how well the in silico profile of a predicted locus correlates with the profile obtained by Southern blot analysis, a correlation coefficient (r) is defined as

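$$r = \frac{i\sum PB - \sum P \sum B}{\sqrt{\left[i\sum P^{2} - \left(\sum P\right)^{2}\right]\left[i\sum B^{2} - \left(\sum B\right)^{2}\right]}} \qquad (2)$$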
where P is a fragment size in the predicted profile, B is a fragment size in the blot profile and i is the number of enzymes. The value for r is between –1 and +1; a high r value indicates a good correlation between the predicted profile and the p-blot profile.
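Equation (2) is read here as the standard Pearson product-moment correlation between paired fragment sizes; that reading is an assumption, as is pairing each predicted size with exactly one blot band. The text defines i as the number of enzymes; the sketch below uses the number of paired fragments instead, which is the same thing when each enzyme contributes a single band. The function name `profile_correlation` is hypothetical and not part of HideNseek.

```python
import math
from typing import Sequence


def profile_correlation(predicted: Sequence[float], blot: Sequence[float]) -> float:
    """Pearson correlation r between paired predicted (P) and blot (B) fragment sizes.

    Assumes predicted[k] and blot[k] describe the same band (same units, bp or kb).
    """
    if len(predicted) != len(blot):
        raise ValueError("profiles must contain the same number of paired fragments")
    n = len(predicted)  # plays the role of i in Eq. (2) under the pairing assumption
    if n < 2:
        raise ValueError("at least two paired fragments are needed")
    sum_p = sum(predicted)
    sum_b = sum(blot)
    sum_pb = sum(p * b for p, b in zip(predicted, blot))
    sum_pp = sum(p * p for p in predicted)
    sum_bb = sum(b * b for b in blot)
    numerator = n * sum_pb - sum_p * sum_b
    denominator = math.sqrt((n * sum_pp - sum_p ** 2) * (n * sum_bb - sum_b ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all sizes in a profile are equal")
    return numerator / denominator


# Hypothetical example: in silico sizes vs. sizes read off a blot (kb).
r = profile_correlation([4.0, 1.0, 2.5], [4.1, 0.95, 2.6])
print(round(r, 4))
```

Values of r close to +1 indicate that the predicted and blot profiles agree closely, matching the interpretation given in the text.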


Figure 1
Fig. 1 Flow chart for locating transgene position with HideNseek. (A) Diagram relating restriction fragments for three different restriction endonucleases (E1, E2, E3) to their arrangement on the genome (black horizontal line). M, DNA size marker. (B) Sequence of analyses used by HideNseek. To perform profile comparison, HideNseek interrogates the in silico database derived from pseudochromosomes for a match where E1 = 4 kb, E2 = 1 kb and E3 = 2.5 kb. Because several locations are likely to fit these parameters, additional fragments derived from digestion with Enzyme E4, or even E4 and E5, may be needed to define a unique chromosomal location. Once a single location is identified, accurate fragment sizes can be derived and the size of the insertion precisely determined.

 
After plant or animal transgenesis, progeny are selected or screened for transformants, and authentic transformants are typically confirmed by genomic DNA blot analysis. In the present approach, instead of digesting the genomic DNA with a single restriction enzyme, several enzymes are used in separate digests. The enzymes selected must not have star activity, and their recognition sites on the transferred DNA should typically yield four or fewer fragments. For each enzyme, the fragment sizes, each with a range reflecting the error in size estimation, are tabulated to give a profile. The tabulated fragment size profiles for all enzymes (the p-blot) reflect the restriction profile of the genomic integration site, which is subsequently identified by the HideNseek program (Fig. 1B).
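One way to hold a p-blot in memory is sketched below, purely as an illustration: each band keeps its estimated size and a ± error range, and a lane that exceeds the suggested four fragments is flagged. The `Band` class, the `check_pblot` function, the enzyme labels E1 to E3 (borrowed from Fig. 1) and the 200 bp tolerances are hypothetical choices; HideNseek's actual input format is described in its README.txt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """One p-blot band: estimated size and +/- error of the estimate, both in bp."""
    size: int
    tolerance: int

    def accepts(self, fragment_length: int) -> bool:
        """True if an in silico fragment length lies inside this band's size range."""
        return self.size - self.tolerance <= fragment_length <= self.size + self.tolerance


# A p-blot maps each enzyme (one single-enzyme digest per lane) to its bands.
PBlot = dict[str, list[Band]]


def check_pblot(pblot: PBlot, max_bands: int = 4) -> list[str]:
    """Flag empty lanes and lanes with more bands than the suggested maximum."""
    warnings = []
    for enzyme, bands in pblot.items():
        if not bands:
            warnings.append(f"{enzyme}: no bands recorded")
        elif len(bands) > max_bands:
            warnings.append(f"{enzyme}: {len(bands)} bands exceeds {max_bands}")
    return warnings


# Hypothetical profile mirroring Fig. 1: E1 = 4 kb, E2 = 1 kb, E3 = 2.5 kb.
example: PBlot = {
    "E1": [Band(4000, 200)],
    "E2": [Band(1000, 200)],
    "E3": [Band(2500, 200)],
}
print(check_pblot(example))            # no warnings for this profile
print(example["E1"][0].accepts(4150))  # 4150 lies within 4000 +/- 200
```

Keeping the tolerance per band, rather than per lane, lets poorly resolved bands carry wider ranges, which (as the Discussion notes) increases the number of loci that will match.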

Digester, a component of HideNseek, digests database-derived pseudochromosome DNA sequences in silico with the enzymes used to obtain the p-blot and records the restriction sites on each pseudochromosome for each enzyme. Digesting all pseudochromosomes of the genome with each enzyme individually yields a pre-computed restriction fragment database. The main function of HideNseek is profile matching. Once HideNseek is given the size profile obtained from the p-blot, it retrieves from the restriction database the location of every fragment that falls within the expected size range of a band on the p-blot, generating a pool of fragments for each band of each enzyme. If a fragment from the pool of Enzyme 1 and a fragment from the pool of Enzyme 2 share an overlapping region, the two fragments are called an overlapping pair. The overlapping regions of all overlapping pairs form a new pool, which is then screened against the pool of Enzyme 3. This process is repeated until the pools for all of the enzymes used to establish the p-blot have been screened; locations that do not contain overlapping fragments are discarded from further analysis. The number of candidate locations identified by HideNseek decreases with each iteration, and the resulting in silico locus has a restriction profile that fits the experimental p-blot profile. The output gives the chromosome number and the predicted locations of the in silico fragments. Because the integration of T-DNA or transposons is not always discrete, the insert may be truncated, rearranged or duplicated. As a result, the actual size of the integrated foreign DNA may differ from the presumptive insert size (e.g. the T-DNA size).
To allow detection of insert sizes that differ from the presumptive size, HideNseek tests insert sizes ranging from one-half to three times the presumptive insert length, in user-defined increments. The output reports the corrected size of the insertion together with a correlation coefficient between the predicted profile and the input p-blot profile.
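The Digester step and the iterative pool screening that realizes Eq. (1) can be illustrated with the minimal Python sketch below; it is not the Perl/BioPerl implementation. Its assumptions are all simplifications: cuts are placed at the start of each recognition site (real enzymes cut at enzyme-specific offsets), only plain palindromic sites are handled, band sizes are compared with uninterrupted genomic fragments as in an in silico simulation, and the insert-size correction is not modelled. The function names are hypothetical.

```python
import re
from typing import Optional

Interval = tuple[int, int]  # half-open [start, end) coordinates on a pseudochromosome


def digest(sequence: str, site: str) -> list[Interval]:
    """In silico digest: cut at the start of every occurrence of a plain, palindromic site."""
    # A zero-width lookahead also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() for m in re.finditer(pattern, sequence.upper())]
    bounds = [0] + cuts + [len(sequence)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def band_pool(fragments: list[Interval], sizes: list[int], tolerance: int) -> list[Interval]:
    """Fragments whose length matches any band of this enzyme within +/- tolerance."""
    return [f for f in fragments if any(abs((f[1] - f[0]) - s) <= tolerance for s in sizes)]


def overlap(pool_a: list[Interval], pool_b: list[Interval]) -> list[Interval]:
    """Overlapping regions of every overlapping pair drawn from the two pools."""
    regions = []
    for a0, a1 in pool_a:
        for b0, b1 in pool_b:
            start, end = max(a0, b0), min(a1, b1)
            if start < end:
                regions.append((start, end))
    return regions


def locate(sequence: str, sites: dict[str, str], pblot: dict[str, list[int]],
           tolerance: int) -> list[Interval]:
    """Screen the pool of each enzyme in turn and return the surviving candidate regions."""
    regions: Optional[list[Interval]] = None
    for enzyme, sizes in pblot.items():
        pool = band_pool(digest(sequence, sites[enzyme]), sizes, tolerance)
        regions = pool if regions is None else overlap(regions, pool)
        if not regions:  # no location fits every enzyme screened so far
            return []
    return sorted(set(regions or []))


# Toy sequence: two EcoRI (GAATTC) and two HindIII (AAGCTT) sites in C filler.
seq = ("C" * 100 + "GAATTC" + "C" * 200 + "AAGCTT" + "C" * 300
       + "GAATTC" + "C" * 100 + "AAGCTT" + "C" * 50)
sites = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}
print(locate(seq, sites, {"EcoRI": [512], "HindIII": [412]}, tolerance=20))  # [(306, 612)]
```

Only the 512 bp EcoRI fragment (100 to 612) and the 412 bp HindIII fragment (306 to 718) fall inside their size windows, so their overlap is the single candidate region. A genome-scale version would pre-compute the digests once, as Digester does, rather than re-digesting for every query.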


    IMPLEMENTATION
 
The HideNseek program was written in Perl (Practical Extraction and Report Language; Schwartz and Christianson, 1997), using the BioPerl modules Bio::Seq, Bio::SeqIO and Bio::Tools::RestrictionEnzyme (Stajich et al., 2002). The programs were tested on Mandrake Linux 8.1 and Windows NT/2000. Data input is described in detail in the README.txt file included with the program set.


    RESULTS
 
To simulate locating transgenes in silico, genomic loci on Arabidopsis thaliana clones were chosen at random as putative transgene insertion sites, and restriction profiles were obtained for a given set of enzymes. These profiles (see Supplementary material 1 for input examples) were used as input for HideNseek. For all of the more than 50 in silico datasets tested, the expected loci were predicted (see Supplementary material 2 for output examples). The test datasets also included a case requiring simultaneous identification of two presumptive transgene loci; both loci were correctly predicted by the program.
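The simulation protocol described above (pick a random locus, derive its restriction profile, check that the locus is recovered) can be sketched as a self-contained Python harness. Everything in it is hypothetical: a seeded random sequence stands in for a pseudochromosome, the four-base recognition sites of AluI (AGCT), HaeIII (GGCC), MspI (CCGG) and TaqI (TCGA) are used with cuts simplified to the start of each site, and the profile consists of the fragment that contains the chosen locus for each enzyme.

```python
import random
import re
from typing import Optional

Interval = tuple[int, int]  # half-open [start, end)

SITES = {"AluI": "AGCT", "HaeIII": "GGCC", "MspI": "CCGG", "TaqI": "TCGA"}


def digest(seq: str, site: str) -> list[Interval]:
    """Fragments from cutting at the start of every occurrence of `site` (simplified)."""
    cuts = [m.start() for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def locate(seq: str, profile: dict[str, int], tolerance: int) -> list[Interval]:
    """Intersect, enzyme by enzyme, the fragments matching each band size."""
    regions: Optional[list[Interval]] = None
    for enzyme, size in profile.items():
        pool = [f for f in digest(seq, SITES[enzyme]) if abs(f[1] - f[0] - size) <= tolerance]
        if regions is None:
            regions = pool
        else:
            regions = [(max(a0, b0), min(a1, b1))
                       for a0, a1 in regions for b0, b1 in pool
                       if max(a0, b0) < min(a1, b1)]
    return sorted(set(regions or []))


def simulate(seed: int, length: int = 20_000, tolerance: int = 0) -> tuple[int, list[Interval]]:
    """Choose a random locus, build its restriction profile and run the locator on it."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    locus = rng.randrange(length)
    profile = {}
    for enzyme, site in SITES.items():
        start, end = next(f for f in digest(seq, site) if f[0] <= locus < f[1])
        profile[enzyme] = end - start
    return locus, locate(seq, profile, tolerance)


for seed in range(5):
    locus, regions = simulate(seed)
    hit = any(start <= locus < end for start, end in regions)
    print(f"seed {seed}: locus {locus}, {len(regions)} candidate region(s), recovered={hit}")
```

With a tolerance of 0 the true locus is always among the candidates, because the fragment containing it matches each band exactly. Raising `tolerance` to mimic gel sizing error enlarges every pool and can therefore only add candidate regions.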

To test the practical feasibility of the HideNseek approach, A.thaliana ecotype Columbia was transformed with pRubq2-953(dIn) (Supplementary Figure 1A), a derivative of pJDV (Hall et al., 2001). Hygromycin-resistant, GFP-expressing plants were selected. Plant Rubq2-953(dIn)-6 was identified as carrying a single-locus insert on the basis of a 3:1 segregation ratio for hygromycin resistance among its progeny. Genomic DNA from hygromycin-resistant Rubq2-953(dIn)-6 progeny was digested in single-enzyme reactions with each of the following enzymes: BstXI, HindIII, MslI, NdeI, NruI, PmlI or PstI. Using a T-DNA probe obtained by digestion with EcoRV and NsiI (dashed line, Supplementary Figure 1A), a p-blot was obtained (Supplementary Figure 1B). The NdeI lane contained more bands than expected, indicating the presence of an additional T-DNA fragment, or a rearrangement, in the insertion locus. The PmlI digest also gave an anomalous profile. Therefore, data from the NdeI and PmlI digests were not used to calculate the insert position. The digestion profiles obtained with the other five enzymes, however, were discrete and contained the expected number of fragments; these profiles were used as input parameters (Supplementary material 1) for HideNseek. The output (Supplementary material 2) predicted a locus between positions 13 422 792 and 13 424 606 on Arabidopsis pseudochromosome 2 and a corrected insertion size of 6917 bp. This region corresponds to positions 63 077 to 64 894 of GenBank accession AC007071 (Supplementary Figure 1C). To determine whether the predicted region is the bona fide insertion locus in Rubq2-953(dIn)-6, inverse PCR was performed (see the Supplementary Experimental Procedure, and Supplementary Figure 1A for primer positions). PstI was used to digest the genomic DNA because the p-blot in Supplementary Figure 2B revealed the PstI sites to be close to the T-DNA integration site, so that only a short product needed to be amplified.
A single band of ~2.1 kb was obtained (Supplementary Figure 2A). Sequencing of the gel-purified PCR products revealed a duplicated insert region (~2 kb), which included the gfp coding sequence, immediately adjacent to the right border of the T-DNA (Supplementary Figure 2B). This duplication accounts for the difference between the predicted insertion size of 6917 bp and the presumptive T-DNA size of 4834 bp, and explains the anomalous bands seen in Supplementary Figure 1B for the NdeI and PmlI digests. Sequencing beyond the duplicated region placed the T-DNA insertion site at position 63 815 of GenBank accession AC007071, within the predicted region, confirming the prediction by HideNseek.
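The sizes and coordinates reported above are mutually consistent, as a quick check using only numbers from the text shows; the last line uses the one-half to three-times insert-size range that HideNseek scans.

$$6917\ \text{bp} - 4834\ \text{bp} = 2083\ \text{bp} \approx 2\ \text{kb} \quad \text{(size of the duplicated region)}$$
$$63\,077 \le 63\,815 \le 64\,894 \quad \text{(insertion site lies within the predicted region of AC007071)}$$
$$0.5 \times 4834 = 2417\ \text{bp} \le 6917\ \text{bp} \le 14\,502\ \text{bp} = 3 \times 4834$$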


    DISCUSSION AND CONCLUSION
 
As illustrated by this example, a p-blot provides a restriction profile, an indication of transgene rearrangement and information helpful in choosing a suitable enzyme for confirmatory inverse PCR. Accurate estimation of fragment sizes and size ranges is important: lower accuracy in estimating fragment sizes usually means wider size ranges, which increases the number of predicted loci. Conversely, increasing the number of enzymes used to obtain the p-blot reduces the number of loci predicted by HideNseek. In the example used here (Supplementary Figure 1), restriction enzymes that cut within the T-DNA were used, and the profile was produced by standard agarose gel electrophoresis. A p-blot may also be obtained with enzymes that cut only outside the T-DNA sequence. Compared with enzymes that cut inside the T-DNA, this approach is more tolerant of transgene rearrangements. However, the restriction fragments containing the T-DNA insertion are then likely to be large, requiring pulsed-field gel electrophoresis for resolution.

The enzymes used to obtain a p-blot should produce fragments that can be separated and resolved by the chosen separation method. Because cytosine methylation (5-methylcytosine) in genomic DNA may interfere with enzyme activity, sensitivity to methylation is another consideration in choosing an enzyme. Ideally, all of the enzymes chosen should be insensitive to DNA methylation; however, methylation-sensitive enzymes may still be usable because (1) not every recognition site in a genome is methylated and (2) methylation can often be recognized by a characteristic banding pattern (e.g. tailing).

The approach described here is suitable only for chromosomes or genomes that have been completely sequenced, and it may be of limited use for highly polymorphic genomes. As pseudochromosome sequences become available for more organisms, the HideNseek approach will become applicable to them.


    Acknowledgments
 
We thank Charlie Harris at IDMB and The Supercomputing Facility at Texas A&M University for expert help and establishing a suitable computing environment. Supported in part by NSF grant MCB-0110477.

Conflict of Interest: none declared.


    FOOTNOTES
 
†Present address: Department of Plant Biology, University of Georgia, Athens, GA 30602, USA

Associate Editor: Steven L. Salzberg

Received on July 13, 2005; revised on December 8, 2005; accepted on December 11, 2005

    REFERENCES
 

    Aarts, M.G., et al. (1993) Transposon tagging of a male sterility gene in Arabidopsis. Nature, 6431, 715–717.

    Balzergue, S., et al. (2001) Improved PCR-walking for large-scale isolation of plant T-DNA borders. Biotechniques, 3, 496–498, 502, 504.

    Bechtold, N. and Pelletier, G. (1998) In planta Agrobacterium-mediated transformation of adult Arabidopsis thaliana plants by vacuum infiltration. Methods Mol. Biol., 82, 259–266.

    Cottage, A., et al. (2001) Identification of DNA sequences flanking T-DNA insertions by PCR-walking. Plant Mol. Biol. Rep., 19, 321–327.

    Garza, D., et al. (1991) Introduction of the transposable element mariner into the germline of Drosophila melanogaster. Genetics, 2, 303–310.

    Hall, T.C., et al. (2001) Gene silencing and its reactivation in transgenic rice. In Khush, G.S., et al. (eds), Gene Silencing and Its Reactivation in Transgenic Rice. International Rice Research Institute, Los Baños, Philippines, pp. 465–481.

    Hiei, Y., et al. (1994) Efficient transformation of rice (Oryza sativa L.) mediated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. Plant J., 2, 271–282.

    Jones, D.H. and Winistorfer, S.C. (1993) Genome walking with 2- to 4-kb steps using panhandle PCR. PCR Methods Appl., 197–203.

    Lagerstrom, M., et al. (1991) Capture PCR: efficient amplification of DNA fragments adjacent to a known sequence in human and YAC DNA. PCR Methods Appl., 2, 111–119.

    Loukeris, T.G., et al. (1995) Gene transfer into the medfly, Ceratitis capitata, with a Drosophila hydei transposable element. Science, 5244, 2002–2005.

    Ochman, H., et al. (1988) Genetic applications of an inverse polymerase chain reaction. Genetics, 3, 621–623.

    Parker, J.D., et al. (1991) Targeted gene walking polymerase chain reaction. Nucleic Acids Res., 11, 3055–3060.

    Rohan, R.M., et al. (1990) Direct sequencing of PCR-amplified junction fragments from tandemly repeated transgenes. Nucleic Acids Res., 20, 6089–6095.

    Schwartz, R.L. and Christianson, T. (1997) Learning Perl. O'Reilly, Sebastopol, CA.

    Southern, E.M. (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol., 3, 503–517.

    Stajich, J.E., et al. (2002) The bioperl toolkit: perl modules for the life sciences. Genome Res., 10, 1611–1618.

