Bioinformatics Advance Access originally published online on January 18, 2007
Bioinformatics 2007 23(6):680-686; doi:10.1093/bioinformatics/btl669
A 3D pattern matching algorithm for DNA sequences
LIMSI-CNRS, Univ. Paris-Sud, 91403 Orsay, France
*To whom correspondence should be addressed.
ABSTRACT
Motivation: Biologists usually work with textual DNA sequences (succession of A, C, G and T). This representation allows biologists to study the syntax and other linguistic properties of DNA sequences. Nevertheless, such a linear coding offers only a local and a one-dimensional vision of the molecule. The 3D structure of DNA is known to be very important in many essential biological mechanisms. By using 3D conformation models, one is able to construct a 3D trajectory of a naked DNA molecule. From the various studies that we performed, it turned out that two very different textual DNA sequences could have similar 3D structures.
Results: In this article, we present new research on 3D pattern matching for DNA sequences. The aim of this work is to enhance conventional pattern matching analyses with 3D-augmented criteria. We have developed an algorithm, based on 3D trajectories, which compares the angles formed by these trajectories and thus quantifies the difference between two 3D DNA sequences. The analysis proceeds from a global scale to a local one.
Availability: Available on request from the authors.
Contact: herisson{at}epigenomique.genopole.fr
1 INTRODUCTION
Huge quantities of data are available today thanks to the development of current sequencing programs. These raw data require the extraction of biologically relevant information such as gene positions, expression signals, and so on. This analysis relies partly on motif searching, which consists of locating defined sequence segments corresponding to documented biological functions (Dardel and Képès, 2002).
Most biologists work with textual DNA representations: sequences of the four known nucleotides A, C, G and T. This representation makes it possible to study the syntax and other linguistic properties of DNA sequences in order to detect, predict, classify or validate. For detection and prediction, pattern matching (Baeza-Yates and Gonnet, 1992; Boyer and Moore, 1977; Karp and Rabin, 1987) can be used to detect gene starts, repeated sequences possibly organized in palindromes, or many other specific sites. However, such a linear coding offers only a local and one-dimensional view of DNA molecules. Within a cell, DNA is coiled as a double helix (Watson and Crick, 1953), resulting in a complex trajectory defined in space.
Nowadays, we are able to observe molecules (mainly proteins and DNA) through two methods that allow the three-dimensional (3D) coordinates of each atom of these molecules to be calculated: X-ray crystallography and nuclear magnetic resonance (NMR). Both processes are experimental, which means that the result obtained depends on the experimental conditions (molecules are often denatured). NMR (Carrington and McLachlan, 1967) solves the 3D structure of proteins in solution (thus less denatured) but is limited to small proteins (<30 kDa). In addition, NMR can resolve some hydrogen atoms. For bigger molecules, there is X-ray crystallography (Buerger, 1942). In general, this method is not able to resolve hydrogen atom positions or to distinguish nitrogen from oxygen and carbon.
Because it is very difficult and very expensive to observe DNA in vitro (a fortiori in vivo), modelling provides a good alternative. The 3D structure of DNA is very important in many essential biological mechanisms, like replication, transcription or regulation. Experimental evidence has demonstrated contributions of DNA intrinsic curvature in regulating the transcription of several genes, such as those coding for H-NS histone-like protein (Atlung and Ingmer, 1997), σs (Espinosa-Urgel and Tormo, 1993), IHF and HU regulatory proteins (Pérez-Martin et al., 1994), σ54-dependent glnA (Carmona and Magasanik, 1996) and artificial constructs using the T7 virus promoter (Collis et al., 1989), among others. Let us take the example of sigma factor σs (KatF), which controls the expression of a number of genes in Escherichia coli. Promoters recognized by σs do not present a good consensus sequence in their upstream regions, while they are located in curved DNA regions.
Thanks to 3D conformation models (Bolshoy et al., 1991; Cacchione et al., 1989; De Santis et al., 1988; El Hassan and Calladine, 1995; Shpigelman et al., 1993), we are able to construct a 3D trajectory of a naked DNA molecule from its textual sequence. As DNA is wrapped around nucleosomes and displays high levels of folding inside cells, these 3D conformation models do not aim to represent these 3D structures. Indeed, the aim of this approach is to offer another point of view on naked DNA sequences and to try to extract some pertinent (from a biological point of view) information from these 3D data.
The 3D information obtained from model construction can either be visualized or processed further (Shpigelman et al., 1993). The 3D visualization, contrary to the textual one, offers a global view of the molecules and was achieved through software called ADN-Viewer (Gherbi and Hérisson, 2002; Hérisson et al., 2004). The 3D engine of ADN-Viewer takes both a textual DNA sequence and a 3D conformation model as input, and outputs the 3D coordinates of each nucleotide. The 3D conformation model provides, for each dinucleotide (i.e. each succession of two nucleotides in the textual sequence), three angular values and a rise translation. We have chosen this model because it is the only one that provides an algorithm for base-pair stacking. Currently, ADN-Viewer can load and visualize multiple sequences of tens of millions of nucleotides (depending on memory size) on a desktop workstation as well as in an immersive virtual reality environment.
From the various studies on 3D trajectories of DNA sequences that we performed (Gherbi and Hérisson, 2002; Hérisson et al., 2004; Matte-Tailliez et al., 2006), it turned out that two very different textual DNA sequences could have similar 3D structures. Besides, it seems that the 3D structure of DNA plays a significant role in the interaction mechanisms with the other biological elements of its environment (Burleigh et al., 2003). In this context, it becomes essential to compare two DNA sequences using 3D criteria in order to highlight motifs that have a common 3D structure.
However, it seems difficult to directly apply traditional pattern matching algorithms (Baeza-Yates and Gonnet, 1992; Boyer and Moore, 1977; Karp and Rabin, 1987) to 3D data. Indeed, if two elements coming from two sequences are to be compared, the meaning of such a comparison differs according to whether one compares two letters (out of four) or two floating-point numbers (out of an infinity). One could sample the continuous 3D data to fit them to the conventional algorithms by introducing a margin of error. But the main obstacle is the local approach of these algorithms. Even if they perform smart shifts of the motif along the sequence or have fast decision engines in order to avoid a quadratic number of character comparisons, they do a very basic comparison: character by character. If the local comparison approach is applied to sampled 3D data, the error will progressively accumulate, which would increase the number of false positives. We have to apply 3D comparison methods if we consider 3D curved objects.
There are several methods to analyse and to compare curved objects. A method for matching curves that accommodates large and small deformations was implemented in 1998 (Cohen and Herlin, 1998). This method preserves geometric similarities in the case of small deformation, and decreases geometric constraints when large deformations occur. The approach is based on the computation of a set of geodesic paths connecting the curves. These two curves are defined as a source area and a destination area that can have an arbitrary number of connected components and different topologies. In computer vision, some methods have been developed to measure the degree of similarity between two image contours. One of them (Basri et al., 1995) proposes to compare contours by taking into account deformations in object shape, the articulation of parts and variations in the shape and size of portions of objects. This method uses dynamic programming to compute the minimum cost of bringing one shape into the other via local deformations. Another method to study similarities between curves consists of a 2D shape recognition and classification method based on matching shape outlines (Sebastian et al., 2001). The correspondence between outlines (curves) is based on a notion of an alignment curve and on a measure of similarity between the intrinsic properties of the curve, namely, length and curvature, and is found by an efficient dynamic-programming method. The correspondence is used to find a similarity measure that is used in a recognition system. Most methods that measure similarities between two curves are designed to analyse very complex curves (real images) or parametric curves without any knowledge on them. However, our 3D DNA sequences are actually defined by points, so we have a very high resolution knowledge of the objects. 
Another work has been published on the comparison of shapes of 3D DNA minicircle trajectories, based on a root mean square deviation (RMSD) minimization procedure (Amzallag et al., 2006). This approach is well suited to constrained DNA fragments (minicircles, for instance), i.e. cases in which all sequences are known to form minicircles. Our data come with no such a priori knowledge.
The aim of this article is to show the enrichment of conventional studies with 3D criteria. A 3D approach could be beneficial in two aspects:
- to discover hidden phenomena from the textual sequence
- to reveal phenomena easier/faster, as compared to textual sequence.
Thus, 3D-based analysis does not replace 1D technique, but it may rather be exploited as a complementary approach. On the one hand, 1D-based algorithms are suitable mainly for local analyses by performing nucleotide to nucleotide comparison; on the other hand, 3D approaches offer a more global analysis. Thus, a global approach may be advantageous to compare 3D DNA sequences.
2 METHODS
The goal of pattern matching is to find all the positions of a motif M of size m in a sequence T of size n. The first stage of the study is to define what a 3D comparison of two sequences is while the second one consists in defining the notion of equality between two angles.
2.1 First stage—definition of a 3D comparison
For each sequence, data are represented by the succession of 3D coordinates of the centre of each plate, computed by the 3D engine of ADN-Viewer. If we want to keep control over the rotational setting of DNA, we can represent data by a succession of 3D vectors; the algorithm works the same way. A comparison between two 3D sequences based directly on the 3D coordinates does not support rotations in the motif's frame of reference. Indeed, even if one considers vectors rather than 3D coordinates, two identical motifs having different orientations in space will have different vector coordinates. Thus, the comparison cannot be done in an absolute frame of reference. One could compare relative 3D coordinates, but this would in any case amount to performing a local search. If, however, we consider the angles formed by pairs of vectors, we obtain a value that remains invariant whatever the frame of reference. Our approach consists of transforming 3D coordinates into angles, from a global scale to a local one, in order to decrease the number of false positives (Fig. 1). We shift the motif along the sequence one position at a time and calculate the vectors on the fly. Thus, the motif of length m is shifted (nucleotide by nucleotide) along the sequence, and the angles formed by each pair of vectors are compared.
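The invariance argument above can be checked numerically. The following is a minimal sketch, not the authors' code: the helper names `vertex_angle` and `rotate_z`, the three sample points and the 40° rotation are all illustrative assumptions, and the angle is taken at the middle point, which is one possible reading of "the angle formed by a pair of vectors".

```python
import math

def vertex_angle(first, mid, last):
    """Angle in degrees at `mid`, between the vectors mid->first and mid->last."""
    u = [p - q for p, q in zip(first, mid)]
    v = [p - q for p, q in zip(last, mid)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to protect acos against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def rotate_z(point, degrees):
    """Rotate a 3D point around the z axis (hypothetical change of orientation)."""
    t = math.radians(degrees)
    x, y, z = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            z)

# Three illustrative trajectory points (made-up values, not DNA model output).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.5)]
rotated = [rotate_z(p, 40.0) for p in points]

original_angle = vertex_angle(*points)
rotated_angle = vertex_angle(*rotated)
```

The coordinates of `rotated` differ from those of `points`, so a coordinate-by-coordinate comparison would reject the match, whereas the two angles agree up to floating-point error.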
2.2 Second stage—definition of angles equality
The second stage of the study consists in defining the notion of equality between two angles. The only case in which the angles of a motif are mathematically equal to those of a sequence fragment is when both sequences have the same succession of nucleotides. This equality can be detected from the textual sequences in a very efficient way (Boyer and Moore, 1977). Our approach addresses the cases in which the textual sequences are significantly different but give similar 3D patterns. It is thus preferable to compare angles in a flexible way, by introducing an error parameter. Small values of this parameter select strong similarities, whereas larger values give a less selective detection.
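A tolerance-based comparison of this kind can be sketched as follows. This is an illustration under assumptions: the function name `angles_match` is hypothetical, and the difference is expressed as a percentage of the motif angle, mirroring the `diff` expression in the pseudocode of Section 2.4 with an absolute value added.

```python
def angles_match(angle_motif, angle_fragment, tolerance_percent):
    """Return True when the two angles differ by at most `tolerance_percent`
    of the motif angle (relative difference; an assumed definition)."""
    if angle_motif == 0:
        # Avoid dividing by zero: only an identical angle matches.
        return angle_fragment == 0
    diff = abs(angle_motif - angle_fragment) / abs(angle_motif) * 100.0
    return diff <= tolerance_percent

strict = angles_match(90.0, 88.0, 1.0)    # 2.2% difference, 1% tolerated
relaxed = angles_match(90.0, 88.0, 5.0)   # 2.2% difference, 5% tolerated
```

With a tolerance of 0 the test reduces to exact equality, which corresponds to the textual-identity case discussed above.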
2.3 The approach
For each motif position on the sequence, the algorithm calculates the angle displayed on the right of DEPTH 1 in Figure 2 and defined by:
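$$\theta = \arccos\left(\frac{\overrightarrow{P_{\mathrm{mid}}P_{\mathrm{first}}}\cdot\overrightarrow{P_{\mathrm{mid}}P_{\mathrm{last}}}}{\lVert\overrightarrow{P_{\mathrm{mid}}P_{\mathrm{first}}}\rVert\,\lVert\overrightarrow{P_{\mathrm{mid}}P_{\mathrm{last}}}\rVert}\right)$$

where $P_{\mathrm{first}}$, $P_{\mathrm{mid}}$ and $P_{\mathrm{last}}$ denote the first, the median and the last 3D points of the motif.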
It computes in the same way the angle corresponding to the fragment of the compared sequence (cf. left of DEPTH 1 in Fig. 2). In this way, we obtain global information on both the motif and the sequence fragment. On the one hand, if both angles are different, then both trajectories are different; on the other hand, if both angles are similar, both trajectories would be similar. The algorithm cuts each vector into two equal vectors in order to refine the comparison. Then, it compares the angles in a linear way (i.e. in the order of appearance as displayed on DEPTH 2 and 3 in Fig. 2), and it stops when the angles are too different or when the vectors are formed by 10 nucleotides (one double helix turn). All along the process, the algorithm calculates a comparison score (ranging between 0 and 100) that depends on both the number of iterations performed and on the difference between all the compared angles. Indeed, for a given margin of error, the similarity is not the same depending on whether the comparison process stops at the first iteration or at the tenth one.
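The successive cuts can be tabulated to see how many refinement levels a motif of a given size allows. The sketch below is only an illustration: the function name `cutting_schedule` is hypothetical, integer division is assumed for vector sizes, and the stopping rule (vectors longer than 10 nucleotides) follows the condition `vect_size > 10` of the pseudocode in Section 2.4.

```python
def cutting_schedule(motif_size, min_vector_size=10):
    """List (depth, number_of_vectors, vector_size) for every level at which
    vectors are still longer than `min_vector_size` nucleotides."""
    schedule = []
    depth = 1
    while True:
        n_vectors = 2 ** depth                 # each cut doubles the vector count
        vector_size = motif_size // n_vectors  # assumed integer division
        if vector_size <= min_vector_size:
            break
        schedule.append((depth, n_vectors, vector_size))
        depth += 1
    return schedule

# Example with a motif of 1232 points (the size of the gene studied in Section 3).
levels = cutting_schedule(1232)
```

For a 1232-point motif this yields six levels, from two vectors of 616 points down to 64 vectors of 19 points; the next cut would give 9-point vectors, below one helix turn.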
Figure 3 shows that the vector-based cutting allows the algorithm to distinguish, from the first iteration, both trajectories which have been wrongly recognized as similar by a local comparison method (Fig. 1).
The user has to define the score threshold considered sufficient for recognition. A score of 100 means an exact detection, whereas a smaller value makes it possible to recognize similar motifs. Consider a threshold score of 95: only the sequence fragments that are 95% similar to the motif (according to the combination of the differences between the compared angles and the number of iterations) will be detected by the algorithm. We note that exact search is equivalent to textual pattern matching, but the algorithmic complexity of this approach is, on average, linear (in the length of the sequence), so it remains less efficient than a Boyer–Moore solution (Boyer and Moore, 1977), which has sub-linear complexity on average. The number of iterations is taken into account through a bonus score calculated according to the level of cutting (the depth) and added to the originally calculated score. Thus, the deeper we are, the larger the accepted difference between two angles. In this way, if the end (the last 10%) of the sequence is very different from the motif (a sharp bend, for instance), the algorithm will still find it similar to the motif as long as the global trajectories are similar. The number of false negatives is thus reduced. The user can adjust the bonus parameter to control how the tolerated error evolves during the matching process: the bigger the bonus, the more local accidents on the 3D trajectory are tolerated.
In other words, the user defines two parameters:
- the score threshold (original error tolerated) and
- the bonus percentage (the evolution of original error during the process).
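The effect of the bonus percentage can be illustrated with the update rule that appears in the pseudocode of Section 2.4, `score = score + (100 - score) x bonus_percent/100`. The function name `apply_bonus` and the sample values below are illustrative assumptions.

```python
def apply_bonus(score, bonus_percent):
    """Move `score` towards 100 by `bonus_percent` of the remaining gap."""
    return score + (100.0 - score) * bonus_percent / 100.0

# A level score of 80 with a 10% bonus recovers 10% of the 20-point gap.
boosted = apply_bonus(80.0, 10.0)
unchanged = apply_bonus(80.0, 0.0)
```

A bonus of 0 leaves the score untouched, and a perfect score of 100 is never modified, so the bonus only softens the penalty of local discrepancies.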
As soon as one fixes a threshold score lower than 100, the algorithm will detect consecutive positions in the sequence for a given motif. This is due to the proximity of the studied sequence fragments. Indeed, a sequence fragment of a few hundred nucleotides certainly has roughly the same 3D structure as the fragments immediately upstream and downstream of the detected position. The lower the percentage of similarity, the larger the interval of positions of the consecutive fragments. It is thus preferable to filter positions in order to retain only one of them: the one that gives the best score.
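One simple way to perform such a filtering is sketched below. It is an assumption rather than the published procedure: hits are taken to be `(position, score)` pairs, positions separated by at most `max_gap` are grouped into one run, and the name `keep_best_per_cluster` is hypothetical.

```python
def keep_best_per_cluster(hits, max_gap=1):
    """Group hits whose positions are at most `max_gap` apart and keep the
    best-scoring hit of each group."""
    best = []
    cluster = []
    for position, score in sorted(hits):
        # A gap larger than max_gap closes the current run of positions.
        if cluster and position - cluster[-1][0] > max_gap:
            best.append(max(cluster, key=lambda hit: hit[1]))
            cluster = []
        cluster.append((position, score))
    if cluster:
        best.append(max(cluster, key=lambda hit: hit[1]))
    return best

# Made-up hits: two runs of consecutive positions.
raw_hits = [(100, 90.1), (101, 90.9), (102, 90.4), (500, 91.0), (501, 90.2)]
filtered = keep_best_per_cluster(raw_hits)
```

Here the five raw hits collapse to one representative per run, at positions 101 and 500.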
Figure 4 shows the global scheme of the algorithm including user parameters.
2.4 The algorithm
The variables and functions used in the algorithm are the following:
- T is the text within which the motif M will be searched,
- score_threshold is the minimal score for recognized motifs,
- bonus_percent controls how the score is recalculated according to the depth,
- posT is the position along T,
- depth is the level of cutting,
- n_vect is the number of vectors at the current depth,
- vect_size is the size of the vectors at the current depth,
- k is the angle number along both M and the considered fragment of T,
- CMAF3DP stands for the Compute Middle Angle From 3DPoints function, which takes a 3DPoint vector together with a first and a last position. It returns the angle formed by the first, the median and the last point, computed with the scalar product.
Algorithm 3DPatternMatching
(input:
    3DPoint vector T,
    3DPoint vector M,
    float score_threshold,
    float bonus_percent) {
    for posT = 1 to T.size
        ang_T = CMAF3DP(T, posT, posT + M.size)
        ang_M = CMAF3DP(M, 1, M.size)
        diff = (ang_M - ang_T)/ang_M x 100
        mean_score = 100 - diff
        depth = 2
        n_vect = 2^depth
        vect_size = M.size/n_vect
        while mean_score ≥ score_threshold & vect_size > 10
            score = 0
            for k = 1 to n_vect
                start = (k-1)*vect_size
                end = start + vect_size
                ang_T = CMAF3DP(T, posT + start, posT + end)
                ang_M = CMAF3DP(M, start, end)
                diff = (ang_M - ang_T)/ang_M x 100
                score = (score + 100 - diff)/k
            score = score + (100 - score) x bonus_percent/100
            mean_score = (mean_score + score)/depth
            depth = depth + 1
            n_vect = 2^depth
            vect_size = M.size/n_vect
        if vect_size ≤ 10 and mean_score ≥ score_threshold
            then motif found at position posT, score = mean_score
}
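The pseudocode can be transcribed into a small runnable sketch. This is an interpretation, not the authors' implementation, and it departs from the listing in several labelled ways: indices are zero-based, the scan stops at the last position where the motif still fits, the angle difference is taken in absolute value, the per-level score is a plain arithmetic mean, and the global score is a running mean over depths. The angle is measured at the median point, and the names `middle_angle`, `angle_score` and `pattern_matching_3d`, as well as the synthetic trajectory, are assumptions.

```python
import math

def middle_angle(points, first, last):
    """Angle (degrees) at the median point between the vectors pointing to the
    first and to the last point (assumed reading of CMAF3DP)."""
    mid = (first + last) // 2
    u = [points[first][i] - points[mid][i] for i in range(3)]
    v = [points[last][i] - points[mid][i] for i in range(3)]
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    cos = sum(a * b for a, b in zip(u, v)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def angle_score(ang_m, ang_t):
    """100 minus the relative difference in percent, floored at 0."""
    if ang_m == 0.0:
        return 100.0 if ang_t == 0.0 else 0.0
    return max(0.0, 100.0 - abs(ang_m - ang_t) / abs(ang_m) * 100.0)

def pattern_matching_3d(T, M, score_threshold, bonus_percent, min_size=10):
    """Return (position, score) for every fragment of T accepted as similar to M."""
    m = len(M)
    hits = []
    for pos in range(len(T) - m + 1):
        # Depth 1: one global angle over the whole motif and fragment.
        mean_score = angle_score(middle_angle(M, 0, m - 1),
                                 middle_angle(T, pos, pos + m - 1))
        depth = 2
        n_vect = 2 ** depth
        vect_size = m // n_vect
        while mean_score >= score_threshold and vect_size > min_size:
            total = 0.0
            for k in range(n_vect):
                start = k * vect_size
                end = min(start + vect_size, m - 1)
                total += angle_score(middle_angle(M, start, end),
                                     middle_angle(T, pos + start, pos + end))
            score = total / n_vect                       # assumed plain mean
            score += (100.0 - score) * bonus_percent / 100.0
            mean_score = (mean_score * (depth - 1) + score) / depth  # running mean
            depth += 1
            n_vect = 2 ** depth
            vect_size = m // n_vect
        if vect_size <= min_size and mean_score >= score_threshold:
            hits.append((pos, mean_score))
    return hits

# Synthetic trajectory (made-up curve, not DNA model output); the motif is a slice of it.
trajectory = [(math.cos(0.3 * i) * (1 + 0.02 * i), math.sin(0.2 * i), 0.34 * i)
              for i in range(200)]
motif = trajectory[50:114]
hits = pattern_matching_3d(trajectory, motif, score_threshold=100.0, bonus_percent=0.0)
```

With a threshold of 100 and no bonus, the search reports the motif's own position (50) with a score of 100, which corresponds to the auto-detection case; lowering the threshold makes neighbouring positions appear as well.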
3 RESULTS
Like textual pattern matching algorithms, 3D methods consist of locating defined 3D sequence segments corresponding to documented biological functions. Our study had to be carried out on a well-known organism. Arabidopsis thaliana is a model plant. The small size of its genome makes it useful for genetic mapping and sequencing. At about 157 million base pairs and five chromosomes, it is a small genome for a plant species. It was the first plant genome to be sequenced, in 2000. Much work has been done to assign a function to the 25 500 genes found so far.
Our study does not aim to be exhaustive but to show an example of 3D pattern matching on a concrete case. All data and descriptions were extracted from The Arabidopsis Information Resource (TAIR, version 5): http://www.arabidopsis.org. We downloaded the textual sequences of all the chromosomes (along with the position of each gene) and transformed them into 3D chromosomes using the ADN-Viewer engine. Then, for a given gene, we extracted the corresponding 3D pattern within a chromosome and performed the search within the desired chromosome(s).
Let us consider for example the AT3G24310 gene (1232 base pairs), which is an Open Reading Frame (ORF) of chromosome 3 from A.thaliana and which starts at nucleotide 8 812 369 and ends at nucleotide 8 811 138; description: myb family transcription factor, similar to myb protein 305 GB:JQ0958 from (garden snapdragon) (Jackson et al., 1991). We perform a search along the chromosome of all the 3D motifs that match this gene with at least 90% of 3D similarity. Beyond the auto-matching of the gene (100%), the obtained result includes two other positions:
AT3G24310: 1232 bp
A. thaliana chromosome 3
match = 8 811 138 – 8 812 369: 100% of 3D similarity
(AT3G24310 gene: auto-detection)
match = 21 348 410 – 21 349 641: 90.9% of 3D similarity
(AT3G57620 gene: 21 348 541 – 21 350 278)
A. thaliana chromosome 2
(none)
A. thaliana chromosome 1
match = 2 564 260 – 2 565 491: 90.3% of 3D similarity
(AT1G08180 gene: 2 564 738 – 2 565 073)
A. thaliana chromosome 4
(none)
A. thaliana chromosome 5
(none)
The interesting point is that the area that matches AT3G24310 with 90.9% of similarity starts at nucleotide 21 348 410 of chromosome 3 and includes a large part of another gene. This gene is AT3G57620 (1738 bp) and starts at nucleotide 21 348 541; description: glyoxal oxidase-related, contains similarity to glyoxal oxidase precursor (Phanerochaete chrysosporium) gi|1050302|gb|AAA87594. The other matched area (90.3% of 3D similarity) includes the whole AT1G08180 gene, which starts at nucleotide 2 564 738 of chromosome 1 and is 336 bp long; description: expressed protein.
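The overlap between the matched areas and the annotated genes can be checked with a few lines of arithmetic on the coordinates listed above. The sketch assumes 1-based inclusive coordinates; the helper name `overlap_length` and the variable names are illustrative.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

# Coordinates taken from the result listing above.
match_chr3 = (21_348_410, 21_349_641)
gene_AT3G57620 = (21_348_541, 21_350_278)
match_chr1 = (2_564_260, 2_565_491)
gene_AT1G08180 = (2_564_738, 2_565_073)

len_AT3G57620 = gene_AT3G57620[1] - gene_AT3G57620[0] + 1
len_AT1G08180 = gene_AT1G08180[1] - gene_AT1G08180[0] + 1

overlap_chr3 = overlap_length(*match_chr3, *gene_AT3G57620)
overlap_chr1 = overlap_length(*match_chr1, *gene_AT1G08180)
```

The chromosome 3 match shares 1101 of the 1738 bp of AT3G57620 (about 63% of the gene), and the chromosome 1 match covers all 336 bp of AT1G08180.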
The next step is to visually check 3D structures considered (gene and motifs) to be well matched. Such a visualization is possible thanks to the ADN-Viewer software developed in our lab. Nevertheless, in order to validate the 3D contribution of such results, it is advisable to perform a linear alignment of the corresponding textual sequences. Indeed, if any textual sequences comparison gives high alignment score, the 3D corresponding structures will logically be similar. On the contrary, if the linear alignment score of the textual sequences is not significant, the 3D-based matching result becomes very interesting.
Figure 5A shows the result of the Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1990) alignment between the AT3G24310 gene and the DNA sequence of the area that matches with 90.9% of 3D similarity. Figure 5B displays the 3D trajectories of these two sequences. Figure 6 shows the same information for the DNA sequence of the area that matches with 90.3% of 3D similarity. No significant similarity was found by the BLAST alignment of the corresponding textual sequences. Such a result highlights the relevance and the contribution of 3D pattern matching for both sequences, and more generally for any similar case.
We are confronted with the particular case where 3D pattern matching is useful, i.e. when textual sequences alignment is unfruitful. Moreover, in Figure 5, we can see two genes that share a large part of 3D trajectory while they have completely different primary sequences. In Figure 6, AT1G08180 is too small to link 3D shape and function. However, it is interesting to notice that the AT1G08180 gene environment is very similar to the AT3G24310 one.
Nevertheless, we have to consider the relevance of 3D pattern matching. It seems natural that a motif defined by a few nucleotides is matched more easily than a larger one: the larger the motif is, the more different the trajectories will be. In Figures 5 and 6, the global shape is similar for both structures. However, some disparities of trajectory can be observed at a local scale. Indeed, a cutoff threshold of 90% of similarity constrains the proximity of the angles at the global level, but allows some flexibility when the comparison is refined.
A situation that likely occurs in real cases is when a sequence contains a particular 3D motif but also has a few extra bases inserted. There are several possible cases:
- the sequence and the motif have similar textual sequences, so even with a few insertions, the similarity is easily detectable from textual sequences,
- the sequence and the motif have only similar 3D trajectories:
  - if the insertions do not significantly modify the global shape of the trajectory, the search will give a good score of similarity,
  - if the insertions modify the global shape of the 3D trajectory, the algorithm does not detect the 3D motif inside of the sequence. In this case, a 3D alignment will be more appropriate.
4 CONCLUSION
In this article, we have designed a new algorithm for 3D pattern matching especially fitted for 3D DNA sequences. The concept is to perform a comparison of angles from a global scale to a local one and thus detect similar 3D patterns.
Such a 3D pattern tool offers a new way to integrate geometrical criteria into bioinformatics analyses. It leads to an improved or a complementary result for pattern matching. It may be very interesting to search for all 3D pattern matching hits of a gene within its chromosome, within another chromosome of its genome or within another genome. In addition, this algorithm can be applied to search for 3D patterns in any 3D trajectory. For example, it would be interesting to apply it to H curves (Hamori and Ruskin, 1983) or Z curves (Zhang and Zhang, 1994; Zhang et al., 2003) to detect the same evolution of nucleotide composition within different areas of DNA. Proteins also raise very similar challenges, and it would be very interesting to apply this kind of algorithm to 3D protein backbones.
Moreover, such geometrical processing could enhance the efficiency of algorithms that search for tandem repeat motifs. It is known that the detection of tandem repeat motifs is expensive in computation time (Delgrange and Rivals, 2004). With ADN-Viewer, tandem repeat motifs seem to form a 3D super-helix structure (Rouleux-Bonnin et al., 2004). One direction for future work is to detect such 3D structures more quickly than tandem repeat motifs can be detected within textual DNA sequences.
Finally, and it is the ultimate question and purpose of such works, is there any relationship between DNA 3D structure and function? Are two genes that share 3D features functionally related?
ACKNOWLEDGEMENTS
We thank C. Toffano-Nioche for fruitful discussions.
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: Keith Crandall
Received on September 11, 2006; revised on November 26, 2006; accepted on December 30, 2006
REFERENCES
Altschul SF, et al. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.
Amzallag A, et al. 3D reconstruction and comparison of shapes of DNA minicircles observed by cryo-electron microscopy. Nucleic Acids Res. (2006) 34:e125–e125.
Atlung T, Ingmer H. H-NS: a modulator of environmentally regulated gene expression. Mol. Microbiol. (1997) 24:7–17.
Baeza-Yates RA, Gonnet GH. A new approach to text searching. Commun. ACM (1992) 35:74–82.
Basri R, et al. Determining the similarity of deformable shapes. In: Proceedings of the IEEE Workshop on Physics Based Modeling in Computer Vision (1995) 135–143.
Bolshoy A, et al. Curved DNA without A-A: experimental estimation of all 16 DNA wedge angles. Proc. Natl. Acad. Sci. USA (1991) 88:2312–2316.
Boyer RS, Moore JS. A fast string searching algorithm. Commun. ACM (1977) 20:762–772.
Buerger MJ. X-ray Crystallography (1942) New York: Wiley.
Burleigh I, et al. DNA in action! A 3D swarm-based model of a gene regulatory system. In: First Australian Conference on Artificial Life (2003) Australia: Canberra.
Cacchione S, et al. Periodical polydeoxynucleotides and DNA curvature. Biochemistry (1989) 28:8706–8713.
Carmona M, Magasanik B. Activation of transcription at σ54-dependent promoters on linear templates requires intrinsic or induced bending of the DNA. J. Mol. Biol. (1996) 261:348–356.
Carrington A, McLachlan AD. Introduction to Magnetic Resonance (1967) London: Chapman and Hall.
Cohen I, Herlin I. Curves matching using geodesic paths. In: CVPR'98 (1998) Santa-Barbara, USA: IEEE. 741–746.
Collis CM, et al. Influence of the sequence-dependent flexure of DNA on transcription in E.coli. Nucleic Acids Res. (1989) 17:9447–9468.
Dardel F, Képès F. Bioinformatique, Génomique et post-génomique (2002) France: Les Éditions de l'École Polytechnique, Palaiseau.
De Santis P, et al. A theoretical model of DNA curvature. Biophys. Chem. (1988) 32:305–317.
Delgrange O, Rivals E. STAR: an algorithm to search for tandem approximate repeats. Bioinformatics (2004) 20:2812–2820.
El Hassan MA, Calladine CR. The assessment of the geometry of dinucleotide steps in double-helical DNA; a new local calculation scheme. J. Mol. Biol. (1995) 251:648–664.
Espinosa-Urgel M, Tormo A. Sigma s-dependent promoters in Escherichia coli are located in DNA regions with intrinsic curvature. Nucleic Acids Res. (1993) 21:3667–3670.
Gherbi R, Hérisson J. Representation and processing of complex DNA spatial architecture and its annotated content. In: Proceedings of the International Pacific Symposium on Biocomputing (2002) Kauai, Hawaii, USA: World Scientific Press. 151–162.
Gros P-E, et al. Combining applications and remote databases view in a common SQL distributed genomic database. Data Sci. J. (2005) 4:244–254.
Hamori E, Ruskin J. H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences. J. Biol. Chem. (1983) 258:1318–1327.
Hérisson J, et al. DNA in virtuo: visualization and exploration of 3D genomic structures. In: Proceedings of the 3rd ACM International Conference on Virtual Reality, Computer Graphics, Visualization and Interaction (2004) November 3–5.
Jackson D, et al. Expression patterns of myb genes from Antirrhinum flowers. Plant Cell (1991) 3:115–125.
Karp RM, Rabin MO. Efficient randomised pattern matching algorithms. IBM J. Res. Develop. (1987) 2:249–260.
Matte-Tailliez O, et al. Yeast naked DNA spatial organization predisposes to transcriptional regulation. In: Proceedings of the International Conference on Computational Science and its Applications (2006) 222–231.
Pérez-Martin J, et al. Promoters responsive to DNA bending: a common theme in prokaryotic gene expression. Microbiol. Rev. (1994) 58:268–290.
Rouleux-Bonnin F, et al. Structural and transcriptional features of Bombus terrestris satellite DNA and their potential involvement in the differentiation process. Genome (2004) 47:877–888.
Sebastian TB, et al. Alignment-based recognition of shape outlines. In: Proceedings of the 4th International Workshop on Visual Form (2001) 606–618. May 28–30.
Shpigelman ES, et al. CURVATURE: software for the analysis of curved DNA. CABIOS (1993) 9:435–440.
Watson JD, Crick FHC. Molecular structure of nucleic acids. Nature (1953) 171:737–738.
Zhang CT, et al. The Z curve database: a graphic representation of genome sequences. Bioinformatics (2003) 19:593–599.
Zhang R, Zhang CT. Z Curves, an intuitive tool for visualizing and analyzing DNA sequences. J. Biomol. Struc. Dynamics (1994) 11:767–782.






