Bioinformatics Advance Access originally published online on June 9, 2008
Bioinformatics 2008 24(15):1731-1732; doi:10.1093/bioinformatics/btn259
DNAlive: a tool for the physical analysis of DNA at the genomic scale
1Joint IRB-BSC Program on Computational Biology, Institute of Research in Biomedicine, Parc Científic de Barcelona, Josep Samitier 1-5, Barcelona 08028, 2Barcelona Supercomputing Center, Jordi Girona 31, Barcelona 08034, 3National Institute of Bioinformatics, Parc Científic de Barcelona, Josep Samitier 1-5, 4Departament de Bioquímica, Facultat de Biología, Avgda Diagonal 647, Barcelona 08028 and 5Institut Català per la Recerca i Estudis Avançats (ICREA), Barcelona, Spain
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: DNAlive is a tool for the analysis and graphical display of structural and physical characteristics of genomic DNA. The web server implements a wide repertoire of metrics for deriving physical information from DNA sequences, together with an interface for deriving 3D information on long sequences of both naked and protein-bound DNA. It also implements a mesoscopic Metropolis code that allows inexpensive study of the dynamic properties of chromatin fibers. In addition, the server surveys other protein and genomic databases, allowing the user to explore the physical properties of selected DNA in the context of the functional features annotated on those regions.
Availability: http://mmb.pcb.ub.es/DNAlive/ ; http://www.inab.org/
Contact: modesto{at}mmb.pcb.ub.es
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
Massive genomic projects have revealed the sequences of nearly 50 eukaryotic genomes, including several mammals (among them, humans), and many more will become available in the coming years. So far, the annotation of these genomes has been largely restricted to the identification and one-dimensional location of functional features (mostly genes and their regulatory regions), without considering the structural parameters of their environment, which have been shown to be crucial for the functionality of DNA. Combining the structural properties of DNA with its functional features is necessary to interpret the functionality of genomes in a more complex, and therefore more realistic, environment. Knowing these structural parameters allows scientists to consider how accessible a given DNA region is to different proteins, such as transcription factors, polymerases and DNA methylases. For example, specific deformability or helical properties in a given region of DNA can facilitate or impair the formation of nucleosomes hundreds of base pairs away, or can affect the dimerization of two DNA-binding proteins that may be separated by thousands of bases in sequence. Several groups (Abeel et al., 2008; Goñi et al., 2007; Ohler et al., 2001; Pedersen et al., 2000; Singhal et al., 2008) have demonstrated that regulatory regions in DNA display unusual physical properties, and two groups have recently shown independently (Abeel et al., 2008; Goñi et al., 2007) that eukaryotic promoters can be located with surprisingly good accuracy just by analyzing simple physical descriptors of DNA, which confirms the existence of a hidden physical code that controls gene function. In summary, functional annotation needs to be complemented with physical data to understand the structure, dynamics and general functionality of genomic DNA.
DNAlive has been developed to give a complete description of the physical properties of genomic DNA in a simple way, providing data that can be easily understood by researchers who are not experts in DNA structure. Among other things, DNAlive allows the user to (i) determine potential correlations between genome annotations (such as transcription start sites, exons, splicing sites, ...) and a battery of 29 physical descriptors of DNA (stability, helical descriptors, curvature, non-canonical B-DNA affinity, stiffness, ...); (ii) find the most stable 3D structure of long genome fragments (both naked DNA and DNA–protein complexes) using sequence-dependent average helical parameters and, when available, experimental structural data on DNA–protein complexes; (iii) perform a dynamic analysis of the chromatin fiber, exploring the range of deformability sampled during the trajectory and the possible formation of transient protein–protein complexes; and (iv) display structural parameters of DNA in the context of associated functional features obtained from several public databases. The tool is available as a web page and also as a set of web services that can be incorporated into user workflows (Supplementary Material).
| 2 IMPLEMENTATION |
|---|
2.1 Entry data
The only mandatory input for DNAlive is a DNA sequence in FASTA format or the genomic coordinates of a region in a supported vertebrate genome. The program retrieves parameters from its internal databases (Supplementary Table 1) to determine physical profiles and to create a 3D structure of the naked DNA. Given a DNA sequence, the program identifies potential transcription factor binding sites (TFBS) by scanning the public TRANSFAC database (http://www.gene-regulation.com/), which is linked to the PDB (http://www.rcsb.org/) and UniProt (http://www.ebi.uniprot.org/) databases. The user can also control the selection of the complexes of interest and force the generation of specific complexes (for example, nucleosomes, multiprotein complexes, etc.).
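As an illustration of the kind of input described above, the following Python sketch parses FASTA-formatted text and checks that a sequence contains only the four unambiguous bases. The function names `read_fasta` and `check_dna` are hypothetical, and restricting the alphabet to A, C, G and T is an assumption made for this example; the text does not state how the server treats ambiguity codes.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            # Sequence lines of one record are concatenated in order.
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def check_dna(sequence, alphabet="ACGT"):
    """Return True if the sequence is non-empty and uses only `alphabet`."""
    return bool(sequence) and set(sequence) <= set(alphabet)
```

A caller could use `check_dna` to reject a record before any profile is computed, rather than failing later on an unknown base.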
2.2 Server workflow
Once a DNA sequence is entered (Fig. 1), the program computes the profiles of the 29 physical properties available for the fiber (Supplementary Table 1). All properties are represented in 2D plots, either with the UCSC Genome Browser (http://genome.ucsc.edu), alongside annotated genes, when genomic coordinates are provided, or with Gnuplot otherwise (Fig. 1 and Supplementary Fig. 1).
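As a rough illustration of how a per-position physical profile can be derived from sequence alone, the following Python sketch averages a dinucleotide-step descriptor over a sliding window. The table `STEP_VALUES`, the function name `step_profile` and the window size are hypothetical placeholders; the numbers are arbitrary and are not the descriptors or values used by DNAlive, which are listed in Supplementary Table 1.

```python
from itertools import product

# Placeholder table: one arbitrary value per dinucleotide step (AA=0 ... TT=15).
# These numbers are illustrative only and are NOT DNAlive parameters.
BASES = "ACGT"
STEP_VALUES = {
    a + b: float(i) for i, (a, b) in enumerate(product(BASES, repeat=2))
}


def step_profile(sequence, window=3, table=STEP_VALUES):
    """Return the sliding-window mean of a dinucleotide-step descriptor.

    A sequence of length N has N - 1 base-pair steps; each window covers
    `window` consecutive steps, giving N - window profile points.
    """
    seq = sequence.upper()
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    unknown = sorted({s for s in steps if s not in table})
    if unknown:
        raise ValueError(f"unsupported step(s): {unknown}")
    values = [table[s] for s in steps]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of steps")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

With a real parameter table, the resulting list could be written out as a two-column file (position, value) for plotting.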
To combine the visualization of DNA physical properties with public annotations of the genome, the coordinates of the input DNA sequence can be determined by running a search on our local Blat server (Kent, 2002). Although the user can manually assign transcription factor PDB structures to specific positions of the input DNA sequence, we have also implemented an automatic method to perform this step using the TFBS Perl library (Lenhard and Wasserman, 2002). The average 3D structure of DNA is reconstructed using sequence-dependent base step parameters derived from accurate atomistic molecular dynamics (Pérez et al., 2007) and a local adaptation of the X3DNA script (Lu and Olson, 2003) (Fig. 1 and Supplementary Fig. 2). When structural information on protein–DNA complexes is available, the modeled structure of the corresponding segment is replaced by the experimental geometry, and the junctions are refined if required. 3D structures are visualized with Jmol Java applets (http://www.jmol.org/) embedded in the HTML page. All physical descriptors can be mapped onto the 3D structure to facilitate the detection of potential correlations between conformation, functional annotations and physico-chemical properties (Fig. 1).
The server also includes unique tools for a rapid representation of chromatin dynamics. In extensive tests performed in our laboratory against our database of more than 100 trajectories, the method reproduced the essential deformation pattern of DNA with surprisingly high accuracy. It uses a mesoscopic Metropolis Monte Carlo algorithm in which the geometry of each base pair is defined by three local rotations (roll, tilt and twist) and three local translations (slide, shift and rise), and the conformational energy is estimated from the deformation matrix using a harmonic model (Equation 1):

$$E = \sum_{i=1}^{M}\sum_{j=1}^{6}\frac{K_{i,j}}{2}\left(X_{i,j}-X^{0}_{i,j}\right)^{2} \qquad (1)$$

where the index i stands for one of the M base pair steps and the index j stands for the six unique helical parameters ($X_{i,j}$) of each step. The equilibrium value of each helical parameter for a given base pair step type ($X^{0}_{i,j}$) and the associated deformation constant ($K_{i,j}$) were previously determined from molecular dynamics simulations (Pérez et al., 2007). Once a movement in helical coordinates is accepted by the Metropolis test, the corresponding Cartesian structure of the fiber is generated using an adaptation of X3DNA (Lu and Olson, 2003) and displayed as a video with Jmol Java applets in the HTML page (Supplementary Fig. 3). The Jmol interface supports basic manipulation and analysis of the trajectories and structures (rotations, translations, distance measurements, ...), which allows potential DNA-mediated protein clusters to be identified.
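The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration of Metropolis Monte Carlo on a diagonal harmonic energy in helical space, not the DNAlive implementation: the function names, the single-parameter move, the step size `max_shift` and the thermal factor `kT` are assumptions made for this example, the 1/2 prefactor is the conventional harmonic form, and the conversion of accepted moves to Cartesian coordinates is omitted.

```python
import math
import random


def harmonic_energy(x, x0, k):
    """Diagonal harmonic energy summed over steps i and parameters j.

    Assumed form: sum of 0.5 * k[i][j] * (x[i][j] - x0[i][j]) ** 2.
    """
    return sum(
        0.5 * kij * (xij - x0ij) ** 2
        for xi, x0i, ki in zip(x, x0, k)
        for xij, x0ij, kij in zip(xi, x0i, ki)
    )


def metropolis(x0, k, n_moves, kT=1.0, max_shift=0.5, seed=0):
    """Sample helical coordinates with single-parameter Metropolis moves.

    x0 and k are M x 6 nested lists (equilibrium values, force constants).
    Returns the final configuration and the fraction of accepted moves.
    """
    rng = random.Random(seed)
    x = [row[:] for row in x0]          # start at the equilibrium geometry
    accepted = 0
    for _ in range(n_moves):
        i = rng.randrange(len(x))       # random base-pair step
        j = rng.randrange(len(x[i]))    # random helical parameter
        old = x[i][j]
        new = old + rng.uniform(-max_shift, max_shift)
        # Only the perturbed term contributes to the energy change.
        d_old = old - x0[i][j]
        d_new = new - x0[i][j]
        delta_e = 0.5 * k[i][j] * (d_new ** 2 - d_old ** 2)
        # Metropolis test: accept downhill moves, uphill with exp(-dE/kT).
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / kT):
            x[i][j] = new
            accepted += 1
    return x, accepted / n_moves
```

Calling `metropolis(x0, k, n_moves=5000)` with placeholder `x0` and `k` tables returns one sampled configuration; in practice the equilibrium values and force constants would depend on the base pair step type, and each accepted configuration would be stored to build a trajectory.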
| ACKNOWLEDGEMENTS |
|---|
We thank Agnes Noy, David Piedra, Henrique Proen
and Joaquín Panadero for their help as β-testers of the server.
Funding: This work has been supported by the Spanish Ministry of Education and Science (BIO2006-01602 and BIO2006-15036), the Spanish Ministry of Health (COMBIOMED network), the Fundación Marcelino Botín and the National Institute of Bioinformatics (Structural Bioinformatics Node).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alfonso Valencia
Received on March 27, 2008; revised on May 16, 2008; accepted on June 4, 2008
| REFERENCES |
|---|
Abeel T, et al. Generic eukaryotic core promoter prediction using structural features of DNA. Genome Res (2008) 18:310–323.
Goñi JR, et al. Determining promoter location based on DNA structure first-principles calculations. Genome Biol (2007) 8:R263.
Kent WJ. BLAT- the BLAST-like alignment tool. Genome Res (2002) 12:656–664.
Lenhard B, Wasserman WW. TFBS: computational framework for transcription factor binding site analysis. Bioinformatics (2002) 18:1135–1136.
Lu XJ, Olson WK. 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res (2003) 31:5108–5121.
Ohler U, et al. Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics (2001) 17(Suppl. 1):S199–S206.
Pedersen AG, et al. A DNA structural atlas for Escherichia coli. J Mol Biol (2000) 299:907–930.
Pérez A, et al. Refinement of the AMBER force field for nucleic acids. Improving the description of α/γ conformers. Biophys J (2007) 92:3817–3829.
Singhal P, et al. Prokaryotic gene finding based on physicochemical characteristics of codons calculated from molecular dynamics simulations. Biophys J (2008) [Epub ahead of print; doi:10.1529/biophysj.107.116392].