Bioinformatics Advance Access originally published online on August 18, 2005
Bioinformatics 2005 21(20):3926-3928; doi:10.1093/bioinformatics/bti632

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oxfordjournals.org

Structure clustering features on the Sfold Web server

Chi Yu Chan 1,*, Charles E. Lawrence 1,2 and Ye Ding 1

1Bioinformatics Center, Wadsworth Center, New York State Department of Health 150 New Scotland Avenue, Albany, NY 12208, USA
2Center for Computational Molecular Biology and Division of Applied Mathematics, Brown University 182 George Street, Providence, RI 02912, USA

*To whom correspondence should be addressed.


    Abstract

Summary: The energy landscape of RNA secondary structures is often complex, and the Boltzmann-weighted ensemble usually contains distinct clusters. Furthermore, the minimum free energy structure often lies outside of the cluster containing the structure determined by comparative sequence analysis. We have developed procedures to characterize and visualize the Boltzmann-weighted ensemble, and have made them available on the Sfold Web server. The new features on the Web server include clustering statistics, ensemble and cluster centroids, multi-dimensional scaling display and energy landscape representation of the Boltzmann-weighted ensemble.

Availability: http://sfold.wadsworth.org; http://www.bioinfo.rpi.edu/applications/sfold

Contact: chanc{at}wadsworth.org


    BACKGROUND
Free energy minimization is a long-established paradigm for the prediction of RNA secondary structures. Algorithms have been developed for computing the optimal (Zuker and Stiegler, 1981) and suboptimal folds (Zuker, 1989). More recently, Ding and Lawrence (2003) have developed an algorithm for drawing a statistical sample from the Boltzmann-weighted ensemble of RNA secondary structures. We have shown that the sampling approach can substantially improve structure prediction through clustering of sampled structures and the identification of centroids of structural clusters (Ding et al., 2005). In order to make the benefits of this novel prediction framework widely available, we have included clustering features in the Srna module of the Sfold server. Here, we describe these new features and their implementation on the Sfold Web server. Features offered by other application modules of Sfold were reported previously (Ding et al., 2004).


    INPUT
Structural clustering features and centroid identification are available in the Srna module output for every submitted job on our server. Input sequences can be submitted in raw format, FASTA format or GenBank format. Because these features can be efficiently computed, the limits on sequence length remain unchanged, at 200 bases for an interactive job and 5000 bases for a batch job. The character N, commonly used for an undetermined nucleotide, is now allowed. All Ns are forced to be unpaired in all sampled structures. After submission of an interactive job, the user receives a progress update in the same browser window every 5 s. Users submitting batch jobs will receive email notifications once their jobs are completed.
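
The input rules above can be sketched as a small pre-submission check. This is only an illustration: the function names and the accepted alphabet (A, C, G, U, T and N) are assumptions, not part of the server; only the two length limits and the treatment of N come from the text.

```python
INTERACTIVE_LIMIT = 200   # bases, as stated in the text
BATCH_LIMIT = 5000        # bases, as stated in the text
ALLOWED = set("ACGUTN")   # assumption: the accepted alphabet is not listed in the text


def clean(sequence):
    """Strip whitespace and upper-case a raw sequence string."""
    return "".join(sequence.split()).upper()


def allowed_job_types(sequence):
    """Return the job types whose length limit the sequence satisfies."""
    seq = clean(sequence)
    unknown = sorted(set(seq) - ALLOWED)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unexpected characters: {unknown}")
    types = []
    if len(seq) <= INTERACTIVE_LIMIT:
        types.append("interactive")
    if len(seq) <= BATCH_LIMIT:
        types.append("batch")
    return types


def forced_unpaired_positions(sequence):
    """1-based positions of N, which are unpaired in every sampled structure."""
    return [i for i, base in enumerate(clean(sequence), start=1) if base == "N"]
```

For example, a 201-base sequence qualifies only for a batch job, and `forced_unpaired_positions("ACNGN")` lists positions 3 and 5.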


    OUTPUT
The clustering features are available in the output page of the Srna module. Users submitting jobs to other modules can access the clustering features by selecting Srna under the ‘output from other application modules’ heading in their job output. Two new sections have been added in Srna output, as described below.

Ensemble centroid in comparison with the minimum free energy structure
The centroid for a set of structures is defined as the structure with the minimum total base-pair distance to all structures in the set. The ensemble centroid is the centroid for the entire sampled ensemble (Ding et al., 2005). In this section of Srna output, graphical and base-pair distance information is available for comparison between the ensemble centroid structure and the MFE structure computed by mfold 3.1 (Zuker, 2003), for the same set of Turner thermodynamic parameters (Xia et al., 1998; Mathews et al., 1999). Structural diagrams of these two ensemble-level representatives are available in PNG, PDF, PostScript and GCG connect formats. The PNG version offers interactive capabilities of zooming and re-centering, which are useful for local structure display. For print-quality figures, the PDF and PostScript versions are recommended. Base pairs marked as green dots in structural diagrams of the MFE structure and the ensemble centroid represent the common base pairs between these two structures. Base pairs in blue are those present only in the MFE structure, and base pairs in red are those present only in the ensemble centroid. The sum of the numbers of base pairs in red and in blue is thus the base-pair distance between the two structures. Circle diagrams generated by the sir_graph software package developed by Stewart and Zuker (Zuker, 2003) are also available to facilitate comparison. In a circle diagram, bases are positioned along a circle, in a clockwise orientation. An arc connecting two bases across the circle indicates pairing between the bases.
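
The definitions in this section can be illustrated with a minimal sketch, assuming each structure is represented as a set of (i, j) base pairs. The function names are hypothetical and the code does not claim to reproduce the server's implementation. It relies on one mathematical fact: over arbitrary sets of base pairs, the total base-pair distance to a collection of structures is minimized by keeping exactly the pairs that occur in more than half of them. Because a base can have at most one partner with frequency above 0.5, the result is itself a valid secondary structure.

```python
from collections import Counter


def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    return len(set(a) ^ set(b))


def color_classes(mfe, centroid):
    """Split base pairs into the three color classes used in the diagrams."""
    mfe, centroid = set(mfe), set(centroid)
    return {
        "green": mfe & centroid,   # common to both structures
        "blue": mfe - centroid,    # only in the MFE structure
        "red": centroid - mfe,     # only in the ensemble centroid
    }


def centroid_of(structures):
    """Base pairs occurring in more than half of the structures."""
    counts = Counter(pair for s in structures for pair in set(s))
    half = len(structures) / 2
    return {pair for pair, n in counts.items() if n > half}


# Toy sample of four structures (illustrative data, not from the paper).
sample = [
    {(1, 10), (2, 9), (3, 8)},
    {(1, 10), (2, 9)},
    {(1, 10), (2, 9), (4, 7)},
    {(1, 9), (2, 8)},
]
center = centroid_of(sample)
total = sum(base_pair_distance(center, s) for s in sample)
classes = color_classes(sample[0], center)
```

In this toy sample the centroid keeps the two pairs shared by three of the four structures, and the number of blue plus red pairs equals the base-pair distance between the two structures being compared.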

Cluster representation of the Boltzmann ensemble
The partitioning of the sampled ensemble into various clusters is summarized in table format. Clusters are sorted in descending order of cluster size. The cluster containing the MFE structure is marked using a red asterisk. The size of a cluster is the sampling estimate for the probability of the cluster, i.e. the sample frequency of the cluster. The MFE structure is excluded from the calculation of cluster sizes, and the sizes of all clusters sum to one. A structure diagram of the cluster centroid and a cluster-level two-dimensional histogram (2Dhist) are provided for each cluster. Cluster centroids are the centroids for structures classified into each cluster. A cluster-level 2Dhist displays base-pair frequencies for that cluster. In addition to individual plots, cluster-level 2Dhist plots and circle diagrams of centroids for all clusters are also available in panel format.
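
As a small illustration of how cluster sizes relate to the sample, the sketch below computes sample frequencies from a list of cluster labels, one per sampled structure, and sorts them in descending order. The function name and the labels are hypothetical; the MFE structure is assumed not to be among the labels, in line with its exclusion from the size calculation.

```python
from collections import Counter


def cluster_sizes(labels):
    """Return (cluster, sample frequency) pairs, largest cluster first."""
    counts = Counter(labels)
    n = len(labels)
    return [(cluster, k / n) for cluster, k in counts.most_common()]


# Ten sampled structures assigned to three clusters (illustrative labels).
labels = ["a"] * 6 + ["b"] * 3 + ["c"]
sizes = cluster_sizes(labels)
```

Here the three clusters receive sizes 0.6, 0.3 and 0.1, which sum to one up to floating-point rounding.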

Multi-dimensional scaling (MDS) plot of the sampled ensemble and representative structures
MDS is a technique for representing high-dimensional objects in a low-dimensional space, typically of two dimensions (Kruskal and Wish, 1977). For RNA secondary structures, pairwise base-pair distances are used as the input to MDS. A 2D MDS plot of the sampled ensemble and representative structures is available. Members of the five largest clusters with sizes of at least 0.010 are drawn as small dots in different colors. Members of all other clusters are plotted as small circles. Representative structures, including the MFE structure, ensemble centroid and centroids of all colored clusters, are drawn as large dots in the graph. The units on the axes of the MDS plot indicate only the relative positions of the objects; they have no physical meaning. Because of the drastic reduction in dimensionality that it achieves, MDS may not visually preserve the strong separation of distinct clusters in the original higher-dimensional space. For RNA structure applications, MDS works reasonably well for the majority of tested sequences. When clusters overlap on the MDS plot, users are encouraged to consult the text output file described below before drawing any conclusions.
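
To show how a distance matrix becomes a 2D layout, here is a minimal sketch of classical (Torgerson) MDS in pure Python. The MDS variant used by the server is not stated in this article, so classical MDS is a stand-in, and `classical_mds` is a hypothetical name. Eigenvectors are found by shifted power iteration with a fixed iteration count, which is adequate for a small example but is not tuned for large samples. Base-pair distances need not be exactly Euclidean, so negative eigenvalues are clamped to zero.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Embed n objects in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-centered Gram matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    # Shift the spectrum so power iteration converges to the
    # algebraically largest eigenvalue of B, not the largest in magnitude.
    shift = max(sum(abs(x) for x in row) for row in b)
    coords = [[0.0] * dims for _ in range(n)]
    found = []  # unit eigenvectors already extracted
    for k in range(dims):
        v = [math.sin(i + 1.0 + k) for i in range(n)]  # deterministic start
        for _ in range(iters):
            for u in found:  # deflate: remove earlier eigenvectors
                dot = sum(a * c for a, c in zip(v, u))
                v = [a - dot * c for a, c in zip(v, u)]
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i]
                 for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient on the unshifted matrix gives the eigenvalue.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        found.append(v)
    return coords


# Four corners of a 2 x 1 rectangle; their distances are exactly 2D.
points = [(0, 0), (2, 0), (2, 1), (0, 1)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
xy = classical_mds(dist)
```

For the rectangle, the pairwise distances between the returned coordinates match the input distances, although the layout may be rotated or reflected. This is why only relative positions on an MDS plot are meaningful.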

Energy landscape of the sampled ensemble and representative structures
A 3D energy landscape plot of the sampled ensemble and representative structures (Fig. 1) is constructed by adding a third dimension of free energies of secondary structures to the 2D MDS plane. In order to enhance the 3D visualization effect and to allow users to choose the best angle at which a particular plot can be viewed, this energy landscape plot is presented as an animation rotating around an imaginary vertical line passing through the center of the horizontal plane. Depending on connection speed, the animation will first take a short while to download all the necessary image files and then it will begin automatically when it is ready. The animation rotates clockwise (as viewed from the top), with an inter-frame increment of 30° and at a speed of 1 frame per s. The user has the option to pause the animation at any particular angle, and then to download the corresponding image in PDF or PostScript format. The color scheme and dot patterns of structures remain the same as in the 2D MDS plot. Coordinates of representative structures are included below the plot to help the user locate their exact positions. Although JavaScript is required to display the animation properly, users with browsers not supporting JavaScript or having JavaScript disabled will still be able to view a static page containing links to individual plots at every angle, in PNG, PDF and PostScript formats.
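
The geometry of the rotating view can be sketched as follows, assuming each structure has coordinates (axis 1, axis 2, energy) and that "clockwise" is judged from above with axis 1 pointing right and axis 2 pointing up. The function name, the choice of the origin as the rotation center and the axis convention are assumptions for illustration; the 30° increment is from the text, and the example point is the MFE structure's coordinates quoted in the Fig. 1 caption.

```python
import math


def rotate_clockwise(point, degrees, center=(0.0, 0.0)):
    """Rotate (axis1, axis2, energy) about a vertical line through `center`.

    The rotation is clockwise when viewed from above; energy is unchanged.
    """
    x, y, energy = point
    cx, cy = center
    t = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) + dy * math.sin(t),
            cy - dx * math.sin(t) + dy * math.cos(t),
            energy)


STEP = 30                          # degrees between frames, from the text
mfe = (-2.19, -4.81, -66.90)       # MFE coordinates quoted in Fig. 1
frames = [rotate_clockwise(mfe, k * STEP) for k in range(360 // STEP)]
```

A 30° increment gives 12 frames per full turn, so at one frame per second the view returns to its starting angle after 12 s.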



Fig. 1 The energy landscape of the sampled ensemble and representative structures for Zygosaccharomyces bailii RNase P RNA (GenBank accession no. AF186231), with a length of 205 nt. The optimal number of clusters determined by our software is 11. Structures belonging to the five largest clusters are marked as solid dots of five different colors. Members of all other clusters are plotted as small circles under the same category of ‘All other clusters’. The MFE structure and the ensemble centroid are both in the largest cluster (light blue color), with a probability of 0.622. The coordinates for a structure are (axis 1, axis 2, energy), where the horizontal axes are from MDS and the vertical axis is the free energy of a secondary structure. The coordinates in this example are (–2.19, –4.81, –66.90) for the MFE structure, (–1.13, –3.15, –63.40) for the ensemble centroid, (–2.02, –4.21, –65.00) for the centroid of cluster 1, (14.05, 10.13, –62.10) for the centroid of cluster 2, (–26.06, 7.76, –63.50) for the centroid of cluster 3, (0.18, 17.12, –53.90) for the centroid of cluster 4 and (–12.06, 30.98, –62.30) for the centroid of cluster 5.

 
In addition to the graphical updates mentioned above, a new text output file describing results from the clustering procedures, including listings of cluster members and distances within and between clusters, is available in the ‘Text files’ section. With the exception of the energy landscape plot, all new graphical plots and text output are included in the compressed archive for download.


    Acknowledgments
 
The Computational Molecular Biology and Statistics Core at the Wadsworth Center is acknowledged for providing computing resources for this work. The Computer Systems Support group at the Wadsworth Center is acknowledged for providing hosting space and network connectivity to the Sfold web server at the Wadsworth Center. The long-term development of the Sfold software and the maintenance of the web server are supported by the National Science Foundation grant DMS-0200970 and the National Institutes of Health grant GM068726 to Y.D. This work was also supported, in part, by the National Institutes of Health grant HG01257 to C.E.L. The server at Rensselaer Polytechnic Institute (RPI) runs on a Linux cluster acquired through an IBM SUR grant awarded to RPI/Wadsworth Bioinformatics Center (Zuker, 2003).

Conflicts of Interest: none declared.

Received on June 23, 2005; revised on August 12, 2005; accepted on August 15, 2005

    REFERENCES

    Ding, Y. and Lawrence, C.E. (2003) A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res., 31, 7280–7301.

    Ding, Y., et al. (2004) Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res., 32, W135–W141.

    Ding, Y., et al. (2005) RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA, 11, 1157–1166.

    Kruskal, J.B. and Wish, M. (1977) Multidimensional Scaling. Sage Publications, Beverly Hills, CA.

    Mathews, D.H., et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288, 911–940.

    Xia, T., et al. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735.

    Zuker, M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52.

    Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 31, 3406–3415.

    Zuker, M. and Stiegler, P. (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9, 133–148.




