

Bioinformatics Advance Access originally published online on October 10, 2005
Bioinformatics 2005 21(24):4425-4426; doi:10.1093/bioinformatics/bti712

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

QUASAR—scoring and ranking of sequence–structure alignments

Fabian Birzele*,†, Jan E. Gewehr† and Ralf Zimmer

Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University Amalienstrasse 17, D-80333 Munich, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: Sequence–structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence–structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence–structure alignments ranking) provides a unifying framework for scoring sequence–structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against ‘standard-of-truth’ structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.

Availability: The software, examples, the Java documentation and a tutorial are available at http://www.bio.ifi.lmu.de/QUASAR

Contact: fabian.birzele@ifi.lmu.de


    1 INTRODUCTION
 
With the growing gap between the number of known protein sequences in databases like Swiss-Prot/TrEMBL (Boeckmann et al., 2003) and the number of experimentally determined protein structures in the PDB (Berman et al., 2000), automated structure prediction methods have become valuable tools for assigning potential coordinate models to new protein sequences. The first step in building a complete all-atom model is often to align a sequence of unknown structure (the so-called target) to a database of sequences with known structures (so-called templates). On the basis of these alignments and the underlying known template structures, models are built and refined. Since alignment quality determines model quality, it is desirable to identify good models at the alignment stage and thereby avoid the overhead of producing obviously unsuitable coordinate models. This mainly restricts efforts to sequence- and secondary-structure-based measures (i.e. alignment scores) instead of structural properties.

The QUASAR (quality of sequence–structure alignments ranking) system has been designed to meet two needs. First, it is a platform-independent and easily extensible software package for scoring and ranking sequence–structure alignments coming from different sources. Second, it aids the process of developing, benchmarking and optimizing new alignment quality measures. The graphical user interface (GUI) of QUASAR provides quick access to each of the possible use cases and allows for visualization and comparison of the results as well as for configuration of all essential parts. Once configured, QUASAR can also be used directly from the command line.


    2 METHODS
 
2.1 Scoring alignments
Scoring schemes are alignment quality scores that require only information that is available from the sequence (e.g. predicted secondary structure) or that can be directly inferred from the template structure. The scoring schemes provided by the system include several amino acid and secondary-structure-based exchange matrices [such as PAM (Dayhoff et al., 1978) and the matrices of Luthy et al. (1991)], the two standard secondary-structure fit measures Q3 and SOV (Zemla et al., 1999), and two contact-capacity-based scores (Berrera et al., 2003; Singer et al., 2002). The set of available scoring schemes can easily be extended by implementing a Java interface or, in the case of (amino acid exchange) scoring matrices, by adding a text file in a QUASAR-specific format that contains the matrix information. This provides a fast connection to matrix collections such as the AAindex database (Kawashima et al., 1999).
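As an illustration of what a custom scoring scheme could look like, the following minimal sketch computes a Q3-style secondary-structure fit for one alignment. The interface name `ScoringScheme`, its method signature and the class `Q3Scheme` are hypothetical and chosen for this example only; QUASAR's actual Java interface is not described in this text. The sketch also assumes that both inputs are gap-aligned three-state strings (H, E, C) with '-' marking gaps, and that gap columns are simply skipped.

```java
/** Hypothetical extension point; NOT QUASAR's real interface, which is not shown in the article. */
interface ScoringScheme {
    /** Scores one alignment given two equal-length, gap-aligned strings ('-' marks a gap). */
    double score(String alignedTarget, String alignedTemplate);
}

/**
 * Q3-style fit: fraction of aligned (non-gap) columns in which the predicted
 * three-state secondary structure of the target (H, E, C) equals that of the template.
 */
class Q3Scheme implements ScoringScheme {
    @Override
    public double score(String predictedSs, String templateSs) {
        if (predictedSs.length() != templateSs.length()) {
            throw new IllegalArgumentException("aligned strings must have equal length");
        }
        int aligned = 0;
        int matches = 0;
        for (int i = 0; i < predictedSs.length(); i++) {
            char p = predictedSs.charAt(i);
            char t = templateSs.charAt(i);
            if (p == '-' || t == '-') {
                continue; // assumption: gap columns do not count towards Q3
            }
            aligned++;
            if (p == t) {
                matches++;
            }
        }
        // An alignment without any aligned column gets the lowest possible score.
        return aligned == 0 ? 0.0 : (double) matches / aligned;
    }
}
```

For example, `new Q3Scheme().score("HHHEEC-C", "HHCEE-CC")` compares six aligned columns, five of which agree.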

2.2 Combining scores
With the so-called score conductor, the user can integrate several scoring schemes into one scoring function by combining the scores in a weighted sum with user-specified weights, i.e. as a linear combination of the individual scores. In addition, by editing the configuration file, experienced users can build more complex, tree-like formulas using further operators such as multiplication and division. A user can thus test different combinations of scoring schemes with minimal extra effort and may improve the ranking quality over that of the single scores. The final quality score of every alignment is calculated by combining the single alignment scores according to the formula given in the configuration. Single scores can also be normalized to the range between zero and one so that scores of different magnitudes can be combined.
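The weighted-sum combination with normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and min-max scaling over the current set of alignments is assumed as the normalization, since the text does not say how QUASAR maps scores to the range between zero and one.

```java
/** Illustrative sketch of a linear score combination; not QUASAR's score conductor itself. */
class ScoreConductorSketch {

    /** Min-max normalizes one scheme's scores over all alignments to [0, 1] (assumed method). */
    static double[] normalize(double[] scores) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // If all scores are equal, the scheme carries no ranking information.
            out[i] = range == 0.0 ? 0.0 : (scores[i] - min) / range;
        }
        return out;
    }

    /**
     * Combines several schemes into one score per alignment.
     * schemeScores[k][i] is the score of alignment i under scheme k;
     * weights[k] is the user-specified weight of scheme k.
     */
    static double[] combine(double[][] schemeScores, double[] weights) {
        if (schemeScores.length != weights.length) {
            throw new IllegalArgumentException("one weight per scoring scheme is required");
        }
        int n = schemeScores[0].length;
        double[] combined = new double[n];
        for (int k = 0; k < schemeScores.length; k++) {
            double[] norm = normalize(schemeScores[k]);
            for (int i = 0; i < n; i++) {
                combined[i] += weights[k] * norm[i];
            }
        }
        return combined;
    }
}
```

Normalizing before weighting keeps a scheme with large raw values (for example a summed matrix score) from dominating a scheme that already lies between zero and one.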

2.3 Benchmarking scores
To help the user find a scoring function that gives the best possible results, QUASAR contains a number of structure-based quality scores such as Touch and APDB (O'Sullivan et al., 2003), as well as reimplementations of MaxSub (Siew et al., 2000) and TMScore (Zhang and Skolnick, 2004), both based on a different superimposition routine (Fortran QRT fit). For a given alignment benchmark set in which the structures of query and template proteins are known, QUASAR measures the correlation coefficient between the ranking produced by the specified alignment score and a structure-based benchmark measure (e.g. RMSD). It is also possible to use a user-defined quality score as a reference by annotating the alignments with it (Fig. 1). This makes it easy to compare the performance of an alignment score or a combination of scores with a given ‘standard-of-truth’ without the need to implement the reference score in Java.
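The benchmarking step can be illustrated with a rank correlation between an alignment score and a structure-based reference score. The text does not state which correlation coefficient QUASAR computes, so the sketch below assumes Spearman's rank correlation (the Pearson correlation of the ranks, with tied values sharing their average rank); all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative rank-correlation benchmark; the coefficient actually used by QUASAR may differ. */
class RankCorrelationSketch {

    /** Returns 1-based ranks; tied values receive the average of their positions. */
    static double[] ranks(double[] values) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double[] ranks = new double[values.length];
        int start = 0;
        while (start < order.length) {
            int end = start;
            while (end + 1 < order.length && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            double averageRank = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = averageRank;
            }
            start = end + 1;
        }
        return ranks;
    }

    /** Pearson correlation coefficient of two equally long samples. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman correlation between an alignment score and a reference quality score. */
    static double spearman(double[] alignmentScores, double[] referenceScores) {
        return pearson(ranks(alignmentScores), ranks(referenceScores));
    }
}
```

Note that for a reference measure where lower is better, such as RMSD, a well-performing alignment score would show a strongly negative correlation.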



 
Fig. 1 QUASAR reads protein alignments (input layer) and allows the user to evaluate the structural quality of the alignments according to built-in and/or user-programmed (Java) quality measures (ranking module). In addition, it supports benchmarking and optimizing scoring functions, consisting of a (non-)linear combination of weighted scoring schemes, with respect to a set of standard-of-truth (structural) alignments (optimization and benchmark modules).

 
2.4 Optimizing scores
The performance of a scoring function depends heavily on the weights assigned to the individual scoring schemes. QUASAR therefore allows these weights to be optimized with respect to a benchmark set of alignments with assigned or computed standard-of-truth scores (see above). So far, two optimization routines are available: least-squares optimization and a genetic algorithm that explores the space of possible score combinations. The fitness of a combination of scoring scheme weights is evaluated with respect to a benchmark set as described in the previous subsection. Such an optimization may also uncover the main ingredients of an already well-performing score combination by ruling out unnecessary scores.
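To make the least-squares option concrete, the sketch below fits the weights of a linear score combination to standard-of-truth scores by solving the normal equations. It is a generic textbook formulation under stated assumptions, not QUASAR's optimization routine: the class name is hypothetical, the objective is assumed to be the squared difference between combined score and reference score, and the genetic algorithm is not shown.

```java
/** Generic least-squares weight fit via normal equations; not QUASAR's own optimizer. */
class LeastSquaresWeightsSketch {

    /**
     * Finds weights w minimizing sum_i (sum_k w[k] * x[i][k] - y[i])^2.
     * x[i][k] is the score of alignment i under scheme k, y[i] its standard-of-truth score.
     */
    static double[] fit(double[][] x, double[] y) {
        int n = x.length;
        int k = x[0].length;
        // Augmented matrix [X^T X | X^T y].
        double[][] a = new double[k][k + 1];
        for (int r = 0; r < k; r++) {
            for (int c = 0; c < k; c++) {
                for (int i = 0; i < n; i++) {
                    a[r][c] += x[i][r] * x[i][c];
                }
            }
            for (int i = 0; i < n; i++) {
                a[r][k] += x[i][r] * y[i];
            }
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("scoring schemes are linearly dependent");
            }
            for (int r = col + 1; r < k; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= k; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] w = new double[k];
        for (int r = k - 1; r >= 0; r--) {
            double sum = a[r][k];
            for (int c = r + 1; c < k; c++) {
                sum -= a[r][c] * w[c];
            }
            w[r] = sum / a[r][r];
        }
        return w;
    }
}
```

A weight that comes out close to zero suggests that the corresponding scheme contributes little on this benchmark set, which corresponds to the idea of ruling out unnecessary scores.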

2.5 Implementation
QUASAR is implemented entirely in Java (Version 1.4+). It is freely available for academic users as a standalone application and as a Java Web Start application. All scoring schemes, scoring functions, benchmark scores and optimization routines can be configured in an XML-like configuration file that can be generated using the GUI.


    3 USE CASES
 
3.1 Benchmarking and optimization
A first, interactive use case might be as follows: given a new scoring scheme, e.g. a new scoring matrix,

  1. One builds a benchmark set of alignments and loads the data into QUASAR.
  2. In QUASAR, one explores the performance of the new scoring matrix in comparison with and in combination with the built-in scores. The evaluation is done with respect to the standard-of-truth benchmark scores available in QUASAR and with the help of the visualization panel.
  3. One further improves the ranking performance by combining well-performing schemes and optimizing their weights using QUASAR's optimization routines.
  4. Finally, one saves the configuration for future use of QUASAR from the command line.

3.2 Automated alignment ranking
A second, non-interactive use case is the ranking of sequence–structure alignments. Here, one already has an optimized combination of scores and the corresponding QUASAR configuration at hand. Given a set of different sequence–structure alignments for a target (e.g. to different template structures), one includes a call to QUASAR with this configuration file in the structure prediction process and can thus, for example, automatically discard alignments on the basis of the previously optimized alignment score.
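The filtering step of such a pipeline might look like the following sketch, which ranks already scored alignments and discards those below a cutoff. Everything here is hypothetical: the class `ScoredAlignment`, the template identifiers and the threshold are placeholders, and the way a pipeline would actually obtain scores from QUASAR (for example by calling it from the command line with a saved configuration) is not shown because the text does not specify it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative post-processing of alignment scores in a structure prediction pipeline. */
class AlignmentRankingSketch {

    /** Hypothetical container: one target-template alignment with its combined score. */
    static final class ScoredAlignment {
        final String templateId;
        final double score;

        ScoredAlignment(String templateId, double score) {
            this.templateId = templateId;
            this.score = score;
        }
    }

    /** Keeps alignments scoring at least minScore, ordered from best to worst. */
    static List<ScoredAlignment> rankAndFilter(List<ScoredAlignment> alignments, double minScore) {
        List<ScoredAlignment> kept = new ArrayList<>();
        for (ScoredAlignment alignment : alignments) {
            if (alignment.score >= minScore) {
                kept.add(alignment);
            }
        }
        kept.sort(Comparator.comparingDouble((ScoredAlignment a) -> a.score).reversed());
        return kept;
    }
}
```

Only the alignments that survive this step would then be passed on to the more expensive model building and refinement stages.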


    4 CONCLUSION
 
Sequence–structure alignments play an important role in protein structure prediction and analysis. With QUASAR we provide a software package that facilitates alignment scoring, comparison of known with user-defined scoring schemes and optimization of score combinations. It is platform-independent and can be used interactively or from the command line. Future extensions will include new scoring schemes, improved analysis of results and more optimization options. We encourage users to send their own scoring schemes in order to have them included in future releases.


    Acknowledgments
 
This work was funded by the German Research Foundation (DFG) under project grant PROSEQO II (Zi616/2).

Conflict of Interest: none declared.


    FOOTNOTES
 
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Received on August 19, 2005; revised on September 27, 2005; accepted on October 6, 2005

    REFERENCES
 

    Berman, H., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

    Berrera, M., et al. (2003) Amino acid empirical contact energy definitions for fold recognition in the space of contact maps. BMC Bioinformatics, 4, 8.

    Boeckmann, B., et al. (2003) The Swiss-Prot protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.

    Dayhoff, M.O., et al. (1978) A model of evolutionary change in proteins. Atlas Protein Sequence Struct., 5, 345–352.

    Kawashima, S., et al. (1999) AAindex: amino acid index database. Nucleic Acids Res., 27, 368–369.

    Luthy, R., et al. (1991) Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins, 10, 229–239.

    O'Sullivan, O., et al. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics, 19, i215–i221.

    Siew, N., et al. (2000) MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics, 16, 776–785.

    Singer, M.S., et al. (2002) Prediction of protein residue contacts with a PDB-derived likelihood matrix. Protein Eng., 15, 721–725.

    Zemla, A., et al. (1999) A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment. Proteins, 34, 220–223.

    Zhang, Y. and Skolnick, J. (2004) Scoring function for automated assessment of protein structure template quality. Proteins, 57, 702–710.





