Bioinformatics Advance Access published online on August 4, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn414
Model-based prediction of sequence alignment quality
1Biotechnology and Food Research, MTT Agrifood Research Finland, FI-31600 Jokioinen, Finland
2Department of Statistics, FI-20014 University of Turku, Finland
3Department of Mathematics, FI-20014 University of Turku, Finland
4Institute of Medical Technology, FI-33014 University of Tampere, Finland
5Tampere University Hospital, FI-33520 Tampere, Finland
*To whom correspondence should be addressed. Virpi Ahola, E-mail: virpi.ahola{at}mtt.fi
Abstract
Motivation: Multiple sequence alignment (MSA) is an essential prerequisite for many sequence analysis methods and a valuable tool in itself for describing relationships between protein sequences. Since the success of sequence analysis depends heavily on the reliability of the alignments, measures for assessing alignment quality are much needed.
Results: We present a statistical model-based alignment quality score. Unlike other quality scores, it does not require several parallel alignments for the same set of sequences or additional structural information. Our quality score is based on measuring the conservation level of reference alignments in the Homstrad database. Reference sequences were re-aligned with the Mafft, Muscle and Probcons alignment programs, and a sum-of-pairs (SP) score was used to measure the quality of the re-alignments. Statistical modelling of the SP score as a function of conservation level and other alignment characteristics makes it possible to predict the SP score for any global MSA. The predicted SP scores are highly correlated with the correct SP scores when tested on the Homstrad and SABmark databases. The results are comparable to those of MOS and better than those of the NorMD and NiRMSD alignment quality criteria. Furthermore, the predicted SP score is able to detect alignments with badly aligned or unrelated sequences.
Availability: The method is freely available at http://www.mtt.fi/AlignmentQuality/
Contact: virpi.ahola{at}mtt.fi
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Prof. Burkhard Rost
Received on May 20, 2008; revised on July 25, 2008; accepted on August 1, 2008
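
The abstract uses the sum-of-pairs (SP) score to compare a re-alignment against a reference alignment but does not define it. The sketch below assumes the commonly used definition: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names `aligned_pairs` and `sp_score`, the list-of-gapped-strings input format, and the `-` gap character are illustrative assumptions, not the authors' implementation, and the toy sequences are invented for the example.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    A residue is identified by (sequence index, position in the ungapped
    sequence), so pairs are comparable across different alignments of the
    same sequences.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for col in range(len(alignment[0]) if alignment else 0):
        present = []
        for seq_index, row in enumerate(alignment):
            if row[col] != gap:
                present.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        # Every pair of residues in this column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference, gap="-"):
    """Fraction of reference residue pairs reproduced by the test alignment.

    Assumes both alignments contain the same sequences in the same order.
    """
    ref_pairs = aligned_pairs(reference, gap)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test, gap)) / len(ref_pairs)


if __name__ == "__main__":
    # Toy two-sequence example (invented data).
    reference = ["AC-GT", "ACAGT"]
    realigned = ["ACG-T", "ACAGT"]
    print(sp_score(realigned, reference))
```

In this toy case the reference aligns four residue pairs and the re-alignment reproduces three of them. Under the approach described in the abstract, a score of this kind is the quantity the statistical model predicts when no reference alignment is available.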