Bioinformatics Advance Access published online on July 20, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp446
Benchmarking homology detection procedures with low complexity filters
Stockholm Bioinformatics Center, Stockholm University, SE-10691 Stockholm, Sweden
*To whom correspondence should be addressed. Mr. Kristoffer Forslund, E-mail: Kristoffer.Forslund{at}sbc.su.se
Abstract
Background: Low-complexity sequence regions are a common problem when searching for true homologs of a protein query sequence. Several solutions have been suggested, but a detailed comparison of them on challenging data has so far been lacking. A common benchmark for homology detection procedures uses SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low-complexity sequences.
Results: Here we introduce an alternative benchmarking strategy based on Pfam domains and clans in whole-proteome data sets. This gives a realistic level of low-complexity sequences. We used it to evaluate all six built-in BLAST low-complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.
Conclusion: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity compared with no filtering, across the range of test conditions we apply. MSPcrunch lost even less sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods, however, is that the alignments often become truncated.
Availability: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz
Contact: Kristoffer.Forslund{at}sbc.su.se
Supplementary Information: See journal webpage.
Associate Editor: Prof. Dmitrij Frishman
Received on March 16, 2009; revised on July 14, 2009; accepted on July 15, 2009
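As an illustration of the kind of benchmark the Results describe, the following Python sketch labels search hits using Pfam family and clan annotation. It is a minimal sketch under stated assumptions, not the authors' procedure: the rule that a hit counts as a true positive when query and subject share a family or clan, as a false positive when both are annotated but share nothing, and as unknown otherwise is an assumption made for illustration, and the function names and identifiers (`homology_groups`, `classify_hit`, `tally_hits`, `FAM_A`, `CLAN_X`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def homology_groups(families: Iterable[str], family_to_clan: Dict[str, str]) -> Set[str]:
    """Replace each family ID by its clan ID when it belongs to a clan."""
    return {family_to_clan.get(fam, fam) for fam in families}


def classify_hit(query_groups: Set[str], subject_groups: Set[str]) -> str:
    """Label one hit (assumed rule, see lead-in):
    'unknown' if either protein lacks annotation,
    'TP' if the proteins share a family or clan, otherwise 'FP'."""
    if not query_groups or not subject_groups:
        return "unknown"
    return "TP" if query_groups & subject_groups else "FP"


def tally_hits(
    hits: Iterable[Tuple[str, str]],
    protein_families: Dict[str, Set[str]],
    family_to_clan: Dict[str, str],
) -> Counter:
    """Count TP / FP / unknown labels over (query, subject) hit pairs."""
    counts: Counter = Counter()
    for query, subject in hits:
        q = homology_groups(protein_families.get(query, ()), family_to_clan)
        s = homology_groups(protein_families.get(subject, ()), family_to_clan)
        counts[classify_hit(q, s)] += 1
    return counts


if __name__ == "__main__":
    # Toy, made-up annotation: FAM_A and FAM_B share the hypothetical clan CLAN_X.
    families = {"p1": {"FAM_A"}, "p2": {"FAM_B"}, "p3": {"FAM_C"}}
    clans = {"FAM_A": "CLAN_X", "FAM_B": "CLAN_X"}
    print(tally_hits([("p1", "p2"), ("p1", "p3")], families, clans))
```

Running the same tally on hit lists produced with different filter settings would allow the true and false positive counts to be compared across settings, which is the type of comparison the abstract reports.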