Bioinformatics Advance Access published online on July 20, 2009

Bioinformatics, doi:10.1093/bioinformatics/btp446
All versions of this article: 25/19/2500 (most recent); btp446v1
© The Author (2009). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Benchmarking homology detection procedures with low complexity filters

Kristoffer Forslund 1,* and Erik L.L. Sonnhammer 1

1Stockholm Bioinformatics Center, Stockholm University, SE-10691 Stockholm, Sweden

*To whom correspondence should be addressed. Mr. Kristoffer Forslund, E-mail: Kristoffer.Forslund{at}sbc.su.se


Abstract

Background: Low-complexity sequence regions are a common obstacle to finding true homologs of a protein query sequence. Several solutions have been suggested, but a detailed comparison of them on challenging data has so far been lacking. A common benchmark for homology detection procedures uses SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low-complexity sequences.

Results: We introduce an alternative benchmarking strategy based on Pfam domains and clans in whole-proteome data sets, which gives a realistic level of low-complexity sequences. We used it to evaluate all six built-in BLAST low-complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.
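As a rough illustration of how a domain-annotation benchmark of this kind can be scored, the sketch below labels each reported hit by whether the two proteins share an annotated clan. It is not the authors' procedure: the labeling rule (shared clan means true positive, annotated but no shared clan means false positive, unannotated means unknown), the function names `label_pair` and `evaluate`, and the data layout are all assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Hypothetical layout: protein ID -> set of clan IDs annotated on that protein.
Annotation = Dict[str, Set[str]]


def label_pair(a: str, b: str, annotation: Annotation) -> str:
    """Label a hit between proteins a and b (assumed rule, see lead-in)."""
    clans_a = annotation.get(a, set())
    clans_b = annotation.get(b, set())
    if not clans_a or not clans_b:
        return "unknown"  # at least one protein has no annotation
    return "TP" if clans_a & clans_b else "FP"


def evaluate(hits: Iterable[Tuple[str, str]], annotation: Annotation) -> Dict[str, float]:
    """Summarize a list of (query, subject) hits against the annotation."""
    # Count each unordered pair once and ignore self-hits.
    pairs = {frozenset(h) for h in hits if h[0] != h[1]}
    labels = [label_pair(*sorted(p), annotation) for p in pairs]
    tp = labels.count("TP")
    fp = labels.count("FP")

    # Every annotated pair that shares a clan could have been found.
    annotated = sorted(p for p, clans in annotation.items() if clans)
    possible = sum(
        1 for a, b in combinations(annotated, 2) if annotation[a] & annotation[b]
    )
    return {
        "tp": tp,
        "fp": fp,
        "sensitivity": tp / possible if possible else 0.0,
        "fp_fraction": fp / (tp + fp) if (tp + fp) else 0.0,
    }


if __name__ == "__main__":
    # Toy data with made-up protein and clan IDs.
    toy_annotation = {
        "P1": {"CL_A"},
        "P2": {"CL_A"},
        "P3": {"CL_B"},
        "P4": set(),
        "P5": {"CL_A"},
    }
    toy_hits = [("P1", "P2"), ("P2", "P1"), ("P1", "P3"), ("P1", "P4"), ("P1", "P1")]
    print(evaluate(toy_hits, toy_annotation))
```

Running the same evaluation on the hit lists produced under each filter setting would give one sensitivity and one false positive figure per setting, which is the kind of comparison the abstract describes. A real benchmark would also need to decide how to treat multi-domain proteins and hits that fall outside annotated regions, which this sketch ignores.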

Conclusion: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity compared with no filtering, across the range of test conditions we applied. MSPcrunch lost even less sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods, however, is that the alignments often become truncated.

Availability: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz

Contact: Kristoffer.Forslund{at}sbc.su.se

Supplementary Information: See journal webpage.

Associate Editor: Prof. Dmitrij Frishman


Received on March 16, 2009; revised on July 14, 2009; accepted on July 15, 2009
