Bioinformatics Vol. 19 no. 13 2003
Pages 1672-1681
© 2003 Oxford University Press
Comparison of sequence masking algorithms and the detection of biased protein sequence regions
1 Department of Genetics/Inference Group (Cavendish Laboratory), University of Cambridge, Cambridge, UK and 2 Computational Genomics Group, The European Bioinformatics Institute, EMBL Outstation Cambridge CB10 1SD, UK
Received on October 25, 2002; revised on February 7, 2003; accepted on March 4, 2003
Motivation: Separation of protein sequence regions according to their local information complexity and subsequent masking of low complexity regions has greatly enhanced the reliability of function prediction by sequence similarity. Comparisons with alternative methods that focus on compositional sequence bias rather than information complexity measures have shown that removal of compositional bias yields at least as sensitive and much more specific results. Besides the application of sequence masking algorithms to sequence similarity searches, the study of the masked regions themselves is of great interest. Traditionally, however, these have been neglected despite evidence of their functional relevance.
Results: Here we demonstrate that compositional bias seems to be a more effective measure than low complexity for the detection of biologically meaningful signals. Typical results on proteins are compared to results for sequences that have been randomized in various ways, conserving composition and local correlations either for individual proteins or for the entire set. Remarkably, low-complexity regions have the same form of distribution in proteins as in randomized sequences, and the signal from randomized sequences with conserved local correlations and amino acid composition almost matches the signal from proteins. This is not the case for sequence bias, which hence seems to be a genuinely biological phenomenon, in contrast to patches of low complexity.
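The comparison against randomized sequences can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the window length of 12 and the use of windowed Shannon entropy as the complexity measure are assumptions made for illustration only. The shuffle shown conserves amino acid composition of a single sequence; it does not conserve local correlations, which the abstract also mentions.

```python
import math
import random
from collections import Counter


def window_entropy(seq, window=12):
    """Shannon entropy (bits) of the residue composition in each sliding window.

    Hypothetical helper; the window length is an arbitrary illustrative choice.
    """
    entropies = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        h = -sum((c / window) * math.log2(c / window) for c in counts.values())
        entropies.append(h)
    return entropies


def shuffle_conserving_composition(seq, rng):
    """Return a random permutation of seq, so residue counts are unchanged."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    protein = "MKTAYIAKQRQQQQQQQQQQQQSFVKSHFSRQLEERLGLIEVQ"  # made-up sequence
    shuffled = shuffle_conserving_composition(protein, rng)
    # Compare the lowest windowed entropy before and after shuffling.
    print(min(window_entropy(protein)), min(window_entropy(shuffled)))
```

Comparing the distribution of windowed entropies between many real sequences and their shuffled counterparts is the kind of control the abstract describes; a single pair, as printed here, only shows the mechanics.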
Availability: Software in executable form is available on request from the authors.
Supplementary information: An online supplement with additional supporting figures is available at http://www.inference.phy.cam.ac.uk/dpk20/sup/
Contact: kreil{at}ebi.ac.uk
* To whom correspondence should be addressed at: Computational Genomics Group, The European Bioinformatics Institute, EMBL Outstation Cambridge CB10 1SD, UK