Bioinformatics Vol. 18 no. 5 2002
Pages 672-678
© 2002 Oxford University Press
Detecting cryptically simple protein sequences using the SIMPLE algorithm
1 Grup de Recerca en Informatica Biomèdica,
Universitat Pompeu Fabra, Dr. Aiguader 80, 08003 Barcelona, Spain
2 Department of Crystallography, Birkbeck College,
Malet Street, London WC1E 7HX, UK
3 Department of Computer Science, Royal Holloway University of London,
Egham, Surrey TW20 0EX, UK
Received on August 22, 2001; revised on November 15, 2001; accepted on January 6, 2002
Motivation: Low-complexity or cryptically simple sequences are widespread in proteins, but their evolution and function are poorly understood. To date, methods for the detection of low complexity in proteins have been directed towards filtering such regions prior to sequence homology searches, not towards the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function.
Results: We have developed a new tool, based on the SIMPLE algorithm, that facilitates the quantification of the amount of simple sequence in proteins and determines the type of short motifs that show clustering above a certain threshold. By modifying the sensitivity of the program, simple sequence content can be studied at various levels, from highly organised tandem structures to complex combinations of repeats. We compare the relative amount of simplicity in different functional groups of yeast proteins and determine the level of clustering of the different amino acids in these proteins.
Availability: The program is available on request or online at http://www.biochem.ucl.ac.uk/bsm/SIMPLE
Contact: j.hancock{at}cs.rhul.ac.uk
* To whom correspondence should be addressed.
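Illustration: The idea of motifs that "show clustering above a certain threshold" can be made concrete with a minimal sketch. This is not the SIMPLE algorithm itself, whose scoring scheme is not given in this abstract; it is a simplified window-based count under stated assumptions. The function names, the motif length `k`, the `window` size, the `threshold` value and the example sequence are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def cluster_scores(seq, k=1, window=10):
    """Score each position by how many times its k-residue motif
    recurs within +/- `window` start positions (assumed scoring, not SIMPLE's)."""
    n = len(seq) - k + 1  # number of valid motif start positions
    scores = []
    for i in range(n):
        motif = seq[i:i + k]
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        # Count other occurrences of the same motif inside the window.
        hits = sum(1 for j in range(lo, hi) if j != i and seq[j:j + k] == motif)
        scores.append(hits)
    return scores


def clustered_motifs(seq, k=1, window=10, threshold=4):
    """Return motifs whose best positional score reaches `threshold`."""
    best = defaultdict(int)
    for i, score in enumerate(cluster_scores(seq, k, window)):
        motif = seq[i:i + k]
        best[motif] = max(best[motif], score)
    return {m: s for m, s in best.items() if s >= threshold}


# Hypothetical sequence containing a polyglutamine run.
example = "MKTAYQQQQQQQQLLE"
print(clustered_motifs(example, k=1, window=10, threshold=4))
```

In this sketch, lowering `threshold` or widening `window` plays the role of raising sensitivity: more dispersed repeats are reported alongside strict tandem runs. A real analysis would also need a background expectation (for example, scores from shuffled sequences) before calling a cluster significant.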