Bioinformatics Vol. 16 no. 10 2000
Pages 915-922
© 2000 Oxford University Press
Original Paper
CAST: an iterative algorithm for the complexity analysis of sequence tracts
1 Department of Cell Biology and Biophysics,
Faculty of Biology, University of Athens, Athens GR-15701, Greece
2 Computational Genomics Group
3 SRS Team, Research Programme, The European
Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10
1SD, UK
4 Millennium Pharmaceuticals Inc., 640
Memorial Drive, Cambridge, MA 02139, USA
Received on February 8, 2000; revised on April 11, 2000; accepted on April 18, 2000
Motivation: Sensitive detection and masking of low-complexity regions in protein sequences. Filtered sequences can be used in sequence comparison without the risk of matching compositionally biased regions. The main advantage of the method over similar approaches is the selective masking of single residue types without affecting other, possibly important, regions.
Results: A novel algorithm for low-complexity region detection and selective masking. The algorithm is based on multiple-pass Smith-Waterman comparison of the query sequence against twenty homopolymers with infinite gap penalties. The output of the algorithm comprises both the masked query sequence for further analysis, e.g. database searches, and the detected regions of low complexity. The detection of low-complexity regions is highly specific for single residue types. It is shown that this approach is sufficient for masking database query sequences without generating false positives. The algorithm is benchmarked against widely available algorithms using the 210 genes of Plasmodium falciparum chromosome 2, a dataset known to contain a large number of low-complexity regions.
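The multiple-pass comparison described above can be illustrated with a minimal sketch. It is not the CAST implementation. With infinite gap penalties, a local alignment against a sufficiently long homopolymer reduces to finding the highest-scoring contiguous segment of the query for that residue type, so a single linear scan per residue suffices. The scoring values (`match`, `mismatch`), the `threshold`, the choice to process the highest-scoring residue type first, and the function names are all assumptions made for illustration. The abstract does not specify the substitution matrix or the cut-off that CAST uses.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty homopolymer residue types


def best_segment(scores):
    """Ungapped local alignment score: best contiguous run of per-position scores.

    Returns (score, start, end) with end exclusive; score is 0 if nothing is positive.
    """
    best, best_start, best_end = 0, 0, 0
    running, run_start = 0, 0
    for i, s in enumerate(scores):
        if running == 0:
            run_start = i                   # a new local alignment may start here
        running = max(0, running + s)       # Smith-Waterman style reset at zero
        if running > best:
            best, best_start, best_end = running, run_start, i + 1
    return best, best_start, best_end


def cast_like_mask(seq, match=2, mismatch=-1, threshold=10, mask_char="X"):
    """Iteratively mask single-residue-type biased regions (toy scoring, assumed values)."""
    assert threshold > 0, "a positive threshold guarantees termination"
    residues = list(seq.upper())
    regions = []
    while True:
        top = None
        # One pass: compare the current sequence against each homopolymer.
        for aa in AMINO_ACIDS:
            scores = [match if c == aa else mismatch for c in residues]
            score, start, end = best_segment(scores)
            if score >= threshold and (top is None or score > top[0]):
                top = (score, aa, start, end)
        if top is None:
            break                            # no residue type exceeds the threshold
        score, aa, start, end = top
        regions.append((aa, start, end, score))
        # Selective masking: only the biased residue type is replaced.
        for i in range(start, end):
            if residues[i] == aa:
                residues[i] = mask_char
    return "".join(residues), regions


masked, regions = cast_like_mask("MKTNNNNKNNNNLVEAG")
print(masked)   # masked query sequence
print(regions)  # (residue type, start, end, score) for each detected region
```

In this sketch the lysine embedded in the asparagine-rich tract is left unmasked, which mirrors the selective single-residue masking that the abstract describes. Each pass masks at least one residue, so the loop always terminates.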
Availability: CAST (version 1.0) executable binaries are available to academic users free of charge under license. Web site entry point, server and additional material: http://www.ebi.ac.uk/research/cgg/services/cast/
Contact: ouzounis{at}ebi.ac.uk
To whom correspondence should be addressed.