Bioinformatics Vol. 16 no. 10 2000
Pages 915-922
© 2000 Oxford University Press


Original Paper

CAST: an iterative algorithm for the complexity analysis of sequence tracts

Vasilis J. Promponas 1, Anton J. Enright 2, Sophia Tsoka 2, David P. Kreil 3, Christophe Leroy 4, Stavros Hamodrakas 1, Chris Sander 4 and Christos A. Ouzounis 2

1 Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens GR-15701, Greece
2 Computational Genomics Group and 3 SRS Team, Research Programme, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK
4 Millennium Pharmaceuticals Inc., 640 Memorial Drive, Cambridge, MA 02139, USA

Received on February 8, 2000; revised on April 11, 2000; accepted on April 18, 2000

Motivation: Sensitive detection and masking of low-complexity regions in protein sequences. Filtered sequences can be used in sequence comparison without the risk of matching compositionally biased regions. The main advantage of the method over similar approaches is the selective masking of single residue types without affecting other, possibly important, regions.

Results: A novel algorithm for low-complexity region detection and selective masking. The algorithm is based on multiple-pass Smith–Waterman comparison of the query sequence against twenty homopolymers with infinite gap penalties. The algorithm outputs both the masked query sequence, for use in further analysis such as database searches, and the detected regions of low complexity. The detection of low-complexity regions is highly specific for single residue types. It is shown that this approach is sufficient for masking database query sequences without generating false positives. The algorithm is benchmarked against widely available algorithms using the 210 genes of Plasmodium falciparum chromosome 2, a dataset known to contain a large number of low-complexity regions.
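The multiple-pass idea can be illustrated with a minimal Python sketch. It is not the CAST implementation. With infinite gap penalties, a local alignment against a sufficiently long homopolymer is ungapped, so the best alignment for a given residue type reduces to a maximum-scoring contiguous segment of the query. The scoring function `toy_score` (+2 match, -1 otherwise), the threshold of 10 and all function names below are assumptions made for illustration only; the abstract does not specify the scoring scheme or the threshold.

```python
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one homopolymer per residue type


def toy_score(query_residue: str, homopolymer_residue: str) -> int:
    """Hypothetical scoring: +2 for a match, -1 otherwise (masked 'X' never matches)."""
    return 2 if query_residue == homopolymer_residue else -1


def best_segment(seq: Sequence[str], residue: str) -> Tuple[int, int, int]:
    """Best ungapped local alignment of seq against a homopolymer of `residue`.

    Returns (score, start, end) with `end` exclusive, found by a
    maximum-sum contiguous segment scan.
    """
    best = (0, 0, 0)
    running, start = 0, 0
    for i, ch in enumerate(seq):
        if running <= 0:  # restart the segment when the running score is not positive
            running, start = 0, i
        running += toy_score(ch, residue)
        if running > best[0]:
            best = (running, start, i + 1)
    return best


def mask_low_complexity(seq: str, threshold: int = 10) -> Tuple[str, List[Tuple[str, int, int, int]]]:
    """Iteratively mask the top-scoring residue type until no score reaches threshold.

    `threshold` must be positive. Only residues of the detected type are
    replaced by 'X'; other residues inside the region are left untouched.
    """
    masked = list(seq.upper())
    regions = []
    while True:
        candidates = [(best_segment(masked, aa), aa) for aa in AMINO_ACIDS]
        (score, start, end), residue = max(candidates, key=lambda c: c[0][0])
        if score < threshold:
            break
        for i in range(start, end):
            if masked[i] == residue:
                masked[i] = "X"
        regions.append((residue, start, end, score))
    return "".join(masked), regions


if __name__ == "__main__":
    result, found = mask_low_complexity("MKTQQQQAQQQQQLVE")
    print(result)  # MKTXXXXAXXXXXLVE
    print(found)   # [('Q', 3, 13, 17)]
```

In this toy run the alanine inside the glutamine-rich tract survives masking, which mirrors the selective, single-residue-type masking described above. A realistic version would presumably use a substitution matrix rather than match/mismatch scores.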

Availability: CAST (version 1.0) executable binaries are available to academic users free of charge under license. Web site entry point, server and additional material: http://www.ebi.ac.uk/research/cgg/services/cast/

Contact: ouzounis{at}ebi.ac.uk

To whom correspondence should be addressed.




