Bioinformatics 20(3) © Oxford University Press 2004; all rights reserved.

Pseudo-periodic partitions of biological sequences

Lugang Li 1,2,†, Renchao Jin 2,3,4,†, Poh-Lin Kok 2 and Honghui Wan 2,5,6,*

1 Department of Protective Medicine, Nanjing Army Medical College, The Second Army Medical University, Nanjing, Jiangsu 210099, China, 2 Laboratory of Bioinformatics, Maryland Institute of Dynamic Genomics, 3910 Jeffry Street, Silver Spring, MD 20906, USA, 3 Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19139, USA, 4 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China, 5 Global Bioinformatics Laboratory, National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA and 6 National Center for Toxicogenomics, National Institute of Environmental Health Sciences, National Institutes of Health, P.O. Box 12233, Mail Drop F1-05, 111 T. W. Alexander Drive, Research Triangle Park, NC 27709, USA

Received on February 2, 2003; revised on August 3, 2003; accepted on August 5, 2003

Motivation: Algorithms for finding typical patterns in sequences, especially multiple pseudo-repeats (pseudo-periodic regions), are at the core of many problems in biological sequence and structure analysis. One of the most significant features of biological sequences is their high quasi-repetitiveness, and variation in the quasi-repetitiveness of genomic and proteomic texts reflects the presence and density of different kinds of biologically important information. Sensitive, automatic computational methods for identifying pseudo-periodic regions are therefore needed: from such regions we can infer, describe and understand biological properties, and seek precise molecular details of biological structures, dynamics, interactions and evolution.

Results: We develop a novel computational tool for partitioning a sequence into pseudo-periodic regions. A pseudo-periodic partition is defined as a partition that, intuitively, has minimal bias relative to some perfect-periodic partition of the sequence, as measured by evolutionary distance. We devise a quadratic time and space algorithm for detecting a pseudo-periodic partition of a given sequence; the partition corresponds to the shortest path in the main diagonal of the directed (acyclic) weighted graph constructed by the Smith–Waterman self-alignment of the sequence. We use several typical examples to demonstrate how our algorithm and software system detect functional or structural domains and regions of proteins. A major advantage of our software is that it has a single parameter, the granularity factor, and any biological sequence family can be chosen as a training set to determine the best value of that parameter. In general, we choose all repeats (including many pseudo-repeats) in the SWISS-PROT amino acid sequence database as a typical training set. With this training set, we show that the granularity factor is 0.52 and that the average agreement accuracy of the pseudo-periodic partitions detected by our software for all pseudo-repeats in the SWISS-PROT database is as high as 97.6%.
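To make the self-alignment ingredient concrete, the following is a minimal sketch of a Smith–Waterman local alignment of a sequence against itself, restricted to the upper triangle so that the trivial identity alignment is excluded. It is not the authors' partitioning algorithm: the shortest-path step and the granularity factor are not reproduced, and the function names and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen purely for illustration. Like the algorithm described above, it uses quadratic time and space.

```python
def self_alignment_matrix(seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment scores of seq against itself.

    Only cells with j > i are filled, so the trivial alignment of the
    sequence with itself on the main diagonal never contributes.
    The scoring values are illustrative assumptions, not the paper's.
    """
    n = len(seq)
    H = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(
                0,                     # local alignment: restart at zero
                H[i - 1][j - 1] + s,   # align seq[i-1] with seq[j-1]
                H[i - 1][j] + gap,     # gap in the second copy
                H[i][j - 1] + gap,     # gap in the first copy
            )
    return H


def best_offset(seq, **scores):
    """Return (score, offset) of the best off-diagonal local alignment.

    The offset j - i of the best-scoring cell is a crude hint at the
    repeat period; (0, 0) means no positive-scoring self-match exists.
    """
    H = self_alignment_matrix(seq, **scores)
    n = len(seq)
    best = (0, 0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if H[i][j] > best[0]:
                best = (H[i][j], j - i)
    return best
```

For the perfectly periodic toy string `"ABCABCABC"`, `best_offset` returns `(12, 3)` under these assumed scores: six matches along the diagonal at offset 3. A pseudo-repeat with substitutions or indels would score lower on the same diagonal or drift to neighbouring ones, which is the kind of deviation a pseudo-periodic partition has to tolerate.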

Availability: The program is available upon request from Honghui Wan and will also be available at http://www.mindgen.org

Contact: hwan@mindgen.org

* To whom correspondence should be addressed.

† The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.






