Skip Navigation


Bioinformatics Advance Access originally published online on December 18, 2007
Bioinformatics 2008 24(4):468-476; doi:10.1093/bioinformatics/btm613
This Article
Right arrow Full Text Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
24/4/468    most recent
btm613v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Li, X.
Right arrow Articles by Settles, A. M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Li, X.
Right arrow Articles by Settles, A. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

A novel genome-scale repeat finder geared towards transposons

Xuehui Li 1,*, Tamer Kahveci 1 and A. Mark Settles 2

1CISE Department and 2Plant Molecular and Cellular Biology Program and Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA

*To whom correspondence should be addressed.


   Abstract

Motivation: Repeats are ubiquitous in genomes and play important roles in evolution. Transposable elements are a common kind of repeat. Transposon insertions can be nested and make the task of identifying repeats difficult.

Results: We develop a novel iterative algorithm, called Greedier, to find repeats in a target genome given a repeat library. Greedier distinguishes itself from existing methods by taking into account the fragmentation of repeats. Each iteration consists of two passes. In the first pass, it identifies the local similarities between the repeat library and the target genome. Greedier then builds graphs from this comparison output. In each graph, a vertex denotes a similar subsequence pair. Edges denote pairs of subsequences that can be connected to form higher similarities. In the second pass, Greedier traverses these graphs greedily to find matches to individual repeat units in the repeat library. It computes a fitness value for each such match denoting the similarity of that match. Matches with fitness values greater than a cutoff are removed, and the rest of the genome is stitched together. The similarity cutoff is then gradually reduced, and the iteration is repeated until no hits are returned from the comparison. Our experiments on the Arabidopsis and rice genomes show that Greedier identifies approximately twice as many transposon bases as those found by cross_match and WindowMasker. Moreover, Greedier masks far fewer false positive bases than either cross_match or WindowMasker. In addition to masking repeats, Greedier also reports potential nested transposon structures.

Contact: xli{at}cise.ufl.edu

Associate Editor: Alfonso Valencia


Received on March 15, 2007; revised on December 10, 2007; accepted on December 10, 2007

Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer: Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.