De novo identification of repeat families in large genomes
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093-0114, USA
*To whom correspondence should be addressed.
Every time we compare two species that are closer to each other than either is to humans, we get nearly killed by unmasked repeats. Webb Miller (personal communication)
Motivation: De novo repeat family identification is a challenging algorithmic problem of great practical importance. As the number of genome sequencing projects increases, there is a pressing need to identify the repeat families present in large, newly sequenced genomes. We develop a new method for de novo identification of repeat families via extension of consensus seeds; our method enables a rigorous definition of repeat boundaries, a key issue in repeat analysis.
Results: Our RepeatScout algorithm is more sensitive and is orders of magnitude faster than RECON, the dominant tool for de novo repeat family identification in newly sequenced genomes. Using RepeatScout, we estimate that 2% of the human genome and 4% of mouse and rat genomes consist of previously unannotated repetitive sequence.
Availability: Source code is available for download at http://www-cse.ucsd.edu/groups/bioinformatics/software.html
Contact: ppevzner{at}cs.ucsd.edu
Received on January 15, 2005; accepted on March 27, 2005