Skip Navigation


Bioinformatics Advance Access originally published online on February 18, 2007
Bioinformatics 2007 23(8):917-925; doi:10.1093/bioinformatics/btm048
This Article
Right arrow Full Text Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/8/917    most recent
btm048v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Hou, M.
Right arrow Articles by Harris, R. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Hou, M.
Right arrow Articles by Harris, R. S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

HomologMiner: looking for homologous genomic groups in whole genomes

Minmei Hou *, Piotr Berman , Chih-Hao Hsu and Robert S. Harris

Department of Computer Science & Engineering, Penn State University, PA, USA

*To whom correspondence should be addressed.


   Abstract

Motivation: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain.

Results: We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families.

Availability: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab

Contact: mhou{at}cse.psu.edu

Associate Editor: Limsoon Wong


Received on November 21, 2006; revised on January 22, 2007; accepted on February 6, 2007

Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.