Bioinformatics Advance Access originally published online on February 4, 2005
Bioinformatics 2005 21(9):1789-1796; doi:10.1093/bioinformatics/bti307
A comparative analysis of relative occurrence of transcription factor binding sites in vertebrate genomes and gene promoter areas
1Center for Biomedical Genomics and BioInformatics, Molecular and Microbiology Department, College of Arts and Sciences, George Mason University, Fairfax, VA 22031, USA
2Vavilov Institute of General Genetics, Gubkina str. 3, GSP-1, 111991, Moscow, Russia
3Russian Center of Haematology, Moscow 125167, Russia
*To whom correspondence should be addressed.
Motivation: The detection of transcription factor binding sites (TFBS) in genomic sequences is a basic task for elucidating the transcriptional aspects of gene regulation. Procedures for evaluating TFBS prediction outputs need improvement. Predicted TFBS located outside transcription-associated regions are often neglected from both the functional and the evolutionary points of view and therefore deserve a systematic overview.
Results: We calculated theoretical occurrences of 184 TFBS according to their position weight matrices and the dinucleotide statistics of the completed vertebrate genomes, then performed a TFBS prediction in the corresponding complete genomic sequences and their repeat-free, repetitive and regulatory fractions. Repeat-free fractions of the closely related mammalian genomes were characterized by strong similarities in TFBS occurrences. A significant over-representation of multiple TFBS was found in both repetitive and non-repetitive genome fractions.
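To illustrate what a theoretical occurrence derived from a position weight matrix and dinucleotide statistics can look like, the following minimal Python sketch estimates the expected number of matrix hits in a random sequence. It is not the authors' procedure: it assumes a first-order Markov background built from dinucleotide frequencies, a fixed score threshold, a single strand and exhaustive enumeration of all k-mers (practical only for short matrices). The names `expected_hits`, `mono` and `trans`, and the toy matrix, are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def markov_kmer_prob(kmer, mono, trans):
    """Probability of a k-mer under a first-order Markov background.

    mono[b]     : probability of base b at the first position
    trans[a][b] : probability of base b following base a
                  (assumed to be derived from dinucleotide counts)
    """
    p = mono[kmer[0]]
    for prev, cur in zip(kmer, kmer[1:]):
        p *= trans[prev][cur]
    return p


def pwm_score(kmer, pwm):
    """Sum of per-position matrix scores for one k-mer."""
    return sum(column[base] for column, base in zip(pwm, kmer))


def expected_hits(pwm, threshold, mono, trans, sequence_length):
    """Expected number of single-strand matches scoring >= threshold."""
    k = len(pwm)
    # Total background probability of all k-mers that pass the threshold.
    p_hit = sum(
        markov_kmer_prob(kmer, mono, trans)
        for kmer in product(BASES, repeat=k)
        if pwm_score(kmer, pwm) >= threshold
    )
    # Number of start positions where a k-mer fits into the sequence.
    return p_hit * (sequence_length - k + 1)


# Toy matrix (hypothetical): only "AAA" reaches a score of 3.
toy_pwm = [{"A": 1, "C": 0, "G": 0, "T": 0}] * 3
uniform_mono = {b: 0.25 for b in BASES}
uniform_trans = {a: {b: 0.25 for b in BASES} for a in BASES}

print(expected_hits(toy_pwm, 3, uniform_mono, uniform_trans, 66))
```

Comparing such an expected count with the number of sites actually predicted in a genome fraction is one way to express over- or under-representation. The exact definition of the F-values mentioned under Availability is not given in this abstract and is not reproduced here.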
Availability: F-values and real TFBS occurrences calculated for human, chimp, mouse, rat, zebrafish and fugu genomes are available for free download at http://www.gmu.edu/departments/mmb/baranova/pages/bioinformatics
Contact: abaranov{at}gmu.edu