Bioinformatics Advance Access originally published online on February 26, 2008
Bioinformatics 2008 24(7):1024-1025; doi:10.1093/bioinformatics/btn058
FTFD: an informatics pipeline supporting phylogenomic analysis of fungal transcription factors
1Fungal Bioinformatics Laboratory, 2Department of Agricultural Biotechnology, 3Center for Fungal Genetic Resource, 4Center for Agricultural Biomaterials, Seoul National University, Seoul 151-921, Korea and 5Pennsylvania State University, University Park, PA 16802, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Genomes of more than 60 fungal species have been sequenced to date, yet no systematic, kingdom-wide approach to analyzing fungal transcription factors (TFs) has been available. We developed a standardized pipeline for annotating TFs in fungal genomes. The resulting data have been archived in a new database, the Fungal Transcription Factor Database (FTFD). In FTFD, 31 832 putative fungal TFs, identified from 62 fungal and 3 Oomycete species, were classified into 61 families and phylogenetically analyzed. The FTFD will serve as a community resource supporting comparative analyses of the distribution and domain structure of TFs within and across species.
Availability: All data described in this study can be browsed through the FTFD web site at http://ftfd.snu.ac.kr/.
Contact: yonglee{at}snu.ac.kr
| 1 INTRODUCTION |
|---|
Regulation of gene expression at the transcriptional level plays an essential role in modulating diverse biological processes. The rapid increase in the number of completely sequenced genomes offers a great opportunity for studying the function and evolution of transcription factors (TFs) at multiple phylogenetic levels (Park et al., 2006). A large number of hypothetical TFs have been identified in various sequenced organisms and cataloged in several databases (e.g. Kummerfeld and Teichmann, 2006; Wu et al., 2007). Such databases, in combination with computational and high-throughput experimental approaches, provide a foundation for analyzing the distribution, function and possible evolutionary histories of TFs. Although genomes of more than 60 fungal species have been publicly released, and more than 20 further fungal species are currently being sequenced (Park et al., 2008), to our knowledge no informatics platform exists that supports the identification, classification and comparison of fungal TFs. To address this deficiency, we developed the Fungal Transcription Factor Database (FTFD; http://ftfd.snu.ac.kr). All fungal genome databases that were used to construct the FTFD are available in the data warehouse of the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr; Park et al., 2008). The FTFD currently archives TFs from 62 fungal and 3 Oomycete species and also provides kingdom-wide phylogenetic frameworks for different TF families, predicted subcellular localization information and key references.
| 2 PIPELINE FOR IDENTIFYING FUNGAL TFS |
|---|
To systematically identify and catalog TFs in individual genomes, standardized fungal genome databases and criteria for classifying TFs were established. The FTFD pipeline is composed of four steps. In the first step, proteins carrying DNA-binding domains were collected from the CFGP data warehouse (Park et al., 2008) and classified as Candidate TFs using 83 different InterPro terms that have been used to describe fungal TFs in the literature. In the second step, to remove false positives, the candidate TFs were filtered using two criteria: the minimum length for each DNA-binding domain and the minimum number of motifs. Detailed information on these criteria is provided on the FTFD web site. Those that passed this step were classified as Putative TFs. In the third step, known TFs that had not been properly annotated in the genome databases were added manually, based on published results, and tagged as Characterized TFs. In the fourth step, all putative TFs and characterized TFs were analyzed phylogenetically. Results of this analysis are presented as phylogenetic trees on the FTFD web site using a newly developed program called the Phyloviewer (http://www.phyloviewer.org/; Park et al., unpublished). Predicted subcellular localization information and key references for individual families and TFs are also presented on the FTFD web site.
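The first three steps can be illustrated with a minimal sketch. It is not the FTFD implementation: the InterPro identifiers, the thresholds, the data layout and the function names are hypothetical placeholders, and "number of motifs" is assumed here to mean the number of sufficiently long hits of the same domain type. The actual per-domain criteria are those listed on the FTFD web site.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class DomainHit:
    interpro_id: str  # InterPro term assigned to this hit
    start: int        # 1-based start position in the protein
    end: int          # inclusive end position

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Hypothetical criteria: term -> (minimum domain length, minimum number of motifs).
# The identifiers and numbers are placeholders, not values taken from FTFD.
CRITERIA: Dict[str, Tuple[int, int]] = {
    "IPR_EXAMPLE_A": (30, 1),
    "IPR_EXAMPLE_B": (20, 2),
}


def classify(hits: List[DomainHit]) -> str:
    """Label one protein as 'none', 'candidate' or 'putative'."""
    tf_hits = [h for h in hits if h.interpro_id in CRITERIA]
    if not tf_hits:
        return "none"  # step 1: no TF-associated InterPro term
    for term, (min_len, min_motifs) in CRITERIA.items():
        long_enough = [h for h in tf_hits
                       if h.interpro_id == term and h.length >= min_len]
        if len(long_enough) >= min_motifs:
            return "putative"  # step 2: both criteria met for this domain type
    return "candidate"  # carries a TF term but fails the filter


def run_pipeline(proteins: Dict[str, List[DomainHit]],
                 curated_ids: Iterable[str]) -> Dict[str, str]:
    """Apply steps 1-2 to every protein, then add curated TFs (step 3)."""
    labels = {pid: classify(hits) for pid, hits in proteins.items()}
    for pid in curated_ids:
        labels[pid] = "characterized"  # literature-based manual addition
    return labels


proteins = {
    "P1": [DomainHit("IPR_EXAMPLE_A", 10, 60)],  # long enough, one motif suffices
    "P2": [DomainHit("IPR_EXAMPLE_B", 5, 30)],   # long enough, but two motifs needed
    "P3": [DomainHit("IPR_OTHER", 1, 100)],      # not a TF-associated term
    "P4": [DomainHit("IPR_EXAMPLE_A", 1, 10)],   # domain too short
}
labels = run_pipeline(proteins, curated_ids={"P5"})
print(labels)
```

In this sketch a protein is promoted to a putative TF as soon as one domain type satisfies both thresholds; whether FTFD combines the two criteria in exactly this way is an assumption.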
| 3 ANALYSIS OF THE PHYLOGENETIC RELATIONSHIPS AND THE DISTRIBUTION OF DNA-BINDING DOMAINS AMONG FUNGAL TFS |
|---|
From 62 fungal and 3 Oomycete species, 31 832 putative TFs were identified and classified into 61 families. The proportion of TFs among all proteins encoded by each genome ranged from 2.29% (Pneumocystis carinii, subphylum Taphrinomycotina) to 7.15% (Rhizopus oryzae, subphylum Mucoromycotina). In comparison, the proportions in human and mouse are 9.8% and 9.4%, respectively (Kummerfeld and Teichmann, 2006). At the phylum level, the average proportion of TFs ranges from 3.12% to 6.31%, with the subphylum Mucoromycotina having the highest proportion.
To investigate the distribution patterns of the 61 TF families across the analyzed species and potential associations between families, the proportion of each TF family relative to the total number of proteins was calculated for each species. The resulting data are presented as the TF Matrix, a grid of colored tiles that indicates the relative proportion of each family across species in a manner analogous to microarray data (i.e. indicating whether the proportion of a particular family in each species is below or above the average proportion across all species). Hierarchical cluster analysis of the TF Matrix data revealed seven clusters. These clusters exhibited phylum-, subphylum- or fungal-specific patterns, which are presented on the FTFD web site. One cluster includes five fungal-specific TF families.
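The calculation behind such a matrix can be sketched as follows. This is an illustration under stated assumptions rather than the FTFD code: the species names, family names and counts are invented, the function name is hypothetical, and the subsequent hierarchical clustering step is not shown.

```python
from typing import Dict, Tuple


def tf_matrix(family_counts: Dict[str, Dict[str, int]],
              total_proteins: Dict[str, int]) -> Dict[Tuple[str, str], str]:
    """Label each (species, family) tile relative to the cross-species mean."""
    species = sorted(family_counts)
    families = sorted({fam for counts in family_counts.values() for fam in counts})
    tiles: Dict[Tuple[str, str], str] = {}
    for fam in families:
        # Proportion of this family among all proteins of each genome.
        props = {sp: family_counts[sp].get(fam, 0) / total_proteins[sp]
                 for sp in species}
        mean = sum(props.values()) / len(props)
        for sp in species:
            if props[sp] > mean:
                tiles[(sp, fam)] = "above"
            elif props[sp] < mean:
                tiles[(sp, fam)] = "below"
            else:
                tiles[(sp, fam)] = "average"
    return tiles


# Invented counts, for illustration only.
counts = {
    "species_A": {"family_X": 50, "family_Y": 10},
    "species_B": {"family_X": 20},
}
totals = {"species_A": 1000, "species_B": 2000}
tiles = tf_matrix(counts, totals)
for key in sorted(tiles):
    print(key, tiles[key])
```

Normalizing by the total number of proteins, as the text describes, keeps large genomes from dominating the comparison; a family absent from a genome is treated here as a proportion of zero.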
Certain TFs carry more than one type of DNA-binding domain, suggesting that they interact with multiple discrete regulatory sequences (Darling et al., 1998). We performed a kingdom-wide analysis of the co-presence of different DNA-binding domains, and the resulting data are presented as a diagram in a section named TF family map. On the map, DNA-binding domains that are present in the same TF(s) are connected by a line whose thickness represents the relative number of TFs having that particular combination of DNA-binding domains. This diagram provides a basis for understanding the evolutionary history of fungal TFs in terms of domain shuffling (Riechmann et al., 2000).
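A co-presence map of this kind amounts to counting, for every pair of domain types, the TFs that carry both. The sketch below shows one way to do this; the TF identifiers and domain assignments are invented for illustration, and the function names and the linear scaling of line width are assumptions rather than details taken from FTFD.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def domain_cooccurrence(tf_domains: Dict[str, List[str]]) -> Counter:
    """Count TFs that carry each unordered pair of distinct domain types."""
    edges: Counter = Counter()
    for domains in tf_domains.values():
        # set() so that repeated copies of a domain in one TF count once.
        for pair in combinations(sorted(set(domains)), 2):
            edges[pair] += 1
    return edges


def edge_widths(edges: Counter, max_width: float = 5.0) -> Dict[Tuple[str, str], float]:
    """Scale counts linearly so the most frequent pair gets max_width."""
    if not edges:
        return {}
    top = max(edges.values())
    return {pair: max_width * n / top for pair, n in edges.items()}


# Invented assignments, for illustration only.
tf_domains = {
    "TF1": ["Zn2Cys6", "bZIP"],
    "TF2": ["bZIP", "Zn2Cys6", "Zn2Cys6"],
    "TF3": ["GATA"],
    "TF4": ["GATA", "bZIP"],
}
edges = domain_cooccurrence(tf_domains)
widths = edge_widths(edges)
print(edges)
print(widths)
```

TFs with a single domain type, such as TF3 here, contribute no edge.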
On the FTFD web site, neighbor-joining phylogenetic trees (bootstrapped with 10 000 repeats) including all TFs are displayed via the Phyloviewer (Park et al., unpublished). These trees show phylogenetic groups defined on the basis of sequence similarity and domain architecture. Relationships within most of the phylogenetic groups agreed with the taxonomic relationships among the species carrying these TFs.
| ACKNOWLEDGEMENTS |
|---|
This research was partially supported by grants from the Crop Functional Genomics Center (CG1141) and the Microbial Genomics and Applications Center (0462-20060021) of the 21st Century Frontier Research Program, funded by the Ministry of Science and Technology, and by a grant from the Biogreen21 Project (20050401034629), funded by the Rural Development Administration, to Y.H.L. J.P. acknowledges a graduate fellowship provided by the Ministry of Education through the Brain Korea 21 Project.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Jonathan Wren
Received on November 18, 2007; revised on January 17, 2008; accepted on February 10, 2008
| REFERENCES |
|---|
Darling DS, et al. (1998) A zinc finger homeodomain transcription factor binds specific thyroid hormone response elements. Mol. Cell Endocrinol., 139, 25–35.
Kummerfeld SK, Teichmann SA. (2006) DBD: a transcription factor prediction database. Nucl. Acids Res., 34, D74–D81.
Park J, et al. (2006) A comparative genome-wide analysis of GATA transcription factors in fungi. Genomics & Informatics, 4, 147–160.
Park J, et al. (2008) CFGP: a web-based, comparative fungal genomics platform. Nucl. Acids Res., 36, D562–D571.
Riechmann JL, et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105–2110.
Wu J, et al. (2007) cTFbase: a database for comparative genomics of transcription factors in cyanobacteria. BMC Genomics, 8, 104.