Bioinformatics Advance Access originally published online on February 24, 2005
Bioinformatics 2005 21(10):2568-2569; doi:10.1093/bioinformatics/bti334
DATF: a database of Arabidopsis transcription factors


1 Center for Bioinformatics, Beijing 100871, People's Republic of China
2 National Laboratory of Protein Engineering and Plant Genetic Engineering, Beijing 100871, People's Republic of China
3 College of Life Sciences, Peking University, Beijing 100871, People's Republic of China
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: We have developed what is probably the most comprehensive database of Arabidopsis transcription factors (DATF). DATF contains known and predicted Arabidopsis transcription factors (1827 genes in 56 families), together with unique information on 1177 cloned sequences and many other features, including 3D structure templates, EST expression information, transcription factor binding sites and nuclear location signals.
Availability: DATF is freely available at http://datf.cbi.pku.edu.cn
Contact: datf{at}mail.cbi.pku.edu.cn
| INTRODUCTION |
|---|
Transcription factors are the key regulators of gene expression and play critical roles in the life cycle of higher plants (Gong et al., 2004). Identification and classification of transcription factors in Arabidopsis thaliana are the first step towards understanding its mechanism of gene expression and regulation. A comprehensive and well-annotated database of Arabidopsis transcription factors may provide a useful resource for plant molecular biologists.
The Riechmann group (Riechmann et al., 2000; Riechmann, 2002), the Sheen lab (http://genetics.mgh.harvard.edu/sheenweb/AraTRs.html), OHIO-ATTFDB (Davuluri et al., 2003), RARTF (http://rarge.gsc.riken.jp/rartf/) and TrSDB (Hermoso et al., 2004) have compiled transcription factors in Arabidopsis and classified them into families. However, these resources either do not classify the factors clearly, do not provide enough annotation, or offer limited browse and search functionality. The TRANSFAC database (Matys et al., 2003) contains more information on transcription factors than the lists above, but the total number of Arabidopsis transcription factors it contains is only 400. Given the importance of Arabidopsis transcription factors, there is a strong need for a database that integrates multiple sources of information to give a comprehensive, genome-wide view of transcription factors in Arabidopsis. This was the goal for the database of Arabidopsis transcription factors (DATF). In particular, DATF provides unique information on experimentally cloned sequences, as well as many other annotations of the Arabidopsis transcription factors.
| COLLECTION OF ARABIDOPSIS TRANSCRIPTION FACTORS |
|---|
We combined automated search and manual curation to generate a collection of Arabidopsis transcription factors as complete as possible. First, we used the gene lists and InterPro domains provided by Riechmann et al. (2000) to perform BLAST (Altschul et al., 1997), HMMER (Eddy, 1998) and Pfscan (Gattiker et al., 2002) searches against the protein sequences in the Arabidopsis genome. Second, we manually checked the above results and compared them with OHIO-ATTFDB (Davuluri et al., 2003) and the list from the Sheen lab. Conflicting cases were individually resolved according to literature and TAIR annotation (http://www.arabidopsis.org/). Third, some families such as bHLH and MADS were updated or added based on recent publications (Bailey et al., 2003; Parenicova et al., 2003; Riechmann, 2002). Finally, we identified 1789 transcription factors in Arabidopsis and classified them into 49 families.
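The combination of automated searches with comparison against external lists can be sketched as a set-merging step. The snippet below is a minimal illustration and not the authors' actual pipeline: the function name `merge_candidates`, the dictionary keys and the AGI locus IDs are all hypothetical, and it assumes that the output of each search tool has already been reduced to a set of locus IDs. Loci found by only one source are returned separately, as candidates for the manual resolution step described above.

```python
def merge_candidates(tool_hits, reference_lists):
    """Split candidate loci into agreed and conflicting sets.

    tool_hits: dict mapping a search tool name to a set of locus IDs.
    reference_lists: dict mapping an external list name to a set of locus IDs.
    """
    # Union of everything any automated search reported.
    predicted = set().union(*tool_hits.values())
    # Union of everything any external list contains.
    listed = set().union(*reference_lists.values())

    agreed = predicted & listed     # supported by both kinds of source
    conflicts = predicted ^ listed  # present in only one kind of source
    return agreed, conflicts


# Hypothetical locus IDs, used only to illustrate the data flow.
tool_hits = {
    "blast": {"AT1G00001", "AT1G00002"},
    "hmmer": {"AT1G00002", "AT1G00003"},
    "pfscan": {"AT1G00003"},
}
reference_lists = {
    "ohio_attfdb": {"AT1G00001", "AT1G00002"},
    "sheen_lab": {"AT1G00002", "AT1G00004"},
}

agreed, conflicts = merge_candidates(tool_hits, reference_lists)
print(sorted(agreed))     # loci accepted without further review
print(sorted(conflicts))  # loci to check against literature and TAIR
```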
| ANALYSIS AND ANNOTATION OF ARABIDOPSIS TRANSCRIPTION FACTORS |
|---|
We aim to provide comprehensive annotations for the transcription factors in DATF. First, DATF includes the unique information as to whether a transcription factor has been cloned by the proteomic investigation of the Arabidopsis transcription factors project (Gong et al., 2004). Among the 1789 genes, 1177 genes had been cloned, 31 of which were shown to be different from previously reported cDNA or predicted sequences (Gong et al., 2004). This information can be browsed and searched in the clone information field in DATF.
Among the 49 families, 4 (AP2/EREBP, GARP, NAC and SBP) have had 3D structures determined for at least one Arabidopsis transcription factor. Another 21 families have had 3D structures determined for at least one transcription factor in other species. Most of the known structures correspond not to the complete transcription factors but to the DNA-binding domains only. We performed a BLAST (Altschul et al., 1997) search of all sequences in DATF against PDBselect (http://homepages.fh-giessen.de/~hg12640/pdbselect/) as well as sequences of new PDB entries from 2004. About half of the sequences have BLAST hits with E-value <0.01, identity >30% and a hit segment longer than 50 amino acids. For each hit, DATF shows the alignment between the Arabidopsis transcription factor and its hit segment, and displays ribbon pictures of both the hit segment and the whole 3D structure of the PDB entry.
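The three cutoffs used to select structure templates can be expressed as a simple filter over BLAST results. The sketch below assumes the hits are available in the standard 12-column tabular BLAST output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the paper does not state which output format was used, and the function name `filter_hits` and the example rows are hypothetical.

```python
import csv
import io

# Cutoffs taken from the text: E-value < 0.01, identity > 30%, length > 50.
MAX_EVALUE = 0.01
MIN_IDENTITY = 30.0
MIN_LENGTH = 50


def filter_hits(handle):
    """Return the tabular BLAST hits that pass all three cutoffs."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        identity = float(row[2])  # column 3: percent identity
        length = int(row[3])      # column 4: alignment length
        evalue = float(row[10])   # column 11: E-value
        if evalue < MAX_EVALUE and identity > MIN_IDENTITY and length > MIN_LENGTH:
            kept.append((query, subject, identity, length, evalue))
    return kept


# Hypothetical hits: only the first row passes every cutoff.
example = io.StringIO(
    "AT1G00001\tpdbA\t45.0\t80\t40\t2\t1\t80\t5\t84\t1e-10\t60.0\n"
    "AT1G00002\tpdbB\t28.0\t120\t80\t3\t1\t120\t1\t120\t1e-05\t45.0\n"
    "AT1G00003\tpdbC\t60.0\t40\t16\t0\t1\t40\t1\t40\t1e-08\t50.0\n"
)
hits = filter_hits(example)
print(hits)
```

The second row is rejected on identity and the third on alignment length, which shows that all three conditions must hold at once.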
DATF integrates many other features of Arabidopsis transcription factors. These include: 62 binding sites for 20 transcription factor families, collected from the literature and TRANSFAC; nuclear location signals in 348 transcription factors, predicted with PredictNLS (Cokol et al., 2000 http://cubic.bioc.columbia.edu/predictNLS/); leucine zipper segments in 115 transcription factors, predicted with LZpred (Bornberg-Bauer et al., 1998 http://2zip.molgen.mpg.de/); functional domains, predicted with InterProScan (Zdobnov and Apweiler, 2001); gene duplication information, collected from TIGR (http://www.tigr.org/tdb/e2k1/ath1/ath1.shtml); EST expression information, collected from UniGene (http://www.ncbi.nlm.nih.gov/UniGene/UGOrg.cgi?TAXID=3702). Each entry in DATF is also hyperlinked with several other resources including TAIR, TIGR, MIPS (http://mips.gsf.de/proj/thal/db/), SIGnAL (http://signal.salk.edu/) and PubMed.
| IMPLEMENTATION AND USER INTERFACE |
|---|
All data and information were stored in a MySQL relational database on a Linux server. Queries to the database were implemented in PHP scripts running in an Apache/PHP environment. Graphics were drawn using the PHP module of the GD graphics library. Three-dimensional structure drawings were created using Molscript (Esnouf, 1997 http://www.avatar.se/molscript/).
DATF is accessible online and allows users to browse by family or chromosome. An introduction to each family is available by clicking on the family name. Users can search DATF by AGI locus ID or by using the advanced search page where they can specify a combination of several search criteria or run BLAST searches against the sequences in DATF. Users are encouraged to submit new data to DATF online. Users can download all the raw data through the DATF website.
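The browse and search functions described above map naturally onto relational queries. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for the MySQL/PHP stack of the real site, and the table name `tf`, its columns and the example rows are hypothetical, since the DATF schema is not given in the text.

```python
import sqlite3

# In-memory database with a hypothetical, much simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tf (
        locus_id   TEXT PRIMARY KEY,  -- AGI locus ID
        family     TEXT NOT NULL,
        chromosome INTEGER NOT NULL,
        cloned     INTEGER NOT NULL   -- 1 if a cloned sequence exists
    )
    """
)
conn.executemany(
    "INSERT INTO tf VALUES (?, ?, ?, ?)",
    [
        ("AT1G00001", "bHLH", 1, 1),
        ("AT2G00002", "MADS", 2, 0),
        ("AT2G00003", "bHLH", 2, 1),
    ],
)


def browse_by_family(conn, family):
    """List the locus IDs of one family, as in a browse-by-family page."""
    cur = conn.execute(
        "SELECT locus_id FROM tf WHERE family = ? ORDER BY locus_id", (family,)
    )
    return [row[0] for row in cur]


def search(conn, family=None, chromosome=None, cloned=None):
    """Combine any subset of criteria, as in an advanced search form."""
    clauses, params = [], []
    # Column names come from this fixed tuple, never from user input.
    for column, value in (
        ("family", family),
        ("chromosome", chromosome),
        ("cloned", cloned),
    ):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " AND ".join(clauses) if clauses else "1"
    cur = conn.execute(
        f"SELECT locus_id FROM tf WHERE {where} ORDER BY locus_id", params
    )
    return [row[0] for row in cur]


print(browse_by_family(conn, "bHLH"))
print(search(conn, family="bHLH", chromosome=2))
```

User-supplied values are passed as bound parameters rather than interpolated into the SQL string, which is the usual precaution for a public search form.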
| DISCUSSION |
|---|
The goal of DATF is to be comprehensive in both the collection of Arabidopsis transcription factors and information about each transcription factor. Three families in DATF (C2H2, LIM and PHD) may have members that contain DNA-binding domains but it is uncertain whether they play a direct role in transcription regulation. Users can apply their own judgment regarding these families.
A survey of the existing databases of Arabidopsis transcription factors shows that the total number of transcription factors included in these databases varies from 1400 to 2000. There are at least two reasons for this seemingly large difference. First, the databases may define transcription factors differently: some include general transcription factors, such as TBP, and chromatin-related proteins, such as SWI/SNF family proteins, whereas others do not. Second, the methods and cutoffs used in predicting transcription factors are often different; some are stricter than others. Depending on how the data will be used, either high sensitivity or high specificity may be more desirable. In DATF, we define transcription factors as proteins that show sequence-specific DNA binding and are capable of activating and/or repressing transcription, excluding general transcription factors and chromatin-related proteins. We combined automated search and manual curation instead of relying on any single method or cutoff, which we believe improves the quality of the database.
The DATF website has been accessed over 1.2 million times between May 2004 when the web site went online and March, 2005 when the manuscript went into print. We have also received user comments and gene submissions from several countries. We will update DATF regularly with new data, new analysis results and user submissions.
| Acknowledgments |
|---|
We thank colleagues of NSFC project (30221120261) headed by Drs Xing Wang Deng and Yuxian Zhu for providing us with the Clone Information, Shuqi Zhao and Dr John Gordon Olyarchuk for helpful assistance and comments, and the anonymous reviewers for the valuable suggestions. This work was supported by the China National Key Basic Research Program (973, 2003CB715900), the China National High-tech Program (863, 2004AA231020) and the Natural Science Foundation of China (NSFC, 30170232, 90408015).
| Footnotes |
|---|
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Received on September 2, 2004; revised on December 25, 2004; accepted on February 16, 2005
| REFERENCES |
|---|
Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Bailey, P.C., et al. (2003) Update on the basic helix-loop-helix transcription factor gene family in Arabidopsis thaliana. Plant Cell, 15, 2497-2502.
Bornberg-Bauer, E., et al. (1998) Computational approaches to identify leucine zippers. Nucleic Acids Res., 26, 2740-2746.
Cokol, M., et al. (2000) Finding nuclear localization signals. EMBO Rep., 1, 411-415.
Davuluri, R.V., et al. (2003) AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics, 4, 25.
Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755-763.
Esnouf, R.M. (1997) An extensively modified version of MolScript that includes greatly enhanced coloring capabilities. J. Mol. Graph. Model., 15, 132-134.
Gattiker, A., et al. (2002) ScanProsite: a reference implementation of a PROSITE scanning tool. Appl. Bioinformatics, 1, 107-108.
Gong, W., et al. (2004) Genome-wide ORFeome cloning and analysis of Arabidopsis transcription factor genes. Plant Physiol., 135, 773-782.
Hermoso, A., et al. (2004) TrSDB: a proteome database of transcription factors. Nucleic Acids Res., 32, D171-D173.
Matys, V., et al. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res., 31, 374-378.
Parenicova, L., et al. (2003) Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell, 15, 1538-1551.
Riechmann, J.L., et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105-2110.
Riechmann, J.L. (2002) Transcriptional regulation: a genomic overview. In Somerville, C.R. and Meyerowitz, E.M. (Eds.), The Arabidopsis Book. American Society of Plant Biologists, Rockville, MD, pp. 1-46.
Zdobnov, E.M. and Apweiler, R. (2001) InterProScan: an integration platform for the signature-recognition methods in InterPro. Bioinformatics, 17, 847-848.