Bioinformatics Advance Access originally published online on May 27, 2004
Bioinformatics 2004 20(16):2799-2811; doi:10.1093/bioinformatics/bth333

Bioinformatics vol. 20 issue 16 © Oxford University Press 2004; all rights reserved.

Regulatory motif finding by logic regression

Sündüz Keles 1,*, Mark J. van der Laan 1 and Chris Vulpe 2

1 Division of Biostatistics and 2 Nutritional Sciences & Toxicology, University of California, Berkeley, CA 94720, USA

Received on November 7, 2003; revised on March 29, 2004; accepted on May 22, 2004
Advance Access Publication May 27, 2004

Motivation: In eukaryotes, multiple transcription factors act together to control the transcription of genes. Although many computational methods address the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider the problem of finding TFBSs and their context-specific interactions using microarray gene expression data. We devise a hybrid approach, called LogicMotif, that combines a TFBS identification method with logic regression, a new regression methodology. LogicMotif has two steps. First, potential binding sites are identified from the transcription control regions of the genes of interest. When these genes can be divided into groups, such as up- and downregulated genes, various available methods can be used in this step. For this step we also develop MFURE, a simple univariate regression and extension method that extracts candidate TFBSs from a large number of genes when microarray gene expression data are available. MFURE provides an alternative for this step when partitioning the genes into disjoint groups is not preferred. The first step thus aims to identify individual sites that are present within gene groups of interest or that are correlated with the gene expression outcome. In the second step, logic regression uses these potential sites to build a predictive model of the outcome of interest (either gene expression or up- versus downregulation). This two-step approach generates a rich and diverse set of potential binding sites in the first step and, in the second, builds regression or classification models with logic regression, which is particularly good at identifying complex interactions.
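The abstract does not spell out MFURE's procedure beyond "univariate regression and extension". The Python sketch that follows is one plausible reading under stated assumptions, not the authors' algorithm: every k-mer is counted in each gene's upstream sequence, each count vector is related to the expression values by simple linear regression, words are ranked by the absolute slope t-statistic, and a top word is then extended greedily one base at a time while |t| improves. The function names (`count_word`, `slope_t_stat`, `screen_words`, `extend_word`), the greedy extension rule and the toy data are all hypothetical.

```python
from itertools import product
from math import copysign, inf, sqrt


def count_word(seq, word):
    """Number of (possibly overlapping) occurrences of word in seq, one strand only."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def slope_t_stat(x, y):
    """t-statistic of the slope b in the least-squares fit y = a + b * x."""
    n = len(x)
    if n < 3:  # need at least one residual degree of freedom
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # word count is constant across genes: no information
        return 0.0
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    if rss <= 1e-12:  # perfect fit: report an infinite t with the slope's sign
        return copysign(inf, b) if b != 0 else 0.0
    return b / sqrt(rss / (n - 2) / sxx)


def screen_words(seqs, expr, k, top=5):
    """Rank all 4**k words by |t| from regressing expr on per-gene word counts."""
    scored = []
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        x = [count_word(s, word) for s in seqs]
        scored.append((abs(slope_t_stat(x, expr)), word))
    scored.sort(reverse=True)
    return scored[:top]


def extend_word(seqs, expr, word):
    """Hypothetical greedy extension: add one base at either end while |t| strictly improves."""
    def score(w):
        return abs(slope_t_stat([count_word(s, w) for s in seqs], expr))

    best_t = score(word)
    while True:
        candidates = [base + word for base in "ACGT"] + [word + base for base in "ACGT"]
        t, cand = max((score(c), c) for c in candidates)
        if t <= best_t:
            return word, best_t
        word, best_t = cand, t


if __name__ == "__main__":
    # Invented toy data: upstream sequences and expression values for five genes.
    seqs = ["TTGCACCA", "AGCACCTT", "TTTTTTTT", "GGGGAAAA", "CCGCACCG"]
    expr = [2.0, 1.8, 0.1, -0.3, 2.2]
    ranked = screen_words(seqs, expr, 3, top=3)
    print(ranked)
    print(extend_word(seqs, expr, ranked[0][1]))
```

A realistic screen would probably also scan both strands, control for multiple testing across the 4^k candidate words, and use far more genes than this toy example; those steps are omitted here.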

Results: LogicMotif is applied to two publicly available datasets. A genome-wide gene expression dataset of Saccharomyces cerevisiae is used for validation. The resulting regression models are interpretable, and their biological implications agree with known results. This analysis suggests that LogicMotif yields biologically more reasonable regression models than previous analyses of this dataset with standard linear regression methods. A second S. cerevisiae dataset illustrates the use of LogicMotif for classification: we build a model that discriminates between up- and downregulated genes in iron copper deficiency. LogicMotif identifies one inductive motif and two repressor motifs in this dataset. The inductive motif matches the binding site of the transcription factor Aft1p, which has a key role in regulating the uptake process. One of the novel repressor sites occurs frequently in the transcription control regions of FeS genes. This site could be the TFBS of an unknown transcription factor involved in repressing genes that encode FeS proteins under iron deficiency. We establish the robustness of the method to the type of outcome variable by analyzing this dataset with both continuous and binary outcome variables. Our results indicate that logic regression, used in combination with binding site identification methods that operate on clusters or groups of genes or with our proposed method MFURE, is a powerful and flexible alternative to motif finding methods based on linear regression.
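To illustrate why the second step uses logic regression, the following Python sketch scores Boolean combinations of binary site indicators against a binary outcome (1 = upregulated, 0 = downregulated). It is a deliberately simplified stand-in: logic regression as described by Ruczinski et al. (2003) searches over larger logic trees, typically by simulated annealing, whereas this code only enumerates every rule of the form (S_i or NOT S_i) AND/OR (S_j or NOT S_j) and scores it by misclassification count. The site names `S1` to `S3`, the helper functions and the toy data are invented; the pattern (a gene is upregulated when an inductive site is present and a repressor site is absent) only mirrors the kind of model the abstract describes.

```python
from itertools import combinations, product


def literal(row, j, negate):
    """Value of site indicator j in this row, optionally negated."""
    return (not row[j]) if negate else bool(row[j])


def best_single_site(X, y, names):
    """Best rule of the form S_j or NOT S_j, scored by misclassification count."""
    best = None
    for j, name in enumerate(names):
        for neg in (False, True):
            errors = sum(int(literal(row, j, neg)) != yi for row, yi in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, ("NOT " if neg else "") + name)
    return best


def best_two_site_rule(X, y, names):
    """Exhaustively score all two-site AND/OR rules; lowest error wins, first one on ties."""
    best = None
    for i, j in combinations(range(len(names)), 2):
        for ni, nj, op in product((False, True), (False, True), ("AND", "OR")):
            errors = 0
            for row, yi in zip(X, y):
                a, b = literal(row, i, ni), literal(row, j, nj)
                pred = (a and b) if op == "AND" else (a or b)
                errors += int(pred) != yi
            label = f"{'NOT ' if ni else ''}{names[i]} {op} {'NOT ' if nj else ''}{names[j]}"
            if best is None or errors < best[0]:
                best = (errors, label)
    return best


# Invented toy data: presence (1) or absence (0) of three hypothetical sites per gene,
# and whether each gene is up- (1) or downregulated (0).
names = ["S1", "S2", "S3"]
X = [
    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
    [0, 0, 0], [0, 1, 1], [0, 0, 1], [1, 0, 0],
]
y = [1, 1, 0, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    print("best single site:", best_single_site(X, y, names))
    print("best two-site rule:", best_two_site_rule(X, y, names))
```

On these toy data the best single-site rule (`S1`) misclassifies 2 of the 8 genes, while the two-site rule `S1 AND NOT S2` classifies all 8 correctly.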

Availability: Source code for logic regression, by Ruczinski et al. (2003), is freely available as a package for the R programming language and can be downloaded at http://bear.fhcrc.org/~ingor/logic/download/download.html. An R package for MFURE is available at http://www.stat.berkeley.edu/~sunduz/software.html.

Contact: sunduz{at}stat.berkeley.edu

* To whom correspondence should be addressed.

Present address: Departments of Statistics and of Biostatistics and Medical Informatics, University of Wisconsin, K6/440 CSC, 600 Highland Avenue, Madison, WI 53792-4675, USA.






