Skip Navigation

This Article
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow FREE Full Text (Screen PDF)
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Middendorf, M.
Right arrow Articles by Leslie, C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Middendorf, M.
Right arrow Articles by Leslie, C.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Bioinformatics 20(Suppl. 1) © Oxford University Press 2004; all rights reserved.

Predicting genetic regulatory response using classification

Manuel Middendorf 1, Anshul Kundaje 2, Chris Wiggins 3,4, Yoav Freund 2,4,5 and Christina Leslie 2,4,5,*

1 Department of Physics, 2 Department of Computer Science, 3 Department of Applied Mathematics, 4 Center for Computational Biology and Bioinformatics and 5 Center for Computational Learning Systems, Columbia University, NY 10027, USA

Received on January 15, 2004; accepted on March 1, 2004

Motivation: Studying gene regulatory mechanisms in simple model organisms through analysis of high-throughput genomic data has emerged as a central problem in computational biology. Most approaches in the literature have focused either on finding a few strong regulatory patterns or on learning descriptive models from training data. However, these approaches are not yet adequate for making accurate predictions about which genes will be up- or down-regulated in new or held-out experiments. By introducing a predictive methodology for this problem, we can use powerful tools from machine learning and assess the statistical significance of our predictions.

Results: We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences (‘motifs’) in the gene's regulatory region and (2) the expression levels of regulators such as transcription factors in the experiment (‘parents’). Thus, our learning task integrates two qualitatively different data sources: genome-wide cDNA microarray data across multiple perturbation and mutant experiments along with motif profile data from regulatory sequences. We convert the regression task of predicting real-valued gene expression measurements to a classification task of predicting +1 and –1 labels, corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. The learning algorithm employed is boosting with a margin-based generalization of decision trees, alternating decision trees. This large-margin classifier is sufficiently flexible to allow complex logical functions, yet sufficiently simple to give insight into the combinatorial mechanisms of gene regulation. We observe encouraging prediction accuracy on experiments based on the Gasch S.cerevisiae dataset, and we show that we can accurately predict up- and down-regulation on held-out experiments. We also show how to extract significant regulators, motifs and motif-regulator pairs from the learned models for various stress responses. Our method thus provides predictive hypotheses, suggests biological experiments, and provides interpretable insight into the structure of genetic regulatory networks.

Availability: The MLJava package is available upon request to the authors.

Supplementary: Additional results are available from http://www.cs.columbia.edu/compbio/geneclass

Contact: cleslie{at}cs.columbia.edu

* To whom correspondence should be addressed.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Brief BioinformHome page
P. Larranaga, B. Calvo, R. Santana, C. Bielza, J. Galdiano, I. Inza, J. A. Lozano, R. Armananzas, G. Santafe, A. Perez, et al.
Machine learning in bioinformatics
Brief Bioinform, March 1, 2006; 7(1): 86 - 112.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
P. H. Lee and D. Lee
Modularized learning of genetic interaction networks from biological annotations and mRNA expression data
Bioinformatics, June 1, 2005; 21(11): 2739 - 2747.
[Abstract] [Full Text] [PDF]



Disclaimer: Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.