Bioinformatics Advance Access originally published online on July 16, 2009
Bioinformatics 2009 25(19):2478-2485; doi:10.1093/bioinformatics/btp435

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions

David J. Miller 1,*, Yanxin Zhang 1, Guoqiang Yu 2, Yongmei Liu 3, Li Chen 2, Carl D. Langefeld 4, David Herrington 5 and Yue Wang 2

1Department of Electrical Engineering, The Pennsylvania State University, 2Department of Electrical and Computer Engineering, The Virginia Polytechnic Institute and State University, 3Department of Internal Medicine, 4Division of Public Health Sciences, Department of Biostatistical Sciences and 5Division of Public Health Sciences, Department of Epidemiology and Prevention, Wake Forest University

*To whom correspondence should be addressed.


   Abstract

Motivation: In both genome-wide association studies (GWAS) and pathway analysis, the modest sample size relative to the number of genetic markers presents formidable computational, statistical and methodological challenges for accurately identifying markers/interactions and for building phenotype-predictive models.

Results: We address these objectives via maximum entropy conditional probability modeling (MECPM), coupled with a novel model structure search. Unlike neural networks and support vector machines (SVMs), MECPM makes explicit, and is determined by, the interactions that confer phenotype-predictive power. Our method identifies both a marker subset and the multiple k-way interactions between these markers. Additional key aspects are: (i) evaluation of a select subset of up to five-way interactions while retaining relatively low complexity; (ii) flexible single nucleotide polymorphism (SNP) coding (dominant, recessive) within each interaction; (iii) no mathematical interaction form assumed; (iv) model structure and order selection based on the Bayesian Information Criterion, which fairly compares interactions at different orders and automatically sets the experiment-wide significance level; (v) MECPM directly yields a phenotype-predictive model. MECPM was compared with a panel of methods on datasets with up to 1000 SNPs and up to eight embedded penetrance function (i.e. ground-truth) interactions, including a five-way interaction, involving fewer than 20 SNPs. MECPM achieved improved sensitivity and specificity for detecting both ground-truth markers and interactions, compared with previous methods.
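As a rough illustration of points (ii) and (iv), the following Python sketch shows one way a k-way interaction with per-SNP dominant or recessive coding could be turned into a binary feature, and how candidate models could be ranked with the textbook form of the Bayesian Information Criterion. It is not the authors' implementation. The genotype coding as minor-allele counts 0/1/2, the conjunction form of the feature, and the helper names `code_snp`, `interaction_feature` and `bic` are all assumptions made for this sketch, and the penalty MECPM actually uses may differ from the standard BIC shown here.

```python
import math

# Assumption of this sketch: genotypes are minor-allele counts 0, 1 or 2.
def code_snp(genotype, mode):
    """Map one genotype to a binary indicator under a dominant or recessive coding."""
    if mode == "dominant":    # at least one copy of the minor allele
        return int(genotype >= 1)
    if mode == "recessive":   # two copies of the minor allele
        return int(genotype == 2)
    raise ValueError(f"unknown coding: {mode}")

def interaction_feature(genotypes, snp_modes):
    """Hypothetical k-way interaction feature.

    snp_modes maps SNP index -> coding, so each SNP in the interaction
    can carry its own coding. The feature is 1 only if every
    participating SNP is 'on' under its coding (a conjunction, which is
    an illustrative choice and not taken from the paper).
    """
    return int(all(code_snp(genotypes[i], m) for i, m in snp_modes.items()))

def bic(log_likelihood, n_params, n_samples):
    """Standard BIC: -2 * log-likelihood + (number of parameters) * ln(sample size).

    Lower values indicate a better trade-off between fit and complexity.
    """
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

# Usage: a two-way interaction between SNP 0 (recessive) and SNP 2 (dominant).
sample = [2, 0, 1, 2]
feature = interaction_feature(sample, {0: "recessive", 2: "dominant"})

# Compare two hypothetical fitted models on the same 200 samples.
simpler = bic(log_likelihood=-100.0, n_params=3, n_samples=200)
richer = bic(log_likelihood=-99.5, n_params=6, n_samples=200)
preferred = "simpler" if simpler < richer else "richer"
```

In this toy comparison the richer model gains only half a unit of log-likelihood for three extra parameters, so the BIC penalty outweighs the gain and the simpler model is preferred. That is the sense in which a single criterion can compare interactions of different orders without a separately tuned significance threshold.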

Availability: http://www.cbil.ece.vt.edu/ResearchOngoingSNP.htm

Contact: djmiller{at}engr.psu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Alfonso Valencia


Received on April 3, 2009; revised on June 24, 2009; accepted on July 13, 2009
