Bioinformatics Advance Access originally published online on February 22, 2008
Bioinformatics 2008 24(5):629-636; doi:10.1093/bioinformatics/btn009

Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

fdrMotif: identifying cis-elements by an EM algorithm coupled with false discovery rate control

Leping Li 1,*, Robert L. Bass 2 and Yu Liang 1

1Biostatistics Branch and 2Computational Biology Facility, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, NC 27709, USA

*To whom correspondence should be addressed.


Abstract

Motivation: Most de novo motif identification methods optimize the motif model first and then separately test the statistical significance of the motif score. In the first stage, a motif abundance parameter needs to be specified or modeled. In the second stage, a Z-score or P-value is used as the test statistic. Error rates under multiple comparisons are not fully considered.

Methodology: We propose a simple but novel approach, fdrMotif, that selects as many binding sites as possible while controlling a user-specified false discovery rate (FDR). Unlike existing iterative methods, fdrMotif combines model optimization [e.g. position weight matrix (PWM)] and significance testing at each step. By monitoring the proportion of binding sites selected in many sets of background sequences, fdrMotif controls the FDR in the original data. The model is then updated using an expectation (E)- and maximization (M)-like procedure. We propose a new normalization procedure in the E-step for updating the model. This process is repeated until either the model converges or the number of iterations exceeds a maximum.
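The FDR-control step described above can be illustrated with a minimal Python sketch. This is not the authors' C implementation: the function name `select_threshold`, the use of a single score cutoff, and the estimator (mean number of background sites passing the cutoff divided by the number of sites passing it in the real data) are all assumptions made for illustration.

```python
def select_threshold(real_scores, background_sets, target_fdr):
    """Return the lowest score cutoff whose estimated FDR is <= target_fdr.

    Assumed estimator (hypothetical, for illustration only):
        FDR(t) = mean over background sets of #(background scores >= t)
                 / #(real scores >= t)
    Returns None if no cutoff satisfies the target.
    """
    best = None
    # Walk cutoffs from strictest to most permissive, remembering the most
    # permissive one that still meets the target FDR.
    for t in sorted(set(real_scores), reverse=True):
        n_real = sum(s >= t for s in real_scores)
        n_bg = sum(sum(s >= t for s in bg) for bg in background_sets)
        mean_bg = n_bg / len(background_sets)
        if mean_bg / n_real <= target_fdr:
            best = t
    return best


# Toy candidate-site scores in the real data and in two background sets.
real = [10, 9, 8, 3, 2]
backgrounds = [[1, 2, 3], [2, 2, 4]]
cutoff = select_threshold(real, backgrounds, target_fdr=0.2)
selected = [s for s in real if cutoff is not None and s >= cutoff]
```

In an iterative scheme of the kind the abstract describes, the sites passing the cutoff would then be used to update the PWM before the next round; that update step is omitted here.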

Results: Simulation studies suggest that our normalization procedure assigns larger weights to the binding sites than do two other commonly used normalization procedures. Furthermore, fdrMotif requires only a user-specified FDR and an initial PWM. When tested on 542 high-confidence experimental p53 binding loci, fdrMotif identified 569 p53 binding sites in 505 (93.2%) sequences. In comparison, MEME identified more binding sites than fdrMotif but in fewer ChIP sequences. When tested on 500 sets of simulated ‘ChIP’ sequences with embedded known p53 binding sites, fdrMotif had higher sensitivity than MEME with a similar positive predictive value. Furthermore, fdrMotif is robust to noise: it selected nearly the same binding sites in data adulterated with 50% added background sequences as in the unadulterated data. We suggest that fdrMotif represents an improvement over MEME.
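For reference, the two metrics used in the simulated comparison follow their standard definitions: sensitivity is the fraction of embedded sites that are recovered, and positive predictive value is the fraction of predicted sites that are truly embedded. The short Python sketch below computes both; the function name, the representation of a site as a (sequence ID, position) pair, and the toy data are hypothetical and are not taken from the paper.

```python
def sensitivity_ppv(predicted, true):
    """Return (sensitivity, positive predictive value) for two site sets.

    Sites are compared by exact equality; how a real evaluation would
    match overlapping sites is not specified here.
    """
    predicted, true = set(predicted), set(true)
    tp = len(predicted & true)  # true positives
    sens = tp / len(true) if true else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return sens, ppv


# Toy data: 4 embedded sites, 3 predictions, 2 of them correct.
true_sites = {("s1", 10), ("s2", 5), ("s4", 1), ("s5", 2)}
pred_sites = {("s1", 10), ("s2", 5), ("s3", 7)}
sens, ppv = sensitivity_ppv(pred_sites, true_sites)
```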

Availability: C code can be found at: http://www.niehs.nih.gov/research/resources/software/fdrMotif/

Contact: li3{at}niehs.nih.gov

Supplementary information: Supplementary data are available at http://www.niehs.nih.gov/research/resources/software/fdrMotif/

Associate Editor: Dmitrij Frishman


Received on June 29, 2007; revised on January 3, 2008; accepted on January 6, 2008
