Modelling the transcription factor DNA-binding affinity using genome-wide ChIP-based data

by Monther Alhamdoosh, Dianhui Wang

Released as a post by Cold Spring Harbor Laboratory.

2016  

Abstract

Protein-DNA binding affinity remains poorly understood for many transcription factors (TFs). Although several approaches have been proposed in the literature to model the DNA-binding specificity of TFs, they still have limitations. Most methods require a cut-off threshold to classify a K-mer as a binding site (BS), and such a threshold is usually chosen by hand rather than by a principled procedure. Other approaches combine prior knowledge of the biological context of regulatory elements in the genome with machine learning algorithms to build classifier models for TF binding sites (TFBSs). Notably, these methods deliberately select the training and testing datasets so that they are highly separable. Hence, current methods do not actually capture the TF-DNA binding relationship. In this paper, we present a threshold-free framework based on a novel ensemble learning algorithm to locate TFBSs in DNA sequences. Our approach creates TF-specific classifier models using genome-wide DNA-binding experiments and prior biological knowledge of DNA sequences and TF binding preferences. Systematic background filtering algorithms are used to remove non-functional K-mers from the training and testing datasets. To reduce the complexity of the classifier models, a fast feature selection algorithm is employed. Finally, the resulting classifier models are used to scan new DNA sequences and identify potential binding sites. The results show that our approach is able to identify novel binding sites in the Saccharomyces cerevisiae genome.
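The final scanning step described in the abstract can be illustrated with a minimal sketch. It assumes a sliding window over the sequence and an ensemble whose members each label a K-mer directly, combined here by simple majority vote. The names `kmers`, `majority_vote`, `scan` and the three toy rules in `TOY_MEMBERS` are hypothetical and chosen only for illustration; the abstract does not specify the authors' ensemble algorithm, features, or filtering steps, and none of them are reproduced here.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Member = Callable[[str], bool]  # one ensemble member: K-mer -> binding site or not


def kmers(seq: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, K-mer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def majority_vote(members: Sequence[Member], kmer: str) -> bool:
    """Label a K-mer as a binding site if more than half the members agree.

    Majority voting is a stand-in, not the ensemble rule used in the paper.
    """
    votes = sum(1 for member in members if member(kmer))
    return votes * 2 > len(members)


def scan(seq: str, k: int, members: Sequence[Member]) -> List[Tuple[int, str]]:
    """Return (start, K-mer) for every window the ensemble labels as a site."""
    return [(i, w) for i, w in kmers(seq.upper(), k) if majority_vote(members, w)]


# Hypothetical toy members; trained TF-specific classifiers would go here.
TOY_MEMBERS: List[Member] = [
    lambda w: w.startswith("TGA"),
    lambda w: w.endswith("TCA"),
    lambda w: w[3] == "C",
]

if __name__ == "__main__":
    for start, site in scan("AATGACTCATT", 7, TOY_MEMBERS):
        print(start, site)
```

Each member returns a label rather than a score, so the scan involves no hand-tuned score cut-off; in practice the members would be trained classifiers rather than fixed string rules.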

Type  post
Stage   unknown
Date   2016-07-04