Abstract

We present novel deep learning frameworks that learn jointly from raw DNA sequence and diverse functional genomic profiling experiments to capture fundamental predictive relationships between regulatory sequence, chromatin architecture, chromatin state, and transcription factor binding. The recently developed ATAC-seq assay simultaneously profiles chromatin accessibility and the architecture of regulatory elements from low-input samples, based on direct in vitro transposition of sequencing adaptors into native chromatin. We train multi-task, multi-modal deep convolutional neural networks (CNNs) on a novel 2D representation of ATAC-seq data that leverages subtle patterns in insert-size distributions to simultaneously predict multiple histone modifications, combinatorial chromatin state, and binding sites of a key insulator protein (CTCF) with high accuracy. Models trained on data from related assays such as DNase-seq and MNase-seq also achieve high performance genome-wide and across cell types, supporting a fundamental predictive mapping between local chromatin architecture and chromatin state. We develop novel feature importance scores and visualization methods to extract biologically meaningful predictive patterns from deep neural networks. We further present new deep hybrid architectures, consisting of convolutional and recurrent layers, that predict in vivo transcription factor binding events and learn regulatory sequence grammars from raw DNA sequence and chromatin accessibility profiles across cell types and tissues. Our methods potentially enable detailed characterization of context-specific regulatory landscapes from low-input samples of rare cell types using a single assay.

Video Recording