Abstract

We hypothesized that transcription factors (TFs) can recognize DNA shape without nucleotide sequence recognition. Several observations motivate an independent role for shape: many TF binding sites lack a sequence-motif, DNA shape adds specificity to sequence-motifs, and different sequences can encode similar shapes. We therefore asked whether the binding sites of any TFs are enriched for a specific pattern of DNA shape features, such as minor groove width, propeller twist, helical twist, or roll. To discover these shape-motifs de novo, we developed ShapeMF, which performs Gibbs sampling directly on shape features rather than nucleotide sequences. Using ChIP-Seq data for 110 human ENCODE TFs, we find that most TFs have shape-motifs and strongly bind regulatory regions with shape-motifs in the absence of sequence-motifs. When shape- and sequence-recognition co-occur in a region, the two types of motifs can be overlapping, flanking, or separated by consistent spacing, with shape-motifs explaining low-information-content positions in and near sequence-motifs. Shape-motifs are prevalent in regions co-bound by multiple TFs. They also explain binding of the co-factors MYC-MAX and TBX5-NKX2-5, which cannot be accounted for with sequence-motifs. Finally, shape-motifs are strikingly different across TFs with nearly identical sequence-motifs, providing an explanation for their distinct binding locations. These results establish shape-motifs as drivers of TF-DNA recognition complementary to sequence-motifs.