Abstract

The position weight matrix (PWM) model of transcription factor binding site motifs specifies a multinomial distribution over sequences that has only one dominating seed sequence. To make the model more accurate, one can use several seeds and also exploit the fact that transcription factors bind not only to DNA but also to each other, forming dimeric and higher-order regulatory complexes. Moreover, the model should represent any internal dependencies present within the motif. The talk will describe developments in modeling and predicting binding motifs using multiseed models that include mixtures of monomeric and dimeric PWMs and are learned from large sequence sets.
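As a rough illustration of the difference between a single-seed PWM and a multiseed mixture, the following minimal sketch scores a sequence under both. It is not the method presented in the talk: the helper names (`seed_pwm`, `pwm_probability`, `mixture_probability`), the seeds `ACGT` and `TTGA`, the match probability 0.85, and the mixture weights are all assumptions chosen for the example, and dimeric PWMs and within-motif dependencies are left out.

```python
ALPHABET = "ACGT"

def seed_pwm(seed, match=0.85):
    """Build a toy PWM around one seed (hypothetical helper).

    Each position gives probability `match` to the seed base and splits
    the remainder evenly over the other three bases.
    """
    other = (1.0 - match) / 3
    return [[match if base == s else other for base in ALPHABET] for s in seed]

def pwm_probability(pwm, seq):
    """Probability of `seq` under a PWM: positions are treated as independent."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    p = 1.0
    for column, base in zip(pwm, seq):
        p *= column[ALPHABET.index(base)]
    return p

def mixture_probability(weights, pwms, seq):
    """Probability of `seq` under a weighted mixture of PWMs (a multiseed model)."""
    return sum(w * pwm_probability(m, seq) for w, m in zip(weights, pwms))

# Two illustrative seeds and mixture weights (assumed values, not real data).
pwm_a = seed_pwm("ACGT")
pwm_b = seed_pwm("TTGA")
weights = [0.6, 0.4]

for seq in ("ACGT", "TTGA", "GGGG"):
    single = pwm_probability(pwm_a, seq)
    mixed = mixture_probability(weights, [pwm_a, pwm_b], seq)
    print(f"{seq}: single-seed={single:.6f} mixture={mixed:.6f}")
```

Under the single-seed PWM the second seed `TTGA` is treated as a poor match, while the mixture assigns high probability to both seeds. In practice the matrices and weights would be learned from large sequence sets rather than set by hand.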

Video Recording