Abstract

Recent technological advancements allow the measurement of protein binding to thousands of DNA or RNA probes on a single microarray. Since the space on the array is limited, the challenge is how to efficiently generate a minimum-size set of sequences that together cover all k-mers. In this talk, we will first introduce de Bruijn graphs and their applications in efficient coverage of DNA k-mers. Then, we will describe a generalization of the problem, in which the sequences are required to obtain certain properties (e.g., unstructured RNAs). We will prove that in this formalization the problem is NP-hard and give a (infeasible) approximation algorithm.​ We will present a heuristic based on random walks in de Bruijn graphs which works well in practice.​ If time allows, we shall discuss questions arising in analysis of novel high throughput in vitro methods ​for motif discovery.

Video Recording