Abstract

Given an alphabet  and integers k and L, a universal hitting set is a set of k-mers so that every possible L-long string over the alphabet must contain as a substring a k-mer from the set. We provide complexity analysis and efficient algorithms to construct small universal hitting sets. We also explore the relationship of such sets to minimizers, which have been adopted in many deep-sequencing algorithms, and show the advantage of universal hitting sets.
Joint work with Yaron Orenstein, David Pellow, Guillaume Marcais and Carl Kingsford