Abstract

I will discuss a unifying statistical formulation for many fundamental problems in genome science and a highly efficient algorithm that solves it. The algorithm performs inference directly on raw reads, avoiding references completely. We illustrate the power of our approach for new data-driven biological discovery with examples of novel single-cell-resolved, cell-type-specific isoform expression, including splicing, expression in the major histocompatibility complex, and de novo prediction of viral protein adaptation, including in SARS-CoV-2.

Video Recording