Abstract

Deep learning models are typically powerful predictors, but they also tend to absorb latent biases present in their training data. Using genomics as a case study, I will show how interpretation frameworks can be used to identify latent confounders, technical biases, and artifacts in experimental data that are learned by models trained naively on these data. I will also show how we can design models that learn highly accurate representations of these technical biases and automatically correct for them, yielding models with high predictive performance, near-optimal latent bias correction, and the ability to reveal novel insights into fundamental mechanisms of gene regulation. And just for kicks ... our models are also very efficient and robust, outperforming state-of-the-art foundation models.