Abstract
Machine learning has demonstrated great promise in helping to accelerate scientific discovery and decision-making. However, machine-learning predictions contain errors. How can we make trustworthy scientific discoveries and decisions in spite of prediction error? This talk will share a line of work motivated by this question. The first part of the talk will focus on prediction-powered inference, a novel framework for performing valid statistical inference when a gold-standard data set is supplemented with predictions from a machine-learning system, without making any assumptions about the system. Prediction-powered inference may enable scientists to draw valid conclusions in a more data-efficient way, as we demonstrate with applications in proteomics, genomics, and astronomy. In the second part of the talk, I will touch on our recent efforts building upon these ideas to make reliable machine-learning-guided decisions in biological sequence design and beyond.