Abstract

The steady growth in specialized accelerators has led to the emergence of a wide range of domain-specific languages (DSLs). Their restricted, high-level nature enables compilers to generate efficient code for rapidly evolving heterogeneous hardware. However, rewriting existing code to exploit DSL compiler performance is an onerous task for programmers. This has led to recent interest in automatically lifting code to DSLs. While language models have proved remarkably successful at related translation tasks, they are prone to hallucination. Alternative program-synthesis approaches are accurate but unable to scale to complex DSLs.

In this talk, I will present two approaches that combine language models and program synthesis for lifting. The first uses a large language model to generate a probabilistic context-free grammar representing the space of likely solutions, then explores that space with enumerative synthesis. The second uses a small language model to guess an initial solution, then uses a measurement oracle to estimate how far the guess is from a valid answer and to guide a search through a space of edit rules. We apply both to lifting legacy code to tensor DSLs and demonstrate speed-ups of up to 38x over the unlifted code.
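To make the second approach concrete, the sketch below shows one plausible shape of an oracle-guided edit search: a language model proposes an initial candidate, a distance oracle scores candidates, and the search greedily applies edit rules that reduce the estimated distance to a valid lifting. This is a minimal illustration under assumed interfaces; the names `propose_initial`, `oracle_distance`, and `edit_rules` are placeholders, not the talk's actual implementation.

```python
# Hypothetical sketch of oracle-guided edit search for lifting code to a DSL.
# The callables are placeholders for the components the talk describes:
# a small language model, a measurement oracle, and a set of edit rules.

from typing import Callable, List, Optional

Candidate = str  # a candidate DSL program, represented as source text


def guided_edit_search(
    propose_initial: Callable[[str], Candidate],    # small LM: source -> first guess
    oracle_distance: Callable[[Candidate], float],  # 0.0 means a valid lifting
    edit_rules: List[Callable[[Candidate], List[Candidate]]],
    source: str,
    max_steps: int = 100,
) -> Optional[Candidate]:
    """Greedily edit the LM's initial guess, guided by the oracle's distance."""
    best = propose_initial(source)
    best_dist = oracle_distance(best)
    for _ in range(max_steps):
        if best_dist == 0.0:
            return best  # the oracle reports a valid answer
        # Expand the current candidate with every applicable edit rule.
        neighbours = [c for rule in edit_rules for c in rule(best)]
        if not neighbours:
            break
        # Move to the neighbour the oracle judges closest to valid.
        cand = min(neighbours, key=oracle_distance)
        cand_dist = oracle_distance(cand)
        if cand_dist >= best_dist:
            break  # no edit improves the oracle's estimate; stop
        best, best_dist = cand, cand_dist
    return best if best_dist == 0.0 else None
```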
