Abstract
We describe a large-scale RCT, run in collaboration with the Taipei Department of Education, with approximately 1,000 high school students working toward Python certification. All students had access to an LLM-powered AI tutor and were required to solve a fixed number of weekly practice problems. Students were randomized to either a fixed practice sequence or a personalized sequence that used a POMDP framework to infer each student's mastery before advancing to more difficult problems. Existing POMDP formulations have limited visibility into student progress, which hinders their ability to provide effective personalized support. We show that student interactions on the practice platform offer a powerful view into mastery: using LLM-extracted features of platform interactions (e.g., meaningful code edits) as the POMDP observations significantly improves performance. At the end of the semester, all students took a written certification exam with no AI assistance. Students in the personalized arm scored 0.15 SD higher, equivalent to several months of additional schooling.