Abstract

Building on the predictability, computability and stability (PCS) framework and for randomised experiments, we introduce a novel methodology for Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), with large heterogeneous treatment effects. StaDISC was developed during our re-analysis of the 1999–2000 VIGOR study. Vioxx was found to, on average and in comparison with naproxen, reduce the risk of gastrointestinal events but increase the risk of thrombotic cardiovascular events. Applying StaDISC, we fit 18 popular conditional average treatment effect (CATE) estimators for both outcomes and use calibration to demonstrate their poor global performance. However, they are locally well-calibrated and stable, enabling the identification of patient groups with larger than (estimated) average treatment effects. In fact, StaDISC discovers three clinically interpretable subgroups each for the gastrointestinal outcome (totalling 29.4% of the study size) and the thrombotic cardiovascular outcome (totalling 11.0%). Complementary analyses of the found subgroups using the 2001–2004 APPROVe study, a separate independently conducted randomised controlled trial with 2587 patients, provide further supporting evidence for the promise of StaDISC.