Abstract
We consider the problem of counterfactual inference in sequentially designed experiments wherein a collection of units undergoes a sequence of interventions assigned by policies that adapt over time, and outcomes are observed based on the assigned interventions. Our goal is counterfactual inference, i.e., estimating what would have happened had alternate policies been used, a problem that is inherently challenging due to the heterogeneity in the outcomes across users and time. In this work, we identify structural assumptions that allow us to impute the missing potential outcomes in sequential experiments, where the policy is allowed to adapt simultaneously to all users' past data. We prove that under suitable assumptions on the latent factors and temporal dynamics, a variant of the nearest neighbor strategy allows us to impute the missing information using the observed outcomes across time and users. Under mild assumptions on the adaptive policy and the underlying latent factor model, we prove that, using data up to time t for N users in the study, our estimate of the missing potential outcome at time t+1 for any fixed user admits a mean squared error that scales as t^{-1/2+\delta} + N^{-1+\delta} for any \delta>0. We also provide an asymptotic confidence interval for each outcome under suitable growth conditions on N and t, which can then be used to build confidence intervals for individual treatment effects. Our work extends three lines of research: the recent literature on inference with adaptively collected data, by allowing for policies that pool across users; the matrix completion literature for missing-at-random settings, by allowing for adaptive sampling mechanisms; and the literature on missing data problems in multivariate time series, by allowing for a generic non-parametric model.