Abstract
We devise an online learning algorithm that adapts to the data and achieves regret that is simultaneously competitive with low-regret algorithms under both stochastic and adversarial inputs. Our algorithm, titled Switching via Monotone Adapted Regret Traces (SMART), combines the follow-the-leader (FTL) policy (which enjoys low regret on stochastic instances) with any ‘worst-case’ policy that guarantees an upper bound on regret over all instances. We show that on *any* input sequence, the regret of the SMART policy is within a multiplicative factor e/(e − 1) ≈ 1.58 of the smaller of: 1) the regret obtained by FTL on the sequence, and 2) the regret upper bound guaranteed by the given worst-case policy. We discuss extensions, including combining FTL with a “small-loss” algorithm and combining more than two algorithms. Our work thus provides a fundamentally new, and much simpler, approach to achieving ‘best-of-both-worlds’ regret bounds. We complement this by establishing a fundamental lower bound: there exist input sequences on which no algorithm can guarantee regret smaller than 1.43 times the lesser of the regrets achieved by FTL and the minimax-optimal policy.
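The single-switch idea behind this style of policy can be sketched in a few lines. The following is a minimal illustration, not the paper's actual SMART policy: it pairs FTL with deterministic (fractional) Hedge as a stand-in worst-case algorithm, and switches irrevocably once FTL's realized regret crosses a fixed threshold. The function names, the threshold choice, and the use of Hedge here are all illustrative assumptions.

```python
import math

def ftl_choice(cum_losses):
    # Follow the leader: play the action with the smallest cumulative loss.
    return min(range(len(cum_losses)), key=lambda i: cum_losses[i])

def hedge_weights(cum_losses, eta):
    # Deterministic (fractional) Hedge: exponential weights on past losses.
    w = [math.exp(-eta * l) for l in cum_losses]
    s = sum(w)
    return [wi / s for wi in w]

def single_switch_sketch(loss_matrix, threshold):
    """Run FTL; once its realized regret exceeds `threshold` (a stand-in
    for the worst-case algorithm's regret bound), switch irrevocably to
    Hedge for the remaining rounds. Returns (total loss, regret, switched).
    Hypothetical sketch -- not the SMART policy from the paper."""
    T, K = len(loss_matrix), len(loss_matrix[0])
    eta = math.sqrt(8 * math.log(K) / T)  # standard Hedge learning rate
    cum = [0.0] * K   # cumulative loss of each action
    alg_loss = 0.0
    switched = False
    for round_losses in loss_matrix:
        if not switched:
            alg_loss += round_losses[ftl_choice(cum)]
        else:
            p = hedge_weights(cum, eta)
            alg_loss += sum(pi * li for pi, li in zip(p, round_losses))
        cum = [c + l for c, l in zip(cum, round_losses)]
        # Switch once FTL's realized regret trace crosses the threshold.
        if not switched and alg_loss - min(cum) > threshold:
            switched = True
    return alg_loss, alg_loss - min(cum), switched
```

On an easy (stochastic-like) instance FTL never triggers the switch and keeps its low regret; on an alternating adversarial sequence, where FTL suffers linear regret, the policy switches early and inherits the worst-case guarantee plus roughly the threshold.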