Abstract

We devise an online learning algorithm that adapts to the data and achieves regret that is simultaneously competitive with low-regret algorithms under both stochastic and adversarial inputs. Our algorithm, titled Switching via Monotone Adapted Regret Traces (SMART), combines the follow-the-leader (FTL) policy (which enjoys low regret on stochastic instances) with any ‘worst-case’ policy that guarantees an upper bound on regret over all instances. We show that the regret of the SMART policy on *any* input sequence is within a multiplicative factor e/(e − 1) ≈ 1.58 of the smaller of: (1) the regret obtained by FTL on the sequence, and (2) the upper bound on regret guaranteed by the given worst-case policy. We discuss extensions, including combining FTL with a ‘small-loss’ algorithm and combining multiple algorithms. Our work thus provides a fundamentally new, and much simpler, approach to achieving ‘best-of-both-worlds’ bounds. We complement this with a lower bound showing that no algorithm can guarantee, on every input sequence, regret within a factor smaller than 1.43 of the lesser of the regrets achieved by FTL and by the minimax-optimal policy.
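
To make the switching idea concrete, below is a minimal Python sketch of the one-way switch from FTL to a worst-case policy. It assumes hypothetical duck-typed policies exposing `act()`/`update()` methods, full-information loss vectors, and a ski-rental-style threshold density proportional to exp(x/B) on [0, B]; the precise regret-trace definition, threshold distribution, and whether the worst-case policy restarts at the switch are as specified in the paper, not here.

```python
import math
import random


def sample_threshold(B: float) -> float:
    """Draw a switching threshold in [0, B] with density proportional to
    exp(x / B) (a ski-rental-style randomization, assumed here for illustration)."""
    u = random.random()
    return B * math.log(1.0 + u * (math.e - 1.0))


def smart(ftl, worst_case, losses, B):
    """Play FTL until its running regret exceeds a random threshold, then hand
    control to the worst-case policy for all remaining rounds (one-way switch).

    `ftl` and `worst_case` are assumed to expose act() -> action index and
    update(loss_vector); `losses` is an iterable of per-round loss vectors.
    """
    threshold = sample_threshold(B)
    switched = False
    ftl_cum_loss = 0.0          # cumulative loss incurred while playing FTL
    action_cum_loss = None      # cumulative loss of every fixed action so far
    total_loss = 0.0            # loss actually incurred by the meta-algorithm

    for loss_vector in losses:
        policy = worst_case if switched else ftl
        action = policy.act()
        total_loss += loss_vector[action]

        # Both policies observe the feedback, so the worst-case policy has a
        # history when it takes over (whether it instead restarts fresh at the
        # switch is a design detail deferred to the paper).
        ftl.update(loss_vector)
        worst_case.update(loss_vector)

        if not switched:
            # FTL's anytime regret trace: its cumulative loss minus the
            # cumulative loss of the best fixed action in hindsight.
            ftl_cum_loss += loss_vector[action]
            if action_cum_loss is None:
                action_cum_loss = list(loss_vector)
            else:
                action_cum_loss = [c + l for c, l in zip(action_cum_loss, loss_vector)]
            if ftl_cum_loss - min(action_cum_loss) > threshold:
                switched = True

    return total_loss
```

The randomized threshold and one-way switch are analogous to randomized ski rental, whose optimal competitive ratio e/(e − 1) matches the factor stated above; this analogy is meant only as intuition, not as a statement of the paper's analysis.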