Clinical trials are the ultimate and most expensive tools in the development of medical products. Their designs have changed little since the advent of randomization in the 1940s. Confirmatory trials are large and slow and take years to answer a single question. They are completely out of sync with modern biology, which is changing our understanding of disease and its heterogeneity not yearly but weekly. Building a design is making a decision. Modern innovations are going back to the drawing board, borrowing from the theory of bandit problems and asking how we can design clinical trials with the explicit goal of delivering better therapy to patients. These new designs incorporate randomization but in a much more efficient and dynamic fashion. They address many questions of clinical and biological importance, including those related to disease heterogeneity. They lead to different approaches for handling and modeling control therapy. They focus on the future, predicting each patient's disease course and predicting each therapy's success in the present trial and in future trials. Evaluating the performance of such complicated designs requires simulation. I'll describe actual clinical trials that are incorporating decision-analytic concepts.

### Monday, September 19th, 2016

We consider the problem of A-B testing when the impact of a treatment is marred by a large number of covariates. This is the situation in a number of modern applications of A-B testing (such as in adTech and e-commerce) as well as in more traditional applications (such as clinical trials). Randomization can be highly inefficient in such settings, and thus we consider the problem of optimally allocating test subjects to either treatment with a view to maximizing the efficiency of our estimate of the treatment effect. Our main contribution is a tractable algorithm that is near optimal under a broad set of assumptions. Specifically, we provide an algorithm for this problem in the online setting where subjects arrive, and must be assigned, sequentially. We also characterize the value of optimization and show that it can be expected to grow large with the number of covariates. Finally, using a real-world impression dataset, we show that our algorithms can be expected to yield substantial improvements to efficiency in practice. Joint work with Nikhil Bhat (AirBnB) and Ciamac Moallemi (Columbia).
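
The abstract does not spell out its algorithm, but the gap between pure randomization and optimized allocation can be illustrated with a classical covariate-balancing rule in the spirit of Efron's biased coin; the function name and the bias parameter `p` below are illustrative choices, not details from the talk:

```python
import random

def assign(covariates, imbalance, p=0.8):
    """Assign an arriving subject to arm A (+1) or B (-1), favoring
    (with probability p) the arm that most reduces the squared
    covariate imbalance between arms. `imbalance` holds the running
    signed covariate sums and is updated in place."""
    score_a = sum((b + x) ** 2 for b, x in zip(imbalance, covariates))
    score_b = sum((b - x) ** 2 for b, x in zip(imbalance, covariates))
    if score_a < score_b:
        arm = 1 if random.random() < p else -1
    elif score_b < score_a:
        arm = -1 if random.random() < p else 1
    else:
        arm = random.choice([1, -1])
    for i, x in enumerate(covariates):
        imbalance[i] += arm * x
    return arm
```

With `p=1.0` the rule is fully greedy and, for a single binary covariate, keeps the covariate imbalance between arms bounded by one subject at all times, whereas pure randomization lets it drift like a random walk.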

Suppose each of N men and N women picks a point in a metric space. A woman ranks the men in increasing order of their distance to her, closest first. Men rank similarly. Based on these ranking lists, the goal is to find a stable matching in the sense of Gale and Shapley. This problem naturally models several applications; we mention two: (i) Dating sites: each man/woman answers a set of questions with, say, binary (e.g., like/dislike) answers. Ranking lists are based on the Hamming distance between the binary vectors of answers; ties are broken at random. (ii) Ride hailing: passengers and cabs are matched based on Euclidean distance (or the time to pick up).
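
The deferred-acceptance algorithm of Gale and Shapley, applied to the Hamming-distance preferences of example (i), can be sketched as follows; breaking ties by index stands in for the random tie-breaking in the talk:

```python
def hamming(u, v):
    """Hamming distance between two binary answer vectors."""
    return sum(a != b for a, b in zip(u, v))

def stable_match(men, women):
    """Men-proposing deferred acceptance (Gale-Shapley). `men` and
    `women` are lists of binary answer vectors; each side ranks the
    other by Hamming distance, closest first, ties broken by index.
    Returns a dict mapping each man to his matched woman."""
    n = len(men)
    # each man's preference list: women sorted by distance, then index
    pref = [sorted(range(n), key=lambda w: (hamming(men[m], women[w]), w))
            for m in range(n)]
    # rank[w][m]: how woman w ranks man m (lexicographic, lower = better)
    rank = [{m: (hamming(women[w], men[m]), m) for m in range(n)}
            for w in range(n)]
    next_prop = [0] * n   # index into pref[m] of the next proposal
    engaged = {}          # woman -> man
    free = list(range(n))
    while free:
        m = free.pop()
        w = pref[m][next_prop[m]]
        next_prop[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])   # current partner is jilted
            engaged[w] = m
        else:
            free.append(m)            # rejected; will propose again
    return {m: w for w, m in engaged.items()}
```

The output is stable with respect to the distance-induced (tie-broken) preferences: no man and woman both prefer each other to their assigned partners.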

We analyze this model in discrete and continuous metric spaces. In the discrete case, our results utilize and contrast with those of Ashlagi et al on the uniqueness of stable matchings. In the continuous case, we consider “unbalanced Poisson matchings in R” and make contact with the work of Holroyd et al.

Joint work with Hossein Karkeh Abadi.

We prove that the classic policy-iteration method (Howard 1960), including the Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule, is a strongly polynomial-time algorithm for solving the Markov decision problem (MDP) with any fixed discount factor. Furthermore, the computational complexity of the policy-iteration method (including the Simplex method) is superior to that of the only known strongly polynomial-time interior-point algorithm for solving this problem. The result is surprising since the Simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming (LP) problem, the Simplex (or simple policy-iteration) method with the smallest-index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates, and the policy-iteration method was recently shown to be exponential for solving an undiscounted MDP. We also extend the result to solving MDPs with sub-stochastic and transient state transition probability matrices.
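
Howard's policy-iteration method alternates exact policy evaluation with greedy one-step improvement. A minimal sketch for a finite discounted MDP in tabular form (the representation below is an illustrative choice):

```python
import numpy as np

def policy_iteration(P, r, gamma=0.9):
    """Howard's policy iteration for a discounted MDP.

    P[a] is the |S| x |S| transition matrix under action a and
    r[a] the reward vector; returns the optimal policy and values.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # policy evaluation: solve (I - gamma * P_pi) v = r_pi
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # policy improvement: greedy one-step lookahead
        q = np.array([r[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```

The talk's result says that, for fixed `gamma`, the number of such improvement rounds is strongly polynomial in the number of states and actions.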

In this talk, I will suggest the study of new models that interpolate between the stringent requirements on distributions for stochastic models and the excessive pessimism of worst-case models for discrete optimization problems. I will make the case for the study, present a few examples and parallel developments and end with a concrete model.

New York launched the largest bike-sharing system in North America in May 2013. We have worked with Citibike, using analytics and optimization to change how they manage the system. Huge rush-hour usage imbalances the system; we address two questions: where should bikes be at the start of a rush hour, and how can we mitigate the imbalances that develop? We will survey the analytics we have employed for the former question, where we developed an approach based on continuous-time Markov chains combined with integer programming models to compute daily stocking levels for the bikes, as well as methods employed for optimizing the capacity of the stations. For the latter, we will describe both heuristic methods and approximation algorithms that we have obtained for rebalancing both mid-rush hour and overnight.

### Tuesday, September 20th, 2016

We consider the problem of predicting a label for an individual given the information about the person and her position within a network. Such a fusion of the two sources of information naturally arises in a variety of applications, including recommendation systems, ad placement, and personalized medical care. When formalizing this problem, one faces a computationally intractable combinatorial objective. We present an unexpected phenomenon: it is possible to develop poly-time (and statistically near-optimal) online prediction methods even when the offline problem is provably hard. These prediction methods arise in a systematic way from a new relaxation framework that has roots in the work of Cover and Blackwell. The new applications to learning on graphs are based in part on multiplicative approximation algorithms developed by the theoretical computer science community. Our approach naturally extends to the contextual multi-armed bandit setting with large sets of policies -- a notoriously difficult problem, which is often encountered in real-world applications. Joint work with K. Sridharan.

We will consider the following question: can we optimize decisions on models learned from data and achieve desirable outcomes? We will discuss this question in a framework we call optimization from samples: we are given samples of function values (model) and our goal is to (approximately) optimize the function (i.e. make a good decision). On the negative side, we show that there are classes of functions heavily used in applications which are both (approximately) optimizable and statistically learnable, but cannot be optimized within any reasonable approximation guarantee when sampled from data. On the positive side, we show natural conditions under which optimization from samples is achievable. This is joint work with Eric Balkanski and Aviad Rubinstein.

We will consider distributed partial clustering algorithms, including those for uncertain input.

Online $d$-dimensional vector packing models many settings, such as minimizing resources in data centers where jobs have multiple resource requirements (CPU, memory, etc.). In this problem, vectors arrive online and the goal is to assign them to a minimum number of bins such that the sum of all vectors in each bin is at most the all-$1$'s vector. We survey recent results on online vector packing, including lower and upper bounds for general vectors, lower and upper bounds for restricted-size vectors, and upper and lower bounds close to $e \approx 2.718$ for small vectors. Based on joint work with I. Cohen, S. Kamara, B. Shepherd (STOC'13), I. Cohen, A. Fiat, A. Roytman (SODA'16), and I. Cohen and A. Roytman (2016).
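
As a baseline for this model, here is the natural online first-fit heuristic (an illustration of the problem, not one of the surveyed algorithms): each arriving vector goes into the first open bin it fits in, and a new bin is opened otherwise.

```python
def first_fit_vector_pack(vectors):
    """Online first-fit for d-dimensional vector packing. Each
    arriving vector (entries in [0, 1]) is placed in the first open
    bin whose coordinate-wise load stays at most 1; a new bin is
    opened if none fits. Returns bin loads and the assignment."""
    bins = []        # each bin is its current load vector
    assignment = []  # bin index chosen for each arriving vector
    for v in vectors:
        for i, load in enumerate(bins):
            if all(l + x <= 1 for l, x in zip(load, v)):
                bins[i] = [l + x for l, x in zip(load, v)]
                assignment.append(i)
                break
        else:
            bins.append(list(v))
            assignment.append(len(bins) - 1)
    return bins, assignment
```

First-fit's competitive ratio degrades with the dimension $d$; the bounds surveyed in the talk quantify how much better (or not) online algorithms can do.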

We consider a pairwise comparisons model with n users and m items. Each user is shown a few pairs of items, and when a pair of items is shown to a user, he or she expresses a preference for one of the items based on a probabilistic model. The goal is to group users into clusters so that users within each cluster have similar preferences. We will present an algorithm which clusters all users correctly with high probability using a number of pairwise comparisons which is within a polylog factor of an information-theoretic lower bound. Joint work with Barbara Dembin and Siddhartha Satpathi.

We present online algorithms for covering and packing problems with (non-linear) convex objectives. The convex covering problem is to minimize a monotone convex function subject to covering constraints. In the online version, a new covering constraint is revealed in each step and the algorithm is required to maintain a feasible and monotonically non-decreasing assignment over time. This framework captures several natural problems that have been individually studied previously. We present an online algorithm for the problem, and show that its competitive ratio is nearly the best possible. This allows us to obtain algorithms for several application problems whose performance either improves on, or nearly matches, their respective best known previous results. We also consider a convex packing problem which is the Fenchel dual of the convex covering program. We use a primal-dual approach that naturally gives algorithms for this dual problem as well. As an application, this gives us online algorithms for maximizing social welfare with (non-linear convex) production costs.
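
The linear, unit-cost special case of online covering already shows the flavor of such algorithms: in the style of Buchbinder and Naor, an unsatisfied arriving constraint triggers a multiplicative boost of its variables, and the assignment only increases over time. This sketch is an illustration of that special case, not the convex algorithm of the talk:

```python
def online_fractional_cover(constraints, n):
    """Online fractional set cover with unit costs over variables
    x_0..x_{n-1}. Each arriving constraint S requires
    sum_{j in S} x_j >= 1; while it is violated, every variable in
    S is boosted via x_j <- 2 * x_j + 1/|S|. The assignment is
    monotone non-decreasing, as the online model requires."""
    x = [0.0] * n
    for S in constraints:
        while sum(x[j] for j in S) < 1.0:
            for j in S:
                x[j] = 2 * x[j] + 1.0 / len(S)
    return x
```

Each boost at least doubles the constraint's left-hand side (plus one), so only logarithmically many rounds are ever needed per constraint; the known analysis bounds the total fractional cost against the offline optimum.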

### Wednesday, September 21st, 2016

I will present the first polynomial-time method with $\mathrm{poly}(n)\sqrt{T}$-regret for bandit convex optimization.

Web applications typically optimize their product offerings using randomized controlled trials (RCTs), commonly called A/B testing. These tests are usually analyzed via p-values and confidence intervals presented through an online platform. Used properly, these measures both control Type I error (false positives) and deliver nearly optimal Type II error (false negatives).

Unfortunately, inferences based on these measures are wholly unreliable if users make decisions while continuously monitoring their tests. On the other hand, users have good reason to continuously monitor: there are often significant opportunity costs to letting experiments continue, and thus optimal inference depends on the relative preference of the user for faster run-time vs. greater detection. Furthermore, the platform does not know these preferences in advance; indeed, they can evolve depending on the data observed during the test itself.

This sets the challenge we address in our work: can we deliver valid and essentially optimal inference, but in an environment where users continuously monitor experiments, despite not knowing the user's risk preferences in advance? We provide a solution leveraging methods from sequential hypothesis testing, and refer to our measures as *always valid* p-values and confidence intervals. Our solution led to a practical implementation in a commercial A/B testing platform, serving thousands of customers since 2015.

Joint work with Leo Pekelis and David Walsh. This work was carried out with Optimizely, a leading commercial A/B testing platform.
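
Always-valid p-values can be illustrated with a mixture sequential probability ratio test (mSPRT) for Gaussian observations; the parameterization below (known variance `sigma`, `N(0, tau^2)` mixing distribution) is a simplified sketch, not the commercial platform's production implementation:

```python
import math

def always_valid_pvalues(xs, sigma=1.0, tau=1.0):
    """Always-valid p-values from a mixture sequential probability
    ratio test (mSPRT) with N(0, tau^2) mixing, for i.i.d.
    N(theta, sigma^2) observations testing H0: theta = 0. The
    sequence is non-increasing, and under H0 the probability that
    it ever drops below alpha is at most alpha, so a user may stop
    (or keep monitoring) at any time."""
    s, p, out = 0.0, 1.0, []
    for n, x in enumerate(xs, start=1):
        s += x
        # closed-form mixture likelihood ratio for the Gaussian case
        v = sigma ** 2 + n * tau ** 2
        lam = math.sqrt(sigma ** 2 / v) * math.exp(
            tau ** 2 * s ** 2 / (2 * sigma ** 2 * v))
        p = min(p, 1.0 / lam)
        out.append(p)
    return out
```

Because validity holds uniformly over time, continuous monitoring no longer inflates the false-positive rate, which is the key property motivating the talk.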

In multi-armed bandit models, there is a fundamental difference between optimization (a.k.a. best arm identification or pure exploration) and regret minimization. As we shall see, in both cases a notion of (asymptotically) optimal algorithm can be defined, and the behavior of optimal algorithms for the two objectives is indeed quite different. Moreover, the complexity of the two problems features different information-theoretic quantities. Building on these observations, we will investigate the best possible performance of an Explore-Then-Commit strategy for regret minimization, which starts with a pure-exploration phase to identify the best arm and then plays this estimated best arm till the end of the budget. In two-armed bandit models with Gaussian arms, we will see that the regret of such strategies is at least twice as large as that of the best strategies interleaving exploration and exploitation. This talk is based on joint work with Aurélien Garivier and Tor Lattimore.
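
A minimal simulation of an Explore-Then-Commit strategy on a two-armed Gaussian bandit; the exploration length is a free parameter here, while the talk's analysis concerns its optimal choice:

```python
import random

def explore_then_commit(means, horizon, explore_per_arm, sigma=1.0):
    """Explore-Then-Commit on a two-armed Gaussian bandit: pull each
    arm explore_per_arm times, commit to the empirically better arm
    for the remaining rounds, and return the pseudo-regret incurred
    (gap to the best mean, summed over all pulls)."""
    best = max(means)
    totals = [0.0, 0.0]
    regret = 0.0
    for _ in range(explore_per_arm):
        for a in (0, 1):
            totals[a] += random.gauss(means[a], sigma)
            regret += best - means[a]
    # equal sample counts, so comparing totals compares empirical means
    chosen = 0 if totals[0] >= totals[1] else 1
    regret += (horizon - 2 * explore_per_arm) * (best - means[chosen])
    return regret
```

Even when the commit decision is correct, the exploration phase pays the full gap on every pull of the worse arm, which is the source of the factor-of-two penalty relative to strategies that interleave exploration and exploitation.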

The mathematical study of social and information networks has centered around generative models for such networks (preferential attachment, the Chung-Lu random graph model, Kronecker graphs, etc.). This work proposes distribution-free models of social and information networks --- novel classes of graphs that capture all plausible such networks. Our models are motivated by triadic closure, the property that vertices with one or more mutual neighbors tend to also be neighbors; this is one of the most universal signatures of social networks. We prove structural results on the clustering properties of such graphs, and give algorithmic applications to clustering and clique-finding problems.

Includes joint work with Jacob Fox, Rishi Gupta, C. Seshadhri, Fan Wei, and Nicole Wein.

What is the computational power of best-response computations in repeated game playing? I will give a precise answer to this question in the context of no-regret learning in zero-sum games, and discuss the implications within the theory of online learning and regret minimization.

Based on joint work with Elad Hazan (STOC'16).

Sequential decision making situations in real-world applications often involve multiple long-term constraints and nonlinear objectives. Some examples from online advertising include advertisers' budget constraints, nonlinear under-delivery penalties, and the need for diversity in ad allocation. However, most standard models like multi-armed bandits, contextual bandits, online learning, and Markov decision processes simply optimize the sum of revenue/costs/profits received in every step, without any long-term considerations. Meanwhile, online packing and covering problems allow only limited, linear knapsack-type constraints on consumption. In this talk, I will discuss a series of recent work in which we have introduced novel techniques to handle a very general form of convex constraints and objectives in sequential decision making, while providing provably optimal guarantees. Our primal-dual techniques are efficient to implement and employ fundamental concepts of Fenchel duality in convex programming, while using online learning methods as a black box. Our techniques are remarkably general, as we demonstrate by their successful application to a variety of sequential decision making models: multi-armed bandits, contextual bandits, online packing, and online stochastic convex programming. This talk is based on the following series of work:

- Shipra Agrawal and Nikhil R. Devanur. Bandits with concave rewards and convex knapsacks. EC 2014, ACM Conference on Economics and Computation, June 2014.
- Shipra Agrawal and Nikhil R. Devanur. Fast algorithms for online stochastic convex programming. SODA 2015.
- Shipra Agrawal, Nikhil R. Devanur, and Lihong Li. An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. COLT 2016.
- Shipra Agrawal and Nikhil R. Devanur. Linear contextual bandits with knapsacks. Unpublished manuscript, 2016.

### Thursday, September 22nd, 2016

Consider a bounded-degree network of resources, e.g., each node in the network is a collection of processors of limited capacity in a cloud center. Jobs arrive stochastically and move through the network until they receive the service they need or give up. We call the latter case, i.e., a job giving up, a failure. We show that even if the jobs move through the network in an adversarial fashion (and give up any time after the first node they visit), the failure probability of the network behaves almost as if the graph were a single node. That is, failures do not cascade. We apply this result to the analysis of time-of-use pricing for selling resources. The challenge here is that jobs have both timing constraints and preferences for cheaper resources over more expensive resources. Joint work with Shuchi Chawla, Nikhil Devanur, Alexander Holroyd, James Martin, and Balu Sivan.

A typical mechanism design problem involves uncertainty in the designer's knowledge of the preferences of the participants: in revenue maximization settings, the seller typically pays an "informational rent" owing to this uncertainty. We study settings where the buyer also faces uncertainty in his own value: the buyer repeatedly uses the product and does not know ahead of time what his value for a future use would be. In fact, the buyer may be unsure of his own attitude towards risk: how much is he willing to pay upfront for future use of the product? We design approximately optimal posted-pricing-based mechanisms that are robust with regard to information, or lack thereof, about the buyer's value profile and risk attitude.

Individuals working towards a goal often exhibit time inconsistent behavior, making plans and then failing to follow through. One well-known model of such behavioral anomalies is a special form of hyperbolic discounting called present-bias discounting: individuals over-weight present costs by a bias factor. This model explains many time-inconsistent behaviors, but can make stark predictions in many settings: individuals either follow the most efficient plan for reaching their goal or procrastinate indefinitely. We propose a modification in which the present-bias parameter can vary over time, drawn independently each step from a fixed distribution. Following Kleinberg and Oren (2014), we use a weighted task graph to model task planning, and measure the cost of procrastination as the relative expected cost of the chosen path versus the optimal path. We use a novel connection to optimal pricing theory to describe the structure of the worst-case task graph for any present-bias distribution. We then leverage this structure to derive conditions on the bias distribution under which the worst-case ratio is exponential (in time) or constant. We also examine conditions on the task graph that lead to improved procrastination ratios: graphs with a uniformly bounded distance to the goal, and graphs in which the distance to the goal monotonically decreases on any path. Joint work with Nick Gravin, Brendan Lucier, and Manolis Pountourakis.
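
The agent's behavior in this model can be made concrete: at each node of the task graph, the current edge cost is inflated by that step's bias factor, while the rest of the path is evaluated at its true shortest-path cost. The graph encoding and the fixed bias sequence below are illustrative (the talk draws each step's bias at random):

```python
import heapq

def shortest_dist(graph, target):
    """True distance to `target` from every node, via Dijkstra on
    the reversed graph. `graph[v]` maps successor -> edge cost."""
    rev = {}
    for v, nbrs in graph.items():
        for u, c in nbrs.items():
            rev.setdefault(u, {})[v] = c
    dist = {target: 0.0}
    pq = [(0.0, target)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, c in rev.get(u, {}).items():
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(pq, (d + c, v))
    return dist

def present_biased_walk(graph, source, target, biases):
    """Path taken by a present-biased agent: at step i the current
    edge cost is multiplied by biases[i % len(biases)], while the
    remaining distance is evaluated truthfully. Returns the path
    and its true (un-inflated) total cost."""
    dist = shortest_dist(graph, target)
    v, cost, i, path = source, 0.0, 0, [source]
    while v != target:
        b = biases[i % len(biases)]
        u = min(graph[v], key=lambda u: b * graph[v][u] + dist[u])
        cost += graph[v][u]
        v, i = u, i + 1
        path.append(v)
    return path, cost
```

On a graph where a cheap first step leads onto a more expensive route, a bias factor above one lures the agent off the optimal path, and the procrastination ratio in the talk is exactly the expected overpayment of this walk relative to the shortest path.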

Online ads are delivered in a real-time fashion under uncertainty in an environment with strategic agents. Making such real-time (or online) decisions without knowing the future results in challenging stochastic optimization problems for ad selection and dynamic mechanism design problems for repeated auctions. In this talk, I will present a number of recent theoretical models and results in this area inspired by applications in reservation and exchange markets in display advertising.

In particular, after a short introduction, I will first highlight the practical importance of considering “hybrid” models that can take advantage of forecasting for stochastic models and, at the same time, are robust against adversarial changes in the input such as traffic spikes, and discuss our recent results combining stochastic and adversarial input models from recent SODA and EC papers. Then I will present more recent results concerning online bundling schemes that can be applied to repeated auction environments. In particular, we discuss ideas from our recent papers about contract design, online bundling, stateful pricing, bank account mechanisms, and Martingale auctions. We will conclude by stating a number of open problems in this area.

### Friday, September 23rd, 2016

Scheduling jobs is a fundamental problem that arises in numerous forms and in various situations. In this talk, we introduce and study a general scheduling problem that we term the Packing Scheduling problem (PSP). In this problem, jobs can have different arrival times and sizes, and the rates at which a scheduler can process jobs are subject to arbitrary packing constraints. We show that the PSP framework captures a variety of scheduling problems, including the classical problems of heterogeneous machines scheduling, broadcast scheduling, and scheduling jobs of different parallelizability. It also captures scheduling constraints arising in diverse modern environments ranging from multi-core computer architectures to data centers. More concretely, PSP models multidimensional resource and network bandwidth requirements arising in scheduling jobs on shared clusters. We design non-clairvoyant online algorithms for PSP and its special cases: in this setting, the scheduler is unaware of the sizes of jobs, and jobs arrive in an adversarial fashion. For several special cases of PSP, we show constant competitive algorithms for minimizing the standard QoS metrics of total weighted completion time and total delay. Surprisingly, we achieve these results by applying the well-known Proportional Fairness algorithm from the economics literature to allocate resources at each time instant. Proportional Fairness has been extensively studied in the context of maximizing fairness in resource allocation; however, we present the first unifying analysis of it in the context of optimizing job latency. Our results yield the first constant competitive algorithms for the completion time and delay metrics for several classical non-clairvoyant scheduling problems. They showcase the advantage of viewing complex scheduling problems as sequential resource allocation problems, and applying ideas from economics to both perform these allocations and analyze them.
Joint work with Sungjin Im (UC Merced) and Janardhan Kulkarni (Microsoft Research). Appeared in STOC 2014 and FOCS 2015.
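
For a single divisible resource, the Proportional Fairness allocation has a closed form: maximizing $\sum_j w_j \log x_j$ subject to $\sum_j x_j \le C$ gives each unfinished job a rate proportional to its weight. That already yields a small completion-time simulator; the sketch below assumes all jobs arrive at time 0 and only illustrates the allocation rule, not the papers' analysis:

```python
def pf_completion_times(jobs, capacity=1.0):
    """Completion times when one divisible resource is shared by
    proportional fairness: each unfinished job j runs at rate
    capacity * w_j / (total weight of unfinished jobs).

    `jobs` is a list of (size, weight) pairs, all arriving at
    time 0. Returns the completion time of each job."""
    remaining = [size for size, _ in jobs]
    done = [None] * len(jobs)
    t = 0.0
    while any(d is None for d in done):
        alive = [j for j, d in enumerate(done) if d is None]
        wsum = sum(jobs[j][1] for j in alive)
        rates = {j: capacity * jobs[j][1] / wsum for j in alive}
        # advance time until the next job finishes at current rates
        dt = min(remaining[j] / rates[j] for j in alive)
        t += dt
        for j in alive:
            remaining[j] -= rates[j] * dt
            if remaining[j] <= 1e-12:
                done[j] = t
    return done
```

When a job completes, its share is redistributed among the survivors in proportion to their weights, which is the dynamic the competitive analysis in the talk has to control.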

In this talk, I will describe some recent progress made on the problem of minimizing weighted flow-time in the online non-clairvoyant scheduling model, arguably the most general model of machine scheduling. In this problem, a set of jobs $J$ arrive over time to be scheduled on a set of $M$ machines. Each job $j$ has processing length $p_j$, weight $w_j$, and is processed at a rate of $\ell_{ij}$ when scheduled on machine $i$. The online scheduler knows the values of $w_j$ and $\ell_{ij}$ upon arrival of the job, but is not aware of the quantity $p_j$. We present the first online algorithm that achieves a constant competitive ratio for the total weighted flow-time objective. The key algorithmic idea is to let jobs migrate selfishly until they converge to an equilibrium. Towards this end, we define a game in which each job's utility is closely tied to the instantaneous increase in the objective that the job is responsible for, and each machine declares a policy that assigns priorities to jobs based on when they migrate to it, and the execution speeds. This selfish-migration framework has also found applications in designing online algorithms for various other scheduling problems, including energy-efficient scheduling and multidimensional scheduling.

Online algorithms model situations where the input data arrives over time, and the algorithm needs to take decisions without knowing the future input. They are typically analyzed in the framework of competitive analysis -- the performance of such an algorithm is compared against an adversary which knows the entire future beforehand. Although this has been a very fruitful approach, it often leads to pessimistic results because the adversary is much more powerful than the online algorithm. Recently, there have been attempts to evolve alternate ways of analyzing online algorithms which give more power to the online algorithm (as compared to the offline adversary). I will discuss some recent work on models which allow the algorithm to change a small number of decisions taken in the past. Drawing from examples in scheduling and graph algorithms, I will show that one can get interesting results in this “online algorithms with recourse” model.

A famous conjecture by Babaioff, Immorlica, and Kleinberg states that there is a constant competitive algorithm for the Matroid Secretary Problem. Our current best algorithms are, however, quite far from proving this conjecture: they have an $O(\log\log \mathrm{rank})$-competitive ratio, where the rank equals the largest cardinality of a feasible set. In this talk, I give an overview of recent progress on this problem. Apart from the Matroid Secretary Problem, we will also see how this progress led to the development of better algorithms for other online selection problems, such as Bayesian online selection, oblivious posted pricing mechanisms, and stochastic probing models. Finally, I hope to stimulate further research by pointing out several related open problems. In particular, we have a candidate approach for the Matroid Secretary Problem that we so far are unable to analyze.

We present an algorithmic technique for online algorithms on instances that are defined by an adversary but come in random order. In particular, this includes linear packing problems that are defined as follows. The variables of a packing linear program arrive one after the other. We get to know a variable's contributions to the objective function and the constraints only when it arrives. At this point in time, we have to set the respective variable's value immediately and irrevocably. One example of such a problem is online knapsack: objects of different profits and weights arrive one after the other. We have to choose a subset of these objects so as to maximize the profit while keeping the overall weight below a given bound. With random arrival order, it is possible to get close to the optimal offline solution. Using our approach, we obtain a simple algorithm achieving the optimal competitive ratio.
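
The simplest instance of the random-order model is the classic secretary problem, whose 1/e-rule is sketched below as an illustration of the model (it is not the talk's packing-LP algorithm): skip an initial observation phase, then accept the first arrival that beats everything seen so far.

```python
import math
import random

def secretary(values, seed=None):
    """Classic 1/e-rule for the secretary problem in the
    random-order model: skip the first n/e arrivals, then accept
    the first value beating everything seen so far (or the last
    arrival if none does). Selects the overall maximum with
    probability approaching 1/e as n grows."""
    order = list(values)
    random.Random(seed).shuffle(order)  # adversarial set, random order
    cutoff = int(len(order) / math.e)
    best_seen = max(order[:cutoff], default=float("-inf"))
    for v in order[cutoff:]:
        if v > best_seen:
            return v
    return order[-1]
```

The talk's technique generalizes this observe-then-threshold idea from picking one item to setting the variables of an arbitrary packing LP, with the sampled prefix used to estimate good dual prices.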