No abstract available.

### Monday, June 3rd, 2019

Driven by major trends such as decarbonization and electrification of vehicles, the electric power sector across the globe is moving to a future where massive digitization and information flow is used to compensate for the resulting reduced operating reserves. Such digitization presents both major opportunities and major dangers. Exploiting the former while combatting the latter will require an appropriate combination of market mechanisms and regulatory frameworks to realize technological solutions that are becoming available. Xie will illustrate the opportunities presented by digitization through a risk-quantifiable, market-clearing mechanism that incorporates massive renewables and aggregates demand response from individual end users. He will also examine the dangers of digitization with a look at cyberattacks on automatic generation control (AGC) in power systems through malicious interception and manipulation of information flow over feedback loops. For both, Xie will present technological solutions. The challenge that arises is the development of an appropriate framework of market mechanisms and regulations so that the needed investments are made by the appropriate stakeholders to implement the technological solutions, and thus realize the potential of the massively digitized power grid.

Motivated by reconfiguration of electricity distribution grids to reduce power losses to due electrical flow, we consider the minimization of a cubic objective over the spanning tree polytope (i.e. radial networks). This corresponds to a supermodular function minimization over a matroid basis constraint, for which there are no known multiplicative approximation algorithms. For the special case of the power loss objective, we provide simple approximation algorithms with provable guarantees for various settings: O(n) approximation using shortest-path trees for general graphs (with n nodes) with general demands, O(\sqrt{n}) approximation for grid graphs (with n nodes) with general demands using the cut structure, and a constant factor approximation for grid graphs with uniform demand. This is on-going joint work with Ali Khodabakhsh, Hassan Mortagy and Evdokia Nikolova.

For many A/B tests, the treatment is a new algorithm or policy (e.g., a matching policy in a ridesharing system, delivery system, or ad auction). These policies change the state of the system, and so classical experimental designs that randomize platform participants to treatment or control are infeasible. Instead, the typical design is a "switchback" experiment: each policy is alternately applied in successive time intervals, and then the performance of each policy is estimated by its sample average across intervals where it was used. Despite their appeal, switchback designs suffer from *temporal interference*: the application of policy A in one interval affects the initial state of the system as seen by policy B in the next interval. Motivated by the need to mitigate temporal interference, in this talk we study optimal experimental design and estimation for comparison of two policies. Our key innovation is to model the problem as one of efficiently estimating the difference of the steady state reward between two different Markov chains on the same state space (the two policies). We completely characterize the optimal experimental design and estimator in this setting, and discuss implications for practice. Joint work with Peter Glynn and Mohammad Rasouli.

I will present two projects at the intersection of algorithms and mechanism design with power systems that were conceived or developed in part during the Simons Real-Time Decision-Making semester in spring 2018. One is "Electric Vehicle Valet" on how to utilize electric vehicles (EVs) as mobile energy storage to help with grid support, that prompted us to develop several scheduling approximation algorithms. The second one is "Prosumer Pricing" which presents a new electricity pricing scheme that allows utility companies to more fairly divide their internal overhead costs among traditional and solar (or other) power - generating consumers, and thus better control the level of penetration of solar power. Joint work with Jimmy Horn, Ali Khodabakhsh, Jannik Matuschke and Manolis Pountourakis

This talk investigates Lagrangian (mobile) control of traffic flow at large scale (city-wide, with fluid flow models) and local scale (vehicular level).

For large scale inference and control, fluid flow models over networks are considered. Algorithms relying on convex optimization are presented for fusion of static and mobile (Lagrangian) traffic information data. Repeated game theory is used to characterize the stability such flows under selfish information patterns (each flow attempting to optimize their latency). Convergence to Nash equilibria of the solutions is presented, leading to control strategies to optimize the network efficiency.

At local scale, the question of how will self-driving vehicles will change traffic flow patterns is investigated. We describe approaches based on deep-RL presented in the context of enabling mixed-autonomy mobility. The talk explores the gradual and complex integration of automated vehicles into the existing traffic system. We present the potential impact of a small fraction of automated vehicles on low-level traffic flow dynamics, using novel techniques in model-free deep reinforcement learning, in which the automated vehicles act as mobile controllers to traffic flow.

Illustrative examples will be presented in the context of a new open-source computational platform called FLOW, which integrates state of the art microsimulation tools with deep-RL libraries on AWS EC2. Interesting behavior of mixed autonomy traffic will be revealed in the context of emergent behavior of traffic: https://flow-project.

The Weighted Tree Augmentation problem (WTAP) is a fundamental problem in network design. In this talk, we consider this problem in the online setting. We are given an n-vertex spanning tree T and an additional set L of edges (called links) with costs. Then, terminal pairs arrive one-by-one and our task is to maintain a low-cost subset of links F such that every terminal pair that has arrived so far is 2-edge-connected in T \cup F. This online problem was first studied by Gupta, Krishnaswamy and Ravi (SICOMP 2012) who used it as a subroutine for the online survivable network design problem. They gave a deterministic O(\log^2 n)-competitive algorithm and showed an \Omega(\log n) lower bound on the competitive ratio of randomized algorithms. We give a deterministic O(\log n)-competitive algorithm for online WTAP, which is tight up to constant factors. This is joint work with Seffi Naor and William Umboh.

The data rate at the Large Hadron Collider is so high that most events are discarded in real time. I will discuss algorithms for this decision making at the earliest stages of this trigger system. These algorithms have time severe constraints and also must be able to handle partial or missing data. The more effective these real time algorithms are, the better our chances will be to uncover fundamental properties of nature.

### Tuesday, June 4th, 2019

Soon before arriving to Berkeley in 2018, we discovered that the Q-learning algorithm of Watkins has infinite variance (as defined in the Central Limit Theorem), and obtained methods to speed up convergence and even optimize the variance by modifying the gain in the recursion. An optimal algorithm that we call Zap Q-learning involves a matrix inverse that is similar to what appears in the classical Newton Raphson algorithm. Stability was established only for the so-called tabular case.

Big surprises came in 2019: the analysis in 2018 admits an extremely broad generalization. Zap stochastic approximation (SA) algorithms are stable and have approximately optimal variance under minimal assumptions. This has significant implications to reinforcement learning.

The lecture will lay out design principles for reinforcement learning based on these discoveries, and building on the "Hidden Theory" tutorial from 2018. Two implications to RL will be discussed in the lecture:

(i) It is known that the natural generalization of Watkins' Q-learning algorithm is not always stable in function approximation settings. Parameter estimates may diverge to infinity even in the linear function approximation setting with a simple finite state-action model. The new Zap Q-learning algorithm is stable and convergent under minimal assumptions, even in the case of nonlinear function approximation.

(ii) The GQ algorithm of Maei et. al. 2010 was designed to address the stability challenge. However, just as in Watkins' algorithm, convergence can be absurdly slow. Theory for Zap GQ is not entirely complete, but initial numerical experiments are promising.

Based on joint research with Adithya Devraj and Shuhang Chen, UF and Ana Busic, Inria Paris

As you learned in the previous talk, stochastic approximation (SA) algorithms are prone to slow convergence, which was explained by the fact that the asymptotic (CLT) variance is typically infinite. This was one of the main motivations for the introduction of the Zap Q-learning algorithm: a two time-scale version of Stochastic Newton-Raphson that has optimal variance.

While the Zap algorithm enjoys fast convergence, each iteration involves a matrix inversion, which is obviously undesirable. This motivated us to look at alternatives, resulting in two new classes of algorithms:

(i) The PolSA algorithm that is based on Polyak’s momentum technique, but with a specialized matrix momentum, and

(ii) The NeSA algorithm based on Nesterov’s acceleration technique.

Analysis requires entirely new analytic techniques, since the algorithms fall far outside the classical SA framework of Robbins and Monro. An approach taken in our new work is via coupling: conditions are established under which the parameter estimates obtained using the PolSA algorithm couple with those obtained using a Newton-Raphson SA algorithm (a cousin of Zap).

Since the 1980s, mean-field models have been used as a tool for analysis of distributed control of flexible loads. This talk surveys recent work on the optimization of mean-field dynamics. The focus is on a Kullback-Leibler-Quadradic (KLQ) optimal control formulation for the Demand Dispatch problem, i.e. creating virtual energy storage from flexible electric loads. The grid balancing authority simply broadcasts the desired aggregate power consumption target signal, and the heterogeneous population of loads ramps power consumption up and down to accurately track the signal. Analysis of the Lagrangian dual of the KLQ optimization problem leads to a menu of solution options, and expressions of the gradient and Hessian suitable for Monte-Carlo-based optimization.

Motivated by applications in blockchains and sensor networks, we consider a model of $n$ nodes trying to reach consensus on their majority bit. Each node $i$ is assigned a bit at time zero, and is a finite automaton with m bits of memory (i.e., $2^m$ states) and a Poisson clock. When the clock of $i$ rings, $i$ can choose to communicate, and is then matched to a uniformly chosen node $j$. The nodes $j$ and $i$ may update their states based on the state of the other node. Previous work has focused on minimizing the time to consensus and the probability of error, while our goal is minimizing the number of communications. We show that when $m > 3\log\ log \log(n)$, consensus can be reached at linear communication cost, but this is impossible if $m < \log\log\log(n)$. We also study a synchronous variant of the model, where our upper and lower bounds on m for achieving linear communication cost are $2\log\log\log(n)$ and $log\log\log(n)$, respectively. A key step is to distinguish when nodes can become aware of knowing the majority bit and stop communicating. We show that this is impossible if their memory is too low. This is joint work with Giulia Fanti, Nina Holden and Yuval Peres.

We present efficient randomized algorithms for combining observations or scores generated privately and independently by many agents. The agents may be telescopes generating astronomical images, census-takers measuring population density, sensors observing earthquake tremors, or voters in an election or competition, and the goal may be to identify the most significant events, observations or local concentrations or hot spots In a body of data.

Reinforcement Learning (RL) problems for continuous state and action space systems is quite challenging. Recently, deep reinforcement learning methods have been shown to be quite effective for certain RL problems in settings of very large/continuous state and action spaces. But such methods require extensive hyper-parameter tuning, huge amount of data, and come with no performance guarantees. We note that such methods are mostly trained `offline’ on experience replay buffers. In this talk, I will describe a series of simple reinforcement learning schemes for various settings. Our premise is that we have access to a generative model that can give us simulated samples of the next state. We will start with finite state and action space MDPs. An `empirical value learning’ (EVL) algorithm can be derived quite simply by replacing the expectation in the Bellman operator with an empirical estimate. We note that the EVL algorithm has remarkably good numerical performance for practical purposes. We next extend this to continuous state spaces by considering randomized function approximation on a reproducible kernel Hilbert space (RKHS). This allows for arbitrarily good approximation with high probability for any problem due to its universal function approximation property. Next, we consider continuous action spaces. In each iteration of EVL, we sample actions from the continuous action space, and take a supremum over the sampled actions. Under mild assumptions on the MDP, we show that this performs quite well numerically, with provable performance guarantees. Finally, we consider the `Online-EVL’ algorithm that learns from a trajectory of state-action-reward sequence. Under mild mixing conditions on the trajectory, we can provide performance bounds and also show that it has competitive (and in fact slightly superior) performance as compared to the Deep Q-Network algorithm on a benchmark RL problem. I will conclude by a brief overview of the framework of probabilistic contraction analysis of iterated random operators that underpins the theoretical analysis.

This talk is based on work with a number of people that includes Hiteshi Sharma, William Haskell, and Dileep Kalathil.

Many real-time sequential decision making problems can be modeled as online optimal control problems with a look-ahead window of predictions. In this talk, we focus on online optimal control problems with time-varying convex cost functions and linear time-invariant dynamical system. We consider a finite prediction window of the future cost functions and focus on dynamic regret (aka, competitive difference) analysis. We show how the prediction improves the regret for any given online algorithm without using prediction by conducting additional W step of gradient calculations. We also characterize a fundamental lower bound of dynamic regrets for any online algorithm. The lower bound and the upper bound of the designed online algorithm almost match each other, implying that our algorithm achieves a near-optimal performance with respect to dynamic regrets.

### Wednesday, June 5th, 2019

Coflows are an emerging network abstraction introduced to model traffic patterns within a datacenter. This abstraction naturally leads to several scheduling optimization problems, and we consider, in particular, optimizing for the sum of weighted completion times. There has been recent work from both the theory and systems communities on this scheduling problem, leading to either practical solutions that perform well in practice and are easy to implement, or theoretical solutions with bounded performance guarantees, albeit not both. In this work, we bridge this gap, and present Sincronia, a network design that matches the best known approximation guarantee, while empirically outperforming the existing state-of-the-art network designs. Sincronia achieves this using a key technical result ? we show that given a ?right? ordering of coflows, any per-flow rate allocation mechanism achieves average coflow completion time within a factor of 4 of the optimal as long as (co)flows are prioritized with respect to the ordering. We obtain this right ordering of coflows using a primal-dual algorithm. This is joint work with Saksham Agarwal, Shijin Rajakrishnan, Akshay Narayan, Rachit Agarwal, and Amin Vahdat.

We describe Self-Programming Networks (SPNs), an ongoing research effort at Stanford for making cloud computing networks autonomous; that is, to enable the networks to sense and monitor themselves, and program and control themselves. We present two key algorithms: (i) Simon, for fine-grained network measurement using packet and probe timestamps taken at the network?s edge, and (ii) Huygens, for nanosecond-level clock synchronization in real-time and at scale. Simon solves an inverse problem (first formulated in the context of Network Tomography) while Huygens uses a combination of inference techniques to achieve network clock synchronization. We describe the relevance of this work to existing financial trading infrastructure, and how, in future, it enables "financial exchanges in the Cloud."

As ridesharing firms become bigger and more popular, the focus for researchers shifts from optimizing a single platform to understanding the effects of multiple platforms competing with other, and with other transportation modes, in particular, public transit. I will present some of our work in tackling these questions - in particular, in trying to understand the operational losses due to multiple ridesharing platforms, and the challenges of designing a mobility marketplace: a centralized platform where private mobility providers and public transit can jointly offer hybrid multi-modal trips to commuters. Based on joint work with Chamsi Hssaine, Thibault Sejourne, Raga Gopalakrishnan and Samitha Samaranayake.

Increasing penetration of highly variable components such as solar generation and electric vehicle charging loads pose significant challenges to keeping three-phase loads balanced in modern distribution systems. Failure to maintain balance across three phases would lead to asset deterioration and increasing delivery losses. Motivated by the real-world needs to automate and optimize the three-phase balancing decision making, this paper introduces a robust look-ahead optimization framework that pursues balanced phases in the presence of demand-side uncertainties. We show that look-ahead moving window optimization can reduce imbalances among phases at the cost of a limited number of phase swapping operations. Case studies quantify the improvements of the proposed methods compared with conventional deterministic phase balancing. Discussions on possible benefits of the proposed methods and extensions are presented.

Mobility systems featuring shared vehicles are often unable to serve all potential customers, as the distribution of demand does not coincide with the positions of vehicles at any given time. System operators often choose to reposition these shared vehicles (such as bikes, cars, or scooters) actively during the course of the day to improve service rate. They face a complex dynamic optimization problem in which many integer-valued decisions must be made, using real-time state and forecast information, and within the tight computation time constraints inherent to real-time decision-making. We first present a novel nested-flow formulation of the problem, and demonstrate that its linear relaxation is significantly tighter than one from existing literature. We then adapt a two-stage stochastic approximation scheme from the generic SPAR method of Powell et al., in which rebalancing plans are optimized against a value function representing the expected cost (in terms of fulfilled and unfulfilled customer demand) of the future evolution of the system. The true value function is approximated by a separable function of contributions due to the rebalancing actions carried out at each station and each time step of the planning horizon. The new algorithm is suited to real-time use, as it typically generates high-quality solutions in tens of iterations and can be truncated to fit within available computation time. We provide insight into this good performance by examining the mathematical properties of our new flow formulation, and demonstrate the algorithm on synthetic and real networks.

Real-time decision-making is often very complicated since decisions of today may negatively affect the future, by leading the underlying system in an undesired state. There are two prominent ways to try to plan against such undesired outcomes. On the one hand, the main machine learning and computer systems approach is to extrapolate patterns in the data and obtain useful predictions about the future. On the other hand, theoretically competitive online algorithms hedge their decisions against any possible future event to achieve robust worst-case performance guarantees. However, both of these approaches have important shortcomings: the first often lacks principled guarantees and can be prone to errors or adversarial examples, while the second is often tailored towards the worst case and, as a result, cannot meaningfully use existing patterns in data.

In this talk, I will tackle this mismatch by aiming for the best of both worlds: using the predictive power of machine learning while guaranteeing the worst-case robustness of online algorithms. I will focus on the paging or caching problem, which exemplifies the above mismatch as we have a) many useful predictions from heuristics or deep learning with weak theoretical guarantees, while also having b) pessimistic algorithms with robust worst-case guarantees. I will show a way to slightly adapt a classical competitive online algorithm (Marker) to incorporate predictions in a way to get leverage when they are relatively accurate while also retaining the worst-case competitiveness. This is a surprisingly powerful technique that, on the side, can enable to robustify classical heuristics such as LRU, that tend to perform well in practice without having nice worst-case guarantees.