Today’s computing systems consist of servers that offer highly heterogeneous service rates due to the diverse machine types and the dynamic computing environment. In such systems, to guarantee low latency, load-balancing algorithms need to consider not only the amount of work on servers but also their service rates. However, because of the dynamic nature of the computing environment, the service rates cannot be obtained beforehand and thus have to be learned on the fly. In this talk, we consider a load-balancing system that consists of multiple servers with unknown service rates. Each incoming job needs to be assigned to one of the servers upon arrival. Our goal is to develop learning-integrated job-assigning algorithms that perform comparably to their counterparts under known service rates. Although some facets of this problem resemble the well-studied multi-armed bandit problem, our findings reveal that the exploration component is fundamentally different. In particular, we develop a meta-algorithm that integrates learning into a large class of load-balancing algorithms. The proposed algorithm uses almost no exploration, yet it achieves a constant regret in the accumulated latency of jobs. A similar free exploration phenomenon has been observed in a prior paper, but in a different setting with a single server. This is joint work with Yifei Huang and Zhiyuan Tang, both from Tsinghua University.