Abstract
Generalized matching problems arise in different research areas of computer science, including computational advertising, recommender systems, and trade markets. Consider, for example, the problem of recommending multimedia items (e.g., DVDs) to users such that (1) users are recommended items that they are likely to be interested in, (2) every user gets neither too few nor too many recommendations, and (3) only items available in stock are recommended to users. In practice, instances of this and related problems may involve millions of users and items; traditional matching algorithms fail at such scales. In this paper, we propose a distributed algorithm for approximately solving large-scale generalized matching problems like the one above. Our algorithm is based on linear programming and is designed to run on a small cluster of commodity nodes (it is also amenable to MapReduce). In more detail, we develop a distributed approximation algorithm for a broad class of problems, i.e., mixed packing-covering linear programs. Our algorithm has strong approximation guarantees and requires only a poly-logarithmic number of passes over the data. We also propose a distributed rounding algorithm that transforms the fractional solution of the linear program into a near-optimal integral solution. Experiments on real-world and synthetic data suggest that our algorithm scales to very large problem sizes and can outperform alternative approaches by multiple orders of magnitude.