# SGT Theory-to-Practice Seminar

Silvio Lattanzi (Google Research)

Calvin Lab 116

## Large-Scale Graph Mining

The amount of data available and requiring analysis has grown at astonishing rate in recent years. To cope with this deluge of information it is fundamental to design new algorithms to analyze data efficiently. In this talk, we describe our effort to build a large scale graph-mining library. We first describe the general framework and few relevant problems that we are trying to solve. Then we describe few results and open problems on local algorithms for clustering.

We start by describing a new analysis of the Personalized PageRank algorithm for local clustering that show better theoretical guarantee, both in terms of the clustering accuracy and the conductance of the output set. We also prove that our analysis is tight, and perform empirical evaluation to support our theory on both synthetic and real data.

Then we analyze the problem of computing similarity rankings in large-scale multi-categorical bipartite graphs, where the two sides of the graph represent actors and items, and the items are partitioned into an arbitrary set of categories. The problem has several real-world applications, including identifying competing advertisers and suggesting related queries in an online advertising system or finding users with similar interests and suggesting content to them. In these settings, we are interested in computing on-the-fly rankings of similar actors, given an actor and an arbitrary subset of categories of interest. We show a solution to this problem based on the Personalized PageRank algorithm and discuss some related open problems.

Finally we discuss social network clustering, we first observe the discrepancies between current models for community detection and few interesting social network properties. Then we present few practical solutions for the problem and raise some open questions.

Joint work with Alessandro Epasto, Jon Feldman, Stefano Leonardi, Vahab Mirrokni and Zeyuan Allen Zhu.