Fall 2016

Monotone Estimation Framework and its Applications for Scalable Analytics of Large Data Sets

Friday, Nov. 18, 2016 11:20 am12:00 pm PST

Add to Calendar

Monotone estimation problems (MEP) have a simple formulation: We have a data point $x$ from a universe $D$ and we are interested in estimating $f(x)$ for a nonnegative function $f$. However, we do not observe $x$ but instead see the output $S(x,u)$ of a blurry scope applied to $x$ with resolution $u \sim U[0,1]$: The lower $u$ is, the more information we have on $x$ and hence on $f(x)$. We are interested in estimators that are unbiased, nonnegative, and optimal (admissible).

In this talk, we will characterize feasibility, present constructions of estimators, and motivate the study of MEPs. One important application of MEPs is coordinated weighted sampling. Sample coordination was introduced decades ago (Brewer et al 1972) for efficient surveys of populations that change over time and later applied for scalable summarization and analytics of large data sets (it is an instance of LSH and MinHash sketches are a special case). Coordinated samples are effective summaries of matrix data (logs, transactions, feature vectors collected across time or servers), generalized coverage functions, and graph-structure kernels. We will see that estimators for some fundamental statistics can be expressed as a sum of MEPs: Change in activity across time or servers, similarity and influence of nodes in a network, and coverage of set of items. The quality of our MEP estimators is critical to the overall performance tradeoffs of these applications.