Ronitt Rubinfeld (MIT)
This talk will be held virtually and will be live streamed on our website. Full participation (including the capacity to ask questions) will be available via Zoom webinar. A link to the Zoom webinar will be shared on this page closer to the event date.
Algorithms for understanding data generated from distributions over large discrete domains are of fundamental importance. In this talk, we consider the sample complexity of property testing algorithms that seek to distinguish whether or not an underlying distribution satisfies basic shape properties. Examples of such properties include convexity, log-concavity, heavy tails, and approximability by k-histogram functions. In this talk, we will focus on the property of monotonicity, as tools developed for distinguishing the monotonicity property have proven to be useful for the all of the above properties as well as several others. We say a distribution p is monotone if for any two comparable elements x<y in the domain, we have that p(x)<p(y). For example, for the classic n-dimensional hypercube domain, in which domain elements are described via n different features, monotonicity implies that for every element, an increase in the value of one of the features can only increase its probability.
We recount the development over the past nearly two decades of monotonicity testing algorithms for distributions over various discrete domains, which make no a priori assumptions on the underlying distribution. We study the sample complexity for testing whether a distribution is monotone as a function of the size of the domain, which can vary dramatically depending on the structure of the underlying domain. Not surprisingly, the sample complexity over high dimensional domains can be much greater than over low dimensional domains of the same size. Nevertheless, for many important domain structures, including high dimensional domains, the sample complexity is sublinear in the size of the domain. In contrast, when no a priori assumptions are made about the distribution, learning the distribution requires sample complexity that is linear in the size of the domain. The techniques used draw tools from a wide spectrum of areas, including statistics, optimization, combinatorics, and computational complexity theory.
If you require accommodation for communication, please contact our Access Coordinator at email@example.com with as much advance notice as possible.