Prior knowledge—whether from historical data, domain expertise, or predictive models—can enhance statistical inference by reducing sample complexity. We introduce a general framework for leveraging such predictions and apply it to hypothesis testing for discrete distributions. In the standard setting, optimal sample complexity bounds are known for identity, closeness, and uniformity testing. We show that access to a predicted distribution can reduce the required samples, with gains determined by its total variation distance from the true distribution. Our algorithms are adaptive, adjusting to prediction accuracy without prior calibration, and robust, never exceeding the standard sample complexity when predictions are uninformative. We establish information-theoretic lower bounds confirming optimality and present experimental results demonstrating significant practical improvements.
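The abstract gives no pseudocode, so the following Python sketch is purely illustrative (the function names `tv_distance` and `empirical_distribution` are hypothetical). It shows the quantity that governs the gains described above: the total variation distance between a predicted distribution and an empirical histogram of samples, which a prediction-aware tester could exploit when it is small.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite domain."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def empirical_distribution(samples, domain_size):
    """Empirical histogram of integer samples in {0, ..., domain_size - 1}."""
    counts = np.bincount(samples, minlength=domain_size)
    return counts / counts.sum()

# Toy usage: a prediction close to the truth yields a small empirical TV distance,
# the regime in which a prediction-aware tester can save samples.
rng = np.random.default_rng(0)
true_dist = np.array([0.4, 0.3, 0.2, 0.1])
prediction = np.array([0.38, 0.32, 0.2, 0.1])
samples = rng.choice(4, size=2000, p=true_dist)
print(tv_distance(empirical_distribution(samples, 4), prediction))
```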
Shyamal Patel
Title: A Mysterious Connection Between Tolerant Junta Testing and Agnostically Learning Conjunctions
Abstract: In this talk we discuss a curious connection between agnostically learning conjunctions and tolerant junta testing. Inspired by this connection, we show improved bounds for both problems.
Guy Blanc
Title: Instance-Optimal Uniformity Testing and Tracking
Abstract: In the uniformity testing task, an algorithm is provided with samples from an unknown probability distribution over a (known) finite domain and must decide whether it is the uniform distribution or, alternatively, whether its total variation distance from uniform exceeds some input distance parameter. This question has received a significant amount of interest, and its complexity is, by now, fully settled. Yet we argue that it fails to capture many scenarios of interest, and that its very definition as a gap problem in terms of a prespecified distance may lead to suboptimal performance.
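For concreteness, here is a minimal Python sketch of the classical collision-based uniformity tester for the gap problem just described. This is the standard baseline, not the instance-optimal algorithm of the talk, and the threshold constant is a hedged choice rather than a tuned one.

```python
import numpy as np
from itertools import combinations

def collision_uniformity_test(samples, domain_size, epsilon):
    """Classical collision tester: count equal pairs among the samples.

    Under the uniform distribution the collision probability is exactly 1/n;
    if the distribution is epsilon-far from uniform in TV distance, it is at
    least (1 + 4*epsilon**2)/n. We threshold at a midpoint between the two.
    Pair enumeration is O(m^2); fine for small toy inputs.
    """
    m = len(samples)
    collisions = sum(1 for a, b in combinations(samples, 2) if a == b)
    rate = collisions / (m * (m - 1) / 2)
    threshold = (1 + 2 * epsilon**2) / domain_size  # hedged constant choice
    return rate <= threshold  # True = "accept uniform"

# Toy usage: uniform samples over a domain of size 100 should be accepted.
rng = np.random.default_rng(0)
uniform_samples = rng.integers(100, size=200)
print(collision_uniformity_test(uniform_samples, 100, epsilon=0.5))
```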
To address these shortcomings, we introduce the problem of uniformity tracking, whereby an algorithm is required to detect deviations from uniformity (however they may manifest themselves) using as few samples as possible, and to be competitive against an optimal algorithm that knows the distribution profile in hindsight. Our main contribution is a polylog(opt)-competitive uniformity tracking algorithm. We obtain this result by leveraging new structural results on Poisson mixtures, which we believe to be of independent interest.
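The competitive tracking algorithm itself is not described in the abstract; as a hedged illustration of the problem's flavor only, the following naive Python sketch keeps doubling its sample budget and re-tests at geometrically shrinking distance parameters until it flags a deviation. It is a toy baseline, not the polylog(opt)-competitive algorithm of the talk.

```python
import numpy as np

def track_uniformity(sample_stream, domain_size, max_rounds=8):
    """Naive tracking baseline: grow the budget and re-test each round.

    Round k draws about sqrt(n) / eps_k**2 samples with eps_k = 2**-k, so a
    deviation at TV distance eps is flagged after roughly the sample budget
    an eps-uniformity tester would need at that scale.
    """
    seen = []
    for k in range(1, max_rounds + 1):
        eps = 2.0 ** -k
        budget = int(4 * domain_size ** 0.5 / eps ** 2)
        while len(seen) < budget:
            seen.append(next(sample_stream))
        m = len(seen)
        counts = np.bincount(seen, minlength=domain_size)
        collisions = int((counts * (counts - 1) // 2).sum())
        expected = m * (m - 1) / (2 * domain_size)  # mean under uniformity
        if collisions > expected * (1 + eps ** 2):
            return m  # deviation flagged after m samples
    return None  # no deviation detected within the budget

# Toy usage: a genuinely uniform stream over {0, ..., 7} should usually return None.
rng = np.random.default_rng(2)
stream = iter(lambda: int(rng.integers(8)), None)
print(track_uniformity(stream, domain_size=8, max_rounds=6))
```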
We study monotonicity testing of high-dimensional distributions on {−1,1}^n in the model of subcube conditioning, suggested and studied by Canonne, Ron, and Servedio [CRS15] and Bhattacharyya and Chakraborty [BC18]. Previous work shows that the sample complexity of monotonicity testing must be exponential in n (Rubinfeld and Vasilian [RV20]; Aliakbarpour, Gouleakis, Peebles, Rubinfeld, and Yodpinyanee [AGPRY19]). We show that the subcube query complexity is Θ̃(n/ε²), by proving nearly matching upper and lower bounds. Our work is the first to use directed isoperimetric inequalities (developed for function monotonicity testing) to analyze a distribution testing algorithm. Along the way, we generalize an inequality of Khot, Minzer, and Safra [KMS18] to real-valued functions on {−1,1}^n. We also study uniformity testing of distributions that are promised to be monotone, a problem introduced by Rubinfeld and Servedio [RS09], using subcube conditioning. We show that the query complexity is Θ̃(√n/ε²). Our work proves the lower bound, which matches (up to polylogarithmic factors) the uniformity testing upper bound for general distributions (Canonne, Chen, Kamath, Levi, and Waingarten [CCKLW21]). Hence, we show that monotonicity does not help, beyond logarithmic factors, in testing uniformity of distributions with subcube conditional queries.
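Subcube conditioning is easiest to see in code. The hypothetical Python sketch below implements the conditional oracle for an explicitly stored distribution on {−1,1}^n (exponential-size storage, so purely illustrative): the tester fixes some coordinates, and the oracle returns a sample from the distribution conditioned on that subcube.

```python
import numpy as np
from itertools import product

def subcube_conditional_sample(probs, restriction, rng):
    """Sample from a distribution over {-1,1}^n conditioned on a subcube.

    probs: dict mapping each point (a tuple of +/-1) to its probability.
    restriction: dict {coordinate index: forced value in {-1, +1}};
                 unrestricted coordinates remain free.
    """
    support = [x for x in probs
               if all(x[i] == v for i, v in restriction.items())]
    weights = np.array([probs[x] for x in support])
    if weights.sum() == 0:
        raise ValueError("subcube has zero probability")
    return support[rng.choice(len(support), p=weights / weights.sum())]

# Toy usage on n = 3: condition on the subcube {x : x_0 = +1}.
rng = np.random.default_rng(1)
points = list(product([-1, 1], repeat=3))
probs = {x: 1 / len(points) for x in points}  # uniform, for illustration
print(subcube_conditional_sample(probs, {0: +1}, rng))
```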
In his classic result, Vizing (1964) proved that any graph of maximum degree Δ can be edge colored using at most Δ+1 different colors. Vizing’s original proof is algorithmic and shows that such an edge coloring can be found in O(mn) time, where m and n denote the number of edges and vertices, respectively. This was subsequently improved to Õ(m√n) independently by [Arjomandi ’82; Gabow et al. ’85]. That bound remained the state of the art for four decades and was only recently improved to Õ(min{n², mn^{1/4}}) [Assadi ’24; Bhattacharya et al. ’24] using randomization. In this talk, I will present a completely new approach to edge coloring that leads to the first near-linear time algorithm for Vizing’s theorem, running in O(m log Δ) time.
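The new algorithm is not described in the abstract; as a point of contrast, here is a simple greedy baseline in Python that assigns each edge the smallest color free at both endpoints. It uses at most 2Δ−1 colors rather than the Δ+1 guaranteed by Vizing's theorem, which illustrates why achieving Δ+1 efficiently is the interesting question.

```python
from collections import defaultdict

def greedy_edge_coloring(edges):
    """Greedy baseline: give each edge the smallest color free at both endpoints.

    Each endpoint has at most Delta - 1 other incident edges, so at most
    2*Delta - 2 colors are blocked and color 2*Delta - 2 is always available,
    i.e., at most 2*Delta - 1 colors are used overall.
    """
    used = defaultdict(set)  # vertex -> colors already used on incident edges
    coloring = {}
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return coloring

# Toy usage: a 4-cycle plus a chord.
print(greedy_edge_coloring([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))
```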
Nipun Amarasinghe is currently a second-year mathematics graduate student at Texas A&M University advised by Dr. J. M. Landsberg. Nipun's research interests are in algebraic geometry and its applications to the complexity of matrix multiplication and more...
Hanna Komlós is a PhD student at New York University advised by Martín Farach-Colton. Her research is primarily in data structures, particularly randomized and external-memory data structures and fundamental combinatorial primitives with data-structural...
Clifford Stein is the Wei T. Chang Professor of Industrial Engineering and Operations Research and of Computer Science at Columbia University. He is a fellow of the ACM and the former Director of the Data Science Institute at Columbia University. He...