Abstract

For a fixed Boolean query Q, the *probabilistic query evaluation problem for Q*
(written PQE(Q)) is to compute, given as input a tuple-independent
probabilistic database (TID) D, the probability that D satisfies Q. In a
celebrated JACM'2013 paper, Dalvi and Suciu established the following dichotomy
theorem: for any (Boolean) union of conjunctive queries (UCQ) Q, PQE(Q) is
either solvable in polynomial time or is #P-hard. The UCQs for which the
problem is in PTIME are called *safe*. Dalvi and Suciu's algorithm on such a
safe query relies essentially on the following three rules: 
 1. Independence:  Pr(A and B) = Pr(A) x Pr(B), when A and B are independent
    events
 2. Negation: Pr(A) = 1 - Pr(not A)
 3. Inclusion-Exclusion: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B), generalized
    to more than two events.

In parallel, another method to obtain polynomial-time algorithms for the PQE
problem is through *knowledge compilation*: here one first compiles the
provenance of a query Q on a TID D as a Boolean circuit or diagram from the
field of knowledge compilation (e.g., OBDDs, FBDDs, d-DNNFs, etc), and then
uses this circuit to compute the probability. At a high-level, this type of
algorithm makes use of the following three rules, restricted in diverse ways
depending on the type of circuit under consideration:
 1. Independence: just as above
 2. Negation: just as above
 3. Disjoint union: Pr(A or B) = Pr(A) + Pr(B) when A and B are disjoint event

This naturally leads to the following question: letting Q be a safe UCQ, can
the tractability of PQE(Q) be captured with the knowledge compilation approach?
This question, called the intensional-extensional problem by Dalvi and Suciu in
that same JACM paper, can be studied independently for every circuit class from
knowledge compilation, and, for instance, the safe UCQs that can be handled
with OBDDs are entirely characterized. However, the question is still open for
more powerful circuit classes such as d-DNNFs or d-Ds.

In this talk I will talk about this problem, present a technique that allowed
handle a specific class of UCQs, and discuss our ongoing work on the
problem. In particular, I will present a neat combinatorial conjecture, that we
named the "non-cancelling intersections" conjecture, that talks only about sets
and the so-called Möbius function (i.e., no databases, no queries, no complexity).

This talk is based on ongoing work with Antoine Amarilli, Louis Jachiet, and
Dan Suciu. Slides: TBA. The paper solving the conjecture for the specific class of queries:
https://arxiv.org/abs/1912.11864. A draft on the non-cancelling intersections
conjecture: TBA.

Note: If you watch the recorded talk, it might be useful to first watch Dan
Suciu's talk here
https://simons.berkeley.edu/talks/dan-suciu-university-washington-2023-10-16,
depending on your familiarity with the topic.

Attachment

Video Recording