Abstract

For a fixed Boolean query Q, the *probabilistic query evaluation problem for Q*

(written PQE(Q)) is to compute, given as input a tuple-independent

probabilistic database (TID) D, the probability that D satisfies Q. In a

celebrated JACM'2013 paper, Dalvi and Suciu established the following dichotomy

theorem: for any (Boolean) union of conjunctive queries (UCQ) Q, PQE(Q) is

either solvable in polynomial time or is #P-hard. The UCQs for which the

problem is in PTIME are called *safe*. Dalvi and Suciu's algorithm on such a

safe query relies essentially on the following three rules: 

 1. Independence:  Pr(A and B) = Pr(A) x Pr(B), when A and B are independent

    events

 2. Negation: Pr(A) = 1 - Pr(not A)

 3. Inclusion-Exclusion: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B), generalized

    to more than two events.



In parallel, another method to obtain polynomial-time algorithms for the PQE

problem is through *knowledge compilation*: here one first compiles the

provenance of a query Q on a TID D as a Boolean circuit or diagram from the

field of knowledge compilation (e.g., OBDDs, FBDDs, d-DNNFs, etc), and then

uses this circuit to compute the probability. At a high-level, this type of

algorithm makes use of the following three rules, restricted in diverse ways

depending on the type of circuit under consideration:

 1. Independence: just as above

 2. Negation: just as above

 3. Disjoint union: Pr(A or B) = Pr(A) + Pr(B) when A and B are disjoint event



This naturally leads to the following question: letting Q be a safe UCQ, can

the tractability of PQE(Q) be captured with the knowledge compilation approach?

This question, called the intensional-extensional problem by Dalvi and Suciu in

that same JACM paper, can be studied independently for every circuit class from

knowledge compilation, and, for instance, the safe UCQs that can be handled

with OBDDs are entirely characterized. However, the question is still open for

more powerful circuit classes such as d-DNNFs or d-Ds.



In this talk I will talk about this problem, present a technique that allowed

handle a specific class of UCQs, and discuss our ongoing work on the

problem. In particular, I will present a neat combinatorial conjecture, that we

named the "non-cancelling intersections" conjecture, that talks only about sets

and the so-called Möbius function (i.e., no databases, no queries, no complexity).



This talk is based on ongoing work with Antoine Amarilli, Louis Jachiet, and

Dan Suciu. Slides: TBA. The paper solving the conjecture for the specific class of queries:

https://arxiv.org/abs/1912.11864. A draft on the non-cancelling intersections

conjecture: TBA.



Note: If you watch the recorded talk, it might be useful to first watch Dan

Suciu's talk here

https://simons.berkeley.edu/talks/dan-suciu-university-washington-2023-10-16,

depending on your familiarity with the topic.

Attachment

Video Recording