Fall 2016

Approximate Lifted Inference with Probabilistic Databases

Wednesday, Oct. 5, 2016, 11:50 am – 12:30 pm



Calvin Lab Auditorium

Many state-of-the-art systems that reason over large knowledge bases perform approximate inference by sampling. This talk presents an alternative approach that ranks the answers to hard probabilistic queries approximately, in guaranteed polynomial time, using only the basic operators of existing database management systems (i.e., no sampling required).
(1) The first part of this talk develops upper and lower bounds on the probability of Boolean functions by treating multiple occurrences of a variable as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds (i.e., bounds where the new probabilities are chosen independently of the probabilities of all other variables). These new bounds shed light on the connection between previous relaxation-based and model-based approximations and unify them as concrete choices in a larger design space.
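As an illustration of the idea (a minimal sketch, not the paper's algorithm): take the monotone function f = (x AND y) OR (x AND z), in which x occurs twice. Dissociating x into fresh independent copies x1 and x2 and keeping the original probability on each copy yields an upper bound for this disjunctive case, while choosing copy probabilities with (1-p1)(1-p2) = 1-p yields a lower bound. The example variable names and probabilities below are illustrative.

```python
from itertools import product

def prob(formula, probs):
    """Exact probability of a Boolean formula over independent variables,
    computed by brute-force enumeration of all assignments."""
    vars_ = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(vars_)):
        a = dict(zip(vars_, bits))
        if formula(a):
            w = 1.0
            for v in vars_:
                w *= probs[v] if a[v] else 1 - probs[v]
            total += w
    return total

# f = (x AND y) OR (x AND z): the variable x occurs in both disjuncts.
exact = prob(lambda a: (a["x"] and a["y"]) or (a["x"] and a["z"]),
             {"x": 0.5, "y": 0.5, "z": 0.5})

p = 0.5  # original probability of x
# Upper bound: dissociate x into independent x1, x2 with p1 = p2 = p.
up = prob(lambda a: (a["x1"] and a["y"]) or (a["x2"] and a["z"]),
          {"x1": p, "x2": p, "y": 0.5, "z": 0.5})
# Lower bound: choose p1 = p2 = q with (1-q)^2 = 1-p.
q = 1 - (1 - p) ** 0.5
low = prob(lambda a: (a["x1"] and a["y"]) or (a["x2"] and a["z"]),
           {"x1": q, "x2": q, "y": 0.5, "z": 0.5})

assert low <= exact <= up  # dissociation brackets the true probability
```

Here the exact probability is 0.375, and the two dissociations bracket it from above and below.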
(2) The second part uses these bounds and draws the connection to lifted inference. We will see how approximate probabilistic inference for certain hard probabilistic queries can be reduced entirely to standard query evaluation with aggregates. The resulting approximations are upper bounds on the true probabilities, which our experiments show to be sufficient to rank the top query answers quickly and with high precision.
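To sketch why aggregates suffice (an illustrative example with hypothetical data, not the paper's query plans): once a query has been dissociated, each answer's upper-bound score combines the probabilities of its supporting tuples as if they were independent. The independent-project score 1 - prod(1 - p) can be computed by an ordinary GROUP BY aggregate, e.g. as 1 - EXP(SUM(LN(1 - p))) in SQL, and the answers can then be ranked by that score.

```python
import math

# Hypothetical lineage after dissociation: each query answer is supported
# by tuples whose (dissociated) probabilities are treated as independent.
lineage = {
    "answer_a": [0.9, 0.4],
    "answer_b": [0.5, 0.5, 0.5],
    "answer_c": [0.3],
}

def upper_bound(ps):
    # Independent-project score: 1 - prod(1 - p), computed in log space
    # exactly as a DBMS would via 1 - EXP(SUM(LN(1 - p))).
    return 1 - math.exp(sum(math.log1p(-p) for p in ps))

# Rank answers by their upper-bound scores, highest first.
ranked = sorted(lineage, key=lambda a: upper_bound(lineage[a]), reverse=True)
```

The point is that only standard relational operators and aggregates are needed; no sampling or iterative inference loop appears anywhere.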
(Talk based on joint work with Dan Suciu from TODS 2014, VLDB 2015, and VLDBJ 2016.)