This is the weekly seminar of the Logic and Algorithms in Database Theory and AI program.

All talks can be followed with the zoom link https://berkeley.zoom.us/my/simonszoom2.

If you want to give a seminar please contact Benjie Wang or Mikaël Monet.

## All scheduled dates:

### Upcoming

No Upcoming activities yet

### Past

TITLE: Challenges in Building a Provenance-Aware Database Management System

We describe ProvSQL, software currently being developed in our team that adds support for provenance management and probabilistic query evaluation to a regular database management system, PostgreSQL. After recalling how different forms of provenance are computed over relational databases, and giving some basics of probabilistic query evaluation, we discuss research and engineering challenges in implementing techniques from the scientific literature on provenance management and probabilistic databases within a major database management system: use of circuits and knowledge compilation tools to compactly represent provenance and compute probabilities; exploitation of the structural properties of the queries or the database; interaction between provenance computation and query optimization; scaling up; development of application areas; etc.

**TITLE**: On Computing Probabilistic Explanations for Decision Trees** **

**LOCATION: **Calvin Lab Auditorium

Formal XAI (explainable AI) is a growing area that focuses on computing explanations with mathematical guarantees for the decisions made by ML models. Inside formal XAI, one of the most studied cases is that of explaining the choices taken by decision trees, as they are traditionally deemed as one of the most interpretable classes of models. Recent work has focused on studying the computation of sufficient reasons, a kind of explanation in which given a decision tree T and an instance x, one explains the decision T(x) by providing a subset y of the features of x such that for any other instance z compatible with y, it holds that T(z) = T(x), intuitively meaning that the features in y are already enough to fully justify the classification of x by T. It has been argued, however, that sufficient reasons constitute a restrictive notion of explanation. For such a reason, the community has started to study their probabilistic counterpart, in which one requires that the probability of T(z) = T(x) must be at least some value δ, where z is a random instance that is compatible with y. We show that finding δ-sufficient-reasons that are minimal inclusion-wise does not admit polynomial-time algorithms (unless P = NP). This is in stark contrast with the deterministic case (δ=1) where inclusion-wise minimal sufficient-reasons are easy to compute.

This is joint work with: Marcelo Arenas, Miguel Romero and Bernardo Subsercaseaux

This talk can also be followed with the zoom link https://berkeley.zoom.us/my/simonszoom2.

**TITLE:** Acyclity and Notions of "Width" of Hypergraphs

Answering Conjunctive Queries (CQs) and, likewise solving Constraint Satisfaction Problems (CSPs) are classical NP-complete problems. The underlying structure of CQs and CSPs is captured by hypergraphs and it is well known that both problems become tractable if the underlying hypergraph is acyclic. However, acyclicity is a strong restriction. Consequently, several generalizations of acyclicity have been studied - each coming with some notion of width that measures the degree of cyclicity.Starting from acyclicity, some basic definitions and computational properties of various notions of "width" of hypergraphs will be presented in this talk: treewidth, hypertree width, generalized hypertree width, and fractional hypertree width. Time permitting, also adaptive width and submodular will be briefly mentioned.

This talk can also be followed with the zoom link https://berkeley.zoom.us/my/simonszoom2.

**Title:** Finding Labeled Paths in Graphs: Easy, Hard, and Open Cases

**Abstract: **Modern graph query languages enable the user to use regular expressions to match different kinds of paths in graphs. These paths could be unconstrained, shortest, without node repetitions, or without edge repetitions. Whereas finding unconstrained and shortest paths is tractable, this is not always the case if we do not want node or edge repetitions. In this talk I will look at the question for which regular languages L the problem of finding such paths is in polynomial time and when it is NP-complete. In a nutshell, the situation is fairly well understood in directed graphs, but much less so in the undirected case.

TITLE: Neuro-Probabilistic Answer Set Programming

Probabilistic Answer Set Programming provides a declarative language for the specification of rich probabilistic models that include cyclic dependencies, multiple incomparable solutions and probabilistic uncertainty specification. The framework was recently extended to allow for neurosymbolic reasoning via neural probabilistic classifiers and gradient-based learning. In this talk, I will present an overview of the framework, including the main semantics and inference tasks, and their complexity. I will also comment on interesting applications and current challenges. No background knowledge will be assumed.

**Title:** Getting to the CORE of Complex Event Recognition

**Abstract:** Complex Event Recognition (CER for short) refers to the activity of processing high-velocity streams of primitive events by evaluating queries that detect complex events: collections of primitive events that satisfy some pattern. In particular, CER queries match incoming events on the basis of their content; where they occur in the input stream; and how this order relates to other events in the stream. CER has been successfully applied in diverse domains such as maritime monitoring, network intrusion detection, industrial control systems and real-time analytics.

In this talk, I will survey our work on developing a formal framework for specifying and evaluating CER queries and will explain in particular a novel CER evaluation algorithm whose complexity is provably asymptotically optimal. The implementation of this algorithm in the CORE COmplex event Recognition Engine shows that in practice the algorithm exhibits stable query performance, even for long sequence queries and large time windows, and outperforms existing systems by up to five orders of magnitude on different workloads.

I will discuss the essential ideas behind both the supported query language and evaluation algorithm, as well as their limitations, from these limitations discuss open questions also drawing parallels to the dynamic (IVM) evaluation of conjunctive queries.

**Title**: Recursive Rules with Aggregation: A Simple Unified Semantics and Efficient Incremental Computation

**Abstract:** Complex reasoning problems are most clearly and easily specified using logical rules, but require recursive rules with aggregation such as count and sum for practical applications. Unfortunately, the meaning of such rules has been a significant challenge, leading to many disagreeing semantics.

This talk presents a unified semantics for recursive rules with aggregation, extending the unified founded semantics and constraint semantics for recursive rules with negation. The key idea is to support simple expression of the different assumptions underlying different semantics, and orthogonally interpret aggregation operations using their simple usual meaning. We present a formal definition of the semantics, prove important properties of the semantics, and compare with prior semantics. In particular, we present an efficient inference over aggregation that gives precise answers to all examples we have studied from the literature. We also apply our semantics to a wide range of challenging examples, and show that our semantics is simple and matches the desired results in all cases. Finally, we describe experiments on the most challenging examples, implemented using efficient incremental computation, exhibiting unexpectedly superior performance over well-known systems when they can compute correct answers.

(Joint work with Scott Stoller)

https://arxiv.org/abs/2007.13053 (Journal of Logic and Computation Dec 2022)

**Title**: Enumerating Regular Languages in Bounded Delay

**Abstract**: We study the task, for a given language L, of enumerating the (generally infinite) sequence of

its words, without repetitions, while bounding the delay between two consecutive words. To

allow for delay bounds that do not depend on the current word length, we assume a model where

we produce each word by editing the preceding word with a small edit script, rather than

writing out the word from scratch. In particular, this witnesses that the language is

orderable, i.e., we can write its words as an infinite sequence such that the Levenshtein edit

distance between any two consecutive words is bounded by a value that depends only on the

language. For instance, (a+b)∗ is orderable (with a variant of the Gray code), but a∗+b∗ is

not.

We characterize which regular languages are enumerable in this sense, and show that this can be

decided in PTIME in an input deterministic finite automaton (DFA) for the language. In fact, we

show that, given a DFA A, we can compute in PTIME automata A1,…,At such that L(A) is

partitioned as L(A1)⊔…⊔L(At) and every L(Ai) is orderable in this sense. Further, we show that

the value of t obtained is optimal, i.e., we cannot partition L(A) into less than t orderable

languages.

In the case where L(A) is orderable (i.e., t=1), we show that the ordering can be produced by a

bounded-delay algorithm: specifically, the algorithm runs in a suitable pointer machine model,

and produces a sequence of bounded-length edit scripts to visit the words of L(A) without

repetitions, with bounded delay -- exponential in |A| -- between each script. In fact, we show

that we can achieve this while only allowing the edit operations push and pop at the beginning

and end of the word, which implies that the word can in fact be maintained in a double-ended

queue.

This is joint work with Mikaël Monet, presented at STACS'23 (https://arxiv.org/abs/2209.14878).