Abstract

Uncertainty arises naturally in many application domains due to data entry errors, sensor errors and noise, uncertainty in information extraction and parsing, ambiguity from data integration, and heuristic data preparation & cleaning techniques. Incomplete and probabilistic databases are principled techniques developed by the database community for modeling uncertainty in data as a set of possible worlds. Two common semantics for querying incomplete (and probabilistic) databases are the certain answers semantics which returns all facts that are true no matter which possible world corresponds to the actual state of the real world and possible answers semantics which returns all facts that could be true, i.e., are true in at least one possible world. Except for specific classes of data models, database instances, and queries, these semantics are intractable. While there is a growing body of work on PTIME under-approximations of certain answers, past work is limited to restricted classes of queries and mostly focuses on set semantics where relations in the database are sets of tuples. In this talk, I present a principled approach for generalizing incomplete (and probabilistic) databases as well as certain and possible answers beyond sets using semiring-annotated relations (K-relations) to support, e.g., incomplete bag semantics, provenance, and temporal data. Importantly, the approach is a strict generalization of past work as it coincides with definitions of incomplete set and bag semantics proposed in related work. I then demonstrate that algebraic properties of semirings naturally provide a means for PTIME under-approximations of certain answers and over-approximations possible answers. Furthermore, extending the model to use intervals to bound possible attribute values, the resulting data model called attribute-annotated uncertain databases (AU-DBs) is closed under full relational algebra with aggregation and sorting-based operations, enabling for the first time an efficient approximation of certain answers for this expressive class of queries.

• Efficient Uncertainty Tracking for Complex Queries with Attribute-
Level Bounds. Su Feng, Aaron Huber, Boris Glavic, Oliver Kennedy.

Proceedings of the 46th International Conference on Management of
Data, pp. 528 – 540 (2021).
https://dl.acm.org/doi/pdf/10.1145/3448016.3452791

• Uncertainty Annotated Databases - a Lightweight Approach for Ap-
proximating Certain Answers. Su Feng, Aaron Huber, Boris Glavic,

Oliver Kennedy. Proceedings of the 44th International Conference on
Management of Data, pp. 1313-1330 (2019).
http://cs.iit.edu/%7edbgroup/assets/pdfpubls/FH19.pdf

Attachment

Video Recording