Measuring the Importance of Database Elements

Abstract

I will give an overview of several research efforts, past and ongoing, on measuring importance in database activities. These include the responsibility of tuples for the result of a query and for the inconsistency of dirty databases, the importance of integrity constraints to data cleaning operations, and the importance of vertices and edges to the result of a graph query. The common intuition in all cases is that we measure the impact of changing or eliminating the entity in question. Hence, a common necessity is a measure of impact on the database state. Another commonality is that we should account for inherent dependencies among the impacts of entities: the importance of an entity cannot be assessed in isolation from its interaction with the rest. Therefore, these measurement settings are commonly viewed as instances of a cooperative game, where the conventional rule for score attribution is the Shapley value. A similar approach is taken in machine learning, where the SHAP score quantifies the importance of features to the outcome of a learned model (e.g., a classifier). I will conclude the talk by describing our recent investigation of applying the SHAP score to measure the importance of parameter values in database queries. For all of these applications, I will discuss the complexity of computing the importance value, where the algebraic properties of the impact measure might play a crucial role.
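To make the cooperative-game view concrete, here is a minimal sketch of my own rather than anything taken from the talk. It treats each tuple of a tiny database as a player and the Boolean query result as the game's value, then computes exact Shapley values by brute force over all orderings. The names `shapley`, `query`, and `FACTS`, the toy relations R and S, and the query itself are hypothetical. The enumeration takes exponential time, so it illustrates only the definition and none of the algorithms the talk covers.

```python
from fractions import Fraction
from itertools import permutations


def shapley(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over all orderings of the players (brute force)."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical toy database: each fact (relation, constant) is a player.
FACTS = [("R", "a"), ("S", "a"), ("R", "b")]


def query(db):
    """Boolean query 'exists x: R(x) and S(x)', returned as 0 or 1."""
    r_values = {x for (rel, x) in db if rel == "R"}
    s_values = {x for (rel, x) in db if rel == "S"}
    return 1 if r_values & s_values else 0


scores = shapley(FACTS, query)
for fact, score in scores.items():
    print(fact, score)
```

In this toy instance, R(a) and S(a) each receive 1/2 and R(b) receives 0, since R(b) never changes the query result. The scores sum to the value of the query on the full database, as the Shapley value requires.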

Attachment

Slides

Video Recording