Seokki Lee and Xing Niu, Illinois Institute of Technology; Bertram Ludäscher, University of Illinois at Urbana-Champaign; Boris Glavic, Illinois Institute of Technology
How to use provenance to explain why a query returns a result or why a result is missing has been studied extensively. Recently, we have demonstrated how to uniformly answer these types of provenance questions for first-order queries with negation and have presented an implementation of this approach in our PUG (Provenance Unification through Graphs) system. However, for realistically-sized databases, the provenance of answers and missing answers can be very large, overwhelming the user with too much information and wasting computational resources. In this paper, we introduce an (approximate) summarization technique that generates compact representations of why and why-not provenance. Our technique uses patterns as a summarized representation of sets of elements from the provenance, i.e., successful or failed derivations. We rank these patterns based on their descriptiveness (we use precision and recall as quality measures for patterns) and return only the top-k summaries. We demonstrate how this summarization technique can be integrated with provenance capture to compute summaries on demand and how sampling techniques can be employed to speed up both the summarization and capture steps. Our preliminary experiments demonstrate that this summarization technique scales to large instances of a real-world dataset.
author = {Seokki Lee and Xing Niu and Bertram Lud{\"a}scher and Boris Glavic},
title = {Integrating Approximate Summarization with Provenance Capture},
booktitle = {9th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2017)},
year = {2017},
address = {Seattle, WA},
url = {https://www.usenix.org/conference/tapp17/workshop-program/presentation/lee},
publisher = {USENIX Association},
month = jun
}