How Much Domain Data Should Be in Provenance Databases?
Daniel de Oliveira, Universidade Federal Fluminense; Vítor Silva and Marta Mattoso, Federal University of Rio de Janeiro
Provenance databases are an important asset in data analytics of large-scale scientific data. The data derivation path allows for identifying parameters, files and domain data values of interest. In scientific workflows, provenance data is automatically captured by workflow systems. However, the power of provenance data analyses depends on the expressiveness of domain-specific data along the provenance traces. While much has been done through the W3C PROV initiative and its PROV-DM to represent generic provenance data, representing domain-specific data in provenance traces has received little attention, yet it accounts for a large number of provenance analytical queries. Such queries are based on selections on data values from input/output artifacts along workflow activities. There are several problems in modeling and capturing values from domain-specific attributes: some relate to managing provenance granularity, others to addressing data values hidden inside files and to representing the semantics of domain data. In this work, we discuss these open issues and propose some alternatives for domain-specific provenance data capture, representation, storage and queries. Addressing these issues may be decisive in using provenance to drive scientific data analyses at large scale.
author = {Daniel de Oliveira and V{\'\i}tor Silva and Marta Mattoso},
title = {How Much Domain Data Should Be in Provenance Databases?},
booktitle = {7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15)},
year = {2015},
address = {Edinburgh, Scotland},
url = {https://www.usenix.org/conference/tapp15/workshop-program/presentation/de-oliveira},
publisher = {USENIX Association},
month = jul
}