Linking Prospective and Retrospective Provenance in Scripts
Saumen Dey, University of California, Davis; Khalid Belhajjame, Université Paris-Dauphine; David Koop, University of Massachusetts Dartmouth; Meghan Raul, University of California, Davis; Bertram Ludäscher, University of Illinois at Urbana-Champaign
Scripting languages like Python, R, and MATLAB have seen significant use across a variety of scientific domains. To assist scientists in the analysis of script executions, a number of mechanisms, e.g., noWorkflow, have been recently proposed to capture the provenance of script executions. The provenance information recorded can be used, e.g., to trace the lineage of a particular result by identifying the data inputs and the processing steps that were used to produce it. By and large, the provenance information captured for scripts is fine-grained in the sense that it captures data dependencies at the level of individual script statements, and does so for every variable within the script. While useful, the amount of recorded provenance information can be overwhelming for users and cumbersome to use. This suggests the need for abstraction mechanisms that focus attention on the specific parts of provenance relevant for analyses. Toward this goal, we propose that fine-grained provenance information recorded as the result of script execution can be abstracted using user-specified, workflow-like views. Specifically, we show how the provenance traces recorded by noWorkflow can be mapped to the workflow specifications generated by YesWorkflow from scripts based on user annotations. We examine the issues in constructing a successful mapping, provide an initial implementation of our solution, and present competency queries illustrating how a workflow view generated from the script can be used to explore the provenance recorded during script execution.
@inproceedings{dey2015linking,
author = {Saumen Dey and Khalid Belhajjame and David Koop and Meghan Raul and Bertram Lud{\"a}scher},
title = {Linking Prospective and Retrospective Provenance in Scripts},
booktitle = {7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15)},
year = {2015},
address = {Edinburgh, Scotland},
url = {https://www.usenix.org/conference/tapp15/workshop-program/presentation/dey},
publisher = {USENIX Association},
month = jul
}