Retrospective Provenance Without a Runtime Provenance Recorder
Timothy McPhillips, University of Illinois at Urbana-Champaign; Shawn Bowers, Gonzaga University; Khalid Belhajjame, Paris Dauphine University; Bertram Ludäscher, University of Illinois at Urbana-Champaign
The YesWorkflow (YW) toolkit aims to provide users of scripting languages such as Python, Perl, and R with many of the benefits of scientific workflow automation. YW requires neither the use of a workflow engine nor the overhead of adapting or instrumenting code to run in such a system. Instead, YW enables scientists to annotate their scripts with special comments that reveal the main computational blocks and dataflow dependencies otherwise implicit in scripts. YW tools extract and analyze these comments, represent scripts in terms of entities based on a typical scientific workflow model, and provide graphical workflow views (i.e., prospective provenance) of scripts. In this paper, we present a new extension of YW for inferring retrospective provenance from script executions without relying on a runtime provenance recorder. Instead we exploit the common practice of scientists to embed important pieces of provenance in directory structures and file names. For such “provenance-friendly” data organizations, we offer a new annotation mechanism based on URI templates. YW uses these to link conceptual-level prospective provenance with data files created at runtime, resulting in a powerful, integrated model of prospective and retrospective provenance. We present scientifically meaningful retrospective provenance queries for investigating an execution of a data acquisition workflow implemented as a Python script, and show how these queries can be evaluated using the YW toolkit.
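The idea of recovering provenance from file names can be illustrated with a minimal Python sketch. It is not the YW implementation: the template string, the variable names (`sample_id`, `energy`, `frame`), the file paths, and the helper functions `template_to_regex` and `match_files` are all hypothetical, and only simple `{name}` variables that do not span a `/` are supported. The sketch turns a URI template into a regular expression and uses it to bind template variables to the values embedded in paths of files a script has already written.

```python
import re

def template_to_regex(template):
    """Compile a simple URI template with {name} variables into a regex.

    Assumption: each variable appears once and matches one path segment
    fragment (any run of characters other than '/').
    """
    # With one capture group, re.split alternates literal text and variable names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)          # literal text
        else:
            pattern += f"(?P<{part}>[^/]+)"     # template variable
    return re.compile("^" + pattern + "$")

def match_files(template, paths):
    """Return (path, variable bindings) for every path that fits the template."""
    regex = template_to_regex(template)
    results = []
    for path in paths:
        m = regex.match(path)
        if m:
            results.append((path, m.groupdict()))
    return results

# Hypothetical template and run directory contents, for illustration only.
TEMPLATE = "run/data/{sample_id}/image_{energy}eV_{frame}.img"
PATHS = [
    "run/data/DRT240/image_10000eV_001.img",
    "run/data/DRT240/image_11000eV_002.img",
    "run/notes.txt",
]

matches = match_files(TEMPLATE, PATHS)
for path, bindings in matches:
    print(path, bindings)
```

Each binding recovered this way is a piece of retrospective provenance: it records which sample, energy, and frame a given output file corresponds to, without any recorder having run alongside the script.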
author = {Timothy McPhillips and Shawn Bowers and Khalid Belhajjame and Bertram Lud{\"a}scher},
title = {Retrospective Provenance Without a Runtime Provenance Recorder},
booktitle = {7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15)},
year = {2015},
address = {Edinburgh, Scotland},
url = {https://www.usenix.org/conference/tapp15/workshop-program/presentation/mcphillips},
publisher = {USENIX Association},
month = jul
}