Dataflow Notebooks: Encoding and Tracking Dependencies of Cells

Authors: 

David Koop and Jay Patel, University of Massachusetts Dartmouth

Abstract: 

Computational notebooks have seen widespread adoption among scientists in many fields, and allow users to view interactive graphical results inline, to embed text and code together, to organize code into cells, and to selectively edit and re-execute cells. Because they allow quick and recordable analyses, they play an important role in documenting experiments. However, the reproducibility of notebooks can vary significantly due to the ordering of cells or changes in global state that affect the re-execution of those cells. In addition, in many popular notebook environments, cells are tagged with transient identifiers that change when a cell is re-executed, so it is impossible to robustly reference other cells. We introduce dataflow notebooks as a method to allow users to explicitly encode dependencies between cells by adding a unique, persistent identifier to each cell and expanding in-code references to results in other cells. In these notebooks, we can both pose and answer provenance queries about dependencies between cells. This permits new notebook operations like downstream updates, which, given a change to one cell, allow users to update all cells that may be impacted by the change while leaving all other cells alone. At the same time, dataflow notebooks increase reproducibility and enable greater reuse by making dependencies clear.
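
As a rough illustration of the downstream-update idea described in the abstract, the sketch below models a notebook as a mapping from persistent cell identifiers to the identifiers each cell references, then computes which cells would need to be re-executed after one cell changes. It is not the authors' implementation: the function name `downstream_cells`, the identifier strings, and the dictionary representation are all hypothetical, and the sketch assumes the dependency graph has no cycles.

```python
from collections import deque


def downstream_cells(deps, changed):
    """Return the cells that transitively depend on `changed`,
    ordered so that each cell comes after the cells it references.

    `deps` maps a (hypothetical) persistent cell id to the ids of the
    cells it references. The graph is assumed to be acyclic.
    """
    # Invert the "references" map into a "referenced by" map.
    dependents = {cell: set() for cell in deps}
    for cell, upstream in deps.items():
        for up in upstream:
            dependents.setdefault(up, set()).add(cell)

    # Breadth-first search collects every cell reachable from the change.
    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)

    # Kahn's algorithm, restricted to the affected cells, gives an
    # execution order that respects the dependencies among them.
    indegree = {
        cell: sum(1 for up in deps.get(cell, ()) if up in affected)
        for cell in affected
    }
    ready = sorted(cell for cell, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        cell = ready.pop(0)
        order.append(cell)
        for child in sorted(dependents.get(cell, ())):
            if child in affected:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return order


# Hypothetical notebook: ids are illustrative, not a real id scheme.
notebook = {
    "a1f3": [],                # loads data
    "b2c4": ["a1f3"],          # cleans the data
    "c9d0": ["b2c4"],          # fits a model
    "d7e8": ["a1f3", "c9d0"],  # plots data and model together
    "e5f6": [],                # unrelated cell
}
print(downstream_cells(notebook, "b2c4"))
```

In this toy notebook, changing cell `b2c4` marks only `c9d0` and `d7e8` for re-execution; `a1f3` and `e5f6` are left alone, which is the selective behavior the abstract attributes to downstream updates.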


BibTeX
@inproceedings {204237,
author = {David Koop and Jay Patel},
title = {Dataflow Notebooks: Encoding and Tracking Dependencies of Cells},
booktitle = {9th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2017)},
year = {2017},
address = {Seattle, WA},
url = {https://www.usenix.org/conference/tapp17/workshop-program/presentation/koop},
publisher = {USENIX Association},
month = jun
}