Containers, Workflows, and Reproducibility

The DASPOS project hosted a workshop on Container Strategies for Data and Software Preservation that Promote Open Science at Notre Dame on May 19-20, 2016.  We had a very interesting collection of researchers and practitioners, all working on problems related to reproducibility, but presenting different approaches and technologies. Prof. Thain presented recent work by CCL grad students Haiyan Meng and Peter Ivie on Combining Containers and Workflow Systems for Reproducible Execution .

The Umbrella tool created by Haiyan Meng allows for a simple, compact, archival representation of a computation, taking into account hardware, operating system, software, and data dependencies.  This allows one to accurately perform computational experiments and give each one a DOI that can be shared, downloaded, and executed.

The PRUNE tool created by Peter Ivie allows one to construct dynamic workflows of connected tasks, each one precisely specified by execution environment.  Provenance and data are tracked precisely, so that the specification of a workflow (and its results) can be exported, imported, and shared with other people.  Think of it like git for workflows.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Scaling SADE (Safety Aware Drone Ecosystem): A Hybrid UAV Simulation System for High-Fidelity Research
  • Wrangling Massive Tasks Graphs with Dynamic Hierarchical Composition
  • TaskVine Insights - Storage Management: Disk Load Shifting
  • Simulating Digital Agriculture in Near Real-Time with xGFabric
  • Undergraduate Researcher Showcases PLEDGE Project at APANAC 2025 in Panama