Wrangling Massive Task Graphs with Dynamic Hierarchical Composition
On Thursday, October 30, research engineer Ben Tovar presented our recent work on accelerating the execution of High Energy Physics (HEP) workflows at the PyHEP 2025 Workshop, held in hybrid form at CERN. The presentation centered on an execution schema called Dynamic Data Reduction (DDR) that runs on top of TaskVine.
With DDR, we take advantage of structure inherent in many HEP applications: when processing multiple collision events, the accumulation (reduction) step is typically both associative and commutative. This means there is no need to pre-determine which processed events are reduced together, so the scheduler can exploit factors such as data locality. Further, the number of events processed together can respond dynamically to the resources available, and datasets can be processed independently.
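To see why this matters, here is a tiny, self-contained sketch (plain Python, not the DDR API): because the reduction is associative and commutative, partial results can be grouped and ordered however is convenient, say by data locality or by which results arrive first, and the final answer is unchanged.

```python
from functools import reduce
import operator

# Illustrative only: stand-ins for per-event (or per-chunk) processed results.
events = list(range(12))

# One grouping: reduce adjacent pairs first, then combine the partials.
partials = [events[i] + events[i + 1] for i in range(0, len(events), 2)]
total_a = reduce(operator.add, partials)

# A different grouping and order (e.g., driven by data locality or arrival
# time): the associative/commutative reduction gives the same answer.
total_b = reduce(operator.add, reversed(events))

assert total_a == total_b == sum(events)
```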
In the DDR application stack, TaskVine acts as the execution platform that distributes the computation to the cluster.
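For readers unfamiliar with TaskVine, the sketch below shows the shape of that interaction, following the basic patterns of the TaskVine Python API: a manager submits tasks, remote workers execute them, and results are reduced as they come back. The process_chunk function and the chunks themselves are hypothetical stand-ins for illustration, not DDR internals.

```python
# A minimal TaskVine sketch, loosely following the examples in the TaskVine
# documentation; process_chunk is a hypothetical stand-in, not part of DDR.
import ndcctools.taskvine as vine

def process_chunk(chunk):
    # Stand-in for per-chunk event processing.
    return sum(chunk)

m = vine.Manager(9123)  # workers connect with: vine_worker <manager-host> 9123
print(f"Listening on port {m.port}")

for chunk in ([1, 2, 3], [4, 5], [6, 7, 8, 9]):
    m.submit(vine.PythonTask(process_chunk, chunk))

total = 0
while not m.empty():
    t = m.wait(5)          # wait up to 5 seconds for a completed task
    if t:
        total += t.output  # accumulate (reduce) results as they arrive
print(f"Total: {total}")
```

Because the reduction commutes, the order in which workers return their results does not affect the final total.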
As an example, we ran Cortado, a HEP application, over 419 datasets totaling 19,631 files and 14 TB of data (about 12 billion events), completing in roughly 5.5 hours while using over 1,600 cores at any one time. During the run, some of those cores had to be replaced because of resource eviction.
For more information, please visit the DDR page on PyPI: https://pypi.org/project/dynamic-data-reduction/
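If you want to try it out, the package can be installed from PyPI in the usual way:

```
pip install dynamic-data-reduction
```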