# Automating Complex Workflows

This tutorial demonstrates how to use HydraFlow's workflow automation capabilities to define, manage, and execute complex experiment workflows.
## Prerequisites

Before you begin this tutorial, you should:

- Understand basic HydraFlow applications (from the Creating Your First Application tutorial)
- Have a basic understanding of YAML configuration files
## Project Structure

First, let's examine our project structure:

```
.
├── example.py
├── hydraflow.yaml
└── submit.py
```

In this tutorial, we'll use:

- `example.py`: Our basic HydraFlow application
- `hydraflow.yaml`: A configuration file to define our experiment workflows
- `submit.py`: A helper script for job submission
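For reference, `example.py` is the kind of minimal HydraFlow application built in the Creating Your First Application tutorial. The sketch below is consistent with the log output shown later in this page (one MLflow run ID plus the resolved config per run); the tutorial's actual file may differ in details:

```python
# example.py -- a minimal HydraFlow application (a sketch consistent with
# the log output shown later in this tutorial, not necessarily verbatim).
from __future__ import annotations

import logging
from dataclasses import dataclass

import hydraflow
from mlflow.entities import Run

log = logging.getLogger(__name__)


@dataclass
class Config:
    width: int = 1024
    height: int = 768


@hydraflow.main(Config)
def app(run: Run, cfg: Config) -> None:
    # Log the MLflow run ID and the resolved configuration, matching the
    # two INFO lines emitted per run in the console output below.
    log.info(run.info.run_id)
    log.info({"width": cfg.width, "height": cfg.height})


if __name__ == "__main__":
    app()
```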
## Understanding Job Definitions

The `hydraflow.yaml` file allows you to define reusable experiment workflows:

```yaml
# hydraflow.yaml
jobs:
  job_sequential:
    run: python example.py
    sets:
      - each: width=100,300
        all: height=100:300:100

  job_parallel:
    run: python example.py
    add: >-
      hydra/launcher=joblib
      hydra.launcher.n_jobs=3
    sets:
      - each: width=200,400
        all: height=100:300:100

  job_submit:
    submit: python submit.py example.py
    sets:
      - each: width=250:350:100
        all: height=150,250
```
This configuration file defines three different types of jobs:

- `job_sequential`: A job that runs sequentially
- `job_parallel`: A job that runs with parallelization
- `job_submit`: A job that uses a submit command for custom execution

Each job demonstrates different execution patterns and parameter combinations.
## Using the HydraFlow CLI

HydraFlow provides a command-line interface (CLI) for executing and managing jobs defined in your `hydraflow.yaml` file. The primary command is `hydraflow run`, which allows you to execute any job defined in your configuration.

Basic usage:

```
hydraflow run <job_name> [overrides]
```
Where:

- `<job_name>` is the name of a job defined in `hydraflow.yaml`
- `[overrides]` are optional Hydra-style parameter overrides
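For example, the following would run `job_sequential` with an additional override appended to its generated commands (`seed=42` is a hypothetical parameter for illustration; `example.py` as written only defines `width` and `height`):

```console
$ hydraflow run job_sequential seed=42
```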
For more details on the CLI, see the Job Configuration documentation.
## Previewing Execution with Dry Run

Before executing our workflows, we can preview what will happen using the `--dry-run` flag:
```console
$ hydraflow run job_sequential --dry-run
python example.py --multirun width=100 height=100,200,300 hydra.job.name=job_sequential hydra.sweep.dir=multirun/01JS5YNKR2BV141BNRTP7N8B22
python example.py --multirun width=300 height=100,200,300 hydra.job.name=job_sequential hydra.sweep.dir=multirun/01JS5YNKR2SVG8GVX7YCFYS55H
```
From the dry run output, we can observe:

- 2 jobs will be executed (from the `each` parameter combinations)
- Each job contains 3 sweeps (from the `all` range values)
- Each job includes additional options:
    - `hydra.job.name`: The name of the job defined in `hydraflow.yaml`
    - `hydra.sweep.dir`: A unique but time-ordered directory for each job, created by HydraFlow
Standard Hydra creates directories based on the current date and time, which may cause duplication during parallel execution. HydraFlow solves this problem by creating unique, time-ordered directories for each job.
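The expansion itself is easy to reason about. The sketch below is illustrative only (not HydraFlow's actual implementation): `each` values fan out into separate commands, while `all` values collapse into a single comma-joined `--multirun` sweep:

```python
# Illustrative sketch of how `each` and `all` expand (hypothetical; not
# HydraFlow's real code). `all` uses a start:stop:step range specification.
each_values = [100, 300]                  # each: width=100,300
all_values = list(range(100, 301, 100))   # all: height=100:300:100 -> [100, 200, 300]

for width in each_values:
    sweep = ",".join(str(h) for h in all_values)
    print(f"python example.py --multirun width={width} height={sweep}")
```

Running this prints the same two command skeletons seen in the dry run above, minus the `hydra.job.name` and `hydra.sweep.dir` options HydraFlow appends.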
## Running Sequential Jobs

Let's examine the sequential job configuration:

```yaml
job_sequential:
  run: python example.py
  sets:
    - each: width=100,300
      all: height=100:300:100
```
This job uses the `each` and `all` parameters to run multiple configuration combinations in sequence:
```console
$ hydraflow run job_sequential
2025/04/19 02:40:03 INFO mlflow.tracking.fluent: Experiment with name 'job_sequential' does not exist. Creating a new experiment.
[2025-04-19 02:40:06,235][HYDRA] Launching 3 jobs locally
[2025-04-19 02:40:06,235][HYDRA] #0 : width=100 height=100
[2025-04-19 02:40:06,467][__main__][INFO] - 6ffa1ce10f1145ec9b977e903e3fa767
[2025-04-19 02:40:06,467][__main__][INFO] - {'width': 100, 'height': 100}
[2025-04-19 02:40:06,469][HYDRA] #1 : width=100 height=200
[2025-04-19 02:40:06,549][__main__][INFO] - 509c934c0065451d81b1d8ef651b31a5
[2025-04-19 02:40:06,549][__main__][INFO] - {'width': 100, 'height': 200}
[2025-04-19 02:40:06,551][HYDRA] #2 : width=100 height=300
[2025-04-19 02:40:06,630][__main__][INFO] - fc64619444224f0c80d7752a0a492402
[2025-04-19 02:40:06,630][__main__][INFO] - {'width': 100, 'height': 300}
[2025-04-19 02:40:09,195][HYDRA] Launching 3 jobs locally
[2025-04-19 02:40:09,195][HYDRA] #0 : width=300 height=100
[2025-04-19 02:40:09,339][__main__][INFO] - 25d77ad428f9484089260f0b181355a4
[2025-04-19 02:40:09,339][__main__][INFO] - {'width': 300, 'height': 100}
[2025-04-19 02:40:09,440][HYDRA] #1 : width=300 height=200
[2025-04-19 02:40:09,531][__main__][INFO] - 0b80ece99f8444b7b1d766150ad63979
[2025-04-19 02:40:09,532][__main__][INFO] - {'width': 300, 'height': 200}
[2025-04-19 02:40:09,586][HYDRA] #2 : width=300 height=300
[2025-04-19 02:40:09,670][__main__][INFO] - 8c31a5da88d84aa984e978d9b9bbb831
[2025-04-19 02:40:09,670][__main__][INFO] - {'width': 300, 'height': 300}
0:00:05 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 2/2 100%
```
Results of execution:

- An experiment named `job_sequential` is created
- 2×3=6 jobs are executed sequentially
- A progress bar is displayed to track completion
## Running Parallel Jobs

Now let's look at our parallel job configuration:

```yaml
job_parallel:
  run: python example.py
  add: >-
    hydra/launcher=joblib
    hydra.launcher.n_jobs=3
  sets:
    - each: width=200,400
      all: height=100:300:100
```
This job leverages Hydra's parallel execution features, enabling the joblib launcher through the `add` parameter:
```console
$ hydraflow run job_parallel --dry-run
python example.py --multirun width=200 height=100,200,300 hydra.job.name=job_parallel hydra.sweep.dir=multirun/01JS5YNWBSSP2KDRAYY6PCR8BR hydra/launcher=joblib hydra.launcher.n_jobs=3
python example.py --multirun width=400 height=100,200,300 hydra.job.name=job_parallel hydra.sweep.dir=multirun/01JS5YNWBSY7FNMQAKCHP9GEFF hydra/launcher=joblib hydra.launcher.n_jobs=3
```
```console
$ hydraflow run job_parallel
2025/04/19 02:40:12 INFO mlflow.tracking.fluent: Experiment with name 'job_parallel' does not exist. Creating a new experiment.
[2025-04-19 02:40:15,081][HYDRA] Joblib.Parallel(n_jobs=3,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2025-04-19 02:40:15,081][HYDRA] Launching jobs, sweep output dir : multirun/01JS5YNYEPD66A2BZPPMF1CZ75
[2025-04-19 02:40:15,081][HYDRA] #0 : width=200 height=100
[2025-04-19 02:40:15,081][HYDRA] #1 : width=200 height=200
[2025-04-19 02:40:15,081][HYDRA] #2 : width=200 height=300
[2025-04-19 02:40:17,608][__main__][INFO] - 680cafaeffd34f11afdade450c6d3a95
[2025-04-19 02:40:17,608][__main__][INFO] - {'width': 200, 'height': 300}
[2025-04-19 02:40:17,793][__main__][INFO] - 28b616b2f7284d76acc268572251fd06
[2025-04-19 02:40:17,793][__main__][INFO] - {'width': 200, 'height': 200}
[2025-04-19 02:40:18,110][__main__][INFO] - 06f54a25c5884544acd64fb85acb81b4
[2025-04-19 02:40:18,110][__main__][INFO] - {'width': 200, 'height': 100}
[2025-04-19 02:40:21,027][HYDRA] Joblib.Parallel(n_jobs=3,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2025-04-19 02:40:21,027][HYDRA] Launching jobs, sweep output dir : multirun/01JS5YNYEP8RR3S67ADJCP8Q6N
[2025-04-19 02:40:21,027][HYDRA] #0 : width=400 height=100
[2025-04-19 02:40:21,027][HYDRA] #1 : width=400 height=200
[2025-04-19 02:40:21,027][HYDRA] #2 : width=400 height=300
[2025-04-19 02:40:23,313][__main__][INFO] - 7203c65cd495445ba95cfc1a95e64b40
[2025-04-19 02:40:23,313][__main__][INFO] - {'width': 400, 'height': 300}
[2025-04-19 02:40:23,867][__main__][INFO] - 5dd6a5bf673046abb4f6be12e6b06eac
[2025-04-19 02:40:23,867][__main__][INFO] - {'width': 400, 'height': 200}
[2025-04-19 02:40:23,957][__main__][INFO] - 2f71b1da68494fb89e1e9afce5b4ebae
[2025-04-19 02:40:23,957][__main__][INFO] - {'width': 400, 'height': 100}
0:00:11 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 2/2 100%
```
Results of execution:

- An experiment named `job_parallel` is created
- The same Python script is used, but with a different experiment name
- 2 Python commands are executed sequentially
- Each Python command runs 3 jobs in parallel (using the `hydra/launcher=joblib` configuration)
This demonstrates how HydraFlow makes Hydra's powerful parallel execution features easily accessible.
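Note that Hydra's joblib launcher ships as a separate plugin, so it must be available in your environment before `hydra/launcher=joblib` will resolve:

```console
$ pip install hydra-joblib-launcher
```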
## Using the Submit Command

For more complex execution patterns, HydraFlow provides the `submit` command. Here's our submit job configuration:

```yaml
job_submit:
  submit: python submit.py example.py
  sets:
    - each: width=250:350:100
      all: height=150,250
```
The `submit` command requires two key components:

- Your HydraFlow application (`example.py` in this case)
- A command or script that will receive and process a parameter file
Our submit handler is implemented in `submit.py`, a short script that receives the generated parameter file and runs the application once for each line it contains.
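A minimal sketch of such a handler, assuming it simply shells out to the application once per parameter line (the tutorial's actual script may differ in details), could look like this:

```python
# submit.py -- minimal sketch of a submit handler (an assumption based on
# the behavior described in this section, not the verbatim original file).
# Usage: python submit.py <app.py> <parameter_file>
import subprocess
import sys


def main() -> None:
    app, param_file = sys.argv[1], sys.argv[2]
    with open(param_file) as f:
        for line in f:
            # Each line holds one parameter combination, e.g.
            # "--multirun width=250 height=150,250 hydra.job.name=job_submit ..."
            args = line.split()
            cmd = [sys.executable, app, *args]
            print(cmd)  # echo the command list, as seen in the output below
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```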
How the `submit` command works:

- HydraFlow generates all parameter combinations based on your job configuration
- It writes these combinations to a temporary text file (one combination per line)
- It runs the command specified in the `submit` field of your `hydraflow.yaml`
- It appends the temporary file path as the last argument to your command

For example, with `submit: python submit.py example.py` in your configuration, the actual executed command will be something like:

```
python submit.py example.py /tmp/hydraflow_parameters_12345.txt
```
Let's see it in action with a dry run:
```console
$ hydraflow run job_submit --dry-run
python submit.py example.py /home/runner/work/hydraflow/hydraflow/examples/tmpv63ou1sw
--multirun width=250 height=150,250 hydra.job.name=job_submit hydra.sweep.dir=multirun/01JS5YPARZBHJM152AXAED92QS
--multirun width=350 height=150,250 hydra.job.name=job_submit hydra.sweep.dir=multirun/01JS5YPARZQPV31V1YJJRCBYP9
```
And now let's run it:
```console
$ hydraflow run job_submit
2025/04/19 02:40:27 INFO mlflow.tracking.fluent: Experiment with name 'job_submit' does not exist. Creating a new experiment.
[2025-04-19 02:40:29,707][HYDRA] Launching 2 jobs locally
[2025-04-19 02:40:29,707][HYDRA] #0 : width=250 height=150
[2025-04-19 02:40:29,828][__main__][INFO] - 3dc6d001a7994bd79a1c61263d3e1f80
[2025-04-19 02:40:29,828][__main__][INFO] - {'width': 250, 'height': 150}
[2025-04-19 02:40:29,831][HYDRA] #1 : width=250 height=250
[2025-04-19 02:40:29,913][__main__][INFO] - f3176ff528524307bc197544eb4eacc1
[2025-04-19 02:40:29,913][__main__][INFO] - {'width': 250, 'height': 250}
[2025-04-19 02:40:32,307][HYDRA] Launching 2 jobs locally
[2025-04-19 02:40:32,307][HYDRA] #0 : width=350 height=150
[2025-04-19 02:40:32,428][__main__][INFO] - eef340a91d1a410988d7a749ccd66369
[2025-04-19 02:40:32,429][__main__][INFO] - {'width': 350, 'height': 150}
[2025-04-19 02:40:32,431][HYDRA] #1 : width=350 height=250
[2025-04-19 02:40:32,519][__main__][INFO] - 37da9ed5643d4764b36c1c1c433bd356
[2025-04-19 02:40:32,519][__main__][INFO] - {'width': 350, 'height': 250}
['/home/runner/work/hydraflow/hydraflow/.venv/bin/python', 'example.py', '--multirun', 'width=250', 'height=150,250', 'hydra.job.name=job_submit', 'hydra.sweep.dir=multirun/01JS5YPCVHEV05606GRFRFKGV2']
['/home/runner/work/hydraflow/hydraflow/.venv/bin/python', 'example.py', '--multirun', 'width=350', 'height=150,250', 'hydra.job.name=job_submit', 'hydra.sweep.dir=multirun/01JS5YPCVHD4NKH3S9MZTEVRBW']
```
Our `submit.py` script implements a simple processor that:

- Accepts two arguments: the application file (`example.py`) and the parameter file
- Reads each line from the parameter file
- Runs the application with each set of parameters sequentially
In real-world scenarios, you could customize this handler to:
- Submit jobs to compute clusters (SLURM, PBS, etc.)
- Implement custom scheduling logic
- Distribute workloads based on resource requirements
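As one illustration, a cluster-oriented handler might hand each parameter set to SLURM rather than running it locally. This is a hedged sketch; the `sbatch` flags and resource options will depend on your site:

```python
# submit_slurm.py -- hypothetical variant of the submit handler that queues
# each parameter combination as a SLURM batch job via `sbatch --wrap`.
import subprocess
import sys


def main() -> None:
    app, param_file = sys.argv[1], sys.argv[2]
    with open(param_file) as f:
        for i, line in enumerate(f):
            wrapped = f"python {app} {line.strip()}"
            subprocess.run(
                ["sbatch", f"--job-name=hydraflow-{i}", f"--wrap={wrapped}"],
                check=True,
            )


if __name__ == "__main__":
    main()
```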
## Reviewing Results

With HydraFlow, all important data is stored in MLflow, so we can safely delete the Hydra output directories:

```console
$ rm -rf multirun
```
Let's check the directory structure:
```
.
├── mlruns
│ ├── 0
│ │ └── meta.yaml
│ ├── 703462943040863130
│ │ ├── 37da9ed5643d4764b36c1c1c433bd356
│ │ ├── 3dc6d001a7994bd79a1c61263d3e1f80
│ │ ├── eef340a91d1a410988d7a749ccd66369
│ │ ├── f3176ff528524307bc197544eb4eacc1
│ │ └── meta.yaml
│ ├── 764359945704951764
│ │ ├── 06f54a25c5884544acd64fb85acb81b4
│ │ ├── 28b616b2f7284d76acc268572251fd06
│ │ ├── 2f71b1da68494fb89e1e9afce5b4ebae
│ │ ├── 5dd6a5bf673046abb4f6be12e6b06eac
│ │ ├── 680cafaeffd34f11afdade450c6d3a95
│ │ ├── 7203c65cd495445ba95cfc1a95e64b40
│ │ └── meta.yaml
│ └── 999934558401375698
│ ├── 0b80ece99f8444b7b1d766150ad63979
│ ├── 25d77ad428f9484089260f0b181355a4
│ ├── 509c934c0065451d81b1d8ef651b31a5
│ ├── 6ffa1ce10f1145ec9b977e903e3fa767
│ ├── 8c31a5da88d84aa984e978d9b9bbb831
│ ├── fc64619444224f0c80d7752a0a492402
│ └── meta.yaml
├── example.py
├── hydraflow.yaml
└── submit.py
```
After cleanup, we can observe:
- There are three experiments (one for each job type)
- Each experiment contains multiple runs
- A total of 16 runs were executed across all jobs
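You can confirm these counts programmatically through the MLflow API (a minimal check, assuming the default local `mlruns` tracking directory created above):

```python
import mlflow

# Count the runs recorded for each experiment created in this tutorial.
for name in ["job_sequential", "job_parallel", "job_submit"]:
    runs = mlflow.search_runs(experiment_names=[name])
    print(name, len(runs))  # expected: 6, 6, and 4 runs respectively
```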
## Summary

In this tutorial, you've learned how to:

- Define different types of experiment workflows in a `hydraflow.yaml` file
- Execute sequential and parallel job runs
- Use the `submit` command for custom execution patterns
- Preview jobs with dry runs
- Manage and organize experiment outputs
These workflow automation capabilities allow you to efficiently manage complex experiment configurations, making your machine learning research more organized and reproducible.
## Next Steps
Now that you've learned about workflow automation, try:
- Defining your own custom workflows
- Exploring more complex parameter sweep combinations
- Learning how to Analyze Results from your experiments
For more detailed information, refer to the Job Configuration documentation.