Automating Complex Workflows

This tutorial demonstrates how to use HydraFlow's workflow automation capabilities to define, manage, and execute complex experiment workflows.

Prerequisites

Before you begin this tutorial, you should:

  1. Understand basic HydraFlow applications (from the Creating Your First Application tutorial)
  2. Have a basic understanding of YAML configuration files

Project Structure

First, let's examine our project structure:

.
├── example.py
├── hydraflow.yaml
└── submit.py

In this tutorial, we'll use:

  • example.py: Our basic HydraFlow application
  • hydraflow.yaml: A configuration file to define our experiment workflows
  • submit.py: A helper script for job submission

Understanding Job Definitions

The hydraflow.yaml file allows you to define reusable experiment workflows:

hydraflow.yaml
jobs:
  job_sequential:
    run: python example.py
    sets:
      - each: width=100,300
        all: height=100:300:100
  job_parallel:
    run: python example.py
    add: >-
      hydra/launcher=joblib
      hydra.launcher.n_jobs=3
    sets:
      - each: width=200,400
        all: height=100:300:100
  job_submit:
    submit: python submit.py example.py
    sets:
      - each: width=250:350:100
        all: height=150,250

This configuration file defines three different types of jobs:

  1. job_sequential: A job that runs sequentially
  2. job_parallel: A job that runs with parallelization
  3. job_submit: A job that uses a submit command for custom execution

Each job demonstrates different execution patterns and parameter combinations.
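
To build intuition for how each and all combine, here is a minimal Python sketch of the expansion (an illustration of the semantics only, not HydraFlow's actual implementation):

from itertools import product

# each: width=100,300 -> one command per value
# all: height=100:300:100 -> a range swept inside every command
each = {"width": [100, 300]}
start, stop, step = 100, 300, 100
all_values = ",".join(str(v) for v in range(start, stop + step, step))

for combo in product(*each.values()):
    overrides = [f"{key}={value}" for key, value in zip(each, combo)]
    print("python example.py --multirun", *overrides, f"height={all_values}")

Running this prints two --multirun commands, matching the dry run output shown later in this tutorial.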

Using the HydraFlow CLI

HydraFlow provides a command-line interface (CLI) for executing and managing jobs defined in your hydraflow.yaml file. The primary command is hydraflow run, which allows you to execute any job defined in your configuration.

Basic usage:

hydraflow run <job_name> [overrides]

Where:

  • <job_name> is the name of a job defined in hydraflow.yaml
  • [overrides] are optional Hydra-style parameter overrides
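
For example, an extra override can be appended when launching a job (a hypothetical invocation; how an override interacts with a job's sets depends on your configuration):

$ hydraflow run job_sequential width=500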

For more details on the CLI, see the Job Configuration documentation.

Previewing Execution with Dry Run

Before executing our workflows, we can preview what will happen using the --dry-run flag:

$ hydraflow run job_sequential --dry-run
python example.py --multirun width=100 height=100,200,300 hydra.job.name=job_sequential hydra.sweep.dir=multirun/01JS5YNKR2BV141BNRTP7N8B22
python example.py --multirun width=300 height=100,200,300 hydra.job.name=job_sequential hydra.sweep.dir=multirun/01JS5YNKR2SVG8GVX7YCFYS55H

From the dry run output, we can observe:

  • 2 commands will be executed, one per value of the each parameter (width=100 and width=300)
  • Each command sweeps over 3 values: the all range 100:300:100 (start:stop:step, stop inclusive) expands to height=100,200,300
  • Each command includes additional options:
    • hydra.job.name: The name of the job defined in hydraflow.yaml
    • hydra.sweep.dir: A unique, time-ordered directory that HydraFlow creates for each job

Standard Hydra names sweep directories after the current date and time, so jobs launched in parallel can collide in the same directory. HydraFlow avoids this by generating a unique, time-ordered directory name for each job.
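
The directory names above (for example 01JS5YNKR2BV141BNRTP7N8B22) are ULID-style identifiers: a timestamp prefix makes them sort in creation order, while a random suffix keeps concurrent jobs distinct. A minimal sketch of the idea (not HydraFlow's actual implementation):

import secrets
import time

def time_ordered_id() -> str:
    # millisecond timestamp prefix: lexicographic order matches creation order
    # random suffix: IDs created in the same millisecond remain unique
    return f"{int(time.time() * 1000):012x}" + secrets.token_hex(8)

ids = []
for _ in range(3):
    ids.append(time_ordered_id())
    time.sleep(0.002)  # ensure distinct millisecond timestamps for the demo
print(ids == sorted(ids))  # True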

Running Sequential Jobs

Let's examine the sequential job configuration:

job_sequential:
  run: python example.py
  sets:
    - each: width=100,300
      all: height=100:300:100

This job uses the each and all parameters to run multiple configuration combinations in sequence:

$ hydraflow run job_sequential
2025/04/19 02:40:03 INFO mlflow.tracking.fluent: Experiment with name 'job_sequential' does not exist. Creating a new experiment.
[2025-04-19 02:40:06,235][HYDRA] Launching 3 jobs locally                       
[2025-04-19 02:40:06,235][HYDRA]        #0 : width=100 height=100               
[2025-04-19 02:40:06,467][__main__][INFO] - 6ffa1ce10f1145ec9b977e903e3fa767    
[2025-04-19 02:40:06,467][__main__][INFO] - {'width': 100, 'height': 100}       
[2025-04-19 02:40:06,469][HYDRA]        #1 : width=100 height=200               
[2025-04-19 02:40:06,549][__main__][INFO] - 509c934c0065451d81b1d8ef651b31a5    
[2025-04-19 02:40:06,549][__main__][INFO] - {'width': 100, 'height': 200}       
[2025-04-19 02:40:06,551][HYDRA]        #2 : width=100 height=300               
[2025-04-19 02:40:06,630][__main__][INFO] - fc64619444224f0c80d7752a0a492402    
[2025-04-19 02:40:06,630][__main__][INFO] - {'width': 100, 'height': 300}       
[2025-04-19 02:40:09,195][HYDRA] Launching 3 jobs locally                       
[2025-04-19 02:40:09,195][HYDRA]        #0 : width=300 height=100               
[2025-04-19 02:40:09,339][__main__][INFO] - 25d77ad428f9484089260f0b181355a4    
[2025-04-19 02:40:09,339][__main__][INFO] - {'width': 300, 'height': 100}       
[2025-04-19 02:40:09,440][HYDRA]        #1 : width=300 height=200               
[2025-04-19 02:40:09,531][__main__][INFO] - 0b80ece99f8444b7b1d766150ad63979    
[2025-04-19 02:40:09,532][__main__][INFO] - {'width': 300, 'height': 200}       
[2025-04-19 02:40:09,586][HYDRA]        #2 : width=300 height=300               
[2025-04-19 02:40:09,670][__main__][INFO] - 8c31a5da88d84aa984e978d9b9bbb831    
[2025-04-19 02:40:09,670][__main__][INFO] - {'width': 300, 'height': 300}       
  0:00:05 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 2/2 100%

Results of execution:

  • An experiment named job_sequential is created
  • 2×3=6 jobs are executed sequentially
  • A progress bar is displayed to track completion

Running Parallel Jobs

Now let's look at our parallel job configuration:

job_parallel:
  run: python example.py
  add: >-
    hydra/launcher=joblib
    hydra.launcher.n_jobs=3
  sets:
    - each: width=200,400
      all: height=100:300:100

This job leverages Hydra's parallel execution features by passing the joblib launcher options through the add parameter:

$ hydraflow run job_parallel --dry-run
python example.py --multirun width=200 height=100,200,300 hydra.job.name=job_parallel hydra.sweep.dir=multirun/01JS5YNWBSSP2KDRAYY6PCR8BR hydra/launcher=joblib hydra.launcher.n_jobs=3
python example.py --multirun width=400 height=100,200,300 hydra.job.name=job_parallel hydra.sweep.dir=multirun/01JS5YNWBSY7FNMQAKCHP9GEFF hydra/launcher=joblib hydra.launcher.n_jobs=3
$ hydraflow run job_parallel
2025/04/19 02:40:12 INFO mlflow.tracking.fluent: Experiment with name 'job_parallel' does not exist. Creating a new experiment.
[2025-04-19 02:40:15,081][HYDRA] Joblib.Parallel(n_jobs=3,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2025-04-19 02:40:15,081][HYDRA] Launching jobs, sweep output dir : multirun/01JS5YNYEPD66A2BZPPMF1CZ75
[2025-04-19 02:40:15,081][HYDRA]        #0 : width=200 height=100               
[2025-04-19 02:40:15,081][HYDRA]        #1 : width=200 height=200               
[2025-04-19 02:40:15,081][HYDRA]        #2 : width=200 height=300               
[2025-04-19 02:40:17,608][__main__][INFO] - 680cafaeffd34f11afdade450c6d3a95    
[2025-04-19 02:40:17,608][__main__][INFO] - {'width': 200, 'height': 300}       
[2025-04-19 02:40:17,793][__main__][INFO] - 28b616b2f7284d76acc268572251fd06    
[2025-04-19 02:40:17,793][__main__][INFO] - {'width': 200, 'height': 200}       
[2025-04-19 02:40:18,110][__main__][INFO] - 06f54a25c5884544acd64fb85acb81b4    
[2025-04-19 02:40:18,110][__main__][INFO] - {'width': 200, 'height': 100}       
... (repeated ResourceTracker shutdown warnings from joblib workers omitted) ...
[2025-04-19 02:40:21,027][HYDRA] Joblib.Parallel(n_jobs=3,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2025-04-19 02:40:21,027][HYDRA] Launching jobs, sweep output dir : multirun/01JS5YNYEP8RR3S67ADJCP8Q6N
[2025-04-19 02:40:21,027][HYDRA]        #0 : width=400 height=100               
[2025-04-19 02:40:21,027][HYDRA]        #1 : width=400 height=200               
[2025-04-19 02:40:21,027][HYDRA]        #2 : width=400 height=300               
[2025-04-19 02:40:23,313][__main__][INFO] - 7203c65cd495445ba95cfc1a95e64b40    
[2025-04-19 02:40:23,313][__main__][INFO] - {'width': 400, 'height': 300}       
[2025-04-19 02:40:23,867][__main__][INFO] - 5dd6a5bf673046abb4f6be12e6b06eac    
[2025-04-19 02:40:23,867][__main__][INFO] - {'width': 400, 'height': 200}       
[2025-04-19 02:40:23,957][__main__][INFO] - 2f71b1da68494fb89e1e9afce5b4ebae    
[2025-04-19 02:40:23,957][__main__][INFO] - {'width': 400, 'height': 100}       
... (repeated ResourceTracker shutdown warnings from joblib workers omitted) ...
  0:00:11 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 2/2 100%

Results of execution:

  • An experiment named job_parallel is created
  • The same Python script is used but with a different experiment name
  • 2 Python commands are executed sequentially
  • Each Python command runs 3 jobs in parallel (using the hydra/launcher=joblib configuration)

This demonstrates how HydraFlow makes Hydra's powerful parallel execution features easily accessible.
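
Note that the joblib launcher ships as a separate Hydra plugin; if it is not already present in your environment, it must be installed first (assuming a pip-based setup):

$ pip install hydra-joblib-launcher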

Using the Submit Command

For more complex execution patterns, HydraFlow provides the submit command. Here's our submit job configuration:

job_submit:
  submit: python submit.py example.py
  sets:
    - each: width=250:350:100
      all: height=150,250

The submit command requires two key components:

  1. Your HydraFlow application (example.py in this case)
  2. A command or script that will receive and process a parameter file

Here's our implementation of the submit handler:

submit.py
import shlex
import subprocess
import sys
from pathlib import Path


def main() -> None:
    # HydraFlow passes the parameter file path as the last argument
    app_file, opt_file = sys.argv[1:]
    text = Path(opt_file).read_text()

    for line in text.splitlines():
        # each line holds the full set of Hydra overrides for one run
        opts = shlex.split(line)
        args = [sys.executable, app_file, *opts]
        print(args)
        subprocess.run(args, check=True)


if __name__ == "__main__":
    main()

How the submit command works:

  1. HydraFlow generates all parameter combinations based on your job configuration
  2. It writes these combinations to a temporary text file (one combination per line)
  3. It runs the command specified in the submit field of your hydraflow.yaml
  4. It appends the temporary file path as the last argument to your command

For example, with submit: python submit.py example.py in your configuration, the actual executed command will be something like:

python submit.py example.py /tmp/hydraflow_parameters_12345.txt

Let's see it in action with a dry run. The first line of output is the command itself with the parameter file path appended; the remaining lines show that file's contents:

$ hydraflow run job_submit --dry-run
python submit.py example.py /home/runner/work/hydraflow/hydraflow/examples/tmpv63ou1sw
--multirun width=250 height=150,250 hydra.job.name=job_submit hydra.sweep.dir=multirun/01JS5YPARZBHJM152AXAED92QS
--multirun width=350 height=150,250 hydra.job.name=job_submit hydra.sweep.dir=multirun/01JS5YPARZQPV31V1YJJRCBYP9

And now let's run it:

$ hydraflow run job_submit
2025/04/19 02:40:27 INFO mlflow.tracking.fluent: Experiment with name 'job_submit' does not exist. Creating a new experiment.
[2025-04-19 02:40:29,707][HYDRA] Launching 2 jobs locally
[2025-04-19 02:40:29,707][HYDRA]    #0 : width=250 height=150
[2025-04-19 02:40:29,828][__main__][INFO] - 3dc6d001a7994bd79a1c61263d3e1f80
[2025-04-19 02:40:29,828][__main__][INFO] - {'width': 250, 'height': 150}
[2025-04-19 02:40:29,831][HYDRA]    #1 : width=250 height=250
[2025-04-19 02:40:29,913][__main__][INFO] - f3176ff528524307bc197544eb4eacc1
[2025-04-19 02:40:29,913][__main__][INFO] - {'width': 250, 'height': 250}
[2025-04-19 02:40:32,307][HYDRA] Launching 2 jobs locally
[2025-04-19 02:40:32,307][HYDRA]    #0 : width=350 height=150
[2025-04-19 02:40:32,428][__main__][INFO] - eef340a91d1a410988d7a749ccd66369
[2025-04-19 02:40:32,429][__main__][INFO] - {'width': 350, 'height': 150}
[2025-04-19 02:40:32,431][HYDRA]    #1 : width=350 height=250
[2025-04-19 02:40:32,519][__main__][INFO] - 37da9ed5643d4764b36c1c1c433bd356
[2025-04-19 02:40:32,519][__main__][INFO] - {'width': 350, 'height': 250}
['/home/runner/work/hydraflow/hydraflow/.venv/bin/python', 'example.py', '--multirun', 'width=250', 'height=150,250', 'hydra.job.name=job_submit', 'hydra.sweep.dir=multirun/01JS5YPCVHEV05606GRFRFKGV2']
['/home/runner/work/hydraflow/hydraflow/.venv/bin/python', 'example.py', '--multirun', 'width=350', 'height=150,250', 'hydra.job.name=job_submit', 'hydra.sweep.dir=multirun/01JS5YPCVHD4NKH3S9MZTEVRBW']

Our submit.py script implements a simple processor that:

  1. Accepts two arguments: the application file (example.py) and the parameter file
  2. Reads each line from the parameter file
  3. Runs the application with each set of parameters sequentially

In real-world scenarios, you could customize this handler to:

  • Submit jobs to compute clusters (SLURM, PBS, etc.)
  • Implement custom scheduling logic
  • Distribute workloads based on resource requirements
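
For example, a SLURM-oriented handler might look like the following sketch (hypothetical and untested; it assumes sbatch is available on the submission host):

import sys
from pathlib import Path
from subprocess import run


def main() -> None:
    # same contract as submit.py: the app file plus HydraFlow's parameter file
    app_file, opt_file = sys.argv[1:]

    for line in Path(opt_file).read_text().splitlines():
        # submit one batch job per parameter set; --wrap turns the
        # command string into a single-step batch script
        run(["sbatch", "--wrap", f"python {app_file} {line}"], check=True)


if __name__ == "__main__":
    main()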

Reviewing Results

With HydraFlow, all important data is stored in MLflow, so we can safely delete the Hydra output directories:

$ rm -rf multirun

Let's check the directory structure:

.
├── mlruns
│   ├── 0
│   │   └── meta.yaml
│   ├── 703462943040863130
│   │   ├── 37da9ed5643d4764b36c1c1c433bd356
│   │   ├── 3dc6d001a7994bd79a1c61263d3e1f80
│   │   ├── eef340a91d1a410988d7a749ccd66369
│   │   ├── f3176ff528524307bc197544eb4eacc1
│   │   └── meta.yaml
│   ├── 764359945704951764
│   │   ├── 06f54a25c5884544acd64fb85acb81b4
│   │   ├── 28b616b2f7284d76acc268572251fd06
│   │   ├── 2f71b1da68494fb89e1e9afce5b4ebae
│   │   ├── 5dd6a5bf673046abb4f6be12e6b06eac
│   │   ├── 680cafaeffd34f11afdade450c6d3a95
│   │   ├── 7203c65cd495445ba95cfc1a95e64b40
│   │   └── meta.yaml
│   └── 999934558401375698
│       ├── 0b80ece99f8444b7b1d766150ad63979
│       ├── 25d77ad428f9484089260f0b181355a4
│       ├── 509c934c0065451d81b1d8ef651b31a5
│       ├── 6ffa1ce10f1145ec9b977e903e3fa767
│       ├── 8c31a5da88d84aa984e978d9b9bbb831
│       ├── fc64619444224f0c80d7752a0a492402
│       └── meta.yaml
├── example.py
├── hydraflow.yaml
└── submit.py

After cleanup, we can observe:

  • There are three experiments (one for each job type)
  • Each experiment contains multiple runs
  • A total of 16 runs were executed across all jobs
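
Because the runs live in MLflow, they can also be verified programmatically. A quick check (assuming a recent MLflow version, run from the project directory so the local mlruns store is found):

import mlflow

# each job name in hydraflow.yaml corresponds to an MLflow experiment
for name in ["job_sequential", "job_parallel", "job_submit"]:
    runs = mlflow.search_runs(experiment_names=[name])
    print(name, len(runs))  # expect 6, 6, and 4 runs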

Summary

In this tutorial, you've learned how to:

  1. Define different types of experiment workflows in a hydraflow.yaml file
  2. Execute sequential and parallel job runs
  3. Use the submit command for custom execution patterns
  4. Preview jobs with dry runs
  5. Manage and organize experiment outputs

These workflow automation capabilities allow you to efficiently manage complex experiment configurations, making your machine learning research more organized and reproducible.

Next Steps

Now that you've learned about workflow automation, try:

  • Defining your own custom workflows
  • Exploring more complex parameter sweep combinations
  • Learning how to Analyze Results from your experiments

For more detailed information, refer to the Job Configuration documentation.