Automating Complex Workflows

This tutorial demonstrates how to use HydraFlow's workflow automation capabilities to define, manage, and execute complex experiment workflows.

Prerequisites

Before you begin this tutorial, you should:

  1. Understand basic HydraFlow applications (from the Creating Your First Application tutorial)
  2. Have a basic understanding of YAML configuration files

Project Structure

First, let's examine our project structure:

.
├── example.py
├── hydraflow.yaml
└── submit.py

In this tutorial, we'll use:

  • example.py: Our basic HydraFlow application
  • hydraflow.yaml: A configuration file to define our experiment workflows
  • submit.py: A helper script for job submission

Understanding Job Definitions

The hydraflow.yaml file allows you to define reusable experiment workflows:

hydraflow.yaml
jobs:
  job_sequential:
    run: python example.py
    sets:
      - each: width=100,300
        all: height=100:300:100
  job_parallel:
    run: python example.py
    add: >-
      hydra/launcher=joblib
      hydra.launcher.n_jobs=3
    sets:
      - each: width=200,400
        all: height=100:300:100
  job_submit:
    submit: python submit.py example.py
    sets:
      - each: width=250:350:100
        all: height=150,250

This configuration file defines three different types of jobs:

  1. job_sequential: A job that runs sequentially
  2. job_parallel: A job that runs with parallelization
  3. job_submit: A job that uses a submit command for custom execution

Each job demonstrates different execution patterns and parameter combinations.
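
To build intuition for how each and all combine, here is a minimal Python sketch of the expansion (an illustration of the semantics only, not HydraFlow's actual implementation):

from itertools import product

# each: width=100,300 -> one command per value
# all: height=100:300:100 -> a range swept inside every command
each = {"width": [100, 300]}
start, stop, step = 100, 300, 100
all_values = ",".join(str(v) for v in range(start, stop + step, step))

for combo in product(*each.values()):
    overrides = [f"{key}={value}" for key, value in zip(each, combo)]
    print("python example.py --multirun", *overrides, f"height={all_values}")

Running this prints two --multirun commands, matching the dry run output shown later in this tutorial.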

Using the HydraFlow CLI

HydraFlow provides a command-line interface (CLI) for executing and managing jobs defined in your hydraflow.yaml file. The primary command is hydraflow run, which allows you to execute any job defined in your configuration.

Basic usage:

hydraflow run <job_name> [overrides]

Where:

  • <job_name> is the name of a job defined in hydraflow.yaml
  • [overrides] are optional Hydra-style parameter overrides
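
For example, an extra override can be appended when launching a job (a hypothetical invocation; how an override interacts with a job's sets depends on your configuration):

$ hydraflow run job_sequential width=500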

For more details on the CLI, see the Job Configuration documentation.

Previewing Execution with Dry Run

Before executing our workflows, we can preview what will happen using the --dry-run flag:

$ hydraflow run job_sequential --dry-run
python example.py --multirun width=100 height=100,200,300 hydra.job.name=job_sequential hydra.sweep.dir=multirun/01JS5YNKR2BV141BNRTP7N8B22
python example.py --multirun width=300 height=100,200,300 hydra.job.name=job_sequential hydra.sweep.dir=multirun/01JS5YNKR2SVG8GVX7YCFYS55H

From the dry run output, we can observe:

  • 2 commands will be executed, one per value of the each parameter (width=100 and width=300)
  • Each command sweeps over 3 values: the all range 100:300:100 (start:stop:step, stop inclusive) expands to height=100,200,300
  • Each command includes additional options:
    • hydra.job.name: The name of the job defined in hydraflow.yaml
    • hydra.sweep.dir: A unique, time-ordered directory that HydraFlow creates for each job

Standard Hydra names sweep directories after the current date and time, so jobs launched in parallel can collide in the same directory. HydraFlow avoids this by generating a unique, time-ordered directory name for each job.
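
The directory names above (for example 01JS5YNKR2BV141BNRTP7N8B22) are ULID-style identifiers: a timestamp prefix makes them sort in creation order, while a random suffix keeps concurrent jobs distinct. A minimal sketch of the idea (not HydraFlow's actual implementation):

import secrets
import time

def time_ordered_id() -> str:
    # millisecond timestamp prefix: lexicographic order matches creation order
    # random suffix: IDs created in the same millisecond remain unique
    return f"{int(time.time() * 1000):012x}" + secrets.token_hex(8)

ids = []
for _ in range(3):
    ids.append(time_ordered_id())
    time.sleep(0.002)  # ensure distinct millisecond timestamps for the demo
print(ids == sorted(ids))  # True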

Running Sequential Jobs

Let's examine the sequential job configuration:

job_sequential:
  run: python example.py
  sets:
    - each: width=100,300
      all: height=100:300:100

This job uses the each and all parameters to run multiple configuration combinations in sequence:

$ hydraflow run job_sequential
2025/04/19 02:40:03 INFO mlflow.tracking.fluent: Experiment with name 'job_sequential' does not exist. Creating a new experiment.
[2025-04-19 02:40:06,235][HYDRA] Launching 3 jobs locally                       
[2025-04-19 02:40:06,235][HYDRA]        #0 : width=100 height=100               
[2025-04-19 02:40:06,467][__main__][INFO] - 6ffa1ce10f1145ec9b977e903e3fa767    
[2025-04-19 02:40:06,467][__main__][INFO] - {'width': 100, 'height': 100}       
[2025-04-19 02:40:06,469][HYDRA]        #1 : width=100 height=200               
[2025-04-19 02:40:06,549][__main__][INFO] - 509c934c0065451d81b1d8ef651b31a5    
[2025-04-19 02:40:06,549][__main__][INFO] - {'width': 100, 'height': 200}       
[2025-04-19 02:40:06,551][HYDRA]        #2 : width=100 height=300               
[2025-04-19 02:40:06,630][__main__][INFO] - fc64619444224f0c80d7752a0a492402    
[2025-04-19 02:40:06,630][__main__][INFO] - {'width': 100, 'height': 300}       
[2025-04-19 02:40:09,195][HYDRA] Launching 3 jobs locally                       
[2025-04-19 02:40:09,195][HYDRA]        #0 : width=300 height=100               
[2025-04-19 02:40:09,339][__main__][INFO] - 25d77ad428f9484089260f0b181355a4    
[2025-04-19 02:40:09,339][__main__][INFO] - {'width': 300, 'height': 100}       
[2025-04-19 02:40:09,440][HYDRA]        #1 : width=300 height=200               
[2025-04-19 02:40:09,531][__main__][INFO] - 0b80ece99f8444b7b1d766150ad63979    
[2025-04-19 02:40:09,532][__main__][INFO] - {'width': 300, 'height': 200}       
[2025-04-19 02:40:09,586][HYDRA]        #2 : width=300 height=300               
[2025-04-19 02:40:09,670][__main__][INFO] - 8c31a5da88d84aa984e978d9b9bbb831    
[2025-04-19 02:40:09,670][__main__][INFO] - {'width': 300, 'height': 300}       
  0:00:05 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 2/2 100%

Results of execution:

  • An experiment named job_sequential is created
  • 2×3=6 jobs are executed sequentially
  • A progress bar is displayed to track completion

Running Parallel Jobs

Now let's look at our parallel job configuration:

job_parallel:
  run: python example.py
  add: >-
    hydra/launcher=joblib
    hydra.launcher.n_jobs=3
  sets:
    - each: width=200,400
      all: height=100:300:100

This job leverages Hydra's parallel execution features by passing the joblib launcher options through the add parameter:

$ hydraflow run job_parallel --dry-run
python example.py --multirun width=200 height=100,200,300 hydra.job.name=job_parallel hydra.sweep.dir=multirun/01JS5YNWBSSP2KDRAYY6PCR8BR hydra/launcher=joblib hydra.launcher.n_jobs=3
python example.py --multirun width=400 height=100,200,300 hydra.job.name=job_parallel hydra.sweep.dir=multirun/01JS5YNWBSY7FNMQAKCHP9GEFF hydra/launcher=joblib hydra.launcher.n_jobs=3
$ hydraflow run job_parallel
2025/04/19 02:40:12 INFO mlflow.tracking.fluent: Experiment with name 'job_parallel' does not exist. Creating a new experiment.
[2025-04-19 02:40:15,081][HYDRA] Joblib.Parallel(n_jobs=3,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2025-04-19 02:40:15,081][HYDRA] Launching jobs, sweep output dir : multirun/01JS5YNYEPD66A2BZPPMF1CZ75
[2025-04-19 02:40:15,081][HYDRA]        #0 : width=200 height=100               
[2025-04-19 02:40:15,081][HYDRA]        #1 : width=200 height=200               
[2025-04-19 02:40:15,081][HYDRA]        #2 : width=200 height=300               
[2025-04-19 02:40:17,608][__main__][INFO] - 680cafaeffd34f11afdade450c6d3a95    
[2025-04-19 02:40:17,608][__main__][INFO] - {'width': 200, 'height': 300}       
[2025-04-19 02:40:17,793][__main__][INFO] - 28b616b2f7284d76acc268572251fd06    
[2025-04-19 02:40:17,793][__main__][INFO] - {'width': 200, 'height': 200}       
[2025-04-19 02:40:18,110][__main__][INFO] - 06f54a25c5884544acd64fb85acb81b4    
[2025-04-19 02:40:18,110][__main__][INFO] - {'width': 200, 'height': 100}       
... (repeated ResourceTracker shutdown warnings from joblib workers omitted) ...
[2025-04-19 02:40:21,027][HYDRA] Joblib.Parallel(n_jobs=3,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2025-04-19 02:40:21,027][HYDRA] Launching jobs, sweep output dir : multirun/01JS5YNYEP8RR3S67ADJCP8Q6N
[2025-04-19 02:40:21,027][HYDRA]        #0 : width=400 height=100               
[2025-04-19 02:40:21,027][HYDRA]        #1 : width=400 height=200               
[2025-04-19 02:40:21,027][HYDRA]        #2 : width=400 height=300               
[2025-04-19 02:40:23,313][__main__][INFO] - 7203c65cd495445ba95cfc1a95e64b40    
[2025-04-19 02:40:23,313][__main__][INFO] - {'width': 400, 'height': 300}       
[2025-04-19 02:40:23,867][__main__][INFO] - 5dd6a5bf673046abb4f6be12e6b06eac    
[2025-04-19 02:40:23,867][__main__][INFO] - {'width': 400, 'height': 200}       
[2025-04-19 02:40:23,957][__main__][INFO] - 2f71b1da68494fb89e1e9afce5b4ebae    
[2025-04-19 02:40:23,957][__main__][INFO] - {'width': 400, 'height': 100}       
... (repeated ResourceTracker shutdown warnings from joblib workers omitted) ...
  0:00:11 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 2/2 100%

Results of execution:

  • An experiment named job_parallel is created
  • The same Python script is used but with a different experiment name
  • 2 Python commands are executed sequentially
  • Each Python command runs 3 jobs in parallel (using the hydra/launcher=joblib configuration)

This demonstrates how HydraFlow makes Hydra's powerful parallel execution features easily accessible.
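
Note that the joblib launcher ships as a separate Hydra plugin; if it is not already present in your environment, it must be installed first (assuming a pip-based setup):

$ pip install hydra-joblib-launcher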

Using the Submit Command

For more complex execution patterns, HydraFlow provides the submit command. Here's our submit job configuration:

job_submit:
  submit: python submit.py example.py
  sets:
    - each: width=250:350:100
      all: height=150,250

The submit command requires two key components:

  1. Your HydraFlow application (example.py in this case)
  2. A command or script that will receive and process a parameter file

Here's our implementation of the submit handler:

submit.py
import shlex
import subprocess
import sys
from pathlib import Path


def main() -> None:
    # HydraFlow passes the parameter file path as the last argument
    app_file, opt_file = sys.argv[1:]
    text = Path(opt_file).read_text()

    for line in text.splitlines():
        # each line holds the full set of Hydra overrides for one run
        opts = shlex.split(line)
        args = [sys.executable, app_file, *opts]
        print(args)
        subprocess.run(args, check=True)


if __name__ == "__main__":
    main()

How the submit command works:

  1. HydraFlow generates all parameter combinations based on your job configuration
  2. It writes these combinations to a temporary text file (one combination per line)
  3. It runs the command specified in the submit field of your hydraflow.yaml
  4. It appends the temporary file path as the last argument to your command

For example, with submit: python submit.py example.py in your configuration, the actual executed command will be something like:

python submit.py example.py /tmp/hydraflow_parameters_12345.txt

Let's see it in action with a dry run. The first line of output is the command itself with the parameter file path appended; the remaining lines show that file's contents:

$ hydraflow run job_submit --dry-run
python submit.py example.py /home/runner/work/hydraflow/hydraflow/examples/tmpv63ou1sw
--multirun width=250 height=150,250 hydra.job.name=job_submit hydra.sweep.dir=multirun/01JS5YPARZBHJM152AXAED92QS
--multirun width=350 height=150,250 hydra.job.name=job_submit hydra.sweep.dir=multirun/01JS5YPARZQPV31V1YJJRCBYP9

And now let's run it:

$ hydraflow run job_submit
2025/04/19 02:40:27 INFO mlflow.tracking.fluent: Experiment with name 'job_submit' does not exist. Creating a new experiment.
[2025-04-19 02:40:29,707][HYDRA] Launching 2 jobs locally
[2025-04-19 02:40:29,707][HYDRA]    #0 : width=250 height=150
[2025-04-19 02:40:29,828][__main__][INFO] - 3dc6d001a7994bd79a1c61263d3e1f80
[2025-04-19 02:40:29,828][__main__][INFO] - {'width': 250, 'height': 150}
[2025-04-19 02:40:29,831][HYDRA]    #1 : width=250 height=250
[2025-04-19 02:40:29,913][__main__][INFO] - f3176ff528524307bc197544eb4eacc1
[2025-04-19 02:40:29,913][__main__][INFO] - {'width': 250, 'height': 250}
[2025-04-19 02:40:32,307][HYDRA] Launching 2 jobs locally
[2025-04-19 02:40:32,307][HYDRA]    #0 : width=350 height=150
[2025-04-19 02:40:32,428][__main__][INFO] - eef340a91d1a410988d7a749ccd66369
[2025-04-19 02:40:32,429][__main__][INFO] - {'width': 350, 'height': 150}
[2025-04-19 02:40:32,431][HYDRA]    #1 : width=350 height=250
[2025-04-19 02:40:32,519][__main__][INFO] - 37da9ed5643d4764b36c1c1c433bd356
[2025-04-19 02:40:32,519][__main__][INFO] - {'width': 350, 'height': 250}
['/home/runner/work/hydraflow/hydraflow/.venv/bin/python', 'example.py', '--multirun', 'width=250', 'height=150,250', 'hydra.job.name=job_submit', 'hydra.sweep.dir=multirun/01JS5YPCVHEV05606GRFRFKGV2']
['/home/runner/work/hydraflow/hydraflow/.venv/bin/python', 'example.py', '--multirun', 'width=350', 'height=150,250', 'hydra.job.name=job_submit', 'hydra.sweep.dir=multirun/01JS5YPCVHD4NKH3S9MZTEVRBW']

Our submit.py script implements a simple processor that:

  1. Accepts two arguments: the application file (example.py) and the parameter file
  2. Reads each line from the parameter file
  3. Runs the application with each set of parameters sequentially

In real-world scenarios, you could customize this handler to:

  • Submit jobs to compute clusters (SLURM, PBS, etc.)
  • Implement custom scheduling logic
  • Distribute workloads based on resource requirements
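
For example, a SLURM-oriented handler might look like the following sketch (hypothetical and untested; it assumes sbatch is available on the submission host):

import sys
from pathlib import Path
from subprocess import run


def main() -> None:
    # same contract as submit.py: the app file plus HydraFlow's parameter file
    app_file, opt_file = sys.argv[1:]

    for line in Path(opt_file).read_text().splitlines():
        # submit one batch job per parameter set; --wrap turns the
        # command string into a single-step batch script
        run(["sbatch", "--wrap", f"python {app_file} {line}"], check=True)


if __name__ == "__main__":
    main()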

Reviewing Results

With HydraFlow, all important data is stored in MLflow, so we can safely delete the Hydra output directories:

$ rm -rf multirun

Let's check the directory structure:

.
├── mlruns
│   ├── 0
│   │   └── meta.yaml
│   ├── 703462943040863130
│   │   ├── 37da9ed5643d4764b36c1c1c433bd356
│   │   ├── 3dc6d001a7994bd79a1c61263d3e1f80
│   │   ├── eef340a91d1a410988d7a749ccd66369
│   │   ├── f3176ff528524307bc197544eb4eacc1
│   │   └── meta.yaml
│   ├── 764359945704951764
│   │   ├── 06f54a25c5884544acd64fb85acb81b4
│   │   ├── 28b616b2f7284d76acc268572251fd06
│   │   ├── 2f71b1da68494fb89e1e9afce5b4ebae
│   │   ├── 5dd6a5bf673046abb4f6be12e6b06eac
│   │   ├── 680cafaeffd34f11afdade450c6d3a95
│   │   ├── 7203c65cd495445ba95cfc1a95e64b40
│   │   └── meta.yaml
│   └── 999934558401375698
│       ├── 0b80ece99f8444b7b1d766150ad63979
│       ├── 25d77ad428f9484089260f0b181355a4
│       ├── 509c934c0065451d81b1d8ef651b31a5
│       ├── 6ffa1ce10f1145ec9b977e903e3fa767
│       ├── 8c31a5da88d84aa984e978d9b9bbb831
│       ├── fc64619444224f0c80d7752a0a492402
│       └── meta.yaml
├── example.py
├── hydraflow.yaml
└── submit.py

After cleanup, we can observe:

  • There are three experiments (one for each job type)
  • Each experiment contains multiple runs
  • A total of 16 runs were executed across all jobs
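
Because the runs live in MLflow, they can also be verified programmatically. A quick check (assuming a recent MLflow version, run from the project directory so the local mlruns store is found):

import mlflow

# each job name in hydraflow.yaml corresponds to an MLflow experiment
for name in ["job_sequential", "job_parallel", "job_submit"]:
    runs = mlflow.search_runs(experiment_names=[name])
    print(name, len(runs))  # expect 6, 6, and 4 runs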

Summary

In this tutorial, you've learned how to:

  1. Define different types of experiment workflows in a hydraflow.yaml file
  2. Execute sequential and parallel job runs
  3. Use the submit command for custom execution patterns
  4. Preview jobs with dry runs
  5. Manage and organize experiment outputs

These workflow automation capabilities allow you to efficiently manage complex experiment configurations, making your machine learning research more organized and reproducible.

Next Steps

Now that you've learned about workflow automation, try:

  • Defining your own custom workflows
  • Exploring more complex parameter sweep combinations
  • Learning how to Analyze Results from your experiments

For more detailed information, refer to the Job Configuration documentation.