
Creating Your First HydraFlow Application

This tutorial demonstrates how to create and run a basic HydraFlow application that integrates Hydra's configuration management with MLflow's experiment tracking.

Prerequisites

Before you begin this tutorial, you should:

  1. Have HydraFlow installed (Installation Guide)
  2. Have a basic understanding of Python and machine learning experiments

Project Structure

First, let's examine our project structure:

.
├── example.py
├── hydraflow.yaml
└── submit.py

In this tutorial, we will only use the example.py file.

Creating a Basic Application

Let's create a simple HydraFlow application that defines a configuration class and tracks experiment parameters:

example.py
import logging
from dataclasses import dataclass

from mlflow.entities import Run

import hydraflow

log = logging.getLogger(__name__)


@dataclass
class Config:
    # Default values; override them on the command line, e.g. width=400
    width: int = 1024
    height: int = 768


@hydraflow.main(Config)
def app(run: Run, cfg: Config) -> None:
    # `run` is the active MLflow run; `cfg` holds the resolved configuration
    log.info(run.info.run_id)
    log.info(cfg)


if __name__ == "__main__":
    app()

Understanding the Key Components

Let's break down the essential parts of this application:

  1. Configuration Class: A dataclass that defines the parameters for our experiment

    @dataclass
    class Config:
        width: int = 1024
        height: int = 768
    

    This class defines the structure and default values for our configuration parameters. Using Python's dataclass gives us type safety and clear structure.

  2. Main Function: The core of our application, decorated with @hydraflow.main

    @hydraflow.main(Config)
    def app(run: Run, cfg: Config) -> None:
        log.info(run.info.run_id)
        log.info(cfg)
    

    This function will be executed with the provided configuration. It takes two key parameters:

    • run: An MLflow run object that provides access to the current experiment
    • cfg: The configuration object with our parameters
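To make the "type safety" point concrete, here is a minimal stdlib sketch of how declared field types let a `key=value` override be checked and converted. The helper `apply_overrides` is hypothetical and only mimics the idea; it is not Hydra's override parser, which supports a much richer grammar.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class Config:
    width: int = 1024
    height: int = 768


def apply_overrides(cfg: Config, overrides: list[str]) -> Config:
    """Hypothetical helper: apply 'key=value' strings to a dataclass config.

    Each value is converted with the field's declared type, so a
    non-integer width raises ValueError instead of slipping through.
    This only imitates Hydra's behavior; it is not Hydra's implementation.
    """
    types = {f.name: f.type for f in fields(cfg)}
    changes = {}
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or key not in types:
            raise KeyError(f"unknown override: {item!r}")
        changes[key] = types[key](value)  # e.g. int("400") -> 400
    return replace(cfg, **changes)  # untouched fields keep their defaults


cfg = apply_overrides(Config(), ["width=400"])
print(cfg)
```

Unknown keys and values of the wrong type fail early, which is the same kind of guarantee a typed config class gives you when Hydra validates overrides against it.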

The Power of the Decorator

The hydraflow.main decorator does the work of connecting the two tools:

  • It registers your configuration class with Hydra's ConfigStore
  • It sets up an MLflow experiment
  • It starts an MLflow run and passes it to your function
  • It stores all Hydra configuration and logs as MLflow artifacts

With this one decorator, your function gets Hydra's configuration handling and MLflow's experiment tracking without any setup code of its own.
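The following toy decorator is a rough, stdlib-only sketch of the same pattern, assuming a local directory in place of MLflow. `toy_main` and `ToyRun` are hypothetical names; the sketch skips ConfigStore registration and log capture, and it takes overrides as keyword arguments, whereas a real HydraFlow app reads them from the command line. It is meant only to show the shape of "build config, start run, save config as an artifact, call the function".

```python
import functools
import json
import tempfile
import uuid
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ToyRun:
    """Stand-in for an MLflow Run: only an ID and an artifact directory."""

    run_id: str
    artifact_dir: Path


def toy_main(config_cls, root: Path, experiment: str):
    """Hypothetical decorator mirroring the listed steps (not HydraFlow's code)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(**overrides):
            cfg = config_cls(**overrides)  # build the typed config (Hydra's role)
            run_id = uuid.uuid4().hex  # "start" a run with a fresh 32-char ID
            artifact_dir = root / experiment / run_id / "artifacts"
            artifact_dir.mkdir(parents=True)  # experiment dir is created on demand
            # Store the resolved config with the run, like artifacts/.hydra/config.yaml
            (artifact_dir / "config.json").write_text(json.dumps(asdict(cfg)))
            func(ToyRun(run_id, artifact_dir), cfg)  # hand run and config to the app
            return run_id

        return wrapper

    return decorator


@dataclass
class Config:
    width: int = 1024
    height: int = 768


root = Path(tempfile.mkdtemp())


@toy_main(Config, root, experiment="example")
def app(run: ToyRun, cfg: Config) -> None:
    print(run.run_id, cfg)


run_id = app(width=400)
```

Because the decorator owns all the bookkeeping, the body of `app` stays focused on the experiment itself.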

Running the Application

Now that we understand the code, let's run our application.

Single-run Mode

First, let's run it in single-run mode:

$ python example.py
2025/04/19 02:41:04 INFO mlflow.tracking.fluent: Experiment with name 'example' does not exist. Creating a new experiment.
[2025-04-19 02:41:05,037][__main__][INFO] - 2179c72c88624825b322bf2522e7674a
[2025-04-19 02:41:05,037][__main__][INFO] - {'width': 1024, 'height': 768}

When you run the application, HydraFlow automatically:

  1. Creates an MLflow experiment named after your application (in this case, "example")
  2. Starts a run with the provided configuration
  3. Captures logs and artifacts

Let's use the MLflow CLI to verify that our experiment was created:

$ mlflow experiments search
Experiment Id       Name     Artifact Location                                                              
------------------  -------  -------------------------------------------------------------------------------
0                   Default  file:///home/runner/work/hydraflow/hydraflow/examples/mlruns/0                 
445485773524966029  example  file:///home/runner/work/hydraflow/hydraflow/examples/mlruns/445485773524966029
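If you prefer to inspect the store from Python without MLflow installed, a small sketch like the one below can read the experiment names directly. It assumes MLflow's local file-store layout, where each experiment directory under mlruns holds a meta.yaml with a top-level `name:` line; the helper `list_experiments` is hypothetical, and the `mlflow experiments search` command above remains the supported way to do this.

```python
from pathlib import Path


def list_experiments(mlruns: Path) -> dict[str, str]:
    """Map experiment ID -> experiment name from mlruns/<id>/meta.yaml files.

    Assumes MLflow's local file-store layout and a top-level 'name:' line;
    a real YAML parser, or `mlflow experiments search`, is more robust.
    """
    experiments = {}
    for meta in sorted(mlruns.glob("*/meta.yaml")):
        for line in meta.read_text().splitlines():
            if line.startswith("name:"):
                name = line.split(":", 1)[1].strip().strip("'\"")
                experiments[meta.parent.name] = name
                break
    return experiments


if __name__ == "__main__":
    print(list_experiments(Path("mlruns")))
```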

Now, let's examine the directory structure created by Hydra and MLflow:

.
├── mlruns
│   ├── 0
│   │   └── meta.yaml
│   └── 445485773524966029
│       ├── 2179c72c88624825b322bf2522e7674a
│       │   ├── artifacts
│       │   │   ├── .hydra
│       │   │   └── example.log
│       │   ├── metrics
│       │   ├── params
│       │   └── meta.yaml
│       └── meta.yaml
├── outputs
│   └── 2025-04-19
│       └── 02-41-04
│           ├── .hydra
│           │   ├── config.yaml
│           │   ├── hydra.yaml
│           │   └── overrides.yaml
│           └── example.log
├── example.py
├── hydraflow.yaml
└── submit.py

The directory structure shows:

  • outputs directory: Created by Hydra to store the run's outputs
  • mlruns directory: Created by MLflow to store experiment data
  • artifacts directory: Contains the run's copy of Hydra's .hydra configuration directory and its example.log file, saved there by HydraFlow
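Given this layout, the saved configuration for every run can be located with a single glob. The sketch below assumes that each run's artifacts/.hydra directory contains a config.yaml, as Hydra's own .hydra directory does in the outputs tree; the helper name `find_hydra_configs` is hypothetical.

```python
from pathlib import Path


def find_hydra_configs(mlruns: Path) -> dict[str, Path]:
    """Map run ID -> the config.yaml stored as a run artifact.

    Assumes the layout shown above:
    mlruns/<experiment_id>/<run_id>/artifacts/.hydra/config.yaml
    """
    return {
        # parents[0] is .hydra, parents[1] is artifacts, parents[2] is the run dir
        path.parents[2].name: path
        for path in mlruns.glob("*/*/artifacts/.hydra/config.yaml")
    }


if __name__ == "__main__":
    for run_id, path in find_hydra_configs(Path("mlruns")).items():
        print(run_id, path)
```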

Multi-run Mode (Parameter Sweeps)

Hydra can also run parameter sweeps. Let's try one by passing several values for each configuration parameter:

$ python example.py -m width=400,600 height=100,200
[2025-04-19 02:41:09,873][HYDRA] Launching 4 jobs locally
[2025-04-19 02:41:09,873][HYDRA]    #0 : width=400 height=100
[2025-04-19 02:41:10,001][__main__][INFO] - 31b314073635407daedbd844bdc29f41
[2025-04-19 02:41:10,001][__main__][INFO] - {'width': 400, 'height': 100}
[2025-04-19 02:41:10,003][HYDRA]    #1 : width=400 height=200
[2025-04-19 02:41:10,084][__main__][INFO] - 851ef5be98bb42a195a265eb5cfdd5aa
[2025-04-19 02:41:10,084][__main__][INFO] - {'width': 400, 'height': 200}
[2025-04-19 02:41:10,087][HYDRA]    #2 : width=600 height=100
[2025-04-19 02:41:10,166][__main__][INFO] - df439ec6d4ab456f9e56c89d1e944b05
[2025-04-19 02:41:10,167][__main__][INFO] - {'width': 600, 'height': 100}
[2025-04-19 02:41:10,169][HYDRA]    #3 : width=600 height=200
[2025-04-19 02:41:10,251][__main__][INFO] - 15099e7e9d8149cab749ca538f00c33c
[2025-04-19 02:41:10,251][__main__][INFO] - {'width': 600, 'height': 200}

The -m flag (or --multirun) tells Hydra to launch one job for every combination of the specified values. With two widths and two heights, that gives 2 × 2 = 4 jobs:

  • width=400, height=100
  • width=400, height=200
  • width=600, height=100
  • width=600, height=200
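The job list is the Cartesian product of the value lists. As a simplified model, assuming only plain comma-separated values (Hydra's override grammar supports much more, such as ranges and choice sweeps), the expansion can be sketched with `itertools.product`; `expand_sweep` is a hypothetical helper.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[dict[str, str]]:
    """Expand 'key=a,b' overrides into every combination (a Cartesian product).

    A simplified model of Hydra's basic sweeper that only handles
    plain comma-separated values.
    """
    keys, choices = [], []
    for item in overrides:
        key, _, values = item.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    # product() varies the last key fastest, matching the job order in the log above
    return [dict(zip(keys, combo)) for combo in product(*choices)]


for i, job in enumerate(expand_sweep(["width=400,600", "height=100,200"])):
    print(f"#{i} : {job}")
```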

Let's see the updated directory structure:

.
├── mlruns
│   ├── 0
│   └── 445485773524966029
│       ├── 15099e7e9d8149cab749ca538f00c33c
│       │   └── artifacts
│       │       ├── .hydra
│       │       └── example.log
│       ├── 2179c72c88624825b322bf2522e7674a
│       │   └── artifacts
│       │       ├── .hydra
│       │       └── example.log
│       ├── 31b314073635407daedbd844bdc29f41
│       │   └── artifacts
│       │       ├── .hydra
│       │       └── example.log
│       ├── 851ef5be98bb42a195a265eb5cfdd5aa
│       │   └── artifacts
│       │       ├── .hydra
│       │       └── example.log
│       └── df439ec6d4ab456f9e56c89d1e944b05
│           └── artifacts
│               ├── .hydra
│               └── example.log
├── multirun
│   └── 2025-04-19
│       └── 02-41-09
│           ├── 0
│           │   ├── .hydra
│           │   └── example.log
│           ├── 1
│           │   ├── .hydra
│           │   └── example.log
│           ├── 2
│           │   ├── .hydra
│           │   └── example.log
│           └── 3
│               ├── .hydra
│               └── example.log
├── outputs
│   └── 2025-04-19
│       └── 02-41-04
│           ├── .hydra
│           └── example.log
├── example.py
├── hydraflow.yaml
└── submit.py

Notice that all runs are added to the same MLflow experiment, making it easy to compare results across parameter combinations.

Cleanup

Because HydraFlow stores each run's Hydra configuration and log file as MLflow artifacts, you can safely delete the Hydra output directories:

$ rm -rf outputs multirun
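If you would rather do the cleanup from a Python script, a minimal equivalent of the command above might look like this. It assumes the project layout shown in this tutorial; `remove_hydra_outputs` is a hypothetical helper, and it deliberately touches only outputs and multirun, never mlruns.

```python
import shutil
from pathlib import Path


def remove_hydra_outputs(project: Path) -> list[str]:
    """Delete Hydra's outputs/ and multirun/ directories under `project`.

    Equivalent to `rm -rf outputs multirun`; mlruns/ is left untouched.
    Returns the names of the directories that were actually removed.
    """
    removed = []
    for name in ("outputs", "multirun"):
        target = project / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed


if __name__ == "__main__":
    print(remove_hydra_outputs(Path(".")))
```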

After cleanup, the directory structure is much simpler:

.
├── mlruns
│   ├── 0
│   │   └── meta.yaml
│   └── 445485773524966029
│       ├── 15099e7e9d8149cab749ca538f00c33c
│       ├── 2179c72c88624825b322bf2522e7674a
│       ├── 31b314073635407daedbd844bdc29f41
│       ├── 851ef5be98bb42a195a265eb5cfdd5aa
│       ├── df439ec6d4ab456f9e56c89d1e944b05
│       └── meta.yaml
├── example.py
├── hydraflow.yaml
└── submit.py

All experiment data, including each run's .hydra configuration and log file, remains in the mlruns directory.

Summary

In this tutorial, you've learned how to:

  1. Create a simple HydraFlow application using the @hydraflow.main decorator
  2. Define configuration using Python dataclasses
  3. Run experiments with default and overridden parameters
  4. Perform parameter sweeps using Hydra's multi-run capabilities

This basic pattern forms the foundation for all HydraFlow applications. As your machine learning workflows grow in complexity, you can build upon this foundation to create more sophisticated experiments.

Next Steps

Now that you've learned how to create and run a basic application, try:

  • Creating more complex configurations with nested parameters
  • Adding actual machine learning code to your application
  • Exploring Automated Workflows with HydraFlow
  • Learning how to Analyze Results from your experiments

For detailed documentation, refer to: