
Creating Your First HydraFlow Application

This tutorial demonstrates how to create and run a basic HydraFlow application that integrates Hydra's configuration management with MLflow's experiment tracking.

Prerequisites

Before you begin this tutorial, you should:

  1. Have HydraFlow installed (Installation Guide)
  2. Have a basic understanding of Python

Project Structure

First, let's examine our project structure:

./
├── example.py
├── hydraflow.yaml
└── submit.py

In this tutorial, we will only use the example.py file.

Creating a Basic Application

Let's create a simple HydraFlow application that defines a configuration class and tracks experiment parameters:

example.py
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import TYPE_CHECKING

import hydraflow

if TYPE_CHECKING:
    from mlflow.entities import Run

logger = logging.getLogger(__name__)


@dataclass
class Config:
    width: int = 1024
    height: int = 768


@hydraflow.main(Config, tracking_uri="sqlite:///mlflow.db")
def app(run: Run, cfg: Config) -> None:
    logger.info(run.info.run_id)
    logger.info(cfg)


if __name__ == "__main__":
    app()
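
Because Config is a plain dataclass, its behavior is easy to check in isolation, independent of Hydra. The following stdlib-only snippet (an illustration, not part of example.py) shows the defaults and how an override such as width=400 maps onto the class:

```python
from dataclasses import asdict, dataclass


@dataclass
class Config:
    width: int = 1024
    height: int = 768


default = Config()            # all defaults
override = Config(width=400)  # roughly what `python example.py width=400` produces

print(asdict(default))   # {'width': 1024, 'height': 768}
print(asdict(override))  # {'width': 400, 'height': 768}
```

Unspecified fields keep their defaults, which is exactly how Hydra command-line overrides behave.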

Understanding the Key Components

Let's break down the essential parts of this application:

  1. Configuration Class: A dataclass that defines the parameters for our experiment:

    @dataclass
    class Config:
        width: int = 1024
        height: int = 768
    

  2. Main Function: The core of our application, decorated with @hydraflow.main:

    @hydraflow.main(Config, tracking_uri="sqlite:///mlflow.db")
    def app(run: Run, cfg: Config) -> None:
        logger.info(run.info.run_id)
        logger.info(cfg)
    

    This function is the entry point and receives two key parameters: run (an MLflow Run object) and cfg (the configuration object).

  3. Entry Point: The standard Python entry point that calls our application function:

    if __name__ == "__main__":
        app()
    

The Power of the Decorator

The hydraflow.main decorator is where the magic happens:

  • It registers your configuration class with Hydra's ConfigStore.
  • It sets the MLflow tracking URI via the tracking_uri if provided.
  • It sets up an MLflow experiment.
  • It starts an MLflow run and passes it to your function.
  • It stores all Hydra configuration and logs as MLflow artifacts.
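
To make the decorator's role concrete, here is a minimal stdlib-only sketch of the same pattern. This is an illustration of how such a decorator can be structured, not HydraFlow's actual implementation: FakeRun and toy_main are hypothetical names, and a uuid stands in for a real MLflow run ID.

```python
from dataclasses import dataclass
from functools import wraps
from uuid import uuid4


@dataclass
class FakeRun:
    """Stand-in for mlflow.entities.Run (illustration only)."""
    run_id: str


def toy_main(config_cls):
    """Sketch of a hydraflow.main-style decorator: build the config,
    start a (fake) run, and hand both to the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(**overrides):
            cfg = config_cls(**overrides)      # apply CLI-style overrides
            run = FakeRun(run_id=uuid4().hex)  # stand-in for mlflow.start_run()
            return func(run, cfg)
        return wrapper
    return decorator


@dataclass
class Config:
    width: int = 1024
    height: int = 768


@toy_main(Config)
def app(run: FakeRun, cfg: Config):
    return run.run_id, cfg


run_id, cfg = app(width=400)  # cfg is Config(width=400, height=768)
```

The real decorator does much more (ConfigStore registration, artifact logging), but the shape is the same: your function never constructs the run or the config itself; the decorator injects both.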

Running the Application

Now that we understand the code, let's run our application.

Single-run Mode

First, let's run it in single-run mode:

$ python example.py
2025/11/30 07:24:03 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/30 07:24:03 INFO mlflow.store.db.utils: Updating database tables
2025-11-30 07:24:03 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:03 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
2025-11-30 07:24:03 INFO  [alembic.runtime.migration] Running upgrade  -> 451aebb31d03, add metric step
... (remaining Alembic schema migration messages omitted) ...
2025/11/30 07:24:03 INFO mlflow.tracking.fluent: Experiment with name 'example' does not exist. Creating a new experiment.

When you run the application, HydraFlow automatically:

  1. Sets the MLflow tracking URI to the mlflow.db SQLite database in the project root.
  2. Creates an MLflow experiment named after your application (in this case, "example").
  3. Starts a run with the provided configuration.
  4. Captures logs and artifacts.

Let's use the MLflow CLI to verify that our experiment was created:

$ MLFLOW_TRACKING_URI=sqlite:///mlflow.db mlflow experiments search
2025/11/30 07:24:05 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/30 07:24:05 INFO mlflow.store.db.utils: Updating database tables
2025-11-30 07:24:05 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:05 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
2025-11-30 07:24:05 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:05 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Experiment Id    Name     Artifact Location                                      
---------------  -------  -------------------------------------------------------
0                Default  /home/runner/work/hydraflow/hydraflow/examples/mlruns/0
1                example  /home/runner/work/hydraflow/hydraflow/examples/mlruns/1

Now, let's examine the directory structure created by Hydra and MLflow:

./
├── mlruns/
│   └── 1/
│       └── 54ffa3032464477f87d75f7eef5fb23c/
│           └── artifacts/
│               ├── .hydra/
│               └── example.log
├── outputs/
│   └── 2025-11-30/
│       └── 07-24-03/
│           ├── .hydra/
│           │   ├── config.yaml
│           │   ├── hydra.yaml
│           │   └── overrides.yaml
│           └── example.log
├── example.py
├── hydraflow.yaml
├── mlflow.db
└── submit.py

The directory structure shows:

  • outputs directory: Created by Hydra to store the run's outputs
  • mlflow.db file: Created by MLflow to store the experiment tracking database
  • mlruns directory: Created by MLflow to store experiment artifacts
  • artifacts directory: Contains configuration files and logs managed by HydraFlow
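
Given this layout, the artifacts directory of a run can be located mechanically from the experiment ID and run ID. A small path helper (hypothetical, stdlib-only; adjust the root if your tracking store lives elsewhere):

```python
from pathlib import Path


def artifacts_dir(root: str, experiment_id: str, run_id: str) -> Path:
    """Build the mlruns/<experiment>/<run>/artifacts path shown above."""
    return Path(root) / "mlruns" / experiment_id / run_id / "artifacts"


p = artifacts_dir(".", "1", "54ffa3032464477f87d75f7eef5fb23c")
print(p)  # mlruns/1/54ffa3032464477f87d75f7eef5fb23c/artifacts
```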

Multi-run Mode (Parameter Sweeps)

One of Hydra's most powerful features is the ability to run parameter sweeps. Let's try this by overriding our configuration parameters:

$ python example.py -m width=400,600 height=100,200
[2025-11-30 07:24:07,258][HYDRA] Launching 4 jobs locally
[2025-11-30 07:24:07,258][HYDRA]    #0 : width=400 height=100
2025/11/30 07:24:07 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/30 07:24:07 INFO mlflow.store.db.utils: Updating database tables
2025-11-30 07:24:07 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:07 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
2025-11-30 07:24:07 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:07 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
[2025-11-30 07:24:07,716][HYDRA]    #1 : width=400 height=200
[2025-11-30 07:24:07,777][__main__][INFO] - 82e9f894b75146af960c080e61bf9cb7
[2025-11-30 07:24:07,777][__main__][INFO] - {'width': 400, 'height': 200}
[2025-11-30 07:24:07,781][HYDRA]    #2 : width=600 height=100
[2025-11-30 07:24:07,844][__main__][INFO] - c6e2412157ae4759abeb26fc8116a88d
[2025-11-30 07:24:07,844][__main__][INFO] - {'width': 600, 'height': 100}
[2025-11-30 07:24:07,848][HYDRA]    #3 : width=600 height=200
[2025-11-30 07:24:07,910][__main__][INFO] - 078d96ccb5c249d3bdca4ddbdef3552e
[2025-11-30 07:24:07,910][__main__][INFO] - {'width': 600, 'height': 200}

The -m flag (or --multirun) tells Hydra to run all combinations of the specified parameters. In this case, Hydra runs 4 combinations:

  • width=400, height=100
  • width=400, height=200
  • width=600, height=100
  • width=600, height=200
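
The combination count follows directly from the Cartesian product of the two value lists. The same enumeration Hydra performs can be reproduced with itertools.product (a stdlib illustration, not Hydra's actual sweep machinery):

```python
from itertools import product

widths = [400, 600]
heights = [100, 200]

# One dict per job, in the same order Hydra launches them
combos = [{"width": w, "height": h} for w, h in product(widths, heights)]

print(len(combos))  # 4
print(combos[0])    # {'width': 400, 'height': 100}
```

Adding a third swept parameter multiplies the job count again, so sweeps grow quickly: 2 x 2 x 3 values would already mean 12 runs.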

Let's see the updated directory structure:

./
├── mlruns/
│   └── 1/
│       ├── 078d96ccb5c249d3bdca4ddbdef3552e/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── 188edcb9403c4a449039bf7c0b1f1b8d/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── 54ffa3032464477f87d75f7eef5fb23c/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── 82e9f894b75146af960c080e61bf9cb7/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       └── c6e2412157ae4759abeb26fc8116a88d/
│           └── artifacts/
│               ├── .hydra/
│               └── example.log
├── multirun/
│   └── 2025-11-30/
│       ├── 07-24-07/
│       │   ├── 0/
│       │   │   ├── .hydra/
│       │   │   └── example.log
│       │   ├── 1/
│       │   │   ├── .hydra/
│       │   │   └── example.log
│       │   ├── 2/
│       │   │   ├── .hydra/
│       │   │   └── example.log
│       │   └── 3/
│       │       ├── .hydra/
│       │       └── example.log
│       └── .hydraflow.lock
├── outputs/
│   └── 2025-11-30/
│       └── 07-24-03/
│           ├── .hydra/
│           └── example.log
├── example.py
├── hydraflow.yaml
├── mlflow.db
└── submit.py

Notice that all runs are added to the same MLflow experiment, making it easy to compare results across parameter combinations.

Cleanup

With HydraFlow, all important data is stored in MLflow, so you can safely delete the Hydra output directories:

$ rm -rf outputs multirun
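
The same cleanup can be done portably from Python (useful on Windows, where rm -rf is unavailable) with shutil.rmtree; ignore_errors=True makes it a no-op if a directory is already gone:

```python
import shutil

# Remove Hydra's output directories; all important data lives in MLflow
for directory in ("outputs", "multirun"):
    shutil.rmtree(directory, ignore_errors=True)  # safe if missing
```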

After cleanup, the directory structure is much simpler:

./
├── mlruns/
│   └── 1/
│       ├── 078d96ccb5c249d3bdca4ddbdef3552e/
│       ├── 188edcb9403c4a449039bf7c0b1f1b8d/
│       ├── 54ffa3032464477f87d75f7eef5fb23c/
│       ├── 82e9f894b75146af960c080e61bf9cb7/
│       └── c6e2412157ae4759abeb26fc8116a88d/
├── example.py
├── hydraflow.yaml
├── mlflow.db
└── submit.py

All experiment data remains safely stored in the MLflow directory.

Summary

In this tutorial, you've learned how to:

  1. Create a simple HydraFlow application using the @hydraflow.main decorator
  2. Define configuration using Python dataclasses
  3. Run experiments with default and overridden parameters
  4. Perform parameter sweeps using Hydra's multi-run capabilities

This basic pattern forms the foundation for all HydraFlow applications. As your machine learning workflows grow in complexity, you can build upon this foundation to create more sophisticated experiments.

Next Steps

Now that you've learned how to create and run a basic application, try:

  • Creating more complex configurations with nested parameters
  • Adding actual machine learning code to your application
  • Exploring Automated Workflows with HydraFlow
  • Learning how to Analyze Results from your experiments

For detailed documentation, refer to: