Creating Your First HydraFlow Application
This tutorial demonstrates how to create and run a basic HydraFlow application that integrates Hydra's configuration management with MLflow's experiment tracking.
Prerequisites
Before you begin this tutorial, you should:
- Have HydraFlow installed (Installation Guide)
- Have a basic understanding of Python
Project Structure
First, let's examine our project structure:
./
├── example.py
├── hydraflow.yaml
└── submit.py
In this tutorial, we will only use the example.py file.
Creating a Basic Application
Let's create a simple HydraFlow application that defines a configuration class and tracks experiment parameters:
example.py (28 lines; its key parts are reproduced in the next section)
Understanding the Key Components
Let's break down the essential parts of this application:
- Configuration Class: A dataclass that defines the parameters for our experiment:

      @dataclass
      class Config:
          width: int = 1024
          height: int = 768

- Main Function: The core of our application, decorated with @hydraflow.main:

      @hydraflow.main(Config, tracking_uri="sqlite:///mlflow.db")
      def app(run: Run, cfg: Config) -> None:
          logger.info(run.info.run_id)
          logger.info(cfg)

  This function is the entry point and receives two key parameters: run (an MLflow Run object) and cfg (the configuration object).

- Entry Point: The standard Python entry point that calls our application function:

      if __name__ == "__main__":
          app()
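To get a feel for what the configuration class contributes on its own, here is a small standard-library sketch that applies `key=value` overrides to the same `Config` dataclass. The `apply_overrides` helper is hypothetical and written only for illustration; it is not part of HydraFlow or Hydra, which handle command-line overrides with their own, far more capable parser.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class Config:
    width: int = 1024
    height: int = 768


def apply_overrides(cfg: Config, overrides: list[str]) -> Config:
    """Hypothetical helper: return a copy of cfg with 'key=value' overrides applied."""
    # Map each field name to its declared type (int for both fields here).
    field_types = {f.name: f.type for f in fields(cfg)}
    changes = {}
    for item in overrides:
        key, _, raw = item.partition("=")
        if key not in field_types:
            raise KeyError(f"unknown config field: {key}")
        # Convert the raw string using the field's declared type.
        changes[key] = field_types[key](raw)
    return replace(cfg, **changes)


default_cfg = Config()
custom_cfg = apply_overrides(default_cfg, ["width=400"])
print(default_cfg)
print(custom_cfg)
```

The original object is left untouched because `dataclasses.replace` builds a new instance, which mirrors the idea that each run receives its own configuration.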
The Power of the Decorator
The hydraflow.main decorator is where the magic happens:
- It registers your configuration class with Hydra's ConfigStore.
- It sets the MLflow tracking URI via the tracking_uri if provided.
- It sets up an MLflow experiment.
- It starts an MLflow run and passes it to your function.
- It stores all Hydra configuration and logs as MLflow artifacts.
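The general shape of such a decorator can be illustrated with a toy version built from the standard library alone. This is not HydraFlow's implementation: `toy_main` and `ToyRun` are made-up names, the configuration is simply instantiated with its defaults, and the "run" is just a random ID. The sketch only shows how a decorator can prepare a run object and a config object and hand both to your function.

```python
import functools
import uuid
from dataclasses import dataclass


@dataclass
class Config:
    width: int = 1024
    height: int = 768


@dataclass
class ToyRun:
    """Stand-in for an MLflow Run; it only carries an ID."""

    run_id: str


def toy_main(config_cls):
    """Toy decorator factory: build a config, create a 'run', call the function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            cfg = config_cls()  # the real decorator lets Hydra compose this
            run = ToyRun(uuid.uuid4().hex)  # the real decorator starts an MLflow run
            return func(run, cfg)

        return wrapper

    return decorator


@toy_main(Config)
def app(run: ToyRun, cfg: Config) -> tuple[str, Config]:
    return run.run_id, cfg


run_id, cfg = app()
print(run_id, cfg)
```

Note that `app()` is called with no arguments, just like in the entry point of the real application: the decorator supplies both parameters.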
Running the Application
Now that we understand the code, let's run our application.
Single-run Mode
First, let's run it in single-run mode:
$ python example.py
2025/11/30 07:24:03 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/30 07:24:03 INFO mlflow.store.db.utils: Updating database tables
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Will assume non-transactional DDL.
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade -> 451aebb31d03, add metric step
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table
2025-11-30 07:24:03 INFO [89d4b8295536_create_latest_metrics_table_py] Migration complete!
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 89d4b8295536 -> 2b4d017a5e9b, add model registry tables to db
2025-11-30 07:24:03 INFO [2b4d017a5e9b_add_model_registry_tables_to_db_py] Adding registered_models and model_versions tables to database.
2025-11-30 07:24:03 INFO [2b4d017a5e9b_add_model_registry_tables_to_db_py] Migration complete!
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 2b4d017a5e9b -> cfd24bdc0731, Update run status constraint with killed
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade cfd24bdc0731 -> 0a8213491aaa, drop_duplicate_killed_constraint
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 0a8213491aaa -> 728d730b5ebd, add registered model tags table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 728d730b5ebd -> 27a6a02d2cf1, add model version tags table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 27a6a02d2cf1 -> 84291f40a231, add run_link to model_version
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 84291f40a231 -> a8c4a736bde6, allow nulls for run_id
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade a8c4a736bde6 -> 39d1c3be5f05, add_is_nan_constraint_for_metrics_tables_if_necessary
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 39d1c3be5f05 -> c48cb773bb87, reset_default_value_for_is_nan_in_metrics_table_for_mysql
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade c48cb773bb87 -> bd07f7e963c5, create index on run_uuid
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade bd07f7e963c5 -> 0c779009ac13, add deleted_time field to runs table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 0c779009ac13 -> cc1f77228345, change param value length to 500
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade cc1f77228345 -> 97727af70f4d, Add creation_time and last_update_time to experiments table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 97727af70f4d -> 3500859a5d39, Add Model Aliases table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 3500859a5d39 -> 7f2a7d5fae7d, add datasets inputs input_tags tables
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 7f2a7d5fae7d -> 2d6e25af4d3e, increase max param val length from 500 to 8000
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 2d6e25af4d3e -> acf3f17fdcc7, add storage location field to model versions
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade acf3f17fdcc7 -> 867495a8f9d4, add trace tables
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 867495a8f9d4 -> 5b0e9adcef9c, add cascade deletion to trace tables foreign keys
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 5b0e9adcef9c -> 4465047574b1, increase max dataset schema size
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 4465047574b1 -> f5a4f2784254, increase run tag value limit to 8000
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade f5a4f2784254 -> 0584bdc529eb, add cascading deletion to datasets from experiments
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 0584bdc529eb -> 400f98739977, add logged model tables
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 400f98739977 -> 6953534de441, add step to inputs table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 6953534de441 -> bda7b8c39065, increase_model_version_tag_value_limit
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade bda7b8c39065 -> cbc13b556ace, add V3 trace schema columns
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade cbc13b556ace -> 770bee3ae1dd, add assessments table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 770bee3ae1dd -> a1b2c3d4e5f6, add spans table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade a1b2c3d4e5f6 -> de4033877273, create entity_associations table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade de4033877273 -> 1a0cddfcaa16, Add webhooks and webhook_events tables
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 1a0cddfcaa16 -> 534353b11cbc, add scorer tables
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 534353b11cbc -> 71994744cf8e, add evaluation datasets
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 71994744cf8e -> 3da73c924c2f, add outputs to dataset record
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Running upgrade 3da73c924c2f -> bf29a5ff90ea, add jobs table
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:03 INFO [alembic.runtime.migration] Will assume non-transactional DDL.
2025/11/30 07:24:03 INFO mlflow.tracking.fluent: Experiment with name 'example' does not exist. Creating a new experiment.
When you run the application, HydraFlow automatically:
- Sets the MLflow tracking URI to the mlflow.db SQLite database in the project root.
- Creates an MLflow experiment named after your application (in this case, "example").
- Starts a run with the provided configuration.
- Captures logs and artifacts.
Let's use the MLflow CLI to verify that our experiment was created:
$ MLFLOW_TRACKING_URI=sqlite:///mlflow.db mlflow experiments search
2025/11/30 07:24:05 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/30 07:24:05 INFO mlflow.store.db.utils: Updating database tables
2025-11-30 07:24:05 INFO [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:05 INFO [alembic.runtime.migration] Will assume non-transactional DDL.
2025-11-30 07:24:05 INFO [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:05 INFO [alembic.runtime.migration] Will assume non-transactional DDL.
Experiment Id    Name     Artifact Location
---------------  -------  -------------------------------------------------------
0                Default  /home/runner/work/hydraflow/hydraflow/examples/mlruns/0
1                example  /home/runner/work/hydraflow/hydraflow/examples/mlruns/1
Now, let's examine the directory structure created by Hydra and MLflow:
./
├── mlruns/
│   └── 1/
│       └── 54ffa3032464477f87d75f7eef5fb23c/
│           └── artifacts/
│               ├── .hydra/
│               └── example.log
├── outputs/
│   └── 2025-11-30/
│       └── 07-24-03/
│           ├── .hydra/
│           │   ├── config.yaml
│           │   ├── hydra.yaml
│           │   └── overrides.yaml
│           └── example.log
├── example.py
├── hydraflow.yaml
├── mlflow.db
└── submit.py
The directory structure shows:
- outputs directory: Created by Hydra to store the run's outputs
- mlflow.db file: Created by MLflow to store the experiment tracking database
- mlruns directory: Created by MLflow to store experiment artifacts
- artifacts directory: Contains configuration files and logs managed by HydraFlow
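If you would rather inspect this layout from Python than from a file browser, the following sketch walks the tree with `pathlib`. It assumes only the `mlruns/<experiment id>/<run id>/artifacts` layout shown above; `find_run_artifacts` is a hypothetical helper name, not a HydraFlow or MLflow function.

```python
from pathlib import Path


def find_run_artifacts(root: Path) -> dict[str, Path]:
    """Map each run ID to its artifacts directory under root/mlruns."""
    result = {}
    # Pattern follows the layout above: mlruns/<experiment id>/<run id>/artifacts
    for artifacts in sorted(root.glob("mlruns/*/*/artifacts")):
        if artifacts.is_dir():
            result[artifacts.parent.name] = artifacts
    return result


for run_id, path in find_run_artifacts(Path(".")).items():
    print(run_id, path)
```

Run from the project root, this should print one line per run; in a directory without an `mlruns` folder it prints nothing.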
Multi-run Mode (Parameter Sweeps)
One of Hydra's most powerful features is the ability to run parameter sweeps. Let's try this by overriding our configuration parameters:
$ python example.py -m width=400,600 height=100,200
[2025-11-30 07:24:07,258][HYDRA] Launching 4 jobs locally
[2025-11-30 07:24:07,258][HYDRA] #0 : width=400 height=100
2025/11/30 07:24:07 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/30 07:24:07 INFO mlflow.store.db.utils: Updating database tables
2025-11-30 07:24:07 INFO [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:07 INFO [alembic.runtime.migration] Will assume non-transactional DDL.
2025-11-30 07:24:07 INFO [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-30 07:24:07 INFO [alembic.runtime.migration] Will assume non-transactional DDL.
[2025-11-30 07:24:07,716][HYDRA] #1 : width=400 height=200
[2025-11-30 07:24:07,777][__main__][INFO] - 82e9f894b75146af960c080e61bf9cb7
[2025-11-30 07:24:07,777][__main__][INFO] - {'width': 400, 'height': 200}
[2025-11-30 07:24:07,781][HYDRA] #2 : width=600 height=100
[2025-11-30 07:24:07,844][__main__][INFO] - c6e2412157ae4759abeb26fc8116a88d
[2025-11-30 07:24:07,844][__main__][INFO] - {'width': 600, 'height': 100}
[2025-11-30 07:24:07,848][HYDRA] #3 : width=600 height=200
[2025-11-30 07:24:07,910][__main__][INFO] - 078d96ccb5c249d3bdca4ddbdef3552e
[2025-11-30 07:24:07,910][__main__][INFO] - {'width': 600, 'height': 200}
The -m flag (or --multirun) tells Hydra to run all combinations of
the specified parameters. In this case, we'll run 4 combinations:
- width=400, height=100
- width=400, height=200
- width=600, height=100
- width=600, height=200
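The way a comma-separated sweep expands into jobs is a Cartesian product, which can be reproduced with `itertools.product`. This is a sketch of the idea rather than Hydra's own code, and `expand_sweep` is a hypothetical helper name.

```python
from itertools import product


def expand_sweep(sweep: dict[str, list[int]]) -> list[dict[str, int]]:
    """Return one dict per combination, varying the last parameter fastest."""
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in product(*sweep.values())]


sweep = {"width": [400, 600], "height": [100, 200]}
for index, job in enumerate(expand_sweep(sweep)):
    overrides = " ".join(f"{key}={value}" for key, value in job.items())
    print(f"#{index} : {overrides}")
```

The printed job list follows the same order as the `[HYDRA] #0` to `#3` lines in the log above. Adding a third parameter with two values would double the number of jobs to 8.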
Let's see the updated directory structure:
./
├── mlruns/
│   └── 1/
│       ├── 078d96ccb5c249d3bdca4ddbdef3552e/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── 188edcb9403c4a449039bf7c0b1f1b8d/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── 54ffa3032464477f87d75f7eef5fb23c/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── 82e9f894b75146af960c080e61bf9cb7/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       └── c6e2412157ae4759abeb26fc8116a88d/
│           └── artifacts/
│               ├── .hydra/
│               └── example.log
├── multirun/
│   └── 2025-11-30/
│       ├── 07-24-07/
│       │   ├── 0/
│       │   │   ├── .hydra/
│       │   │   └── example.log
│       │   ├── 1/
│       │   │   ├── .hydra/
│       │   │   └── example.log
│       │   ├── 2/
│       │   │   ├── .hydra/
│       │   │   └── example.log
│       │   └── 3/
│       │       ├── .hydra/
│       │       └── example.log
│       └── .hydraflow.lock
├── outputs/
│   └── 2025-11-30/
│       └── 07-24-03/
│           ├── .hydra/
│           └── example.log
├── example.py
├── mlflow.db
└── submit.py
Notice that all runs are added to the same MLflow experiment, making it easy to compare results across parameter combinations.
Cleanup
With HydraFlow, all important data is stored in MLflow, so you can safely delete the Hydra output directories:
$ rm -rf outputs multirun
After cleanup, the directory structure is much simpler:
./
├── mlruns/
│   └── 1/
│       ├── 078d96ccb5c249d3bdca4ddbdef3552e/
│       ├── 188edcb9403c4a449039bf7c0b1f1b8d/
│       ├── 54ffa3032464477f87d75f7eef5fb23c/
│       ├── 82e9f894b75146af960c080e61bf9cb7/
│       └── c6e2412157ae4759abeb26fc8116a88d/
├── example.py
├── hydraflow.yaml
├── mlflow.db
└── submit.py
All experiment data remains safely stored by MLflow: the run artifacts in the mlruns directory and the tracking records in mlflow.db.
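If you prefer to do the cleanup from Python, for example at the end of a script, a rough equivalent of the `rm -rf` command can be written with `shutil`. The helper name `clean_hydra_outputs` is hypothetical; the directory names `outputs` and `multirun` are the ones that appear in the trees above.

```python
import shutil
from pathlib import Path


def clean_hydra_outputs(root: Path, names=("outputs", "multirun")) -> list[str]:
    """Delete the named Hydra output directories under root; return those removed."""
    removed = []
    for name in names:
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)  # recursive delete, like `rm -rf`
            removed.append(name)
    return removed
```

Calling `clean_hydra_outputs(Path("."))` from the project root removes only those two directories and leaves `mlruns` and `mlflow.db` alone. Directories that do not exist are skipped, so the function is safe to call repeatedly.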
Summary
In this tutorial, you've learned how to:
- Create a simple HydraFlow application using the @hydraflow.main decorator
- Define configuration using Python dataclasses
- Run experiments with default and overridden parameters
- Perform parameter sweeps using Hydra's multi-run capabilities
This basic pattern forms the foundation for all HydraFlow applications. As your machine learning workflows grow in complexity, you can build upon this foundation to create more sophisticated experiments.
Next Steps
Now that you've learned how to create and run a basic application, try:
- Creating more complex configurations with nested parameters
- Adding actual machine learning code to your application
- Exploring Automated Workflows with HydraFlow
- Learning how to Analyze Results from your experiments
For detailed documentation, refer to: