Creating Your First HydraFlow Application
This tutorial demonstrates how to create and run a basic HydraFlow application that integrates Hydra's configuration management with MLflow's experiment tracking.
Prerequisites
Before you begin this tutorial, you should:
- Have HydraFlow installed (Installation Guide)
- Have a basic understanding of Python
Project Structure
First, let's examine our project structure:
./
├── example.py
├── hydraflow.yaml
└── submit.py
In this tutorial, we will only use the example.py file.
Creating a Basic Application
Let's create a simple HydraFlow application that defines a configuration class and tracks experiment parameters:
[Listing: example.py]
Understanding the Key Components
Let's break down the essential parts of this application:
- Configuration Class: A dataclass that defines the parameters for our experiment:

      @dataclass
      class Config:
          width: int = 1024
          height: int = 768

- Main Function: The core of our application, decorated with @hydraflow.main:

      @hydraflow.main(Config)
      def app(run: Run, cfg: Config) -> None:
          logger.info(run.info.run_id)
          logger.info(cfg)

  This function is the entry point and receives two key parameters: run (an MLflow Run object) and cfg (the configuration object).

- Entry Point: The standard Python entry point that calls our application function:

      if __name__ == "__main__":
          app()
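To see what the dataclass provides on its own, here is a small standard-library sketch that inspects the Config fields and their defaults. It makes no HydraFlow or Hydra calls. The names schema, defaults, and custom are illustrative only, and the object Hydra passes to your function at run time may be a configuration container built from this schema instead of a plain Config instance.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Config:
    width: int = 1024
    height: int = 768


# Each field carries a name, a type annotation, and a default value.
schema = {f.name: (f.type, f.default) for f in fields(Config)}

# Instantiating without arguments yields the declared defaults.
defaults = asdict(Config())

# Keyword arguments replace individual defaults (illustration only;
# in a real run the overrides come from the command line via Hydra).
custom = asdict(Config(width=400))

print(schema)
print(defaults)
print(custom)
```

The defaults printed here are the same values that appear in the log output of a run started without overrides.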
The Power of the Decorator
The hydraflow.main decorator is where the magic happens:
- It registers your configuration class with Hydra's ConfigStore.
- It sets the MLflow tracking URI via tracking_uri, if provided.
- It sets up an MLflow experiment.
- It starts an MLflow run and passes it to your function.
- It stores all Hydra configuration and logs as MLflow artifacts.
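The general shape of such a decorator can be pictured with a toy version. This is a conceptual sketch only, not HydraFlow's implementation: toy_main and FakeRun are hypothetical names, and nothing here calls Hydra or MLflow. It shows a decorator factory that takes the configuration class, builds a config, creates a run object, and passes both to the wrapped function.

```python
import uuid
from dataclasses import asdict, dataclass
from functools import wraps
from typing import Any, Callable


@dataclass
class FakeRun:
    """Hypothetical stand-in for an MLflow Run; it only carries an ID."""

    run_id: str


def toy_main(config_cls: type) -> Callable:
    """Toy decorator factory (an assumption for illustration, not hydraflow.main)."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(**overrides: Any) -> FakeRun:
            cfg = config_cls(**overrides)  # a real app gets this from Hydra
            run = FakeRun(run_id=uuid.uuid4().hex)  # a real app gets this from MLflow
            func(run, cfg)  # hand both objects to the user's function
            return run

        return wrapper

    return decorator


@dataclass
class Config:
    width: int = 1024
    height: int = 768


seen: list[tuple[str, dict]] = []


@toy_main(Config)
def app(run: FakeRun, cfg: Config) -> None:
    seen.append((run.run_id, asdict(cfg)))


run = app(width=400)
print(seen)
```

The point of the pattern is that the function body stays free of setup code: it only receives a run and a config and does its work.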
Running the Application
Now that we understand the code, let's run our application.
Single-run Mode
First, let's run it in single-run mode:
$ python example.py
2026/03/14 08:43:34 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2026/03/14 08:43:34 INFO mlflow.store.db.utils: Updating database tables
2026/03/14 08:43:34 INFO mlflow.tracking.fluent: Experiment with name 'example' does not exist. Creating a new experiment.
[2026-03-14 08:43:34,825][__main__][INFO] - aecad860807c4ca58b7a6ec5370bd783
[2026-03-14 08:43:34,825][__main__][INFO] - {'width': 1024, 'height': 768}
When you run the application, HydraFlow automatically:
- Sets the MLflow tracking URI to the mlflow.db SQLite database in the project root.
- Creates an MLflow experiment named after your application (in this case, "example").
- Starts a run with the provided configuration.
- Captures logs and artifacts.
Let's use the MLflow CLI to verify that our experiment was created:
$ mlflow experiments search
Experiment Id    Name     Artifact Location
---------------  -------  -------------------------------------------------------
0                Default  /home/runner/work/hydraflow/hydraflow/examples/mlruns/0
1                example  /home/runner/work/hydraflow/hydraflow/examples/mlruns/1
Now, let's examine the directory structure created by Hydra and MLflow:
./
├── mlruns/
│   └── 1/
│       └── aecad860807c4ca58b7a6ec5370bd783/
│           └── artifacts/
│               ├── .hydra/
│               └── example.log
├── outputs/
│   └── 2026-03-14/
│       └── 08-43-33/
│           ├── .hydra/
│           │   ├── config.yaml
│           │   ├── hydra.yaml
│           │   └── overrides.yaml
│           └── example.log
├── example.py
├── hydraflow.yaml
├── mlflow.db
└── submit.py
The directory structure shows:
- outputs directory: Created by Hydra to store the run's outputs
- mlflow.db file: Created by MLflow to store the experiment tracking database
- mlruns directory: Created by MLflow to store experiment artifacts
- artifacts directory: Contains configuration files and logs managed by HydraFlow
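If you prefer to inspect this layout from Python, the following sketch walks the mlruns directory with the standard library and lists the artifact names stored for each run. It assumes the mlruns/<experiment id>/<run id>/artifacts layout shown in the tree above. The helper name list_run_artifacts is hypothetical, and the sketch reads the file system directly without using the MLflow API.

```python
from pathlib import Path


def list_run_artifacts(mlruns: Path, experiment_id: str) -> dict[str, list[str]]:
    """Map each run ID under mlruns/<experiment_id>/ to its artifact names."""
    result: dict[str, list[str]] = {}
    experiment_dir = mlruns / experiment_id
    if not experiment_dir.is_dir():
        return result  # nothing recorded for this experiment
    for run_dir in sorted(experiment_dir.iterdir()):
        artifacts = run_dir / "artifacts"
        if artifacts.is_dir():
            result[run_dir.name] = sorted(p.name for p in artifacts.iterdir())
    return result


if __name__ == "__main__":
    # Run from the project root; "1" is the experiment ID shown above.
    for run_id, names in list_run_artifacts(Path("mlruns"), "1").items():
        print(run_id, names)
```

For the single run above, this would report the run ID together with the .hydra directory and example.log.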
Multi-run Mode (Parameter Sweeps)
One of Hydra's most powerful features is the ability to run parameter sweeps. Let's try this by overriding our configuration parameters:
$ python example.py -m width=400,600 height=100,200
[2026-03-14 08:43:39,195][HYDRA] Launching 4 jobs locally
[2026-03-14 08:43:39,195][HYDRA] #0 : width=400 height=100
[2026-03-14 08:43:39,738][__main__][INFO] - 1407f330da0e4aeda73bc7b50de108d3
[2026-03-14 08:43:39,738][__main__][INFO] - {'width': 400, 'height': 100}
[2026-03-14 08:43:39,746][HYDRA] #1 : width=400 height=200
[2026-03-14 08:43:39,808][__main__][INFO] - d543517f32044d6b90e56c3e0244fbb4
[2026-03-14 08:43:39,808][__main__][INFO] - {'width': 400, 'height': 200}
[2026-03-14 08:43:39,813][HYDRA] #2 : width=600 height=100
[2026-03-14 08:43:39,874][__main__][INFO] - a8b80af4102147d0b14f4bb8a7597d12
[2026-03-14 08:43:39,874][__main__][INFO] - {'width': 600, 'height': 100}
[2026-03-14 08:43:39,888][HYDRA] #3 : width=600 height=200
[2026-03-14 08:43:39,950][__main__][INFO] - 641592e875e840c2a92844064bac3868
[2026-03-14 08:43:39,951][__main__][INFO] - {'width': 600, 'height': 200}
The -m flag (or --multirun) tells Hydra to run all combinations of the specified parameters. In this case, Hydra launches four jobs:
- width=400, height=100
- width=400, height=200
- width=600, height=100
- width=600, height=200
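The expansion into four jobs is a Cartesian product of the value lists. The sketch below reproduces that enumeration with itertools.product. It is not Hydra's sweeper: expand_sweep is a hypothetical helper that handles only simple comma-separated key=value overrides, whereas Hydra's override syntax is richer.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand 'key=a,b' style overrides into one override list per job."""
    choices = []
    for item in overrides:
        key, _, values = item.partition("=")
        choices.append([f"{key}={value}" for value in values.split(",")])
    # The Cartesian product varies the last parameter fastest.
    return [list(combo) for combo in product(*choices)]


jobs = expand_sweep(["width=400,600", "height=100,200"])
for index, job in enumerate(jobs):
    print(f"#{index} : {' '.join(job)}")
```

The resulting order matches the job numbering in the log above, with height changing before width.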
Let's see the updated directory structure:
./
├── mlruns/
│   └── 1/
│       ├── 1407f330da0e4aeda73bc7b50de108d3/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── 641592e875e840c2a92844064bac3868/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── a8b80af4102147d0b14f4bb8a7597d12/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       ├── aecad860807c4ca58b7a6ec5370bd783/
│       │   └── artifacts/
│       │       ├── .hydra/
│       │       └── example.log
│       └── d543517f32044d6b90e56c3e0244fbb4/
│           └── artifacts/
│               ├── .hydra/
│               └── example.log
├── multirun/
│   └── 2026-03-14/
│       └── 08-43-39/
│           ├── 0/
│           │   ├── .hydra/
│           │   └── example.log
│           ├── 1/
│           │   ├── .hydra/
│           │   └── example.log
│           ├── 2/
│           │   ├── .hydra/
│           │   └── example.log
│           └── 3/
│               ├── .hydra/
│               └── example.log
├── outputs/
│   └── 2026-03-14/
│       └── 08-43-33/
│           ├── .hydra/
│           └── example.log
├── example.py
├── hydraflow.yaml
├── mlflow.db
└── submit.py
Notice that all runs are added to the same MLflow experiment, making it easy to compare results across parameter combinations.
Cleanup
With HydraFlow, all important data is stored in MLflow, so you can safely delete the Hydra output directories:
$ rm -rf outputs multirun
After cleanup, the directory structure is much simpler:
./
├── mlruns/
│   └── 1/
│       ├── 1407f330da0e4aeda73bc7b50de108d3/
│       ├── 641592e875e840c2a92844064bac3868/
│       ├── a8b80af4102147d0b14f4bb8a7597d12/
│       ├── aecad860807c4ca58b7a6ec5370bd783/
│       └── d543517f32044d6b90e56c3e0244fbb4/
├── example.py
├── hydraflow.yaml
├── mlflow.db
└── submit.py
All experiment data remains safely stored by MLflow, in the mlruns directory and the mlflow.db database.
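On systems without rm, or inside a script, the same cleanup can be done from Python. The sketch below uses shutil.rmtree from the standard library. The helper name remove_hydra_outputs is hypothetical, and the default directory names are the ones Hydra created in this tutorial.

```python
import shutil
from pathlib import Path


def remove_hydra_outputs(
    root: Path, names: tuple[str, ...] = ("outputs", "multirun")
) -> list[str]:
    """Delete the named directories under root and return those removed."""
    removed = []
    for name in names:
        target = root / name
        if target.is_dir():  # skip names that do not exist
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Calling remove_hydra_outputs(Path(".")) from the project root deletes outputs and multirun and leaves mlruns and mlflow.db untouched.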
Summary
In this tutorial, you've learned how to:
- Create a simple HydraFlow application using the @hydraflow.main decorator
- Define configuration using Python dataclasses
- Run experiments with default and overridden parameters
- Perform parameter sweeps using Hydra's multi-run capabilities
This basic pattern forms the foundation for all HydraFlow applications. As your machine learning workflows grow in complexity, you can build upon this foundation to create more sophisticated experiments.
Next Steps
Now that you've learned how to create and run a basic application, try:
- Creating more complex configurations with nested parameters
- Adding actual machine learning code to your application
- Exploring Automated Workflows with HydraFlow
- Learning how to Analyze Results from your experiments