# Creating Your First HydraFlow Application
This tutorial demonstrates how to create and run a basic HydraFlow application that integrates Hydra's configuration management with MLflow's experiment tracking.
## Prerequisites
Before you begin this tutorial, you should:
- Have HydraFlow installed (Installation Guide)
- Have a basic understanding of Python and machine learning experiments
## Project Structure
First, let's examine our project structure:

```
.
├── example.py
├── hydraflow.yaml
└── submit.py
```

In this tutorial, we will only use the `example.py` file.
## Creating a Basic Application
Let's create a simple HydraFlow application that defines a configuration class and tracks experiment parameters:
```python
import logging
from dataclasses import dataclass

from mlflow.entities import Run

import hydraflow

log = logging.getLogger(__name__)


@dataclass
class Config:
    width: int = 1024
    height: int = 768


@hydraflow.main(Config)
def app(run: Run, cfg: Config) -> None:
    log.info(run.info.run_id)
    log.info(cfg)


if __name__ == "__main__":
    app()
```
## Understanding the Key Components
Let's break down the essential parts of this application:
- **Configuration Class**: A dataclass that defines the parameters for our experiment:

    ```python
    @dataclass
    class Config:
        width: int = 1024
        height: int = 768
    ```

    This class defines the structure and default values for our configuration parameters. Using Python's `dataclass` gives us type safety and clear structure.

- **Main Function**: The core of our application, decorated with `@hydraflow.main`:

    ```python
    @hydraflow.main(Config)
    def app(run: Run, cfg: Config) -> None:
        log.info(run.info.run_id)
        log.info(cfg)
    ```

    This function is executed with the provided configuration. It takes two key parameters:

    - `run`: an MLflow run object that provides access to the current experiment
    - `cfg`: the configuration object with our parameters
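Because `Config` is a plain dataclass, you can explore its behavior directly in a REPL. This short sketch mirrors the tutorial's fields and defaults, and imitates what a Hydra CLI override would produce:

```python
from dataclasses import asdict, dataclass


@dataclass
class Config:
    width: int = 1024
    height: int = 768


cfg = Config()                # defaults, as used in single-run mode
print(asdict(cfg))            # {'width': 1024, 'height': 768}

cfg = Config(width=400)       # an override, like width=400 on the command line
print(cfg.width, cfg.height)  # 400 768
```

Hydra builds the configuration object for you at runtime; the point here is only that the dataclass carries both the schema and the defaults.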
## The Power of the Decorator

The `hydraflow.main` decorator is where the magic happens:

- It registers your configuration class with Hydra's `ConfigStore`
- It sets up an MLflow experiment
- It starts an MLflow run and passes it to your function
- It stores all Hydra configuration and logs as MLflow artifacts
This single decorator seamlessly connects Hydra's configuration capabilities with MLflow's experiment tracking.
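To build intuition for that lifecycle, here is a hypothetical, stdlib-only stand-in (`toy_main` is not part of HydraFlow's API): it shows the wrap-setup-call shape of the decorator, with comments marking where Hydra and MLflow would do the real work:

```python
import functools
import uuid
from dataclasses import dataclass


def toy_main(config_cls):
    """Hypothetical stand-in for hydraflow.main (stdlib only, for illustration)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            cfg = config_cls()         # 1. build the config (Hydra would also merge CLI overrides)
            run_id = uuid.uuid4().hex  # 2./3. set up an experiment and start a run (MLflow, in reality)
            out = func(run_id, cfg)    # hand the run and config to your function
            # 4. the real decorator would now store Hydra config and logs as MLflow artifacts
            return out
        return wrapper
    return decorator


@dataclass
class Config:
    width: int = 1024
    height: int = 768


@toy_main(Config)
def app(run_id: str, cfg: Config) -> Config:
    print(run_id)  # stands in for run.info.run_id
    print(cfg)
    return cfg


result = app()
```

The real decorator does considerably more (override parsing, artifact logging, experiment naming), but the control flow your function sees is the same: it is called with a run handle and a fully built configuration object.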
## Running the Application
Now that we understand the code, let's run our application.
### Single-run Mode
First, let's run it in single-run mode:
```bash
$ python example.py
2025/04/19 02:41:04 INFO mlflow.tracking.fluent: Experiment with name 'example' does not exist. Creating a new experiment.
[2025-04-19 02:41:05,037][__main__][INFO] - 2179c72c88624825b322bf2522e7674a
[2025-04-19 02:41:05,037][__main__][INFO] - {'width': 1024, 'height': 768}
```
When you run the application, HydraFlow automatically:
- Creates an MLflow experiment named after your application (in this case, "example")
- Starts a run with the provided configuration
- Captures logs and artifacts
Let's use the MLflow CLI to verify that our experiment was created:
```bash
$ mlflow experiments search
Experiment Id       Name     Artifact Location
------------------  -------  -------------------------------------------------------------------------------
0                   Default  file:///home/runner/work/hydraflow/hydraflow/examples/mlruns/0
445485773524966029  example  file:///home/runner/work/hydraflow/hydraflow/examples/mlruns/445485773524966029
```
Now, let's examine the directory structure created by Hydra and MLflow:
```
.
├── mlruns
│   ├── 0
│   │   └── meta.yaml
│   └── 445485773524966029
│       ├── 2179c72c88624825b322bf2522e7674a
│       │   ├── artifacts
│       │   │   ├── .hydra
│       │   │   └── example.log
│       │   ├── metrics
│       │   ├── params
│       │   └── meta.yaml
│       └── meta.yaml
├── outputs
│   └── 2025-04-19
│       └── 02-41-04
│           ├── .hydra
│           │   ├── config.yaml
│           │   ├── hydra.yaml
│           │   └── overrides.yaml
│           └── example.log
├── example.py
├── hydraflow.yaml
└── submit.py
```
The directory structure shows:

- `outputs` directory: created by Hydra to store the run's outputs
- `mlruns` directory: created by MLflow to store experiment data
- `artifacts` directory: contains configuration files and logs managed by HydraFlow
### Multi-run Mode (Parameter Sweeps)
One of Hydra's most powerful features is the ability to run parameter sweeps. Let's try this by overriding our configuration parameters:
```bash
$ python example.py -m width=400,600 height=100,200
[2025-04-19 02:41:09,873][HYDRA] Launching 4 jobs locally
[2025-04-19 02:41:09,873][HYDRA] #0 : width=400 height=100
[2025-04-19 02:41:10,001][__main__][INFO] - 31b314073635407daedbd844bdc29f41
[2025-04-19 02:41:10,001][__main__][INFO] - {'width': 400, 'height': 100}
[2025-04-19 02:41:10,003][HYDRA] #1 : width=400 height=200
[2025-04-19 02:41:10,084][__main__][INFO] - 851ef5be98bb42a195a265eb5cfdd5aa
[2025-04-19 02:41:10,084][__main__][INFO] - {'width': 400, 'height': 200}
[2025-04-19 02:41:10,087][HYDRA] #2 : width=600 height=100
[2025-04-19 02:41:10,166][__main__][INFO] - df439ec6d4ab456f9e56c89d1e944b05
[2025-04-19 02:41:10,167][__main__][INFO] - {'width': 600, 'height': 100}
[2025-04-19 02:41:10,169][HYDRA] #3 : width=600 height=200
[2025-04-19 02:41:10,251][__main__][INFO] - 15099e7e9d8149cab749ca538f00c33c
[2025-04-19 02:41:10,251][__main__][INFO] - {'width': 600, 'height': 200}
```
The `-m` flag (or `--multirun`) tells Hydra to run all combinations of the specified parameters. In this case, it runs 4 combinations:

- width=400, height=100
- width=400, height=200
- width=600, height=100
- width=600, height=200
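The sweep Hydra launches is simply the Cartesian product of the comma-separated value lists. A quick stdlib sketch reproduces the job grid from the log above:

```python
from itertools import product

widths = [400, 600]   # width=400,600
heights = [100, 200]  # height=100,200

# -m expands the value lists into every combination, in this order
jobs = [{"width": w, "height": h} for w, h in product(widths, heights)]
for i, job in enumerate(jobs):
    print(f"#{i} : width={job['width']} height={job['height']}")
```

Adding a third swept parameter multiplies the job count again, so grids grow quickly.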
Let's see the updated directory structure:
```
.
├── mlruns
│   ├── 0
│   └── 445485773524966029
│       ├── 15099e7e9d8149cab749ca538f00c33c
│       │   └── artifacts
│       │       ├── .hydra
│       │       └── example.log
│       ├── 2179c72c88624825b322bf2522e7674a
│       │   └── artifacts
│       │       ├── .hydra
│       │       └── example.log
│       ├── 31b314073635407daedbd844bdc29f41
│       │   └── artifacts
│       │       ├── .hydra
│       │       └── example.log
│       ├── 851ef5be98bb42a195a265eb5cfdd5aa
│       │   └── artifacts
│       │       ├── .hydra
│       │       └── example.log
│       └── df439ec6d4ab456f9e56c89d1e944b05
│           └── artifacts
│               ├── .hydra
│               └── example.log
├── multirun
│   └── 2025-04-19
│       └── 02-41-09
│           ├── 0
│           │   ├── .hydra
│           │   └── example.log
│           ├── 1
│           │   ├── .hydra
│           │   └── example.log
│           ├── 2
│           │   ├── .hydra
│           │   └── example.log
│           └── 3
│               ├── .hydra
│               └── example.log
├── outputs
│   └── 2025-04-19
│       └── 02-41-04
│           ├── .hydra
│           └── example.log
├── example.py
└── submit.py
```
Notice that all runs are added to the same MLflow experiment, making it easy to compare results across parameter combinations.
## Cleanup
With HydraFlow, all important data is stored in MLflow, so you can safely delete the Hydra output directories:
```bash
$ rm -rf outputs multirun
```
After cleanup, the directory structure is much simpler:
```
.
├── mlruns
│   ├── 0
│   │   └── meta.yaml
│   └── 445485773524966029
│       ├── 15099e7e9d8149cab749ca538f00c33c
│       ├── 2179c72c88624825b322bf2522e7674a
│       ├── 31b314073635407daedbd844bdc29f41
│       ├── 851ef5be98bb42a195a265eb5cfdd5aa
│       ├── df439ec6d4ab456f9e56c89d1e944b05
│       └── meta.yaml
├── example.py
├── hydraflow.yaml
└── submit.py
```
All experiment data remains safely stored in the MLflow directory.
## Summary
In this tutorial, you've learned how to:

- Create a simple HydraFlow application using the `@hydraflow.main` decorator
- Define configuration using Python dataclasses
- Run experiments with default and overridden parameters
- Perform parameter sweeps using Hydra's multi-run capabilities
This basic pattern forms the foundation for all HydraFlow applications. As your machine learning workflows grow in complexity, you can build upon this foundation to create more sophisticated experiments.
## Next Steps
Now that you've learned how to create and run a basic application, try:
- Creating more complex configurations with nested parameters
- Adding actual machine learning code to your application
- Exploring Automated Workflows with HydraFlow
- Learning how to Analyze Results from your experiments
For detailed documentation, refer to the HydraFlow reference documentation.