Quickstart

Hydra application

The following example demonstrates how to use Hydraflow with a Hydra application. There are two main steps to using Hydraflow:

  1. Set the MLflow experiment using the Hydra job name.
  2. Start a new MLflow run that logs the Hydra configuration.
apps/quickstart.py
import logging
from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore

import hydraflow

log = logging.getLogger(__name__)


@dataclass
class Config:
    width: int = 1024
    height: int = 768


cs = ConfigStore.instance()
cs.store(name="config", node=Config)


@hydra.main(version_base=None, config_name="config")
def app(cfg: Config) -> None:
    hydraflow.set_experiment()

    with hydraflow.start_run(cfg):
        log.info(f"{cfg.width=}, {cfg.height=}")


if __name__ == "__main__":
    app()

Set the MLflow experiment

hydraflow.set_experiment sets the MLflow experiment using the Hydra job name. It can also set the tracking URI through the optional uri argument. For example:

hydraflow.set_experiment(uri="sqlite:///mlruns.db")

Start a new MLflow run

hydraflow.start_run starts a new MLflow run that logs the Hydra configuration. It returns the started run so that it can be used to log metrics, parameters, and artifacts within the context of the run.

with hydraflow.start_run(cfg) as run:
    pass

Run the application

Single-run

Run the Hydra application as a normal Python script.

$ python apps/quickstart.py
2025/01/28 14:46:20 INFO mlflow.tracking.fluent: Experiment with name 'quickstart' does not exist. Creating a new experiment.
[2025-01-28 14:46:20,928][__main__][INFO] - cfg.width=1024, cfg.height=768

Use the MLflow CLI to list the experiments and confirm that the quickstart experiment was created.

$ mlflow experiments search
Experiment Id       Name        Artifact Location                                                     
------------------  ----------  ----------------------------------------------------------------------
0                   Default     file:///home/runner/work/hydraflow/hydraflow/mlruns/0                 
245602949584624815  quickstart  file:///home/runner/work/hydraflow/hydraflow/mlruns/245602949584624815

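Multi-run

Run the application with Hydra's -m (--multirun) flag to launch one job for each combination of the comma-separated values.

$ python apps/quickstart.py -m width=400,600 height=100,200,300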
[2025-01-28 14:46:25,162][HYDRA] Launching 6 jobs locally
[2025-01-28 14:46:25,162][HYDRA]    #0 : width=400 height=100
[2025-01-28 14:46:25,279][__main__][INFO] - cfg.width=400, cfg.height=100
[2025-01-28 14:46:25,281][HYDRA]    #1 : width=400 height=200
[2025-01-28 14:46:25,425][__main__][INFO] - cfg.width=400, cfg.height=200
[2025-01-28 14:46:25,427][HYDRA]    #2 : width=400 height=300
[2025-01-28 14:46:25,505][__main__][INFO] - cfg.width=400, cfg.height=300
[2025-01-28 14:46:25,507][HYDRA]    #3 : width=600 height=100
[2025-01-28 14:46:25,584][__main__][INFO] - cfg.width=600, cfg.height=100
[2025-01-28 14:46:25,586][HYDRA]    #4 : width=600 height=200
[2025-01-28 14:46:25,663][__main__][INFO] - cfg.width=600, cfg.height=200
[2025-01-28 14:46:25,665][HYDRA]    #5 : width=600 height=300
[2025-01-28 14:46:25,742][__main__][INFO] - cfg.width=600, cfg.height=300
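
The six jobs in the log above are the Cartesian product of the two value lists, with the first override varying slowest. The following is a minimal standard-library sketch of that expansion. The helper expand_sweep is hypothetical and written only for illustration; it is not part of Hydra or Hydraflow.

```python
from itertools import product


def expand_sweep(overrides: dict[str, list[int]]) -> list[dict[str, int]]:
    """Return one dict per job: the Cartesian product of all override values.

    Hypothetical helper for illustration only (not a Hydra or Hydraflow API).
    """
    keys = list(overrides)
    return [dict(zip(keys, values)) for values in product(*overrides.values())]


jobs = expand_sweep({"width": [400, 600], "height": [100, 200, 300]})

# Print the jobs in the same "#N : key=value" layout as the Hydra log.
for i, job in enumerate(jobs):
    print(f"#{i} : " + " ".join(f"{k}={v}" for k, v in job.items()))
```

Two values for width and three for height give 2 × 3 = 6 jobs, in the same order as the launcher output.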

Use Hydraflow API

Run collection

>>> import mlflow
>>> mlflow.set_experiment("quickstart")
>>> import hydraflow
>>> rc = hydraflow.list_runs()
>>> print(rc)
RunCollection(7)

Retrieve a run

>>> run = rc.first()
>>> print(type(run))
<class 'mlflow.entities.run.Run'>
>>> cfg = hydraflow.load_config(run)
>>> print(type(cfg))
<class 'omegaconf.dictconfig.DictConfig'>
>>> print(cfg)
{'width': 1024, 'height': 768}
>>> run = rc.last()
>>> cfg = hydraflow.load_config(run)
>>> print(cfg)
{'width': 600, 'height': 300}

Filter runs

>>> filtered = rc.filter(width=400)
>>> print(filtered)
RunCollection(3)
>>> filtered = rc.filter(height=[100, 300])
>>> print(filtered)
RunCollection(4)
>>> filtered = rc.filter(height=(100, 300))
>>> print(filtered)
RunCollection(6)
>>> run = rc.find(height=100)
>>> print(run.data.params)
{'height': '100', 'width': '400'}
>>> run = rc.find_last(height=100)
>>> print(run.data.params)
{'height': '100', 'width': '600'}

Map runs

>>> params = rc.map(lambda x: x.data.params)
>>> for p in params:
...     print(p)
{'height': '768', 'width': '1024'}
{'height': '100', 'width': '400'}
{'height': '200', 'width': '400'}
{'height': '300', 'width': '400'}
{'height': '100', 'width': '600'}
{'height': '200', 'width': '600'}
{'height': '300', 'width': '600'}
>>> list(rc.map_id(print))
903a22469ebf46b49322d1b1e9e1bcc9
3102b7a7957c44f5874f073ca905ed80
1819fbefc8ad4b2391d9c51c26122e64
36e7967804ee4a8ab8646d1d290a63eb
9f8bed2dd6414126a91fd4cdc7b99240
e2620faeab254fe7957154437145843c
533ee2ca14ff48a696e6a9b1e7469648

Group runs

>>> grouped = rc.groupby("width")
>>> for key, group in grouped.items():
...     print(key, group)
1024 RunCollection(1)
400 RunCollection(3)
600 RunCollection(3)
>>> grouped = rc.groupby(["height"])
>>> for key, group in grouped.items():
...     print(key, group)
('768',) RunCollection(1)
('100',) RunCollection(2)
('200',) RunCollection(2)
('300',) RunCollection(2)
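
In the output above, a single name produces scalar keys, while a list of names produces tuple keys, even when the list has one element. The sketch below mimics that behavior on plain dictionaries of string parameters. It is an illustration under that assumption, not Hydraflow's implementation, and group_by is a hypothetical helper.

```python
from collections import defaultdict

# Stand-in for run.data.params of the seven runs (MLflow stores params as strings).
params = [{"width": "1024", "height": "768"}] + [
    {"width": w, "height": h} for w in ("400", "600") for h in ("100", "200", "300")
]


def group_by(records, names):
    """Group records by one name (scalar keys) or a list of names (tuple keys)."""
    single = isinstance(names, str)
    fields = [names] if single else list(names)
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in fields)
        groups[key[0] if single else key].append(record)
    return dict(groups)


by_width = group_by(params, "width")
for key, group in by_width.items():
    print(key, len(group))

by_height = group_by(params, ["height"])
for key, group in by_height.items():
    print(key, len(group))
```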

Config dataframe

>>> print(rc.data.config)
   width  height
0   1024     768
1    400     100
2    400     200
3    400     300
4    600     100
5    600     200
6    600     300