Core Concepts
This page introduces the fundamental concepts that form the foundation of the HydraFlow framework.
Design Principles
HydraFlow is built on the following design principles:
- Type Safety - Utilizing Python dataclasses for configuration type checking and IDE support
- Reproducibility - Automatically tracking all experiment configurations for fully reproducible experiments
- Workflow Integration - Creating a cohesive workflow by integrating Hydra's configuration management with MLflow's experiment tracking
- Analysis Capabilities - Providing powerful APIs for easily analyzing experiment results
Key Components
HydraFlow consists of the following key components:
Configuration Management
HydraFlow uses a hierarchical configuration system based on OmegaConf and Hydra. This provides:
- Type-safe configuration using Python dataclasses
- Schema validation to ensure configuration correctness
- Configuration composition from multiple sources
- Command-line overrides
Example configuration:
from dataclasses import dataclass

@dataclass
class Config:
    learning_rate: float = 0.001
    batch_size: int = 32
    epochs: int = 10
This configuration class defines the structure and default values for your experiment, enabling type checking and auto-completion.
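Because these fields are declared on the dataclass, any of them can be overridden at launch time with Hydra's standard key=value syntax. For example, assuming the entry script is named train.py:
python train.py learning_rate=0.01 batch_size=64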
Main Decorator
The @hydraflow.main decorator defines the entry point for a HydraFlow application:
import hydraflow
from mlflow.entities import Run

@hydraflow.main(Config)
def train(run: Run, cfg: Config) -> None:
    # Your experiment code
    print(f"Training with lr={cfg.learning_rate}, batch_size={cfg.batch_size}")

    # Log metrics
    hydraflow.log_metric("accuracy", 0.95)

if __name__ == "__main__":
    train()
This decorator provides:
- Automatic registration of your config class with Hydra's ConfigStore
- Automatic setup of an MLflow experiment
- Storage of Hydra configurations and logs as MLflow artifacts
- Support for type-safe APIs and IDE integration
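The decorated function is launched like any other Hydra application, and Hydra's multirun flag (-m) sweeps over parameters, with each job tracked as an MLflow run. A minimal invocation, again assuming the script is named train.py:
python train.py
python train.py -m epochs=10,20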
Workflow Automation
HydraFlow allows you to automate experiment workflows using a YAML-based job definition system:
jobs:
  train_models:
    run: python train.py
    sets:
      - each: model=small,medium,large
        all: learning_rate=0.001,0.01,0.1
This enables:
- Defining reusable experiment workflows
- Efficient configuration of parameter sweeps
- Organization of complex experiment campaigns
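As a sketch of how such a job might be launched, assuming the definition above is stored in the project's job configuration file (commonly hydraflow.yaml) and that the hydraflow command-line interface is installed (the exact command name is an assumption):
hydraflow run train_models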
You can also define more complex parameter spaces using extended sweep syntax:
# Ranges (start:end:step)
python train.py -m "learning_rate=0.01:0.03:0.01"

# SI prefixes
python train.py -m "batch_size=1k,2k,4k"  # 1000, 2000, 4000

# Grid within a single parameter
python train.py -m "model=(small,large)_(v1,v2)"  # small_v1, small_v2, large_v1, large_v2
Analysis Tools
After running experiments, HydraFlow provides powerful tools for accessing and analyzing results. These tools help you track, compare, and derive insights from your experiments.
Working with Individual Runs
For individual experiment analysis, HydraFlow provides the Run class, which represents a single experiment run:
from hydraflow import Run
# Load an existing run
run = Run.load("path/to/run")
# Access configuration values
learning_rate = run.get("learning_rate")
The Run class provides:
- Access to experiment configurations used during the run
- Methods for loading and analyzing experiment results
- Support for custom implementations through the factory pattern
- Type-safe access to configuration values
You can use type parameters for more powerful IDE support:
from dataclasses import dataclass

from hydraflow import Run

@dataclass
class MyConfig:
    learning_rate: float
    batch_size: int

# Load a Run with type information
run = Run[MyConfig].load("path/to/run")
print(run.cfg.learning_rate)  # IDE auto-completion works
Comparing Multiple Runs
For comparing multiple runs, HydraFlow offers the RunCollection class, which enables efficient analysis across runs:
# Load multiple runs
runs = Run.load(["path/to/run1", "path/to/run2", "path/to/run3"])
# Filter runs by parameter value
filtered_runs = runs.filter(model_type="lstm")
# Group runs by a parameter
grouped_runs = runs.group_by("dataset_name")
# Convert to DataFrame for analysis
df = runs.to_frame("learning_rate", "batch_size", "accuracy")
Key features of experiment comparison:
- Filtering runs based on configuration parameters
- Grouping runs by common attributes
- Aggregating data across runs
- Converting to Polars DataFrames for advanced analysis, as sketched below
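Since to_frame returns a Polars DataFrame, aggregation can be performed with the regular Polars API. A minimal sketch, reusing the df built in the example above and assuming an accuracy value is available for each run:
import polars as pl

# "df" is the frame produced by runs.to_frame("learning_rate", "batch_size", "accuracy")
summary = df.group_by("learning_rate").agg(
    pl.col("accuracy").mean().alias("mean_accuracy"),
    pl.len().alias("n_runs"),
)
print(summary.sort("learning_rate"))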
Summary
These core concepts work together to provide a comprehensive framework for managing machine learning experiments:
- Configuration Management - Type-safe configuration with Python dataclasses
- Main Decorator - The entry point that integrates Hydra and MLflow
- Workflow Automation - Reusable experiment definitions and advanced parameter sweeps
- Analysis Tools - Access, filter, and analyze experiment results
Understanding these fundamental concepts will help you leverage the full power of HydraFlow for your machine learning projects.