Core Concepts
This page introduces the fundamental concepts that form the foundation of the HydraFlow framework.
Design Principles
HydraFlow is built on the following design principles:
- Type Safety - Utilizing Python dataclasses for configuration type checking and IDE support
- Reproducibility - Automatically tracking all experiment configurations for fully reproducible experiments
- Workflow Integration - Creating a cohesive workflow by integrating Hydra's configuration management with MLflow's experiment tracking
- Analysis Capabilities - Providing powerful APIs for easily analyzing experiment results
Key Components
HydraFlow consists of the following key components:
Configuration Management
HydraFlow uses a hierarchical configuration system based on OmegaConf and Hydra. This provides:
- Type-safe configuration using Python dataclasses
- Schema validation to ensure configuration correctness
- Configuration composition from multiple sources
- Command-line overrides
Example configuration:
from dataclasses import dataclass

@dataclass
class Config:
    learning_rate: float = 0.001
    batch_size: int = 32
    epochs: int = 10
This configuration class defines the structure and default values for your experiment, enabling type checking and auto-completion.
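Because these fields are declared on the dataclass, any of them can be overridden at launch time with Hydra's standard key=value syntax. For example, assuming the entry script is named train.py:
python train.py learning_rate=0.01 batch_size=64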
Main Decorator
The @hydraflow.main decorator defines the entry point for a HydraFlow application:
import hydraflow
from mlflow.entities import Run

@hydraflow.main(Config)
def train(run: Run, cfg: Config) -> None:
    # Your experiment code
    print(f"Training with lr={cfg.learning_rate}, batch_size={cfg.batch_size}")

    # Log metrics
    hydraflow.log_metric("accuracy", 0.95)

if __name__ == "__main__":
    train()
This decorator provides:
- Automatic registration of your config class with Hydra's ConfigStore
- Automatic setup of an MLflow experiment
- Storage of Hydra configurations and logs as MLflow artifacts
- Support for type-safe APIs and IDE integration
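The decorated function is launched like any other Hydra application, and Hydra's multirun flag (-m) sweeps over parameters, with each job tracked as an MLflow run. A minimal invocation, again assuming the script is named train.py:
python train.py
python train.py -m epochs=10,20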
Workflow Automation
HydraFlow allows you to automate experiment workflows using a YAML-based job definition system:
jobs:
  train_models:
    run: python train.py
    sets:
      - each: model=small,medium,large
        all: learning_rate=0.001,0.01,0.1
This enables:
- Defining reusable experiment workflows
- Efficient configuration of parameter sweeps
- Organization of complex experiment campaigns
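As a sketch of how such a job might be launched, assuming the definition above is stored in the project's job configuration file (commonly hydraflow.yaml) and that the hydraflow command-line interface is installed (the exact command name is an assumption):
hydraflow run train_models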
You can also define more complex parameter spaces using extended sweep syntax:
# Ranges (start:end:step)
python train.py -m "learning_rate=0.01:0.03:0.01"

# SI prefixes
python train.py -m "batch_size=1k,2k,4k"  # 1000, 2000, 4000

# Grid within a single parameter
python train.py -m "model=(small,large)_(v1,v2)"  # small_v1, small_v2, large_v1, large_v2
Analysis Tools
After running experiments, HydraFlow provides powerful tools for accessing and analyzing results. These tools help you track, compare, and derive insights from your experiments.
Working with Individual Runs
For individual experiment analysis, HydraFlow provides the Run class, which represents a single experiment run:
from hydraflow import Run
# Load an existing run
run = Run.load("path/to/run")
# Access configuration values
learning_rate = run.get("learning_rate")
The Run class provides:
- Access to experiment configurations used during the run
- Methods for loading and analyzing experiment results
- Support for custom implementations through the factory pattern
- Type-safe access to configuration values
You can use type parameters for more powerful IDE support:
from dataclasses import dataclass

from hydraflow import Run

@dataclass
class MyConfig:
    learning_rate: float
    batch_size: int

# Load a Run with type information
run = Run[MyConfig].load("path/to/run")
print(run.cfg.learning_rate)  # IDE auto-completion works
Comparing Multiple Runs
For comparing multiple runs, HydraFlow offers the RunCollection class, which enables efficient analysis across runs:
# Load multiple runs
runs = Run.load(["path/to/run1", "path/to/run2", "path/to/run3"])
# Filter runs by parameter value
filtered_runs = runs.filter(model_type="lstm")
# Group runs by a parameter
grouped_runs = runs.group_by("dataset_name")
# Convert to DataFrame for analysis
df = runs.to_frame("learning_rate", "batch_size", "accuracy")
Key features of experiment comparison:
- Filtering runs based on configuration parameters
- Grouping runs by common attributes
- Aggregating data across runs
- Converting to Polars DataFrames for advanced analysis, as sketched below
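Since to_frame returns a Polars DataFrame, aggregation can be performed with the regular Polars API. A minimal sketch, reusing the df built in the example above and assuming an accuracy value is available for each run:
import polars as pl

# "df" is the frame produced by runs.to_frame("learning_rate", "batch_size", "accuracy")
summary = df.group_by("learning_rate").agg(
    pl.col("accuracy").mean().alias("mean_accuracy"),
    pl.len().alias("n_runs"),
)
print(summary.sort("learning_rate"))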
Summary
These core concepts work together to provide a comprehensive framework for managing machine learning experiments:
- Configuration Management - Type-safe configuration with Python dataclasses
- Main Decorator - The entry point that integrates Hydra and MLflow
- Workflow Automation - Reusable experiment definitions and advanced parameter sweeps
- Analysis Tools - Access, filter, and analyze experiment results
Understanding these fundamental concepts will help you leverage the full power of HydraFlow for your machine learning projects.