Run Class
The Run class is a fundamental component of HydraFlow's analysis toolkit, representing a single execution of an experiment. It provides structured access to all data associated with a run, including configuration and artifacts.
Basic Usage
To work with a run, first load it using either the constructor or the Run.load class method:
from hydraflow import Run
from pathlib import Path
# Using constructor with Path object
run_dir = Path("mlruns/exp_id/run_id")
run = Run(run_dir)
# Using load method with string path
run = Run.load("mlruns/exp_id/run_id")
Access Run Data
The Run class provides access to run information and configuration.
Run Information
The info attribute provides the following information:
print(f"Run ID: {run.info.run_id}")
print(f"Run Directory: {run.info.run_dir}")
print(f"Job name: {run.info.job_name}")
Run Configuration
The cfg attribute provides the entire configuration:
# Access entire configuration
print(f"Configuration: {run.cfg}")
You can also access configuration values by key using the get method:
# Access configuration by key
learning_rate = run.get("learning_rate")
# Nested access with dot notation
model_type = run.get("model.type")
# Access implementation attributes or run info
metric_value = run.get("accuracy") # From impl or cfg
run_id = run.get("run_id") # From RunInfo
The get method searches for values in the following order:
- First in the configuration (cfg)
- Then in the implementation instance (impl)
- Finally in the run information (info)
This provides a unified access interface regardless of where the data is stored.
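The search order can be pictured with a small standalone sketch. It does not use HydraFlow itself: lookup, _resolve, _MISSING, and the sample objects are hypothetical names chosen for illustration, and the real get method may differ in detail.

```python
from types import SimpleNamespace
from typing import Any

_MISSING = object()  # sentinel so that None can be a legitimate stored value


def _resolve(source: Any, key: str) -> Any:
    """Follow a dotted key through nested dict entries or attributes."""
    current = source
    for part in key.split("."):
        if isinstance(current, dict):
            if part not in current:
                return _MISSING
            current = current[part]
        elif hasattr(current, part):
            current = getattr(current, part)
        else:
            return _MISSING
    return current


def lookup(cfg: Any, impl: Any, info: Any, key: str, default: Any = _MISSING) -> Any:
    """Return the first match for key, searching cfg, then impl, then info."""
    for source in (cfg, impl, info):
        value = _resolve(source, key)
        if value is not _MISSING:
            return value
    if default is not _MISSING:
        return default
    raise KeyError(key)


# Hypothetical stand-ins for a run's three data sources.
cfg = {"learning_rate": 0.01, "model": {"type": "cnn"}, "name": "from_cfg"}
impl = SimpleNamespace(accuracy=0.93, name="from_impl")
info = SimpleNamespace(run_id="abc123", name="from_info")

lr = lookup(cfg, impl, info, "learning_rate")      # found in cfg
model_type = lookup(cfg, impl, info, "model.type")  # nested key in cfg
accuracy = lookup(cfg, impl, info, "accuracy")      # falls through to impl
run_id = lookup(cfg, impl, info, "run_id")          # falls through to info
shadowed = lookup(cfg, impl, info, "name")          # cfg wins over impl and info
fallback = lookup(cfg, impl, info, "missing", default=0)
```

Because the configuration is searched first, a key that exists in more than one place resolves to the configuration value, which is worth keeping in mind when an implementation attribute shares a name with a configuration key.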
Type-Safe Configuration Access
For better IDE integration and type checking, you can specify the configuration type as a type parameter:
from dataclasses import dataclass
from hydraflow import Run

@dataclass
class ModelConfig:
    type: str
    hidden_size: int

@dataclass
class TrainingConfig:
    learning_rate: float
    batch_size: int
    epochs: int

@dataclass
class Config:
    model: ModelConfig
    training: TrainingConfig
    seed: int = 42

# Create a typed Run instance
run = Run[Config](run_dir)

# Type-safe access with IDE auto-completion
model_type = run.cfg.model.type
lr = run.cfg.training.learning_rate
seed = run.cfg.seed
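To see why a type parameter helps editors and type checkers, here is a toy generic container written with the standard typing module. TypedRun and SketchConfig are hypothetical names used only for this sketch; it is not HydraFlow's Run class, only an illustration of the general mechanism.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

C = TypeVar("C")


@dataclass
class SketchConfig:
    learning_rate: float
    batch_size: int


class TypedRun(Generic[C]):
    """Toy container whose cfg attribute is typed by the class parameter."""

    def __init__(self, cfg: C) -> None:
        self.cfg: C = cfg


# Subscripting the class tells the type checker what cfg holds.
toy_run = TypedRun[SketchConfig](SketchConfig(learning_rate=0.01, batch_size=32))

# A type checker infers float here and flags misspelled attribute names.
toy_lr = toy_run.cfg.learning_rate
```

Without the subscript, a checker has less information about cfg and attribute typos are more likely to surface only at run time.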
Custom Implementation Classes
The Run class can be extended with custom implementation classes to add domain-specific functionality:
from pathlib import Path

import torch  # needed for torch.load below
from hydraflow import Run

class ModelLoader:
    def __init__(self, artifacts_dir: Path):
        self.artifacts_dir = artifacts_dir

    def load_weights(self):
        """Load the model weights from the artifacts directory."""
        return torch.load(self.artifacts_dir / "weights.pt")

    def evaluate(self, test_data):
        """Evaluate the model on test data."""
        model = self.load_weights()
        return model.evaluate(test_data)

# Create a Run with implementation
run = Run[Config, ModelLoader](run_dir, ModelLoader)
The impl attribute provides access to the implementation class instance:
# Access implementation methods
weights = run.impl.load_weights()
results = run.impl.evaluate(test_data)
Configuration-Aware Implementations
Implementation classes can optionally accept the run's configuration:
class AdvancedModelLoader:
    def __init__(self, artifacts_dir: Path, cfg: Config | None = None):
        self.artifacts_dir = artifacts_dir
        self.cfg = cfg

    def load_model(self):
        """Load model using configuration parameters."""
        model_type = self.cfg.model.type
        model_path = self.artifacts_dir / f"{model_type}_model.pt"
        return torch.load(model_path)

# The implementation will receive both artifacts_dir and cfg
run = Run[Config, AdvancedModelLoader](run_dir, AdvancedModelLoader)
model = run.impl.load_model()  # Uses configuration information
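One way such optional passing could work is to inspect the factory's signature and supply the configuration only when a second parameter exists. The sketch below is an assumption about the general technique, not a description of HydraFlow's internals; build_impl, SimpleLoader, and ConfigAwareLoader are hypothetical names.

```python
import inspect
from pathlib import Path
from typing import Any, Callable, Optional


def build_impl(factory: Callable[..., Any], artifacts_dir: Path, cfg: Any) -> Any:
    """Call factory with (artifacts_dir) or (artifacts_dir, cfg), based on its signature."""
    params = inspect.signature(factory).parameters
    if len(params) >= 2:
        return factory(artifacts_dir, cfg)
    return factory(artifacts_dir)


class SimpleLoader:
    def __init__(self, artifacts_dir: Path):
        self.artifacts_dir = artifacts_dir


class ConfigAwareLoader:
    def __init__(self, artifacts_dir: Path, cfg: Optional[dict] = None):
        self.artifacts_dir = artifacts_dir
        self.cfg = cfg


artifacts = Path("mlruns/exp_id/run_id/artifacts")
sample_cfg = {"model": {"type": "cnn"}}

simple = build_impl(SimpleLoader, artifacts, sample_cfg)       # cfg is not passed
aware = build_impl(ConfigAwareLoader, artifacts, sample_cfg)   # cfg is passed
```

With this approach, existing implementation classes that take only the artifacts directory keep working unchanged, while newer ones can opt in to the configuration by adding a second parameter.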
Loading Multiple Runs
The load class method can load both individual runs and collections of runs:
# Load a single run
run = Run.load("mlruns/exp_id/run_id")
# Load multiple runs to create a RunCollection
run_dirs = ["mlruns/exp_id/run_id1", "mlruns/exp_id/run_id2"]
runs = Run.load(run_dirs)
# Load runs with parallel processing
runs = Run.load(run_dirs, n_jobs=4) # Use 4 parallel jobs for loading
runs = Run.load(run_dirs, n_jobs=-1) # Use all available CPU cores
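As a rough picture of what an n_jobs parameter typically controls, the following standalone sketch maps a loader function over several paths with a worker pool from the standard library. The names load_all and fake_loader are hypothetical, the default of 1 is an assumption of this sketch, and HydraFlow's own loader may use a different backend.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def load_all(paths: Iterable[str], loader: Callable[[Path], T], n_jobs: int = 1) -> list[T]:
    """Apply loader to every path; a negative n_jobs means one worker per CPU core."""
    path_list = [Path(p) for p in paths]
    if n_jobs in (0, 1) or len(path_list) <= 1:
        return [loader(p) for p in path_list]  # sequential fallback
    workers = (os.cpu_count() or 1) if n_jobs < 0 else n_jobs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whichever worker finishes first.
        return list(pool.map(loader, path_list))


def fake_loader(path: Path) -> str:
    """Stand-in for reading a run directory; returns only the directory name."""
    return path.name


sample_dirs = ["mlruns/exp_id/run_id1", "mlruns/exp_id/run_id2"]
sequential = load_all(sample_dirs, fake_loader)
parallel = load_all(sample_dirs, fake_loader, n_jobs=4)
all_cores = load_all(sample_dirs, fake_loader, n_jobs=-1)
```

Parallel loading tends to pay off only when there are many runs, since starting workers has a cost of its own.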
Finding Runs with iter_run_dirs
HydraFlow provides the iter_run_dirs function to easily discover runs in your MLflow tracking directory:
from hydraflow.core.io import iter_run_dirs
from hydraflow import Run
# Find all runs in the tracking directory
tracking_dir = "mlruns"
run_dirs = list(iter_run_dirs(tracking_dir))
runs = Run.load(run_dirs)
# Filter runs by experiment name
# - Use a single experiment name
runs = Run.load(iter_run_dirs(tracking_dir, "my_experiment"))
# - Use multiple experiment names (with pattern matching)
runs = Run.load(iter_run_dirs(tracking_dir, ["train_*", "eval_*"]))
# - Use a custom filtering function
def filter_experiments(name: str) -> bool:
    return name.startswith("train_") and "v2" in name

runs = Run.load(iter_run_dirs(tracking_dir, filter_experiments))
The iter_run_dirs function yields paths to run directories that can be passed directly to Run.load. This makes it easy to find and load runs based on experiment names or custom filtering criteria.
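The three ways of selecting experiments (a single name, a list of patterns, or a function) can be modeled with one small predicate. This is a sketch of the selection logic only: name_matches is a hypothetical helper, and treating a single string as a glob pattern is an assumption here, since iter_run_dirs may compare it exactly.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, Union

Selector = Union[str, Iterable[str], Callable[[str], bool], None]


def name_matches(name: str, selector: Selector) -> bool:
    """Decide whether an experiment name is selected by a name, patterns, or a function."""
    if selector is None:
        return True  # no filter: every experiment is selected
    if callable(selector):
        return bool(selector(name))
    patterns = [selector] if isinstance(selector, str) else list(selector)
    # fnmatchcase gives case-sensitive glob matching on every platform.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


experiment_names = ["train_v1", "train_v2_big", "eval_a", "debug"]

by_name = [n for n in experiment_names if name_matches(n, "debug")]
by_patterns = [n for n in experiment_names if name_matches(n, ["train_*", "eval_*"])]
by_function = [
    n for n in experiment_names
    if name_matches(n, lambda name: name.startswith("train_") and "v2" in name)
]
```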
Best Practices
- Use Type Parameters: Specify configuration types with Run[Config] for better IDE support and type checking.
- Leverage Custom Implementations: Create domain-specific implementation classes to encapsulate analysis logic.
- Use Parallel Loading: For large numbers of runs, use the n_jobs parameter with load to speed up loading.
- Unified Data Access: Use the get method as a unified interface to access data from all components (configuration, implementation, and run info). It provides a consistent way to retrieve values regardless of where they are stored, with a clear precedence order (cfg → impl → info).
- Default Values: When accessing potentially missing keys, use the get method's default parameter: run.get("key", default_value).
Summary
The Run class provides a powerful interface for working with experiment runs in HydraFlow. Its type-safe configuration access, custom implementation support, and convenient loading mechanisms make it easy to analyze and compare experiment results effectively.