Updating Run Configurations

As machine learning projects evolve, configuration structures often change. The update method in HydraFlow provides a powerful way to handle these changes and work with runs from different periods in your project's lifecycle.

The Configuration Evolution Problem

A common challenge in ML experimentation is dealing with changing configuration schemas:

You start with a specific configuration structure
As your project evolves, you add new parameters
Now you have a mix of old and new runs with different configuration schemas
You want to analyze all runs together, but filtering becomes problematic

For example, imagine you've been training image models with a fixed aspect ratio, but later decided to parameterize this aspect ratio:

# Old configuration (fixed aspect ratio)
@dataclass
class ModelConfig:
    width: int = 256
    height: int = 256
    # aspect_ratio is implicitly 1:1

# New configuration (parameterized aspect ratio)
@dataclass
class ModelConfig:
    width: int = 256
    height: int = 256
    aspect_ratio: float = 1.0  # New parameter!

When you try to filter runs by aspect_ratio, older runs will lack this parameter, making consistent analysis difficult.

Using the Update Method

The update method solves this problem by allowing you to add missing configuration parameters to runs without altering existing values:

from hydraflow import Run

# Load a mix of old and new runs
runs = Run.load(["old_run_dir", "new_run_dir"])

# Add aspect_ratio to runs that don't have it
for run in runs:
    run.update("aspect_ratio", 1.0)  # Add default value if missing

# You can also use nested parameters with dot notation
run.update("model.aspect_ratio", 1.0)

# Or use double underscore notation for nested parameters
run.update("model__aspect_ratio", 1.0)  # Equivalent to "model.aspect_ratio"

# Now you can filter by aspect_ratio
square_runs = runs.filter(aspect_ratio=1.0)

The update method only adds values if the key doesn't already exist. For runs that already have an aspect_ratio parameter, the original value is preserved.

The double underscore notation (__) is automatically converted to dot notation (.) internally, making it particularly useful for working with nested configurations.

Batch Updates with RunCollection

To simplify updating multiple runs, you can use the RunCollection.update method:

# Update all runs at once
runs.update("aspect_ratio", 1.0)

# Now all runs have the aspect_ratio parameter

Dynamic Updates with Callables

You can provide a callable function instead of a fixed value to compute parameters dynamically:

# Calculate aspect_ratio from width and height
def calculate_aspect_ratio(run: Run) -> float:
    width = run.get("width", 0)  # Use default value if key doesn't exist
    height = run.get("height", 0)
    if height == 0:
        return 1.0
    return width / height

# Update with calculated values
runs.update("aspect_ratio", calculate_aspect_ratio)

# Combine with callable defaults for more sophisticated logic
runs.update("normalized_lr", lambda run: run.get(
    "learning_rate",
    default=lambda r: r.get("base_lr", 0.01) * r.get("lr_multiplier", 1.0)
) / 10)

The combination of dynamic updates with callable defaults in get provides a powerful mechanism for handling complex configuration scenarios and parameter dependencies. This approach allows you to:

First look for existing values with custom fallback logic using get with callable defaults
Then compute new parameters based on those values with update
Finally ensure all runs have consistent parameters for analysis

Updating Multiple Parameters

You can update multiple related parameters at once by using a tuple of keys:

# Update both width and height parameters
runs.update(
    ("width", "height"),
    (640, 480)
)

# Update with calculated values
def calculate_dimensions(run: Run) -> tuple[int, int]:
    base_size = run.get("base_size", 256)  # Default value if key doesn't exist
    return (base_size, base_size)

runs.update(("width", "height"), calculate_dimensions)

Forcing Updates

By default, update won't modify existing values. To override this behavior, use the force parameter:

# Force update even if the parameter already exists
runs.update("aspect_ratio", 1.0, force=True)

Best Practices

Add Documentation: Comment your code to explain why updates are needed, especially for future reference.
Use Consistent Defaults: When adding missing parameters, use sensible defaults that reflect the implicit values of older runs.
Consider Dynamic Updates: When possible, compute missing values from existing parameters to maintain consistency.
Update Early: Apply updates early in your analysis pipeline, before filtering or grouping.

Summary

The update method enables you to work with runs that have evolving configuration schemas. By adding missing parameters, you can treat old and new runs uniformly, enabling consistent analysis across your project's lifetime. This approach provides a form of "duck typing" for run configurations, allowing you to analyze runs based on their functional properties rather than their exact structure.