hydraflow

source package hydraflow

Integrate Hydra and MLflow to manage and track machine learning experiments.

Classes

Collection — A collection of items that implements the Sequence protocol.
Run — Represent an MLflow Run in HydraFlow.
RunCollection — A collection of Run instances that implements the Sequence protocol.

Functions

chdir_artifact — Change the current working directory to the artifact directory of the given run.
get_artifact_dir — Retrieve the artifact directory for the given run.
get_experiment_names — Get the experiment names from the tracking directory.
iter_artifact_paths — Iterate over the artifact paths in the tracking directory.
iter_artifacts_dirs — Iterate over the artifacts directories in the tracking directory.
iter_experiment_dirs — Iterate over the experiment directories in the tracking directory.
iter_run_dirs — Iterate over the run directories in the tracking directory.
log_run — Log the parameters from the given configuration instance.
main — Decorator for configuring and running MLflow experiments with Hydra.
start_run — Start an MLflow run and log parameters using the provided configuration instance.

source class Collection[I](items: Iterable[I], get: Callable[[I, str, Any | Callable[[I], Any]], Any] | None = None)

Bases : Sequence[I]

A collection of items that implements the Sequence protocol.

Methods

filter — Filter items based on criteria.
try_get — Try to get a single item matching the specified criteria.
get — Get a single item matching the specified criteria.
first — Get the first item matching the specified criteria.
last — Get the last item matching the specified criteria.
to_list — Extract a list of values for a specific key from all items.
to_numpy — Extract values for a specific key from all items as a NumPy array.
to_series — Extract values for a specific key from all items as a Polars series.
unique — Get the unique values for a specific key across all items.
n_unique — Count the number of unique values for a specific key across all items.
sort — Sort items based on one or more keys.
map — Apply a function to each item and return an iterator of results.
pmap — Apply a function to each item in parallel and return a list of results.
to_frame — Convert the collection to a Polars DataFrame.
group_by — Group items by one or more keys and return a GroupBy instance.
sample — Sample a random subset of items from the collection.
shuffle — Shuffle the items in the collection.
eq — Create a predicate function that checks if two attributes are equal.
ne — Create a predicate function that checks if two attributes are not equal.
gt — Create a predicate function that checks if the left > the right.
lt — Create a predicate function that checks if the left < the right.
ge — Create a predicate function that checks if the left >= the right.
le — Create a predicate function that checks if the left <= the right.
startswith — Create a predicate function that checks if an attribute starts with a prefix.
endswith — Create a predicate function that checks if an attribute ends with a suffix.
match — Create a predicate function that checks if an attribute matches a pattern.

source method Collection.filter(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → Self

Filter items based on criteria.

This method allows filtering items using various criteria:

- Callable criteria that take an item and return a boolean
- Key-value tuples where the key is a string and the value is compared using the matches function
- Keyword arguments, where the key is a string and the value is compared using the matches function

The matches function supports the following comparison types:

- Callable: The predicate function is called with the value
- List/Set: Checks if the value is in the list/set
- Tuple of length 2: Checks if the value is in the range [min, max]
- Other: Checks for direct equality

Parameters

*criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
**kwargs : Any — Additional key-value pairs for filtering.

Returns

Self — A new Collection containing only the items that match all criteria.

Examples

# Filter using a callable
filtered = collection.filter(lambda x: x > 5)

# Filter using a key-value tuple
filtered = collection.filter(("age", 25))

# Filter using keyword arguments
filtered = collection.filter(age=25, name="John")

# Filter using range
filtered = collection.filter(("age", (20, 30)))

# Filter using list membership
filtered = collection.filter(("name", ["John", "Jane"]))
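
The comparison rules listed for the matches function can be sketched in plain Python. This is a minimal stand-in, not HydraFlow's implementation: the `matches` helper below and the `people` sample data are hypothetical, and the range check is assumed to be inclusive at both ends, as the `[min, max]` notation suggests.

```python
from typing import Any


def matches(value: Any, criterion: Any) -> bool:
    """Hypothetical stand-in for the matching rules described above."""
    if callable(criterion):
        # Callable: the predicate is called with the value.
        return bool(criterion(value))
    if isinstance(criterion, (list, set)):
        # List/Set: membership test.
        return value in criterion
    if isinstance(criterion, tuple) and len(criterion) == 2:
        # Tuple of length 2: inclusive range [min, max] (assumed inclusive).
        low, high = criterion
        return low <= value <= high
    # Other: direct equality.
    return value == criterion


# Illustrative data only.
people = [
    {"name": "John", "age": 25},
    {"name": "Jane", "age": 31},
    {"name": "Alex", "age": 19},
]

in_range = [p["name"] for p in people if matches(p["age"], (20, 30))]
in_list = [p["name"] for p in people if matches(p["name"], ["John", "Jane"])]
by_predicate = [p["name"] for p in people if matches(p["age"], lambda a: a < 20)]
```

One consequence of these rules is that a two-element tuple is always read as a range, so membership in a pair of values is better expressed with a list or set.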

source method Collection.try_get(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → I | None

Try to get a single item matching the specified criteria.

This method applies filters and returns a single matching item if exactly one is found, None if no items are found, or raises ValueError if multiple items match.

Parameters

*criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
**kwargs : Any — Additional key-value pairs for filtering.

Returns

I | None — A single item that matches the criteria, or None if no matches are found.

Raises

ValueError — If multiple items match the criteria.
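
The three outcomes described for try_get (one match, no match, several matches) can be illustrated with a small self-contained function. The name `try_get_one` and the error message are hypothetical; this is a sketch of the documented behavior, not the library's code.

```python
from collections.abc import Callable, Sequence
from typing import Optional, TypeVar

T = TypeVar("T")


def try_get_one(items: Sequence[T], predicate: Callable[[T], bool]) -> Optional[T]:
    """Return the single matching item, None if nothing matches."""
    matched = [item for item in items if predicate(item)]
    if not matched:
        return None
    if len(matched) > 1:
        # More than one match is treated as an error rather than picking one.
        raise ValueError(f"expected at most one match, found {len(matched)}")
    return matched[0]


exactly_one = try_get_one([1, 2, 3], lambda x: x == 2)
no_match = try_get_one([1, 2, 3], lambda x: x > 5)
```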

source method Collection.get(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → I

Get a single item matching the specified criteria.

This method applies filters and returns a single matching item, or raises ValueError if no items or multiple items match.

Parameters

*criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
**kwargs : Any — Additional key-value pairs for filtering.

Returns

I — A single item that matches the criteria.

Raises

ValueError — If no items match or if multiple items match the criteria.

source method Collection.first(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → I

Get the first item matching the specified criteria.

This method applies filters and returns the first matching item, or raises ValueError if no items match.

Parameters

*criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
**kwargs : Any — Additional key-value pairs for filtering.

Returns

I — The first item that matches the criteria.

Raises

ValueError — If no items match the criteria.

source method Collection.last(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → I

Get the last item matching the specified criteria.

This method applies filters and returns the last matching item, or raises ValueError if no items match.

Parameters

*criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
**kwargs : Any — Additional key-value pairs for filtering.

Returns

I — The last item that matches the criteria.

Raises

ValueError — If no items match the criteria.

source method Collection.to_list(key: str, default: Any | Callable[[I], Any] = MISSING) → list[Any]

Extract a list of values for a specific key from all items.

Parameters

key : str — The key to extract from each item.
default : Any | Callable[[I], Any] — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

Returns

list[Any] — A list containing the values for the specified key from each item.

source method Collection.to_numpy(key: str, default: Any | Callable[[I], Any] = MISSING) → NDArray

Extract values for a specific key from all items as a NumPy array.

Parameters

key : str — The key to extract from each item.
default : Any | Callable[[I], Any] — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

Returns

NDArray — A NumPy array containing the values for the specified key from each item.

source method Collection.to_series(key: str, default: Any = MISSING, *, name: str | None = None) → Series

Extract values for a specific key from all items as a Polars series.

Parameters

key : str — The key to extract from each item.
default : Any — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.
name : str | None — The name of the series. If not provided, the key will be used.

Returns

Series — A Polars series containing the values for the specified key from each item.

source method Collection.unique(key: str, default: Any | Callable[[I], Any] = MISSING) → NDArray

Get the unique values for a specific key across all items.

Parameters

key : str — The key to extract unique values for.
default : Any | Callable[[I], Any] — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

Returns

NDArray — A NumPy array containing the unique values for the specified key.

source method Collection.n_unique(key: str, default: Any | Callable[[I], Any] = MISSING) → int

Count the number of unique values for a specific key across all items.

Parameters

key : str — The key to count unique values for.
default : Any | Callable[[I], Any] — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

Returns

int — The number of unique values for the specified key.

source method Collection.sort(*keys: str, reverse: bool = False) → Self

Sort items based on one or more keys.

Parameters

*keys : str — The keys to sort by, in order of priority.
reverse : bool — Whether to sort in descending order (default is ascending).

Returns

Self — A new Collection with the items sorted according to the specified keys.

source method Collection.map[**P, R](function: Callable[Concatenate[I, P], R], *args: P.args, **kwargs: P.kwargs) → Iterator[R]

Apply a function to each item and return an iterator of results.

This is a memory-efficient mapping operation that lazily evaluates results. Ideal for large collections where memory usage is a concern.

Parameters

function : Callable[Concatenate[I, P], R] — Function to apply to each item. The item is passed as the first argument.
*args : P.args — Additional positional arguments to pass to the function.
**kwargs : P.kwargs — Additional keyword arguments to pass to the function.

Returns

Iterator[R] — An iterator of the function's results.

Examples

# Process results one at a time
for result in collection.map(process_item, additional_arg):
    handle_result(result)

# Convert to list if needed
results = list(collection.map(transform_item))
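
Lazy evaluation here means that the function is not applied until the iterator is consumed. The sketch below shows that behavior with a plain generator. `lazy_map`, `scale`, and the `calls` log are hypothetical names used only for illustration.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import Any

# Records which items have actually been processed.
calls = []


def lazy_map(
    items: Iterable[Any],
    function: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Iterator[Any]:
    """Yield function(item, *args, **kwargs) one item at a time."""
    for item in items:
        yield function(item, *args, **kwargs)


def scale(item: int, factor: int) -> int:
    calls.append(item)
    return item * factor


results = lazy_map([1, 2, 3], scale, 10)
calls_before = list(calls)       # no item has been processed yet
first = next(results)            # processes only the first item
calls_after_first = list(calls)
rest = list(results)             # processes the remaining items
```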

source method Collection.pmap[**P, R](function: Callable[Concatenate[I, P], R], n_jobs: int = -1, backend: str = 'multiprocessing', progress: bool = False, *args: P.args, **kwargs: P.kwargs) → list[R]

Apply a function to each item in parallel and return a list of results.

This method processes items concurrently for improved performance on CPU-bound or I/O-bound operations, depending on the backend.

Parameters

function : Callable[Concatenate[I, P], R] — Function to apply to each item. The item is passed as the first argument.
n_jobs : int — Number of jobs to run in parallel. -1 means using all processors.
backend : str — Parallelization backend.
progress : bool — Whether to display a progress bar.
*args : P.args — Additional positional arguments to pass to the function.
**kwargs : P.kwargs — Additional keyword arguments to pass to the function.

Returns

list[R] — A list containing all results of the function applications.

Examples

# Process all items in parallel using all cores
results = collection.pmap(heavy_computation)

# Specify number of parallel jobs and backend
results = collection.pmap(process_files, n_jobs=4, backend="threading")
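
To illustrate the calling convention (item first, then the extra positional and keyword arguments) and the eager, ordered list result, here is a sketch built on the standard library's `ThreadPoolExecutor`. It deliberately swaps in a thread pool for whatever backend pmap uses, and `parallel_map` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def parallel_map(
    items: Iterable[Any],
    function: Callable[..., Any],
    n_jobs: int = 2,
    *args: Any,
    **kwargs: Any,
) -> list:
    """Apply function(item, *args, **kwargs) concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(function, item, *args, **kwargs) for item in items]
        # Collect results in submission order, not completion order.
        return [future.result() for future in futures]


# pow(item, 2) for each item, using two worker threads.
squares = parallel_map([1, 2, 3], pow, 2, 2)
```

Unlike the lazy iterator returned by map, every result is computed before the list is returned.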

source method Collection.to_frame(*keys: str | tuple[str, Any | Callable[[I], Any]], defaults: dict[str, Any | Callable[[I], Any]] | None = None, n_jobs: int = 0, backend: str = 'multiprocessing', progress: bool = False, **kwargs: Callable[[I], Any]) → DataFrame

Convert the collection to a Polars DataFrame.

This method converts the items in the collection into a Polars DataFrame. It allows specifying multiple keys, where each key can be a string or a tuple. If a tuple is provided, the first element is treated as the key and the second element as the default value for that key.

Parameters

*keys : str | tuple[str, Any | Callable[[I], Any]] — The keys to include as columns in the DataFrame. If a tuple is provided, the first element is the key and the second element is the default value.
defaults : dict[str, Any | Callable[[I], Any]] | None — Default values for the keys. If a callable, it will be called with the item and the value returned will be used as the default.
n_jobs : int — Number of jobs to run in parallel. 0 means no parallelization. Defaults to 0.
backend : str — Parallelization backend.
progress : bool — Whether to display a progress bar.
**kwargs : Callable[[I], Any] — Additional columns to compute using callables that take an item and return a value.

Returns

DataFrame — A Polars DataFrame containing the specified data from the items.

Examples

# Convert to DataFrame with single keys
df = collection.to_frame("name", "age")

# Convert to DataFrame with keys and default values
df = collection.to_frame(("name", "Unknown"), ("age", 0))

source method Collection.group_by(*by: str) → GroupBy[Self, I]

Group items by one or more keys and return a GroupBy instance.

This method organizes items into groups based on the specified keys and returns a GroupBy instance that contains the grouped collections. The GroupBy instance behaves like a dictionary, allowing access to collections for each group key.

Parameters

*by : str — The keys to group by. If a single key is provided, its value will be used as the group key. If multiple keys are provided, a tuple of their values will be used as the group key. Keys can use dot notation (e.g., "model.type") to access nested configuration values.

Returns

GroupBy[Self, I] — A GroupBy instance containing the grouped items. Each group is a collection of the same type as the original.
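
The group-key rules (a bare value for one key, a tuple for several, dot notation for nested values) can be sketched with ordinary dictionaries. `resolve`, `group_items`, and the sample `configs` are hypothetical, and the real GroupBy type offers more than the plain dict returned here.

```python
from collections import defaultdict
from typing import Any


def resolve(item: dict, key: str) -> Any:
    """Follow a dotted key such as 'model.type' through nested dicts."""
    value: Any = item
    for part in key.split("."):
        value = value[part]
    return value


def group_items(items: list, *by: str) -> dict:
    groups = defaultdict(list)
    for item in items:
        values = tuple(resolve(item, key) for key in by)
        # One key: use the bare value. Several keys: use the tuple.
        group_key = values[0] if len(by) == 1 else values
        groups[group_key].append(item)
    return dict(groups)


# Illustrative data only.
configs = [
    {"model": {"type": "cnn"}, "lr": 0.1},
    {"model": {"type": "cnn"}, "lr": 0.01},
    {"model": {"type": "rnn"}, "lr": 0.1},
]

by_type = group_items(configs, "model.type")
by_type_and_lr = group_items(configs, "model.type", "lr")
```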

source method Collection.sample(k: int, seed: int | None = None) → Self

Sample a random subset of items from the collection.

This method returns a new collection containing a random sample of items from the original collection. The sample is drawn without replacement, meaning each item can only appear once in the sample.

Parameters

k : int — The number of items to sample.
seed : int | None — The seed for the random number generator. If provided, the sample will be reproducible.

Returns

Self — A new collection containing a random sample of items.

Raises

ValueError — If the sample size is greater than the collection size.
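
Seeded sampling without replacement behaves like the standard library's `random.Random(seed).sample`. The sketch below uses that call to show reproducibility and the error for an oversized sample. `sample_items` is a hypothetical name, and whether HydraFlow uses the same generator internally is an assumption.

```python
import random
from typing import Any, Iterable, Optional


def sample_items(items: Iterable[Any], k: int, seed: Optional[int] = None) -> list:
    """Draw k distinct items; the same seed gives the same sample."""
    rng = random.Random(seed)
    # random.sample raises ValueError when k exceeds the population size.
    return rng.sample(list(items), k)


first_draw = sample_items(range(10), 3, seed=0)
second_draw = sample_items(range(10), 3, seed=0)
```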

source method Collection.shuffle(seed: int | None = None) → Self

Shuffle the items in the collection.

This method returns a new collection with the items in random order.

Parameters

seed : int | None — The seed for the random number generator. If provided, the shuffle will be reproducible.

Returns

Self — A new collection containing the items in random order.

source method Collection.eq(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]

Create a predicate function that checks if two attributes are equal.

Parameters

left : str — The name of the left attribute to compare.
right : str — The name of the right attribute to compare.
default : Any | Callable[[I], Any], optional — The default value to use if either attribute is not found. If callable, it will be called with the item.

Returns

Callable[[I], bool] — A function that takes an item and returns True if the values of the specified attributes are equal.

Examples

# Find items where attribute 'a' equals attribute 'b'
equal_items = collection.filter(collection.eq('a', 'b'))

source method Collection.ne(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]

Create a predicate function that checks if two attributes are not equal.

Parameters

left : str — The name of the left attribute to compare.
right : str — The name of the right attribute to compare.
default : Any | Callable[[I], Any], optional — The default value to use if either attribute is not found. If callable, it will be called with the item.

Returns

Callable[[I], bool] — A function that takes an item and returns True if the values of the specified attributes are not equal.

Examples

# Find items where attribute 'a' is not equal to attribute 'b'
unequal_items = collection.filter(collection.ne('a', 'b'))

source method Collection.gt(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]

Create a predicate function that checks if the left > the right.

Parameters

left : str — The name of the left attribute to compare.
right : str — The name of the right attribute to compare.
default : Any | Callable[[I], Any], optional — The default value to use if either attribute is not found. If callable, it will be called with the item.

Returns

Callable[[I], bool] — A function that takes an item and returns True if the left attribute value is greater than the right attribute value.

Examples

# Find items where attribute 'a' is greater than attribute 'b'
items = collection.filter(collection.gt('a', 'b'))

source method Collection.lt(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]

Create a predicate function that checks if the left < the right.

Parameters

left : str — The name of the left attribute to compare.
right : str — The name of the right attribute to compare.
default : Any | Callable[[I], Any], optional — The default value to use if either attribute is not found. If callable, it will be called with the item.

Returns

Callable[[I], bool] — A function that takes an item and returns True if the left attribute value is less than the right attribute value.

Examples

# Find items where attribute 'a' is less than attribute 'b'
items = collection.filter(collection.lt('a', 'b'))

source method Collection.ge(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]

Create a predicate function that checks if the left >= the right.

Parameters

left : str — The name of the left attribute to compare.
right : str — The name of the right attribute to compare.
default : Any | Callable[[I], Any], optional — The default value.

Returns

Callable[[I], bool] — A predicate function for filtering.

source method Collection.le(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]

Create a predicate function that checks if the left <= the right.

Parameters

left : str — The name of the left attribute to compare.
right : str — The name of the right attribute to compare.
default : Any | Callable[[I], Any], optional — The default value.

Returns

Callable[[I], bool] — A predicate function for filtering.
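
The ge and le predicates follow the same pattern as the other comparison helpers: look up two attributes on an item and compare them. A minimal sketch of that pattern, using dictionaries as items and the `operator` module, might look like this. `compare`, `ge`, `le`, and the sample `runs` are hypothetical, and default handling is omitted.

```python
import operator
from typing import Any, Callable


def compare(op: Callable[[Any, Any], bool], left: str, right: str) -> Callable[[dict], bool]:
    """Build a predicate comparing item[left] with item[right]."""

    def predicate(item: dict) -> bool:
        return bool(op(item[left], item[right]))

    return predicate


def ge(left: str, right: str) -> Callable[[dict], bool]:
    return compare(operator.ge, left, right)


def le(left: str, right: str) -> Callable[[dict], bool]:
    return compare(operator.le, left, right)


# Illustrative data only.
runs = [
    {"train_acc": 0.9, "val_acc": 0.8},
    {"train_acc": 0.7, "val_acc": 0.7},
    {"train_acc": 0.6, "val_acc": 0.75},
]

val_at_least_train = [i for i, r in enumerate(runs) if ge("val_acc", "train_acc")(r)]
val_at_most_train = [i for i, r in enumerate(runs) if le("val_acc", "train_acc")(r)]
```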

source method Collection.startswith(key: str, prefix: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]

Create a predicate function that checks if an attribute starts with a prefix.

Parameters

key : str — The name of the attribute to check.
prefix : str — The prefix to check for.
default : Any | Callable[[I], Any], optional — The default value.

Returns

Callable[[I], bool] — A predicate function for filtering.

source method Collection.endswith(key: str, suffix: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]

Create a predicate function that checks if an attribute ends with a suffix.

Parameters

key : str — The name of the attribute to check.
suffix : str — The suffix to check for.
default : Any | Callable[[I], Any], optional — The default value.

Returns

Callable[[I], bool] — A predicate function for filtering.

source method Collection.match(key: str, pattern: str | Pattern[str], *, default: Any | Callable[[I], Any] = MISSING, flags: _FlagsType = 0) → Callable[[I], bool]

Create a predicate function that checks if an attribute matches a pattern.

Parameters

key : str — The name of the attribute to check.
pattern : str | re.Pattern — The pattern to check for.
default : Any | Callable[[I], Any], optional — The default value.
flags : re.RegexFlag, optional — Flags for the regex pattern.

Returns

Callable[[I], bool] — A predicate function for filtering.
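
A pattern predicate of this kind can be sketched with the standard `re` module. The sketch assumes `re.search` semantics; whether HydraFlow anchors the pattern at the start of the string is not stated here, so explicit `^` and `$` anchors are the safer choice. `match_predicate` and the sample `items` are hypothetical.

```python
import re
from typing import Callable


def match_predicate(key: str, pattern: str, flags: int = 0) -> Callable[[dict], bool]:
    """Build a predicate that tests item[key] against a regular expression."""
    regex = re.compile(pattern, flags)

    def predicate(item: dict) -> bool:
        # Assumption: search anywhere in the string, not only at the start.
        return regex.search(str(item[key])) is not None

    return predicate


# Illustrative data only.
items = [{"model": "resnet18"}, {"model": "resnet50"}, {"model": "vgg16"}]

is_resnet = match_predicate("model", r"^resnet\d+$")
resnets = [item["model"] for item in items if is_resnet(item)]

is_resnet_any_case = match_predicate("model", r"^RESNET", flags=re.IGNORECASE)
resnets_any_case = [item["model"] for item in items if is_resnet_any_case(item)]
```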

source class Run[C, I = None](run_dir: Path, impl_factory: Callable[[Path], I] | Callable[[Path, C], I] | None = None)

Represent an MLflow Run in HydraFlow.

A Run contains information about the run, configuration, and implementation. The configuration type C and implementation type I are specified as type parameters.

Attributes

info : RunInfo — Information about the run, such as run directory, run ID, and job name.
impl_factory : Callable[[Path], I] | Callable[[Path, C], I] — Factory function to create the implementation instance.
cfg : C — The configuration instance loaded from the Hydra configuration file.
impl : I — The implementation instance created by the factory function.

Methods

load — Load a Run from a run directory.
update — Set default value(s) in the configuration if they don't already exist.
get — Get a value from the information or configuration.
lit — Create a Polars literal expression from a run key.
to_frame — Convert the Run to a DataFrame.
to_dict — Convert the Run to a dictionary.
chdir — Change the current working directory to the artifact directory.
path — Return the path relative to the artifact directory.
iterdir — Iterate over the artifact directories for the run.
glob — Glob the artifact directories for the run.

source property Run.cfg: C

The configuration instance loaded from the Hydra configuration file.

source property Run.impl: I

The implementation instance created by the factory function.

This property dynamically examines the signature of the impl_factory using the inspect module and calls it with the appropriate arguments:

- If the factory accepts one parameter: called with just the artifacts directory
- If the factory accepts two parameters: called with the artifacts directory and the configuration instance

This allows implementation classes to be configuration-aware and utilize both the file system and configuration information.
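
The signature-based dispatch described above can be sketched with `inspect.signature`. The helper `create_impl` and the two example factories are hypothetical. The sketch only counts parameters, which is an assumption about how the factory arity is detected.

```python
import inspect
from pathlib import Path
from typing import Any, Callable


def create_impl(impl_factory: Callable[..., Any], artifacts_dir: Path, cfg: Any) -> Any:
    """Call the factory with one or two arguments depending on its signature."""
    n_params = len(inspect.signature(impl_factory).parameters)
    if n_params == 1:
        return impl_factory(artifacts_dir)
    return impl_factory(artifacts_dir, cfg)


def path_only(artifacts_dir: Path) -> str:
    # One-parameter factory: sees only the artifacts directory.
    return artifacts_dir.name


class ConfigAware:
    # Two-parameter factory: a class whose __init__ also takes the config.
    def __init__(self, artifacts_dir: Path, cfg: dict) -> None:
        self.path = artifacts_dir / cfg["filename"]


simple = create_impl(path_only, Path("artifacts"), {"filename": "model.txt"})
aware = create_impl(ConfigAware, Path("artifacts"), {"filename": "model.txt"})
```

Passing a class works because `inspect.signature` reports the parameters of `__init__` without `self`.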

source method Run.load

Load a Run from a run directory.

Parameters

run_dir : str | Path | Iterable[str | Path] — The directory where the MLflow runs are stored, either as a string, a Path instance, or an iterable of them.
impl_factory : Callable[[Path], I] | Callable[[Path, C], I] | None — A factory function that creates the implementation instance. It can accept either just the artifacts directory path, or both the path and the configuration instance. Defaults to None, in which case a function that returns None is used.
n_jobs : int — The number of parallel jobs. If 0 (default), runs sequentially. If -1, uses all available CPU cores.

Returns

Self | RunCollection[Self] — A single Run instance or a RunCollection of Run instances.

source method Run.update(key: str | tuple[str, ...], value: Any | Callable[[Self], Any], *, force: bool = False) → None

Set default value(s) in the configuration if they don't already exist.

This method adds a value or multiple values to the configuration, but only if the corresponding keys don't already have values. Existing values will not be modified.

Parameters

key : str | tuple[str, ...] — Either a string representing a single configuration path (can use dot notation like "section.subsection.param"), or a tuple of strings to set multiple related configuration values at once.
value : Any | Callable[[Self], Any] — The value to set. This can be:
- For string keys: Any value, or a callable that returns a value
- For tuple keys: An iterable with the same length as the key tuple, or a callable that returns such an iterable
- For callable values: The callable must accept a single argument of type Run (self) and return the appropriate value type
force : bool — Whether to force the update even if the key already exists.

Raises

TypeError — If a tuple key is provided but the value is not an iterable, or if the callable doesn't return an iterable.
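
The set-only-if-missing behavior and the effect of force can be illustrated on a nested dictionary. `set_default` is a hypothetical helper that handles string keys only, and it treats a key as missing when it is absent from the dictionary. How HydraFlow decides that a key has no value is not specified here.

```python
from typing import Any


def set_default(cfg: dict, key: str, value: Any, *, force: bool = False) -> None:
    """Set cfg[key] (dot notation) unless it already exists, or always if force."""
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        # Create intermediate sections as needed.
        node = node.setdefault(part, {})
    if force or leaf not in node:
        node[leaf] = value


cfg = {"model": {"lr": 0.1}}

set_default(cfg, "model.lr", 0.5)              # existing value is kept
set_default(cfg, "model.depth", 3)             # missing value is added
kept_lr = cfg["model"]["lr"]

set_default(cfg, "model.lr", 0.5, force=True)  # force overwrites
forced_lr = cfg["model"]["lr"]
```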

source method Run.get(key: str, default: Any | Callable[[Self], Any] = MISSING) → Any

Get a value from the information or configuration.

Parameters

key : str — The key to look for. Can use dot notation for nested keys in configuration. Special keys:
- "cfg": Returns the configuration object
- "impl": Returns the implementation object
- "info": Returns the run information object
default : Any | Callable[[Self], Any] — Value to return if the key is not found. If a callable, it will be called with the Run instance and the value returned will be used as the default. If not provided, AttributeError will be raised.

Returns

Any — The value associated with the key, or the default value if the key is not found and a default is provided.

Raises

AttributeError — If the key is not found and no default is provided.

Note

The search order for keys is:

Configuration (cfg)
Implementation (impl)
Run information (info)
Run object itself (self)
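
The search order can be sketched as a loop over the four sources that returns the first hit. The `lookup` function and the `SimpleNamespace` stand-in for a run are hypothetical, and applying dot notation to every source is a simplification.

```python
from types import SimpleNamespace
from typing import Any

_MISSING = object()


def lookup(run: Any, key: str, default: Any = _MISSING) -> Any:
    """Search cfg, impl, info, then the run itself, in that order."""
    for source in (run.cfg, run.impl, run.info, run):
        value = source
        try:
            for part in key.split("."):
                value = getattr(value, part)
        except AttributeError:
            continue  # not found in this source, try the next one
        return value
    if default is _MISSING:
        raise AttributeError(key)
    # A callable default receives the run instance.
    return default(run) if callable(default) else default


# Illustrative stand-in for a run object.
run = SimpleNamespace(
    cfg=SimpleNamespace(lr=0.1, model=SimpleNamespace(type="cnn")),
    impl=SimpleNamespace(lr=999, n_params=10),
    info=SimpleNamespace(run_id="abc"),
)

lr = lookup(run, "lr")                    # cfg wins over impl
model_type = lookup(run, "model.type")    # dot notation into cfg
n_params = lookup(run, "n_params")        # falls through to impl
run_id = lookup(run, "run_id")            # falls through to info
cfg_object = lookup(run, "cfg")           # resolved on the run itself
fallback = lookup(run, "missing", lambda r: r.info.run_id)
```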

source method Run.lit(key: str, default: Any | Callable[[Self], Any] = MISSING, *, dtype: PolarsDataType | None = None) → Expr

Create a Polars literal expression from a run key.

Parameters

key : str — The key to look up in the run's configuration or info.
default : Any | Callable[[Run], Any], optional — Default value to use if the key is missing. If a callable is provided, it will be called with the Run instance.
dtype : PolarsDataType | None — Explicit data type for the literal expression.

Returns

Expr — A Polars literal expression aliased to the provided key.

Raises

AttributeError — If the key is not found and no default is provided.

source method Run.to_frame(function: Callable[[Self], DataFrame], *keys: str | tuple[str, Any | Callable[[Self], Any]]) → DataFrame

Convert the Run to a DataFrame.

Parameters

function : Callable[[Run], DataFrame] — A function that takes a Run instance and returns a DataFrame.
keys : str | tuple[str, Any | Callable[[Run], Any]] — The keys to add to the DataFrame.

Returns

DataFrame — A DataFrame representation of the Run.

source method Run.to_dict(flatten: bool = True) → dict[str, Any]

Convert the Run to a dictionary.

Parameters

flatten : bool, optional — If True, flattens nested dictionaries. Defaults to True.

Returns

dict[str, Any] — A dictionary representation of the Run's configuration.

Raises

TypeError
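
Flattening a nested configuration typically means joining the nested keys into a single key per leaf value. The sketch below assumes a dot as the separator, which matches the dot notation used elsewhere in this reference but is not confirmed for to_dict. `flatten` is a hypothetical helper.

```python
from typing import Any


def flatten(nested: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level with dotted keys."""
    flat: dict = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


flat_cfg = flatten({"model": {"type": "cnn", "depth": 3}, "lr": 0.1})
```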

source method Run.chdir(relative_dir: str = '') → Iterator[Path]

Change the current working directory to the artifact directory.

This context manager changes the current working directory to the artifact directory of the run. It ensures that the directory is changed back to the original directory after the context is exited.

Parameters

relative_dir : str — The directory to change into, relative to the artifact directory. Defaults to an empty string.

Yields

Path — The artifact directory of the run.
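
A context manager that changes the working directory and restores it on exit can be written with `contextlib.contextmanager`. The sketch below shows the general pattern with a temporary directory standing in for an artifact directory. `working_directory` is a hypothetical name, not the library's function.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def working_directory(directory: Path) -> Iterator[Path]:
    """Change into directory, then restore the original working directory."""
    origin = Path.cwd()
    os.chdir(directory)
    try:
        yield Path(directory)
    finally:
        # Runs even if the body raises, so the original directory is restored.
        os.chdir(origin)


start = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    expected = Path(tmp).resolve()
    with working_directory(Path(tmp)):
        inside = Path.cwd().resolve()
    after = Path.cwd()
```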

source method Run.path(relative_path: str = '') → Path

Return the path relative to the artifact directory.

Parameters

relative_path : str — The relative path to the artifact directory.

Returns

Path — The path relative to the artifact directory.

source method Run.iterdir(relative_dir: str = '') → Iterator[Path]

Iterate over the artifact directories for the run.

Parameters

relative_dir : str — The relative directory to iterate over.

Yields

Path — The artifact directory for the run.

source method Run.glob(pattern: str, relative_dir: str = '') → Iterator[Path]

Glob the artifact directories for the run.

Parameters

pattern : str — The pattern to glob.
relative_dir : str — The relative directory to glob.

Yields

Path — The existing artifact paths that match the pattern.

source class RunCollection[R: Run[Any, Any]](items: Iterable[I], get: Callable[[I, str, Any | Callable[[I], Any]], Any] | None = None)

Bases : Collection[R]

A collection of Run instances that implements the Sequence protocol.

RunCollection provides methods for filtering, sorting, grouping, and analyzing runs, as well as converting run data to various formats such as DataFrames.

Parameters

runs : Iterable[Run] — An iterable of Run instances to include in the collection.

Methods

preload — Pre-load configuration and implementation objects for all runs in parallel.
update — Update configuration values for all runs in the collection.
concat — Concatenate the results of a function applied to all runs in the collection.
iterdir — Iterate over the artifact directories for all runs in the collection.
glob — Glob the artifact directories for all runs in the collection.

source method RunCollection.preload(*, n_jobs: int = 0, cfg: bool = True, impl: bool = True) → Self

Pre-load configuration and implementation objects for all runs in parallel.

This method eagerly evaluates the cfg and impl properties of all runs in the collection, potentially in parallel using joblib. This can significantly improve performance for subsequent operations that access these properties, as they will already be loaded in memory.

Parameters

n_jobs : int — Number of parallel jobs to run.
- 0: Run sequentially (default)
- -1: Use all available CPU cores
- >0: Use the specified number of cores
cfg : bool — Whether to preload the configuration objects. Defaults to True.
impl : bool — Whether to preload the implementation objects. Defaults to True.

Returns

Self — The same RunCollection instance with preloaded configuration and implementation objects.

Note

The preloading is done using joblib's threading backend, which is suitable for I/O-bound tasks like loading configuration files and implementation objects.

Examples

# Preload all runs sequentially
runs.preload()

# Preload using all available cores
runs.preload(n_jobs=-1)

# Preload only configurations
runs.preload(impl=False)

# Preload only implementations
runs.preload(cfg=False)

source method RunCollection.update(key: str | tuple[str, ...], value: Any | Callable[[R], Any], *, force: bool = False) → None

Update configuration values for all runs in the collection.

This method calls the update method on each run in the collection.

Parameters

key : str | tuple[str, ...] — Either a string representing a single configuration path or a tuple of strings to set multiple configuration values.
value : Any | Callable[[R], Any] — The value(s) to set or a callable that returns such values.
force : bool — Whether to force updates even if the keys already exist.

source method RunCollection.concat(function: Callable[[R], DataFrame], *keys: str | tuple[str, Any | Callable[[R], Any]]) → DataFrame

Concatenate the results of a function applied to all runs in the collection.

This method applies the provided function to each run in the collection and concatenates the resulting DataFrames along the specified keys.

Parameters

function : Callable[[R], DataFrame] — A function that takes a Run instance and returns a DataFrame.
keys : str | tuple[str, Any | Callable[[R], Any]] — The keys to add to the DataFrame.

Returns

DataFrame — A DataFrame representation of the Run collection.

source method RunCollection.iterdir(relative_dir: str = '') → Iterator[Path]

Iterate over the artifact directories for all runs in the collection.

This method yields all files and directories in the specified relative directory for each run in the collection.

Parameters

relative_dir : str — The relative directory within the artifacts directory to iterate over.

Yields

Path — Each path in the specified directory for each run in the collection.

source method RunCollection.glob(pattern: str, relative_dir: str = '') → Iterator[Path]

Glob the artifact directories for all runs in the collection.

This method yields all paths matching the specified pattern in the relative directory for each run in the collection.

Parameters

pattern : str — The glob pattern to match files or directories.
relative_dir : str — The relative directory within the artifacts directory to search in.

Yields

Path — Each path matching the pattern for each run in the collection.
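The per-run iteration described for iterdir and glob can be sketched with pathlib alone. This is an illustration under assumptions, not HydraFlow's implementation: glob_all is a hypothetical helper, and the temporary directories stand in for the artifact directories of two runs.

```python
import tempfile
from collections.abc import Iterator
from pathlib import Path


def glob_all(
    artifact_dirs: list[Path], pattern: str, relative_dir: str = ""
) -> Iterator[Path]:
    """Yield paths matching pattern under relative_dir of each directory."""
    for artifact_dir in artifact_dirs:
        # An empty relative_dir leaves the artifact directory unchanged.
        yield from (artifact_dir / relative_dir).glob(pattern)


with tempfile.TemporaryDirectory() as tmp:
    dirs = []
    for name in ("run_a", "run_b"):
        artifact_dir = Path(tmp) / name / "artifacts"
        (artifact_dir / "logs").mkdir(parents=True)
        (artifact_dir / "logs" / "train.log").write_text("ok")
        (artifact_dir / "logs" / "notes.txt").write_text("n/a")
        dirs.append(artifact_dir)

    # Collect matches from every stand-in run, relative to the temp root.
    found = sorted(
        path.relative_to(tmp).as_posix()
        for path in glob_all(dirs, "*.log", "logs")
    )
```

Each stand-in run contributes its own matches, so one call covers the whole collection.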

source chdir_artifact(run: Run) → Iterator[Path]

Change the current working directory to the artifact directory of the given run.

This context manager changes the current working directory to the artifact directory of the given run. It ensures that the directory is changed back to the original directory after the context is exited.

Parameters

run : Run — The run to get the artifact directory from.
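The change-and-restore behavior described above follows a common context manager pattern. The sketch below shows that pattern with the standard library only. chdir_to is a hypothetical helper that takes a directory directly rather than a run, and a temporary directory stands in for the artifact directory.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def chdir_to(directory: Path) -> Iterator[Path]:
    """Switch the working directory, then restore it on exit."""
    original = Path.cwd()
    os.chdir(directory)
    try:
        yield directory
    finally:
        # Runs even if the body raises, so the original cwd is always restored.
        os.chdir(original)


before = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    with chdir_to(Path(tmp)) as artifact_dir:
        # Relative paths now resolve inside the stand-in artifact directory.
        Path("output.txt").write_text("written relative to the new cwd")
        wrote_in_artifact_dir = (artifact_dir / "output.txt").exists()
after = Path.cwd()
```

The try/finally block is what guarantees the return to the original directory after the context exits.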

source get_artifact_dir(run: Run) → Path

Retrieve the artifact directory for the given run.

This function uses MLflow to get the artifact directory for the given run.

Parameters

run : Run — The run instance.

Returns

Path — The local path to the directory where the artifacts are downloaded.

Raises

NotImplementedError

source get_experiment_names(tracking_dir: str | Path) → list[str]

Get the experiment names from the tracking directory.

Returns

list[str] — A list of experiment names sorted by the name.

Iterate over the artifact paths in the tracking directory.

source iter_artifacts_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]

Iterate over the artifacts directories in the tracking directory.

source iter_experiment_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]

Iterate over the experiment directories in the tracking directory.

source iter_run_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]

Iterate over the run directories in the tracking directory.

source log_run(run: Run) → Iterator[None]

Log the parameters from the given configuration instance.

This context manager logs the parameters from the provided configuration instance using MLflow. It also manages the MLflow run context, ensuring that artifacts are logged and the run is properly closed.

Parameters

run : Run — The run instance.

Yields

None — This context manager yields no value.

source main[C](node: C | type[C], config_name: str = 'config', *, chdir: bool = False, force_new_run: bool = False, match_overrides: bool = False, rerun_finished: bool = False, dry_run: bool = False, update: Callable[[C], C | None] | None = None)

Decorator for configuring and running MLflow experiments with Hydra.

This decorator combines Hydra configuration management with MLflow experiment tracking. It automatically handles run deduplication and configuration storage.

Parameters

node : C | type[C] — Configuration node class or instance defining the structure of the configuration.
config_name : str — Name of the configuration. Defaults to "config".
chdir : bool — If True, changes working directory to the artifact directory of the run. Defaults to False.
force_new_run : bool — If True, always creates a new MLflow run instead of reusing existing ones. Defaults to False.
match_overrides : bool — If True, matches runs based on Hydra CLI overrides instead of full config. Defaults to False.
rerun_finished : bool — If True, allows rerunning completed runs. Defaults to False.
dry_run : bool — If True, starts the hydra job but does not run the application itself. This allows users to preview the configuration and settings without executing the actual run. Defaults to False.
update : Callable[[C], C | None] | None — A function that takes a configuration and returns a new configuration or None. The function can modify the configuration in-place and/or return it. If the function returns None, the original (potentially modified) configuration is used. Changes made by this function are saved to the configuration file. This is useful for adding derived parameters, ensuring consistency between related values, or adding runtime information to the configuration. Defaults to None.
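The return-value rule for update (use the returned configuration, or fall back to the possibly modified original when None is returned) can be sketched without Hydra or MLflow. Everything here is an assumption for illustration: Config is a hypothetical dataclass, and apply_update only mimics the rule described above. It is not the decorator's actual code.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Config:
    width: int = 1024
    height: int = 768
    area: int = 0


def apply_update(
    cfg: Config, update: Callable[[Config], Config | None] | None
) -> Config:
    """Return the configuration that results from applying update."""
    if update is None:
        return cfg
    result = update(cfg)
    # None means "use the original object, including in-place changes".
    return cfg if result is None else result


def set_area(cfg: Config) -> None:
    # Derived parameter, computed in place; implicitly returns None.
    cfg.area = cfg.width * cfg.height


def halved(cfg: Config) -> Config:
    # Returns a new configuration instead of mutating the original.
    width, height = cfg.width // 2, cfg.height // 2
    return Config(width=width, height=height, area=width * height)


in_place = apply_update(Config(), set_area)
replaced = apply_update(Config(), halved)
```

Both styles end with a configuration whose derived area field is consistent with width and height, which is the kind of consistency the update parameter is meant to support.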

Start an MLflow run and log parameters using the provided configuration instance.

This context manager starts an MLflow run and logs parameters using the specified configuration instance. It ensures that the run is properly closed after completion.

Parameters

config : object — The configuration instance to log parameters from.
chdir : bool — Whether to change the current working directory to the artifact directory of the current run. Defaults to False.
run_id : str | None — The existing run ID. Defaults to None.
experiment_id : str | None — The experiment ID. Defaults to None.
run_name : str | None — The name of the run. Defaults to None.
nested : bool — Whether to allow nested runs. Defaults to False.
parent_run_id : str | None — The parent run ID. Defaults to None.
tags : dict[str, str] | None — Tags to associate with the run. Defaults to None.
description : str | None — A description of the run. Defaults to None.
log_system_metrics : bool | None — Whether to log system metrics. Defaults to None.
synchronous : bool | None — Whether to log parameters synchronously. Defaults to None.

Yields

Run — An MLflow Run instance representing the started run.