hydraflow

source package hydraflow

Integrate Hydra and MLflow to manage and track machine learning experiments.

Classes

  • Collection A collection of items that implements the Sequence protocol.

  • Run Represent an MLflow Run in HydraFlow.

  • RunCollection A collection of Run instances that implements the Sequence protocol.

Functions

  • chdir_artifact Change the current working directory to the artifact directory of the given run.

  • get_artifact_dir Retrieve the artifact directory for the given run.

  • get_experiment_names Get the experiment names from the tracking directory.

  • iter_artifact_paths Iterate over the artifact paths in the tracking directory.

  • iter_artifacts_dirs Iterate over the artifacts directories in the tracking directory.

  • iter_experiment_dirs Iterate over the experiment directories in the tracking directory.

  • iter_run_dirs Iterate over the run directories in the tracking directory.

  • log_run Log the parameters from the given configuration instance.

  • main Decorator for configuring and running MLflow experiments with Hydra.

  • start_run Start an MLflow run and log parameters using the provided configuration instance.

source class Collection[I](items: Iterable[I], get: Callable[[I, str, Any | Callable[[I], Any]], Any] | None = None)

Bases : Sequence[I]

A collection of items that implements the Sequence protocol.

Methods

  • filter Filter items based on criteria.

  • try_get Try to get a single item matching the specified criteria.

  • get Get a single item matching the specified criteria.

  • first Get the first item matching the specified criteria.

  • last Get the last item matching the specified criteria.

  • to_list Extract a list of values for a specific key from all items.

  • to_numpy Extract values for a specific key from all items as a NumPy array.

  • to_series Extract values for a specific key from all items as a Polars series.

  • unique Get the unique values for a specific key across all items.

  • n_unique Count the number of unique values for a specific key across all items.

  • sort Sort items based on one or more keys.

  • map Apply a function to each item and return an iterator of results.

  • pmap Apply a function to each item in parallel and return a list of results.

  • to_frame Convert the collection to a Polars DataFrame.

  • group_by Group items by one or more keys and return a GroupBy instance.

  • sample Sample a random subset of items from the collection.

  • shuffle Shuffle the items in the collection.

  • eq Create a predicate function that checks if two attributes are equal.

  • ne Create a predicate function that checks if two attributes are not equal.

  • gt Create a predicate function that checks if the left > the right.

  • lt Create a predicate function that checks if the left < the right.

  • ge Create a predicate function that checks if the left >= the right.

  • le Create a predicate function that checks if the left <= the right.

  • startswith Create a predicate function that checks if an attribute starts with a prefix.

  • endswith Create a predicate function that checks if an attribute ends with a suffix.

  • match Create a predicate function that checks if an attribute matches a pattern.

source method Collection.filter(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any)Self

Filter items based on criteria.

This method allows filtering items using various criteria:

  • Callable criteria that take an item and return a boolean
  • Key-value tuples where the key is a string and the value is compared using the matches function
  • Keyword arguments, where the key is a string and the value is compared using the matches function

The matches function supports the following comparison types:

  • Callable: The predicate function is called with the value
  • List/Set: Checks if the value is in the list/set
  • Tuple of length 2: Checks if the value is in the range [min, max]
  • Other: Checks for direct equality

Parameters

  • *criteria : Callable[[I], bool] | tuple[str, Any] Callable criteria or (key, value) tuples for filtering.

  • **kwargs : Any Additional key-value pairs for filtering.

Returns

  • Self A new Collection containing only the items that match all criteria.

Examples

# Filter using a callable
filtered = collection.filter(lambda x: x > 5)

# Filter using a key-value tuple
filtered = collection.filter(("age", 25))

# Filter using keyword arguments
filtered = collection.filter(age=25, name="John")

# Filter using range
filtered = collection.filter(("age", (20, 30)))

# Filter using list membership
filtered = collection.filter(("name", ["John", "Jane"]))
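The comparison rules listed above can be illustrated in plain Python. This is a minimal sketch, not HydraFlow's implementation: the helper name `matches_sketch`, the order of the checks, and the dict-based sample items are assumptions made for the example.

```python
from typing import Any


def matches_sketch(value: Any, criterion: Any) -> bool:
    """Illustrative stand-in for the comparison rules described above."""
    if callable(criterion):
        # Callable: the predicate is called with the value.
        return bool(criterion(value))
    if isinstance(criterion, (list, set)):
        # List/Set: membership test.
        return value in criterion
    if isinstance(criterion, tuple) and len(criterion) == 2:
        # Tuple of length 2: inclusive range [min, max].
        low, high = criterion
        return low <= value <= high
    # Anything else: direct equality.
    return value == criterion


# Hypothetical items; a real collection can hold arbitrary objects.
people = [
    {"name": "John", "age": 25},
    {"name": "Jane", "age": 31},
    {"name": "Alex", "age": 19},
]

in_range = [p["name"] for p in people if matches_sketch(p["age"], (20, 30))]
by_name = [p["name"] for p in people if matches_sketch(p["name"], ["John", "Jane"])]
```

One consequence of these rules is that a two-element tuple is read as a range, so membership in a pair of values is better expressed with a list or set.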

source method Collection.try_get(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any)I | None

Try to get a single item matching the specified criteria.

This method applies filters and returns a single matching item if exactly one is found, None if no items are found, or raises ValueError if multiple items match.

Parameters

  • *criteria : Callable[[I], bool] | tuple[str, Any] Callable criteria or (key, value) tuples for filtering.

  • **kwargs : Any Additional key-value pairs for filtering.

Returns

  • I | None A single item that matches the criteria, or None if no matches are found.

Raises

  • ValueError If multiple items match the criteria.
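The three outcomes described above (one match, no match, several matches) can be sketched as follows. The function `try_get_sketch` is a hypothetical stand-in that works on a plain list and a single predicate; it is not the library's code.

```python
from collections.abc import Callable, Sequence
from typing import Optional, TypeVar

T = TypeVar("T")


def try_get_sketch(items: Sequence[T], predicate: Callable[[T], bool]) -> Optional[T]:
    """Return the single match, None when nothing matches, else raise."""
    matched = [item for item in items if predicate(item)]
    if not matched:
        return None
    if len(matched) > 1:
        raise ValueError(f"expected at most one match, found {len(matched)}")
    return matched[0]


values = [1, 2, 3]
single = try_get_sketch(values, lambda v: v == 2)   # exactly one match
missing = try_get_sketch(values, lambda v: v > 5)   # no match
try:
    try_get_sketch(values, lambda v: v > 1)         # two matches
    ambiguous_raised = False
except ValueError:
    ambiguous_raised = True
```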

source method Collection.get(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any)I

Get a single item matching the specified criteria.

This method applies filters and returns a single matching item, or raises ValueError if no items or multiple items match.

Parameters

  • *criteria : Callable[[I], bool] | tuple[str, Any] Callable criteria or (key, value) tuples for filtering.

  • **kwargs : Any Additional key-value pairs for filtering.

Returns

  • I A single item that matches the criteria.

Raises

  • ValueError If no items match or if multiple items match the criteria.

source method Collection.first(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any)I

Get the first item matching the specified criteria.

This method applies filters and returns the first matching item, or raises ValueError if no items match.

Parameters

  • *criteria : Callable[[I], bool] | tuple[str, Any] Callable criteria or (key, value) tuples for filtering.

  • **kwargs : Any Additional key-value pairs for filtering.

Returns

  • I The first item that matches the criteria.

Raises

  • ValueError If no items match the criteria.

source method Collection.last(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any)I

Get the last item matching the specified criteria.

This method applies filters and returns the last matching item, or raises ValueError if no items match.

Parameters

  • *criteria : Callable[[I], bool] | tuple[str, Any] Callable criteria or (key, value) tuples for filtering.

  • **kwargs : Any Additional key-value pairs for filtering.

Returns

  • I The last item that matches the criteria.

Raises

  • ValueError If no items match the criteria.

source method Collection.to_list(key: str, default: Any | Callable[[I], Any] = MISSING)list[Any]

Extract a list of values for a specific key from all items.

Parameters

  • key : str The key to extract from each item.

  • default : Any | Callable[[I], Any] The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

Returns

  • list[Any] A list containing the values for the specified key from each item.

source method Collection.to_numpy(key: str, default: Any | Callable[[I], Any] = MISSING)NDArray

Extract values for a specific key from all items as a NumPy array.

Parameters

  • key : str The key to extract from each item.

  • default : Any | Callable[[I], Any] The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

Returns

  • NDArray A NumPy array containing the values for the specified key from each item.

source method Collection.to_series(key: str, default: Any = MISSING, *, name: str | None = None)Series

Extract values for a specific key from all items as a Polars series.

Parameters

  • key : str The key to extract from each item.

  • default : Any The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

  • name : str | None The name of the series. If not provided, the key will be used.

Returns

  • Series A Polars series containing the values for the specified key from each item.

source method Collection.unique(key: str, default: Any | Callable[[I], Any] = MISSING)NDArray

Get the unique values for a specific key across all items.

Parameters

  • key : str The key to extract unique values for.

  • default : Any | Callable[[I], Any] The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

Returns

  • NDArray A NumPy array containing the unique values for the specified key.

source method Collection.n_unique(key: str, default: Any | Callable[[I], Any] = MISSING)int

Count the number of unique values for a specific key across all items.

Parameters

  • key : str The key to count unique values for.

  • default : Any | Callable[[I], Any] The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.

Returns

  • int The number of unique values for the specified key.

source method Collection.sort(*keys: str, reverse: bool = False)Self

Sort items based on one or more keys.

Parameters

  • *keys : str The keys to sort by, in order of priority.

  • reverse : bool Whether to sort in descending order (default is ascending).

Returns

  • Self A new Collection with the items sorted according to the specified keys.
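A multi-key sort "in order of priority" can be pictured as sorting on a tuple of key values. The sketch below assumes dict items and a hypothetical helper `sort_sketch`; how HydraFlow resolves keys internally is not shown here.

```python
def sort_sketch(items, *keys, reverse=False):
    """Sort by several keys; earlier keys take priority over later ones."""
    # Comparing tuples compares the first element first, then the second, etc.
    return sorted(items, key=lambda item: tuple(item[k] for k in keys), reverse=reverse)


# Hypothetical run-like records.
runs = [
    {"model": "b", "lr": 0.1},
    {"model": "a", "lr": 0.5},
    {"model": "a", "lr": 0.1},
]

ascending = sort_sketch(runs, "model", "lr")
descending = sort_sketch(runs, "model", "lr", reverse=True)
```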

source method Collection.map[**P, R](function: Callable[Concatenate[I, P], R], *args: P.args, **kwargs: P.kwargs)Iterator[R]

Apply a function to each item and return an iterator of results.

This is a memory-efficient mapping operation that lazily evaluates results. Ideal for large collections where memory usage is a concern.

Parameters

  • function : Callable[Concatenate[I, P], R] Function to apply to each item. The item is passed as the first argument.

  • *args : P.args Additional positional arguments to pass to the function.

  • **kwargs : P.kwargs Additional keyword arguments to pass to the function.

Returns

  • Iterator[R] An iterator of the function's results.

Examples

# Process results one at a time
for result in collection.map(process_item, additional_arg):
    handle_result(result)

# Convert to list if needed
results = list(collection.map(transform_item))
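The lazy behaviour described for map can be demonstrated with a generator. This is a sketch under the assumption that laziness means "nothing is computed until the iterator is consumed"; `map_sketch` and `scale` are hypothetical names.

```python
def map_sketch(items, function, *args, **kwargs):
    """Yield function(item, *args, **kwargs) one item at a time."""
    for item in items:
        yield function(item, *args, **kwargs)


calls = []  # records which items have actually been processed


def scale(x, factor):
    calls.append(x)
    return x * factor


results = map_sketch([1, 2, 3], scale, 10)
calls_before = list(calls)       # snapshot: the generator has not run yet
first = next(results)            # processes only the first item
calls_after_first = list(calls)
rest = list(results)             # consumes the remaining items
```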

source method Collection.pmap[**P, R](function: Callable[Concatenate[I, P], R], n_jobs: int = -1, backend: str = 'multiprocessing', progress: bool = False, *args: P.args, **kwargs: P.kwargs)list[R]

Apply a function to each item in parallel and return a list of results.

This method processes items concurrently for improved performance on CPU-bound or I/O-bound operations, depending on the backend.

Parameters

  • function : Callable[Concatenate[I, P], R] Function to apply to each item. The item is passed as the first argument.

  • n_jobs : int Number of jobs to run in parallel. -1 means using all processors.

  • backend : str Parallelization backend.

  • progress : bool Whether to display a progress bar.

  • *args : P.args Additional positional arguments to pass to the function.

  • **kwargs : P.kwargs Additional keyword arguments to pass to the function.

Returns

  • list[R] A list containing all results of the function applications.

Examples

# Process all items in parallel using all cores
results = collection.pmap(heavy_computation)

# Specify number of parallel jobs and backend
results = collection.pmap(process_files, n_jobs=4, backend="threading")
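To make the idea of an order-preserving parallel map concrete, here is a sketch built on the standard library's thread pool. It is a stand-in chosen so the example has no third-party dependencies; it does not reproduce the `backend` or `progress` options, and `pmap_sketch` and `power` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pmap_sketch(items, function, n_jobs=-1, *args, **kwargs):
    """Apply function(item, *args, **kwargs) to every item using worker threads."""
    # Assumption for this sketch: -1 means one worker per available CPU.
    workers = (os.cpu_count() or 1) if n_jobs == -1 else n_jobs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, not completion order.
        return list(pool.map(lambda item: function(item, *args, **kwargs), items))


def power(base, exponent):
    return base**exponent


# n_jobs=2 is passed positionally so that 3 can follow as an extra argument.
cubes = pmap_sketch([1, 2, 3], power, 2, 3)
```

Because the extra positional arguments come after the named options in the signature, passing them positionally requires supplying the options first; keyword arguments avoid that constraint.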

source method Collection.to_frame(*keys: str | tuple[str, Any | Callable[[I], Any]], defaults: dict[str, Any | Callable[[I], Any]] | None = None, n_jobs: int = 0, backend: str = 'multiprocessing', progress: bool = False, **kwargs: Callable[[I], Any])DataFrame

Convert the collection to a Polars DataFrame.

This method converts the items in the collection into a Polars DataFrame. It allows specifying multiple keys, where each key can be a string or a tuple. If a tuple is provided, the first element is treated as the key and the second element as the default value for that key.

Parameters

  • *keys : str | tuple[str, Any | Callable[[I], Any]] The keys to include as columns in the DataFrame. If a tuple is provided, the first element is the key and the second element is the default value.

  • defaults : dict[str, Any | Callable[[I], Any]] | None Default values for the keys. If a callable, it will be called with the item and the value returned will be used as the default.

  • n_jobs : int Number of jobs to run in parallel. 0 means no parallelization. Defaults to 0.

  • backend : str Parallelization backend.

  • progress : bool Whether to display a progress bar.

  • **kwargs : Callable[[I], Any] Additional columns to compute using callables that take an item and return a value.

Returns

  • DataFrame A Polars DataFrame containing the specified data from the items.

Examples

# Convert to DataFrame with single keys
df = collection.to_frame("name", "age")

# Convert to DataFrame with keys and default values
df = collection.to_frame(("name", "Unknown"), ("age", 0))

source method Collection.group_by(*by: str)GroupBy[Self, I]

Group items by one or more keys and return a GroupBy instance.

This method organizes items into groups based on the specified keys and returns a GroupBy instance that contains the grouped collections. The GroupBy instance behaves like a dictionary, allowing access to collections for each group key.

Parameters

  • *by : str The keys to group by. If a single key is provided, its value will be used as the group key. If multiple keys are provided, a tuple of their values will be used as the group key. Keys can use dot notation (e.g., "model.type") to access nested configuration values.

Returns

  • GroupBy[Self, I] A GroupBy instance containing the grouped items. Each group is a collection of the same type as the original.
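The group-key rule above (a bare value for one key, a tuple for several, with dot notation for nested values) can be sketched with plain dictionaries. The helpers `get_nested` and `group_by_sketch` are hypothetical, and a plain dict stands in for the GroupBy instance.

```python
def get_nested(item, dotted_key):
    """Resolve a dotted key such as "model.type" against nested dicts."""
    value = item
    for part in dotted_key.split("."):
        value = value[part]
    return value


def group_by_sketch(items, *by):
    """Group items into a dict keyed by one value or a tuple of values."""
    groups = {}
    for item in items:
        values = tuple(get_nested(item, key) for key in by)
        # Single key: use the value itself; multiple keys: use the tuple.
        group_key = values[0] if len(by) == 1 else values
        groups.setdefault(group_key, []).append(item)
    return groups


# Hypothetical nested configurations.
configs = [
    {"model": {"type": "cnn"}, "seed": 0},
    {"model": {"type": "cnn"}, "seed": 1},
    {"model": {"type": "mlp"}, "seed": 0},
]

by_type = group_by_sketch(configs, "model.type")
by_type_and_seed = group_by_sketch(configs, "model.type", "seed")
```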

source method Collection.sample(k: int, seed: int | None = None)Self

Sample a random subset of items from the collection.

This method returns a new collection containing a random sample of items from the original collection. The sample is drawn without replacement, meaning each item can only appear once in the sample.

Parameters

  • k : int The number of items to sample.

  • seed : int | None The seed for the random number generator. If provided, the sample will be reproducible.

Returns

  • Self A new collection containing a random sample of items.

Raises

  • ValueError If the sample size is greater than the collection size.
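Seeded sampling without replacement behaves like the standard library's `random.Random(seed).sample`. The sketch below uses that as an assumed stand-in; `sample_sketch` is a hypothetical name and the library's own random source may differ.

```python
import random


def sample_sketch(items, k, seed=None):
    """Draw k distinct items; the same seed gives the same draw."""
    rng = random.Random(seed)
    # random.sample raises ValueError when k exceeds the population size.
    return rng.sample(list(items), k)


population = list(range(10))
first_draw = sample_sketch(population, 3, seed=42)
second_draw = sample_sketch(population, 3, seed=42)

try:
    sample_sketch(population, 11)
    too_large_raised = False
except ValueError:
    too_large_raised = True
```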

source method Collection.shuffle(seed: int | None = None)Self

Shuffle the items in the collection.

This method returns a new collection with the items in random order.

Parameters

  • seed : int | None The seed for the random number generator. If provided, the shuffle will be reproducible.

Returns

  • Self A new collection containing the items in random order.

source method Collection.eq(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING)Callable[[I], bool]

Create a predicate function that checks if two attributes are equal.

Parameters

  • left : str The name of the left attribute to compare.

  • right : str The name of the right attribute to compare.

  • default : Any | Callable[[I], Any], optional The default value to use if either attribute is not found. If callable, it will be called with the item.

Returns

  • Callable[[I], bool] A function that takes an item and returns True if the values of the specified attributes are equal.

Examples

# Find items where attribute 'a' equals attribute 'b'
equal_items = collection.filter(collection.eq('a', 'b'))

source method Collection.ne(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING)Callable[[I], bool]

Create a predicate function that checks if two attributes are not equal.

Parameters

  • left : str The name of the left attribute to compare.

  • right : str The name of the right attribute to compare.

  • default : Any | Callable[[I], Any], optional The default value to use if either attribute is not found. If callable, it will be called with the item.

Returns

  • Callable[[I], bool] A function that takes an item and returns True if the values of the specified attributes are not equal.

Examples

# Find items where attribute 'a' is not equal to attribute 'b'
unequal_items = collection.filter(collection.ne('a', 'b'))

source method Collection.gt(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING)Callable[[I], bool]

Create a predicate function that checks if the left > the right.

Parameters

  • left : str The name of the left attribute to compare.

  • right : str The name of the right attribute to compare.

  • default : Any | Callable[[I], Any], optional The default value to use if either attribute is not found. If callable, it will be called with the item.

Returns

  • Callable[[I], bool] A function that takes an item and returns True if the left attribute value is greater than the right attribute value.

Examples

# Find items where attribute 'a' is greater than attribute 'b'
items = collection.filter(collection.gt('a', 'b'))

source method Collection.lt(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING)Callable[[I], bool]

Create a predicate function that checks if the left < the right.

Parameters

  • left : str The name of the left attribute to compare.

  • right : str The name of the right attribute to compare.

  • default : Any | Callable[[I], Any], optional The default value to use if either attribute is not found. If callable, it will be called with the item.

Returns

  • Callable[[I], bool] A function that takes an item and returns True if the left attribute value is less than the right attribute value.

Examples

# Find items where attribute 'a' is less than attribute 'b'
items = collection.filter(collection.lt('a', 'b'))

source method Collection.ge(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING)Callable[[I], bool]

Create a predicate function that checks if the left >= the right.

Parameters

  • left : str The name of the left attribute to compare.

  • right : str The name of the right attribute to compare.

  • default : Any | Callable[[I], Any], optional The default value.

Returns

  • Callable[[I], bool] A predicate function for filtering.

source method Collection.le(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING)Callable[[I], bool]

Create a predicate function that checks if the left <= the right.

Parameters

  • left : str The name of the left attribute to compare.

  • right : str The name of the right attribute to compare.

  • default : Any | Callable[[I], Any], optional The default value.

Returns

  • Callable[[I], bool] A predicate function for filtering.
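The comparison helpers (eq, ne, gt, lt, ge, le) all follow the same pattern: they return a function that compares two attributes of an item. A minimal sketch of that pattern, using the standard `operator` module and dict items, might look like this; `compare_sketch` is a hypothetical helper, not part of the API.

```python
import operator


def compare_sketch(op, left, right):
    """Build a predicate comparing item[left] with item[right] using op."""

    def predicate(item):
        return op(item[left], item[right])

    return predicate


# Hypothetical items with two comparable attributes.
items = [
    {"a": 1, "b": 2},
    {"a": 2, "b": 2},
    {"a": 3, "b": 2},
]

ge_pred = compare_sketch(operator.ge, "a", "b")  # left >= right
le_pred = compare_sketch(operator.le, "a", "b")  # left <= right

at_least = [item["a"] for item in items if ge_pred(item)]
at_most = [item["a"] for item in items if le_pred(item)]
```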

source method Collection.startswith(key: str, prefix: str, *, default: Any | Callable[[I], Any] = MISSING)Callable[[I], bool]

Create a predicate function that checks if an attribute starts with a prefix.

Parameters

  • key : str The name of the attribute to check.

  • prefix : str The prefix to check for.

  • default : Any | Callable[[I], Any], optional The default value.

Returns

  • Callable[[I], bool] A predicate function for filtering.

source method Collection.endswith(key: str, suffix: str, *, default: Any | Callable[[I], Any] = MISSING)Callable[[I], bool]

Create a predicate function that checks if an attribute ends with a suffix.

Parameters

  • key : str The name of the attribute to check.

  • suffix : str The suffix to check for.

  • default : Any | Callable[[I], Any], optional The default value.

Returns

  • Callable[[I], bool] A predicate function for filtering.

source method Collection.match(key: str, pattern: str | Pattern[str], *, default: Any | Callable[[I], Any] = MISSING, flags: _FlagsType = 0)Callable[[I], bool]

Create a predicate function that checks if an attribute matches a pattern.

Parameters

  • key : str The name of the attribute to check.

  • pattern : str | re.Pattern The pattern to check for.

  • default : Any | Callable[[I], Any], optional The default value.

  • flags : re.RegexFlag, optional Flags for the regex pattern.

Returns

  • Callable[[I], bool] A predicate function for filtering.

source class Run[C, I = None](run_dir: Path, impl_factory: Callable[[Path], I] | Callable[[Path, C], I] | None = None)

Represent an MLflow Run in HydraFlow.

A Run contains information about the run, configuration, and implementation. The configuration type C and implementation type I are specified as type parameters.

Attributes

  • info : RunInfo Information about the run, such as run directory, run ID, and job name.

  • impl_factory : Callable[[Path], I] | Callable[[Path, C], I] Factory function to create the implementation instance.

  • cfg : C The configuration instance loaded from the Hydra configuration file.

  • impl : I The implementation instance created by the factory function.

Methods

  • load Load a Run from a run directory.

  • update Set default value(s) in the configuration if they don't already exist.

  • get Get a value from the information or configuration.

  • lit Create a Polars literal expression from a run key.

  • to_frame Convert the Run to a DataFrame.

  • to_dict Convert the Run to a dictionary.

  • chdir Change the current working directory to the artifact directory.

  • path Return the path relative to the artifact directory.

  • iterdir Iterate over the artifact directories for the run.

  • glob Glob the artifact directories for the run.

source property Run.cfg: C

The configuration instance loaded from the Hydra configuration file.

source property Run.impl: I

The implementation instance created by the factory function.

This property dynamically examines the signature of the impl_factory using the inspect module and calls it with the appropriate arguments:

  • If the factory accepts one parameter: called with just the artifacts directory
  • If the factory accepts two parameters: called with the artifacts directory and the configuration instance

This allows implementation classes to be configuration-aware and utilize both the file system and configuration information.
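The signature-based dispatch described above can be sketched with `inspect.signature`. The function `create_impl_sketch` and the two sample factories are hypothetical; only the one-parameter and two-parameter cases from the description are handled.

```python
import inspect
from pathlib import Path


def create_impl_sketch(impl_factory, artifacts_dir, cfg):
    """Call the factory with (path) or (path, cfg) depending on its signature."""
    n_params = len(inspect.signature(impl_factory).parameters)
    if n_params == 1:
        return impl_factory(artifacts_dir)
    if n_params == 2:
        return impl_factory(artifacts_dir, cfg)
    raise TypeError(f"unsupported factory with {n_params} parameters")


def path_only(path):
    return ("path_only", path.name)


def path_and_cfg(path, cfg):
    return ("path_and_cfg", path.name, cfg["lr"])


artifacts = Path("artifacts")
impl_one = create_impl_sketch(path_only, artifacts, {"lr": 0.1})
impl_two = create_impl_sketch(path_and_cfg, artifacts, {"lr": 0.1})
```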

source classmethod Run.load(run_dir: str | Path | Iterable[str | Path], impl_factory: Callable[[Path], I] | Callable[[Path, C], I] | None = None, *, n_jobs: int = 0)Self | RunCollection[Self]

Load a Run from a run directory.

Parameters

  • run_dir : str | Path | Iterable[str | Path] The directory where the MLflow runs are stored, either as a string, a Path instance, or an iterable of them.

  • impl_factory : Callable[[Path], I] | Callable[[Path, C], I] | None A factory function that creates the implementation instance. It can accept either just the artifacts directory path, or both the path and the configuration instance. Defaults to None, in which case a function that returns None is used.

  • n_jobs : int The number of parallel jobs. If 0 (default), runs sequentially. If -1, uses all available CPU cores.

Returns

  • Self | RunCollection[Self] A single Run instance or a RunCollection of Run instances.

source method Run.update(key: str | tuple[str, ...], value: Any | Callable[[Self], Any], *, force: bool = False)None

Set default value(s) in the configuration if they don't already exist.

This method adds a value or multiple values to the configuration, but only if the corresponding keys don't already have values. Existing values will not be modified.

Parameters

  • key : str | tuple[str, ...] Either a string representing a single configuration path (can use dot notation like "section.subsection.param"), or a tuple of strings to set multiple related configuration values at once.

  • value : Any | Callable[[Self], Any] The value to set. This can be:

    • For string keys: Any value, or a callable that returns a value
    • For tuple keys: An iterable with the same length as the key tuple, or a callable that returns such an iterable
    • For callable values: The callable must accept a single argument of type Run (self) and return the appropriate value type
  • force : bool Whether to force the update even if the key already exists.

Raises

  • TypeError If a tuple key is provided but the value is not an iterable, or if the callable doesn't return an iterable.

source method Run.get(key: str, default: Any | Callable[[Self], Any] = MISSING)Any

Get a value from the information or configuration.

Parameters

  • key : str The key to look for. Can use dot notation for nested keys in configuration. Special keys:

    • "cfg": Returns the configuration object
    • "impl": Returns the implementation object
    • "info": Returns the run information object
  • default : Any | Callable[[Self], Any] Value to return if the key is not found. If a callable, it will be called with the Run instance and the value returned will be used as the default. If not provided, AttributeError will be raised.

Returns

  • Any The value associated with the key, or the default value if the key is not found and a default is provided.

Raises

  • AttributeError If the key is not found and no default is provided.

Note

The search order for keys is:

  1. Configuration (cfg)
  2. Implementation (impl)
  3. Run information (info)
  4. Run object itself (self)
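The search order above can be sketched as a loop over the four sources. This is a simplified illustration: `get_sketch` is hypothetical, it uses `SimpleNamespace` objects in place of real runs, and it omits dot notation for nested keys.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel meaning "no default supplied"


def get_sketch(run, key, default=_MISSING):
    """Look up key in cfg, impl, info, then the run itself, in that order."""
    for source in (run.cfg, run.impl, run.info, run):
        value = getattr(source, key, _MISSING)
        if value is not _MISSING:
            return value
    if default is _MISSING:
        raise AttributeError(key)
    # A callable default is invoked with the run.
    return default(run) if callable(default) else default


# Hypothetical run with overlapping attribute names.
run = SimpleNamespace(
    cfg=SimpleNamespace(lr=0.1, name="from_cfg"),
    impl=SimpleNamespace(name="from_impl", score=1),
    info=SimpleNamespace(run_id="abc"),
)

name = get_sketch(run, "name")          # cfg wins over impl
score = get_sketch(run, "score")        # found on impl
run_id = get_sketch(run, "run_id")      # found on info
cfg_obj = get_sketch(run, "cfg")        # found on the run itself
fallback = get_sketch(run, "missing", lambda r: r.info.run_id)
```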

source method Run.lit(key: str, default: Any | Callable[[Self], Any] = MISSING, *, dtype: PolarsDataType | None = None)Expr

Create a Polars literal expression from a run key.

Parameters

  • key : str The key to look up in the run's configuration or info.

  • default : Any | Callable[[Run], Any], optional Default value to use if the key is missing. If a callable is provided, it will be called with the Run instance.

  • dtype : PolarsDataType | None Explicit data type for the literal expression.

Returns

  • Expr A Polars literal expression aliased to the provided key.

Raises

  • AttributeError If the key is not found and no default is provided.

source method Run.to_frame(function: Callable[[Self], DataFrame], *keys: str | tuple[str, Any | Callable[[Self], Any]])DataFrame

Convert the Run to a DataFrame.

Parameters

  • function : Callable[[Run], DataFrame] A function that takes a Run instance and returns a DataFrame.

  • keys : str | tuple[str, Any | Callable[[Run], Any]] The keys to add to the DataFrame.

Returns

  • DataFrame A DataFrame representation of the Run.

source method Run.to_dict(flatten: bool = True)dict[str, Any]

Convert the Run to a dictionary.

Parameters

  • flatten : bool, optional If True, flattens nested dictionaries. Defaults to True.

Returns

  • dict[str, Any] A dictionary representation of the Run's configuration.

Raises

  • TypeError

source method Run.chdir(relative_dir: str = '')Iterator[Path]

Change the current working directory to the artifact directory.

This context manager changes the current working directory to the artifact directory of the run. It ensures that the directory is changed back to the original directory after the context is exited.

Parameters

  • relative_dir : str The directory, relative to the artifact directory, to change into. Defaults to an empty string, which selects the artifact directory itself.

Yields

  • Path The artifact directory of the run.
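The change-and-restore behaviour of such a context manager can be sketched with `contextlib.contextmanager`. The name `chdir_sketch` is hypothetical, and a temporary directory stands in for a run's artifact directory.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def chdir_sketch(artifact_dir, relative_dir=""):
    """Enter artifact_dir / relative_dir and always return to the old cwd."""
    target = Path(artifact_dir) / relative_dir
    original = Path.cwd()
    os.chdir(target)
    try:
        yield target
    finally:
        # Runs even if the body raises, so the original cwd is restored.
        os.chdir(original)


before = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    with chdir_sketch(tmp):
        inside_matches = Path.cwd().resolve() == Path(tmp).resolve()
    after = Path.cwd()
```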

source method Run.path(relative_path: str = '')Path

Return the path relative to the artifact directory.

Parameters

  • relative_path : str The path, relative to the artifact directory, to resolve.

Returns

  • Path The path relative to the artifact directory.

source method Run.iterdir(relative_dir: str = '')Iterator[Path]

Iterate over the artifact directories for the run.

Parameters

  • relative_dir : str The relative directory to iterate over.

Yields

  • Path The artifact directory for the run.

source method Run.glob(pattern: str, relative_dir: str = '')Iterator[Path]

Glob the artifact directories for the run.

Parameters

  • pattern : str The pattern to glob.

  • relative_dir : str The relative directory to glob.

Yields

  • Path The existing artifact paths that match the pattern.

source class RunCollection[R: Run[Any, Any]](items: Iterable[I], get: Callable[[I, str, Any | Callable[[I], Any]], Any] | None = None)

Bases : Collection[R]

A collection of Run instances that implements the Sequence protocol.

RunCollection provides methods for filtering, sorting, grouping, and analyzing runs, as well as converting run data to various formats such as DataFrames.

Parameters

  • runs : Iterable[Run] An iterable of Run instances to include in the collection.

Methods

  • preload Pre-load configuration and implementation objects for all runs in parallel.

  • update Update configuration values for all runs in the collection.

  • concat Concatenate the results of a function applied to all runs in the collection.

  • iterdir Iterate over the artifact directories for all runs in the collection.

  • glob Glob the artifact directories for all runs in the collection.

source method RunCollection.preload(*, n_jobs: int = 0, cfg: bool = True, impl: bool = True)Self

Pre-load configuration and implementation objects for all runs in parallel.

This method eagerly evaluates the cfg and impl properties of all runs in the collection, potentially in parallel using joblib. This can significantly improve performance for subsequent operations that access these properties, as they will be already loaded in memory.

Parameters

  • n_jobs : int Number of parallel jobs to run.

    • 0: Run sequentially (default)
    • -1: Use all available CPU cores
    • >0: Use the specified number of cores

  • cfg : bool Whether to preload the configuration objects. Defaults to True.

  • impl : bool Whether to preload the implementation objects. Defaults to True.

Returns

  • Self The same RunCollection instance with preloaded configuration and implementation objects.

Note

The preloading is done using joblib's threading backend, which is suitable for I/O-bound tasks like loading configuration files and implementation objects.

Examples

# Preload all runs sequentially
runs.preload()

# Preload using all available cores
runs.preload(n_jobs=-1)

# Preload only configurations
runs.preload(impl=False)

# Preload only implementations
runs.preload(cfg=False)

source method RunCollection.update(key: str | tuple[str, ...], value: Any | Callable[[R], Any], *, force: bool = False)None

Update configuration values for all runs in the collection.

This method calls the update method on each run in the collection.

Parameters

  • key : str | tuple[str, ...] Either a string representing a single configuration path or a tuple of strings to set multiple configuration values.

  • value : Any | Callable[[R], Any] The value(s) to set or a callable that returns such values.

  • force : bool Whether to force updates even if the keys already exist.

source method RunCollection.concat(function: Callable[[R], DataFrame], *keys: str | tuple[str, Any | Callable[[R], Any]])DataFrame

Concatenate the results of a function applied to all runs in the collection.

This method applies the provided function to each run in the collection and concatenates the resulting DataFrames along the specified keys.

Parameters

  • function : Callable[[R], DataFrame] A function that takes a Run instance and returns a DataFrame.

  • keys : str | tuple[str, Any | Callable[[R], Any]] The keys to add to the DataFrame.

Returns

  • DataFrame A DataFrame representation of the Run collection.

source method RunCollection.iterdir(relative_dir: str = '') → Iterator[Path]

Iterate over the artifact directories for all runs in the collection.

This method yields all files and directories in the specified relative directory for each run in the collection.

Parameters

  • relative_dir : str The relative directory within the artifacts directory to iterate over.

Yields

  • Path Each path in the specified directory for each run in the collection.

source method RunCollection.glob(pattern: str, relative_dir: str = '') → Iterator[Path]

Glob the artifact directories for all runs in the collection.

This method yields all paths matching the specified pattern in the relative directory for each run in the collection.

Parameters

  • pattern : str The glob pattern to match files or directories.

  • relative_dir : str The relative directory within the artifacts directory to search in.

Yields

  • Path Each path matching the pattern for each run in the collection.
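
The behaviour of iterdir and glob across several runs can be approximated with `pathlib`. This is a rough sketch, assuming each run is represented only by its artifact directory; the free functions `iterdir_all` and `glob_all` are hypothetical names, not part of HydraFlow.

```python
from pathlib import Path
from typing import Iterable, Iterator


def iterdir_all(artifact_dirs: Iterable[Path], relative_dir: str = "") -> Iterator[Path]:
    """Yield every entry of relative_dir inside each artifact directory."""
    for base in artifact_dirs:
        target = base / relative_dir  # an empty relative_dir leaves base unchanged
        if target.is_dir():
            yield from target.iterdir()


def glob_all(
    artifact_dirs: Iterable[Path], pattern: str, relative_dir: str = ""
) -> Iterator[Path]:
    """Yield every path matching pattern under relative_dir, run by run."""
    for base in artifact_dirs:
        target = base / relative_dir
        if target.is_dir():
            yield from target.glob(pattern)
```

Both functions are lazy generators, so nothing is read from disk until the caller iterates.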

source chdir_artifact(run: Run) → Iterator[Path]

Change the current working directory to the artifact directory of the given run.

This context manager changes the current working directory to the artifact directory of the given run. It ensures that the directory is changed back to the original directory after the context is exited.

Parameters

  • run : Run The run to get the artifact directory from.

source get_artifact_dir(run: Run) → Path

Retrieve the artifact directory for the given run.

This function uses MLflow to get the artifact directory for the given run.

Parameters

  • run : Run The run instance.

Returns

  • Path The local path to the directory where the artifacts are downloaded.

Raises

  • NotImplementedError

source get_experiment_names(tracking_dir: str | Path) → list[str]

Get the experiment names from the tracking directory.

Returns

  • list[str] A list of experiment names sorted by the name.

source iter_artifact_paths(tracking_dir: str | Path, artifact_path: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]

Iterate over the artifact paths in the tracking directory.

source iter_artifacts_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]

Iterate over the artifacts directories in the tracking directory.

source iter_experiment_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]

Iterate over the experiment directories in the tracking directory.

source iter_run_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]

Iterate over the run directories in the tracking directory.

source log_run(run: Run) → Iterator[None]

Log the parameters from the given configuration instance.

This context manager logs the parameters from the provided configuration instance using MLflow. It also manages the MLflow run context, ensuring that artifacts are logged and the run is properly closed.

Parameters

  • run : Run The run instance.

Yields

  • None None

source main[C](node: C | type[C], config_name: str = 'config', *, chdir: bool = False, force_new_run: bool = False, match_overrides: bool = False, rerun_finished: bool = False, dry_run: bool = False, update: Callable[[C], C | None] | None = None)

Decorator for configuring and running MLflow experiments with Hydra.

This decorator combines Hydra configuration management with MLflow experiment tracking. It automatically handles run deduplication and configuration storage.

Parameters

  • node : C | type[C] Configuration node class or instance defining the structure of the configuration.

  • config_name : str Name of the configuration. Defaults to "config".

  • chdir : bool If True, changes working directory to the artifact directory of the run. Defaults to False.

  • force_new_run : bool If True, always creates a new MLflow run instead of reusing existing ones. Defaults to False.

  • match_overrides : bool If True, matches runs based on Hydra CLI overrides instead of full config. Defaults to False.

  • rerun_finished : bool If True, allows rerunning completed runs. Defaults to False.

  • dry_run : bool If True, starts the hydra job but does not run the application itself. This allows users to preview the configuration and settings without executing the actual run. Defaults to False.

  • update : Callable[[C], C | None] | None A function that takes a configuration and returns a new configuration or None. The function can modify the configuration in-place and/or return it. If the function returns None, the original (potentially modified) configuration is used. Changes made by this function are saved to the configuration file. This is useful for adding derived parameters, ensuring consistency between related values, or adding runtime information to the configuration. Defaults to None.
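
The contract of the update callback (modify in place, return a new configuration, or return None) can be sketched as below. This is a minimal illustration assuming a dataclass configuration; `Config`, `apply_update`, and `set_area` are hypothetical names, and saving the result to the configuration file is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Config:
    width: int = 4
    height: int = 3
    area: int = 0  # derived parameter, filled in by the update callback


def apply_update(
    cfg: Config, update: Optional[Callable[[Config], Optional[Config]]]
) -> Config:
    """Return the configuration to use after running the optional callback."""
    if update is None:
        return cfg
    result = update(cfg)
    # None means "use the original object", which may have been modified in place.
    return cfg if result is None else result


def set_area(cfg: Config) -> None:
    """Example callback: derive area in place and return nothing."""
    cfg.area = cfg.width * cfg.height
```

Here `apply_update(Config(), set_area)` yields a configuration whose `area` is derived from `width` and `height`, which is the "derived parameters" use case mentioned above.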

source start_run(*, chdir: bool = False, run_id: str | None = None, experiment_id: str | None = None, run_name: str | None = None, nested: bool = False, parent_run_id: str | None = None, tags: dict[str, str] | None = None, description: str | None = None, log_system_metrics: bool | None = None) → Iterator[Run]

Start an MLflow run and log parameters using the provided configuration instance.

This context manager starts an MLflow run and logs parameters using the specified configuration instance. It ensures that the run is properly closed after completion.

Parameters

  • config : object The configuration instance to log parameters from.

  • chdir : bool Whether to change the current working directory to the artifact directory of the current run. Defaults to False.

  • run_id : str | None The existing run ID. Defaults to None.

  • experiment_id : str | None The experiment ID. Defaults to None.

  • run_name : str | None The name of the run. Defaults to None.

  • nested : bool Whether to allow nested runs. Defaults to False.

  • parent_run_id : str | None The parent run ID. Defaults to None.

  • tags : dict[str, str] | None Tags to associate with the run. Defaults to None.

  • description : str | None A description of the run. Defaults to None.

  • log_system_metrics : bool | None Whether to log system metrics. Defaults to None.

  • synchronous : bool | None Whether to log parameters synchronously. Defaults to None.

Yields

  • Run An MLflow Run instance representing the started run.