hydraflow
Integrate Hydra and MLflow to manage and track machine learning experiments.
Classes
- Collection — A collection of items that implements the Sequence protocol.
- Run — Represent an MLflow Run in HydraFlow.
- RunCollection — A collection of Run instances that implements the Sequence protocol.
Functions
- chdir_artifact — Change the current working directory to the artifact directory of the given run.
- get_artifact_dir — Retrieve the artifact directory for the given run.
- get_experiment_names — Get the experiment names from the tracking directory.
- iter_artifact_paths — Iterate over the artifact paths in the tracking directory.
- iter_artifacts_dirs — Iterate over the artifacts directories in the tracking directory.
- iter_experiment_dirs — Iterate over the experiment directories in the tracking directory.
- iter_run_dirs — Iterate over the run directories in the tracking directory.
- log_run — Log the parameters from the given configuration instance.
- main — Decorator for configuring and running MLflow experiments with Hydra.
- start_run — Start an MLflow run and log parameters using the provided configuration instance.
source class Collection[I](items: Iterable[I], get: Callable[[I, str, Any | Callable[[I], Any]], Any] | None = None)
Bases : Sequence[I]
A collection of items that implements the Sequence protocol.
Methods
- filter — Filter items based on criteria.
- try_get — Try to get a single item matching the specified criteria.
- get — Get a single item matching the specified criteria.
- first — Get the first item matching the specified criteria.
- last — Get the last item matching the specified criteria.
- to_list — Extract a list of values for a specific key from all items.
- to_numpy — Extract values for a specific key from all items as a NumPy array.
- to_series — Extract values for a specific key from all items as a Polars series.
- unique — Get the unique values for a specific key across all items.
- n_unique — Count the number of unique values for a specific key across all items.
- sort — Sort items based on one or more keys.
- map — Apply a function to each item and return an iterator of results.
- pmap — Apply a function to each item in parallel and return a list of results.
- to_frame — Convert the collection to a Polars DataFrame.
- group_by — Group items by one or more keys and return a GroupBy instance.
- sample — Sample a random subset of items from the collection.
- shuffle — Shuffle the items in the collection.
- eq — Create a predicate function that checks if two attributes are equal.
- ne — Create a predicate function that checks if two attributes are not equal.
- gt — Create a predicate function that checks if the left > the right.
- lt — Create a predicate function that checks if the left < the right.
- ge — Create a predicate function that checks if the left >= the right.
- le — Create a predicate function that checks if the left <= the right.
- startswith — Create a predicate function that checks if an attribute starts with a prefix.
- endswith — Create a predicate function that checks if an attribute ends with a suffix.
- match — Create a predicate function that checks if an attribute matches a pattern.
source method Collection.filter(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → Self
Filter items based on criteria.
This method allows filtering items using various criteria:
- Callable criteria that take an item and return a boolean
- Key-value tuples, where the key is a string and the value is compared using the matches function
- Keyword arguments, where the key is a string and the value is compared using the matches function
The matches function supports the following comparison types:
- Callable: The predicate function is called with the value
- List/Set: Checks if the value is in the list/set
- Tuple of length 2: Checks if the value is in the range [min, max]
- Other: Checks for direct equality
Parameters
- *criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
- **kwargs : Any — Additional key-value pairs for filtering.
Returns
- Self — A new Collection containing only the items that match all criteria.
Examples
# Filter using a callable
filtered = collection.filter(lambda x: x > 5)
# Filter using a key-value tuple
filtered = collection.filter(("age", 25))
# Filter using keyword arguments
filtered = collection.filter(age=25, name="John")
# Filter using range
filtered = collection.filter(("age", (20, 30)))
# Filter using list membership
filtered = collection.filter(("name", ["John", "Jane"]))
source method Collection.try_get(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → I | None
Try to get a single item matching the specified criteria.
This method applies filters and returns a single matching item if exactly one is found, None if no items are found, or raises ValueError if multiple items match.
Parameters
- *criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
- **kwargs : Any — Additional key-value pairs for filtering.
Returns
- I | None — A single item that matches the criteria, or None if no matches are found.
Raises
- ValueError — If multiple items match the criteria.
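For illustration, a minimal sketch of how try_get differs from get (the "model.type" key and the "lstm" value are hypothetical):
# Returns the single matching item, or None when nothing matches;
# raises ValueError only if more than one item matches
item = collection.try_get(("model.type", "lstm"))
if item is None:
    print("no matching item")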
source method Collection.get(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → I
Get a single item matching the specified criteria.
This method applies filters and returns a single matching item, or raises ValueError if no items or multiple items match.
Parameters
- *criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
- **kwargs : Any — Additional key-value pairs for filtering.
Returns
- I — A single item that matches the criteria.
Raises
- ValueError — If no items match or if multiple items match the criteria.
source method Collection.first(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → I
Get the first item matching the specified criteria.
This method applies filters and returns the first matching item, or raises ValueError if no items match.
Parameters
- *criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
- **kwargs : Any — Additional key-value pairs for filtering.
Returns
- I — The first item that matches the criteria.
Raises
- ValueError — If no items match the criteria.
source method Collection.last(*criteria: Callable[[I], bool] | tuple[str, Any], **kwargs: Any) → I
Get the last item matching the specified criteria.
This method applies filters and returns the last matching item, or raises ValueError if no items match.
Parameters
- *criteria : Callable[[I], bool] | tuple[str, Any] — Callable criteria or (key, value) tuples for filtering.
- **kwargs : Any — Additional key-value pairs for filtering.
Returns
- I — The last item that matches the criteria.
Raises
- ValueError — If no items match the criteria.
source method Collection.to_list(key: str, default: Any | Callable[[I], Any] = MISSING) → list[Any]
Extract a list of values for a specific key from all items.
Parameters
- key : str — The key to extract from each item.
- default : Any | Callable[[I], Any] — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.
Returns
- list[Any] — A list containing the values for the specified key from each item.
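As a short sketch, assuming each item exposes an "age" key:
# One value per item, in collection order
ages = collection.to_list("age")
# A plain default for items missing the key
ages = collection.to_list("age", default=0)
# A callable default receives the item itself
ages = collection.to_list("age", default=lambda item: -1)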
source method Collection.to_numpy(key: str, default: Any | Callable[[I], Any] = MISSING) → NDArray
Extract values for a specific key from all items as a NumPy array.
Parameters
- key : str — The key to extract from each item.
- default : Any | Callable[[I], Any] — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.
Returns
- NDArray — A NumPy array containing the values for the specified key from each item.
source method Collection.to_series(key: str, default: Any = MISSING, *, name: str | None = None) → Series
Extract values for a specific key from all items as a Polars series.
Parameters
- key : str — The key to extract from each item.
- default : Any — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.
- name : str | None — The name of the series. If not provided, the key will be used.
Returns
- Series — A Polars series containing the values for the specified key from each item.
source method Collection.unique(key: str, default: Any | Callable[[I], Any] = MISSING) → NDArray
Get the unique values for a specific key across all items.
Parameters
- key : str — The key to extract unique values for.
- default : Any | Callable[[I], Any] — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.
Returns
- NDArray — A NumPy array containing the unique values for the specified key.
source method Collection.n_unique(key: str, default: Any | Callable[[I], Any] = MISSING) → int
Count the number of unique values for a specific key across all items.
Parameters
- key : str — The key to count unique values for.
- default : Any | Callable[[I], Any] — The default value to return if the key is not found. If a callable, it will be called with the item and the value returned will be used as the default.
Returns
- int — The number of unique values for the specified key.
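For example, assuming the items carry a "model.type" key:
# Distinct values as a NumPy array
types = collection.unique("model.type")
# Just the count of distinct values
n = collection.n_unique("model.type")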
source method Collection.sort(*keys: str, reverse: bool = False) → Self
Sort items based on one or more keys.
Parameters
- *keys : str — The keys to sort by, in order of priority.
- reverse : bool — Whether to sort in descending order (default is ascending).
Returns
- Self — A new Collection with the items sorted according to the specified keys.
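For example, assuming "lr" and "batch_size" keys exist on each item:
# Sort by learning rate, then by batch size
ordered = collection.sort("lr", "batch_size")
# Descending order
ordered = collection.sort("lr", reverse=True)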
source method Collection.map[**P, R](function: Callable[Concatenate[I, P], R], *args: P.args, **kwargs: P.kwargs) → Iterator[R]
Apply a function to each item and return an iterator of results.
This is a memory-efficient mapping operation that lazily evaluates results. Ideal for large collections where memory usage is a concern.
Parameters
- function : Callable[Concatenate[I, P], R] — Function to apply to each item. The item is passed as the first argument.
- *args : P.args — Additional positional arguments to pass to the function.
- **kwargs : P.kwargs — Additional keyword arguments to pass to the function.
Returns
- Iterator[R] — An iterator of the function's results.
Examples
# Process results one at a time
for result in collection.map(process_item, additional_arg):
    handle_result(result)
# Convert to list if needed
results = list(collection.map(transform_item))
source method Collection.pmap[**P, R](function: Callable[Concatenate[I, P], R], n_jobs: int = -1, backend: str = 'multiprocessing', progress: bool = False, *args: P.args, **kwargs: P.kwargs) → list[R]
Apply a function to each item in parallel and return a list of results.
This method processes items concurrently for improved performance on CPU-bound or I/O-bound operations, depending on the backend.
Parameters
- function : Callable[Concatenate[I, P], R] — Function to apply to each item. The item is passed as the first argument.
- n_jobs : int — Number of jobs to run in parallel. -1 means using all processors.
- backend : str — Parallelization backend.
- progress : bool — Whether to display a progress bar.
- *args : P.args — Additional positional arguments to pass to the function.
- **kwargs : P.kwargs — Additional keyword arguments to pass to the function.
Returns
- list[R] — A list containing all results of the function applications.
Examples
# Process all items in parallel using all cores
results = collection.pmap(heavy_computation)
# Specify number of parallel jobs and backend
results = collection.pmap(process_files, n_jobs=4, backend="threading")
source method Collection.to_frame(*keys: str | tuple[str, Any | Callable[[I], Any]], defaults: dict[str, Any | Callable[[I], Any]] | None = None, n_jobs: int = 0, backend: str = 'multiprocessing', progress: bool = False, **kwargs: Callable[[I], Any]) → DataFrame
Convert the collection to a Polars DataFrame.
This method converts the items in the collection into a Polars DataFrame. It allows specifying multiple keys, where each key can be a string or a tuple. If a tuple is provided, the first element is treated as the key and the second element as the default value for that key.
Parameters
- *keys : str | tuple[str, Any | Callable[[I], Any]] — The keys to include as columns in the DataFrame. If a tuple is provided, the first element is the key and the second element is the default value.
- defaults : dict[str, Any | Callable[[I], Any]] | None — Default values for the keys. If a callable, it will be called with the item and the value returned will be used as the default.
- n_jobs : int — Number of jobs to run in parallel. 0 means no parallelization. Defaults to 0.
- backend : str — Parallelization backend.
- progress : bool — Whether to display a progress bar.
- **kwargs : Callable[[I], Any] — Additional columns to compute using callables that take an item and return a value.
Returns
- DataFrame — A Polars DataFrame containing the specified data from the items.
Examples
# Convert to DataFrame with single keys
df = collection.to_frame("name", "age")
# Convert to DataFrame with keys and default values
df = collection.to_frame(("name", "Unknown"), ("age", 0))
source method Collection.group_by(*by: str) → GroupBy[Self, I]
Group items by one or more keys and return a GroupBy instance.
This method organizes items into groups based on the specified keys and returns a GroupBy instance that contains the grouped collections. The GroupBy instance behaves like a dictionary, allowing access to collections for each group key.
Parameters
- *by : str — The keys to group by. If a single key is provided, its value will be used as the group key. If multiple keys are provided, a tuple of their values will be used as the group key. Keys can use dot notation (e.g., "model.type") to access nested configuration values.
Returns
- GroupBy[Self, I] — A GroupBy instance containing the grouped items. Each group is a collection of the same type as the original.
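For example, assuming a nested "model.type" key whose values include "lstm":
# Group items by a nested configuration value
groups = collection.group_by("model.type")
# Dict-style access to the sub-collection for one group key
lstm_items = groups["lstm"]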
source method Collection.sample(k: int, seed: int | None = None) → Self
Sample a random subset of items from the collection.
This method returns a new collection containing a random sample of items from the original collection. The sample is drawn without replacement, meaning each item can only appear once in the sample.
Parameters
- k : int — The number of items to sample.
- seed : int | None — The seed for the random number generator. If provided, the sample will be reproducible.
Returns
- Self — A new collection containing a random sample of items.
Raises
- ValueError — If the sample size is greater than the collection size.
source method Collection.shuffle(seed: int | None = None) → Self
Shuffle the items in the collection.
This method returns a new collection with the items in random order.
Parameters
- seed : int | None — The seed for the random number generator. If provided, the shuffle will be reproducible.
Returns
- Self — A new collection containing the items in random order.
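A brief sketch of both methods, assuming the collection has at least 10 items:
# Reproducible random subset of 10 items
subset = collection.sample(10, seed=42)
# Reproducible random reordering of all items
shuffled = collection.shuffle(seed=42)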
source method Collection.eq(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]
Create a predicate function that checks if two attributes are equal.
Parameters
- left : str — The name of the left attribute to compare.
- right : str — The name of the right attribute to compare.
- default : Any | Callable[[I], Any], optional — The default value to use if either attribute is not found. If callable, it will be called with the item.
Returns
- Callable[[I], bool] — A function that takes an item and returns True if the values of the specified attributes are equal.
Examples
# Find items where attribute 'a' equals attribute 'b'
equal_items = collection.filter(collection.eq('a', 'b'))
source method Collection.ne(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]
Create a predicate function that checks if two attributes are not equal.
Parameters
- left : str — The name of the left attribute to compare.
- right : str — The name of the right attribute to compare.
- default : Any | Callable[[I], Any], optional — The default value to use if either attribute is not found. If callable, it will be called with the item.
Returns
- Callable[[I], bool] — A function that takes an item and returns True if the values of the specified attributes are not equal.
Examples
# Find items where attribute 'a' is not equal to attribute 'b'
unequal_items = collection.filter(collection.ne('a', 'b'))
source method Collection.gt(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]
Create a predicate function that checks if the left > the right.
Parameters
- left : str — The name of the left attribute to compare.
- right : str — The name of the right attribute to compare.
- default : Any | Callable[[I], Any], optional — The default value to use if either attribute is not found. If callable, it will be called with the item.
Returns
- Callable[[I], bool] — A function that takes an item and returns True if the left attribute value is greater than the right attribute value.
Examples
# Find items where attribute 'a' is greater than attribute 'b'
items = collection.filter(collection.gt('a', 'b'))
source method Collection.lt(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]
Create a predicate function that checks if the left < the right.
Parameters
- left : str — The name of the left attribute to compare.
- right : str — The name of the right attribute to compare.
- default : Any | Callable[[I], Any], optional — The default value to use if either attribute is not found. If callable, it will be called with the item.
Returns
- Callable[[I], bool] — A function that takes an item and returns True if the left attribute value is less than the right attribute value.
Examples
# Find items where attribute 'a' is less than attribute 'b'
items = collection.filter(collection.lt('a', 'b'))
source method Collection.ge(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]
Create a predicate function that checks if the left >= the right.
Parameters
- left : str — The name of the left attribute to compare.
- right : str — The name of the right attribute to compare.
- default : Any | Callable[[I], Any], optional — The default value.
Returns
- Callable[[I], bool] — A predicate function for filtering.
source method Collection.le(left: str, right: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]
Create a predicate function that checks if the left <= the right.
Parameters
- left : str — The name of the left attribute to compare.
- right : str — The name of the right attribute to compare.
- default : Any | Callable[[I], Any], optional — The default value.
Returns
- Callable[[I], bool] — A predicate function for filtering.
source method Collection.startswith(key: str, prefix: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]
Create a predicate function that checks if an attribute starts with a prefix.
Parameters
- key : str — The name of the attribute to check.
- prefix : str — The prefix to check for.
- default : Any | Callable[[I], Any], optional — The default value.
Returns
- Callable[[I], bool] — A predicate function for filtering.
source method Collection.endswith(key: str, suffix: str, *, default: Any | Callable[[I], Any] = MISSING) → Callable[[I], bool]
Create a predicate function that checks if an attribute ends with a suffix.
Parameters
- key : str — The name of the attribute to check.
- suffix : str — The suffix to check for.
- default : Any | Callable[[I], Any], optional — The default value.
Returns
- Callable[[I], bool] — A predicate function for filtering.
source method Collection.match(key: str, pattern: str | Pattern[str], *, default: Any | Callable[[I], Any] = MISSING, flags: _FlagsType = 0) → Callable[[I], bool]
Create a predicate function that checks if an attribute matches a pattern.
Parameters
- key : str — The name of the attribute to check.
- pattern : str | re.Pattern — The pattern to check for.
- default : Any | Callable[[I], Any], optional — The default value.
- flags : re.RegexFlag, optional — Flags for the regex pattern.
Returns
- Callable[[I], bool] — A predicate function for filtering.
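For example, assuming each item has a "name" attribute:
import re
# Keep items whose name looks like "exp_<number>", case-insensitively
pred = collection.match("name", r"^exp_\d+$", flags=re.IGNORECASE)
filtered = collection.filter(pred)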
source class Run[C, I = None](run_dir: Path, impl_factory: Callable[[Path], I] | Callable[[Path, C], I] | None = None)
Represent an MLflow Run in HydraFlow.
A Run contains information about the run, configuration, and implementation. The configuration type C and implementation type I are specified as type parameters.
Attributes
- info : RunInfo — Information about the run, such as run directory, run ID, and job name.
- impl_factory : Callable[[Path], I] | Callable[[Path, C], I] — Factory function to create the implementation instance.
- cfg : C — The configuration instance loaded from the Hydra configuration file.
- impl : I — The implementation instance created by the factory function.
Methods
- load — Load a Run from a run directory.
- update — Set default value(s) in the configuration if they don't already exist.
- get — Get a value from the information or configuration.
- lit — Create a Polars literal expression from a run key.
- to_frame — Convert the Run to a DataFrame.
- to_dict — Convert the Run to a dictionary.
- chdir — Change the current working directory to the artifact directory.
- path — Return the path relative to the artifact directory.
- iterdir — Iterate over the artifact directories for the run.
- glob — Glob the artifact directories for the run.
cfg — The configuration instance loaded from the Hydra configuration file.
impl — The implementation instance created by the factory function. This property dynamically examines the signature of the impl_factory using the inspect module and calls it with the appropriate arguments:
- If the factory accepts one parameter: called with just the artifacts directory
- If the factory accepts two parameters: called with the artifacts directory and the configuration instance
This allows implementation classes to be configuration-aware and utilize both the file system and configuration information.
source classmethod Run.load(run_dir: str | Path | Iterable[str | Path], impl_factory: Callable[[Path], I] | Callable[[Path, C], I] | None = None, *, n_jobs: int = 0) → Self | RunCollection[Self]
Load a Run from a run directory.
Parameters
- run_dir : str | Path | Iterable[str | Path] — The directory where the MLflow runs are stored, either as a string, a Path instance, or an iterable of them.
- impl_factory : Callable[[Path], I] | Callable[[Path, C], I] | None — A factory function that creates the implementation instance. It can accept either just the artifacts directory path, or both the path and the configuration instance. Defaults to None, in which case a function that returns None is used.
- n_jobs : int — The number of parallel jobs. If 0 (default), runs sequentially. If -1, uses all available CPU cores.
Returns
- Self | RunCollection[Self] — A single Run instance or a RunCollection of Run instances.
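A brief sketch, assuming "mlruns" is the local MLflow tracking directory and using iter_run_dirs (documented later on this page) to discover run directories:
from hydraflow import Run, iter_run_dirs
# Load every run under the tracking directory in parallel as a RunCollection;
# passing a single directory instead would return a single Run
runs = Run.load(iter_run_dirs("mlruns"), n_jobs=-1)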
source method Run.update(key: str | tuple[str, ...], value: Any | Callable[[Self], Any], *, force: bool = False) → None
Set default value(s) in the configuration if they don't already exist.
This method adds a value or multiple values to the configuration, but only if the corresponding keys don't already have values. Existing values will not be modified.
Parameters
- key : str | tuple[str, ...] — Either a string representing a single configuration path (can use dot notation like "section.subsection.param"), or a tuple of strings to set multiple related configuration values at once.
- value : Any | Callable[[Self], Any] — The value to set. This can be:
  - For string keys: Any value, or a callable that returns a value
  - For tuple keys: An iterable with the same length as the key tuple, or a callable that returns such an iterable
  - For callable values: The callable must accept a single argument of type Run (self) and return the appropriate value type
- force : bool — Whether to force the update even if the key already exists.
Raises
- TypeError — If a tuple key is provided but the value is not an iterable, or if the callable doesn't return an iterable.
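A short sketch (the keys and values here are hypothetical):
# Set a default only if the key is missing from the configuration
run.update("training.seed", 42)
# Set two related keys at once; the callable receives the Run itself
run.update(("data.root", "data.split"), lambda r: ("/datasets", "train"))
# Overwrite an existing value
run.update("training.seed", 0, force=True)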
source method Run.get(key: str, default: Any | Callable[[Self], Any] = MISSING) → Any
Get a value from the information or configuration.
Parameters
- key : str — The key to look for. Can use dot notation for nested keys in configuration. Special keys:
  - "cfg": Returns the configuration object
  - "impl": Returns the implementation object
  - "info": Returns the run information object
- default : Any | Callable[[Self], Any] — Value to return if the key is not found. If a callable, it will be called with the Run instance and the value returned will be used as the default. If not provided, AttributeError will be raised.
Returns
- Any — The value associated with the key, or the default value if the key is not found and a default is provided.
Raises
- AttributeError — If the key is not found and no default is provided.
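A short sketch (the configuration keys are hypothetical):
# Dot notation reaches into the nested Hydra configuration
lr = run.get("model.lr")
# A default (value or callable) avoids AttributeError for missing keys
batch = run.get("data.batch_size", default=32)
# Special keys return whole objects
cfg = run.get("cfg")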
source method Run.lit(key: str, default: Any | Callable[[Self], Any] = MISSING, *, dtype: PolarsDataType | None = None) → Expr
Create a Polars literal expression from a run key.
Parameters
- key : str — The key to look up in the run's configuration or info.
- default : Any | Callable[[Run], Any], optional — Default value to use if the key is missing. If a callable is provided, it will be called with the Run instance.
- dtype : PolarsDataType | None — Explicit data type for the literal expression.
Returns
- Expr — A Polars literal expression aliased to the provided key.
Raises
- AttributeError — If the key is not found and no default is provided.
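For example, to attach a run-level value as a constant column on an arbitrary Polars DataFrame (metrics_df is a hypothetical DataFrame built from the run's artifacts):
import polars as pl
# Adds a column named "model.lr" holding the run's learning rate
df = metrics_df.with_columns(run.lit("model.lr", dtype=pl.Float64))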
source method Run.to_frame(function: Callable[[Self], DataFrame], *keys: str | tuple[str, Any | Callable[[Self], Any]]) → DataFrame
Convert the Run to a DataFrame.
Parameters
- function : Callable[[Self], DataFrame] — A function that takes the Run instance and returns a DataFrame.
- *keys : str | tuple[str, Any | Callable[[Self], Any]] — The keys to add as columns to the DataFrame. If a tuple is provided, the first element is the key and the second element is the default value.
Returns
- DataFrame — A DataFrame representation of the Run.
source method Run.to_dict(flatten: bool = True) → dict[str, Any]
Convert the Run to a dictionary.
Parameters
- flatten : bool, optional — If True, flattens nested dictionaries. Defaults to True.
Returns
- dict[str, Any] — A dictionary representation of the Run's configuration.
Raises
- TypeError
source method Run.chdir(relative_dir: str = '') → Iterator[Path]
Change the current working directory to the artifact directory.
This context manager changes the current working directory to the artifact directory of the run. It ensures that the directory is changed back to the original directory after the context is exited.
Parameters
- relative_dir : str — The directory, relative to the artifact directory, to change into. Defaults to an empty string.
Yields
- Path — The artifact directory of the run.
source method Run.path(relative_path: str = '') → Path
Return the path relative to the artifact directory.
Parameters
- relative_path : str — The relative path to the artifact directory.
Returns
- Path — The path relative to the artifact directory.
source method Run.iterdir(relative_dir: str = '') → Iterator[Path]
Iterate over the artifact directories for the run.
Parameters
- relative_dir : str — The relative directory to iterate over.
Yields
- Path — The artifact directory for the run.
source method Run.glob(pattern: str, relative_dir: str = '') → Iterator[Path]
Glob the artifact directories for the run.
Parameters
- pattern : str — The pattern to glob.
- relative_dir : str — The relative directory to glob.
Yields
- Path — The existing artifact paths that match the pattern.
source class RunCollection[R: Run[Any, Any]](items: Iterable[I], get: Callable[[I, str, Any | Callable[[I], Any]], Any] | None = None)
Bases : Collection[R]
A collection of Run instances that implements the Sequence protocol.
RunCollection provides methods for filtering, sorting, grouping, and analyzing runs, as well as converting run data to various formats such as DataFrames.
Parameters
- runs : Iterable[Run] — An iterable of Run instances to include in the collection.
Methods
- preload — Pre-load configuration and implementation objects for all runs in parallel.
- update — Update configuration values for all runs in the collection.
- concat — Concatenate the results of a function applied to all runs in the collection.
- iterdir — Iterate over the artifact directories for all runs in the collection.
- glob — Glob the artifact directories for all runs in the collection.
source method RunCollection.preload(*, n_jobs: int = 0, cfg: bool = True, impl: bool = True) → Self
Pre-load configuration and implementation objects for all runs in parallel.
This method eagerly evaluates the cfg and impl properties of all runs in the collection, potentially in parallel using joblib. This can significantly improve performance for subsequent operations that access these properties, as they will be already loaded in memory.
Parameters
- n_jobs : int — Number of parallel jobs to run.
  - 0: Run sequentially (default)
  - -1: Use all available CPU cores
  - >0: Use the specified number of cores
- cfg : bool — Whether to preload the configuration objects. Defaults to True.
- impl : bool — Whether to preload the implementation objects. Defaults to True.
Returns
- Self — The same RunCollection instance with preloaded configuration and implementation objects.
Note
The preloading is done using joblib's threading backend, which is suitable for I/O-bound tasks like loading configuration files and implementation objects.
Examples
# Preload all runs sequentially
runs.preload()
# Preload using all available cores
runs.preload(n_jobs=-1)
# Preload only configurations
runs.preload(impl=False)
# Preload only implementations
runs.preload(cfg=False)
source method RunCollection.update(key: str | tuple[str, ...], value: Any | Callable[[R], Any], *, force: bool = False) → None
Update configuration values for all runs in the collection.
This method calls the update method on each run in the collection.
Parameters
- key : str | tuple[str, ...] — Either a string representing a single configuration path or a tuple of strings to set multiple configuration values.
- value : Any | Callable[[R], Any] — The value(s) to set or a callable that returns such values.
- force : bool — Whether to force updates even if the keys already exist.
source method RunCollection.concat(function: Callable[[R], DataFrame], *keys: str | tuple[str, Any | Callable[[R], Any]]) → DataFrame
Concatenate the results of a function applied to all runs in the collection.
This method applies the provided function to each run in the collection and concatenates the resulting DataFrames along the specified keys.
Parameters
- function : Callable[[R], DataFrame] — A function that takes a Run instance and returns a DataFrame.
- *keys : str | tuple[str, Any | Callable[[R], Any]] — The keys to add to the DataFrame.
Returns
- DataFrame — A DataFrame representation of the Run collection.
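For illustration (load_metrics is a hypothetical function that builds a Polars DataFrame from one run's artifacts):
# One DataFrame per run, concatenated, with extra columns identifying
# each run: its model type and, with a default of 0, its seed
df = runs.concat(load_metrics, "model.type", ("seed", 0))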
source method RunCollection.iterdir(relative_dir: str = '') → Iterator[Path]
Iterate over the artifact directories for all runs in the collection.
This method yields all files and directories in the specified relative directory for each run in the collection.
Parameters
- relative_dir : str — The relative directory within the artifacts directory to iterate over.
Yields
- Path — Each path in the specified directory for each run in the collection.
source method RunCollection.glob(pattern: str, relative_dir: str = '') → Iterator[Path]
Glob the artifact directories for all runs in the collection.
This method yields all paths matching the specified pattern in the relative directory for each run in the collection.
Parameters
- pattern : str — The glob pattern to match files or directories.
- relative_dir : str — The relative directory within the artifacts directory to search in.
Yields
- Path — Each path matching the pattern for each run in the collection.
source chdir_artifact(run: Run) → Iterator[Path]
Change the current working directory to the artifact directory of the given run.
This context manager changes the current working directory to the artifact directory of the given run. It ensures that the directory is changed back to the original directory after the context is exited.
Parameters
- run : Run | None — The run to get the artifact directory from.
source get_artifact_dir(run: Run) → Path
Retrieve the artifact directory for the given run.
This function uses MLflow to get the artifact directory for the given run.
Parameters
- run : Run | None — The run instance. Defaults to None.
Returns
- Path — The local path to the directory where the artifacts are downloaded.
Raises
- NotImplementedError
source get_experiment_names(tracking_dir: str | Path) → list[str]
Get the experiment names from the tracking directory.
Returns
- list[str] — A list of experiment names, sorted by name.
source iter_artifact_paths(tracking_dir: str | Path, artifact_path: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]
Iterate over the artifact paths in the tracking directory.
source iter_artifacts_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]
Iterate over the artifacts directories in the tracking directory.
source iter_experiment_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]
Iterate over the experiment directories in the tracking directory.
source iter_run_dirs(tracking_dir: str | Path, experiment_names: str | list[str] | Callable[[str], bool] | None = None) → Iterator[Path]
Iterate over the run directories in the tracking directory.
source log_run(run: Run) → Iterator[None]
Log the parameters from the given configuration instance.
This context manager logs the parameters from the provided configuration instance using MLflow. It also manages the MLflow run context, ensuring that artifacts are logged and the run is properly closed.
Parameters
- run : Run — The run instance.
Yields
- None
source main[C](node: C | type[C], config_name: str = 'config', *, chdir: bool = False, force_new_run: bool = False, match_overrides: bool = False, rerun_finished: bool = False, dry_run: bool = False, update: Callable[[C], C | None] | None = None)
Decorator for configuring and running MLflow experiments with Hydra.
This decorator combines Hydra configuration management with MLflow experiment tracking. It automatically handles run deduplication and configuration storage.
Parameters
- node : C | type[C] — Configuration node class or instance defining the structure of the configuration.
- config_name : str — Name of the configuration. Defaults to "config".
- chdir : bool — If True, changes working directory to the artifact directory of the run. Defaults to False.
- force_new_run : bool — If True, always creates a new MLflow run instead of reusing existing ones. Defaults to False.
- match_overrides : bool — If True, matches runs based on Hydra CLI overrides instead of full config. Defaults to False.
- rerun_finished : bool — If True, allows rerunning completed runs. Defaults to False.
- dry_run : bool — If True, starts the Hydra job but does not run the application itself. This allows users to preview the configuration and settings without executing the actual run. Defaults to False.
- update : Callable[[C], C | None] | None — A function that takes a configuration and returns a new configuration or None. The function can modify the configuration in-place and/or return it. If the function returns None, the original (potentially modified) configuration is used. Changes made by this function are saved to the configuration file. This is useful for adding derived parameters, ensuring consistency between related values, or adding runtime information to the configuration. Defaults to None.
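A minimal usage sketch, assuming the decorated function receives the active run and the populated configuration instance (the Config fields shown are hypothetical):
from dataclasses import dataclass

import hydraflow


@dataclass
class Config:
    lr: float = 1e-3
    epochs: int = 2


@hydraflow.main(Config)
def app(run, cfg: Config) -> None:
    # Parameters from cfg are logged to MLflow; training code goes here
    ...


if __name__ == "__main__":
    app()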
source start_run(*, chdir: bool = False, run_id: str | None = None, experiment_id: str | None = None, run_name: str | None = None, nested: bool = False, parent_run_id: str | None = None, tags: dict[str, str] | None = None, description: str | None = None, log_system_metrics: bool | None = None) → Iterator[Run]
Start an MLflow run and log parameters using the provided configuration instance.
This context manager starts an MLflow run and logs parameters using the specified configuration instance. It ensures that the run is properly closed after completion.
Parameters
- config : object — The configuration instance to log parameters from.
- chdir : bool — Whether to change the current working directory to the artifact directory of the current run. Defaults to False.
- run_id : str | None — The existing run ID. Defaults to None.
- experiment_id : str | None — The experiment ID. Defaults to None.
- run_name : str | None — The name of the run. Defaults to None.
- nested : bool — Whether to allow nested runs. Defaults to False.
- parent_run_id : str | None — The parent run ID. Defaults to None.
- tags : dict[str, str] | None — Tags to associate with the run. Defaults to None.
- description : str | None — A description of the run. Defaults to None.
- log_system_metrics : bool | None — Whether to log system metrics. Defaults to None.
- synchronous : bool | None — Whether to log parameters synchronously. Defaults to None.
Yields
- Run — An MLflow Run instance representing the started run.
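A minimal sketch, assuming the configuration instance is passed as the config argument documented in the parameter list above:
from dataclasses import dataclass

import mlflow
from hydraflow import start_run


@dataclass
class Config:
    lr: float = 1e-3


cfg = Config()
with start_run(cfg, run_name="baseline") as run:
    # Parameters from cfg are logged; metrics and artifacts can be
    # logged inside the context
    mlflow.log_metric("accuracy", 0.95)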