hydraflow

source package hydraflow

Integrate Hydra and MLflow to manage and track machine learning experiments.

Classes

  • RunCollection Represent a collection of MLflow runs.

Functions

  • chdir_artifact Change the current working directory to the artifact directory of the given run.

  • get_artifact_dir Retrieve the artifact directory for the given run.

  • get_artifact_path Retrieve the artifact path for the given run and path.

  • get_hydra_output_dir Retrieve the Hydra output directory for the given run.

  • iter_artifact_paths Iterate over the artifact paths in the root directory.

  • iter_artifacts_dirs Iterate over the artifacts directories in the root directory.

  • iter_experiment_dirs Iterate over the experiment directories in the root directory.

  • iter_run_dirs Iterate over the run directories in the root directory.

  • list_run_ids List all run IDs for the specified experiments.

  • list_run_paths List all run paths for the specified experiments.

  • list_runs List all runs for the specified experiments.

  • load_config Load the configuration for a given run.

  • log_run Log the parameters from the given configuration object.

  • main Decorator for configuring and running MLflow experiments with Hydra.

  • remove_run Remove the given run from the MLflow tracking server.

  • start_run Start an MLflow run and log parameters using the provided configuration object.

source dataclass RunCollection(_runs: list[Run])

Represent a collection of MLflow runs.

Provide methods to interact with the runs, such as filtering, retrieving specific runs, and accessing run information.

Key Features

  • Filtering: Easily filter runs based on various criteria.
  • Retrieval: Access specific runs by index or through methods.
  • Metadata: Access run metadata and associated information.

Attributes

  • info : RunCollectionInfo An instance of RunCollectionInfo.

  • data : RunCollectionData An instance of RunCollectionData.

Methods

  • one Get the only Run instance in the collection.

  • try_one Try to get the only Run instance in the collection.

  • first Get the first Run instance in the collection.

  • try_first Try to get the first Run instance in the collection.

  • last Get the last Run instance in the collection.

  • try_last Try to get the last Run instance in the collection.

  • filter Filter the Run instances based on the provided configuration.

  • get Retrieve a specific Run instance based on the provided configuration.

  • try_get Try to get a specific Run instance based on the provided configuration.

  • get_param_names Get the parameter names from the runs.

  • get_param_dict Get the parameter dictionary from the list of runs.

  • groupby Group runs by specified parameter names.

  • sort Sort the runs in the collection.

  • values Get the values of specified parameters from the runs.

  • sorted Sort the runs in the collection by specified parameter names.

source property RunCollection.info: RunCollectionInfo

An instance of RunCollectionInfo.

source property RunCollection.data: RunCollectionData

An instance of RunCollectionData.

source method RunCollection.one() -> Run

Get the only Run instance in the collection.

Returns

  • Run The only Run instance in the collection.

Raises

  • ValueError If the collection does not contain exactly one run.

source method RunCollection.try_one() -> Run | None

Try to get the only Run instance in the collection.

Returns

  • Run | None The only Run instance in the collection, or None if the collection does not contain exactly one run.
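
As an illustration of how one and try_one differ, the following minimal sketch reproduces the two contracts on a plain Python list. The helper functions are hypothetical stand-ins written for this example and are not hydraflow's implementation.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def one(items: list[T]) -> T:
    """Return the only item; raise ValueError unless there is exactly one."""
    if len(items) != 1:
        raise ValueError(f"Expected exactly one item, got {len(items)}")
    return items[0]


def try_one(items: list[T]) -> Optional[T]:
    """Return the only item, or None unless there is exactly one."""
    return items[0] if len(items) == 1 else None
```

With this sketch, try_one([]) and try_one(["a", "b"]) both return None, while one raises ValueError for the same inputs.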

source method RunCollection.first() -> Run

Get the first Run instance in the collection.

Returns

  • Run The first Run instance in the collection.

Raises

  • ValueError If the collection is empty.

source method RunCollection.try_first() -> Run | None

Try to get the first Run instance in the collection.

Returns

  • Run | None The first Run instance in the collection, or None if the collection is empty.

source method RunCollection.last() -> Run

Get the last Run instance in the collection.

Returns

  • Run The last Run instance in the collection.

Raises

  • ValueError If the collection is empty.

source method RunCollection.try_last() -> Run | None

Try to get the last Run instance in the collection.

Returns

  • Run | None The last Run instance in the collection, or None if the collection is empty.

source method RunCollection.filter(config: object | Callable[[Run], bool] | None = None, *, select: list[str] | None = None, overrides: list[str] | None = None, status: str | list[str] | int | list[int] | None = None, **kwargs) -> RunCollection

Filter the Run instances based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and additional key-value pairs. Both the configuration object and the keyword arguments supply key-value pairs that correspond to the parameters of the runs. Only the runs that match all of the specified parameters are included in the returned RunCollection object.

The filtering supports:

  • Exact matches for single values.
  • Membership checks for lists of values.
  • Range checks for tuples of two values (inclusive of both the lower and upper bound).
  • Callable that takes a Run object and returns a boolean value.
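
The four matching rules above can be sketched on plain dictionaries of parameters. This is an illustrative sketch only: matches and filter_params are hypothetical helpers, the runs are ordinary dicts rather than Run objects, and the code does not model how hydraflow reads parameters from MLflow.

```python
from typing import Any, Callable, Optional

Params = dict[str, Any]


def matches(value: Any, criterion: Any) -> bool:
    """Check a single parameter value against a single criterion."""
    if isinstance(criterion, list):
        return value in criterion  # membership check
    if isinstance(criterion, tuple) and len(criterion) == 2:
        low, high = criterion
        return low <= value <= high  # range check, both bounds inclusive
    return value == criterion  # exact match


def filter_params(
    runs: list[Params],
    predicate: Optional[Callable[[Params], bool]] = None,
    **criteria: Any,
) -> list[Params]:
    """Keep the entries that satisfy the predicate and every criterion."""
    selected = []
    for params in runs:
        if predicate is not None and not predicate(params):
            continue
        if all(k in params and matches(params[k], c) for k, c in criteria.items()):
            selected.append(params)
    return selected


runs = [
    {"lr": 0.1, "epochs": 10},
    {"lr": 0.01, "epochs": 20},
    {"lr": 0.001, "epochs": 30},
]
exact = filter_params(runs, epochs=10)             # exact match
members = filter_params(runs, epochs=[10, 30])     # membership in a list
in_range = filter_params(runs, epochs=(10, 20))    # inclusive range
by_callable = filter_params(runs, lambda p: p["lr"] < 0.05)  # callable
```

In the sketch, all criteria must hold at once, so combining lr=0.1 with epochs=20 selects nothing.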

Parameters

  • config : object | Callable[[Run], bool] | None The configuration object to filter the runs. This can be any object that provides key-value pairs through the iter_params function, or a callable that takes a Run object and returns a boolean value.

  • select : list[str] | None The list of parameters to select.

  • overrides : list[str] | None The list of overrides to filter the runs.

  • status : str | list[str] | int | list[int] | None The status of the runs to filter.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • RunCollection A new RunCollection object containing the runs that match the given criteria.

source method RunCollection.get(config: object | Callable[[Run], bool] | None = None, **kwargs) -> Run

Retrieve a specific Run instance based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and returns the run that matches the provided parameters. If no run matches the criteria, or if more than one run matches the criteria, a ValueError is raised.

Parameters

  • config : object | Callable[[Run], bool] | None The configuration object to identify the run. This can be any object that provides key-value pairs through the iter_params function, or a callable that takes a Run object and returns a boolean value.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • Run The Run instance that matches the provided configuration.

Raises

  • ValueError If no run matches the criteria or if more than one run matches the criteria.

See Also

filter: Perform the actual filtering logic.

source method RunCollection.try_get(config: object | Callable[[Run], bool] | None = None, **kwargs) -> Run | None

Try to get a specific Run instance based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and returns the run that matches the provided parameters. If no run matches the criteria, None is returned. If more than one run matches the criteria, a ValueError is raised.

Parameters

  • config : object | Callable[[Run], bool] | None The configuration object to identify the run. This can be any object that provides key-value pairs through the iter_params function, or a callable that takes a Run object and returns a boolean value.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • Run | None The Run instance that matches the provided configuration, or None if no runs match the criteria.

Raises

  • ValueError If more than one run matches the criteria.

See Also

filter: Perform the actual filtering logic.
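
The documented difference between get and try_get lies in the zero-match case only: both raise when several runs match. A minimal sketch of that contract on a plain list follows. The helper names and the predicate argument are assumptions made for the example, not hydraflow's code.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def get(items: list[T], predicate: Callable[[T], bool]) -> T:
    """Return the single match; raise ValueError on zero or several matches."""
    found = [item for item in items if predicate(item)]
    if len(found) != 1:
        raise ValueError(f"Expected exactly one match, got {len(found)}")
    return found[0]


def try_get(items: list[T], predicate: Callable[[T], bool]) -> Optional[T]:
    """Return the single match, None on zero matches; raise on several."""
    found = [item for item in items if predicate(item)]
    if not found:
        return None
    if len(found) > 1:
        raise ValueError(f"Expected at most one match, got {len(found)}")
    return found[0]
```

Note that this differs from try_one, which is documented to return None whenever the collection does not contain exactly one run.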

source method RunCollection.get_param_names() -> list[str]

Get the parameter names from the runs.

This method extracts the unique parameter names from the provided list of runs. It iterates through each run and collects the parameter names into a set to ensure uniqueness.

Returns

  • list[str] A list of unique parameter names.

source method RunCollection.get_param_dict(*, drop_const: bool = False) -> dict[str, list[str]]

Get the parameter dictionary from the list of runs.

This method extracts the parameter names and their corresponding values from the provided list of runs. It iterates through each run and collects the parameter values into a dictionary where the keys are parameter names and the values are lists of parameter values.

Parameters

  • drop_const : bool If True, drop the parameter values that are constant across all runs.

Returns

  • dict[str, list[str]] A dictionary where the keys are parameter names and the values are lists of parameter values.
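
The effect of drop_const can be sketched on plain dictionaries. The sketch assumes that each list holds the distinct values seen for a parameter, in sorted order; the description above does not say whether duplicates are kept or how values are ordered, so treat that as an assumption. param_dict is a hypothetical helper, not the library method.

```python
def param_dict(
    runs: list[dict[str, str]], drop_const: bool = False
) -> dict[str, list[str]]:
    """Map each parameter name to the distinct values found across runs."""
    names = sorted({name for params in runs for name in params})
    result: dict[str, list[str]] = {}
    for name in names:
        values = sorted({params[name] for params in runs if name in params})
        if drop_const and len(values) <= 1:
            continue  # the parameter never varies, so skip it
        result[name] = values
    return result


runs = [
    {"lr": "0.1", "seed": "1"},
    {"lr": "0.01", "seed": "1"},
]
all_params = param_dict(runs)
varying = param_dict(runs, drop_const=True)
```

Here seed has the same value in every run, so it appears in all_params but not in varying.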

source method RunCollection.groupby(names: str | list[str]) -> dict[str | None | tuple[str | None, ...], RunCollection]

Group runs by specified parameter names.

Group the runs in the collection based on the values of the specified parameters. Each unique combination of parameter values will form a key in the returned dictionary.

Parameters

  • names : str | list[str] The names of the parameters to group by. This can be a single parameter name or a list of names.

Returns

  • dict[str | None | tuple[str | None, ...], RunCollection] A dictionary where the keys are parameter values (or tuples of parameter values when several names are given) and the values are RunCollection objects containing the runs that match those parameter values.
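
A minimal sketch of the grouping idea on plain dictionaries is shown here. It assumes, based only on the return type in the signature, that a single name yields bare values as keys, that a list of names yields tuples, and that a missing parameter is represented by None. group_by is a hypothetical helper, not the library method.

```python
from typing import Any, Optional, Union

Params = dict[str, str]


def group_by(runs: list[Params], names: Union[str, list[str]]) -> dict[Any, list[Params]]:
    """Group parameter dictionaries by the values of the given names."""
    single = isinstance(names, str)
    keys = [names] if single else list(names)
    groups: dict[Any, list[Params]] = {}
    for params in runs:
        values: tuple[Optional[str], ...] = tuple(params.get(k) for k in keys)
        key = values[0] if single else values
        groups.setdefault(key, []).append(params)
    return groups


runs = [
    {"model": "cnn", "seed": "1"},
    {"model": "cnn", "seed": "2"},
    {"model": "mlp", "seed": "1"},
]
by_model = group_by(runs, "model")
by_model_and_seed = group_by(runs, ["model", "seed"])
```

Each unique combination of values forms one key, so by_model has two groups and by_model_and_seed has three.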

source method RunCollection.sort(*, key: Callable[[Run], Any] | None = None, reverse: bool = False) -> None

Sort the runs in the collection.

Sort the runs in the collection according to the provided key function and optional reverse flag.

Parameters

  • key : Callable[[Run], Any] | None A function that takes a run and returns a value to sort by.

  • reverse : bool If True, sort in descending order.

source method RunCollection.values(names: str | list[str]) -> list[Any]

Get the values of specified parameters from the runs.

Parameters

  • names : str | list[str] The names of the parameters whose values are returned. This can be a single parameter name or a list of names.

Returns

  • list[Any] A list of values for the specified parameters.

source method RunCollection.sorted(names: str | list[str], *, reverse: bool = False) -> RunCollection

Sort the runs in the collection by specified parameter names.

Sort the runs in the collection based on the values of the specified parameters.

Parameters

  • names : str | list[str] The names of the parameters to sort by. This can be a single parameter name or a list of names.

  • reverse : bool If True, sort in descending order.
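
The signatures suggest that sort reorders the collection itself (it returns None), whereas sorted returns a RunCollection. The sketch below illustrates sorting by several parameter names on plain dictionaries and leaves the input untouched. sorted_by is a hypothetical helper, and the assumption that names are compared in the order given is mine, not stated in the text above.

```python
from typing import Any, Union

Params = dict[str, Any]


def sorted_by(
    runs: list[Params], names: Union[str, list[str]], reverse: bool = False
) -> list[Params]:
    """Return a new list ordered by the values of the given parameter names."""
    keys = [names] if isinstance(names, str) else list(names)
    return sorted(runs, key=lambda params: tuple(params[k] for k in keys), reverse=reverse)


runs = [
    {"lr": 0.1, "seed": 2},
    {"lr": 0.01, "seed": 1},
    {"lr": 0.1, "seed": 1},
]
ascending = sorted_by(runs, ["lr", "seed"])
descending = sorted_by(runs, ["lr", "seed"], reverse=True)
```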

source chdir_artifact(run: Run | None = None) -> Iterator[Path]

Change the current working directory to the artifact directory of the given run.

This context manager changes the current working directory to the artifact directory of the given run. It ensures that the directory is changed back to the original directory after the context is exited.

Parameters

  • run : Run | None The run to get the artifact directory from.
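
The change-and-restore behaviour described above is the usual context-manager pattern around os.chdir. The following standard-library sketch shows that pattern for an arbitrary directory. The chdir helper is hypothetical and takes a path instead of a run, so it says nothing about how hydraflow resolves the artifact directory.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def chdir(target: Path) -> Iterator[Path]:
    """Switch to target and restore the previous directory on exit."""
    original = Path.cwd()
    os.chdir(target)
    try:
        yield target
    finally:
        os.chdir(original)  # runs even if the body raises


with tempfile.TemporaryDirectory() as tmp:
    expected = Path(tmp).resolve()
    before = Path.cwd()
    with chdir(expected):
        during = Path.cwd().resolve()
    after = Path.cwd()
```

The try/finally block is what guarantees that the original directory is restored even when the body of the with statement raises.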

source get_artifact_dir(run: Run | None = None) -> Path

Retrieve the artifact directory for the given run.

This function uses MLflow to get the artifact directory for the given run.

Parameters

  • run : Run | None The run object. Defaults to None.

Returns

  • Path The local path to the directory where the artifacts are downloaded.

Raises

  • NotImplementedError

source get_artifact_path(run: Run | None, path: str) -> Path

Retrieve the artifact path for the given run and path.

This function uses MLflow to get the artifact path for the given run and path.

Parameters

  • run : Run | None The run object.

  • path : str The path to the artifact.

Returns

  • Path The local path to the artifact.

source get_hydra_output_dir(run: Run | None = None) -> Path

Retrieve the Hydra output directory for the given run.

This function returns the Hydra output directory. If no run is provided, it retrieves the output directory from the current Hydra configuration. If a run is provided, it retrieves the artifact path for the run, loads the Hydra configuration from the downloaded artifacts, and returns the output directory specified in that configuration.

Parameters

  • run : Run | None The run object. Defaults to None.

Returns

  • Path The path to the Hydra output directory.

Raises

  • FileNotFoundError If the Hydra configuration file is not found in the artifacts.

source iter_artifact_paths(artifact_path: str | Path, experiment_names: str | list[str] | None = None, root_dir: str | Path | None = None) -> Iterator[Path]

Iterate over the artifact paths in the root directory.

source iter_artifacts_dirs(experiment_names: str | list[str] | None = None, root_dir: str | Path | None = None) -> Iterator[Path]

Iterate over the artifacts directories in the root directory.

source iter_experiment_dirs(experiment_names: str | list[str] | None = None, root_dir: str | Path | None = None) -> Iterator[Path]

Iterate over the experiment directories in the root directory.

source iter_run_dirs(experiment_names: str | list[str] | None = None, root_dir: str | Path | None = None) -> Iterator[Path]

Iterate over the run directories in the root directory.
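
As a rough illustration of what iterating over run directories can look like, the sketch below walks a directory tree laid out as root/experiment/run/artifacts, which is the layout used by MLflow's local file store. Whether hydraflow detects run directories this way is an assumption, and iter_run_dirs_sketch is a hypothetical helper that ignores the experiment_names argument.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_run_dirs_sketch(root_dir: Path) -> Iterator[Path]:
    """Yield root/experiment/run directories that contain 'artifacts'."""
    for experiment_dir in sorted(root_dir.iterdir()):
        if not experiment_dir.is_dir():
            continue
        for run_dir in sorted(experiment_dir.iterdir()):
            if (run_dir / "artifacts").is_dir():
                yield run_dir


# Build a throwaway directory tree to exercise the sketch.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "1" / "run_a" / "artifacts").mkdir(parents=True)
    (root / "1" / "run_b" / "artifacts").mkdir(parents=True)
    (root / "2" / "not_a_run").mkdir(parents=True)
    found = [p.relative_to(root).as_posix() for p in iter_run_dirs_sketch(root)]
```

In this example only the two directories that contain an artifacts subdirectory are yielded.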

source list_run_ids(experiment_names: str | list[str] | None = None) -> list[str]

List all run IDs for the specified experiments.

This function retrieves all runs for the given list of experiment names. If no experiment names are provided (None), the function will search all runs for all experiments except the "Default" experiment.

Parameters

  • experiment_names : str | list[str] | None The experiment name or list of experiment names to search for runs. If None is provided, the function will search all runs for all experiments except the "Default" experiment.

Returns

  • list[str] A list of run IDs for the specified experiments.

source list_run_paths(experiment_names: str | list[str] | None = None, *other: str) -> list[Path]

List all run paths for the specified experiments.

This function retrieves all run paths for the given list of experiment names. If no experiment names are provided (None), the function will search all runs for all experiments except the "Default" experiment.

Parameters

  • experiment_names : str | list[str] | None The experiment name or list of experiment names to search for runs. If None is provided, the function will search all runs for all experiments except the "Default" experiment.

  • *other : str The parts of the run directory to join.

Returns

  • list[Path] A list of run paths for the specified experiments.

source list_runs(experiment_names: str | list[str] | None = None, n_jobs: int = 0) -> RunCollection

List all runs for the specified experiments.

This function retrieves all runs for the given list of experiment names. If no experiment names are provided (None), the function will search all runs for all experiments except the "Default" experiment. The function returns the results as a RunCollection object.

Note

The returned runs are sorted by their start time in ascending order.

Parameters

  • experiment_names : str | list[str] | None The experiment name or list of experiment names to search for runs. If None is provided, the function will search all runs for all experiments except the "Default" experiment.

  • n_jobs : int The number of jobs to retrieve runs in parallel.

Returns

  • RunCollection A RunCollection object containing the runs for the specified experiments.

source load_config(run: Run) -> DictConfig

Load the configuration for a given run.

This function loads the configuration for the provided Run instance by downloading the configuration file from the MLflow artifacts and loading it using OmegaConf. It returns an empty config if .hydra/config.yaml is not found in the run's artifact directory.

Parameters

  • run : Run The Run instance for which to load the configuration.

Returns

  • DictConfig The loaded configuration as a DictConfig object. Returns an empty DictConfig if the configuration file is not found.

source log_run(config: object | None, *, synchronous: bool | None = None) -> Iterator[None]

Log the parameters from the given configuration object.

This context manager logs the parameters from the provided configuration object using MLflow. It also manages the MLflow run context, ensuring that artifacts are logged and the run is properly closed.

Parameters

  • config : object | None The configuration object to log the parameters from.

  • synchronous : bool | None Whether to log the parameters synchronously. Defaults to None.

Yields

  • None None

Example

with log_run(config):
    # Perform operations within the MLflow run context
    pass

source main(node: T | type[T], config_name: str = 'config', *, chdir: bool = False, force_new_run: bool = False, match_overrides: bool = False, rerun_finished: bool = False)

Decorator for configuring and running MLflow experiments with Hydra.

This decorator combines Hydra configuration management with MLflow experiment tracking. It automatically handles run deduplication and configuration storage.

Parameters

  • node : T | type[T] Configuration node class or instance defining the structure of the configuration.

  • config_name : str Name of the configuration. Defaults to "config".

  • chdir : bool If True, changes working directory to the artifact directory of the run. Defaults to False.

  • force_new_run : bool If True, always creates a new MLflow run instead of reusing existing ones. Defaults to False.

  • match_overrides : bool If True, matches runs based on Hydra CLI overrides instead of full config. Defaults to False.

  • rerun_finished : bool If True, allows rerunning completed runs. Defaults to False.

source remove_run(run: Run | Iterable[Run]) -> None

Remove the given run from the MLflow tracking server.

source start_run(config: object, *, chdir: bool = False, run_id: str | None = None, experiment_id: str | None = None, run_name: str | None = None, nested: bool = False, parent_run_id: str | None = None, tags: dict[str, str] | None = None, description: str | None = None, log_system_metrics: bool | None = None, synchronous: bool | None = None) -> Iterator[Run]

Start an MLflow run and log parameters using the provided configuration object.

This context manager starts an MLflow run and logs parameters using the specified configuration object. It ensures that the run is properly closed after completion.

Parameters

  • config : object The configuration object to log parameters from.

  • chdir : bool Whether to change the current working directory to the artifact directory of the current run. Defaults to False.

  • run_id : str | None The existing run ID. Defaults to None.

  • experiment_id : str | None The experiment ID. Defaults to None.

  • run_name : str | None The name of the run. Defaults to None.

  • nested : bool Whether to allow nested runs. Defaults to False.

  • parent_run_id : str | None The parent run ID. Defaults to None.

  • tags : dict[str, str] | None Tags to associate with the run. Defaults to None.

  • description : str | None A description of the run. Defaults to None.

  • log_system_metrics : bool | None Whether to log system metrics. Defaults to None.

  • synchronous : bool | None Whether to log parameters synchronously. Defaults to None.

Yields

  • Run An MLflow Run object representing the started run.