hydraflow

source package hydraflow

Integrate Hydra and MLflow to manage and track machine learning experiments.

Classes

Functions

  • chdir_artifact Change the current working directory to the artifact directory of the given run.

  • chdir_hydra_output Change the current working directory to the hydra output directory.

  • get_artifact_dir Retrieve the artifact directory for the given run.

  • get_artifact_path Retrieve the artifact path for the given run and path.

  • get_hydra_output_dir Retrieve the Hydra output directory for the given run.

  • get_overrides Retrieve the overrides for the current run.

  • list_runs List all runs for the specified experiments.

  • load_config Load the configuration for a given run.

  • load_overrides Load the overrides for a given run.

  • log_run Log the parameters from the given configuration object.

  • remove_run Remove the given run from the MLflow tracking server.

  • search_runs Search for Runs that fit the specified criteria.

  • select_config Select the given parameters from the configuration object.

  • select_overrides Select the given overrides from the configuration object.

  • set_experiment Set the experiment name and, optionally, the tracking URI.

  • start_run Start an MLflow run and log parameters using the provided configuration object.

source dataclass RunCollection(_runs: list[Run])

Represent a collection of MLflow runs.

Provide methods to interact with the runs, such as filtering, retrieving specific runs, and accessing run information.

Key Features

  • Filtering: Easily filter runs based on various criteria.
  • Retrieval: Access specific runs by index or through methods.
  • Metadata: Access run metadata and associated information.

Attributes

Methods

  • from_list Create a RunCollection instance from a list of MLflow Run instances.

  • take Take the first n runs from the collection.

  • one Get the only Run instance in the collection.

  • try_one Try to get the only Run instance in the collection.

  • first Get the first Run instance in the collection.

  • try_first Try to get the first Run instance in the collection.

  • last Get the last Run instance in the collection.

  • try_last Try to get the last Run instance in the collection.

  • filter Filter the Run instances based on the provided configuration.

  • find Find the first Run instance based on the provided configuration.

  • try_find Try to find the first Run instance based on the provided configuration.

  • find_last Find the last Run instance based on the provided configuration.

  • try_find_last Try to find the last Run instance based on the provided configuration.

  • get Retrieve a specific Run instance based on the provided configuration.

  • try_get Try to get a specific Run instance based on the provided configuration.

  • get_param_names Get the parameter names from the runs.

  • get_param_dict Get the parameter dictionary from the list of runs.

  • map Return an iterator of results by applying a function to each run.

  • map_id Return an iterator of results by applying a function to each run id.

  • map_config Return an iterator of results by applying a function to each run config.

  • map_uri Return an iterator of results by applying a function to each artifact URI.

  • map_dir Return an iterator of results by applying a function to each artifact dir.

  • groupby Group runs by specified parameter names.

  • sort Sort the runs in the collection.

  • values Get the values of specified parameters from the runs.

  • sorted Sort the runs in the collection by specified parameter names.

source classmethod RunCollection.from_list(runs: list[Run])RunCollection

Create a RunCollection instance from a list of MLflow Run instances.

source property RunCollection.info: RunCollectionInfo

An instance of RunCollectionInfo.

source property RunCollection.data: RunCollectionData

An instance of RunCollectionData.

source method RunCollection.take(n: int)RunCollection

Take the first n runs from the collection.

If n is negative, the method returns the last n runs from the collection.

Parameters

  • n : int The number of runs to take. If n is negative, the method returns the last n runs from the collection.

Returns

  • RunCollection A new RunCollection instance containing the first n runs if n is positive, or the last n runs if n is negative.
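The sign convention for n resembles ordinary list slicing. The following is a minimal sketch of that behavior on a plain list; the helper `take` and the string run names are hypothetical stand-ins, not hydraflow's implementation.

```python
def take(runs: list, n: int) -> list:
    """Return the first n items if n is non-negative, else the last |n| items."""
    if n < 0:
        return runs[n:]  # e.g. n = -2 keeps the final two items
    return runs[:n]


runs = ["run-a", "run-b", "run-c", "run-d"]
first_two = take(runs, 2)
last_two = take(runs, -2)
```

In both cases the original list is left untouched and a new list is returned, mirroring the "new RunCollection instance" wording above.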

source method RunCollection.one()Run

Get the only Run instance in the collection.

Returns

  • Run The only Run instance in the collection.

Raises

  • ValueError If the collection does not contain exactly one run.

source method RunCollection.try_one()Run | None

Try to get the only Run instance in the collection.

Returns

  • Run | None The only Run instance in the collection, or None if the collection does not contain exactly one run.
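The difference between one and try_one is only in how a collection of the wrong size is reported. A minimal sketch on plain lists, with hypothetical helper names standing in for the methods:

```python
def one(items: list):
    """Return the single item, or raise if there is not exactly one."""
    if len(items) != 1:
        raise ValueError(f"expected exactly one item, got {len(items)}")
    return items[0]


def try_one(items: list):
    """Return the single item, or None if there is not exactly one."""
    return items[0] if len(items) == 1 else None


only = one(["run-a"])
nothing = try_one([])
too_many = try_one(["run-a", "run-b"])
```

Note that, as documented, try_one returns None both for an empty collection and for one with several runs.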

source method RunCollection.first()Run

Get the first Run instance in the collection.

Returns

  • Run The first Run instance in the collection.

Raises

  • ValueError If the collection is empty.

source method RunCollection.try_first()Run | None

Try to get the first Run instance in the collection.

Returns

  • Run | None The first Run instance in the collection, or None if the collection is empty.

source method RunCollection.last()Run

Get the last Run instance in the collection.

Returns

  • Run The last Run instance in the collection.

Raises

  • ValueError If the collection is empty.

source method RunCollection.try_last()Run | None

Try to get the last Run instance in the collection.

Returns

  • Run | None The last Run instance in the collection, or None if the collection is empty.

source method RunCollection.filter(config: object | None = None, *, override: bool = False, select: list[str] | None = None, status: str | list[str] | int | list[int] | None = None, **kwargs)RunCollection

Filter the Run instances based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and additional key-value pairs. The keys in both should correspond to the parameters of the runs. Only the runs that match all the specified parameters are included in the returned RunCollection object.

The filtering supports

  • Exact matches for single values.
  • Membership checks for lists of values.
  • Range checks for tuples of two values (inclusive of both the lower and upper bound).
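The three matching rules can be illustrated without MLflow. This is a sketch under stated assumptions: runs are modeled as plain dicts of numeric and string values, and `matches` and `filter_runs` are hypothetical helpers, not part of hydraflow.

```python
from typing import Any


def matches(value: Any, criterion: Any) -> bool:
    """Apply the three documented rules to one parameter value."""
    if isinstance(criterion, list):
        return value in criterion  # membership check
    if isinstance(criterion, tuple) and len(criterion) == 2:
        low, high = criterion
        return low <= value <= high  # range check, inclusive on both ends
    return value == criterion  # exact match


def filter_runs(runs: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Keep only the runs that satisfy every criterion."""
    return [
        run
        for run in runs
        if all(name in run and matches(run[name], c) for name, c in criteria.items())
    ]


runs = [
    {"lr": 0.1, "opt": "adam"},
    {"lr": 0.01, "opt": "sgd"},
    {"lr": 0.001, "opt": "adam"},
]
exact = filter_runs(runs, opt="adam")
member = filter_runs(runs, opt=["sgd", "rmsprop"])
in_range = filter_runs(runs, lr=(0.01, 0.1))
```

Because the range check is inclusive, the run with lr equal to 0.1 and the run with lr equal to 0.01 both pass the last filter.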

Parameters

  • config : object | None The configuration object to filter the runs. This can be any object that provides key-value pairs through the iter_params function.

  • override : bool If True, override the configuration object with the provided key-value pairs.

  • select : list[str] | None The list of parameters to select.

  • status : str | list[str] | int | list[int] | None The status of the runs to filter.

  • **kwargs Additional key-value pairs to filter the runs.

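Returns

  • RunCollection A new RunCollection containing only the runs that match all the specified parameters.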

source method RunCollection.find(config: object | None = None, **kwargs)Run

Find the first Run instance based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and returns the first run that matches the provided parameters. If no run matches the criteria, a ValueError is raised.

Parameters

  • config : object | None The configuration object to identify the run.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • Run The first Run instance that matches the provided configuration.

Raises

  • ValueError If no run matches the criteria.

See Also

filter: Perform the actual filtering logic.

source method RunCollection.try_find(config: object | None = None, **kwargs)Run | None

Try to find the first Run instance based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and returns the first run that matches the provided parameters. If no run matches the criteria, None is returned.

Parameters

  • config : object | None The configuration object to identify the run.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • Run | None The first Run instance that matches the provided configuration, or None if no runs match the criteria.

See Also

filter: Perform the actual filtering logic.

source method RunCollection.find_last(config: object | None = None, **kwargs)Run

Find the last Run instance based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and returns the last run that matches the provided parameters. If no run matches the criteria, a ValueError is raised.

Parameters

  • config : object | None The configuration object to identify the run.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • Run The last Run instance that matches the provided configuration.

Raises

  • ValueError If no run matches the criteria.

See Also

filter: Perform the actual filtering logic.

source method RunCollection.try_find_last(config: object | None = None, **kwargs)Run | None

Try to find the last Run instance based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and returns the last run that matches the provided parameters. If no run matches the criteria, None is returned.

Parameters

  • config : object | None The configuration object to identify the run.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • Run | None The last Run instance that matches the provided configuration, or None if no runs match the criteria.

See Also

filter: Perform the actual filtering logic.

source method RunCollection.get(config: object | None = None, **kwargs)Run

Retrieve a specific Run instance based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and returns the run that matches the provided parameters. If no run matches the criteria, or if more than one run matches the criteria, a ValueError is raised.

Parameters

  • config : object | None The configuration object to identify the run.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • Run The Run instance that matches the provided configuration.

Raises

  • ValueError If no run matches the criteria or if more than one run matches the criteria.

See Also

filter: Perform the actual filtering logic.

source method RunCollection.try_get(config: object | None = None, **kwargs)Run | None

Try to get a specific Run instance based on the provided configuration.

This method filters the runs in the collection according to the specified configuration object and returns the run that matches the provided parameters. If no run matches the criteria, None is returned. If more than one run matches the criteria, a ValueError is raised.

Parameters

  • config : object | None The configuration object to identify the run.

  • **kwargs Additional key-value pairs to filter the runs.

Returns

  • Run | None The Run instance that matches the provided configuration, or None if no runs match the criteria.

Raises

  • ValueError If more than one run matches the criteria.

See Also

filter: Perform the actual filtering logic.
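The find, get, and try_get methods differ in how they treat zero or multiple matches. The sketch below captures the documented outcomes, assuming the filtering step has already produced a plain list of matches; the function names are hypothetical stand-ins for the methods.

```python
def find(matches: list):
    """First match; raise if there is none."""
    if not matches:
        raise ValueError("no run matches the criteria")
    return matches[0]


def get(matches: list):
    """The unique match; raise if there are zero or several."""
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, got {len(matches)}")
    return matches[0]


def try_get(matches: list):
    """The unique match, None if there is none; still raise if several."""
    if not matches:
        return None
    return get(matches)


none, single, many = [], ["run-a"], ["run-a", "run-b"]
first_of_many = find(many)
only = get(single)
missing = try_get(none)
try:
    try_get(many)
    ambiguous = False
except ValueError:
    ambiguous = True
```

The try_ variant only softens the "no match" case; an ambiguous match is still treated as an error.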

source method RunCollection.get_param_names()list[str]

Get the parameter names from the runs.

This method extracts the unique parameter names from the provided list of runs. It iterates through each run and collects the parameter names into a set to ensure uniqueness.

Returns

  • list[str] A list of unique parameter names.

source method RunCollection.get_param_dict(*, drop_const: bool = False)dict[str, list[str]]

Get the parameter dictionary from the list of runs.

This method extracts the parameter names and their corresponding values from the provided list of runs. It iterates through each run and collects the parameter values into a dictionary where the keys are parameter names and the values are lists of parameter values.

Parameters

  • drop_const : bool If True, drop the parameter values that are constant across all runs.

Returns

  • dict[str, list[str]] A dictionary where the keys are parameter names and the values are lists of parameter values.
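A rough sketch of the effect of drop_const, using plain dicts of string parameters in place of runs. It assumes that each list holds the sorted unique values of a parameter; the reference text does not say whether duplicates are kept, so treat that detail as an assumption.

```python
def get_param_dict(
    runs: list[dict[str, str]], *, drop_const: bool = False
) -> dict[str, list[str]]:
    """Collect parameter values per name, optionally dropping constant ones."""
    names = sorted({name for run in runs for name in run})
    params: dict[str, list[str]] = {}
    for name in names:
        # Assumption: values are de-duplicated and sorted.
        values = sorted({run[name] for run in runs if name in run})
        if drop_const and len(values) <= 1:
            continue  # the same value in every run carries no information
        params[name] = values
    return params


runs = [{"lr": "0.1", "seed": "0"}, {"lr": "0.01", "seed": "0"}]
all_params = get_param_dict(runs)
varying = get_param_dict(runs, drop_const=True)
```

With drop_const enabled, only the parameters that actually vary between runs remain, which is convenient when comparing the runs of a sweep.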

source method RunCollection.map(func: Callable[Concatenate[Run, P], T], *args: P.args, **kwargs: P.kwargs)Iterator[T]

Return an iterator of results by applying a function to each run.

This method iterates over each run in the collection and applies the provided function to it, along with any additional arguments and keyword arguments.

Parameters

  • func : Callable[Concatenate[Run, P], T] A function that takes a run and additional arguments and returns a result.

  • *args : P.args Additional arguments to pass to the function.

  • **kwargs : P.kwargs Additional keyword arguments to pass to the function.

Yields

  • T Results obtained by applying the function to each run in the collection.
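The way extra positional and keyword arguments are forwarded can be sketched with a small generator. Here `map_runs` and `describe` are hypothetical, and runs are modeled as plain dicts rather than MLflow Run objects.

```python
from collections.abc import Callable, Iterator
from typing import Any


def map_runs(
    runs: list[dict[str, Any]], func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Iterator[Any]:
    """Lazily apply func(run, *args, **kwargs) to each run."""
    for run in runs:
        yield func(run, *args, **kwargs)


def describe(run: dict[str, Any], prefix: str, *, sep: str = ": ") -> str:
    """Example callback: the run comes first, extra arguments follow."""
    return f"{prefix}{sep}{run['id']}"


runs = [{"id": "a"}, {"id": "b"}]
results = list(map_runs(runs, describe, "run", sep="="))
```

Because the result is an iterator, nothing is computed until it is consumed, for example with list(...) as shown.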

source method RunCollection.map_id(func: Callable[Concatenate[str, P], T], *args: P.args, **kwargs: P.kwargs)Iterator[T]

Return an iterator of results by applying a function to each run id.

Parameters

  • func : Callable[Concatenate[str, P], T] A function that takes a run id and additional arguments and returns a result.

  • *args : P.args Additional arguments to pass to the function.

  • **kwargs : P.kwargs Additional keyword arguments to pass to the function.

Yields

  • T Results obtained by applying the function to each run id in the collection.

source method RunCollection.map_config(func: Callable[Concatenate[DictConfig, P], T], *args: P.args, **kwargs: P.kwargs)Iterator[T]

Return an iterator of results by applying a function to each run config.

Parameters

  • func : Callable[Concatenate[DictConfig, P], T] A function that takes a run configuration and additional arguments and returns a result.

  • *args : P.args Additional arguments to pass to the function.

  • **kwargs : P.kwargs Additional keyword arguments to pass to the function.

Yields

  • T Results obtained by applying the function to each run configuration in the collection.

source method RunCollection.map_uri(func: Callable[Concatenate[str | None, P], T], *args: P.args, **kwargs: P.kwargs)Iterator[T]

Return an iterator of results by applying a function to each artifact URI.

Iterate over each run in the collection, retrieve the artifact URI, and apply the provided function to it. If a run does not have an artifact URI, None is passed to the function.

Parameters

  • func : Callable[Concatenate[str | None, P], T] A function that takes an artifact URI (string or None) and additional arguments and returns a result.

  • *args : P.args Additional arguments to pass to the function.

  • **kwargs : P.kwargs Additional keyword arguments to pass to the function.

Yields

  • T Results obtained by applying the function to each artifact URI in the collection.

source method RunCollection.map_dir(func: Callable[Concatenate[Path, P], T], *args: P.args, **kwargs: P.kwargs)Iterator[T]

Return an iterator of results by applying a function to each artifact dir.

Iterate over each run in the collection, download the artifact directory, and apply the provided function to the directory path.

Parameters

  • func : Callable[Concatenate[Path, P], T] A function that takes an artifact directory path (a Path) and additional arguments and returns a result.

  • *args : P.args Additional arguments to pass to the function.

  • **kwargs : P.kwargs Additional keyword arguments to pass to the function.

Yields

  • T Results obtained by applying the function to each artifact directory in the collection.

source method RunCollection.groupby(names: str | list[str])dict[str | None | tuple[str | None, ...], RunCollection]

Group runs by specified parameter names.

Group the runs in the collection based on the values of the specified parameters. Each unique combination of parameter values will form a key in the returned dictionary.

Parameters

  • names : str | list[str] The names of the parameters to group by. This can be a single parameter name or a list of names.

Returns

  • dict[str | None | tuple[str | None, ...], RunCollection] A dictionary where the keys are parameter values (or tuples of parameter values) and the values are RunCollection objects containing the runs that match those parameter values.

source method RunCollection.sort(*, key: Callable[[Run], Any] | None = None, reverse: bool = False)None

Sort the runs in the collection.

Sort the runs in the collection according to the provided key function and optional reverse flag.

Parameters

  • key : Callable[[Run], Any] | None A function that takes a run and returns a value to sort by.

  • reverse : bool If True, sort in descending order.

source method RunCollection.values(names: str | list[str])list[Any]

Get the values of specified parameters from the runs.

Parameters

  • names : str | list[str] The names of the parameters whose values are retrieved. This can be a single parameter name or a list of names.

Returns

  • list[Any] A list of values for the specified parameters.

source method RunCollection.sorted(names: str | list[str], *, reverse: bool = False)RunCollection

Sort the runs in the collection by specified parameter names.

Sort the runs in the collection based on the values of the specified parameters.

Parameters

  • names : str | list[str] The names of the parameters to sort by. This can be a single parameter name or a list of names.

  • reverse : bool If True, sort in descending order.

source chdir_artifact(run: Run, artifact_path: str | None = None)Iterator[Path]

Change the current working directory to the artifact directory of the given run.

This context manager changes the current working directory to the artifact directory of the given run. It ensures that the directory is changed back to the original directory after the context is exited.

Parameters

  • run : Run The run to get the artifact directory from.

  • artifact_path : str | None The artifact path.

source chdir_hydra_output()Iterator[Path]

Change the current working directory to the hydra output directory.

This context manager changes the current working directory to the hydra output directory. It ensures that the directory is changed back to the original directory after the context is exited.

source get_artifact_dir(run: Run | None = None)Path

Retrieve the artifact directory for the given run.

This function uses MLflow to get the artifact directory for the given run.

Parameters

  • run : Run | None The run object. Defaults to None.

Returns

  • Path The local path to the directory where the artifacts are downloaded.

Raises

  • NotImplementedError

source get_artifact_path(run: Run | None, path: str)Path

Retrieve the artifact path for the given run and path.

This function uses MLflow to get the artifact path for the given run and path.

Parameters

  • run : Run | None The run object. Defaults to None.

  • path : str The path to the artifact.

Returns

  • Path The local path to the artifact.

source get_hydra_output_dir(run: Run | None = None)Path

Retrieve the Hydra output directory for the given run.

This function returns the Hydra output directory. If no run is provided, it retrieves the output directory from the current Hydra configuration. If a run is provided, it retrieves the artifact path for the run, loads the Hydra configuration from the downloaded artifacts, and returns the output directory specified in that configuration.

Parameters

  • run : Run | None The run object. Defaults to None.

Returns

  • Path The path to the Hydra output directory.

Raises

  • FileNotFoundError If the Hydra configuration file is not found in the artifacts.

source get_overrides()list[str]

Retrieve the overrides for the current run.

source list_runs(experiment_names: str | list[str] | None = None, n_jobs: int = 0, status: str | list[str] | int | list[int] | None = None)RunCollection

List all runs for the specified experiments.

This function retrieves all runs for the given list of experiment names. If no experiment names are provided (None), it defaults to searching all runs for the currently active experiment. If an empty list is provided, the function will search all runs for all experiments except the "Default" experiment. The function returns the results as a RunCollection object.

Note

The returned runs are sorted by their start time in ascending order.

Parameters

  • experiment_names : str | list[str] | None The experiment name or list of experiment names to search for runs. If None, the function searches the currently active experiment. If an empty list is provided, it searches all experiments except the "Default" experiment.

  • n_jobs : int The number of jobs to run in parallel. If 0, the function will search runs sequentially.

  • status : str | list[str] | int | list[int] | None The status of the runs to filter.

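Returns

  • RunCollection A RunCollection containing the runs found, sorted by start time in ascending order.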

source load_config(run: Run)DictConfig

Load the configuration for a given run.

This function loads the configuration for the provided Run instance by downloading the configuration file from the MLflow artifacts and loading it using OmegaConf. It returns an empty config if .hydra/config.yaml is not found in the run's artifact directory.

Parameters

  • run : Run The Run instance for which to load the configuration.

Returns

  • DictConfig The loaded configuration as a DictConfig object. Returns an empty DictConfig if the configuration file is not found.

source load_overrides(run: Run)list[str]

Load the overrides for a given run.

This function loads the overrides for the provided Run instance by downloading the overrides file from the MLflow artifacts and loading it using OmegaConf. It returns an empty list if .hydra/overrides.yaml is not found in the run's artifact directory.

Parameters

  • run : Run The Run instance for which to load the overrides.

Returns

  • list[str] The loaded overrides as a list of strings. Returns an empty list if the overrides file is not found.

source log_run(config: object | None, *, synchronous: bool | None = None)Iterator[None]

Log the parameters from the given configuration object.

This context manager logs the parameters from the provided configuration object using MLflow. It also manages the MLflow run context, ensuring that artifacts are logged and the run is properly closed.

Parameters

  • config : object | None The configuration object to log the parameters from.

  • synchronous : bool | None Whether to log the parameters synchronously. Defaults to None.

Yields

  • None None

Example

with log_run(config):
    # Perform operations within the MLflow run context
    pass

source remove_run(run: Run | Iterable[Run])None

Remove the given run from the MLflow tracking server.

source search_runs(*, experiment_ids: list[str] | None = None, filter_string: str = '', run_view_type: int = ViewType.ACTIVE_ONLY, max_results: int = SEARCH_MAX_RESULTS_PANDAS, order_by: list[str] | None = None, search_all_experiments: bool = False, experiment_names: list[str] | None = None)RunCollection

Search for Runs that fit the specified criteria.

This function wraps the mlflow.search_runs function and returns the results as a RunCollection object. It allows for flexible searching of MLflow runs based on various criteria.

Note

The returned runs are sorted by their start time in ascending order.

Parameters

  • experiment_ids : list[str] | None List of experiment IDs. Search can work with experiment IDs or experiment names, but not both in the same call. Values other than None or [] will result in error if experiment_names is also not None or []. None will default to the active experiment if experiment_names is None or [].

  • filter_string : str Filter query string, defaults to searching all runs.

  • run_view_type : int One of the enum values ACTIVE_ONLY, DELETED_ONLY, or ALL defined in mlflow.entities.ViewType.

  • max_results : int The maximum number of runs to put in the dataframe. Default is 100,000 to avoid causing out-of-memory issues on the user's machine.

  • order_by : list[str] | None List of columns to order by (e.g., "metrics.rmse"). The order_by column can contain an optional DESC or ASC value. The default is ASC. The default ordering is to sort by start_time DESC, then run_id.

  • search_all_experiments : bool Boolean specifying whether all experiments should be searched. Only honored if experiment_ids is [] or None.

  • experiment_names : list[str] | None List of experiment names. Search can work with experiment IDs or experiment names, but not both in the same call. Values other than None or [] will result in error if experiment_ids is also not None or []. None will default to the active experiment if experiment_ids is None or [].

Returns

  • RunCollection A RunCollection containing the runs that fit the specified criteria.

source select_config(config: object, names: list[str])dict[str, Any]

Select the given parameters from the configuration object.

This function selects the given parameters from the configuration object and returns a new dictionary containing only the selected parameters.

Parameters

  • config : object The configuration object to select parameters from.

  • names : list[str] The names of the parameters to select.

Returns

  • dict[str, Any] A new dictionary containing only the selected parameters.

source select_overrides(config: object)dict[str, Any]

Select the given overrides from the configuration object.

source set_experiment(prefix: str = '', suffix: str = '', uri: str | Path | None = None, name: str | None = None)Experiment

Set the experiment name and, optionally, the tracking URI.

This function sets the experiment name by combining the given prefix, the job name from HydraConfig, and the given suffix. Optionally, it can also set the tracking URI.

Parameters

  • prefix : str The prefix to prepend to the experiment name.

  • suffix : str The suffix to append to the experiment name.

  • uri : str | Path | None The tracking URI to use. Defaults to None.

  • name : str | None The name of the experiment. Defaults to None.

Returns

  • Experiment An instance of mlflow.entities.Experiment representing the new active experiment.
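The naming rule can be shown in isolation. This sketch assumes that the prefix, the Hydra job name, and the suffix are joined by direct concatenation with no separator; `build_experiment_name` is a hypothetical helper, and the job name is passed in explicitly rather than read from HydraConfig.

```python
def build_experiment_name(job_name: str, prefix: str = "", suffix: str = "") -> str:
    """Combine prefix, job name, and suffix into an experiment name."""
    # Assumption: plain concatenation; any separator belongs in prefix/suffix.
    return f"{prefix}{job_name}{suffix}"


name = build_experiment_name("train", prefix="dev-", suffix="-v2")
default_name = build_experiment_name("train")
```

With empty prefix and suffix, the experiment name is simply the job name.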

source start_run(config: object, *, run_id: str | None = None, experiment_id: str | None = None, run_name: str | None = None, nested: bool = False, parent_run_id: str | None = None, tags: dict[str, str] | None = None, description: str | None = None, log_system_metrics: bool | None = None, synchronous: bool | None = None)Iterator[Run]

Start an MLflow run and log parameters using the provided configuration object.

This context manager starts an MLflow run and logs parameters using the specified configuration object. It ensures that the run is properly closed after completion.

Parameters

  • config : object The configuration object to log parameters from.

  • run_id : str | None The existing run ID. Defaults to None.

  • experiment_id : str | None The experiment ID. Defaults to None.

  • run_name : str | None The name of the run. Defaults to None.

  • nested : bool Whether to allow nested runs. Defaults to False.

  • parent_run_id : str | None The parent run ID. Defaults to None.

  • tags : dict[str, str] | None Tags to associate with the run. Defaults to None.

  • description : str | None A description of the run. Defaults to None.

  • log_system_metrics : bool | None Whether to log system metrics. Defaults to None.

  • synchronous : bool | None Whether to log parameters synchronously. Defaults to None.

Yields

  • Run An MLflow Run object representing the started run.

Example

with start_run(config) as run:
    # Perform operations within the MLflow run context
    pass

See Also

  • mlflow.start_run: The MLflow function to start a run directly.
  • log_run: A context manager to log parameters and manage the MLflow run context.