Analyzing Experiment Results

This tutorial demonstrates how to use HydraFlow's powerful analysis capabilities to work with your experiment results.

Prerequisites

Before you begin this tutorial, you should:

  1. Understand the basic structure of a HydraFlow application (from the Basic Application tutorial)
  2. Be familiar with the concept of job definitions (from the Automated Workflows tutorial)

Project Setup

We'll start by running several experiments to analyze, executing the three jobs defined in the Automated Workflows tutorial:

$ hydraflow run job_sequential
$ hydraflow run job_parallel
$ hydraflow run job_submit
2025/04/19 02:40:35 INFO mlflow.tracking.fluent: Experiment with name 'job_sequential' does not exist. Creating a new experiment.
[2025-04-19 02:40:37,403][HYDRA] Launching 3 jobs locally                       
[2025-04-19 02:40:37,403][HYDRA]        #0 : width=100 height=100               
[2025-04-19 02:40:37,525][__main__][INFO] - d86e857a39fc4cdd9fd0ce08e42df95c    
[2025-04-19 02:40:37,525][__main__][INFO] - {'width': 100, 'height': 100}       
[2025-04-19 02:40:37,527][HYDRA]        #1 : width=100 height=200               
[2025-04-19 02:40:37,608][__main__][INFO] - 5dc783cd64ff414f9cd6e6fdf3dcd943    
[2025-04-19 02:40:37,608][__main__][INFO] - {'width': 100, 'height': 200}       
[2025-04-19 02:40:37,610][HYDRA]        #2 : width=100 height=300               
[2025-04-19 02:40:37,689][__main__][INFO] - 357f2fa01b5141a0b3193fefd958b7f7    
[2025-04-19 02:40:37,689][__main__][INFO] - {'width': 100, 'height': 300}       
[2025-04-19 02:40:40,134][HYDRA] Launching 3 jobs locally                       
[2025-04-19 02:40:40,135][HYDRA]        #0 : width=300 height=100               
[2025-04-19 02:40:40,259][__main__][INFO] - 39aefd01219d4c5ca50093b03a9b85b7    
[2025-04-19 02:40:40,259][__main__][INFO] - {'width': 300, 'height': 100}       
[2025-04-19 02:40:40,262][HYDRA]        #1 : width=300 height=200               
[2025-04-19 02:40:40,343][__main__][INFO] - ccd8d9e908e44dc5bbc41ae5bd563766    
[2025-04-19 02:40:40,344][__main__][INFO] - {'width': 300, 'height': 200}       
[2025-04-19 02:40:40,346][HYDRA]        #2 : width=300 height=300               
[2025-04-19 02:40:40,428][__main__][INFO] - d053d24b37fb4c6da57f605b0efa2954    
[2025-04-19 02:40:40,428][__main__][INFO] - {'width': 300, 'height': 300}       
  0:00:05 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 2/2 100%
2025/04/19 02:40:43 INFO mlflow.tracking.fluent: Experiment with name 'job_parallel' does not exist. Creating a new experiment.
[2025-04-19 02:40:45,229][HYDRA] Joblib.Parallel(n_jobs=3,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2025-04-19 02:40:45,229][HYDRA] Launching jobs, sweep output dir : multirun/01JS5YPW079BCQ4VFZVCQSBC51
[2025-04-19 02:40:45,229][HYDRA]        #0 : width=200 height=100               
[2025-04-19 02:40:45,229][HYDRA]        #1 : width=200 height=200               
[2025-04-19 02:40:45,229][HYDRA]        #2 : width=200 height=300               
[2025-04-19 02:40:47,404][__main__][INFO] - 761f0adf37ad4fa0bd1c46dc4cbcb544    
[2025-04-19 02:40:47,405][__main__][INFO] - {'width': 200, 'height': 100}       
[2025-04-19 02:40:48,112][__main__][INFO] - aa2253ad414a4cee9df60994be27b9d7    
[2025-04-19 02:40:48,113][__main__][INFO] - {'width': 200, 'height': 200}       
[2025-04-19 02:40:48,139][__main__][INFO] - b6ee737229834d01900e6bfd34ae2f45    
[2025-04-19 02:40:48,139][__main__][INFO] - {'width': 200, 'height': 300}       
... (benign "ChildProcessError: No child processes" tracebacks from Python's multiprocessing resource tracker omitted) ...
[2025-04-19 02:40:51,191][HYDRA] Joblib.Parallel(n_jobs=3,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2025-04-19 02:40:51,191][HYDRA] Launching jobs, sweep output dir : multirun/01JS5YPW07S12C8P106QWQQDVN
[2025-04-19 02:40:51,191][HYDRA]        #0 : width=400 height=100               
[2025-04-19 02:40:51,191][HYDRA]        #1 : width=400 height=200               
[2025-04-19 02:40:51,191][HYDRA]        #2 : width=400 height=300               
[2025-04-19 02:40:53,464][__main__][INFO] - 62866281207a418999741386b3fd58a2    
[2025-04-19 02:40:53,464][__main__][INFO] - {'width': 400, 'height': 300}       
[2025-04-19 02:40:54,041][__main__][INFO] - 93a459ad7f5743f694d7904b5644a942    
[2025-04-19 02:40:54,042][__main__][INFO] - {'width': 400, 'height': 200}       
[2025-04-19 02:40:54,122][__main__][INFO] - 1ab44ca1852b42a59e391a8b60db2fa3    
[2025-04-19 02:40:54,122][__main__][INFO] - {'width': 400, 'height': 100}       
... (resource tracker tracebacks omitted, as above) ...
  0:00:11 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 2/2 100%
2025/04/19 02:40:57 INFO mlflow.tracking.fluent: Experiment with name 'job_submit' does not exist. Creating a new experiment.
[2025-04-19 02:40:59,415][HYDRA] Launching 2 jobs locally
[2025-04-19 02:40:59,415][HYDRA]    #0 : width=250 height=150
[2025-04-19 02:40:59,537][__main__][INFO] - 559e7d06eaad4e899d8c32c9961dc2ec
[2025-04-19 02:40:59,538][__main__][INFO] - {'width': 250, 'height': 150}
[2025-04-19 02:40:59,540][HYDRA]    #1 : width=250 height=250
[2025-04-19 02:40:59,624][__main__][INFO] - 39293134219d4ed4ac7310962c39a0a0
[2025-04-19 02:40:59,624][__main__][INFO] - {'width': 250, 'height': 250}
[2025-04-19 02:41:02,055][HYDRA] Launching 2 jobs locally
[2025-04-19 02:41:02,055][HYDRA]    #0 : width=350 height=150
[2025-04-19 02:41:02,177][__main__][INFO] - 5de14c1276014df9bc97851f7b325904
[2025-04-19 02:41:02,177][__main__][INFO] - {'width': 350, 'height': 150}
[2025-04-19 02:41:02,179][HYDRA]    #1 : width=350 height=250
[2025-04-19 02:41:02,262][__main__][INFO] - c69985df3f494423a3a0af92839cb91d
[2025-04-19 02:41:02,262][__main__][INFO] - {'width': 350, 'height': 250}
['/home/runner/work/hydraflow/hydraflow/.venv/bin/python', 'example.py', '--multirun', 'width=250', 'height=150,250', 'hydra.job.name=job_submit', 'hydra.sweep.dir=multirun/01JS5YQ9VWJVD5GYDZT4GY787N']
['/home/runner/work/hydraflow/hydraflow/.venv/bin/python', 'example.py', '--multirun', 'width=350', 'height=150,250', 'hydra.job.name=job_submit', 'hydra.sweep.dir=multirun/01JS5YQ9VWPMXAF1AN0JFWQ7JN']

After running these commands, our project structure looks like this:

.
├── mlruns
│   ├── 0
│   │   └── meta.yaml
│   ├── 598909990868537605
│   │   ├── 1ab44ca1852b42a59e391a8b60db2fa3
│   │   ├── 62866281207a418999741386b3fd58a2
│   │   ├── 761f0adf37ad4fa0bd1c46dc4cbcb544
│   │   ├── 93a459ad7f5743f694d7904b5644a942
│   │   ├── aa2253ad414a4cee9df60994be27b9d7
│   │   ├── b6ee737229834d01900e6bfd34ae2f45
│   │   └── meta.yaml
│   ├── 847638656450324050
│   │   ├── 39293134219d4ed4ac7310962c39a0a0
│   │   ├── 559e7d06eaad4e899d8c32c9961dc2ec
│   │   ├── 5de14c1276014df9bc97851f7b325904
│   │   ├── c69985df3f494423a3a0af92839cb91d
│   │   └── meta.yaml
│   └── 919969478854325962
│       ├── 357f2fa01b5141a0b3193fefd958b7f7
│       ├── 39aefd01219d4c5ca50093b03a9b85b7
│       ├── 5dc783cd64ff414f9cd6e6fdf3dcd943
│       ├── ccd8d9e908e44dc5bbc41ae5bd563766
│       ├── d053d24b37fb4c6da57f605b0efa2954
│       ├── d86e857a39fc4cdd9fd0ce08e42df95c
│       └── meta.yaml
├── example.py
├── hydraflow.yaml
└── submit.py

The mlruns directory contains all our experiment data. Let's explore how to access and analyze this data using HydraFlow's API.

Discovering Runs

Finding Run Directories

HydraFlow provides the iter_run_dirs function to discover runs in your MLflow tracking directory:

>>> from hydraflow import iter_run_dirs
>>> run_dirs = list(iter_run_dirs("mlruns"))
>>> print(len(run_dirs))
>>> for run_dir in run_dirs[:4]:
...     print(run_dir)
16
mlruns/847638656450324050/c69985df3f494423a3a0af92839cb91d
mlruns/847638656450324050/559e7d06eaad4e899d8c32c9961dc2ec
mlruns/847638656450324050/39293134219d4ed4ac7310962c39a0a0
mlruns/847638656450324050/5de14c1276014df9bc97851f7b325904

This function finds all run directories in your MLflow tracking directory, making it easy to collect runs for analysis.

Filtering by Experiment Name

You can filter runs by experiment name to focus on specific experiments:

>>> print(len(list(iter_run_dirs("mlruns", "job_sequential"))))
>>> names = ["job_sequential", "job_parallel"]
>>> print(len(list(iter_run_dirs("mlruns", names))))
>>> print(len(list(iter_run_dirs("mlruns", "job_*"))))
6
12
16

As shown above, you can:

  • Filter by a single experiment name
  • Provide a list of experiment names
  • Use pattern matching with wildcards

Working with Individual Runs

Loading a Run

The Run class represents a single experiment run in HydraFlow:

>>> from hydraflow import Run
>>> run_dirs = iter_run_dirs("mlruns")
>>> run_dir = next(run_dirs)  # run_dirs is an iterator
>>> run = Run(run_dir)
>>> print(run)
>>> print(type(run))
Run('c69985df3f494423a3a0af92839cb91d')
<class 'hydraflow.core.run.Run'>

You can also use the load class method, which accepts both string paths and Path objects:

>>> run = Run.load(str(run_dir))
>>> print(run)
Run('c69985df3f494423a3a0af92839cb91d')

Accessing Run Information

Each Run instance provides access to run information and configuration:

>>> print(run.info.run_dir)
>>> print(run.info.run_id)
>>> print(run.info.job_name)  # Hydra job name = MLflow experiment name
mlruns/847638656450324050/c69985df3f494423a3a0af92839cb91d
c69985df3f494423a3a0af92839cb91d
job_submit

The configuration is available through the cfg attribute:

>>> print(run.cfg)
{'width': 350, 'height': 250}

Type-Safe Configuration Access

For better IDE integration and type checking, you can specify the configuration type:

from dataclasses import dataclass

@dataclass
class Config:
    width: int = 1024
    height: int = 768

>>> run = Run[Config](run_dir)
>>> print(run)
Run('c69985df3f494423a3a0af92839cb91d')

When you use Run[Config], your IDE will recognize run.cfg as having the specified type, enabling autocompletion and type checking.
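
For example, a helper annotated with Run[Config] gets full static checking on the configuration. A minimal sketch, using only the Config dataclass defined above:

def area(run: Run[Config]) -> int:
    # Type checkers know run.cfg is a Config, so width and height
    # autocomplete and the product type-checks as int.
    return run.cfg.width * run.cfg.height

print(area(run))  # 87500 for this run (350 * 250)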

Accessing Configuration Values

The get method provides a unified interface to access values from a run:

>>> print(run.get("width"))
>>> print(run.get("height"))
350
250
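
For configuration keys, get returns the same value as attribute access on cfg. An illustrative check, not a new API:

# Both expressions read the width from the run's configuration.
assert run.get("width") == run.cfg.width  # 350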

Adding Custom Implementations

Basic Implementation

You can extend runs with custom implementation classes to add domain-specific functionality:

from pathlib import Path

class Impl:
    root_dir: Path

    def __init__(self, root_dir: Path):
        self.root_dir = root_dir

    def __repr__(self) -> str:
        return f"Impl({self.root_dir.stem!r})"

>>> run = Run[Config, Impl](run_dir, Impl)
>>> print(run)
Run[Impl]('c69985df3f494423a3a0af92839cb91d')

The implementation is lazily initialized when you first access the impl attribute:

>>> print(run.impl)
>>> print(run.impl.root_dir)
Impl('artifacts')
mlruns/847638656450324050/c69985df3f494423a3a0af92839cb91d/artifacts

Configuration-Aware Implementation

Implementations can also access the run's configuration:

from dataclasses import dataclass, field

@dataclass
class Size:
    root_dir: Path = field(repr=False)
    cfg: Config

    @property
    def size(self) -> int:
        return self.cfg.width * self.cfg.height

    def is_large(self) -> bool:
        return self.size > 100000

>>> run = Run[Config, Size].load(run_dir, Size)
>>> print(run)
>>> print(run.impl)
>>> print(run.impl.size)
Run[Size]('c69985df3f494423a3a0af92839cb91d')
Size(cfg={'width': 350, 'height': 250})
87500

This allows you to define custom analysis methods that use both the run's artifacts and its configuration.
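
For instance, you could extend Size with extra analysis helpers. A hypothetical sketch (the aspect_ratio property and artifact helper are illustrative, not part of the tutorial jobs):

@dataclass
class SizeAnalysis(Size):
    @property
    def aspect_ratio(self) -> float:
        # Pure-configuration analysis: width / height from the run's cfg.
        return self.cfg.width / self.cfg.height

    def artifact(self, name: str) -> Path:
        # root_dir points at the run's artifacts directory (see above),
        # so artifacts can be located relative to it.
        return self.root_dir / name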

Working with Multiple Runs

Creating a Run Collection

The RunCollection class helps you analyze multiple runs:

>>> run_dirs = iter_run_dirs("mlruns")
>>> rc = Run[Config, Size].load(run_dirs, Size)
>>> print(rc)
RunCollection(Run[Size], n=16)

The load method automatically creates a RunCollection when given an iterable of run directories instead of a single one.
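
You can combine this with the experiment-name filtering shown earlier to build a collection for just the experiments you care about:

from hydraflow import Run, iter_run_dirs

# Only the runs from the two sweep jobs (12 in total, as counted above).
run_dirs = iter_run_dirs("mlruns", ["job_sequential", "job_parallel"])
rc_sweeps = Run[Config, Size].load(run_dirs, Size)
print(rc_sweeps)  # RunCollection(Run[Size], n=12)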

Basic Run Collection Operations

You can perform basic operations on a collection:

>>> print(rc.first())
>>> print(rc.last())
Run[Size]('c69985df3f494423a3a0af92839cb91d')
Run[Size]('d053d24b37fb4c6da57f605b0efa2954')

Filtering Runs

The filter method lets you select runs based on various criteria:

>>> print(rc.filter(width=400))
RunCollection(Run[Size], n=3)

You can use lists to filter by multiple values (OR logic):

>>> print(rc.filter(height=[100, 300]))
RunCollection(Run[Size], n=8)

Tuples create range filters (inclusive):

>>> print(rc.filter(height=(100, 300)))
RunCollection(Run[Size], n=16)

You can even use custom filter functions:

>>> print(rc.filter(lambda r: r.impl.is_large()))
RunCollection(Run[Size], n=1)
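
Multiple keyword criteria in a single call act as AND conditions, the same way get combines them below. A sketch, assuming filter accepts several criteria at once like get does:

# width == 400 AND height in [100, 300] -> two of the sixteen runs.
print(rc.filter(width=400, height=[100, 300]))  # expected: RunCollection(Run[Size], n=2)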

Finding Specific Runs

The get method returns a single run matching your criteria:

>>> run = rc.get(width=250, height=(100, 200))
>>> print(run)
>>> print(run.impl)
Run[Size]('559e7d06eaad4e899d8c32c9961dc2ec')
Size(cfg={'width': 250, 'height': 150})

Converting to DataFrames

For data analysis, you can convert runs to a Polars DataFrame:

>>> print(rc.to_frame("width", "height", "size"))
shape: (16, 3)
┌───────┬────────┬───────┐
│ width ┆ height ┆ size  │
│ ---   ┆ ---    ┆ ---   │
│ i64   ┆ i64    ┆ i64   │
╞═══════╪════════╪═══════╡
│ 350   ┆ 250    ┆ 87500 │
│ 250   ┆ 150    ┆ 37500 │
│ 250   ┆ 250    ┆ 62500 │
│ 350   ┆ 150    ┆ 52500 │
│ 400   ┆ 100    ┆ 40000 │
│ …     ┆ …      ┆ …     │
│ 100   ┆ 200    ┆ 20000 │
│ 300   ┆ 100    ┆ 30000 │
│ 100   ┆ 100    ┆ 10000 │
│ 100   ┆ 300    ┆ 30000 │
│ 300   ┆ 300    ┆ 90000 │
└───────┴────────┴───────┘
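
The result is an ordinary Polars DataFrame, so the full Polars API applies from there (assuming polars is installed, as it must be wherever to_frame works):

import polars as pl

df = rc.to_frame("width", "height", "size")
# Plain Polars from here on: mean size per width, sorted by width.
print(df.group_by("width").agg(pl.col("size").mean().alias("mean_size")).sort("width"))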

You can add custom columns using callables:

>>> print(rc.to_frame("width", "height", is_large=lambda r: r.impl.is_large()))
shape: (16, 3)
┌───────┬────────┬──────────┐
│ width ┆ height ┆ is_large │
│ ---   ┆ ---    ┆ ---      │
│ i64   ┆ i64    ┆ bool     │
╞═══════╪════════╪══════════╡
│ 350   ┆ 250    ┆ false    │
│ 250   ┆ 150    ┆ false    │
│ 250   ┆ 250    ┆ false    │
│ 350   ┆ 150    ┆ false    │
│ 400   ┆ 100    ┆ false    │
│ …     ┆ …      ┆ …        │
│ 100   ┆ 200    ┆ false    │
│ 300   ┆ 100    ┆ false    │
│ 100   ┆ 100    ┆ false    │
│ 100   ┆ 300    ┆ false    │
│ 300   ┆ 300    ┆ false    │
└───────┴────────┴──────────┘

Functions can return lists for multiple values:

>>> def to_list(run: Run) -> list[int]:
...     return [2 * run.get("width"), 3 * run.get("height")]
>>> print(rc.to_frame("width", from_list=to_list))
shape: (16, 2)
┌───────┬────────────┐
│ width ┆ from_list  │
│ ---   ┆ ---        │
│ i64   ┆ list[i64]  │
╞═══════╪════════════╡
│ 350   ┆ [700, 750] │
│ 250   ┆ [500, 450] │
│ 250   ┆ [500, 750] │
│ 350   ┆ [700, 450] │
│ 400   ┆ [800, 300] │
│ …     ┆ …          │
│ 100   ┆ [200, 600] │
│ 300   ┆ [600, 300] │
│ 100   ┆ [200, 300] │
│ 100   ┆ [200, 900] │
│ 300   ┆ [600, 900] │
└───────┴────────────┘
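
List columns can be expanded to one row per element with Polars' explode (standard Polars, not HydraFlow-specific):

# Each two-element list becomes two rows sharing the same width.
print(rc.to_frame("width", from_list=to_list).explode("from_list"))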

Or dictionaries for multiple named columns:

>>> def to_dict(run: Run) -> dict[str, int | str]:
...     width2 = 2 * run.get("width")
...     name = f"h{run.get('height')}"
...     return {"width2": width2, "name": name}
>>> print(rc.to_frame("width", from_dict=to_dict))
shape: (16, 2)
┌───────┬──────────────┐
│ width ┆ from_dict    │
│ ---   ┆ ---          │
│ i64   ┆ struct[2]    │
╞═══════╪══════════════╡
│ 350   ┆ {700,"h250"} │
│ 250   ┆ {500,"h150"} │
│ 250   ┆ {500,"h250"} │
│ 350   ┆ {700,"h150"} │
│ 400   ┆ {800,"h100"} │
│ …     ┆ …            │
│ 100   ┆ {200,"h200"} │
│ 300   ┆ {600,"h100"} │
│ 100   ┆ {200,"h100"} │
│ 100   ┆ {200,"h300"} │
│ 300   ┆ {600,"h300"} │
└───────┴──────────────┘
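
Struct columns can be flattened into separate named columns with Polars' unnest:

# Promotes the struct fields "width2" and "name" to top-level columns.
print(rc.to_frame("width", from_dict=to_dict).unnest("from_dict"))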

Grouping Runs

The group_by method organizes runs by common attributes:

>>> grouped = rc.group_by("width")
>>> for key, group in grouped.items():
...     print(key, group)
350 RunCollection(Run[Size], n=2)
250 RunCollection(Run[Size], n=2)
400 RunCollection(Run[Size], n=3)
200 RunCollection(Run[Size], n=3)
300 RunCollection(Run[Size], n=3)
100 RunCollection(Run[Size], n=3)

You can group by multiple keys:

>>> grouped = rc.group_by("width", "height")
>>> for key, group in grouped.items():
...     print(key, group)
(350, 250) RunCollection(Run[Size], n=1)
(250, 150) RunCollection(Run[Size], n=1)
(250, 250) RunCollection(Run[Size], n=1)
(350, 150) RunCollection(Run[Size], n=1)
(400, 100) RunCollection(Run[Size], n=1)
(400, 300) RunCollection(Run[Size], n=1)
(200, 100) RunCollection(Run[Size], n=1)
(400, 200) RunCollection(Run[Size], n=1)
(200, 200) RunCollection(Run[Size], n=1)
(200, 300) RunCollection(Run[Size], n=1)
(300, 200) RunCollection(Run[Size], n=1)
(100, 200) RunCollection(Run[Size], n=1)
(300, 100) RunCollection(Run[Size], n=1)
(100, 100) RunCollection(Run[Size], n=1)
(100, 300) RunCollection(Run[Size], n=1)
(300, 300) RunCollection(Run[Size], n=1)

Adding aggregation functions transforms the result into a DataFrame:

>>> df = rc.group_by("width", n=lambda runs: len(runs))
>>> print(df)
shape: (6, 2)
┌───────┬─────┐
│ width ┆ n   │
│ ---   ┆ --- │
│ i64   ┆ i64 │
╞═══════╪═════╡
│ 350   ┆ 2   │
│ 250   ┆ 2   │
│ 400   ┆ 3   │
│ 200   ┆ 3   │
│ 300   ┆ 3   │
│ 100   ┆ 3   │
└───────┴─────┘
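
Several aggregations can be computed at once. A sketch, assuming each group passed to an aggregation function is a RunCollection that can be iterated:

df = rc.group_by(
    "width",
    n=len,  # equivalent to the lambda above
    max_size=lambda runs: max(r.impl.size for r in runs),  # assumes iteration over runs
)
print(df)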

Summary

In this tutorial, you've learned how to:

  1. Discover experiment runs in your MLflow tracking directory
  2. Load and access information from individual runs
  3. Add custom implementation classes for domain-specific analysis
  4. Filter, group, and analyze collections of runs
  5. Convert run data to DataFrames for advanced analysis

These capabilities enable you to efficiently analyze your experiments and extract valuable insights from your machine learning workflows.

Next Steps

Now that you understand HydraFlow's analysis capabilities, you can:

  • Dive deeper into the Run Class and Run Collection documentation
  • Explore advanced analysis techniques in the Analyzing Results section
  • Apply these analysis techniques to your own machine learning experiments