
How to Configure Experiments With Hydra - From an ML Engineer's Perspective

by Ivan Kharitonov, October 7th, 2024

Machine learning experiments require extensive parametrization, including optimizer parameters, network architecture, and data augmentation. However, we strive for concise, readable code instead of a cumbersome 200 lines dedicated to argparse. Our goal is to focus on programming logic rather than threading new parameters through function signatures.


Additionally, we want a structure that is easy to extend without weighing the project down, while keeping experiments reproducible.


Hydra offers a solution to these challenges. Below, you will find a basic guide on how to use it.

What is Hydra?

Hydra is a library with rich capabilities for managing configurations. The main site describes the name like this:

“The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.

The name Hydra comes from its ability to run multiple similar jobs - much like a Hydra with multiple heads.”


But I have my own interpretation of its name. It is just a combination of some tools that work together impressively well!

Three of Hydra's heads


Hydra offers a seamless solution to the common headaches faced by ML engineers when attempting to replicate experiments. It elegantly replaces bare argparse or hand-rolled YAML loading, letting you set parameters both from the command line and from YAML files.


Consider the pain points:

  • Replicating experiments with argparse forces everything through string inputs and makes launching from a YAML config awkward.


  • Relying solely on YAML files for configuration leads to duplication and potential errors when only a single parameter needs alteration.


Hydra addresses these issues by enabling dynamic configuration adjustments without the need for multiple bulky files or rigid command-line arguments.


Furthermore, it simplifies the process of passing complex configurations, such as model architectures or functions, directly from the config file to the model. This capability eliminates the tedious task of manually feeding parameters into the model, streamlining the workflow and reducing the margin for error.

Basic Setup

Let's imagine the simplest setup: multiclass classification on MNIST using an MLP. We have a configuration and a training script.

.
├── configs
│   └── config.yaml
└── main.py


The main script can look like this.

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

import hydra
from omegaconf import DictConfig

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    
    def forward(self, x):
        return self.model(x.view(-1, 28*28))

@hydra.main(version_base=None, config_path="configs", config_name="config")
def main(cfg: DictConfig):
    # Load MNIST dataset
    transform = transforms.Compose([
        transforms.ToTensor(), 
        transforms.Normalize((0.5,), (0.5,))
    ])
    train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, batch_size=cfg.batch_size, shuffle=True)

    # Initialize the network, loss function, and optimizer
    model = Net()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=cfg.lr, momentum=cfg.momentum)

    # Train the network
    for epoch in range(cfg.epochs):  # loop over the dataset multiple times
        for i, (inputs, labels) in enumerate(train_loader, 0):
            optimizer.zero_grad()  # zero the parameter gradients
            outputs = model(inputs)  # forward pass
            loss = criterion(outputs, labels)  # calculate loss
            loss.backward()  # backward pass
            optimizer.step()  # optimize

    print('Finished Training')

if __name__ == "__main__":
    main()


The configuration has the following structure. This is the default config; its values are used unless you override them.

# configs/config.yaml
batch_size: 64
lr: 0.01
momentum: 0.9
epochs: 1


To start training you can use:

python main.py


And you can change parameters not only in the YAML file but also from the CLI:

python main.py lr=0.03

Parameters passed on the CLI override the values from the YAML file.


As you can see, there is no argparse or additional middleware. It's easy to change parameters from the command line.
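For example, several overrides can be combined in a single command, and a key that does not exist in the YAML yet can be added on the fly with the + prefix (weight_decay below is just an illustrative new key, not part of the config above):

python main.py lr=0.03 epochs=5 batch_size=128
python main.py +weight_decay=1e-4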

Pass Class Object With Hydra

The architecture inside the Net class is something we may well want to customize. We might even want to swap in a different network entirely, e.g. a CNN, but without touching the rest of the pipeline.


Hydra can construct almost any Python object with the specified parameters. Let's describe our net in another way, using YAML. Note that nn.Sequential takes its layers as positional arguments, so in the config they go under Hydra's special _args_ key rather than a named parameter.

model:
  _target_: torch.nn.Sequential
  _args_:
    - _target_: torch.nn.Flatten
    - _target_: torch.nn.Linear
      in_features: 784  # 28x28 images are flattened into 784
      out_features: 128
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Linear
      in_features: 128
      out_features: 64
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Linear
      in_features: 64
      out_features: 10


and for a CNN:

model:
  _target_: torch.nn.Sequential
  _args_:
    - _target_: torch.nn.Conv2d
      in_channels: 1  # MNIST images are grayscale, so 1 input channel
      out_channels: 32  # Number of output channels
      kernel_size: 3  # Size of the convolutional kernel
      stride: 1
      padding: 1
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.MaxPool2d
      kernel_size: 2  # Pooling window size
      stride: 2
    - _target_: torch.nn.Conv2d
      in_channels: 32
      out_channels: 64
      kernel_size: 3
      stride: 1
      padding: 1
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.MaxPool2d
      kernel_size: 2
      stride: 2
    - _target_: torch.nn.Flatten  # Flatten the output for the fully connected layer
    - _target_: torch.nn.Linear
      in_features: 3136  # 7*7*64 after convolutions and pooling (YAML does not evaluate 7*7*64)
      out_features: 128
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Dropout
      p: 0.5  # Dropout rate
    - _target_: torch.nn.Linear
      in_features: 128
      out_features: 10  # Number of classes in MNIST
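
With the architecture now living in YAML, the training script no longer needs the hard-coded Net class. A minimal sketch of how the model could be built from the config instead, assuming the model section above is part of configs/config.yaml:

import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path="configs", config_name="config")
def main(cfg: DictConfig):
    # Recursively instantiates torch.nn.Sequential and every nested layer
    # described under the `model` key of the config.
    model = instantiate(cfg.model)
    print(model)

if __name__ == "__main__":
    main()

Switching from the MLP to the CNN then becomes a matter of swapping the model section of the config; the Python code stays untouched.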

Pass Function Through Config

Sometimes, our parameters can also be functions, for example, an activation or a loss. The sketch below shows one way Hydra can hand such a callable to the training code straight from the config.
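
A minimal, illustrative example: the activation key, the chosen function, and its negative_slope value are assumptions for the sake of the sketch. The relevant mechanism is Hydra's _partial_ flag, which makes instantiate return a functools.partial instead of calling the target immediately.

# configs/config.yaml (illustrative fragment)
activation:
  _target_: torch.nn.functional.leaky_relu
  _partial_: true
  negative_slope: 0.01

In the training code, instantiate(cfg.activation) then yields a ready-to-call function:

import torch
from hydra.utils import instantiate

act = instantiate(cfg.activation)  # functools.partial(leaky_relu, negative_slope=0.01)
out = act(torch.randn(4, 10))      # call it like any other activation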

How to Access the Config Without Decorating Main - e.g., in an IPython Notebook

For this, you can use the Compose API.

from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(version_base=None, config_path="configs", job_name="run_0001"):
    cfg = compose(config_name="config", overrides=["parameter=value"])

print(OmegaConf.to_yaml(cfg))

Multi Runs

Suppose you want to find good hyperparameters; Hydra offers a lot of support for that. At its simplest, you can launch several experiments sequentially:

python main.py -m batch_size=16,32,64


By default, these runs execute sequentially, but it is easy to make them parallel by switching on a launcher such as joblib:

python main.py -m batch_size=16,32,64 hydra/launcher=joblib
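
The joblib launcher ships as a separate plugin, so it has to be installed first. A small illustrative sweep over two parameters (the values are arbitrary): Hydra expands the comma-separated lists into their cartesian product, and joblib runs the resulting jobs in parallel.

pip install hydra-joblib-launcher

python main.py -m batch_size=16,32,64 lr=0.01,0.03 hydra/launcher=joblib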

Run All Experiments From the Folder

for config_file in configs/*.yaml; do python main.py --config-name="$(basename "${config_file}" .yaml)"; done

(--config-name expects the config name without the directory prefix or the .yaml extension, hence the basename call.)

Resources