Experiment arguments

Spiking-FullSubNet uses TOML configuration files (*.toml) to configure and manage experiments. Each experiment is described by a single *.toml file, which contains the experiment meta information, trainer, loss function, learning rate scheduler, optimizer, model, dataset, and acoustic features. The basename of the *.toml file is used as the experiment ID. You can track configuration changes with version control and reproduce an experiment by reusing the same configuration file. For more information on TOML syntax, visit the TOML website.

Sample *.toml file

This sample file demonstrates many of the settings available in AudioZEN.

[meta]
save_dir = "sdnn_delays/exp"
seed = 0
use_amp = false
use_deterministic_algorithms = false

[trainer]
path = "trainer.Trainer"
[trainer.args]
max_epoch = 9999
clip_grad_norm_value = 5

[acoustics]
n_fft = 512
win_length = 256
sr = 16000
hop_length = 256

[loss]
path = "audiozen.loss.SoftDTWLoss"
[loss.args]
gamma = 0.1

[optimizer]
path = "torch.optim.RAdam"
[optimizer.args]
lr = 0.01
weight_decay = 1e-5

[model]
path = "model.Model"
[model.args]
threshold = 0.1
tau_grad = 0.1
scale_grad = 0.8
max_delay = 64
out_delay = 0

Check any experiment configuration file in the recipes directory for more details.
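
As a quick illustration of how such a file maps onto Python objects, here is a minimal sketch that parses a configuration with the standard-library tomllib and derives the experiment ID from the basename. The file path is taken from the recipe layout shown later; AudioZEN's own parsing code may differ.

import tomllib  # standard library in Python >= 3.11
from pathlib import Path

config_path = Path("recipes/intel_ndns/sdnn_delays/baseline.toml")
with config_path.open("rb") as f:  # tomllib requires a binary file object
    config = tomllib.load(f)

experiment_id = config_path.stem   # the basename is the experiment ID
print(experiment_id)               # baseline
print(config["meta"]["save_dir"])  # e.g., sdnn_delays/exp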

Configuration details

An AudioZEN configuration file must contain the following sections:

  • meta: Configure the experiment meta information, such as save_dir, seed, etc.

  • trainer: Configure the trainer.

  • loss_function: Configure the loss function.

  • lr_scheduler: Configure the learning rate scheduler.

  • optimizer: Configure the optimizer.

  • model: Configure the model.

  • dataset: Configure the dataset.

  • acoustics: Configure the acoustic features.

meta section

The meta section configures the experiment meta information.

  • save_dir: The directory where the experiment is saved. The logs, model checkpoints, and enhanced audio files are stored in this directory.

  • seed: The random seed used to initialize the random number generator.

  • use_amp: Whether to use automatic mixed precision (AMP) to accelerate training.

  • use_deterministic_algorithms: Whether to use deterministic algorithms. If true, training will be slower but more reproducible.
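
To make these options concrete, the following sketch shows how they commonly map onto standard Python and PyTorch calls. This is illustrative only, not AudioZEN's actual implementation.

import random

import numpy as np
import torch

def apply_meta(seed: int, use_deterministic_algorithms: bool) -> None:
    # Seed every random number generator that training may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Deterministic algorithms trade speed for reproducibility.
    torch.use_deterministic_algorithms(use_deterministic_algorithms)
    # use_amp would typically wrap forward passes in torch.autocast
    # and scale gradients with torch.cuda.amp.GradScaler.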

trainer section

The trainer section configures the trainer. It contains two parts: path and args. path is a string specifying the path to the trainer class, and args is a dictionary of arguments passed to the trainer class. It looks like this:

[trainer]
path = "trainer.Trainer"
[trainer.args]
max_epochs = 100
clip_grad_norm_value = 5
...

In this example, AudioZEN will load a custom Trainer class from trainer.py on the Python search path and initialize it with the arguments in the [trainer.args] section. There are multiple ways to specify the path argument; see the next section for details. In AudioZEN, the Trainer class must be a subclass of audiozen.trainer.base_trainer.BaseTrainer. At a minimum, it supports the following arguments:

  • debug (default: false): Whether to enable debug mode. If true, the trainer records when NaN or Inf values occur.

  • max_steps (default: 999999999): The maximum number of steps to train.

  • max_epochs (default: 9999): The maximum number of epochs to train. If max_steps is set, max_epochs is ignored.

  • max_grad_norm (default: -1): The maximum norm of the gradients used for clipping. -1 means no clipping.

  • save_max_score (default: true): Whether to select the best model by the maximum score.

  • save_ckpt_interval (default: 1): The interval at which checkpoints are saved.

  • max_patience (default: 10): The number of epochs with no improvement after which training is stopped.

  • plot_norm (default: true): Whether to plot the norm of the gradients.

  • validation_interval (default: 1): The interval at which validation runs.

  • max_num_checkpoints (default: 10): The maximum number of checkpoints to keep. Saving too many checkpoints can exhaust disk space.

  • scheduler_name (default: "constant_schedule_with_warmup"): The name of the learning rate scheduler.

  • warmup_steps (default: 0): The number of warmup steps.

  • warmup_ratio (default: 0.0): The ratio of warmup steps. If warmup_steps is set, warmup_ratio is ignored.

  • gradient_accumulation_steps (default: 1): The number of gradient accumulation steps, used to simulate a larger batch size.

Loading a module by path argument

We support multiple ways to load a module via the path argument in the *.toml file. For example, suppose we have the following directory structure:

recipes/intel_ndns
├── README.md
├── run.py
└── sdnn_delays
    ├── baseline.toml
    ├── model.py
    └── trainer.py

In recipes/intel_ndns/sdnn_delays/baseline.toml, the path of the trainer is set to:

[trainer]
path = "sdnn_delays.trainer.Trainer"

In this case, AudioZEN will load the Trainer class from the module recipes/intel_ndns/sdnn_delays/trainer.py. If we instead set the path to:

[trainer]
path = "audiozen.trainer.custom_trainer.CustomTrainer"

AudioZEN will load the CustomTrainer class from audiozen/trainer/custom_trainer.py.

Important

If you want to load a class from the audiozen package, you must first install the package in editable mode (pip install -e .).
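
For illustration, a dotted path string like the ones above can be resolved into a class with Python's importlib. This is a minimal sketch of the general mechanism (reusing the config dictionary from the parsing sketch earlier), not necessarily the exact helper AudioZEN uses.

import importlib

def load_class(dotted_path: str):
    # Split "sdnn_delays.trainer.Trainer" into module path and class name.
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

trainer_class = load_class(config["trainer"]["path"])
trainer = trainer_class(**config["trainer"]["args"])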

loss_function, optimizer, model, and dataset sections

The loss_function, optimizer, model, and dataset sections configure the loss function, optimizer, model, and dataset, respectively. They follow the same logic as the trainer section.

[loss_function]
path = "..."
[loss_function.args]
...

[optimizer]
path = "..."
[optimizer.args]
...

...
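
Continuing the illustrative sketch from the previous section, each of these sections can be instantiated with the same load_class helper. Passing the model's parameters to the optimizer as its first positional argument is an assumption about the wiring, not AudioZEN's exact code.

# Illustrative only: how the parsed sections might be instantiated.
model = load_class(config["model"]["path"])(**config["model"]["args"])
loss_function = load_class(config["loss_function"]["path"])(
    **config["loss_function"]["args"]
)
optimizer = load_class(config["optimizer"]["path"])(
    model.parameters(), **config["optimizer"]["args"]
)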

You may use a loss function provided by PyTorch or implement your own. For example, the following configuration uses PyTorch's MSELoss:

[loss_function]
path = "torch.nn.MSELoss"
[loss_function.args]

Use a custom loss function from audiozen.loss:

[loss_function]
path = "audiozen.loss.MyLoss"
[loss_function.args]
weights = [1.0, 1.0]
...

Note

You must keep the [loss_function.args] section even if the loss function does not need any arguments.

You may use a learning rate scheduler provided by PyTorch or implement your own. For example, the following configuration uses StepLR:

[lr_scheduler]
path = "torch.optim.lr_scheduler.StepLR"
[lr_scheduler.args]
step_size = 100
gamma = 0.5
...
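
Following the same illustrative pattern as above, the scheduler would be built around the optimizer; StepLR, for instance, takes the optimizer as its first positional argument.

lr_scheduler = load_class(config["lr_scheduler"]["path"])(
    optimizer, **config["lr_scheduler"]["args"]
)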

acoustics section

The acoustics section configures the acoustic features. These settings are shared across the whole project (for example, by visualization code), except for the dataloader and model sections. You can access them anywhere in your custom Trainer class.

  • sr: The sample rate of the audio.

  • n_fft: The number of FFT points.

  • hop_length: The number of samples between successive frames.

  • win_length: The length of the STFT window.
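
For example, a custom Trainer might use these values to compute a spectrogram. The snippet below is a sketch built on torch.stft with the acoustics settings from the sample file above, not code taken from AudioZEN.

import torch

n_fft, win_length, hop_length, sr = 512, 256, 256, 16000

waveform = torch.randn(1, sr)  # one second of dummy audio
window = torch.hann_window(win_length)

# Complex spectrogram with shape (batch, n_fft // 2 + 1, n_frames).
spectrogram = torch.stft(
    waveform,
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=win_length,
    window=window,
    return_complex=True,
)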