# Experiment arguments
Spiking-FullSubNet uses TOML configuration files (`*.toml`) to configure and manage experiments. Each experiment is configured by a `*.toml` file, which contains the experiment meta information, trainer, loss function, learning rate scheduler, optimizer, model, dataset, and acoustic features. The basename of the `*.toml` file is used as the experiment ID (identifier).

You can track configuration changes with version control and reproduce an experiment by reusing the same configuration file. For more information on TOML syntax, visit the TOML website.
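As a hedged illustration of that convention (not the framework's actual entry point), the sketch below loads a configuration file with the standard library and derives the experiment ID from its basename; the file path is only an example taken from the recipes layout shown later on this page.

```python
import tomllib  # Python 3.11+; older versions can use the third-party "tomli" package
from pathlib import Path

# Example configuration file, following the recipes layout shown later on this page.
config_path = Path("recipes/intel_ndns/sdnn_delays/baseline.toml")

with config_path.open("rb") as f:
    config = tomllib.load(f)

experiment_id = config_path.stem                  # "baseline" becomes the experiment ID
print(experiment_id, config["meta"]["save_dir"])  # sections are plain nested dicts
```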
## Sample `*.toml` file

This sample file demonstrates many of the settings available for configuration in AudioZEN.
```toml
[meta]
save_dir = "sdnn_delays/exp"
seed = 0
use_amp = false
use_deterministic_algorithms = false

[trainer]
path = "trainer.Trainer"

[trainer.args]
max_epoch = 9999
clip_grad_norm_value = 5

[acoustics]
n_fft = 512
win_length = 256
sr = 16000
hop_length = 256

[loss]
path = "audiozen.loss.SoftDTWLoss"

[loss.args]
gamma = 0.1

[optimizer]
path = "torch.optim.RAdam"

[optimizer.args]
lr = 0.01
weight_decay = 1e-5

[model]
path = "model.Model"

[model.args]
threshold = 0.1
tau_grad = 0.1
scale_grad = 0.8
max_delay = 64
out_delay = 0
```
Check any experiment configuration file in the `recipes` directory for more details.
## Configuration details
An AudioZEN configuration file must contain the following sections:

- `meta`: configures the experiment meta information, such as `save_dir`, `seed`, etc.
- `trainer`: configures the trainer.
- `loss_function`: configures the loss function.
- `lr_scheduler`: configures the learning rate scheduler.
- `optimizer`: configures the optimizer.
- `model`: configures the model.
- `dataset`: configures the dataset.
- `acoustics`: configures the acoustic features.
### `meta` section

The `meta` section is used to configure the experiment meta information.
| Item | Description |
| --- | --- |
| `save_dir` | The directory where the experiment is saved. The log information, model checkpoints, and enhanced audio files are stored in this directory. |
| `seed` | The random seed used to initialize the random number generator. |
| `use_amp` | Whether to use automatic mixed precision (AMP) to accelerate training. |
| `use_deterministic_algorithms` | Whether to use deterministic algorithms. If it is `true`, training will be slower but more reproducible. |
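As a rough illustration of what these options control, the snippet below shows the standard PyTorch calls that correspond to `seed` and `use_deterministic_algorithms`; this is a hedged sketch of the semantics, not the trainer's actual implementation.

```python
import torch

# Hedged sketch of what the [meta] options typically control (standard PyTorch calls;
# how AudioZEN's trainer wires these up may differ).
seed = 0
use_deterministic_algorithms = False

torch.manual_seed(seed)                                            # seed the global RNG
torch.use_deterministic_algorithms(use_deterministic_algorithms)   # reproducible but slower when True
# use_amp would enable torch.autocast / GradScaler around the forward and backward passes.
```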
### `trainer` section

The `trainer` section is used to configure a trainer. It contains two parts: `path` and `args`. `path` is a string that specifies the path to the trainer class, and `args` is a dictionary that specifies the arguments passed to the trainer class. It looks like this:
```toml
[trainer]
path = "trainer.Trainer"

[trainer.args]
max_epochs = 100
clip_grad_norm_value = 5
...
```
In this example, AudioZEN loads a custom `Trainer` class from `trainer.py` on the Python search path and initializes it with the arguments in the `[trainer.args]` section. There are multiple ways to specify the `path` argument; see the next section for more details.
In AudioZEN, the `Trainer` class must be a subclass of `audiozen.trainer.base_trainer.BaseTrainer`. It supports at least the following arguments:
| Item | Default | Description |
| --- | --- | --- |
|  |  | Whether to enable debug mode. If it is true, the times at which NaN and Inf values occur are collected. |
|  |  | The maximum number of steps to train. |
|  |  | The maximum number of epochs to train. |
|  |  | The maximum norm of the gradients used for clipping. `-1` means no clipping. |
|  |  | Whether to find the best model by the maximum score. |
|  |  | The interval of saving checkpoints. |
|  |  | The number of epochs with no improvement after which training is stopped. |
|  |  | Whether to plot the norm of the gradients. |
|  |  | The interval of validation. |
|  |  | The maximum number of checkpoints to keep. Saving too many checkpoints can exhaust disk space. |
|  |  | The name of the scheduler. |
|  |  | The number of warmup steps. |
|  |  | The ratio of warmup steps. |
|  |  | The number of gradient accumulation steps, used to simulate a larger batch size. |
### Loading a module by the `path` argument

AudioZEN supports multiple ways to load a module via the `path` argument in the `*.toml` file. For example, suppose we have the following directory structure:
```
recipes/intel_ndns
├── README.md
├── run.py
└── sdnn_delays
    ├── baseline.toml
    ├── model.py
    └── trainer.py
```
In `recipes/intel_ndns/sdnn_delays/baseline.toml`, the `path` of the trainer is set to:
```toml
[trainer]
path = "sdnn_delays.trainer.Trainer"
```
In this case, we call the `Trainer` class in the module `recipes/intel_ndns/sdnn_delays/trainer`. If we set the `path` to:
```toml
[trainer]
path = "audiozen.trainer.custom_trainer.CustomTrainer"
```
we will call the `CustomTrainer` class in `audiozen/trainer/custom_trainer.py`.
**Important:** If you want to use a `Trainer` from the `audiozen` package, you must first install it in editable mode with `pip install -e .`.
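The resolution logic is essentially a dynamic import: split the dotted path into a module part and a class name, import the module, and look up the attribute. The following is a minimal sketch of that idea; the helper name `load_class` and the commented usage are illustrative, not AudioZEN's actual implementation.

```python
import importlib

def load_class(dotted_path: str):
    """Illustrative helper: resolve "pkg.module.ClassName" to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)  # e.g., "sdnn_delays.trainer"
    return getattr(module, class_name)             # e.g., the Trainer class

# Hypothetical usage with the [trainer] section of a parsed config:
# trainer_class = load_class(config["trainer"]["path"])
# trainer = trainer_class(**config["trainer"]["args"])
```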
### `loss_function`, `optimizer`, `model`, and `dataset` sections

The `loss_function`, `optimizer`, `model`, and `dataset` sections configure the loss function, optimizer, model, and dataset, respectively. They follow the same logic as the `trainer` section:
```toml
[loss_function]
path = "..."

[loss_function.args]
...

[optimizer]
path = "..."

[optimizer.args]
...

...
```
You may use a loss function provided by PyTorch or implement your own. For example, the following configuration uses PyTorch's `MSELoss`:
```toml
[loss_function]
path = "torch.nn.MSELoss"

[loss_function.args]
```
Use a custom loss function from `audiozen.loss`:
```toml
[loss_function]
path = "audiozen.loss.MyLoss"

[loss_function.args]
weights = [1.0, 1.0]
...
```
**Note:** You must keep the `[loss_function.args]` section even if the loss function does not need any arguments.
You may use a learning rate scheduler provided by PyTorch or implement your own. For example, the following configuration uses `StepLR`:
```toml
[lr_scheduler]
path = "torch.optim.lr_scheduler.StepLR"

[lr_scheduler.args]
step_size = 100
gamma = 0.5
...
```
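Conceptually, this configuration corresponds to constructing the PyTorch scheduler with those keyword arguments. The sketch below shows the equivalent calls; the model and optimizer setup here are illustrative, and in practice they come from the `[model]` and `[optimizer]` sections.

```python
import torch
from torch.optim import RAdam
from torch.optim.lr_scheduler import StepLR

# Illustrative model/optimizer; in practice these come from the [model] and [optimizer] sections.
model = torch.nn.Linear(10, 1)
optimizer = RAdam(model.parameters(), lr=0.01, weight_decay=1e-5)

# Equivalent of the [lr_scheduler] path + args: halve the learning rate
# every 100 calls to scheduler.step().
scheduler = StepLR(optimizer, step_size=100, gamma=0.5)
```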
### `acoustics` section

The `acoustics` section is used to configure the acoustic features. These settings are used across the whole project, e.g., for visualization, except in the `dataloader` and `model` sections. You can access them anywhere in your custom `Trainer` class.
| Item | Description |
| --- | --- |
| `sr` | The sample rate of the audio. |
| `n_fft` | The number of FFT points. |
| `hop_length` | The number of samples between successive frames. |
| `win_length` | The length of the STFT window. |
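As an example of how these values are typically consumed, the following hedged sketch computes an STFT with the configured parameters using `torch.stft`; the input tensor and the place where this would happen inside a custom `Trainer` are illustrative.

```python
import torch

# Values from the [acoustics] section of the sample configuration.
sr = 16000
n_fft = 512
win_length = 256
hop_length = 256

# Illustrative one-second mono signal; in a custom Trainer this would be real audio.
waveform = torch.randn(1, sr)

spectrogram = torch.stft(
    waveform,
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=win_length,
    window=torch.hann_window(win_length),
    return_complex=True,
)  # shape: (1, n_fft // 2 + 1, num_frames)
```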