Optim

Optimizer

class lightning_asr.optim.optimizer.Optimizer(optim, scheduler=None, scheduler_period=None, max_grad_norm=0)[source]

This is a wrapper class around torch.optim.Optimizer. It provides learning rate scheduling and gradient norm clipping.

Parameters
  • optim (torch.optim.Optimizer) – optimizer object; the parameters to be optimized should be given when instantiating it, e.g. torch.optim.Adam, torch.optim.SGD

  • scheduler (kospeech.optim.lr_scheduler, optional) – learning rate scheduler

  • scheduler_period (int, optional) – number of timesteps for which the learning rate scheduler is applied

  • max_grad_norm (int, optional) – maximum gradient norm used for gradient clipping
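
A minimal usage sketch of the wrapper. The constructor arguments follow the signature above; the step()/zero_grad() calls at the end are an assumption that the wrapper mirrors the kospeech optimizer wrapper referenced above, so the exact interface may differ.

    import torch
    import torch.nn as nn

    from lightning_asr.optim.optimizer import Optimizer

    model = nn.Linear(80, 10)  # illustrative model, not part of the library
    base_optim = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Wrap the torch optimizer; the scheduler is optional and gradients are clipped to max_grad_norm.
    optimizer = Optimizer(base_optim, scheduler=None, scheduler_period=None, max_grad_norm=5)

    inputs = torch.randn(8, 80)
    targets = torch.randint(0, 10, (8,))

    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()

    # Assumed interface: clip gradients, advance the schedule if set, then step the wrapped optimizer.
    optimizer.step(model)
    optimizer.zero_grad()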

AdamP

class lightning_asr.optim.adamp.AdamP(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, delta=0.1, wd_ratio=0.1, nesterov=False)[source]

Paper: “AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights”. Copied from https://github.com/clovaai/AdamP/ (Copyright (c) 2020 Naver Corp., MIT License).

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters

closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
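
A short usage sketch; the model and data below are illustrative, and AdamP is used through the standard torch.optim.Optimizer training-loop interface (zero_grad / backward / step) that its step(closure=None) signature above reflects.

    import torch
    import torch.nn as nn

    from lightning_asr.optim.adamp import AdamP

    model = nn.Linear(80, 10)  # illustrative model
    optimizer = AdamP(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                      weight_decay=1e-2, delta=0.1, wd_ratio=0.1, nesterov=False)

    inputs = torch.randn(8, 80)
    targets = torch.randint(0, 10, (8,))

    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()  # performs a single optimization step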

RAdam

class lightning_asr.optim.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]

Paper: “On the Variance of the Adaptive Learning Rate and Beyond”. Refer to https://github.com/LiyuanLucasLiu/RAdam (Copyright (c) LiyuanLucasLiu, Apache 2.0 License).

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters

closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
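
A sketch of step() with the optional closure argument; the closure re-evaluates the model and returns the loss, as described above (the model and data are illustrative).

    import torch
    import torch.nn as nn

    from lightning_asr.optim.radam import RAdam

    model = nn.Linear(80, 10)  # illustrative model
    optimizer = RAdam(model.parameters(), lr=1e-3, weight_decay=0, degenerated_to_sgd=True)

    inputs = torch.randn(8, 80)
    targets = torch.randint(0, 10, (8,))

    def closure():
        # Re-evaluate the model and return the loss, as step(closure) expects.
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        return loss

    loss = optimizer.step(closure)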