Optim

Optimizer

class kospeech.optim.__init__.Optimizer(optim, scheduler=None, scheduler_period=None, max_grad_norm=0)[source]

This is a wrapper class of torch.optim.Optimizer. It provides learning rate scheduling and gradient norm clipping.

Parameters
  • optim (torch.optim.Optimizer) – optimizer object, the parameters to be optimized should be given when instantiating the object, e.g. torch.optim.Adam, torch.optim.SGD

  • scheduler (kospeech.optim.lr_scheduler, optional) – learning rate scheduler

  • scheduler_period (int, optional) – number of timesteps over which the learning rate scheduler is applied

  • max_grad_norm (int, optional) – value used for gradient norm clipping
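
A minimal sketch of instantiating this wrapper around a standard torch.optim optimizer. The toy model, the clipping value, and the absence of a scheduler are illustrative assumptions; the wrapper's training-loop methods are not shown since they are not documented above.

    import torch
    import torch.nn as nn
    from kospeech.optim import Optimizer

    # Stand-in model for illustration only.
    model = nn.Linear(80, 10)

    # Any torch.optim optimizer can be wrapped; Adam is used here as an example.
    inner_optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Wrap with gradient norm clipping; 400 is an illustrative value, not a recommendation.
    optimizer = Optimizer(inner_optim, scheduler=None, scheduler_period=None, max_grad_norm=400)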

RAdam

class kospeech.optim.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]

Paper: “On the Variance of the Adaptive Learning Rate and Beyond”

Refer to https://github.com/LiyuanLucasLiu/RAdam. Copyright (c) LiyuanLucasLiu, Apache-2.0 License.

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters

closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
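
Since RAdam derives from torch.optim.Optimizer, it fits an ordinary PyTorch training step; the model, data, and loss below are placeholders for illustration.

    import torch
    import torch.nn as nn
    from kospeech.optim.radam import RAdam

    model = nn.Linear(40, 5)                       # placeholder model
    optimizer = RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-5)

    inputs = torch.randn(8, 40)                    # dummy batch
    targets = torch.randint(0, 5, (8,))
    criterion = nn.CrossEntropyLoss()

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()                               # single parameter update; no closure needed here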

AdamP

class kospeech.optim.adamp.AdamP(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, delta=0.1, wd_ratio=0.1, nesterov=False)[source]

Paper: “AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights”

Copied from https://github.com/clovaai/AdamP/. Copyright (c) 2020 Naver Corp., MIT License.

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters

closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
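
The closure form of step() described above can be sketched as follows; the toy model, data, and hyperparameter values are assumptions for illustration.

    import torch
    import torch.nn as nn
    from kospeech.optim.adamp import AdamP

    model = nn.Linear(40, 5)                       # placeholder model
    optimizer = AdamP(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                      weight_decay=1e-2, delta=0.1, wd_ratio=0.1, nesterov=False)

    inputs = torch.randn(8, 40)                    # dummy batch
    targets = torch.randint(0, 5, (8,))
    criterion = nn.CrossEntropyLoss()

    def closure():
        # Reevaluates the model and returns the loss, as step(closure) expects.
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        return loss

    optimizer.step(closure)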

Novograd

class kospeech.optim.novograd.Novograd(params, lr=0.001, betas=(0.95, 0), eps=1e-08, weight_decay=0, grad_averaging=False, amsgrad=False)[source]

Novograd algorithm.

Copied from https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechRecognition/Jasper/optimizers.py. Copyright (c) 2019 NVIDIA Corp., Apache-2.0 License.

Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.95, 0))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • grad_averaging (bool, optional) – whether to use gradient averaging (default: False)

  • amsgrad (bool, optional) – whether to use the AMSGrad variant of this algorithm from the paper “On the Convergence of Adam and Beyond” (default: False)

step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.
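
A minimal usage sketch with a placeholder model and dummy batch; note the asymmetric default betas=(0.95, 0) from the signature above.

    import torch
    import torch.nn as nn
    from kospeech.optim.novograd import Novograd

    model = nn.Linear(40, 5)                       # placeholder model
    optimizer = Novograd(model.parameters(), lr=1e-3, betas=(0.95, 0),
                         weight_decay=1e-3, grad_averaging=False, amsgrad=False)

    inputs = torch.randn(8, 40)                    # dummy batch
    targets = torch.randint(0, 5, (8,))
    loss = nn.CrossEntropyLoss()(model(inputs), targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()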