Optim

Optimizer

class kospeech.optim.__init__.Optimizer(optim, scheduler=None, scheduler_period=None, max_grad_norm=0)[source]

This is a wrapper class of torch.optim.Optimizer. It provides learning rate scheduling and gradient norm clipping.

Parameters
  • optim (torch.optim.Optimizer) – optimizer object, the parameters to be optimized should be given when instantiating the object, e.g. torch.optim.Adam, torch.optim.SGD

  • scheduler (kospeech.optim.lr_scheduler, optional) – learning rate scheduler

  • scheduler_period (int, optional) – number of timesteps over which the learning rate scheduler is applied

  • max_grad_norm (int, optional) – value used for gradient norm clipping
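
A minimal sketch of instantiating this wrapper around a standard torch.optim optimizer. The toy model, the clipping value, and the absence of a scheduler are illustrative assumptions; the wrapper's training-loop methods are not shown since they are not documented above.

    import torch
    import torch.nn as nn
    from kospeech.optim import Optimizer

    # Stand-in model for illustration only.
    model = nn.Linear(80, 10)

    # Any torch.optim optimizer can be wrapped; Adam is used here as an example.
    inner_optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Wrap with gradient norm clipping; 400 is an illustrative value, not a recommendation.
    optimizer = Optimizer(inner_optim, scheduler=None, scheduler_period=None, max_grad_norm=400)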

RAdam

class kospeech.optim.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)[source]

Paper: “On the Variance of the Adaptive Learning Rate and Beyond”

Refer to https://github.com/LiyuanLucasLiu/RAdam. Copyright (c) LiyuanLucasLiu, Apache-2.0 License.

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters

closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
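
Since RAdam derives from torch.optim.Optimizer, it fits an ordinary PyTorch training step; the model, data, and loss below are placeholders for illustration.

    import torch
    import torch.nn as nn
    from kospeech.optim.radam import RAdam

    model = nn.Linear(40, 5)                       # placeholder model
    optimizer = RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-5)

    inputs = torch.randn(8, 40)                    # dummy batch
    targets = torch.randint(0, 5, (8,))
    criterion = nn.CrossEntropyLoss()

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()                               # single parameter update; no closure needed here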

AdamP

class kospeech.optim.adamp.AdamP(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, delta=0.1, wd_ratio=0.1, nesterov=False)[source]

Paper: “AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights”

Copied from https://github.com/clovaai/AdamP/. Copyright (c) 2020 Naver Corp., MIT License.

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters

closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
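
The closure form of step() described above can be sketched as follows; the toy model, data, and hyperparameter values are assumptions for illustration.

    import torch
    import torch.nn as nn
    from kospeech.optim.adamp import AdamP

    model = nn.Linear(40, 5)                       # placeholder model
    optimizer = AdamP(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                      weight_decay=1e-2, delta=0.1, wd_ratio=0.1, nesterov=False)

    inputs = torch.randn(8, 40)                    # dummy batch
    targets = torch.randint(0, 5, (8,))
    criterion = nn.CrossEntropyLoss()

    def closure():
        # Reevaluates the model and returns the loss, as step(closure) expects.
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        return loss

    optimizer.step(closure)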

Novograd

class kospeech.optim.novograd.Novograd(params, lr=0.001, betas=(0.95, 0), eps=1e-08, weight_decay=0, grad_averaging=False, amsgrad=False)[source]

Novograd algorithm.

Copied from https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechRecognition/Jasper/optimizers.py. Copyright (c) 2019 NVIDIA Corp., Apache-2.0 License.

Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.95, 0))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • grad_averaging (bool, optional) – whether to use gradient averaging (default: False)

  • amsgrad (bool, optional) – whether to use the AMSGrad variant of this algorithm from the paper “On the Convergence of Adam and Beyond” (default: False)

step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.
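
A minimal usage sketch with a placeholder model and dummy batch; note the asymmetric default betas=(0.95, 0) from the signature above.

    import torch
    import torch.nn as nn
    from kospeech.optim.novograd import Novograd

    model = nn.Linear(40, 5)                       # placeholder model
    optimizer = Novograd(model.parameters(), lr=1e-3, betas=(0.95, 0),
                         weight_decay=1e-3, grad_averaging=False, amsgrad=False)

    inputs = torch.randn(8, 40)                    # dummy batch
    targets = torch.randint(0, 5, (8,))
    loss = nn.CrossEntropyLoss()(model(inputs), targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()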