Optim

Optimizer

class kospeech.optim.__init__.Optimizer(optim, scheduler=None, scheduler_period=None, max_grad_norm=0)

    This is a wrapper class of torch.optim.Optimizer. It provides functionality for learning rate scheduling and gradient norm clipping.

    Parameters
        optim (torch.optim.Optimizer) – optimizer object; the parameters to be optimized should be given when instantiating the object, e.g. torch.optim.Adam, torch.optim.SGD
        scheduler (kospeech.optim.lr_scheduler, optional) – learning rate scheduler
        scheduler_period (int, optional) – timestep with learning rate scheduler
        max_grad_norm (int, optional) – value used for gradient norm clipping
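
A minimal construction sketch for the wrapper, using only the constructor signature documented above; the toy model, the choice of torch.optim.Adam, and the hyperparameter values are placeholders, and no scheduler is attached:

    import torch
    import torch.nn as nn
    from kospeech.optim import Optimizer

    # Toy model standing in for an acoustic model (placeholder).
    model = nn.Linear(80, 10)

    # The underlying PyTorch optimizer receives the parameters to optimize when it is created.
    adam = torch.optim.Adam(model.parameters(), lr=1e-04)

    # Wrap it: scheduler is optional, and max_grad_norm > 0 enables gradient norm clipping.
    optimizer = Optimizer(adam, scheduler=None, scheduler_period=None, max_grad_norm=400)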

RAdam

class kospeech.optim.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, degenerated_to_sgd=True)

    Paper: “On the Variance of the Adaptive Learning Rate and Beyond”

    Refer to https://github.com/LiyuanLucasLiu/RAdam
    Copyright (c) LiyuanLucasLiu, Apache 2.0 License

    step(closure=None)

        Performs a single optimization step (parameter update).

        Parameters
            closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

        Note
            Unless otherwise specified, this function should not modify the .grad field of the parameters.
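
A drop-in usage sketch, assuming RAdam follows the standard torch.optim.Optimizer interface (so zero_grad() is available) as in the referenced repository; the model, batch, and hyperparameter values are placeholders:

    import torch
    import torch.nn as nn
    from kospeech.optim.radam import RAdam

    model = nn.Linear(80, 10)                 # placeholder model
    optimizer = RAdam(model.parameters(), lr=1e-03, betas=(0.9, 0.999), weight_decay=1e-05)

    inputs = torch.randn(4, 80)               # placeholder batch
    loss = model(inputs).sum()                # placeholder scalar loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                          # single optimization step, as documented above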

AdamP

class kospeech.optim.adamp.AdamP(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, delta=0.1, wd_ratio=0.1, nesterov=False)

    Paper: “AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights”

    Copied from https://github.com/clovaai/AdamP/
    Copyright (c) 2020 Naver Corp. MIT License

    step(closure=None)

        Performs a single optimization step (parameter update).

        Parameters
            closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

        Note
            Unless otherwise specified, this function should not modify the .grad field of the parameters.
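
A sketch of the closure form of step() described above; the criterion, model, and data are placeholders, and AdamP is assumed to otherwise follow the standard torch.optim.Optimizer interface:

    import torch
    import torch.nn as nn
    from kospeech.optim.adamp import AdamP

    model = nn.Linear(80, 10)                             # placeholder model
    criterion = nn.MSELoss()                              # placeholder criterion
    inputs, targets = torch.randn(4, 80), torch.randn(4, 10)

    optimizer = AdamP(model.parameters(), lr=1e-03, betas=(0.9, 0.999), weight_decay=1e-02)

    def closure():
        # Reevaluates the model and returns the loss, as the closure parameter requires.
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        return loss

    loss = optimizer.step(closure)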

Novograd

class kospeech.optim.novograd.Novograd(params, lr=0.001, betas=(0.95, 0), eps=1e-08, weight_decay=0, grad_averaging=False, amsgrad=False)

    Novograd algorithm.

    Copied from https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechRecognition/Jasper/optimizers.py
    Copyright (c) 2019 NVIDIA Corp. Apache-2.0 License

    Parameters
        params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
        lr (float, optional) – learning rate (default: 1e-3)
        betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.95, 0))
        eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
        weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
        grad_averaging (bool, optional) – whether to apply gradient averaging (default: False)
        amsgrad (bool, optional) – whether to use the AMSGrad variant of this algorithm from the paper “On the Convergence of Adam and Beyond” (default: False)
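
A construction sketch that mirrors the parameter list above; the model and the non-default hyperparameter values are placeholders:

    import torch.nn as nn
    from kospeech.optim.novograd import Novograd

    model = nn.Linear(80, 10)         # placeholder model

    optimizer = Novograd(
        model.parameters(),
        lr=1e-03,
        betas=(0.95, 0),              # documented default shown explicitly
        weight_decay=1e-03,           # placeholder L2 penalty
        grad_averaging=False,
        amsgrad=False,                # set True for the AMSGrad variant
    )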