Jasper

Jasper

class kospeech.models.jasper.model.Jasper(num_classes: int, version: str = '10x5', device: torch.device = 'cuda')[source]

Jasper: An End-to-End Convolutional Neural Acoustic Model. Jasper (Just Another Speech Recognizer) is a 54-layer ASR model proposed by NVIDIA. Jasper achieved a word error rate (WER) below 3 percent on the LibriSpeech dataset. More details: https://arxiv.org/pdf/1904.03288.pdf

Parameters
  • num_classes (int) – number of output classes

  • version (str) – Jasper version, written as BxR: B is the number of blocks, R is the number of sub-blocks per block (default: '10x5')

  • device (torch.device) – device to run on, 'cuda' or 'cpu' (default: 'cuda')

Inputs: inputs, input_lengths
  • inputs: tensor containing the input sequences

  • input_lengths: tensor containing the length of each input sequence

Returns: output, output_lengths
  • output: tensor containing the output sequences

  • output_lengths: tensor containing the length of each output sequence
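The BxR naming of the version parameter can be made concrete with a short sketch. The helpers `parse_jasper_version` and `count_conv_layers` are hypothetical names used only for illustration and are not part of kospeech. The defaults of one prologue layer and three epilogue layers are an assumption taken from the layout described in the Jasper paper; under that assumption, '10x5' yields the 54 layers mentioned above.

```python
from typing import Tuple


def parse_jasper_version(version: str) -> Tuple[int, int]:
    """Split a 'BxR' version string into (num_blocks, num_sub_blocks)."""
    blocks, sub_blocks = version.lower().split("x")
    return int(blocks), int(sub_blocks)


def count_conv_layers(version: str, prologue: int = 1, epilogue: int = 3) -> int:
    """Count convolutional layers for a 'BxR' version.

    Assumption (from the Jasper paper, not confirmed for kospeech):
    `prologue` layers precede the B blocks and `epilogue` layers follow them.
    """
    num_blocks, num_sub_blocks = parse_jasper_version(version)
    return prologue + num_blocks * num_sub_blocks + epilogue


num_blocks, num_sub_blocks = parse_jasper_version("10x5")
total_layers = count_conv_layers("10x5")
print(num_blocks, num_sub_blocks, total_layers)  # 10 5 54
```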

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Forward propagates inputs for CTC training.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of each input sequence, of size (batch).

Returns

  • predicted_log_prob (torch.FloatTensor): Log probabilities of the model predictions.

  • output_lengths (torch.LongTensor): The length of each output sequence, of size (batch).

Return type

(Tensor, Tensor)
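The two values returned by forward are the kind of arguments `torch.nn.CTCLoss` expects. The following is a minimal sketch of that hand-off, not kospeech's training loop: the log probabilities are random stand-ins rather than real model output, and the (batch, seq_length, num_classes) output layout, the blank index 0, and all sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, max_len, num_classes = 3, 50, 10

# Stand-in for the model output; assumed layout (batch, seq_length, num_classes).
predicted_log_prob = torch.randn(batch, max_len, num_classes).log_softmax(dim=-1)
output_lengths = torch.tensor([50, 42, 30], dtype=torch.long)

# Hypothetical padded label sequences; index 0 is reserved for the CTC blank.
targets = torch.randint(1, num_classes, (batch, 12), dtype=torch.long)
target_lengths = torch.tensor([12, 9, 7], dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# nn.CTCLoss expects log probabilities shaped (seq_length, batch, num_classes).
loss = ctc(predicted_log_prob.transpose(0, 1), targets, output_lengths, target_lengths)
print(loss.item())
```

Passing output_lengths rather than the padded length keeps the padded frames out of the loss.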

Sublayers

class kospeech.models.jasper.sublayers.JasperBlock(num_sub_blocks: int, in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, bias: bool = True, dropout_p: float = 0.2, activation: str = 'relu')[source]

Jasper Block: a Jasper block consists of R Jasper sub-blocks.

Parameters
  • num_sub_blocks (int) – number of sub-blocks

  • in_channels (int) – number of channels in the input feature

  • out_channels (int) – number of channels produced by the convolution

  • kernel_size (int) – size of the convolving kernel

  • stride (int) – stride of the convolution. (default: 1)

  • dilation (int) – spacing between kernel elements. (default: 1)

  • bias (bool) – if True, adds a learnable bias to the output. (default: True)

  • dropout_p (float) – dropout probability (default: 0.2)

  • activation (str) – activation function (default: 'relu')

Inputs: inputs, input_lengths, residual
  • inputs: tensor containing the input sequences

  • input_lengths: tensor containing the length of each input sequence

  • residual: tensor containing the residual

Returns: output, output_lengths
  • output: tensor containing the output sequences

  • output_lengths: tensor containing the length of each output sequence

class kospeech.models.jasper.sublayers.JasperSubBlock(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, padding: int = 0, bias: bool = False, dropout_p: float = 0.2, activation: str = 'relu')[source]

A Jasper sub-block applies the following operations in order: a 1D convolution, batch normalization, ReLU, and dropout.

Parameters
  • in_channels (int) – number of channels in the input feature

  • out_channels (int) – number of channels produced by the convolution

  • kernel_size (int) – size of the convolving kernel

  • stride (int) – stride of the convolution. (default: 1)

  • dilation (int) – spacing between kernel elements. (default: 1)

  • padding (int) – zero-padding added to both sides of the input. (default: 0)

  • bias (bool) – if True, adds a learnable bias to the output. (default: False)

  • dropout_p (float) – dropout probability (default: 0.2)

  • activation (str) – activation function (default: 'relu')

Inputs: inputs, input_lengths, residual
  • inputs: tensor containing the input sequences

  • input_lengths: tensor containing the length of each input sequence

  • residual: tensor containing the residual

Returns: output, output_lengths
  • output: tensor containing the output sequences

  • output_lengths: tensor containing the length of each output sequence
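The operations of a sub-block and the way output_lengths follows from the convolution parameters can be sketched in plain PyTorch. `SubBlockSketch` is a hypothetical stand-in, not the kospeech implementation: it omits the residual input, hard-codes ReLU, and assumes inputs laid out as (batch, channels, time), which is what `nn.Conv1d` expects. The length computation uses the output-size formula from the `torch.nn.Conv1d` documentation.

```python
import torch
import torch.nn as nn


class SubBlockSketch(nn.Module):
    """Illustrative stand-in: Conv1d -> BatchNorm1d -> ReLU -> Dropout."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, dilation=1, padding=0, bias=False, dropout_p=0.2):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.dilation = dilation
        self.padding = padding
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=padding,
                              dilation=dilation, bias=bias)
        self.batch_norm = nn.BatchNorm1d(out_channels)
        self.activation = nn.ReLU()
        self.dropout = nn.Dropout(p=dropout_p)

    def forward(self, inputs, input_lengths):
        # inputs: (batch, channels, time); the residual path is omitted here.
        outputs = self.conv(inputs)
        outputs = self.batch_norm(outputs)
        outputs = self.dropout(self.activation(outputs))

        # Conv1d length formula:
        # floor((L + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1
        numerator = (input_lengths + 2 * self.padding
                     - self.dilation * (self.kernel_size - 1) - 1)
        output_lengths = torch.div(numerator, self.stride, rounding_mode="floor") + 1
        return outputs, output_lengths


sub_block = SubBlockSketch(in_channels=80, out_channels=256,
                           kernel_size=11, stride=2, padding=5)
inputs = torch.randn(3, 80, 120)                 # three sequences padded to 120 frames
input_lengths = torch.tensor([120, 95, 60])      # true lengths before padding
outputs, output_lengths = sub_block(inputs, input_lengths)
print(outputs.shape, output_lengths.tolist())
```

With a stride of 2, the time axis is roughly halved, and the per-sequence lengths shrink by the same rule as the padded tensor.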