Jasper
Jasper
- class kospeech.models.jasper.model.Jasper(num_classes: int, version: str = '10x5', device: torch.device = 'cuda')
Jasper: An End-to-End Convolutional Neural Acoustic Model. Jasper (Just Another Speech Recognizer) is an ASR model composed of 54 layers, proposed by NVIDIA. Jasper achieved a word error rate (WER) below 3% on the LibriSpeech dataset. More details: https://arxiv.org/pdf/1904.03288.pdf
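- Parameters
num_classes (int) – number of classification classes
version (str) – Jasper version, written as BxR, where B is the number of blocks and R is the number of sub-blocks per block. (default: '10x5')
device (torch.device) – device on which the model runs. (default: 'cuda')
- Inputs: inputs, input_lengths
inputs: tensor containing the input sequence vectors
input_lengths: tensor containing the sequence lengths
- Returns: output, output_lengths
output: tensor containing the output sequence vectors
output_lengths: tensor containing the output sequence lengths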
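The 54-layer figure can be related to the default version string. The sketch below assumes the BxR naming from the Jasper paper (B blocks, each with R sub-blocks) and the paper's layout of one convolution before the blocks and three after them. `parse_version` and `count_conv_layers` are hypothetical helpers written for illustration; they are not part of kospeech.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a 'BxR' version string into (num_blocks, num_sub_blocks)."""
    num_blocks, num_sub_blocks = version.lower().split("x")
    return int(num_blocks), int(num_sub_blocks)


def count_conv_layers(version: str, prologue: int = 1, epilogue: int = 3) -> int:
    """Count convolutional layers, assuming `prologue` layers before the
    blocks and `epilogue` layers after them (assumed from the paper)."""
    num_blocks, num_sub_blocks = parse_version(version)
    return prologue + num_blocks * num_sub_blocks + epilogue


layers = count_conv_layers("10x5")
print(layers)  # 1 + 10 * 5 + 3 = 54
```

Under the same assumptions, a smaller '5x3' configuration would count 19 layers.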
- forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]
Forward propagates inputs for CTC training.
- Parameters
inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this is a padded FloatTensor of size (batch, seq_length, dimension).
input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).
- Returns
predicted_log_prob (torch.FloatTensor): Log probabilities of the model predictions.
output_lengths (torch.LongTensor): The lengths of the output sequences, of size (batch).
- Return type
(Tensor, Tensor)
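A minimal sketch of how the two return values could feed a CTC loss. `TinyAcousticModel` is a hypothetical stand-in that only mirrors the `forward(inputs, input_lengths)` signature; it is not Jasper. The batch-first layout of the log-probabilities, the blank index 0, and all sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_length, dimension, num_classes = 2, 50, 80, 10


class TinyAcousticModel(nn.Module):
    """Hypothetical stand-in sharing Jasper's forward signature."""

    def __init__(self, dimension: int, num_classes: int) -> None:
        super().__init__()
        self.proj = nn.Linear(dimension, num_classes)

    def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor):
        # (batch, seq_length, num_classes) log-probabilities
        log_probs = self.proj(inputs).log_softmax(dim=-1)
        # The stand-in does not shorten the time axis, so lengths are unchanged.
        return log_probs, input_lengths


model = TinyAcousticModel(dimension, num_classes)
inputs = torch.randn(batch, seq_length, dimension)        # padded batch
input_lengths = torch.tensor([50, 42], dtype=torch.long)  # true lengths
targets = torch.randint(1, num_classes, (batch, 12), dtype=torch.long)
target_lengths = torch.tensor([12, 9], dtype=torch.long)

log_probs, output_lengths = model(inputs, input_lengths)

# nn.CTCLoss expects log-probabilities shaped (time, batch, classes).
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs.transpose(0, 1), targets, output_lengths, target_lengths)
print(loss.item())
```

The loss takes `output_lengths` rather than `input_lengths`, since a model that strides over time returns fewer frames than it receives.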
Sublayers
- class kospeech.models.jasper.sublayers.JasperBlock(num_sub_blocks: int, in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, bias: bool = True, dropout_p: float = 0.2, activation: str = 'relu')
Jasper Block: The Jasper Block consists of R Jasper sub-blocks.
- Parameters
num_sub_blocks (int) – number of sub-blocks
in_channels (int) – number of channels in the input feature
out_channels (int) – number of channels produced by the convolution
kernel_size (int) – size of the convolving kernel
stride (int) – stride of the convolution. (default: 1)
dilation (int) – spacing between kernel elements. (default: 1)
bias (bool) – if True, adds a learnable bias to the output. (default: True)
dropout_p (float) – probability of dropout. (default: 0.2)
activation (str) – activation function. (default: 'relu')
- Inputs: inputs, input_lengths, residual
inputs: tensor containing the input sequence vectors
input_lengths: tensor containing the sequence lengths
residual: tensor containing the residual vectors
- Returns: output, output_lengths
output: tensor containing the output sequence vectors
output_lengths: tensor containing the output sequence lengths
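Adding a residual to the block output requires both tensors to share the same time length. The sketch below checks one way this can hold, assuming stride 1, an odd kernel size, and a padding of `(kernel_size // 2) * dilation`. That padding rule is an assumption and is not stated in this reference; `conv1d_output_length` and `same_padding` are hypothetical helpers. The length formula itself is the one given in the PyTorch `Conv1d` documentation.

```python
def conv1d_output_length(length: int, kernel_size: int, stride: int = 1,
                         dilation: int = 1, padding: int = 0) -> int:
    """Output length of a 1D convolution (formula from the Conv1d docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def same_padding(kernel_size: int, dilation: int = 1) -> int:
    """Assumed padding rule that preserves length for odd kernels at stride 1."""
    return (kernel_size // 2) * dilation


input_length = 100
output_lengths = []
for kernel_size, dilation in [(11, 1), (13, 1), (29, 2)]:
    padding = same_padding(kernel_size, dilation)
    output_lengths.append(
        conv1d_output_length(input_length, kernel_size,
                             dilation=dilation, padding=padding)
    )

print(output_lengths)  # every entry equals input_length
```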
- class kospeech.models.jasper.sublayers.JasperSubBlock(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, padding: int = 0, bias: bool = False, dropout_p: float = 0.2, activation: str = 'relu')
Jasper sub-block applies the following operations: a 1D-convolution, batch norm, ReLU, and dropout.
- Parameters
in_channels (int) – number of channels in the input feature
out_channels (int) – number of channels produced by the convolution
kernel_size (int) – size of the convolving kernel
stride (int) – stride of the convolution. (default: 1)
dilation (int) – spacing between kernel elements. (default: 1)
padding (int) – zero-padding added to both sides of the input. (default: 0)
bias (bool) – if True, adds a learnable bias to the output. (default: False)
dropout_p (float) – probability of dropout. (default: 0.2)
activation (str) – activation function. (default: 'relu')
- Inputs: inputs, input_lengths, residual
inputs: tensor containing the input sequence vectors
input_lengths: tensor containing the sequence lengths
residual: tensor containing the residual vectors
- Returns: output, output_lengths
output: tensor containing the output sequence vectors
output_lengths: tensor containing the output sequence lengths
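The operation order described for the sub-block can be sketched with standard PyTorch layers. `SubBlockSketch` is a hypothetical illustration, not the kospeech implementation. It assumes channel-first inputs of shape (batch, channels, time), ignores padding masks, and assumes the optional residual is added after batch norm and before the activation, as the Jasper paper describes for the last sub-block of a block.

```python
from typing import Optional, Tuple

import torch
import torch.nn as nn


class SubBlockSketch(nn.Module):
    """Conv1d -> BatchNorm1d -> (+ residual) -> ReLU -> Dropout."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int,
                 stride: int = 1, dilation: int = 1, padding: int = 0,
                 bias: bool = False, dropout_p: float = 0.2) -> None:
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=padding,
                              dilation=dilation, bias=bias)
        self.batch_norm = nn.BatchNorm1d(out_channels)
        self.activation = nn.ReLU()
        self.dropout = nn.Dropout(p=dropout_p)

    def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor,
                residual: Optional[torch.Tensor] = None
                ) -> Tuple[torch.Tensor, torch.Tensor]:
        outputs = self.batch_norm(self.conv(inputs))
        if residual is not None:
            outputs = outputs + residual  # assumed placement of the residual
        outputs = self.dropout(self.activation(outputs))

        # Update lengths with the Conv1d output-length formula.
        k, s = self.conv.kernel_size[0], self.conv.stride[0]
        d, p = self.conv.dilation[0], self.conv.padding[0]
        output_lengths = torch.div(input_lengths + 2 * p - d * (k - 1) - 1,
                                   s, rounding_mode="floor") + 1
        return outputs, output_lengths


block = SubBlockSketch(in_channels=80, out_channels=256, kernel_size=11,
                       stride=2, padding=5)
inputs = torch.randn(2, 80, 100)
input_lengths = torch.tensor([100, 73], dtype=torch.long)
outputs, output_lengths = block(inputs, input_lengths)
print(outputs.shape, output_lengths)
```

With stride 2, both the time axis of `outputs` and the entries of `output_lengths` are roughly halved, which is why the lengths are returned alongside the tensor.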