Deep Speech 2¶
class kospeech.models.deepspeech2.model.BNReluRNN(input_size: int, hidden_state_dim: int = 512, rnn_type: str = 'gru', bidirectional: bool = True, dropout_p: float = 0.1)¶

Recurrent neural network with a batch normalization layer and ReLU activation function.
- Parameters
input_size (int) – size of input
hidden_state_dim (int) – the number of features in the hidden state h
rnn_type (str, optional) – type of RNN cell (default: gru)
bidirectional (bool, optional) – if True, becomes a bidirectional encoder (default: True)
dropout_p (float, optional) – dropout probability (default: 0.1)
- Inputs: inputs, input_lengths
inputs (batch, time, dim): Tensor containing input vectors
input_lengths: Tensor containing sequence lengths
- Returns: outputs
outputs: Tensor produced by the BNReluRNN module
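For reference, below is a minimal, self-contained sketch of the batch-norm → ReLU → RNN pattern this module describes, written directly against PyTorch. The layer choices, normalization placement, and sequence-packing logic are illustrative assumptions and are not taken from kospeech's implementation.

import torch
import torch.nn as nn


class BNReluRNNSketch(nn.Module):
    """Illustrative batch-norm + ReLU + RNN block (not kospeech's implementation)."""

    def __init__(self, input_size: int, hidden_state_dim: int = 512,
                 rnn_type: str = 'gru', bidirectional: bool = True,
                 dropout_p: float = 0.1):
        super().__init__()
        rnn_cls = {'gru': nn.GRU, 'lstm': nn.LSTM, 'rnn': nn.RNN}[rnn_type]
        self.batch_norm = nn.BatchNorm1d(input_size)
        # Single-layer RNN: PyTorch ignores the dropout argument when num_layers == 1.
        self.rnn = rnn_cls(input_size=input_size, hidden_size=hidden_state_dim,
                           bidirectional=bidirectional, dropout=dropout_p,
                           batch_first=True)

    def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, time, dim); BatchNorm1d expects (batch, dim, time).
        outputs = torch.relu(self.batch_norm(inputs.transpose(1, 2))).transpose(1, 2)
        packed = nn.utils.rnn.pack_padded_sequence(
            outputs, input_lengths.cpu(), batch_first=True, enforce_sorted=False)
        packed, _ = self.rnn(packed)
        # outputs: (batch, time, hidden_state_dim * num_directions)
        outputs, _ = nn.utils.rnn.pad_packed_sequence(packed, batch_first=True)
        return outputs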
class kospeech.models.deepspeech2.model.DeepSpeech2(input_dim: int, num_classes: int, rnn_type='gru', num_rnn_layers: int = 5, rnn_hidden_dim: int = 512, dropout_p: float = 0.1, bidirectional: bool = True, activation: str = 'hardtanh', device: torch.device = 'cuda')¶

Deep Speech 2 model with configurable encoder and decoder. Paper: https://arxiv.org/abs/1512.02595
- Parameters
input_dim (int) – dimension of input vector
num_classes (int) – number of classification classes
rnn_type (str, optional) – type of RNN cell (default: gru)
num_rnn_layers (int, optional) – number of recurrent layers (default: 5)
rnn_hidden_dim (int) – the number of features in the hidden state h
dropout_p (float, optional) – dropout probability (default: 0.1)
bidirectional (bool, optional) – if True, becomes a bidirectional encoder (default: True)
activation (str) – type of activation function (default: hardtanh)
device (torch.device) – device - ‘cuda’ or ‘cpu’
- Inputs: inputs, input_lengths
inputs: padded tensor of input feature sequences, of size (batch, seq_length, dimension)
input_lengths: tensor of input sequence lengths, of size (batch)
- Returns: output
output: tensor containing the encoded features of the input sequence
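A construction example based on the constructor signature above; the feature dimension and vocabulary size are arbitrary illustrative values, not values prescribed by kospeech.

import torch
from kospeech.models.deepspeech2.model import DeepSpeech2

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = DeepSpeech2(
    input_dim=80,          # e.g. 80-dimensional filter-bank features (illustrative)
    num_classes=2000,      # output vocabulary size (illustrative)
    rnn_type='gru',
    num_rnn_layers=5,
    rnn_hidden_dim=512,
    dropout_p=0.1,
    bidirectional=True,
    activation='hardtanh',
    device=device,
).to(device)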
forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]¶

Forward propagates inputs for CTC training.
- Parameters
inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).
- Returns
predicted_log_probs (torch.FloatTensor): Log probabilities of model predictions.
output_lengths (torch.LongTensor): The lengths of the output sequences, of size (batch).
- Return type
(Tensor, Tensor)
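The sketch below shows one CTC training step built on the forward interface documented above. The blank-token index, the (batch, time, classes) layout of the returned log probabilities, and the random inputs and targets are assumptions made for illustration only.

import torch
import torch.nn as nn
from kospeech.models.deepspeech2.model import DeepSpeech2

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = DeepSpeech2(input_dim=80, num_classes=2000, device=device).to(device)

# Dummy padded feature batch: (batch, seq_length, dimension) and per-example lengths.
inputs = torch.randn(4, 400, 80, device=device)
input_lengths = torch.tensor([400, 360, 320, 300], device=device)

# forward returns (predicted_log_probs, output_lengths) as documented above.
predicted_log_probs, output_lengths = model(inputs, input_lengths)

# Dummy targets; in practice these are token id sequences. Target lengths must not
# exceed the subsampled output lengths for CTC to be well defined.
targets = torch.randint(1, 2000, (4, 20), dtype=torch.long, device=device)
target_lengths = torch.full((4,), 20, dtype=torch.long, device=device)

# nn.CTCLoss expects (time, batch, classes); a (batch, time, classes) layout of the
# model output is assumed here, hence the transpose. Blank index assumed to be 0.
criterion = nn.CTCLoss(blank=0, zero_infinity=True)
loss = criterion(predicted_log_probs.transpose(0, 1), targets,
                 output_lengths, target_lengths)
loss.backward()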