Interface

Model
class kospeech.models.model.EncoderDecoderModel(encoder: kospeech.models.encoder.BaseEncoder, decoder: kospeech.models.decoder.BaseDecoder)

Super class of KoSpeech's Encoder-Decoder models.

forward(inputs: torch.Tensor, input_lengths: torch.Tensor, targets: torch.Tensor, *args) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Forward propagate an (inputs, targets) pair for training.

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).
- targets (torch.LongTensor) – A target sequence passed to the decoder. LongTensor of size (batch, seq_length).

Returns (Tensor, Tensor, Tensor)
- predicted_log_probs (torch.FloatTensor): Log probabilities of model predictions.
- encoder_output_lengths (torch.LongTensor): The lengths of the encoder outputs, of size (batch).
- encoder_log_probs (torch.FloatTensor): Log probabilities of encoder outputs, passed to the CTC loss. None if joint_ctc_attention is False.

recognize(inputs: torch.Tensor, input_lengths: torch.Tensor) → torch.Tensor

Recognize input speech. This method runs the encoder's forward() followed by the decoder's decode().

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).

Returns
Result of model predictions.

Return type
predictions (torch.FloatTensor)
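To make the forward() contract concrete, here is a minimal sketch using toy stand-ins for the encoder and decoder (ToyEncoder and ToyDecoder are illustrative names, not KoSpeech classes); only the documented tensor shapes are taken from the API above.

```python
import torch
import torch.nn as nn

# Hypothetical minimal stand-ins illustrating the EncoderDecoderModel contract.
# Shapes follow the documented (batch, seq_length, dimension) convention.
class ToyEncoder(nn.Module):
    def __init__(self, input_dim: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(input_dim, d_model)

    def forward(self, inputs, input_lengths):
        # (batch, seq_length, input_dim) -> (batch, seq_length, d_model);
        # third value would carry CTC log-probs when joint_ctc_attention is True
        return self.proj(inputs), input_lengths, None

class ToyDecoder(nn.Module):
    def __init__(self, d_model: int, num_classes: int):
        super().__init__()
        self.out = nn.Linear(d_model, num_classes)

    def forward(self, targets, encoder_outputs):
        # targets are unused in this sketch; emit per-step class log-probs
        return self.out(encoder_outputs).log_softmax(dim=-1)

batch, seq_length, input_dim, d_model, num_classes = 2, 5, 8, 16, 10
inputs = torch.randn(batch, seq_length, input_dim)
input_lengths = torch.full((batch,), seq_length, dtype=torch.long)
targets = torch.randint(0, num_classes, (batch, seq_length))

encoder, decoder = ToyEncoder(input_dim, d_model), ToyDecoder(d_model, num_classes)
encoder_outputs, encoder_output_lengths, encoder_log_probs = encoder(inputs, input_lengths)
predicted_log_probs = decoder(targets, encoder_outputs)
# predicted_log_probs: (batch, seq_length, num_classes)
```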
class kospeech.models.model.EncoderModel

Super class of KoSpeech's encoder-only models.

decode(predicted_log_probs: torch.Tensor) → torch.Tensor

Decode encoder outputs.

Parameters
- predicted_log_probs (torch.FloatTensor) – Log probabilities of model predictions. FloatTensor of size (batch, seq_length, dimension).

Returns
Result of model predictions.

Return type
predictions (torch.FloatTensor)

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]

Forward propagate inputs for CTC training.

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).

Returns
- predicted_log_probs (torch.FloatTensor): Log probabilities of model predictions.
- output_lengths (torch.LongTensor): The lengths of the output sequences, of size (batch).

Return type
(Tensor, Tensor)

recognize(inputs: torch.Tensor, input_lengths: torch.Tensor) → torch.Tensor

Recognize input speech.

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).

Returns
Result of model predictions.

Return type
predictions (torch.FloatTensor)
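A decode() step over per-frame log-probabilities can be as simple as a greedy argmax; the sketch below assumes that behavior (the actual KoSpeech implementation may differ) and only relies on the documented (batch, seq_length, num_classes) input shape.

```python
import torch

# Hedged sketch of an encoder-only decode(): greedy selection over the
# (batch, seq_length, num_classes) log-probabilities returned by forward().
def decode(predicted_log_probs: torch.Tensor) -> torch.Tensor:
    # argmax over the class dimension -> (batch, seq_length) index predictions
    return predicted_log_probs.max(dim=-1)[1]

log_probs = torch.log_softmax(torch.randn(2, 7, 5), dim=-1)
predictions = decode(log_probs)
# predictions: (batch, seq_length) LongTensor of class indices
```

For CTC models, a real decoder would additionally collapse repeats and strip blank labels after this argmax.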
class kospeech.models.model.TransducerModel(encoder: kospeech.models.encoder.TransducerEncoder, decoder: kospeech.models.decoder.TransducerDecoder, d_model: int, num_classes: int)

Super class of KoSpeech's Transducer models.

decode(encoder_output: torch.Tensor, max_length: int) → torch.Tensor

Decode encoder outputs.

Parameters
- encoder_output (torch.FloatTensor) – An output sequence of the encoder. FloatTensor of size (seq_length, dimension).
- max_length (int) – Maximum number of decoding time steps.

Returns
Log probabilities of model predictions.

Return type
predicted_log_probs (torch.FloatTensor)

forward(inputs: torch.Tensor, input_lengths: torch.Tensor, targets: torch.Tensor, target_lengths: torch.Tensor) → torch.Tensor

Forward propagate an (inputs, targets) pair for training.

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).
- targets (torch.LongTensor) – A target sequence passed to the decoder. LongTensor of size (batch, seq_length).
- target_lengths (torch.LongTensor) – The lengths of the target sequences, of size (batch).

Returns
Result of model predictions.

Return type
predictions (torch.FloatTensor)

joint(encoder_outputs: torch.Tensor, decoder_outputs: torch.Tensor) → torch.Tensor

Join encoder_outputs and decoder_outputs.

Parameters
- encoder_outputs (torch.FloatTensor) – An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension).
- decoder_outputs (torch.FloatTensor) – An output sequence of the decoder. FloatTensor of size (batch, seq_length, dimension).

Returns
The joint of encoder_outputs and decoder_outputs.

Return type
outputs (torch.FloatTensor)

recognize(inputs: torch.Tensor, input_lengths: torch.Tensor)

Recognize input speech. This method runs the encoder's forward() followed by the decoder's decode().

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).

Returns
Result of model predictions.

Return type
predictions (torch.FloatTensor)
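The joint() operation is the defining piece of a transducer: encoder frames and decoder label states, which generally have different sequence lengths T and U, are combined into a (batch, T, U, num_classes) grid. The sketch below (ToyJoint is an illustrative name, not the KoSpeech class) shows one common realization via broadcast-and-concatenate followed by a linear projection.

```python
import torch
import torch.nn as nn

# Hedged sketch of a transducer joint network: encoder frames (batch, T, d_model)
# and decoder states (batch, U, d_model) are broadcast to a (batch, T, U, ...)
# grid and projected to per-(frame, label) class log-probabilities.
class ToyJoint(nn.Module):
    def __init__(self, d_model: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(d_model * 2, num_classes)

    def forward(self, encoder_outputs, decoder_outputs):
        T, U = encoder_outputs.size(1), decoder_outputs.size(1)
        enc = encoder_outputs.unsqueeze(2).expand(-1, -1, U, -1)  # (batch, T, U, d_model)
        dec = decoder_outputs.unsqueeze(1).expand(-1, T, -1, -1)  # (batch, T, U, d_model)
        return self.fc(torch.cat((enc, dec), dim=-1)).log_softmax(dim=-1)

joint = ToyJoint(d_model=16, num_classes=10)
outputs = joint(torch.randn(2, 12, 16), torch.randn(2, 4, 16))
# outputs: (batch, T, U, num_classes)
```

Addition of projected encoder and decoder features is an equally common design choice in place of concatenation.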
Encoder
class kospeech.models.encoder.BaseEncoder(input_dim: int, extractor: str = 'vgg', d_model: int = None, num_classes: int = None, dropout_p: float = None, activation: str = 'hardtanh', joint_ctc_attention: bool = False)

ASR encoder super class for KoSpeech model implementations.

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Forward propagate inputs for encoder training.

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).

Returns
- encoder_outputs (torch.FloatTensor): An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension).
- encoder_output_lengths (torch.LongTensor): The lengths of the encoder outputs, of size (batch).
- encoder_log_probs (torch.FloatTensor): Log probabilities of encoder outputs, passed to the CTC loss. None if joint_ctc_attention is False.

Return type
(Tensor, Tensor, Tensor)
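The three-valued return contract above, where the third element is populated only when joint_ctc_attention is enabled, can be sketched as follows (SketchEncoder is an illustrative minimal class, not the KoSpeech implementation):

```python
import torch
import torch.nn as nn

# Illustrative sketch of the BaseEncoder forward contract: the third return
# value carries CTC log-probs only when joint_ctc_attention is enabled.
class SketchEncoder(nn.Module):
    def __init__(self, input_dim: int, d_model: int, num_classes: int,
                 joint_ctc_attention: bool = False):
        super().__init__()
        self.proj = nn.Linear(input_dim, d_model)
        self.joint_ctc_attention = joint_ctc_attention
        if joint_ctc_attention:
            self.fc = nn.Linear(d_model, num_classes)

    def forward(self, inputs, input_lengths):
        encoder_outputs = self.proj(inputs)  # (batch, seq_length, d_model)
        encoder_log_probs = (
            self.fc(encoder_outputs).log_softmax(dim=-1)
            if self.joint_ctc_attention else None
        )
        return encoder_outputs, input_lengths, encoder_log_probs

encoder = SketchEncoder(input_dim=8, d_model=16, num_classes=10, joint_ctc_attention=True)
inputs = torch.randn(2, 5, 8)
input_lengths = torch.tensor([5, 3])
encoder_outputs, output_lengths, encoder_log_probs = encoder(inputs, input_lengths)
# With the flag off, the third return value is None:
_, _, no_ctc = SketchEncoder(8, 16, 10)(inputs, input_lengths)
```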
class kospeech.models.encoder.EncoderInterface

Base interface of the encoder.

forward(inputs: torch.Tensor, input_lengths: torch.Tensor)

Forward propagate for encoder training.

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).
class kospeech.models.encoder.TransducerEncoder

ASR Transducer encoder super class for KoSpeech model implementations.

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → torch.Tensor

Forward propagate inputs for encoder training.

Parameters
- inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).

Returns (Tensor, Tensor)
- outputs (torch.FloatTensor): An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension).
- output_lengths (torch.LongTensor): The lengths of the output sequences, of size (batch).
Decoder
class kospeech.models.decoder.BaseDecoder

ASR decoder super class for KoSpeech model implementations.

decode(encoder_outputs: torch.Tensor, *args) → torch.Tensor

Decode encoder outputs.

Parameters
- encoder_outputs (torch.FloatTensor) – An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension).

Returns
Log probabilities of model predictions.

Return type
predicted_log_probs (torch.FloatTensor)

forward(targets: torch.Tensor, encoder_outputs: torch.Tensor, **kwargs) → torch.Tensor

Forward propagate encoder_outputs for training.

Parameters
- targets (torch.LongTensor) – A target sequence passed to the decoder. LongTensor of size (batch, seq_length).
- encoder_outputs (torch.FloatTensor) – An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension).

Returns
Log probabilities of model predictions.

Return type
predicted_log_probs (torch.FloatTensor)
class kospeech.models.decoder.TransducerDecoder

ASR Transducer decoder super class for KoSpeech model implementations.

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]

Forward propagate inputs (targets) for training.

Parameters
- inputs (torch.LongTensor) – A target sequence passed to the decoder. LongTensor of size (batch, seq_length).
- input_lengths (torch.LongTensor) – The lengths of the input sequences, of size (batch).

Returns
- decoder_outputs (torch.FloatTensor): An output sequence of the decoder. FloatTensor of size (batch, seq_length, dimension).
- hidden_states (torch.FloatTensor): A hidden state of the decoder. FloatTensor of size (batch, seq_length, dimension).

Return type
(Tensor, Tensor)
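A transducer decoder of this shape is often realized as a label-embedding plus recurrent prediction network. The following sketch (SketchPredictionNet is an illustrative name, not the KoSpeech class) assumes an LSTM; note that an LSTM's hidden state is the pair (h_n, c_n) rather than a single tensor.

```python
import torch
import torch.nn as nn

# Hedged sketch of a transducer prediction network: label indices
# (batch, seq_length) -> decoder states (batch, seq_length, d_model).
class SketchPredictionNet(nn.Module):
    def __init__(self, num_classes: int, d_model: int):
        super().__init__()
        self.embedding = nn.Embedding(num_classes, d_model)
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)

    def forward(self, inputs, input_lengths):
        # input_lengths kept for interface parity; unused in this sketch
        embedded = self.embedding(inputs)  # (batch, seq_length, d_model)
        decoder_outputs, hidden_states = self.rnn(embedded)
        return decoder_outputs, hidden_states

net = SketchPredictionNet(num_classes=10, d_model=16)
targets = torch.randint(0, 10, (2, 6))
target_lengths = torch.tensor([6, 4])
decoder_outputs, hidden_states = net(targets, target_lengths)
# decoder_outputs: (batch, seq_length, d_model); hidden_states: (h_n, c_n)
```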