Interface

Model

class kospeech.models.model.BaseModel[source]

count_parameters() → int[source]

Count parameters of the model

update_dropout(dropout_p: float) → None[source]

Update dropout probability of the model

class kospeech.models.model.EncoderDecoderModel(encoder: kospeech.models.encoder.BaseEncoder, decoder: kospeech.models.decoder.BaseDecoder)[source]

Super class of KoSpeech’s Encoder-Decoder Models

count_parameters() → int[source]

Count parameters of the model

forward(inputs: torch.Tensor, input_lengths: torch.Tensor, targets: torch.Tensor, *args) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Forward propagate an (inputs, targets) pair for training.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

  • targets (torch.LongTensor) – A target sequence passed to the decoder. LongTensor of size (batch, seq_length)

Returns

  • predicted_log_probs (torch.FloatTensor): Log probability of model predictions.

  • encoder_output_lengths (torch.LongTensor): The length of encoder outputs. (batch)

  • encoder_log_probs (torch.FloatTensor): Log probability of encoder outputs, passed to the CTC loss.

    If joint_ctc_attention is False, this is None.

Return type

(Tensor, Tensor, Tensor)
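
A minimal training-call sketch, assuming model is an already-constructed EncoderDecoderModel subclass and the batch tensors are synthetic placeholders (all names below are hypothetical, not part of this API):

    import torch

    # Hypothetical setup: `model` is any EncoderDecoderModel subclass.
    batch, seq_length, dimension, num_classes = 4, 100, 80, 2000
    inputs = torch.randn(batch, seq_length, dimension)                  # padded features
    input_lengths = torch.full((batch,), seq_length, dtype=torch.long)
    targets = torch.randint(0, num_classes, (batch, 20), dtype=torch.long)

    predicted_log_probs, encoder_output_lengths, encoder_log_probs = model(
        inputs, input_lengths, targets,
    )
    # encoder_log_probs is None unless the model was built with
    # joint_ctc_attention=True.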

recognize(inputs: torch.Tensor, input_lengths: torch.Tensor) → torch.Tensor[source]

Recognize input speech. This method runs the encoder's forward() followed by the decoder's decode().

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

Returns

Result of model predictions.

Return type

  • predictions (torch.FloatTensor)
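
A hedged inference sketch; model, inputs, and input_lengths are the same hypothetical objects as in the forward() example above:

    model.eval()
    with torch.no_grad():
        predictions = model.recognize(inputs, input_lengths)
    # `predictions` holds one decoded token sequence per batch element.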

set_decoder(decoder)[source]

Setter for decoder

set_encoder(encoder)[source]

Setter for encoder

update_dropout(dropout_p) → None[source]

Update dropout probability of the model

class kospeech.models.model.EncoderModel[source]

Super class of KoSpeech’s Encoder-only Models

decode(predicted_log_probs: torch.Tensor) → torch.Tensor[source]

Decode encoder_outputs.

Parameters

predicted_log_probs (torch.FloatTensor) – Log probability of model predictions. FloatTensor of size (batch, seq_length, dimension)

Returns

Result of model predictions.

Return type

  • predictions (torch.FloatTensor)
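
In most encoder-only setups this greedy decode amounts to an argmax over the class dimension; a self-contained sketch of that operation (shapes are illustrative, not KoSpeech's exact implementation):

    import torch

    predicted_log_probs = torch.randn(4, 100, 2000).log_softmax(dim=-1)
    predictions = predicted_log_probs.max(dim=-1)[1]   # (batch, seq_length)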

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Forward propagate inputs for CTC training.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

Returns

  • predicted_log_probs (torch.FloatTensor): Log probability of model predictions.

  • output_lengths (torch.LongTensor): The length of output tensor. (batch)

Return type

(Tensor, Tensor)
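
A hedged CTC training sketch; model is a hypothetical EncoderModel subclass and the batch is synthetic. Note that torch.nn.CTCLoss expects log probabilities in (seq_length, batch, num_classes) order:

    import torch
    import torch.nn as nn

    criterion = nn.CTCLoss(blank=0, zero_infinity=True)

    inputs = torch.randn(4, 100, 80)
    input_lengths = torch.full((4,), 100, dtype=torch.long)
    targets = torch.randint(1, 2000, (4, 20), dtype=torch.long)
    target_lengths = torch.full((4,), 20, dtype=torch.long)

    predicted_log_probs, output_lengths = model(inputs, input_lengths)
    loss = criterion(
        predicted_log_probs.transpose(0, 1),  # -> (seq_length, batch, num_classes)
        targets, output_lengths, target_lengths,
    )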

recognize(inputs: torch.Tensor, input_lengths: torch.Tensor) → torch.Tensor[source]

Recognize input speech.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

Returns

Result of model predictions.

Return type

  • predictions (torch.FloatTensor)

set_decoder(decoder)[source]

Setter for decoder

class kospeech.models.model.TransducerModel(encoder: kospeech.models.encoder.TransducerEncoder, decoder: kospeech.models.decoder.TransducerDecoder, d_model: int, num_classes: int)[source]

Super class of KoSpeech’s Transducer Models

count_parameters() → int[source]

Count parameters of the model

decode(encoder_output: torch.Tensor, max_length: int) → torch.Tensor[source]

Decode encoder_outputs.

Parameters
  • encoder_output (torch.FloatTensor) – An output sequence of the encoder. FloatTensor of size (seq_length, dimension)

  • max_length (int) – Maximum decoding time step

Returns

Log probability of model predictions.

Return type

  • predicted_log_probs (torch.FloatTensor)

forward(inputs: torch.Tensor, input_lengths: torch.Tensor, targets: torch.Tensor, target_lengths: torch.Tensor) → torch.Tensor[source]

Forward propagate an (inputs, targets) pair for training.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

  • targets (torch.LongTensor) – A target sequence passed to the decoder. LongTensor of size (batch, seq_length)

  • target_lengths (torch.LongTensor) – The length of target tensor. (batch)

Returns

Result of model predictions.

Return type

  • predictions (torch.FloatTensor)
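
A hedged training-call sketch; model and the batch tensors are hypothetical, and the choice of transducer criterion (for example, an RNN-T loss) is deliberately left open:

    import torch

    # Hypothetical batch following the parameter docs above.
    inputs = torch.randn(4, 100, 80)
    input_lengths = torch.full((4,), 100, dtype=torch.long)
    targets = torch.randint(1, 2000, (4, 20), dtype=torch.long)
    target_lengths = torch.full((4,), 20, dtype=torch.long)

    outputs = model(inputs, input_lengths, targets, target_lengths)
    # `outputs` is then handed to a transducer criterion; the exact shape
    # convention depends on the concrete subclass.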

joint(encoder_outputs: torch.Tensor, decoder_outputs: torch.Tensor) → torch.Tensor[source]

Join encoder_outputs and decoder_outputs.

Parameters
  • encoder_outputs (torch.FloatTensor) – An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension)

  • decoder_outputs (torch.FloatTensor) – An output sequence of the decoder. FloatTensor of size (batch, seq_length, dimension)

Returns

Outputs of the joint of encoder_outputs and decoder_outputs.

Return type

  • outputs (torch.FloatTensor)
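
A common way to realize a joint network is to combine the two sequences over a (batch, T, U) grid and project to the vocabulary. The sketch below shows that general pattern with a broadcast add and a hypothetical linear projection; it is illustrative, not necessarily KoSpeech's exact joint():

    import torch
    import torch.nn as nn

    d_model, num_classes = 512, 2000
    fc = nn.Linear(d_model, num_classes)            # hypothetical projection

    encoder_outputs = torch.randn(4, 100, d_model)  # (batch, T, dimension)
    decoder_outputs = torch.randn(4, 21, d_model)   # (batch, U, dimension)

    # Broadcast to (batch, T, U, dimension), combine, project, normalize.
    grid = encoder_outputs.unsqueeze(2) + decoder_outputs.unsqueeze(1)
    outputs = fc(grid).log_softmax(dim=-1)          # (batch, T, U, num_classes)

Other joint networks concatenate the two feature vectors and project from twice the model dimension instead of adding; the resulting grid shape is the same either way.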

recognize(inputs: torch.Tensor, input_lengths: torch.Tensor)[source]

Recognize input speech. This method runs the encoder's forward() followed by the decoder's decode().

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

Returns

Result of model predictions.

Return type

  • predictions (torch.FloatTensor)
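
A hedged inference sketch, reusing the hypothetical model and batch from the forward() example above:

    model.eval()
    with torch.no_grad():
        predictions = model.recognize(inputs, input_lengths)
    # Internally this runs the encoder forward pass, then decode() on
    # its outputs.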

set_decoder(decoder)[source]

Setter for decoder

set_encoder(encoder)[source]

Setter for encoder

update_dropout(dropout_p) → None[source]

Update dropout probability of the model

Encoder

class kospeech.models.encoder.BaseEncoder(input_dim: int, extractor: str = 'vgg', d_model: int = None, num_classes: int = None, dropout_p: float = None, activation: str = 'hardtanh', joint_ctc_attention: bool = False)[source]

ASR Encoder Super Class for KoSpeech model implementation

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Forward propagate inputs for encoder training.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

Returns

  • encoder_outputs (torch.FloatTensor): An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension)

  • encoder_output_lengths (torch.LongTensor): The length of encoder outputs. (batch)

  • encoder_log_probs (torch.FloatTensor): Log probability of encoder outputs, passed to the CTC loss.

    If joint_ctc_attention is False, this is None.

Return type

(Tensor, Tensor, Tensor)
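
A sketch of unpacking the three-way return; encoder is a hypothetical BaseEncoder subclass constructed with joint_ctc_attention=True, and inputs/input_lengths follow the documented shapes:

    outputs, output_lengths, encoder_log_probs = encoder(inputs, input_lengths)
    if encoder_log_probs is not None:
        # Produced only with joint_ctc_attention=True; reorder for
        # nn.CTCLoss, which expects (seq_length, batch, num_classes).
        ctc_log_probs = encoder_log_probs.transpose(0, 1)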

class kospeech.models.encoder.EncoderInterface[source]

Base Interface of Encoder

count_parameters() → int[source]

Count parameters of encoder

forward(inputs: torch.Tensor, input_lengths: torch.Tensor)[source]

Forward propagate for encoder training.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

update_dropout(dropout_p: float) → None[source]

Update dropout probability of encoder

class kospeech.models.encoder.TransducerEncoder[source]

ASR Transducer Encoder Super class for KoSpeech model implementation

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → torch.Tensor[source]

Forward propagate inputs for encoder training.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to the encoder. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

Returns

  • outputs (torch.FloatTensor): An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension)

  • output_lengths (torch.LongTensor): The length of output tensor. (batch)

Return type

(Tensor, Tensor)

Decoder

class kospeech.models.decoder.BaseDecoder[source]

ASR Decoder Super Class for KoSpeech model implementation

decode(encoder_outputs: torch.Tensor, *args) → torch.Tensor[source]

Decode encoder_outputs.

Parameters

encoder_outputs (torch.FloatTensor) – An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension)

Returns

Log probability of model predictions.

Return type

  • predicted_log_probs (torch.FloatTensor)

forward(targets: torch.Tensor, encoder_outputs: torch.Tensor, **kwargs) → torch.Tensor[source]

Forward propagate encoder_outputs for training.

Parameters
  • targets (torch.LongTensor) – A target sequence passed to the decoder. LongTensor of size (batch, seq_length)

  • encoder_outputs (torch.FloatTensor) – An output sequence of the encoder. FloatTensor of size (batch, seq_length, dimension)

Returns

Log probability of model predictions.

Return type

  • predicted_log_probs (torch.FloatTensor)
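
A hedged teacher-forcing sketch; decoder, targets, and encoder_outputs are hypothetical stand-ins with the documented shapes, and any target shifting or trimming required by a concrete subclass is glossed over:

    import torch.nn.functional as F

    predicted_log_probs = decoder(targets, encoder_outputs)
    # With log probabilities of shape (batch, seq_length, num_classes),
    # the usual objective is NLL over the flattened time steps.
    loss = F.nll_loss(
        predicted_log_probs.reshape(-1, predicted_log_probs.size(-1)),
        targets.reshape(-1),
    )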

class kospeech.models.decoder.DecoderInterface[source]

count_parameters() → int[source]

Count parameters of decoder

update_dropout(dropout_p: float) → None[source]

Update dropout probability of decoder

class kospeech.models.decoder.TransducerDecoder[source]

ASR Transducer Decoder Super Class for KoSpeech model implementation

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Forward propagate inputs (targets) for training.

Parameters
  • inputs (torch.LongTensor) – A target sequence passed to the decoder. LongTensor of size (batch, seq_length)

  • input_lengths (torch.LongTensor) – The length of input tensor. (batch)

Returns

  • decoder_outputs (torch.FloatTensor): An output sequence of the decoder. FloatTensor of size (batch, seq_length, dimension)

  • hidden_states (torch.FloatTensor): A hidden state of the decoder. FloatTensor of size (batch, seq_length, dimension)

Return type

(Tensor, Tensor)
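
A hedged call sketch; decoder is a hypothetical TransducerDecoder subclass and the target batch is synthetic:

    import torch

    targets = torch.randint(1, 2000, (4, 20), dtype=torch.long)
    target_lengths = torch.full((4,), 20, dtype=torch.long)

    decoder_outputs, hidden_states = decoder(targets, target_lengths)
    # decoder_outputs (batch, seq_length, dimension) pairs with encoder
    # outputs in TransducerModel.joint().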