Data

Data Loader

class lightning_asr.data.data_loader.AudioDataLoader(*args, **kwargs)[source]

Audio Data Loader

class lightning_asr.data.data_loader.BucketingSampler(data_source, batch_size: int = 32)[source]

Samples batches on the assumption that the data source is ordered by size, so that similarly sized samples are batched together.
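
The bucketing idea described above can be illustrated with a minimal sketch. It is not the lightning_asr implementation: the class name `SimpleBucketingSampler` and its `shuffle` method are hypothetical, and the only behavior taken from the documentation is that consecutive indices of a size-ordered data source are grouped into batches of `batch_size`.

```python
import random


class SimpleBucketingSampler:
    """Hypothetical stand-in for a bucketing sampler (illustration only)."""

    def __init__(self, data_source, batch_size=32):
        indices = list(range(len(data_source)))
        # Consecutive indices form one batch. This groups similarly sized
        # samples only if data_source is already sorted by size.
        self.bins = [
            indices[i:i + batch_size]
            for i in range(0, len(indices), batch_size)
        ]

    def __iter__(self):
        # Yield one list of dataset indices per batch.
        return iter(self.bins)

    def __len__(self):
        return len(self.bins)

    def shuffle(self, seed=None):
        # Shuffle the order of the batches, not the samples inside them,
        # so each batch still holds neighbours in the size ordering.
        random.Random(seed).shuffle(self.bins)


# Sample lengths, already sorted in ascending order.
lengths = [3, 5, 8, 13, 21, 34, 55]
sampler = SimpleBucketingSampler(lengths, batch_size=3)
batches = list(sampler)
```

Because each batch holds neighbours in the size ordering, little padding is needed to bring the samples in a batch to a common length.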

Dataset

class lightning_asr.data.dataset.AudioDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]

Dataset for audio & transcript matching

Note

Do not use this class directly; use one of its subclasses.

Parameters
  • dataset_path (str) – path to the LibriSpeech dataset

  • audio_paths (list) – list of audio paths

  • transcripts (list) – list of transcripts

  • apply_spec_augment (bool) – flag indicating whether to apply SpecAugment

  • sos_id (int) – token ID of <sos> (start of sentence)

  • eos_id (int) – token ID of <eos> (end of sentence)

  • sample_rate (int) – sampling rate of the audio (Hz)

  • num_mels (int) – number of mel-frequency features to retain (for MFCC, the number of coefficients)

  • frame_length (float) – frame length for spectrogram (ms)

  • frame_shift (float) – hop length between STFT (short-time Fourier transform) windows (ms)

  • freq_mask_para (int) – hyperparameter that limits the maximum width of a frequency mask

  • time_mask_num (int) – number of time masks to apply

  • freq_mask_num (int) – number of frequency masks to apply
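
To make the framing and masking parameters concrete, here is a rough sketch in plain Python. It is an illustration under stated assumptions, not the code used by these dataset classes: `num_frames` and `spec_augment` are hypothetical helpers, frame counts assume no padding at the signal edges, and the maximum time-mask width (`time_mask_para`, which is not part of the documented signature) is assumed to default to one twentieth of the number of frames.

```python
import random


def num_frames(num_samples, sample_rate=16000, frame_length=25.0, frame_shift=10.0):
    """Count analysis frames, assuming no padding at the signal edges."""
    win = int(sample_rate * frame_length / 1000)  # 25 ms at 16 kHz -> 400 samples
    hop = int(sample_rate * frame_shift / 1000)   # 10 ms at 16 kHz -> 160 samples
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


def spec_augment(feature, freq_mask_para=27, time_mask_num=4, freq_mask_num=2,
                 time_mask_para=None, rng=random):
    """Return a masked copy of feature, a list of frames (lists of floats)."""
    out = [list(frame) for frame in feature]
    num_time = len(out)
    num_freq = len(out[0]) if out else 0
    if time_mask_para is None:
        # Assumption: cap each time mask at 1/20 of the utterance length.
        time_mask_para = num_time // 20

    # Time masking: zero out up to time_mask_num runs of consecutive frames.
    for _ in range(time_mask_num):
        t = rng.randint(0, min(time_mask_para, num_time))
        t0 = rng.randint(0, num_time - t)
        for i in range(t0, t0 + t):
            out[i] = [0.0] * num_freq

    # Frequency masking: zero out up to freq_mask_num bands of adjacent bins,
    # each at most freq_mask_para bins wide.
    for _ in range(freq_mask_num):
        f = rng.randint(0, min(freq_mask_para, num_freq))
        f0 = rng.randint(0, num_freq - f)
        for frame in out:
            frame[f0:f0 + f] = [0.0] * f
    return out


frames = num_frames(16000)  # one second of audio at the default settings
feature = [[1.0] * 80 for _ in range(frames)]
masked = spec_augment(feature, rng=random.Random(0))
```

With the default values, one second of 16 kHz audio gives 98 frames of 80 features each, and the masked copy keeps that shape while the input is left unmodified.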

class lightning_asr.data.dataset.FBankDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]

Dataset for filter bank & transcript matching

class lightning_asr.data.dataset.MFCCDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]

Dataset for MFCC & transcript matching

class lightning_asr.data.dataset.MelSpectrogramDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]

Dataset for mel-spectrogram & transcript matching

class lightning_asr.data.dataset.SpectrogramDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]

Dataset for spectrogram & transcript matching

Librispeech Preprocess

lightning_asr.data.librispeech.preprocess.collect_transcripts(dataset_path)[source]

Collect LibriSpeech transcripts
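
As a rough illustration of what collecting transcripts involves, the sketch below walks a LibriSpeech-style directory tree and pairs each utterance ID with its text. It relies only on the standard LibriSpeech layout, in which every chapter directory contains a `<speaker>-<chapter>.trans.txt` file with one `UTTERANCE-ID TRANSCRIPT` line per `.flac` file. The function name `collect_librispeech_transcripts` and its return format are assumptions made for this example and may differ from what `collect_transcripts` actually returns.

```python
import os
import tempfile


def collect_librispeech_transcripts(part_dir):
    """Return (relative_audio_path, transcript) pairs found under part_dir."""
    pairs = []
    for root, _dirs, files in sorted(os.walk(part_dir)):
        for name in sorted(files):
            if not name.endswith(".trans.txt"):
                continue
            with open(os.path.join(root, name), encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # Each line is "<utterance-id> <TRANSCRIPT TEXT>".
                    utt_id, _, text = line.partition(" ")
                    audio = os.path.relpath(
                        os.path.join(root, utt_id + ".flac"), part_dir
                    )
                    pairs.append((audio, text))
    return pairs


# Build a tiny fake dataset part to demonstrate the function.
with tempfile.TemporaryDirectory() as tmp:
    chapter = os.path.join(tmp, "19", "198")
    os.makedirs(chapter)
    with open(os.path.join(chapter, "19-198.trans.txt"), "w", encoding="utf-8") as f:
        f.write("19-198-0000 HELLO WORLD\n19-198-0001 SECOND LINE\n")
    pairs = collect_librispeech_transcripts(tmp)
```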

lightning_asr.data.librispeech.preprocess.generate_manifest_file(dataset_path: str, part: str, transcripts: list)[source]

Generate manifest file

lightning_asr.data.librispeech.preprocess.prepare_tokenizer(train_transcripts, vocab_size)[source]

Prepare the SentencePiece tokenizer

Lightning Data Module