Data¶

Data Loader¶

class lightning_asr.data.data_loader.AudioDataLoader(*args, **kwargs)[source]¶: Audio Data Loader

class lightning_asr.data.data_loader.BucketingSampler(data_source, batch_size: int = 32)[source]¶: Samples batches under the assumption that the data source is ordered by size, so that similarly sized samples are batched together.
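
The bucketing idea described above can be illustrated with a minimal sketch. This is not the lightning_asr implementation: the class name SimpleBucketingSampler, its shuffle method, and the toy lengths are assumptions made for illustration, and only the (data_source, batch_size) signature is taken from the documentation.

```python
import random
from typing import Iterator, List, Sequence


class SimpleBucketingSampler:
    """Hypothetical stand-in for a bucketing sampler (not the library class)."""

    def __init__(self, data_source: Sequence, batch_size: int = 32) -> None:
        # Assumption: data_source is already sorted by sample length.
        indices = list(range(len(data_source)))
        # Consecutive indices share a batch, so each batch holds similar lengths.
        self.bins: List[List[int]] = [
            indices[i:i + batch_size] for i in range(0, len(indices), batch_size)
        ]

    def __iter__(self) -> Iterator[List[int]]:
        yield from self.bins

    def __len__(self) -> int:
        return len(self.bins)

    def shuffle(self, seed: int = 0) -> None:
        # Shuffle the order of the batches, not the contents of each batch.
        random.Random(seed).shuffle(self.bins)


# Toy "dataset": sample lengths, sorted in ascending order.
lengths = sorted([12, 15, 16, 40, 42, 45, 90, 95])
sampler = SimpleBucketingSampler(lengths, batch_size=3)
batches = list(sampler)
```

Because every batch is a run of neighbouring indices, the padding needed inside a batch stays small when the source is sorted by length.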

Dataset¶

class lightning_asr.data.dataset.AudioDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]¶

Dataset for audio & transcript matching

Note

Do not use this class directly; use one of its subclasses.

Parameters

dataset_path (str) – path to the LibriSpeech dataset
audio_paths (list) – list of audio file paths
transcripts (list) – list of transcripts
apply_spec_augment (bool) – flag indicating whether to apply SpecAugment
sos_id (int) – ID of the <sos> token
eos_id (int) – ID of the <eos> token
sample_rate (int) – sampling rate of the audio
num_mels (int) – number of mel filter banks (for MFCC features, the number of coefficients to retain)
frame_length (float) – frame length for the spectrogram (ms)
frame_shift (float) – hop length between successive STFT (short-time Fourier transform) windows (ms)
freq_mask_para (int) – hyperparameter that limits the maximum width of each frequency mask
time_mask_num (int) – number of time masks to apply
freq_mask_num (int) – number of frequency masks to apply
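
As a worked example of the framing parameters, the sketch below converts the default frame_length and frame_shift from milliseconds to samples at the default sample_rate. The helper names ms_to_samples and num_frames are hypothetical, and the frame count assumes no padding at the signal edges; the library's feature extraction may pad differently.

```python
def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(sample_rate * duration_ms / 1000.0))


def num_frames(num_samples: int, win_length: int, hop_length: int) -> int:
    """Number of full analysis windows, assuming no edge padding."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


sample_rate = 16000                            # default sample_rate
win_length = ms_to_samples(25.0, sample_rate)  # default frame_length -> 400 samples
hop_length = ms_to_samples(10.0, sample_rate)  # default frame_shift  -> 160 samples

# One second of audio yields 98 full frames under the no-padding assumption.
frames_per_second = num_frames(sample_rate, win_length, hop_length)
```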

class lightning_asr.data.dataset.FBankDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]¶: Dataset for filter bank & transcript matching

class lightning_asr.data.dataset.MFCCDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]¶: Dataset for MFCC & transcript matching

class lightning_asr.data.dataset.MelSpectrogramDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]¶: Dataset for mel-spectrogram & transcript matching

class lightning_asr.data.dataset.SpectrogramDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)[source]¶: Dataset for spectrogram & transcript matching
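
All of the dataset classes above accept the same SpecAugment parameters (freq_mask_para, time_mask_num, freq_mask_num). The following is a minimal sketch of what such masking might look like on a feature matrix stored as a list of frames. It is not the library's code: the function spec_augment, the time_mask_para argument, and the choice of zero as the mask value are assumptions made for illustration.

```python
import random
from typing import List, Optional


def spec_augment(
    feature: List[List[float]],
    freq_mask_para: int = 27,
    time_mask_num: int = 4,
    freq_mask_num: int = 2,
    time_mask_para: int = 10,  # hypothetical: maximum width of one time mask
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return a masked copy of feature, indexed as feature[time][freq]."""
    rng = rng or random.Random()
    masked = [row[:] for row in feature]  # copy so the input stays untouched
    n_frames = len(masked)
    n_bins = len(masked[0]) if masked else 0

    # Time masking: zero out runs of consecutive frames.
    for _ in range(time_mask_num):
        width = rng.randint(0, min(time_mask_para, n_frames))
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            masked[t] = [0.0] * n_bins

    # Frequency masking: zero out runs of consecutive bins in every frame.
    for _ in range(freq_mask_num):
        width = rng.randint(0, min(freq_mask_para, n_bins))
        start = rng.randint(0, n_bins - width)
        for row in masked:
            row[start:start + width] = [0.0] * width

    return masked


# Toy input: 100 frames of 80 mel bins, all ones.
features = [[1.0] * 80 for _ in range(100)]
augmented = spec_augment(features, rng=random.Random(0))
```

Each frequency mask is at most freq_mask_para bins wide, which is how that parameter limits the masking length.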

Librispeech Preprocess¶

lightning_asr.data.librispeech.preprocess.collect_transcripts(dataset_path)[source]¶: Collect LibriSpeech transcripts

lightning_asr.data.librispeech.preprocess.generate_manifest_file(dataset_path: str, part: str, transcripts: list)[source]¶: Generate manifest file

lightning_asr.data.librispeech.preprocess.prepare_tokenizer(train_transcripts, vocab_size)[source]¶: Prepare SentencePiece tokenizer
