Data
Data Loader
Dataset
class lightning_asr.data.dataset.AudioDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)
Dataset for audio & transcript matching.
Note
Do not use this class directly; use one of its subclasses.
Parameters
dataset_path (str) – path to the LibriSpeech dataset
audio_paths (list) – list of audio file paths
transcripts (list) – list of transcripts, one per audio file
apply_spec_augment (bool) – flag indicating whether to apply SpecAugment
sos_id (int) – token ID of <sos>
eos_id (int) – token ID of <eos>
sample_rate (int) – sampling rate of the audio (Hz)
num_mels (int) – number of mel-frequency coefficients to retain
frame_length (float) – frame length for the spectrogram (ms)
frame_shift (float) – hop length between successive STFT (short-time Fourier transform) windows (ms)
freq_mask_para (int) – SpecAugment hyperparameter that limits the maximum width of each frequency mask
time_mask_num (int) – number of time masks to apply
freq_mask_num (int) – number of frequency masks to apply
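The Note and parameter list above describe a base class that pairs each audio file with a transcript, wraps token IDs with sos_id and eos_id, and optionally applies SpecAugment, while subclasses decide which acoustic feature to extract. The plain-Python sketch below illustrates that division of labour. Everything in it is hypothetical: the class name AudioDatasetSketch, the get_feature hook, the assumption that transcripts are strings of space-separated token IDs, and the rule that the maximum time-mask width is 5% of the utterance length are illustrative choices, not lightning_asr's actual implementation. It also omits dataset_path and the feature-extraction parameters for brevity.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Sequence

Matrix = List[List[float]]  # features laid out as [time][frequency]


class AudioDatasetSketch(ABC):
    """Hypothetical base class mirroring a subset of the documented arguments."""

    def __init__(self, audio_paths: Sequence[str], transcripts: Sequence[str],
                 apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2,
                 freq_mask_para: int = 27, time_mask_num: int = 4,
                 freq_mask_num: int = 2, seed: int = 0) -> None:
        assert len(audio_paths) == len(transcripts), "one transcript per audio file"
        self.audio_paths = list(audio_paths)
        self.transcripts = list(transcripts)
        self.apply_spec_augment = apply_spec_augment
        self.sos_id, self.eos_id = sos_id, eos_id
        self.freq_mask_para = freq_mask_para
        self.time_mask_num = time_mask_num
        self.freq_mask_num = freq_mask_num
        self.rng = random.Random(seed)  # seeded so masking is reproducible

    @abstractmethod
    def get_feature(self, audio_path: str) -> Matrix:
        """Subclasses choose the feature (filter bank, MFCC, spectrogram, ...)."""

    def parse_transcript(self, transcript: str) -> List[int]:
        # Assumption: transcripts are space-separated token IDs.
        return [self.sos_id] + [int(tok) for tok in transcript.split()] + [self.eos_id]

    def spec_augment(self, feature: Matrix) -> Matrix:
        out = [row[:] for row in feature]  # copy so the input is left untouched
        length, num_freq = len(out), len(out[0])
        # Assumption: the maximum time-mask width is 5% of the utterance length.
        time_mask_para = max(1, length // 20)
        for _ in range(self.time_mask_num):
            t = self.rng.randint(0, min(time_mask_para, length))  # mask width
            t0 = self.rng.randint(0, length - t)                   # mask start
            for i in range(t0, t0 + t):
                out[i] = [0.0] * num_freq
        for _ in range(self.freq_mask_num):
            f = self.rng.randint(0, min(self.freq_mask_para, num_freq))
            f0 = self.rng.randint(0, num_freq - f)
            for row in out:
                for j in range(f0, f0 + f):
                    row[j] = 0.0
        return out

    def __len__(self) -> int:
        return len(self.audio_paths)

    def __getitem__(self, idx: int):
        feature = self.get_feature(self.audio_paths[idx])
        if self.apply_spec_augment:
            feature = self.spec_augment(feature)
        return feature, self.parse_transcript(self.transcripts[idx])


class ConstantFeatureDataset(AudioDatasetSketch):
    """Toy subclass returning a fixed 100 x 80 matrix of ones instead of reading audio."""

    def get_feature(self, audio_path: str) -> Matrix:
        return [[1.0] * 80 for _ in range(100)]
```

Because get_feature is abstract, instantiating AudioDatasetSketch itself raises TypeError, which is one way to enforce a "use a subclass" rule like the one in the Note.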
class lightning_asr.data.dataset.FBankDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)
Dataset for filter bank & transcript matching.
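In ASR, "filter bank" features usually mean log energies from a bank of triangular filters spaced evenly on the mel scale. The sketch below builds such a bank for the default num_mels = 80 and a 400-point FFT (25 ms at 16 kHz) and applies it to one power-spectrum frame. The HTK mel formula, the function names, and the log floor are assumptions for illustration; this is not necessarily how FBankDataset computes its features.

```python
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    # HTK-style mel scale.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(num_mels: int = 80, n_fft: int = 400,
                    sample_rate: int = 16000) -> List[List[float]]:
    """Return a num_mels x (n_fft // 2 + 1) matrix of triangular filter weights."""
    n_freqs = n_fft // 2 + 1
    bin_hz = [k * sample_rate / n_fft for k in range(n_freqs)]  # frequency of each FFT bin
    # num_mels + 2 edge points evenly spaced on the mel scale from 0 Hz to Nyquist.
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (num_mels + 1)) for i in range(num_mels + 2)]
    bank = []
    for m in range(num_mels):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:      # rising slope of the triangle
                row.append((f - left) / (centre - left))
            elif centre < f < right:    # falling slope of the triangle
                row.append((right - f) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_fbank(power_frame: List[float], bank: List[List[float]],
              floor: float = 1e-10) -> List[float]:
    """Log mel filter-bank energies for one power-spectrum frame."""
    return [math.log(max(sum(w * p for w, p in zip(row, power_frame)), floor))
            for row in bank]
```

With 80 filters and only 201 FFT bins, the narrowest low-frequency filters may not contain any bin centre and end up all zero; the floor keeps the logarithm finite in that case.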
class lightning_asr.data.dataset.MFCCDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)
Dataset for MFCC & transcript matching.
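MFCCs are conventionally the leading coefficients of a type-II discrete cosine transform (DCT) of each frame's log mel filter-bank energies. The sketch below implements an orthonormal DCT-II in plain Python. The helper names and the num_ceps argument are hypothetical; in MFCCDataset the number of coefficients kept is presumably tied to num_mels, which the parameter list describes as the number of coefficients to retain, but that mapping is an assumption.

```python
import math
from typing import List


def dct_ii_ortho(x: List[float], num_ceps: int) -> List[float]:
    """Orthonormal DCT-II of x, keeping the first num_ceps coefficients."""
    n = len(x)
    out = []
    for k in range(num_ceps):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)  # orthonormal scaling
        out.append(scale * s)
    return out


def mfcc_from_log_mel(log_mel: List[float], num_ceps: int = 13) -> List[float]:
    """MFCCs for one frame, given its log mel energies."""
    return dct_ii_ortho(log_mel, num_ceps)
```

A flat log-mel frame has no spectral shape, so all of its energy lands in coefficient 0 and the remaining coefficients are zero; that is why the DCT decorrelates and compacts the log-mel vector.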
class lightning_asr.data.dataset.MelSpectrogramDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)
Dataset for mel-spectrogram & transcript matching.
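For reference, a mel spectrogram is conventionally obtained by weighting each STFT power spectrum with M triangular filters spaced evenly on the mel scale. In the equations below, X[t, k] is the STFT at frame t and bin k, N is the FFT size, H_m is the m-th triangular filter, and M would correspond to num_mels. They describe the textbook construction with the common HTK mel formula, not necessarily every detail of this class (for example, whether a logarithm is applied afterwards).

```latex
\[
\operatorname{mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)
\]
\[
S[m, t] = \sum_{k=0}^{N/2} H_m[k]\, \lvert X[t, k] \rvert^{2},
\qquad m = 0, \dots, M - 1
\]
```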
class lightning_asr.data.dataset.SpectrogramDataset(dataset_path: str, audio_paths: list, transcripts: list, apply_spec_augment: bool = False, sos_id: int = 1, eos_id: int = 2, sample_rate: int = 16000, num_mels: int = 80, frame_length: float = 25.0, frame_shift: float = 10.0, freq_mask_para: int = 27, time_mask_num: int = 4, freq_mask_num: int = 2)
Dataset for spectrogram & transcript matching.
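The frame_length and frame_shift parameters are given in milliseconds. At the default 16 kHz sample rate, a 25 ms frame is 400 samples and a 10 ms shift is 160 samples. The sketch below makes that conversion explicit and computes a magnitude spectrogram (not power, not log-compressed) with a periodic Hann window and a naive DFT, purely for illustration. The function names, the Hann window, and the absence of padding are assumptions. A real implementation would use an FFT (for example torch.stft) rather than this quadratic-time loop.

```python
import math
from typing import List


def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(sample_rate * ms / 1000.0))


def spectrogram(signal: List[float], sample_rate: int = 16000,
                frame_length: float = 25.0, frame_shift: float = 10.0) -> List[List[float]]:
    """Magnitude spectrogram laid out as [frame][frequency bin] (naive DFT, no padding)."""
    win = ms_to_samples(frame_length, sample_rate)  # 25 ms -> 400 samples at 16 kHz
    hop = ms_to_samples(frame_shift, sample_rate)   # 10 ms -> 160 samples at 16 kHz
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win) for n in range(win)]
    num_frames = 1 + (len(signal) - win) // hop if len(signal) >= win else 0
    frames = []
    for t in range(num_frames):
        chunk = [s * w for s, w in zip(signal[t * hop: t * hop + win], window)]
        row = []
        for k in range(win // 2 + 1):  # non-negative frequency bins only
            re = sum(x * math.cos(2 * math.pi * k * n / win) for n, x in enumerate(chunk))
            im = -sum(x * math.sin(2 * math.pi * k * n / win) for n, x in enumerate(chunk))
            row.append(math.hypot(re, im))
        frames.append(row)
    return frames
```

With a 400-point window at 16 kHz the bins are 40 Hz apart, so a 1 kHz tone peaks in bin 25, and 0.1 s of audio (1600 samples) yields 1 + (1600 - 400) // 160 = 8 frames.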
LibriSpeech Preprocess
lightning_asr.data.librispeech.preprocess.collect_transcripts(dataset_path)
Collect LibriSpeech transcripts.
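LibriSpeech stores one <speaker>-<chapter>.trans.txt file per chapter directory. Each line holds an utterance ID followed by its transcript, and the audio sits alongside as <utterance-id>.flac. Collecting transcripts therefore amounts to walking the tree and parsing those lines. The sketch below is a hypothetical illustration of that idea: the name collect_transcripts_sketch and its list-of-pairs return value are assumptions, and it is not the library's collect_transcripts, whose return value and output format are not documented here.

```python
import os
from typing import List, Tuple


def collect_transcripts_sketch(dataset_path: str) -> List[Tuple[str, str]]:
    """Hypothetical: return (audio_path, transcript) pairs from a LibriSpeech-style tree."""
    pairs = []
    for root, dirs, files in os.walk(dataset_path):
        dirs.sort()  # visit subdirectories in a deterministic order
        for name in sorted(files):
            if not name.endswith(".trans.txt"):
                continue
            with open(os.path.join(root, name), encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # Each line is "<utterance-id> <TRANSCRIPT>".
                    utt_id, _, text = line.partition(" ")
                    pairs.append((os.path.join(root, utt_id + ".flac"), text))
    return pairs
```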