Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition
- Oct. 2020
- by Jisung Wang et al.
Automatic speech recognition (ASR) tasks are usually solved with lexicon-based hybrid systems or character-based acoustic models that automatically transcribe speech into written text. While hybrid systems require a manually designed lexicon, end-to-end models can be trained directly on character-level transcriptions. This removes the need to define a lexicon for non-English languages for which a standard lexicon may be absent. Korean is relatively phonetic and has a unique writing system, so it is worth investigating which modeling units are useful for end-to-end Korean ASR. Our work is the first to compare the performance of deep neural networks (DNNs), designed as a combination of connectionist temporal classification and an attention-based encoder-decoder, across various lexicon-free Korean modeling units. Experiments on the Zeroth-Korean dataset and medical records, which consist of Korean-only and Korean-English code-switching corpora respectively, show that DNNs based on syllables and sub-words significantly outperform Jamo-based models on Korean ASR tasks. Our successful application of lexicon-free modeling units to non-English ASR tasks provides compelling evidence that lexicon-free approaches can alleviate the heavy code-switching involved in non-English medical transcriptions.
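To make the Jamo-versus-syllable distinction concrete: a Jamo-based model predicts individual consonant and vowel letters, while a syllable-based model predicts whole precomposed Hangul syllable blocks. The sketch below (not from the paper) decomposes syllables into compatibility Jamo using the standard Unicode Hangul composition arithmetic; the function name `to_jamo` is illustrative.

```python
# Sketch: Hangul syllable -> Jamo decomposition via Unicode arithmetic.
# A precomposed syllable S satisfies
#   code(S) = 0xAC00 + (initial * 21 + medial) * 28 + final
# with 19 initials, 21 medials, and 28 finals (index 0 = no final).

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ",   # 28 finals,
             "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ",  # index 0 = none
             "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
             "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

def to_jamo(text: str) -> str:
    """Decompose precomposed Hangul syllables into a flat Jamo sequence."""
    out = []
    for ch in text:
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 19 * 21 * 28:          # precomposed syllable block
            final = offset % 28
            medial = (offset // 28) % 21
            initial = offset // (28 * 21)
            out.append(CHOSEONG[initial] + JUNGSEONG[medial] + JONGSEONG[final])
        else:                                    # pass non-Hangul through
            out.append(ch)
    return "".join(out)

print(to_jamo("한국어"))  # -> ㅎㅏㄴㄱㅜㄱㅇㅓ
```

A syllable-level vocabulary for Korean can in principle contain up to 11,172 blocks, whereas the Jamo inventory above has only 19 + 21 + 27 distinct letters; the paper's finding is that the larger, more acoustically coherent syllable and sub-word units outperform the small Jamo set.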
Jisung Wang, Jihwan Kim, Sangki Kim, and Yeha Lee