Automatic Speech Recognition by Machines
- Autori: Siniscalchi S.M.; Lee C.-H.
- Anno di pubblicazione: 2021
- Tipologia: Capitolo o Saggio
- OA Link: http://hdl.handle.net/10447/649494
Abstract
Building machines to converse with human beings through automatic speech recognition (ASR) and understanding (ASU) has long been a topic of great interest for scientists and engineers, and we have recently witnessed rapid technological advances in this area. Here, we first cast the ASR problem as a pattern-matching and channel-decoding paradigm. We then follow this with a discussion of the Hidden Markov Model (HMM), which is the most successful technique for modelling fundamental speech units, such as phones and words, in order to solve ASR as a search through a top-down decoding network. Recent advances using deep neural networks as parts of an ASR system are also highlighted. We then compare the conventional top-down decoding approach with the recently proposed automatic speech attribute transcription (ASAT) paradigm, which can better leverage knowledge sources in speech production, auditory perception and language theory through bottom-up integration. Finally we discuss how the processing-based speech engineering and knowledge-based speech science communities can work collaboratively to improve our understanding of speech and enhance ASR capabilities.