

Joint training of multi-channel-condition dereverberation and acoustic modeling of microphone array speech for robust distant speech recognition

  • Authors: Ge F.; Li K.; Wu B.; Siniscalchi S.M.; Yan Y.; Lee C.-H.
  • Publication year: 2017
  • Type: Conference proceedings paper published in a volume
  • OA Link: http://hdl.handle.net/10447/649496

Abstract

We propose a novel data utilization strategy, called multi-channel-condition learning, that leverages complementary information captured in microphone array speech to jointly train dereverberation and acoustic deep neural network (DNN) models for robust distant speech recognition. Experimental results with a single automatic speech recognition (ASR) system on the REVERB2014 simulated evaluation data show that, on 1-channel testing, the baseline joint training scheme attains a word error rate (WER) of 7.47%, down from 8.72% for separate training. The proposed multi-channel-condition learning scheme was tested on different channel data combinations and usages, revealing several interesting implications. Finally, training on all 8-channel data and applying DNN-based language model rescoring achieves a state-of-the-art WER of 4.05%. We anticipate an even lower WER when combining more top ASR systems.
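
To make the joint training idea concrete, here is a minimal sketch in PyTorch of how a dereverberation front-end DNN and an acoustic-model DNN can be optimized together, so that gradients from the recognition loss flow back through the enhancement network. The layer sizes, senone inventory, loss-interpolation weight, and all names are illustrative assumptions; none are specified in this record, and the paper's exact architecture may differ. Under multi-channel-condition learning, the same setup would simply pool feature/target pairs drawn from several microphone channels into the training batches.

```python
# Minimal sketch of joint dereverberation + acoustic-model training.
# All layer sizes, the 3000-senone inventory, the loss weight alpha, and
# variable names are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class DereverbDNN(nn.Module):
    """Front-end DNN mapping reverberant features to enhanced features."""
    def __init__(self, feat_dim=40, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class AcousticDNN(nn.Module):
    """Back-end DNN mapping enhanced features to senone posteriors."""
    def __init__(self, feat_dim=40, hidden=1024, num_senones=3000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_senones),
        )

    def forward(self, x):
        return self.net(x)

def joint_train_step(front, back, optimizer, reverb_feats, clean_feats,
                     senone_targets, alpha=0.5):
    """One joint update: acoustic cross-entropy plus dereverberation MSE.

    Multi-channel-condition learning would fill these batches with
    (reverb_feats, clean_feats, targets) triples from several channels.
    """
    optimizer.zero_grad()
    enhanced = front(reverb_feats)                      # enhance features
    ce = nn.functional.cross_entropy(back(enhanced), senone_targets)
    mse = nn.functional.mse_loss(enhanced, clean_feats)
    loss = ce + alpha * mse        # alpha is an assumed interpolation weight
    loss.backward()                # gradients flow through both networks
    optimizer.step()
    return loss.item()

# Example usage with random stand-in data (one channel, batch of 32 frames):
front, back = DereverbDNN(), AcousticDNN()
optimizer = torch.optim.Adam(
    list(front.parameters()) + list(back.parameters()), lr=1e-4)
x = torch.randn(32, 40)                   # reverberant input features
clean = torch.randn(32, 40)               # parallel clean (target) features
y = torch.randint(0, 3000, (32,))         # senone targets
print(joint_train_step(front, back, optimizer, x, clean, y))
```

A single optimizer over the concatenated parameter lists is what makes this "joint": separate training, by contrast, would fit the dereverberation network to the MSE term alone and then train the acoustic model on its frozen outputs.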