
SABATO MARCO SINISCALCHI

Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network

  • Authors: Qi, Jun; Hu, Hu; Wang, Yannan; Yang, Chao-Han Huck; Siniscalchi, Sabato Marco; Lee, Chin-Hui
  • Publication year: 2020
  • Type: Conference paper published in proceedings
  • OA Link: http://hdl.handle.net/10447/636674

Abstract

We propose a tensor-to-vector regression approach to multi-channel speech enhancement in order to address the issues of input size explosion and hidden-layer size expansion. The key idea is to cast the conventional deep neural network (DNN) based vector-to-vector regression formulation into a tensor-train network (TTN) framework. A TTN is a recently proposed solution for the compact representation of deep models with fully connected hidden layers; it therefore maintains the DNN's expressive power while involving a much smaller number of trainable parameters. Furthermore, a TTN can handle multi-dimensional tensor inputs by design, which exactly matches the desired setting in multi-channel speech enhancement. We first provide a theoretical extension from DNN-based to TTN-based regression. Next, we show that the TTN attains speech enhancement quality comparable to that of the DNN with far fewer parameters; for example, a reduction from 27 million to only 5 million parameters is observed in a single-channel scenario. The TTN also improves PESQ over the DNN from 2.86 to 2.96 with only a slight increase in the number of trainable parameters. Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for the TTN, whereas a DNN with 68 million parameters only attains a PESQ of 3.06.
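
To illustrate the parameter reduction the abstract describes, below is a minimal NumPy sketch (not the authors' code) of a fully connected layer whose weight matrix is stored in tensor-train form. The mode sizes and TT ranks are illustrative assumptions chosen only to show how a few small cores can replace a large dense weight matrix.

```python
import numpy as np

# Illustrative shapes (assumed, not from the paper): a 1024 -> 1024 fully
# connected layer whose weight matrix is factored into small TT cores.
in_modes  = [4, 8, 8, 4]    # 4*8*8*4 = 1024 input units
out_modes = [4, 8, 8, 4]    # 4*8*8*4 = 1024 output units
ranks     = [1, 3, 3, 3, 1] # TT ranks (boundary ranks are 1)

rng = np.random.default_rng(0)
cores = [
    rng.standard_normal((ranks[k], out_modes[k], in_modes[k], ranks[k + 1])) * 0.1
    for k in range(len(in_modes))
]

def tt_to_matrix(cores):
    """Contract the TT cores back into the full (M x N) weight matrix."""
    res = cores[0]                                  # shape (1, m1, n1, r1)
    for core in cores[1:]:
        # Merge the next core's output/input modes into the running result.
        res = np.einsum('amnr,rxys->amxnys', res, core)
        a, m, x, n, y, s = res.shape
        res = res.reshape(a, m * x, n * y, s)
    return res[0, :, :, 0]

W = tt_to_matrix(cores)                             # dense weight, for illustration only
x = rng.standard_normal(W.shape[1])                 # one input vector
y = W @ x                                           # the layer's regression output

tt_params   = sum(c.size for c in cores)
full_params = W.size
print(f"TT parameters:    {tt_params}")             # 1,248 in this toy setting
print(f"Dense parameters: {full_params}")           # 1024 * 1024 = 1,048,576
```

In practice the dense weight matrix is never materialized; the matrix-vector product is computed directly from the cores, which is what keeps both memory and trainable-parameter counts small. The contraction to a full matrix here is only to make the equivalence with an ordinary fully connected layer explicit.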