
SABATO MARCO SINISCALCHI

An Investigation of Incorporating Mamba For Speech Enhancement

  • Authors: Chao, Rong; Cheng, Wen-Huang; Quatra, Moreno La; Siniscalchi, Sabato Marco; Yang, Chao-Han Huck; Fu, Szu-Wei; Tsao, Yu
  • Publication year: 2024
  • Type: Conference paper published in proceedings volume
  • OA Link: http://hdl.handle.net/10447/673127

Abstract

This work investigates the use of a recently proposed, attention-free, scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. In particular, we employ Mamba to deploy several regression-based SE models (SEMamba) with different configurations, namely basic, advanced, causal, and non-causal. Furthermore, loss functions based either on signal-level distances or on metric-oriented scores are considered. Experimental evidence shows that SEMamba attains a competitive PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration. A new state-of-the-art PESQ of 3.69 is also reported when SEMamba is combined with Perceptual Contrast Stretching (PCS). Compared against Transformer-based equivalent SE solutions, a noticeable FLOPs reduction of up to ∼12% is observed with the advanced non-causal configuration. Finally, SEMamba can be used as a pre-processing step before automatic speech recognition (ASR), showing competitive performance against recent SE solutions.
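Mamba belongs to the family of selective state-space models. As a rough, hypothetical illustration only (this is the generic discrete linear SSM recurrence, not the paper's SEMamba architecture and without Mamba's input-dependent selectivity), an SSM processes a sequence with a fixed-size hidden state rather than pairwise attention, which is what makes it scale linearly in sequence length:

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Discrete linear state-space recurrence over a 1-D signal:
        h_t = A @ h_{t-1} + B * x_t
        y_t = C @ h_t
    A: (n, n) state matrix, B: (n,) input vector, C: (n,) output vector.
    The hidden state h has fixed size n regardless of sequence length,
    so the scan runs in O(T) time for T samples."""
    n = A.shape[0]
    h = np.zeros(n)
    y = np.empty(len(x), dtype=float)
    for t, xt in enumerate(x):
        h = A @ h + B * xt   # state update driven by the current input sample
        y[t] = C @ h         # read out a scalar from the hidden state
    return y

# Toy usage: with A = 0 the model is memoryless and y_t = (C @ B) * x_t.
A = np.zeros((2, 2))
B = np.array([1.0, 1.0])
C = np.array([1.0, 0.0])
x = np.array([1.0, 2.0, 3.0])
y = ssm_scan(A, B, C, x)  # → [1.0, 2.0, 3.0]
```

Mamba additionally makes the (discretized) state-space parameters functions of the input, which is the "selective" part; the sketch above only conveys the recurrent, attention-free computation pattern.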