SABATO MARCO SINISCALCHI

DEEP LEARNING WITH MAXIMAL FIGURE-OF-MERIT COST TO ADVANCE MULTI-LABEL SPEECH ATTRIBUTE DETECTION

  • Authors: Kukanov, I.; Hautamäki, V.; SINISCALCHI, SABATO MARCO; Li, K.
  • Publication year: 2017
  • Type: Conference proceedings contribution published in a volume
  • OA Link: http://hdl.handle.net/10447/649501

Abstract

In this work, we aim to improve speech attribute detection by formulating it as a multi-label classification task, with deep neural networks (DNNs) serving as the speech attribute detectors. A straightforward way to tackle the task is to estimate the DNN parameters with the mean squared error (MSE) loss function and to use a sigmoid function at the DNN output nodes. A more principled way is to incorporate the micro-F1 measure, a widely used metric in multi-label classification, into the DNN loss function so that the metric of interest is optimized directly at training time. Micro-F1 is not differentiable, but we overcome this problem by casting the task in the maximal figure-of-merit (MFoM) learning framework. The results demonstrate that our MFoM approach consistently outperforms the baseline systems.
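
To make the differentiability issue concrete, the sketch below computes micro-F1 from hard 0/1 decisions and then a smooth stand-in in which sigmoid outputs replace the hard decisions when counting true positives, false positives, and false negatives. This is a generic soft-count surrogate assumed here for illustration only; it is not the MFoM formulation of the paper, whose details this record does not give. The function names `micro_f1` and `soft_micro_f1` and the toy data are hypothetical.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Exact micro-F1 from binary matrices of shape (samples, labels).

    Counts are pooled over all labels before computing F1, which is what
    makes the measure "micro"-averaged.
    """
    tp = np.sum(y_true * y_pred)          # true positives
    fp = np.sum((1 - y_true) * y_pred)    # false positives
    fn = np.sum(y_true * (1 - y_pred))    # false negatives
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def soft_micro_f1(y_true, logits):
    """Smooth surrogate (an assumption, not the paper's MFoM objective).

    Sigmoid outputs in (0, 1) replace hard decisions, so the pooled counts
    and the resulting score vary smoothly with the logits.
    """
    p = 1.0 / (1.0 + np.exp(-logits))
    tp = np.sum(y_true * p)
    fp = np.sum((1 - y_true) * p)
    fn = np.sum(y_true * (1 - p))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

# Toy data: 2 samples, 3 attribute labels (hypothetical values).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1]])

hard_score = micro_f1(y_true, y_pred)

# Very confident logits reproduce the hard decisions almost exactly.
confident_logits = 50.0 * (2 * y_pred - 1)
soft_score = soft_micro_f1(y_true, confident_logits)

# Uninformative logits (all zeros) give every label probability 0.5.
neutral_score = soft_micro_f1(y_true, np.zeros(y_true.shape))
```

A training loss built this way would typically be `1 - soft_micro_f1(...)`, minimized with an automatic-differentiation framework. Thresholding the outputs instead yields a piecewise-constant score whose gradient is zero almost everywhere, which is why the exact metric cannot be used directly.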