An experimental study on continuous phone recognition with little or no language-specific training data
- Authors: D.-C. LYU; S. M. SINISCALCHI; C. H. LEE
- Publication year: 2008
- Type: Conference proceedings contribution published in a volume
- OA Link: http://hdl.handle.net/10447/663736
Abstract
We study continuous phone recognition with little or no language-specific speech training data. The phone recognizer integrates three levels of information: (1) frame-based speech attribute detectors, (2) artificial neural network-based phone event mergers, and (3) decoding-based evidence verifiers. With a set of acoustic-phonetic attributes defined over a number of available languages, a collection of attribute-to-phone mapping rules can be specified either in a language-dependent way, one for each language, or language-independently for all languages, provided the attribute specification is complete enough to cover all phones and the phone definition is universal enough to cover all spoken languages. We report experimental results on Japanese phone recognition with the OGI Multilingual Speech Corpus. Notably, good performance can be achieved without using any Japanese speech training data, and the phone accuracy rates vary depending on how the attribute detectors and phone mergers are configured. Further improvement is observed when a small amount of Japanese data is added to train the attribute-to-phone mergers.
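
As a rough illustration of the rule-based attribute-to-phone mapping mentioned in the abstract, the sketch below scores phones from frame-level attribute detector outputs. Everything in it is an assumption made for illustration: the attribute names, the four-phone inventory, the rule table, and the scoring rule (mean log posterior of a phone's attributes) are hypothetical and are not taken from the paper, which uses ANN-based mergers followed by decoding-based verification.

```python
import math

# Hypothetical attribute-to-phone rules (illustrative only, not from the paper).
PHONE_RULES = {
    "a": {"vowel", "low"},
    "i": {"vowel", "high"},
    "k": {"stop", "velar"},
    "s": {"fricative", "alveolar"},
}


def score_phones(frame_posteriors, rules=PHONE_RULES, floor=1e-6):
    """Score each phone for one frame as the mean log posterior of its attributes.

    frame_posteriors maps attribute name -> detector posterior in [0, 1].
    Missing or zero posteriors are floored to avoid log(0).
    """
    scores = {}
    for phone, attrs in rules.items():
        logps = [math.log(max(frame_posteriors.get(a, 0.0), floor)) for a in attrs]
        scores[phone] = sum(logps) / len(logps)
    return scores


def best_phone(frame_posteriors, rules=PHONE_RULES):
    """Return the highest-scoring phone for a single frame."""
    scores = score_phones(frame_posteriors, rules)
    return max(scores, key=scores.get)


def collapse(labels):
    """Merge runs of identical frame labels into a phone sequence."""
    out = []
    for lab in labels:
        if not out or out[-1] != lab:
            out.append(lab)
    return out


# Toy detector outputs for three frames (made-up numbers).
frames = [
    {"stop": 0.9, "velar": 0.8, "vowel": 0.1},
    {"stop": 0.9, "velar": 0.8, "vowel": 0.1},
    {"vowel": 0.9, "low": 0.85, "stop": 0.05},
]
phone_sequence = collapse([best_phone(f) for f in frames])
```

Because the rule table is the only language-specific component in this toy version, swapping in a different table is how a language-dependent or shared mapping would be expressed here; a trained merger would replace `score_phones` entirely.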