Detection-Based ASR in the Automatic Speech Attribute Transcription Project
- Authors: I. BROMBERG; Q. FU; J. HOU; J. LI; C. MA; B. MATTHEWS; A. D. MORENO; J. MORRIS; S. M. SINISCALCHI; Y. TSAO; Y. WANG
- Publication year: 2007
- Type: Conference proceedings contribution published in a volume
- OA Link: http://hdl.handle.net/10447/649499
Abstract
We present methods of detector design in the Automatic Speech Attribute Transcription project. This paper details the results of a student-led, cross-site collaboration between Georgia Institute of Technology, The Ohio State University and Rutgers University. We describe and evaluate the detection-based ASR paradigm and discuss phonetic attribute classes, methods of detecting framewise phonetic attributes and methods of combining attribute detectors for ASR. We use Multi-Layer Perceptrons, Hidden Markov Models and Support Vector Machines to compute confidence scores for several prescribed sets of phonetic attribute classes. We use Conditional Random Fields (CRFs) and knowledge-based rescoring of phone lattices to combine framewise detection scores for continuous phone recognition on the TIMIT database. With CRFs, incorporating all of the attribute detectors discussed in the paper, we achieve a phone accuracy of 70.63%, outperforming both the baseline and enhanced HMM systems.