Publication | GIOVANNI PILATO | Università degli Studi di Palermo

A Web-oriented Java 3D Talking Head

Authors: Gambino, O.; Augello, A.; Caronia, A.; Pilato, G.; Pirrone, R.; Gaglio, S.
Publication year: 2009
Type: Book chapter or essay
OA Link: http://hdl.handle.net/10447/59798

Abstract

Facial animation refers to systems that synchronize speech with an animated face model. Such systems are called Talking Heads or Talking Faces. In parallel, simple dialogue systems called chatbots have been developed. Chatbots are software agents that interact with users through pattern-matching rules. This paper presents a Talking Head designed for building a chatbot. A text answer is generated in response to an input query. The answer is converted into a facial animation using a 3D face model whose lip movements are synchronized with the sound produced by a speech synthesis module. Our Talking Head exploits the naturalness of facial animation and provides a real-time interactive interface to the user. It is also specifically suited for use on the web, which imposes a set of requirements: simple installation, visual quality, fast download, and real-time interactivity. The web infrastructure follows the client-server model. The chatbot, Natural Language Processing, and Digital Signal Processing services are delegated to the server, while the client handles animation and synchronization. This way, the server can handle requests from multiple clients. The conversation module is implemented using A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) technology. The output of the chatbot is given as input to the Natural Language Processing module (Comedia Speech), which incorporates a text analyzer, a letter-to-sound module, and a prosody generation module. The client, through the synchronization module, computes the actual duration of the animation and the duration of each phoneme, and consequently of each viseme. The morphing module animates the facial model and plays back the voice.
As a result, the user receives the answer to a question both as text and as a visual animation.
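
The synchronization step described in the abstract, deriving viseme timings from phoneme durations, can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class names (TimedViseme, VisemeScheduler), the tiny phoneme-to-viseme table, and the choice to merge consecutive phonemes that share a viseme are all assumptions made for illustration. The sketch only shows how per-phoneme durations could be accumulated into a viseme timeline and a total animation duration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One timed viseme: a mouth shape shown from startMs for durationMs. */
final class TimedViseme {
    final String viseme;
    final int startMs;
    final int durationMs;

    TimedViseme(String viseme, int startMs, int durationMs) {
        this.viseme = viseme;
        this.startMs = startMs;
        this.durationMs = durationMs;
    }
}

/** Hypothetical scheduler: turns phoneme durations into a viseme timeline. */
final class VisemeScheduler {
    // Illustrative mapping only; a real system would use the phoneme set
    // and viseme classes defined by its speech synthesizer and face model.
    private static final Map<String, String> PHONEME_TO_VISEME = new HashMap<>();
    static {
        PHONEME_TO_VISEME.put("p", "closed");
        PHONEME_TO_VISEME.put("b", "closed");
        PHONEME_TO_VISEME.put("m", "closed");
        PHONEME_TO_VISEME.put("f", "lip-teeth");
        PHONEME_TO_VISEME.put("v", "lip-teeth");
        PHONEME_TO_VISEME.put("a", "open");
        PHONEME_TO_VISEME.put("o", "rounded");
        PHONEME_TO_VISEME.put("u", "rounded");
    }

    /** Builds the timeline; phonemes[i] lasts durationsMs[i] milliseconds. */
    static List<TimedViseme> schedule(String[] phonemes, int[] durationsMs) {
        if (phonemes.length != durationsMs.length) {
            throw new IllegalArgumentException("one duration per phoneme is required");
        }
        List<TimedViseme> timeline = new ArrayList<>();
        int clockMs = 0; // running start time of the current phoneme
        for (int i = 0; i < phonemes.length; i++) {
            // Unknown phonemes fall back to a neutral mouth shape.
            String viseme = PHONEME_TO_VISEME.getOrDefault(phonemes[i], "neutral");
            int last = timeline.size() - 1;
            if (last >= 0 && timeline.get(last).viseme.equals(viseme)) {
                // Assumption: adjacent phonemes with the same viseme are
                // merged into a single, longer viseme.
                TimedViseme prev = timeline.get(last);
                timeline.set(last, new TimedViseme(
                        viseme, prev.startMs, prev.durationMs + durationsMs[i]));
            } else {
                timeline.add(new TimedViseme(viseme, clockMs, durationsMs[i]));
            }
            clockMs += durationsMs[i];
        }
        return timeline;
    }

    /** Total animation duration: the sum of all viseme durations. */
    static int totalDurationMs(List<TimedViseme> timeline) {
        int total = 0;
        for (TimedViseme v : timeline) {
            total += v.durationMs;
        }
        return total;
    }
}
```

For example, calling VisemeScheduler.schedule with the phonemes m, a, m, b, o and durations 80, 120, 70, 60, 150 ms yields four visemes, because the adjacent m and b share the "closed" mouth shape and are merged. A morphing module could then interpolate the face model between consecutive entries of this timeline while the audio plays.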