Speech-Driven Animation with Meaningful Behaviors

dc.contributor.author: Sadoughi, Najmeh
dc.contributor.author: Busso, Carlos
dc.contributor.utdAuthor: Sadoughi, Najmeh
dc.contributor.utdAuthor: Busso, Carlos
dc.date.accessioned: 2020-02-28T17:26:06Z
dc.date.available: 2020-02-28T17:26:06Z
dc.date.issued: 2019-04-05
dc.description: Due to copyright restrictions and/or the publisher's policy, full-text access from Treasures at UT Dallas is limited to current UTD affiliates (use the provided Link to Article).
dc.description: Supplementary material is available on the publisher's website. Use the doi.org link below.
dc.description.abstract: Conversational agents (CAs) play an important role in human-computer interaction (HCI). Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Previous studies have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors that convey the underlying message, but the gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures; however, they create behaviors that disregard the meaning of the message. This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN), where a discrete variable is added to condition the behaviors on an underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer, creating trajectories that are tightly synchronized with speech. The study proposes a DBN structure and a training approach that (1) model the cause-effect relationship between the constraint and the gestures, and (2) capture the differences in behaviors across constraints by enforcing sparse transitions between shared and exclusive states per constraint. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained baseline model. ©2019 Elsevier B.V.
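The abstract's core idea is a state-space gesture model in which a discrete constraint variable (e.g., a discourse function) gates which hidden states are reachable, with sparse transitions between states shared across constraints and states exclusive to one constraint. The following is a minimal Python/NumPy sketch of that gating mechanism, not the authors' model: the state counts, gesture dimensionality, and the random (untrained) parameters are all illustrative assumptions, standing in for the learned DBN parameters described in the paper.

```python
"""Minimal sketch (not the authors' code) of a constrained gesture model:
a discrete constraint variable restricts transitions to shared states plus
the exclusive states of that constraint."""
import numpy as np

rng = np.random.default_rng(0)

N_SHARED, N_EXCL, N_CONSTRAINTS = 4, 2, 2      # assumed state counts (not from the paper)
N_STATES = N_SHARED + N_EXCL * N_CONSTRAINTS   # 4 shared + 2 exclusive per constraint
GESTURE_DIM = 3                                # e.g., head pitch/yaw/roll (illustrative)

def transition_mask(c):
    """Boolean mask allowing the shared states plus the exclusive states of
    constraint c, mimicking sparse, constraint-specific transitions."""
    allowed = np.zeros(N_STATES, dtype=bool)
    allowed[:N_SHARED] = True
    start = N_SHARED + c * N_EXCL
    allowed[start:start + N_EXCL] = True
    return np.outer(allowed, allowed)

# Random, untrained parameters stand in for learned DBN parameters.
A = rng.random((N_STATES, N_STATES))              # unconstrained transition weights
means = rng.normal(size=(N_STATES, GESTURE_DIM))  # per-state gesture means

def sample_trajectory(constraint, T=20):
    """Sample a gesture trajectory conditioned on a discrete constraint
    label (e.g., a discourse function such as 'question')."""
    A_c = A * transition_mask(constraint)      # zero out disallowed transitions
    row_sums = A_c.sum(axis=1, keepdims=True)
    A_c = np.divide(A_c, row_sums, out=np.zeros_like(A_c), where=row_sums > 0)
    state = rng.integers(N_SHARED)             # start in a shared state
    frames = []
    for _ in range(T):
        state = rng.choice(N_STATES, p=A_c[state])
        frames.append(means[state] + 0.1 * rng.normal(size=GESTURE_DIM))
    return np.array(frames)

print(sample_trajectory(constraint=0).shape)   # -> (20, 3)
```

In the paper's actual model the transition and emission parameters are learned from speech-gesture data and the emissions are driven by speech features; this sketch only illustrates how a discrete constraint can induce shared versus exclusive state usage.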
dc.description.department: Erik Jonsson School of Engineering and Computer Science
dc.description.sponsorship: National Science Foundation grant IIS-1718944
dc.identifier.bibliographicCitation: Sadoughi, N., and C. Busso. 2019. "Speech-driven animation with meaningful behaviors." Speech Communication 110: 90-100, doi: 10.1016/j.specom.2019.04.005
dc.identifier.issn: 0167-6393
dc.identifier.uri: http://doi.org/10.1016/j.specom.2019.04.005
dc.identifier.uri: https://hdl.handle.net/10735.1/7312
dc.identifier.volume: 110
dc.language.iso: en
dc.publisher: Elsevier B.V.
dc.rights: ©2019 Elsevier B.V. All Rights Reserved.
dc.source.journal: Speech Communication
dc.subject: Computer animation
dc.subject: Human-computer interaction
dc.subject: Conversation
dc.subject: Variables (Mathematics)
dc.subject: Speech
dc.title: Speech-Driven Animation with Meaningful Behaviors
dc.type.genre: article

Files

Original bundle

Name: JECS-6770-260953.62-LINK.pdf
Size: 164.48 KB
Format: Adobe Portable Document Format
Description: Link to Article