Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks

dc.contributor.author: Sadoughi, Najmeh
dc.contributor.author: Busso, Carlos
dc.contributor.utdAuthor: Sadoughi, Najmeh
dc.contributor.utdAuthor: Busso, Carlos
dc.date.accessioned: 2020-07-15T21:31:07Z
dc.date.available: 2020-07-15T21:31:07Z
dc.date.issued: 2019-05-07
dc.description: Due to copyright restrictions and/or publisher's policy, full-text access from Treasures at UT Dallas is limited to current UTD affiliates (use the provided Link to Article).
dc.description.abstract: Articulation, emotion, and personality play strong roles in orofacial movements. To improve the naturalness and expressiveness of virtual agents (VAs), it is important that we carefully model the complex interplay between these factors. This paper proposes a conditional generative adversarial network, called conditional sequential GAN (CSG), which learns the relationship between emotion, lexical content, and lip movements in a principled manner. This model uses a set of spectral and emotional speech features extracted directly from the speech signal as conditioning inputs, generating realistic movements. A key feature of the approach is that it is a speech-driven framework that does not require transcripts. Our experiments show the superiority of this model over three state-of-the-art baselines in terms of objective and subjective evaluations. When the target emotion is known, we propose to create emotion-dependent models by either adapting the base model with the target emotional data (CSG-Emo-Adapted) or adding emotional conditions as input to the model (CSG-Emo-Aware). Objective evaluations of these models show improvements for CSG-Emo-Adapted compared with the CSG model, as the generated trajectory sequences are closer to the original sequences. Subjective evaluations show significantly better results for this model compared with the CSG model when the target emotion is happiness. (An illustrative sketch of this setup appears after the metadata fields below.)
dc.description.department: Erik Jonsson School of Engineering and Computer Science
dc.description.sponsorship: National Science Foundation (NSF) award IIS-1718944
dc.identifier.bibliographicCitation: Sadoughi, N., and C. Busso. 2019. "Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks." IEEE Transactions on Affective Computing, doi: 10.1109/TAFFC.2019.2916031
dc.identifier.issn: 1949-3045
dc.identifier.uri: http://dx.doi.org/10.1109/TAFFC.2019.2916031
dc.identifier.uri: https://hdl.handle.net/10735.1/8711
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.rights: ©2019 IEEE
dc.source.journal: IEEE Transactions on Affective Computing
dc.subject: Hidden Markov models
dc.subject: Lips
dc.subject: Data structures (Computer science)
dc.subject: Information visualization
dc.subject: Flow visualization
dc.subject: Speech
dc.title: Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks
dc.type.genre: article
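
As a rough illustration of the kind of model the abstract describes, below is a minimal, hypothetical PyTorch sketch of a conditional sequential GAN: a recurrent generator maps per-frame speech features (plus noise) to lip-movement trajectories, and a recurrent discriminator scores (speech, lip-trajectory) sequence pairs. All dimensions, layer choices, and names are assumptions made for illustration, not the authors' published CSG architecture.

# Hypothetical sketch of a conditional sequential GAN for speech-driven
# lip movement generation. Sizes and layer choices are assumptions.
import torch
import torch.nn as nn

SPEECH_DIM = 40   # assumed: per-frame spectral + emotional speech features
NOISE_DIM = 16    # assumed: latent noise appended to each frame
LIP_DIM = 20      # assumed: dimensionality of the lip trajectory per frame

class Generator(nn.Module):
    """Maps a speech-feature sequence (plus noise) to a lip-movement sequence."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM + NOISE_DIM, hidden,
                          num_layers=2, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, LIP_DIM)

    def forward(self, speech, noise):
        # speech: (batch, time, SPEECH_DIM); noise: (batch, time, NOISE_DIM)
        h, _ = self.rnn(torch.cat([speech, noise], dim=-1))
        return self.out(h)  # (batch, time, LIP_DIM)

class Discriminator(nn.Module):
    """Scores a (speech, lip-trajectory) pair as real vs. generated."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM + LIP_DIM, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, speech, lips):
        _, h_n = self.rnn(torch.cat([speech, lips], dim=-1))
        return self.out(h_n[-1])  # one raw logit per sequence

if __name__ == "__main__":
    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    # Stand-in batch: 8 clips of 50 frames (real data would be aligned
    # speech features and captured lip trajectories).
    speech = torch.randn(8, 50, SPEECH_DIM)
    real_lips = torch.randn(8, 50, LIP_DIM)
    noise = torch.randn(8, 50, NOISE_DIM)

    # Discriminator step: push real pairs toward 1, generated pairs toward 0.
    fake_lips = G(speech, noise).detach()
    d_loss = bce(D(speech, real_lips), torch.ones(8, 1)) + \
             bce(D(speech, fake_lips), torch.zeros(8, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: produce trajectories the discriminator scores as real.
    g_loss = bce(D(speech, G(speech, noise)), torch.ones(8, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Conditioning the discriminator on the speech features, rather than on the lip trajectories alone, is what makes the setup conditional: the generator is penalized for movements that are implausible given the input speech, not merely for movements that look unrealistic in isolation.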

Files

Original bundle

Name: JECS-6770-261644.12-LINK.pdf
Size: 183.78 KB
Format: Adobe Portable Document Format
Description: Link to Article