Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks

Date

2019-05-07

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

Articulation, emotion, and personality play strong roles in orofacial movements. To improve the naturalness and expressiveness of virtual agents (VAs), it is important to carefully model the complex interplay between these factors. This paper proposes a conditional generative adversarial network, called conditional sequential GAN (CSG), which learns the relationship between emotion, lexical content, and lip movements in a principled manner. The model uses a set of spectral and emotional speech features extracted directly from the speech signal as conditioning inputs, generating realistic movements. A key feature of the approach is that it is a speech-driven framework that does not require transcripts. Our experiments show the superiority of this model over three state-of-the-art baselines in terms of objective and subjective evaluations. When the target emotion is known, we propose to create emotionally dependent models by either adapting the base model with the target emotional data (CSG-Emo-Adapted) or adding emotional conditions as inputs to the model (CSG-Emo-Aware). Objective evaluations of these models show improvements for CSG-Emo-Adapted over the CSG model, as the generated trajectory sequences are closer to the original sequences. Subjective evaluations show significantly better results for this model compared with the CSG model when the target emotion is happiness.
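As a rough illustration of the conditional, speech-driven setup described in the abstract (not the authors' released implementation), the sketch below shows a minimal recurrent generator that maps per-frame speech features, noise, and an optional emotion code (as in the CSG-Emo-Aware variant) to a lip-trajectory sequence. All dimensions, the GRU backbone, and the one-hot emotion code are illustrative assumptions.

```python
# Hypothetical sketch of a conditional sequential generator for speech-driven
# lip trajectories; dimensions and architecture choices are assumptions.
import torch
import torch.nn as nn

class ConditionalSequentialGenerator(nn.Module):
    def __init__(self, speech_dim=40, emotion_dim=4, noise_dim=16,
                 hidden_dim=128, lip_dim=20, use_emotion=True):
        super().__init__()
        self.use_emotion = use_emotion
        in_dim = speech_dim + noise_dim + (emotion_dim if use_emotion else 0)
        self.rnn = nn.GRU(in_dim, hidden_dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, lip_dim)  # per-frame lip parameters

    def forward(self, speech, noise, emotion=None):
        # speech: (batch, T, speech_dim); noise: (batch, T, noise_dim)
        # emotion: (batch, emotion_dim) one-hot, broadcast over time if used
        feats = [speech, noise]
        if self.use_emotion and emotion is not None:
            feats.append(emotion.unsqueeze(1).expand(-1, speech.size(1), -1))
        h, _ = self.rnn(torch.cat(feats, dim=-1))
        return self.out(h)

# Example: generate a 100-frame lip trajectory for one utterance.
gen = ConditionalSequentialGenerator()
speech = torch.randn(1, 100, 40)                 # e.g., spectral features
noise = torch.randn(1, 100, 16)
emotion = torch.tensor([[0.0, 1.0, 0.0, 0.0]])   # e.g., "happiness"
lips = gen(speech, noise, emotion)               # (1, 100, 20)
```

In a GAN setting, this generator would be trained adversarially against a discriminator that judges whether a lip-trajectory sequence is consistent with the conditioning speech features; the emotion-adapted variant (CSG-Emo-Adapted) would instead fine-tune a base model on data from the target emotion.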

Description

Due to copyright restrictions and/or publisher's policy, full-text access from Treasures at UT Dallas is limited to current UTD affiliates (use the provided Link to Article).

Keywords

Hidden Markov models, Lips, Data structures (Computer science), Information visualization, Flow visualization, Speech

Sponsorship

National Science Foundation (NSF) award IIS-1718944

Rights

©2019 IEEE
