Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks

dc.contributor.author: Sadoughi, Najmeh
dc.contributor.author: Busso, Carlos
dc.contributor.utdAuthor: Sadoughi, Najmeh
dc.contributor.utdAuthor: Busso, Carlos
dc.date.accessioned: 2020-07-15T21:31:07Z
dc.date.available: 2020-07-15T21:31:07Z
dc.date.issued: 2019-05-07
dc.description: Due to copyright restrictions and/or publisher's policy, full-text access from Treasures at UT Dallas is limited to current UTD affiliates (use the provided Link to Article).
dc.description.abstract: Articulation, emotion, and personality play strong roles in orofacial movements. To improve the naturalness and expressiveness of virtual agents (VAs), it is important that we carefully model the complex interplay between these factors. This paper proposes a conditional generative adversarial network, called conditional sequential GAN (CSG), which learns the relationship between emotion, lexical content, and lip movements in a principled manner. This model uses a set of spectral and emotional speech features extracted directly from the speech signal as conditioning inputs, generating realistic movements. A key feature of the approach is that it is a speech-driven framework that does not require transcripts. Our experiments show the superiority of this model over three state-of-the-art baselines in terms of objective and subjective evaluations. When the target emotion is known, we propose to create emotion-dependent models by either adapting the base model with the target emotional data (CSG-Emo-Adapted) or adding emotional conditions as input to the model (CSG-Emo-Aware). Objective evaluations of these models show improvements for CSG-Emo-Adapted compared with the CSG model, as the generated trajectory sequences are closer to the original sequences. Subjective evaluations show significantly better results for this model compared with the CSG model when the target emotion is happiness. (An illustrative sketch of this setup appears after the metadata fields below.)
dc.description.department: Erik Jonsson School of Engineering and Computer Science
dc.description.sponsorship: National Science Foundation (NSF) award IIS-1718944
dc.identifier.bibliographicCitation: Sadoughi, N., and C. Busso. 2019. "Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks." IEEE Transactions on Affective Computing, doi: 10.1109/TAFFC.2019.2916031
dc.identifier.issn: 1949-3045
dc.identifier.uri: http://dx.doi.org/10.1109/TAFFC.2019.2916031
dc.identifier.uri: https://hdl.handle.net/10735.1/8711
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.rights: ©2019 IEEE
dc.source.journal: IEEE Transactions on Affective Computing
dc.subject: Hidden Markov models
dc.subject: Lips
dc.subject: Data structures (Computer science)
dc.subject: Information visualization
dc.subject: Flow visualization
dc.subject: Speech
dc.title: Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks
dc.type.genre: article
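
As a rough illustration of the kind of model the abstract describes, below is a minimal, hypothetical PyTorch sketch of a conditional sequential GAN: a recurrent generator maps per-frame speech features (plus noise) to lip-movement trajectories, and a recurrent discriminator scores (speech, lip-trajectory) sequence pairs. All dimensions, layer choices, and names are assumptions made for illustration, not the authors' published CSG architecture.

# Hypothetical sketch of a conditional sequential GAN for speech-driven
# lip movement generation. Sizes and layer choices are assumptions.
import torch
import torch.nn as nn

SPEECH_DIM = 40   # assumed: per-frame spectral + emotional speech features
NOISE_DIM = 16    # assumed: latent noise appended to each frame
LIP_DIM = 20      # assumed: dimensionality of the lip trajectory per frame

class Generator(nn.Module):
    """Maps a speech-feature sequence (plus noise) to a lip-movement sequence."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM + NOISE_DIM, hidden,
                          num_layers=2, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, LIP_DIM)

    def forward(self, speech, noise):
        # speech: (batch, time, SPEECH_DIM); noise: (batch, time, NOISE_DIM)
        h, _ = self.rnn(torch.cat([speech, noise], dim=-1))
        return self.out(h)  # (batch, time, LIP_DIM)

class Discriminator(nn.Module):
    """Scores a (speech, lip-trajectory) pair as real vs. generated."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM + LIP_DIM, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, speech, lips):
        _, h_n = self.rnn(torch.cat([speech, lips], dim=-1))
        return self.out(h_n[-1])  # one raw logit per sequence

if __name__ == "__main__":
    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    # Stand-in batch: 8 clips of 50 frames (real data would be aligned
    # speech features and captured lip trajectories).
    speech = torch.randn(8, 50, SPEECH_DIM)
    real_lips = torch.randn(8, 50, LIP_DIM)
    noise = torch.randn(8, 50, NOISE_DIM)

    # Discriminator step: push real pairs toward 1, generated pairs toward 0.
    fake_lips = G(speech, noise).detach()
    d_loss = bce(D(speech, real_lips), torch.ones(8, 1)) + \
             bce(D(speech, fake_lips), torch.zeros(8, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: produce trajectories the discriminator scores as real.
    g_loss = bce(D(speech, G(speech, noise)), torch.ones(8, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Conditioning the discriminator on the speech features, rather than on the lip trajectories alone, is what makes the setup conditional: the generator is penalized for movements that are implausible given the input speech, not merely for movements that look unrealistic in isolation.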

Files

Original bundle

Name: JECS-6770-261644.12-LINK.pdf
Size: 183.78 KB
Format: Adobe Portable Document Format
Description: Link to Article