Expressive Speech-Driven Lip Movements with Multitask Learning

Sadoughi, Najmeh; Busso, Carlos A.

Files

JECS-6770-279919.12-Link.pdf (164.67 KB)

Publisher

Institute of Electrical and Electronics Engineers Inc.

URI

https://hdl.handle.net/10735.1/6771

Abstract

The orofacial area conveys a range of information, including speech articulation and emotions. These two factors jointly constrain facial movements and interact in non-trivial ways. To generate more expressive and naturalistic movements for conversational agents (CAs), the relationship between these factors should be carefully modeled. Data-driven models are more appropriate for this task than rule-based systems. This paper proposes two speech-driven deep learning architectures that integrate speech articulation and emotional cues. The proposed approaches rely on multitask learning (MTL) strategies, where related secondary tasks are solved jointly while synthesizing orofacial movements. In particular, we evaluate emotion recognition and viseme recognition as secondary tasks. The approach creates shared representations that generate behaviors that are not only closer to the original orofacial movements but also perceived as more natural than the results from single-task learning.
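To make the multitask idea in the abstract concrete, the following is a minimal sketch of a generic MTL training objective, not the authors' architecture. It assumes the primary task is regression of orofacial features scored with mean squared error, that emotion and viseme recognition are classification tasks scored with softmax cross-entropy, and that the three losses are combined as a weighted sum. The function names and the weights `w_emo` and `w_vis` are hypothetical; the abstract does not state the losses or weights used in the paper. In a shared-representation network, minimizing this combined loss is what pushes the shared layers to encode information useful to all three tasks.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error for the primary task (hypothetical orofacial feature regression)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a secondary classification task.

    Uses the log-sum-exp trick (subtracting the max logit) for numerical stability.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def multitask_loss(
    lip_pred: Sequence[float],
    lip_target: Sequence[float],
    emo_logits: Sequence[float],
    emo_label: int,
    vis_logits: Sequence[float],
    vis_label: int,
    w_emo: float = 0.5,  # hypothetical weight for the emotion-recognition task
    w_vis: float = 0.5,  # hypothetical weight for the viseme-recognition task
) -> float:
    """Weighted sum of the primary regression loss and two secondary classification losses."""
    primary = mse(lip_pred, lip_target)
    emotion = cross_entropy(emo_logits, emo_label)
    viseme = cross_entropy(vis_logits, vis_label)
    return primary + w_emo * emotion + w_vis * viseme


if __name__ == "__main__":
    total = multitask_loss([0.0, 1.0], [0.0, 0.0], [0.0, 0.0], 0, [0.0, 0.0], 1)
    print(total)  # 0.5 + 0.5*ln(2) + 0.5*ln(2) = 0.5 + ln(2)
```

Setting both secondary weights to zero reduces the objective to single-task learning on the orofacial features, which is the baseline the abstract compares against.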

Description

Full text access from Treasures at UT Dallas is restricted to current UTD affiliates (use the provided Link to Article).

Keywords

Learning, Gesture, Speech, Conversation, Emotion recognition

Sponsorship

US National Science Foundation grant IIS-1718944.

Collections

Busso-Recabarren, Carlos A.
