Busso-Recabarren, Carlos A.
Permanent URI for this collection: https://hdl.handle.net/10735.1/6770
Carlos Busso-Recabarren is a Professor of Electrical Engineering and Principal Investigator of the MSP (Multimodal Signal Processing) Laboratory. His research interests include:
- Modeling and synthesis of human behavior
- Affective state recognition
- Multimodal interfaces
- Sensing participant interaction
- Digital signal processing
- Speech and video processing
Browsing Busso-Recabarren, Carlos A. by Subject "Conversation"
Now showing 1 - 2 of 2
Item Expressive Speech-Driven Lip Movements with Multitask Learning (Institute of Electrical and Electronics Engineers Inc.) Sadoughi, Najmeh; Busso, Carlos A.
The orofacial area conveys a range of information, including speech articulation and emotions. These two factors constrain the facial movements, creating non-trivial integrations and interplays. To generate more expressive and naturalistic movements for conversational agents (CAs), the relationship between these factors should be carefully modeled. Data-driven models are more appropriate for this task than rule-based systems. This paper presents two deep learning speech-driven structures that integrate speech articulation and emotional cues. The proposed approaches rely on multitask learning (MTL) strategies, where related secondary tasks are jointly solved while synthesizing orofacial movements. In particular, we evaluate emotion recognition and viseme recognition as secondary tasks. The approach creates shared representations that generate behaviors that are not only closer to the original orofacial movements but also perceived as more natural than the results from single-task learning.
Item Speech-Driven Animation with Meaningful Behaviors (Elsevier B.V., 2019-04-05) Sadoughi, Najmeh; Busso, Carlos
Conversational agents (CAs) play an important role in human-computer interaction (HCI). Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Past studies have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors that convey the underlying message, but their gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures. However, they create behaviors that disregard the meaning of the message. This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN) in which a discrete variable is added to condition the behaviors on an underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer, creating trajectories that are synchronized in time with speech. The study proposes a DBN structure and a training approach that (1) models the cause-effect relationship between the constraint and the gestures, and (2) captures the differences in behaviors across constraints by enforcing sparse transitions between shared and exclusive states per constraint. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained baseline model. ©2019 Elsevier B.V.
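The multitask idea in the first abstract can be illustrated with a minimal sketch: a shared representation computed from speech features feeds one primary regression head (orofacial movements) and two secondary classification heads (emotion and viseme recognition). All layer sizes, weights, and names here are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D_SPEECH, D_SHARED = 40, 64       # e.g., 40 acoustic features per frame (assumed)
D_FACE, N_EMO, N_VIS = 18, 4, 14  # facial parameters, emotion/viseme classes (assumed)

# Randomly initialized weights stand in for trained parameters.
W_shared = rng.normal(scale=0.1, size=(D_SPEECH, D_SHARED))
W_face = rng.normal(scale=0.1, size=(D_SHARED, D_FACE))
W_emo = rng.normal(scale=0.1, size=(D_SHARED, N_EMO))
W_vis = rng.normal(scale=0.1, size=(D_SHARED, N_VIS))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(speech_frames):
    """One multitask forward pass: shared encoding, then three task heads."""
    h = np.tanh(speech_frames @ W_shared)  # shared representation
    face = h @ W_face                      # primary task: orofacial movements
    emo = softmax(h @ W_emo)               # secondary task: emotion recognition
    vis = softmax(h @ W_vis)               # secondary task: viseme recognition
    return face, emo, vis

x = rng.normal(size=(100, D_SPEECH))       # 100 frames of speech features
face, emo, vis = forward(x)

# A joint MTL objective would weight the three losses, e.g.
# loss = mse(face, target) + a * ce(emo, y_emo) + b * ce(vis, y_vis)
```

Training on the weighted joint loss is what pushes the shared layer to encode both articulatory and emotional cues, which is the mechanism the abstract credits for the more natural synthesized movements.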
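The constrained-DBN idea in the second abstract can also be sketched: a discrete constraint variable selects a sparse transition matrix over hidden gesture states, where some states are shared across constraints and some are exclusive to one constraint. The state counts and structure below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

N_SHARED, N_EXCL = 3, 2           # shared states + exclusive states per constraint (assumed)
N_STATES = N_SHARED + 2 * N_EXCL  # state space covering two constraints

def transition_matrix(constraint):
    """Build a sparse transition matrix: under the given constraint, the chain
    can move among the shared states and its own exclusive block, but never
    into the other constraint's exclusive states."""
    allowed = list(range(N_SHARED))
    start = N_SHARED + constraint * N_EXCL
    allowed += list(range(start, start + N_EXCL))
    A = np.zeros((N_STATES, N_STATES))
    for i in allowed:
        A[i, allowed] = rng.random(len(allowed))
        A[i] /= A[i].sum()         # each reachable row is a probability distribution
    return A

def sample_states(constraint, length):
    """Sample a hidden-state trajectory conditioned on the constraint."""
    A = transition_matrix(constraint)
    s = N_SHARED + constraint * N_EXCL  # start in an exclusive state
    path = [s]
    for _ in range(length - 1):
        s = rng.choice(N_STATES, p=A[s])
        path.append(s)
    return path

path = sample_states(constraint=0, length=20)
# In the full model, each hidden state would emit gesture trajectory
# parameters, so the constraint shapes which behaviors get realized.
```

The sparsity is the key design choice: shared states capture behavior common to all constraints, while the exclusive blocks capture what distinguishes, say, question-related gestures from head nods.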