Deep Neural Networks and Model-Based Approaches for Robust Speaker Diarization in Naturalistic Audio Streams
Speaker diarization is an unsupervised task that determines "who spoke and when" within an input audio stream. It consists of four sub-systems: (i) speech activity detection (SAD); (ii) speaker segmentation and modeling; ...
A Pipeline-Based Task-Oriented Dialogue System on DSTC2 Dataset
Dialogue systems have attracted a lot of attention since conversational products such as Google Assistant and the Amazon Echo smart speaker have recently achieved great success. In this work, we try to build a pipeline-based ...
Domain Adaptation for Speech Based Emotion Recognition
One of the main barriers in the deployment of speech emotion recognition systems in real applications is the lack of generalization of the emotion classifiers. The recognition performance achieved in controlled recordings ...
Learning Based Algorithms for Speech Applications Under Domain Mismatch Conditions
Recent years have seen tremendous growth in the use of voice-based virtual assistants. Communicating with the digital world via voice is becoming a very easy and pleasant experience, since this mode of interaction ...
Novel Frameworks for Attribute-Based Speech Emotion Recognition using Time-Continuous Traces and Sentence-Level Annotations
Speech emotion recognition (SER) plays an important role in a growing world of automation and artificial intelligence. Robust and accurate SER systems are crucial for enhancing human-computer interaction. Emotional ...