Laughter and Filler Detection in Naturalistic Audio

Kaushik, Lakshmish; Sangwan, Abhijeet; Hansen, John H. L.

Files

JECS-3626-4639.10.pdf (696.53 KB)

Authors

Kaushik, Lakshmish

Sangwan, Abhijeet

Hansen, John H. L.

Publisher

International Speech and Communication Association

URI

http://hdl.handle.net/10735.1/5058

Abstract

Laughter and fillers are common phenomena in speech and play an important role in communication. In this study, we present Deep Neural Network (DNN) and Convolutional Neural Network (CNN) based systems to distinguish non-verbal cues (laughter and fillers) from verbal speech in naturalistic audio. We propose improvements over the deep learning system proposed in [1]. In particular, we propose a simple method to combine spectral features with pitch information to capture both prosodic and spectral cues for fillers and laughter. Additionally, we propose using a wider time context for feature extraction so that the time evolution of the spectral and prosodic structure can also be exploited for classification. Furthermore, we propose using a CNN for classification. The new method is evaluated on conversational telephony speech (CTS, drawn from Switchboard and Fisher) and on the UT-Opinion corpus. Our results show that, relative to the baseline system, the new system improves the AUC (area under the curve) metric by 8.15% and 11.9% absolute for laughter, and by 4.85% and 6.01% absolute for fillers, on CTS and UT-Opinion data, respectively. Finally, we analyze the results to explain the difference in performance between traditional CTS data and naturalistic audio (UT-Opinion), and identify challenges that need to be addressed to make systems perform better on practical data.
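
The feature construction described in the abstract (spectral features combined with pitch, then extended over a wider time context) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the edge-padding strategy, and the context width of 5 frames on each side are assumptions made only for illustration, since the abstract does not specify them.

```python
import numpy as np


def stack_context(features: np.ndarray, left: int, right: int) -> np.ndarray:
    """Concatenate each frame with its `left` previous and `right` following frames.

    features has shape (num_frames, dim). Edges are padded by repeating the
    first/last frame (an assumption; other padding schemes are possible).
    Returns an array of shape (num_frames, (left + right + 1) * dim).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    # Window i holds, for every frame t, the frame at offset (i - left) from t.
    windows = [padded[i:i + num_frames] for i in range(left + right + 1)]
    return np.concatenate(windows, axis=1)


def combine_spectral_and_pitch(
    spectral: np.ndarray, pitch: np.ndarray, left: int = 5, right: int = 5
) -> np.ndarray:
    """Append a per-frame pitch value to the spectral vector, then add context.

    spectral: (num_frames, spectral_dim); pitch: (num_frames,).
    The default context width is hypothetical, not taken from the paper.
    """
    per_frame = np.concatenate([spectral, pitch.reshape(-1, 1)], axis=1)
    return stack_context(per_frame, left, right)


# Toy input: 4 frames of 3-dimensional spectral features plus one pitch value each.
spectral = np.arange(12, dtype=float).reshape(4, 3)
pitch = np.array([100.0, 110.0, 120.0, 130.0])
stacked = combine_spectral_and_pitch(spectral, pitch, left=1, right=1)
```

With one frame of context on each side, every row of `stacked` holds three consecutive 4-dimensional frames (previous, current, next), which is the kind of input a frame-level DNN or CNN classifier could consume.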

Keywords

Neural networks (Computer science)--Convolutional, Neural networks (Computer science)--Deep, Transmutation (Linguistics), Nonverbal communication

Sponsorship

Partially supported by AFRL (contract # FA8750-12-0188) and NSF (grant # 1218159)

Collections

Hansen, John H. L.
JECS Staff and Student Research
