A Multimodal Analysis of Synchrony During Dyadic Interaction Using a Metric Based on Sequential Pattern Mining
Jakkam, Anil Kumar
In human-human interaction, people tend to adapt to each other temporally as a conversation progresses, changing their intonation, speech rate, fundamental frequency, word selection, hand gestures, and head movements. This phenomenon is variously referred to as synchrony, convergence, entrainment, or adaptation. Recent studies have investigated this phenomenon at different dimensions and levels for single modalities. However, studying synchrony as an interplay between modalities at a local level between conversational partners remains an open question. This study explores synchrony in dyadic conversations using a multimodal approach based on sequential pattern mining. The analysis considers acoustic, text-based, and video-based features at the turn level. The proposed data-driven framework identifies frequent sequences containing events from multiple modalities that can quantify the synchrony between conversational partners (e.g., a speaker reduces speech rate when the other utters disfluencies). The evaluation relies on 90 sessions from the Fisher corpus, which comprises telephone conversations between two people, and 54 sessions of audio-visual recordings of dyadic interactions from the MAHNOB MHI-Mimicry database. Using this framework, we develop a multimodal metric to quantify synchrony between conversational partners. We report results on this metric by comparing actual dyadic conversations with pseudo-interactions, which are created artificially by randomly pairing the speakers. Our results show that the proposed metric captures the temporal evolution of synchrony, identifying non-trivial sequences of events across multimodal features.
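The core idea of the framework can be sketched in a few lines. The following is a minimal illustration, not the thesis's actual method: the event labels, the contiguous-subsequence miner, and the toy synchrony score (pattern support in the real dyad minus mean support over pseudo-interactions) are all simplifying assumptions, since the abstract does not specify the feature set, mining algorithm, or metric definition.

```python
from collections import Counter

def mine_frequent_sequences(turns, length=2, min_support=2):
    """Count contiguous subsequences of turn-level event labels and
    keep those occurring at least min_support times.

    `turns` is a list of event labels, one per conversational turn
    (hypothetical labels; real features would be acoustic, lexical,
    and visual events)."""
    counts = Counter(
        tuple(turns[i:i + length]) for i in range(len(turns) - length + 1)
    )
    return {seq: c for seq, c in counts.items() if c >= min_support}

def synchrony_score(real_dyad, pseudo_dyads, length=2, min_support=2):
    """Toy synchrony score: total support of frequent patterns in the
    real dyad minus the mean total support over pseudo-interactions
    (artificial dyads formed by re-pairing speakers)."""
    def total_support(turns):
        return sum(mine_frequent_sequences(turns, length, min_support).values())

    real = total_support(real_dyad)
    pseudo = sum(total_support(d) for d in pseudo_dyads) / len(pseudo_dyads)
    return real - pseudo

# Example: alternating events from partners A and B in a real dyad show
# a repeated pattern (B slows down after A's disfluency); the pseudo
# pairing does not.
real = ["A:disfluency", "B:slow_rate", "A:disfluency", "B:slow_rate"]
pseudo = [["A:disfluency", "B:fast_rate", "A:fluent", "B:slow_rate"]]
print(synchrony_score(real, pseudo))
```

A positive score indicates that the real pairing exhibits more repeated cross-speaker event patterns than chance pairings, which is the intuition behind comparing actual conversations with pseudo-interactions.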