Advances in Methodologies Using EEG to Characterize the Cortical Processing of Speech and Its Perceived Sound Quality
Abstract
Speech perception depends on access to the amplitude, spectral, and temporal information in speech. This dissertation focuses on the temporal structure of speech, which consists of a slowly varying amplitude (temporal envelope, ENV) and a rapidly varying frequency (temporal fine structure, TFS). Past studies on speech perception [for review, see Lorenzi and Moore 2008] suggest that ENV alone is sufficient for speech perception in quiet, while TFS is used to segregate speech from background noise (e.g., in a competing-talker scenario). It has been shown that the reduction in subjective quality ratings obtained through behavioral quality assessment is correlated with the degree of degradation in the temporal envelope. However, the neural correlates of sound quality perception for continuous speech are still unclear. This dissertation pursues two complementary research goals, framed as two studies, that consider speech perception as it relates to ENV and TFS and to perceived sound quality. Study 1 attempts to characterize the cortical processing of speech, and Study 2 attempts to characterize the perceived sound quality in normal-hearing listeners. Chapter 1 provides an overall introduction to both studies, and Chapter 2 lays out their background in detail. Chapter 3 presents Study 1, which investigates the role and relative contribution of ENV and TFS to speech perception in normal-hearing listeners in quiet. The synchronization between brain oscillations in different frequency bands is commonly used as a marker of the key mechanisms that coordinate neural dynamics across temporal and spatial scales [Canolty and Knight 2010]. When neural oscillations in two different frequency bands synchronize, their “peak” frequencies usually exhibit a harmonic relationship.
A recent study [Rodriguez and Alaerts 2019] showed a prominent occurrence of a 2:1 harmonic cross-frequency relationship between alpha (8-14 Hz) and theta (4-8 Hz) rhythms when efficient, task-relevant cognitive processing is engaged. Study 1 examined power-power cross-frequency coupling (CFC) between the alpha and theta bands, and between the gamma (30-100 Hz) and theta bands, of cortical activity recorded with electroencephalography (EEG) while normal-hearing listeners processed the ENV and TFS of speech. The results showed relatively increased CFC when listening to ENV alone, which may suggest greater synchrony across frequency bands of cortical activity when processing ENV than when processing TFS. Recent studies have shown that cortical activity tracks the envelope of continuous natural speech, which could serve as a useful method for studying the processes underlying speech perception. Study 2, presented in Chapters 4 and 5, investigates the differences in cortical entrainment to the envelope of speech spoken by cochlear implant (CI) talkers (degraded speech) and normal-hearing (NH) talkers. Although a CI may help individuals with hearing loss restore or improve their ability to hear and provide the auditory feedback necessary for improved speech production, speech produced by CI users is mostly abnormal compared to that of normal-hearing individuals [Gautam et al. 2019]. The motivation is to obtain a metric for assessing “how well” hard-of-hearing talkers have spoken and the auditory feedback they receive with their current aural compensation. The results showed higher perceived sound quality and closer cortical tracking of the speech envelope when normal-hearing listeners listened to speech produced by NH talkers than when they listened to speech produced by CI talkers. Finally, Chapter 6 presents overall conclusions, the contributions of the dissertation, and possible directions for future work.
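The abstract does not state how power-power CFC was estimated. As a rough illustration only, one common approach correlates the instantaneous band power of two frequency bands over time. The sketch below assumes that definition, uses a brick-wall FFT filter and an FFT-based analytic signal, and runs on synthetic data; the function names (`analytic_signal`, `band_power`, `power_power_cfc`) and all signal parameters are hypothetical and not taken from the dissertation.

```python
import numpy as np

THETA = (4.0, 8.0)    # Hz, band edges as given in the abstract
ALPHA = (8.0, 14.0)   # Hz, band edges as given in the abstract

def analytic_signal(x):
    """Analytic signal of a real sequence, built by zeroing negative frequencies."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def band_power(x, fs, low, high):
    """Instantaneous power of x within [low, high) Hz (assumed brick-wall filter)."""
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    spectrum = np.fft.fft(x)
    spectrum[(np.abs(freqs) < low) | (np.abs(freqs) >= high)] = 0.0
    band = np.fft.ifft(spectrum).real
    return np.abs(analytic_signal(band)) ** 2

def power_power_cfc(x, fs, band_a, band_b):
    """Pearson correlation between the power time series of two bands."""
    return float(np.corrcoef(band_power(x, fs, *band_a),
                             band_power(x, fs, *band_b))[0, 1])

# Synthetic single-channel "EEG": a 6 Hz theta and a 12 Hz alpha component.
fs = 250
t = np.arange(20 * fs) / fs
shared = 1.0 + 0.5 * np.sin(2 * np.pi * 0.5 * t)   # modulator common to both bands
other = 1.0 + 0.5 * np.sin(2 * np.pi * 0.3 * t)    # unrelated modulator
theta = shared * np.sin(2 * np.pi * 6.0 * t)
coupled_eeg = theta + shared * np.sin(2 * np.pi * 12.0 * t)
uncoupled_eeg = theta + other * np.sin(2 * np.pi * 12.0 * t)

coupled = power_power_cfc(coupled_eeg, fs, THETA, ALPHA)
uncoupled = power_power_cfc(uncoupled_eeg, fs, THETA, ALPHA)
print(f"coupled: {coupled:.2f}, uncoupled: {uncoupled:.2f}")
```

When both bands share an amplitude modulator, the correlation is high; with unrelated modulators it falls toward zero. Real EEG would need proper filter design, artifact handling, and statistical testing, none of which is shown here.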
The two key research aims, pertaining to Study 1 and Study 2 respectively, were: 1) to examine the brain electrical activity and brain networks underlying the perception of ENV and TFS information, compared with processing of the original speech, and thereby to investigate the relative roles of ENV and TFS in speech perception in normal-hearing listeners; and 2) to determine how well the envelope of speech is represented neurophysiologically by objectively quantifying the cortical tracking of the speech envelope, and to show how this cortical tracking differentiates between speech produced by CI talkers and by NH talkers in relation to its perceived sound quality. Together, the findings from Study 1 and Study 2 provide insight into the neural mechanisms involved in the cortical processing of the ENV and TFS of continuous speech and its perceived sound quality in normal-hearing listeners.
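To make the idea of quantifying cortical tracking of the speech envelope concrete, the following is a minimal sketch, not the method used in the dissertation (which the abstract does not specify). It assumes the envelope is taken as the magnitude of the analytic signal and that tracking is scored by the Pearson correlation between the envelope and an EEG channel at a range of lags. The data are synthetic, and the names `hilbert_envelope` and `lagged_correlation`, the sampling rate, the delay, and the noise level are all hypothetical.

```python
import numpy as np

def hilbert_envelope(x):
    """Temporal envelope (ENV) as the magnitude of the analytic signal."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

def lagged_correlation(envelope, eeg, max_lag):
    """Pearson r between envelope[t] and eeg[t + lag] for lag = 0..max_lag samples."""
    n = len(envelope)
    return np.array([
        np.corrcoef(envelope[:n - lag], eeg[lag:])[0, 1]
        for lag in range(max_lag + 1)
    ])

rng = np.random.default_rng(0)
fs = 100                      # Hz, assumed common rate for envelope and EEG
n = 30 * fs
t = np.arange(n) / fs

# Synthetic "speech": a slowly varying positive envelope times a 30 Hz carrier.
spec = np.fft.rfft(rng.standard_normal(n))
spec[np.fft.rfftfreq(n, d=1.0 / fs) >= 8.0] = 0.0   # keep modulations below 8 Hz
slow = np.fft.irfft(spec, n)
true_env = slow - slow.min() + 0.1
speech = true_env * np.sin(2 * np.pi * 30.0 * t)

# Synthetic "EEG": the envelope delayed by 120 ms plus white noise.
delay = 12                    # samples
eeg = 0.3 * rng.standard_normal(n)
eeg[delay:] += true_env[:-delay]

r = lagged_correlation(hilbert_envelope(speech), eeg, max_lag=40)
best_lag = int(np.argmax(r))
peak_r = float(r[best_lag])
print(f"peak r = {peak_r:.2f} at {1000 * best_lag / fs:.0f} ms")
```

Under these assumptions, a higher peak correlation would indicate closer tracking of the envelope, so comparing the peak across talker groups is one simple way to operationalize the comparison described above. Regression-based approaches are common alternatives and may be what a given study actually uses.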