Browsing by Author "Tan, Chin-Tuan"
Now showing 1 - 7 of 7
Item A Speech Processing Strategy Based on Sinusoidal Speech Model for Cochlear Implant Users (Institute of Electrical and Electronics Engineers Inc.)
Lee, Sungmin; Akbarzadeh, Sara; Singh, Satnam; Tan, Chin-Tuan; 0000-0002-4676-4917 (Tan, C-T); 78509491 (Tan, C-T)
In sinusoidal modeling (SM), the speech signal, which is pseudo-periodic in structure, can be approximated by sinusoids and noise without losing significant speech information. A speech processing strategy based on this sinusoidal speech model is relevant for encoding electric pulse streams in cochlear implant (CI) processing, where the number of available channels is limited. In this study, 5 normal-hearing (NH) listeners and 2 CI users performed speech recognition and perceived sound quality rating tasks on speech sentences processed in 12 different test conditions. The sinusoidal analysis/synthesis algorithm retained 1, 3, or 6 sinusoids from sentences low-pass filtered at 1 kHz, 1.5 kHz, 3 kHz, or 6 kHz, and the re-synthesized sentences served as the test conditions. Each of 12 lists of AzBio sentences was randomly chosen and processed with one of the 12 test conditions before being presented to each participant at 65 dB SPL (Sound Pressure Level). Participants were instructed to repeat each sentence as they perceived it, and the number of correctly recognized words was scored. They were also asked to rate the perceived sound quality of the sentences, including the original speech sentence, on a scale of 1 (distorted) to 10 (clean). Both speech recognition scores and perceived sound quality ratings across all participants increased as the number of sinusoids increased and the low-pass filter broadened. Our findings showed that three sinusoids may be sufficient to elicit nearly maximum speech intelligibility and quality for both NH and CI listeners. The sinusoidal speech model thus has potential as the basis for a speech processing strategy in CIs. ©2018 APSIPA.
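The 1/3/6-sinusoid analysis/synthesis described in this abstract can be illustrated with a minimal frame-based sketch in Python. This is an illustration only, not the authors' implementation: the frame length, hop size, and per-frame magnitude-peak picking are assumptions.

```python
import numpy as np

def sinusoidal_resynthesis(x, fs, n_sines=3, frame_len=512, hop=256, lp_cutoff=3000.0):
    """Approximate x frame-by-frame with its n_sines strongest sinusoids
    below lp_cutoff Hz, then overlap-add the per-frame resyntheses."""
    window = np.hanning(frame_len)
    y = np.zeros(len(x))
    norm = np.zeros(len(x))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    t = np.arange(frame_len) / fs
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag = np.abs(spec)
        mag[freqs > lp_cutoff] = 0.0           # emulate the low-pass condition
        peaks = np.argsort(mag)[-n_sines:]     # n_sines strongest bins (a crude peak pick)
        synth = np.zeros(frame_len)
        for k in peaks:
            amp = 2.0 * mag[k] / window.sum()  # undo window gain for a real sinusoid
            synth += amp * np.cos(2 * np.pi * freqs[k] * t + np.angle(spec[k]))
        y[start:start + frame_len] += synth * window
        norm[start:start + frame_len] += window ** 2
    return y / np.maximum(norm, 1e-12)         # window-squared overlap-add normalization
```

A production system would track peaks across frames and interpolate amplitude and phase; the sketch simply resynthesizes each frame independently, which is enough to reproduce the "few sinusoids per frame" test conditions.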
Item Advances in Methodologies Using EEG to Characterize the Cortical Processing of Speech and Its Perceived Sound Quality (2022-12-01)
Raghavendra, Shruthi; Tan, Chin-Tuan; Hansen, John H. L.; Marcus, Andrian; Martin, Brett A.; Nourani, Mehrdad; Assmann, Peter F.
Speech perception depends on access to the amplitude, spectral, and temporal information in speech. This dissertation focuses on the temporal structure of speech, which consists of a slow-varying amplitude (temporal envelope, ENV) and a rapid-varying frequency (temporal fine structure, TFS). Past studies on speech perception [for review, see Lorenzi and Moore 2008] suggest that ENV alone is sufficient for speech perception in quiet, while TFS is used to segregate speech from background noise (e.g., a competing-talker scenario). It has been shown that the reduction in subjective quality ratings obtained through behavioral quality assessment is correlated with the degree of degradation in the temporal envelope. However, the neural correlates of sound quality perception with continuous speech are still unclear. This dissertation pursues two complementary research goals, organized as studies that consider speech perception as it relates to ENV and TFS, and its sound quality perception. The dissertation comprises two studies: Study 1 characterizes the cortical processing of speech, and Study 2 characterizes perceived sound quality in normal-hearing listeners.
First, the overall introduction to both studies is provided in Chapter 1. Next, the background of both studies is laid out in detail in Chapter 2. Chapter 3 presents Study 1, which investigates the role and relative contribution of ENV and TFS to speech perception in normal-hearing listeners in quiet. Synchronization between brain oscillations at different frequency bands is commonly used as a marker for the key mechanisms coordinating neural dynamics across temporal and spatial domains [Canolty and Knight 2010]. When neural oscillations of two different frequency bands synchronize, their "peak" frequencies usually exhibit a harmonic relationship. A recent study [Rodriguez and Alaerts 2019] showed a prominent occurrence of this 2:1 harmonic cross-frequency relationship between alpha (8-14 Hz) and theta (4-8 Hz) rhythms when task-relevant, efficient cognitive processing is engaged. Study 1 examined this power-power cross-frequency coupling (CFC) between the alpha (8-14 Hz) and theta (4-8 Hz) bands, and also between the gamma (30-100 Hz) and theta bands, of cortical activity in normal-hearing listeners, using electroencephalography (EEG) signals recorded while they processed the ENV and TFS of speech. The results showed relatively increased CFC when listening to ENV alone, which may suggest more synchrony across different frequency bands of cortical activity when processing ENV than TFS. Recent studies have shown that cortical activity tracks the envelope of continuous natural speech, which could serve as a useful method for studying the processes underlying speech perception. Study 2, presented in Chapters 4 and 5, investigates differences in cortical entrainment to the envelope of speech spoken by cochlear implant (CI) talkers (degraded speech) and normal-hearing (NH) talkers. Although a CI may help individuals with hearing loss restore or improve their ability to hear, and provides the auditory feedback necessary for improved speech production, speech produced by CI users typically remains atypical compared with that of normal-hearing individuals (Gautam et al., 2019). The motivation is to derive a metric that assesses "how well" hard-of-hearing talkers speak given the auditory feedback they receive through their current aural compensation. The results showed higher perceived sound quality and closer tracking of the speech envelope in normal-hearing listeners when listening to speech produced by NH talkers than by CI talkers. Finally, Chapter 6 presents overall conclusions, the contributions of the dissertation, and a discussion of possible directions for future work. The two key research aims, pertaining to Study 1 and Study 2 respectively, were: 1) to examine the brain electrical activity and brain networks underlying the perception of ENV and TFS information compared with processing the original speech itself, and thereby to investigate the relative roles of ENV and TFS in speech perception in normal-hearing listeners; and 2) to determine how well the envelope of speech is represented neurophysiologically by objectively quantifying the cortical tracking of the speech envelope, and to show how this cortical tracking differentiates between speech produced by CI talkers and NH talkers in relation to its perceived sound quality.
The findings from Study 1 and Study 2 together provide insight into the neural mechanisms involved in the cortical processing of the ENV and TFS of continuous speech and its perceived sound quality in normal-hearing listeners.
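The ENV/TFS decomposition that both studies build on is conventionally computed from the analytic signal of a band-limited signal. A minimal single-band sketch follows, assuming a Hilbert-based decomposition; the band edges and filter order here are illustrative, not taken from the dissertation.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def env_tfs(x, fs, band=(80.0, 8000.0), order=4):
    """Split one analysis band of x into temporal envelope (ENV)
    and temporal fine structure (TFS) via the analytic signal."""
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    band_sig = sosfiltfilt(sos, x)        # zero-phase band-pass filtering
    analytic = hilbert(band_sig)          # analytic signal of the band
    env = np.abs(analytic)                # slow-varying amplitude (ENV)
    tfs = np.cos(np.angle(analytic))      # rapid-varying carrier (TFS)
    return env, tfs
```

In listening experiments, ENV-alone stimuli are typically created by modulating a band-limited carrier with env, and TFS-alone stimuli by presenting tfs at a fixed level; a multi-band version simply repeats this per band and sums the outputs.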
Item Analyzing Auditory Evoked Cortical Response to Noise-Suppressed Speech in Cochlear Implant Users Using Mismatch Negativity (IEEE Computer Society, 2019-03)
Yu, F.; Tan, Chin-Tuan; Chen, F.; 0000-0002-4676-4917 (Tan, C-T); 78509491 (Tan, C-T)
Speech perception in background noise remains a challenge for cochlear implant (CI) users, and noise-suppression processing (e.g., Wiener filtering) has been commonly utilized to improve their speech perception. It is crucial to objectively examine how CI users perceive noise-suppressed speech. The purpose of this work was to investigate whether the mismatch negativity (MMN) response could objectively assess the quality of noise-suppressed speech as perceived by CI users. A vowel /a/ stimulus was masked by steady-state noise, creating two noisy stimuli at signal-to-noise ratios (SNRs) of -5 and +5 dB. The two noisy stimuli were then processed by Wiener filtering. Electroencephalogram data obtained from 7 CI users who participated in an auditory oddball paradigm were analyzed to extract the MMN, with the two noise-suppressed stimuli serving as the deviant stimuli and the clean vowel stimulus as the standard stimulus. Experimental results showed that the noise-suppressed stimulus at -5 dB SNR evoked a larger MMN amplitude than that at +5 dB SNR, reflecting the effect of SNR level on the auditory evoked cortical response to noise-suppressed speech. The MMN may thus serve as an objective biomarker for evaluating the perception of noise-suppressed speech in CI users. © 2019 IEEE.
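The MMN described above is conventionally computed as a deviant-minus-standard difference wave over averaged epochs. A schematic sketch follows, assuming baseline-corrected epoch matrices and a typical 100-250 ms search window; both are assumptions, not details taken from the paper.

```python
import numpy as np

def mismatch_negativity(std_epochs, dev_epochs, fs, win=(0.10, 0.25)):
    """MMN as the deviant-minus-standard difference wave.
    std_epochs, dev_epochs: arrays of shape (n_trials, n_samples),
    baseline-corrected and time-locked to stimulus onset.
    Returns the difference wave and its peak (most negative) amplitude
    within the win (seconds) search window, where the MMN typically occurs."""
    diff = dev_epochs.mean(axis=0) - std_epochs.mean(axis=0)
    i0, i1 = int(win[0] * fs), int(win[1] * fs)
    peak_amp = diff[i0:i1].min()
    return diff, peak_amp
```

A larger (more negative) peak_amp for the -5 dB SNR deviant than for the +5 dB deviant would correspond to the SNR effect the abstract reports.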
Item Implication of Speech Level Control in Noise to Sound Quality Judgement (Institute of Electrical and Electronics Engineers Inc.)
Akbarzadeh, Sara; Lee, Sungmin; Singh, Satnam; Tan, Chin-Tuan; 0000-0002-4676-4917 (Tan, C-T); 78509491 (Tan, C-T)
The relative level of speech and noise, i.e., the signal-to-noise ratio (SNR), alone may not fully account for how humans perceive speech in noise or judge the sound quality of the speech component. To date, the most common rationale in front-end processing of noisy speech in assistive hearing devices is to reduce the (estimated) noise with the sole objective of improving the overall SNR. The absolute sound pressure level of the speech within the remaining noise, which listeners need in order to anchor their perceptual judgement, is assumed to be restored by the subsequent dynamic range compression stage intended to compensate for loudness recruitment in hearing-impaired (HI) listeners. However, uncoordinated settings of the thresholds that trigger the nonlinear processing in these two separate stages can instead amplify the remaining noise and/or distortion. This confuses listeners' judgement of sound quality and deviates from the usual perceptual trend one would expect as more noise is present. In this study, both normal-hearing (NH) and HI listeners were asked to rate the sound quality of noisy speech and noise-reduced speech as they perceived it. The results showed that speech processed by noise reduction algorithms was rated lower in quality than the original unprocessed speech in noise.
The outcomes also showed that sound quality judgement depended on both the input SNR and the absolute level of speech, with greater weight on the latter, across both NH and HI listeners. This potentially suggests that integrating the two separate processing stages into one would better match the underlying mechanism of auditory reception of sound. Further work will attempt to identify settings of these two processing stages for better speech reception in assistive hearing device users. ©2018 APSIPA.

Item Neural Entrainment to Speech Envelope in Response to Perceived Sound Quality (IEEE Computer Society, 2019-03)
Ngo, Dat Quoc; Oliver, Garret; Tcheslavski, Gleb; Tan, Chin-Tuan; 0000-0002-4676-4917 (Tan, C-T); 78509491 (Tan, C-T)
The extent to which people listen to and perceive speech content at different noise levels varies from individual to individual. In past research, speech intelligibility was determined by rating assessment, which suffers from variation in subjects' individual characteristics. The purpose of this study is to investigate electroencephalography (EEG) recordings using the multivariate Temporal Response Function (mTRF) to examine neural responses to speech stimuli at different sound and noise levels. The results show that the fronto-central area of the brain clearly exhibits envelope entrainment to speech stimuli. ©2019 IEEE.
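The mTRF analysis named in the preceding abstract is, at its core, a ridge regression from time-lagged copies of the speech envelope to the EEG. A minimal single-channel forward-model sketch follows; the lag range and regularization value are assumptions, and the circular shift is a shortcut a real implementation would replace with zero-padding.

```python
import numpy as np

def mtrf_forward(stim_env, eeg, fs, tmin=-0.1, tmax=0.4, lam=1e3):
    """Estimate a temporal response function (TRF) mapping the speech
    envelope (n_samples,) to one EEG channel (n_samples,) via ridge
    regression over lags tmin..tmax seconds."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = np.zeros((len(stim_env), len(lags)))
    for j, lag in enumerate(lags):
        X[:, j] = np.roll(stim_env, lag)    # lagged envelope (circular; sketch only)
    XtX = X.T @ X + lam * np.eye(len(lags))
    w = np.linalg.solve(XtX, X.T @ eeg)     # TRF weights, one per lag
    pred = X @ w
    r = np.corrcoef(pred, eeg)[0, 1]        # envelope-tracking score
    return w, r
```

The correlation r between predicted and recorded EEG is the usual "cortical tracking" measure; comparing r across stimulus conditions (e.g., NH-talker vs. CI-talker speech) is the kind of contrast the entrainment studies above rely on.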
Item Speech Perception of Hearing-impaired Listeners in Challenging Listening Environments and Personalization of Hearing Assistive Devices via Inverse Reinforcement Learning (2022-05-01)
Akbarzadeh, Sara; Kehtarnavaz, Nasser; Tan, Chin-Tuan; Auciello, Orlando; Lobarinas, Edward; Tamil, Lakshman
Listening to speech in the presence of noise has always been a challenge, especially for individuals with hearing impairment. Many aspects need to be considered in the design and fitting of hearing assistive devices to provide users with a more preferred hearing experience, increase the perceived quality of speech, and decrease listening effort. This dissertation addresses the topic in two major research thrusts. In the first, the speech perception of hearing-impaired listeners is studied in challenging hearing environments. Behavioral and electrophysiological experiments were designed to evaluate the effect of speech and noise levels on the perceived quality of speech and on selective auditory attention in normal-hearing and hearing-impaired listeners. The perception of degraded speech was measured in both groups, and the differences between their hearing patterns are described. It is shown that, to achieve an optimal hearing experience, the listener's hearing situation should be taken into account. In the second research thrust, a maximum-likelihood inverse reinforcement learning approach is used to develop an algorithm that personalizes the hearing aid fitting in an online manner. The results of experiments conducted on subjects with hearing loss demonstrate that the developed personalized setting outperforms the standard prescriptive setting.

Item Wavelet Scattering Transform for Variability Reduction in Cortical Potentials Evoked by Pitch Matched Electro-Acoustic Stimulation in Unilateral Cochlear Implant Patients (Institute of Electrical and Electronics Engineers Inc.)
Heydarzadeh, Mehrdad; Akbarzadeh, Sara; Tan, Chin-Tuan; 0000-0002-4676-4917 (Tan, C-T); 78509491 (Tan, C-T)
A cochlear implant (CI) restores the hearing sensation of profoundly deaf patients by directly stimulating the auditory nerve with electric pulses through an array of tonotopically inserted electrodes: basal electrodes respond to high input frequencies, while apical electrodes respond to low input frequencies. The problem with this electrical stimulation, particularly in unilaterally implanted users who have residual hearing in the contralateral ear, lies in the frequency mismatch between the characteristic frequency of the auditory nerve and the input signal. In this paper, we revisit our previously proposed mechanism for tuning an intra-cochlear electrode to its pitch-matched frequency using single-channel EEG [1]. We apply the wavelet scattering transform to extract a deformation-invariant representation from the EEG signals recorded from each of 10 CI subjects while they listened to pitch-matched electro-acoustic stimulation. The results show that the wavelet scattering transform captures the variability introduced by different subjects, providing a more robust alternative for revealing the underlying neuro-physiological responses to this perceptual event. ©2018 APSIPA
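The wavelet scattering transform used above cascades wavelet-modulus operations followed by temporal averaging; the averaging is what confers the stability to deformations and inter-subject variability that the paper exploits. A hand-rolled first-order sketch is given below; a real analysis would use a full scattering library, and the Morlet filterbank parameters here are illustrative assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import gaussian_filter1d

def morlet(fc, fs, width=5.0):
    """Complex Morlet wavelet centered at fc Hz."""
    s = width / (2 * np.pi * fc)                 # Gaussian std in seconds
    t = np.arange(-4 * s, 4 * s, 1.0 / fs)
    psi = np.exp(2j * np.pi * fc * t) * np.exp(-t**2 / (2 * s**2))
    return psi / np.abs(psi).sum()               # L1-normalize so bands are comparable

def scattering_first_order(x, fs, n_octaves=6, q=2):
    """First-order scattering: S1[lam] = lowpass(|x * psi_lam|).
    Wavelet modulus plus ~10 ms temporal averaging yields a representation
    that is stable to small time shifts and deformations."""
    coeffs = []
    for j in range(n_octaves * q):
        fc = fs / 4 / 2 ** (j / q)               # log-spaced center frequencies
        u = np.abs(fftconvolve(x, morlet(fc, fs), mode="same"))  # wavelet modulus
        coeffs.append(gaussian_filter1d(u, sigma=fs * 0.01))     # ~10 ms lowpass
    return np.stack(coeffs)                      # (n_filters, n_samples)
```

Second-order coefficients, which the full transform adds, apply the same wavelet-modulus step to each u before averaging; for cross-subject EEG, the averaged coefficients can then be fed to a standard classifier or compared directly across subjects.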