Perception of Novel Sounds in the Presence of Background Noise: Comparison Between Individuals with Normal Hearing, Cochlear Implant Users, and Recurrent Neural Networks






This dissertation investigates how listeners and learning machines cope with the ambiguity caused by multiple interfering novel sound sources. Starting from an ambiguous auditory scene with competing sound sources, it examines how a particular sound source draws listeners’ attention while the remaining sources lose their salience and become background (noise). Listeners’ perception of competing novel sounds is investigated in a series of experiments with varying listening conditions that simulate the difficulties experienced by hearing-impaired individuals in noise. Chapter 1 reviews the mechanisms behind listeners’ perception of speech in the presence of competing sounds. Chapter 2 describes three experiments that investigated the recognition of novel sounds in the presence of background noise. The chapter begins with a replication of a previous study, providing evidence that listeners can segregate a novel target sound from a competing distractor only if the target repeats across different distractors. A subsequent experiment tested the hypothesis that listeners’ ability to detect a change in a sound depends on their knowledge of its source, which is gained via repetition. It is concluded that listeners can perceptually learn the patterns of the repeating target while suppressing the changes in the masker stream. Two neural network architectures previously employed to study mechanisms of learning, generalized Hebbian and anti-Hebbian, are evaluated. The generalized Hebbian learning network is shown to produce results similar to those obtained from the listeners. Experiments in Chapter 3 provide evidence that recognition of a novel target sound becomes robust against new (unheard) distractors when listeners go through an exposure stage in which the target is presented repeatedly across multiple distractors.
Chapter 3 concludes with experiments 3-2 and 3-3, which investigated recognition of consonant-vowel-consonant-vowel (CVCV) words in the presence of novel distractors. Experiment 3-2 showed that when listeners are exposed to target tokens across multiple distractors, the process of learning new CVCV tokens shifts from context specificity to an adaptation-plus-prototype mechanism. Experiment 3-3 investigated whether cochlear implant users, who have limited spectral resolution, would show the same behavior as the listeners with normal hearing in experiment 3-2. Chapter 4 investigates the extent to which the findings of experiment 3-2 can be replicated by recurrent neural networks (RNNs). The chapter begins with a brief introduction to RNNs and long short-term memory (LSTM) networks. In experiment 4-1, a recurrent LSTM auto-encoder was trained to reconstruct an input CVCV target mixed with a distractor, with or without a context sequence preceding the input. The network reconstructed the input more accurately when the context sequence contained the repeating CVCV target across multiple distractors. Furthermore, similar to the findings in experiment 3-2, the presence of such a context sequence improved the network’s generalization to unseen data (novel distractors). Experiment 4-2 showed that the presence of the context sequence led to an improved semi-supervised speech enhancement algorithm that recovered the target CVCV tokens while suppressing the distractors.



Cochlear implants, Noise, Deafness, Neural networks (Neurobiology), Speech