Real-Time Single and Dual-Channel Speech Enhancement on Edge Devices for Hearing Applications
Abstract
Speech Enhancement (SE) is an important module in the signal processing pipeline for hearing applications, improving listening comfort. Many single- and dual-microphone SE techniques have been developed over the last few decades.
In this thesis, novel single- and dual-channel SE techniques are proposed and implemented on edge devices as assistive tools for hearing applications. The smartphone serves as the processing platform for real-time implementation and testing. Both statistical signal processing and deep learning algorithms are proposed for SE.
Firstly, we compare different two-channel beamformers for SE. The Minimum Variance Distortionless Response (MVDR) beamformer, assisted by a voice activity detector (VAD), is then used as a Signal-to-Noise Ratio (SNR) booster for the SE method. Deep neural network architectures comprising convolutional neural network (CNN) and recurrent neural network
(RNN) layers are proposed for real-time SE. Finally, to filter out background noise, the SE gain estimate for the noisy speech mixture is smoothed along the frequency axis by a Mel filter bank, yielding a Mel-warped frequency-domain gain estimate. Compared with existing SE methods, objective and subjective evaluation results of the developed methods indicate substantial improvements in speech quality and intelligibility.
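The Mel-warped gain smoothing described above can be illustrated with a minimal sketch: a triangular Mel filter bank averages a per-frequency gain curve within each Mel band and maps the band averages back to the linear frequency axis. All function names, parameter values, and the filter-bank construction here are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft_bins, sr):
    # Triangular Mel filters spanning 0..sr/2 (illustrative construction).
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    bins = np.floor((n_fft_bins - 1) * hz_pts / (sr / 2)).astype(int)
    fb = np.zeros((n_mels, n_fft_bins))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:  # rising edge of the triangle
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:  # falling edge of the triangle
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def mel_smooth_gain(gain, fb):
    # Average the per-bin gain within each Mel band, then map the band
    # averages back to the linear frequency axis (Mel-warped smoothing).
    band_gain = (fb @ gain) / np.maximum(fb.sum(axis=1), 1e-8)
    weights = np.maximum(fb.sum(axis=0), 1e-8)
    return (fb.T @ band_gain) / weights

# Example: smooth a noisy SE gain curve over 257 STFT bins at 16 kHz.
rng = np.random.default_rng(0)
gain = np.clip(0.5 + 0.4 * rng.standard_normal(257), 0.0, 1.0)
fb = mel_filterbank(n_mels=40, n_fft_bins=257, sr=16000)
smoothed = mel_smooth_gain(gain, fb)
```

Because each smoothed bin is a convex combination of band averages, the result stays within the range of the input gains while suppressing bin-to-bin fluctuations, which is what makes the warped gain less prone to musical-noise artifacts.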