Advancements in Statistical Signal Processing and Machine Learning for Speech Enhancement
This dissertation focuses on advancing statistical signal processing and machine learning techniques for speech enhancement. Statistical signal processing and machine learning are the most important and widely used technologies in speech research. Although both groups of techniques have achieved significant success in speech enhancement tasks, existing solutions usually apply them independently. In this dissertation we verify the idea that combining techniques from the two domains can further improve enhancement performance, by applying them jointly to two speech enhancement problems. In the first problem, frequency shift estimation in single-sideband speech, the existing methods, which are based exclusively on signal processing, suffer from a non-uniqueness issue caused by the periodic character of voiced speech. To address this issue, a preliminary step of uniqueness interval detection is proposed, based on an analysis of the origins of the estimation errors. Three machine learning techniques, GMM-SVM, i-vector, and stacked autoencoder, are adopted to detect the uniqueness interval in this first step. A dedicated feature designed to represent the frequency shift property is developed as the input to the classifiers. Experimental results verify the benefit of introducing machine learning techniques into the existing solution. In the second problem, speech denoising, the line spectral pair (LSP), an efficient speech representation that encodes the location and bandwidth of spectral formants, is adopted as an input to a DNN regression architecture along with the conventional log-spectral features. To capture the dynamic properties of the speech signal, the first and second derivatives of the LSPs are also used. In addition, Auto-LSP, an efficient iterative denoising algorithm, is applied as a post-processing step to further improve the enhanced output. The effectiveness of the proposed feature and the post-processing operation is confirmed by the denoising results in terms of three objective criteria. Collectively, the contributions made in these two speech enhancement topics support the idea that the advantages of signal processing and machine learning can complement each other to improve the performance of the overall enhancement techniques.
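The abstract mentions using the first and second derivatives of the LSPs to capture speech dynamics but does not say how they are computed. The following is a minimal sketch of one common choice, the regression-based delta formula used for many speech features. The function names `delta` and `add_dynamics`, the window half-width of 2 frames, and the edge padding are all assumptions made for illustration and are not taken from the dissertation.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based time derivative of a (n_frames, n_dims) feature matrix.

    Assumed formula (a common convention, not confirmed by the source):
        d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Frames beyond the edges are filled by repeating the first/last frame.
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Repeat edge frames so every frame has `width` neighbours on each side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def add_dynamics(lsp):
    """Stack static LSPs with their first and second derivatives.

    Input shape (n_frames, n_dims) gives output shape (n_frames, 3 * n_dims).
    """
    lsp = np.asarray(lsp, dtype=float)
    d1 = delta(lsp)   # first derivative
    d2 = delta(d1)    # second derivative, taken as the delta of the delta
    return np.hstack([lsp, d1, d2])
```

Under these assumptions, a matrix of 10 frames with 3 LSP coefficients each becomes a 10 x 9 matrix, which could then be concatenated with log-spectral features before being passed to a regression network.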