Robust acoustic modeling and front-end design for distant speech recognition

dc.contributor.advisor: Hansen, John H.L.
dc.creator: Mirsamadi, Seyedmahdad
dc.creator.orcid: 0000-0002-4810-5632
dc.date.accessioned: 2018-03-28T20:10:28Z
dc.date.available: 2018-03-28T20:10:28Z
dc.date.created: 2017-12
dc.date.issued: 2017-12
dc.date.submitted: December 2017
dc.date.updated: 2018-03-28T20:10:28Z
dc.description.abstract: In recent years, there has been a significant increase in the popularity of voice-enabled technologies that use human speech as the primary interface with machines. Recent advances in acoustic modeling and feature design have raised the accuracy of Automatic Speech Recognition (ASR) to levels that enable voice interfaces in many applications. However, much of this performance depends on the use of close-talking microphones (i.e., scenarios in which the user speaks directly into a hand-held or body-worn microphone). A considerable performance gap remains in distant-talking scenarios, in which speech is recorded by far-field microphones placed at a distance from the speaker. In such scenarios, the distorting effects of distance, such as room reverberation and environmental noise, make the recognition task significantly more challenging. In this dissertation, we propose novel approaches for designing a distant-talking ASR front-end as well as for training robust acoustic models, in order to reduce the existing gap between far-field and close-talking ASR performance. Specifically, we i) propose a novel multi-channel front-end enhancement algorithm for improved ASR in reverberant rooms using distributed non-uniform microphone arrays at random, unknown locations; ii) propose a novel neural network training approach that uses adversarial training to improve the robustness of multi-condition acoustic models trained directly on far-field data; and iii) study alternative neural network adaptation strategies for adapting far-field models to the acoustic properties of specific target environments. Experimental results on far-field benchmark tasks and datasets demonstrate the effectiveness of the proposed approaches in increasing the far-field robustness of ASR. In experiments on reverberated TIMIT sentences, the proposed multi-channel front-end provides WER improvements of 21.5% and 37.7% in two-channel and four-channel scenarios, respectively, over a single-channel scenario in which the channel with the best signal quality is selected. On the acoustic modeling side, experiments on the AMI corpus show that the proposed multi-domain training approach provides a relative character error rate reduction of 3.3% with respect to a conventional multi-condition trained baseline, and 25.4% with respect to a clean-trained baseline.
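To make the adversarial training idea in the abstract concrete, the sketch below shows one common realization of it: domain-adversarial training of an acoustic model with a gradient reversal layer, where a domain classifier tries to identify the recording condition and the reversed gradient pushes the shared encoder toward condition-invariant features. This is a minimal illustrative example, not the dissertation's exact architecture; the network sizes, the lam weight, and the two-domain setup (close-talk vs. far-field) are assumptions for the sketch.

# Minimal sketch (assumed setup, not the dissertation's exact method):
# domain-adversarial acoustic-model training with a gradient reversal layer.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the domain-classifier gradient flowing into the encoder.
        return -ctx.lam * grad_output, None

class AdversarialAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=512, n_senones=2000, n_domains=2, lam=0.1):
        super().__init__()
        self.lam = lam
        # Shared encoder: produces features used by both heads.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Senone classifier: the primary ASR objective.
        self.senone_head = nn.Linear(hidden, n_senones)
        # Domain classifier: trained to recognize the recording condition;
        # its reversed gradient makes the encoder hide that information.
        self.domain_head = nn.Linear(hidden, n_domains)

    def forward(self, x):
        h = self.encoder(x)
        senone_logits = self.senone_head(h)
        domain_logits = self.domain_head(GradReverse.apply(h, self.lam))
        return senone_logits, domain_logits

# One hypothetical training step on random stand-in data: the senone loss is
# minimized while the reversed domain-loss gradient regularizes the encoder.
model = AdversarialAcousticModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
feats = torch.randn(32, 40)              # batch of acoustic feature frames
senones = torch.randint(0, 2000, (32,))  # frame-level senone targets
domains = torch.randint(0, 2, (32,))     # 0 = close-talk, 1 = far-field
s_logits, d_logits = model(feats)
loss = (nn.functional.cross_entropy(s_logits, senones)
        + nn.functional.cross_entropy(d_logits, domains))
opt.zero_grad()
loss.backward()
opt.step()

In this setup, lam trades off how strongly the encoder is penalized for encoding domain information; it is typically ramped up over training so the ASR objective dominates early on.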
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10735.1/5673
dc.language.iso: en
dc.rights: Copyright ©2017 is held by the author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subject: Automatic speech recognition
dc.subject: Microphone arrays
dc.subject: Neural networks (Computer science)
dc.subject: Sound—Reverberation
dc.subject: Acoustical engineering
dc.title: Robust acoustic modeling and front-end design for distant speech recognition
dc.type: Dissertation
dc.type.material: text
thesis.degree.department: Electrical Engineering
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.level: Doctoral
thesis.degree.name: PHD

Files

Original bundle (1 file):
ETD-5608-7449.33.pdf (2.72 MB, Adobe Portable Document Format)

License bundle (2 files):
LICENSE.txt (1.85 KB, Plain Text)
PROQUEST_LICENSE.txt (5.85 KB, Plain Text)