Robust acoustic modeling and front-end design for distant speech recognition

dc.contributor.advisor: Hansen, John H.L.
dc.creator: Mirsamadi, Seyedmahdad
dc.creator.orcid: 0000-0002-4810-5632
dc.date.accessioned: 2018-03-28T20:10:28Z
dc.date.available: 2018-03-28T20:10:28Z
dc.date.created: 2017-12
dc.date.issued: 2017-12
dc.date.submitted: December 2017
dc.date.updated: 2018-03-28T20:10:28Z
dc.description.abstract: In recent years, there has been a significant increase in the popularity of voice-enabled technologies that use human speech as the primary interface with machines. Recent advances in acoustic modeling and feature design have raised the accuracy of Automatic Speech Recognition (ASR) to levels that enable voice interfaces in many applications. However, much of this performance depends on the use of close-talking microphones (i.e., scenarios in which the user speaks directly into a hand-held or body-worn microphone). A considerable performance gap remains in distant-talking scenarios, in which speech is recorded by far-field microphones placed at a distance from the speaker. In such scenarios, the distorting effects of distance, such as room reverberation and environmental noise, make the recognition task significantly more challenging. In this dissertation, we propose novel approaches for designing a distant-talking ASR front-end as well as for training robust acoustic models, in order to reduce the existing gap between far-field and close-talking ASR performance. Specifically, we i) propose a novel multi-channel front-end enhancement algorithm for improved ASR in reverberant rooms using distributed non-uniform microphone arrays at random, unknown locations; ii) propose a novel neural network training approach that uses adversarial training to improve the robustness of multi-condition acoustic models trained directly on far-field data; and iii) study alternative neural network adaptation strategies for adapting far-field models to the acoustic properties of specific target environments. Experimental results on far-field benchmark tasks and datasets demonstrate the effectiveness of the proposed approaches in increasing the far-field robustness of ASR. In experiments on reverberated TIMIT sentences, the proposed multi-channel front-end provides WER improvements of 21.5% and 37.7% in two-channel and four-channel scenarios, respectively, over a single-channel scenario in which the channel with the best signal quality is selected. On the acoustic modeling side, experiments on the AMI corpus show that the proposed multi-domain training approach provides a relative character error rate reduction of 3.3% with respect to a conventional multi-condition trained baseline, and 25.4% with respect to a clean-trained baseline.
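To make the adversarial training idea in the abstract concrete, the sketch below shows one common realization of it: domain-adversarial training of an acoustic model with a gradient reversal layer, where a domain classifier tries to identify the recording condition and the reversed gradient pushes the shared encoder toward condition-invariant features. This is a minimal illustrative example, not the dissertation's exact architecture; the network sizes, the lam weight, and the two-domain setup (close-talk vs. far-field) are assumptions for the sketch.

# Minimal sketch (assumed setup, not the dissertation's exact method):
# domain-adversarial acoustic-model training with a gradient reversal layer.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the domain-classifier gradient flowing into the encoder.
        return -ctx.lam * grad_output, None

class AdversarialAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=512, n_senones=2000, n_domains=2, lam=0.1):
        super().__init__()
        self.lam = lam
        # Shared encoder: produces features used by both heads.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Senone classifier: the primary ASR objective.
        self.senone_head = nn.Linear(hidden, n_senones)
        # Domain classifier: trained to recognize the recording condition;
        # its reversed gradient makes the encoder hide that information.
        self.domain_head = nn.Linear(hidden, n_domains)

    def forward(self, x):
        h = self.encoder(x)
        senone_logits = self.senone_head(h)
        domain_logits = self.domain_head(GradReverse.apply(h, self.lam))
        return senone_logits, domain_logits

# One hypothetical training step on random stand-in data: the senone loss is
# minimized while the reversed domain-loss gradient regularizes the encoder.
model = AdversarialAcousticModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
feats = torch.randn(32, 40)              # batch of acoustic feature frames
senones = torch.randint(0, 2000, (32,))  # frame-level senone targets
domains = torch.randint(0, 2, (32,))     # 0 = close-talk, 1 = far-field
s_logits, d_logits = model(feats)
loss = (nn.functional.cross_entropy(s_logits, senones)
        + nn.functional.cross_entropy(d_logits, domains))
opt.zero_grad()
loss.backward()
opt.step()

In this setup, lam trades off how strongly the encoder is penalized for encoding domain information; it is typically ramped up over training so the ASR objective dominates early on.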
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10735.1/5673
dc.language.iso: en
dc.rights: Copyright ©2017 is held by the author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subject: Automatic speech recognition
dc.subject: Microphone arrays
dc.subject: Neural networks (Computer science)
dc.subject: Sound—Reverberation
dc.subject: Acoustical engineering
dc.title: Robust acoustic modeling and front-end design for distant speech recognition
dc.type: Dissertation
dc.type.material: text
thesis.degree.department: Electrical Engineering
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.level: Doctoral
thesis.degree.name: PHD

Files

Original bundle (1 file):
ETD-5608-7449.33.pdf (2.72 MB, Adobe Portable Document Format)

License bundle (2 files):
LICENSE.txt (1.85 KB, Plain Text)
PROQUEST_LICENSE.txt (5.85 KB, Plain Text)