Advancements in Acoustic Based Language Identification/Recognition

dc.contributor.advisor: Hansen, John H.L.
dc.creator: Zhang, Qian
dc.date.accessioned: 2017-09-08T07:44:45Z
dc.date.available: 2017-09-08T07:44:45Z
dc.date.created: 2017-08
dc.date.issued: 2017-08
dc.date.submitted: August 2017
dc.date.updated: 2017-09-08T07:44:46Z
dc.description.abstract: With over 6,000 languages spoken worldwide, effective language recognition (LR) is needed before any speech technology can be employed. Language identification (LID) is an essential speech pre-processing step, typically followed by automatic speech recognition or target speech post-processing. LID tasks are either closed-set or open-set, depending on the test condition. In real scenarios, robust closed-set language identification is usually hindered by mismatch factors such as background noise, channel, and speech duration. In addition, unknown/out-of-set (OOS) language rejection is a major challenge for open-set LID because of the cost and resources required to collect effective OOS data. To address the closed-set LID problem, this dissertation focuses on advancements based on diverse acoustic features and back-ends, and on their influence on LID system fusion. A set of distinct acoustic features is considered, grouped into three categories: classical features, innovative features, and extensional features. In addition, both front-end concatenation and back-end fusion are considered. The results suggest that no single feature type is universally vital across all LID tasks and that fusion of a diverse feature set is needed to sustain LID performance in challenging scenarios. More specifically, the proposed hybrid fusion method improves LID system performance by +38.5% and +46.2% on the highly noisy DARPA RATS dataset and the large-scale NIST LRE-09 dataset, respectively. For the related scenario of closely spaced dialect identification, two types of unsupervised deep learning methods are introduced for feature extraction. First, an unsupervised bottleneck feature extraction framework is proposed, derived from the traditional bottleneck structure but trained with estimated phonetic labels. Second, two latent-variable learning algorithms based on generative auto-encoder modeling are introduced for speech feature processing. Compared with the baseline MFCC i-vector system, the proposed methods achieve up to a 58% relative performance improvement on a four-way Chinese dialect corpus. For open-set LID, three effective and flexible OOS candidate selection methods are proposed to boost OOS language rejection and improve overall classification performance. Specifically, two selection strategies are proposed at the front-end feature level: (i) k-means clustering selection and (ii) complementary candidate selection based on minimum Kullback-Leibler divergence with respect to the closed set. In addition, (iii) a general candidate selection method is proposed based on the relationships between languages from an engineering perspective, explored through the back-end score vectors of each language (an illustrative sketch of the front-end selection strategies follows this record). With these selection methods, data enhancement is more effective and efficient than a baseline random-selection alternative. To the best of our knowledge, this is the first major effort on effective OOS language selection to improve OOS rejection in open-set LID. As speech technology is employed in increasingly diverse consumer, commercial, government, social, and global human-engagement scenarios, effective LR must continue to advance as the diversity of languages used in voice engagement and communication/electronic interaction expands.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10735.1/5505
dc.language.iso: en
dc.rights: Copyright ©2017 is held by the author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subject: Automatic speech recognition
dc.subject: Language and languages
dc.subject: Computer sound processing
dc.title: Advancements in Acoustic Based Language Identification/Recognition
dc.type: Dissertation
dc.type.material: text
thesis.degree.department: Electrical Engineering
thesis.degree.grantor: University of Texas at Dallas
thesis.degree.level: Doctoral
thesis.degree.name: PHD
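
Illustrative sketch of the two front-end OOS candidate selection strategies named in the abstract (k-means clustering selection and minimum-KL-divergence complementary selection). This is not the dissertation's implementation: it assumes fixed-dimensional utterance embeddings such as i-vectors, summarizes each language with a diagonal Gaussian, and uses hypothetical function names (kmeans_candidate_selection, kl_candidate_selection) chosen only for this example.

# Minimal sketch, assuming i-vector-like embeddings; not the thesis code.
import numpy as np
from sklearn.cluster import KMeans


def gaussian_kl_diag(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ) for diagonal Gaussians."""
    var0 = np.maximum(var0, 1e-8)
    var1 = np.maximum(var1, 1e-8)
    return 0.5 * np.sum(
        var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0 + np.log(var1 / var0)
    )


def kmeans_candidate_selection(oos_pool, n_candidates):
    """Strategy (i): cluster the pooled OOS embeddings and return one centroid
    per candidate slot, so the selected OOS data spans the available pool."""
    km = KMeans(n_clusters=n_candidates, n_init=10, random_state=0).fit(oos_pool)
    return km.cluster_centers_


def kl_candidate_selection(closed_set_vecs, candidate_sets, n_candidates):
    """Strategy (ii): rank each candidate language by the KL divergence between
    its diagonal-Gaussian summary and that of the closed set, keeping the
    lowest-divergence candidates per the abstract's minimum-KL wording."""
    mu_c, var_c = closed_set_vecs.mean(0), closed_set_vecs.var(0)
    scored = []
    for name, vecs in candidate_sets.items():
        kl = gaussian_kl_diag(vecs.mean(0), vecs.var(0), mu_c, var_c)
        scored.append((kl, name))
    scored.sort()  # ascending KL = most similar to the closed set first
    return [name for _, name in scored[:n_candidates]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    closed = rng.normal(size=(200, 40))  # stand-in for closed-set i-vectors
    pool = {f"lang_{i}": rng.normal(loc=0.1 * i, size=(100, 40)) for i in range(6)}
    print(kl_candidate_selection(closed, pool, n_candidates=3))
    print(kmeans_candidate_selection(np.vstack(list(pool.values())), n_candidates=3).shape)

In practice either routine would be run on embeddings extracted from the OOS pool, and the selected candidate data would then augment the open-set training material in place of random selection; the actual criteria and back-end score-vector variant are described in the dissertation itself.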

Files

Original bundle
  ETD-5608-7379.36.pdf (17.74 MB, Adobe Portable Document Format)

License bundle
  LICENSE.txt (1.84 KB, Plain Text)
  PROQUEST_LICENSE.txt (5.84 KB, Plain Text)