Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Streams

Yu, Chengzhu

dc.contributor.advisor	Hansen, John H. L.
dc.creator	Yu, Chengzhu
dc.date.accessioned	2017-09-08T04:48:51Z
dc.date.available	2017-09-08T04:48:51Z
dc.date.created	2017-08
dc.date.issued	2017-08
dc.date.submitted	August 2017
dc.date.updated	2017-09-08T04:48:51Z
dc.description.abstract	With an explosive increase in the amount of multimedia content available worldwide and through the web, automatically detecting who spoke when in an audio stream is an important technique with many practical applications. The task of automatically annotating speech segments with speaker labels can be treated as either a speaker recognition or a speaker diarization problem, depending on whether voice samples of the speakers are available as a priori knowledge. Despite the differences, the success of both speaker recognition and speaker diarization hinges on accurate and robust modeling of speaker voice characteristics. Over the past several decades, the technology of statistical speaker modeling has achieved significant advancements. However, real-world applications of speaker modeling technology, in the form of speaker recognition and speaker diarization, still show considerably limited performance. In this dissertation, we investigate the application of speaker recognition and speaker diarization to the National Aeronautics and Space Administration (NASA) Apollo-11 mission audio corpus in order to advance their performance in practical applications. In the first part of this dissertation, we focus on understanding the problems and challenges of applying speaker recognition techniques to a subset of the Apollo-11 space-to-ground audio corpus to automatically recognize all three astronauts. Specifically, we investigate the variations in the astronauts' voice characteristics across different phases of the lunar mission and their impact on speaker recognition performance. In the second part of this dissertation, we focus on the development of robust speaker recognition and diarization algorithms.
We illustrate the challenge of applying speaker diarization techniques to multi-speaker naturalistic audio streams such as the Apollo-11 mission control center (MCC) audio corpus, and propose active-learning-based algorithms to effectively incorporate limited human effort into the current speaker diarization process. Moreover, we propose several robust speaker modeling techniques that improve speaker recognition in generally mismatched or noisy environments. Lastly, the application of speaker recognition and speaker diarization to conversation analysis on the Apollo-11 MCC audio corpus is discussed. This dissertation therefore advances speech and language technology to address diarization of multi-speaker naturalistic audio streams for real task-oriented teams. It is expected that these advancements will contribute significantly to research on human-to-human voice interaction for team-oriented tasks in business, social, government, and security applications.
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10735.1/5502
dc.language.iso	en
dc.rights	Copyright ©2017 is held by the author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subject	Automatic speech recognition
dc.subject	Voice frequency
dc.subject	Voiceprints
dc.subject	Streaming audio
dc.title	Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Streams
dc.type	Dissertation
dc.type.material	text
thesis.degree.department	Electrical Engineering
thesis.degree.grantor	University of Texas at Dallas
thesis.degree.level	Doctoral
thesis.degree.name	PHD

Files

Original bundle

Name: ETD-5608-7449.64.pdf
Size: 14.55 MB
Format: Adobe Portable Document Format

Collections

UTD Theses and Dissertations