Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Streams

dc.contributor.advisorHansen, John H. L.
dc.creatorYu, Chengzhu
dc.date.accessioned2017-09-08T04:48:51Z
dc.date.available2017-09-08T04:48:51Z
dc.date.created2017-08
dc.date.issued2017-08
dc.date.submittedAugust 2017
dc.date.updated2017-09-08T04:48:51Z
dc.description.abstractWith an explosive increase in the amount of multimedia content available worldwide and through the web, automatically detecting who spoke when in an audio stream is an important technique that has many practical applications. The task of automatically annotating speech segments with speaker labels could be considered as either a speaker recognition or speaker diarization problem depending on whether the voice samples of the speakers are available as a priori knowledge. Despite the differences, the success of both speaker recognition and speaker diarization hinge on accurate and robust modeling of speaker voice characteristics. Over the past several decades, the technology of statistical speaker modeling has achieved significant advancements. However, the applications of speaker modeling technology in real world by means of speaker recognition and speaker diarization has considerably limited performance. In this dissertation, we investigate the applications of speaker recognition and speaker diarization on The National Aeronautics and Space Administration (NASA) Apollo-11 mission audio corpus to advance their performance in practical applications. In the first part of this dissertation, we focus on understanding the problems and challenges of applying speaker recognition techniques on a subset of the Apollo-11 space-to-ground audio corpus to automatically recognize all three astronauts. Specifically, we investigate the variations of astronauts voices characteristics across different phases of the lunar mission and their impact on speaker recognition performance. In the second part of this dissertation, we focus on the development of robust speaker recognition and diarization algorithms. We illustrate the challenge of applying speaker diarization techniques on multi-speaker naturalistic audio streams such as Apollo-11 mission control center (MCC) audio corpus, and propose active learning based algorithms to effectively incorporate limited human effort in the current speaker diarization process. Moreover, we propose several robust speaker modeling techniques that improve speaker recognition in generally mismatched or noisy environments. Lastly, the application of speaker recognition and speaker diarization for conversation analysis on the Apollo-11 MCC audio corpus is discussed. This dissertation therefore advances speech and language technology to address diarization of multi-speaker naturalistic audio streams for real task oriented teams. It is expected that these advancements will contribute significantly for research on human-to-human voice interaction for team oriented tasks in business, social, government, and security applications.
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10735.1/5502
dc.language.isoen
dc.rightsCopyright ©2017 is held by the author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subjectAutomatic speech recognition
dc.subjectVoice frequency
dc.subjectVoiceprints
dc.subjectStreaming audio
dc.titleRobust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Streams
dc.typeDissertation
dc.type.materialtext
thesis.degree.departmentElectrical Engineering
thesis.degree.grantorUniversity of Texas at Dallas
thesis.degree.levelDoctoral
thesis.degree.namePHD

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
ETD-5608-7449.64.pdf
Size:
14.55 MB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 2 of 2
No Thumbnail Available
Name:
LICENSE.txt
Size:
1.84 KB
Format:
Plain Text
Description:
No Thumbnail Available
Name:
PROQUEST_LICENSE.txt
Size:
5.84 KB
Format:
Plain Text
Description: