Robust I-Vector Extraction for Neural Network Adaptation in Noisy Environment

Yu, Chengzhu; Ogawa, A.; Delcroix, M.; Yoshioka, T.; Nakatani, T.; Hansen, John H. L.

Files

JECS-3626-4714.65.pdf (210.5 KB)

Publisher

International Speech Communication Association

URI

http://hdl.handle.net/10735.1/5109

Abstract

In this study, we explore i-vector based adaptation of deep neural networks (DNNs) in noisy environments. We first demonstrate the importance of encapsulating environment and channel variability in i-vectors for DNN adaptation in noisy conditions. To obtain robust i-vectors without losing noise and channel variability information, we investigate parallel feature based i-vector extraction for DNN adaptation. Specifically, different types of features are used separately in the two stages of i-vector extraction, namely universal background model (UBM) state alignment and i-vector computation. To capture noise- and channel-specific feature variation, conventional MFCC features are still used for i-vector computation. However, much more robust features, such as Vector Taylor Series (VTS) enhanced features and bottleneck features, are exploited for UBM state alignment. Experimental results on Aurora-4 show that the parallel feature based i-vectors yield performance gains of up to 9.2% relative compared to a baseline DNN-HMM system and 3.3% compared to a system using conventional MFCC-based i-vectors.
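The two-stage split described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a diagonal-covariance GMM as the UBM, the standard i-vector posterior-mean formula, and a set of MFCC-space Gaussian means and variances paired with the alignment UBM. All function and parameter names (`gmm_posteriors`, `extract_ivector`, `T_mat`, and so on) are hypothetical. Frame posteriors are computed from the robust features, while the sufficient statistics are accumulated on the time-aligned MFCC frames.

```python
import numpy as np


def gmm_posteriors(feats, weights, means, variances):
    """Stage 1: frame-level component posteriors under a diagonal GMM (the UBM).

    feats: (T, D) robust features (e.g. VTS-enhanced or bottleneck; assumed).
    weights: (C,), means: (C, D), variances: (C, D).
    """
    diff = feats[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    log_post = np.log(weights) + log_lik
    # Normalise in the log domain for numerical stability.
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    return np.exp(log_post)  # (T, C), each row sums to 1


def extract_ivector(post, mfcc, mfcc_means, mfcc_vars, T_mat):
    """Stage 2: i-vector posterior mean from statistics accumulated on MFCCs.

    post: (T, C) posteriors from stage 1 (computed on the robust features).
    mfcc: (T, D) MFCC frames time-aligned with the robust features.
    mfcc_means, mfcc_vars: (C, D) MFCC-space Gaussian parameters (assumed).
    T_mat: (C * D, R) total variability matrix (assumed already trained).
    """
    C, D = mfcc_means.shape
    R = T_mat.shape[1]
    N = post.sum(axis=0)                         # zeroth-order stats, (C,)
    F = post.T @ mfcc - N[:, None] * mfcc_means  # centred first-order stats, (C, D)
    inv_var = 1.0 / mfcc_vars
    T3 = T_mat.reshape(C, D, R)
    # Posterior precision: I + sum_c N_c * T_c^T Sigma_c^{-1} T_c
    precision = np.eye(R) + np.einsum("c,cdr,cd,cds->rs", N, T3, inv_var, T3)
    # Right-hand side: sum_c T_c^T Sigma_c^{-1} F_c
    rhs = np.einsum("cdr,cd,cd->r", T3, inv_var, F)
    return np.linalg.solve(precision, rhs)       # (R,)
```

In a conventional setup both stages would receive the same MFCC stream. In this sketch the only change is that `gmm_posteriors` is fed the robust features, so the noise and channel information carried by the MFCCs still reaches the statistics in `extract_ivector`.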

Keywords

Acoustic models, Automatic speech recognition, Neural networks (Computer science), Vector analysis, Noise, Speech processing

Collections

Hansen, John H. L.
