Robust I-Vector Extraction for Neural Network Adaptation in Noisy Environment

Publisher

International Speech and Communication Association

Abstract

In this study, we explore i-vector based adaptation of deep neural networks (DNNs) in noisy environments. We first demonstrate the importance of encapsulating environment and channel variability in i-vectors for DNN adaptation in noisy conditions. To obtain robust i-vectors without losing noise and channel variability information, we investigate parallel feature-based i-vector extraction for DNN adaptation. Specifically, different types of features are used separately in the two stages of i-vector extraction, namely universal background model (UBM) state alignment and i-vector computation. To capture noise- and channel-specific feature variation, conventional MFCC features are still used for i-vector computation. However, more robust features, such as Vector Taylor Series (VTS) enhanced features and bottleneck features, are exploited for UBM state alignment. Experimental results on Aurora-4 show that the parallel feature-based i-vectors yield performance gains of up to 9.2% relative compared to a baseline DNN-HMM system and 3.3% compared to a system using conventional MFCC-based i-vectors.
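The two-stage split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal-covariance GMM as the UBM, the textbook i-vector point estimate, and hypothetical names throughout (`ubm_posteriors`, `parallel_stats`, `ivector`, `tv_matrix`). The only idea taken from the abstract is that frame alignments come from a robust feature stream, while the statistics and the i-vector are computed from MFCCs of the same frames.

```python
import numpy as np

def ubm_posteriors(feats, weights, means, variances):
    """Frame-level component posteriors under a diagonal-covariance GMM.

    feats: (T, D); weights: (C,); means, variances: (C, D).
    """
    diff = feats[:, None, :] - means[None, :, :]            # (T, C, D)
    log_lik = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_lik                    # (T, C)
    log_post -= log_post.max(axis=1, keepdims=True)         # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def parallel_stats(robust_feats, mfcc_feats, robust_ubm):
    """Stage 1 aligns with robust features; stage 2 accumulates MFCC statistics."""
    gamma = ubm_posteriors(robust_feats, *robust_ubm)       # alignment, (T, C)
    n = gamma.sum(axis=0)                                   # zeroth order, (C,)
    f = gamma.T @ mfcc_feats                                # first order, (C, D_mfcc)
    return n, f

def ivector(n, f, mfcc_means, mfcc_vars, tv_matrix):
    """Standard i-vector point estimate from MFCC-domain statistics.

    mfcc_means, mfcc_vars: (C, D) per-component MFCC parameters (assumed to be
    estimated with the same robust alignment); tv_matrix: (C*D, R).
    """
    c, d = mfcc_means.shape
    r = tv_matrix.shape[1]
    f_centered = (f - n[:, None] * mfcc_means).reshape(c * d)
    inv_var = (1.0 / mfcc_vars).reshape(c * d)
    n_expanded = np.repeat(n, d)                            # one count per dimension
    tt_sigma_inv = tv_matrix.T * inv_var                    # T^t Sigma^-1, (R, C*D)
    precision = np.eye(r) + (tt_sigma_inv * n_expanded) @ tv_matrix
    return np.linalg.solve(precision, tt_sigma_inv @ f_centered)
```

In use, `robust_feats` would hold the VTS-enhanced or bottleneck features and `mfcc_feats` the time-aligned MFCCs of the same utterance. The resulting vector would typically be appended to each input frame of the DNN, although the abstract does not specify that detail.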

Keywords

Acoustic models, Automatic speech recognition, Neural networks (Computer science), Vector analysis, Noise, Speech processing

Rights

©2015 ISCA
