Compensation of SNR and Noise Type Mismatch using an Environmental Sniffing Based Speech Recognition Solution

Chung, Y.; Hansen, John H. L.

dc.contributor.ISNI	0000 0001 1604 5383 (Hansen, JHL)
dc.contributor.LCNA	92101568 (Hansen, JHL)
dc.contributor.author	Chung, Y.	en_US
dc.contributor.author	Hansen, John H. L.	en_US
dc.date.accessioned	2014-07-02T20:45:22Z
dc.date.available	2014-07-02T20:45:22Z
dc.date.created	2013-06-20	en_US
dc.date.submitted	2012-11-21	en_US
dc.description.abstract	Multiple-model based speech recognition (MMSR) has been shown to be quite successful in noisy speech recognition. Since it employs multiple hidden Markov model (HMM) sets that correspond to various noise types and signal-to-noise ratio (SNR) values, the selected acoustic model can be closely matched with the test noisy speech, which leads to improved performance when compared with other state-of-the-art speech recognition systems that employ a single HMM set. However, as the number of HMM sets is usually limited due to practical considerations as well as effective model selection, acoustic mismatch can still be a problem in MMSR. In this study, we proposed methods to improve recognition performance by mitigating the mismatch in SNR and noise type for an MMSR solution. For the SNR mismatch, an optimal SNR mapping between the test noisy speech and the HMM was determined by experimental investigation. Improved performance was demonstrated by employing the SNR mapping instead of using the estimated SNR of the test noisy speech directly. We also proposed a novel method to reduce the effect of noise type mismatch by compensating the test noisy speech in the log-spectrum domain. We first derived the relation between the log-spectrum vectors in the test and training noisy speech. Since the relation is a non-linear function of the speech and noise parameters, the statistical information regarding the testing log-spectrum vectors was obtained by approximation using the vector Taylor series (VTS) algorithm. Finally, the minimum mean square error estimation of the training log-spectrum vectors was used to reduce the mismatch between the training and test noisy speech. By employing the proposed methods in the MMSR framework, relative word error rate reductions of 18.7% and 21.3% were achieved on the Aurora 2 task when compared with conventional MMSR and multi-condition training (MTR) methods, respectively.	en_US
dc.identifier.citation	Chung, Y., and J. H. L. Hansen. 2013. "Compensation of SNR and noise type mismatch using an environmental sniffing based speech recognition solution." EURASIP Journal on Audio, Speech, and Music Processing 2013(12): 1-14.	en_US
dc.identifier.issn	1687-4714	en_US
dc.identifier.issue	12	en_US
dc.identifier.uri	http://hdl.handle.net/10735.1/3627
dc.identifier.volume	2013	en_US
dc.language.iso	en	en_US
dc.relation.uri	http://dx.doi.org/10.1186/1687-4722-2013-12	en_US
dc.rights	CC BY 4.0 (Attribution)	en_US
dc.rights	©2013 The Authors.	en_US
dc.source.journal	EURASIP Journal on Audio, Speech, and Music Processing	en_US
dc.subject	Environmental sniffing	en_US
dc.subject	Multiple-model frame	en_US
dc.subject	Noise robustness	en_US
dc.subject	Speech perception	en_US
dc.subject	Signal-To-Noise Ratio	en_US
dc.title	Compensation of SNR and Noise Type Mismatch using an Environmental Sniffing Based Speech Recognition Solution	en_US
dc.type	text	en_US
dc.type.genre	article	en_US

Files

Original bundle

Name: ECS-FR-JHLHansen-309486.75.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format

License bundle

Name: SpringerOpen.pdf
Size: 45.37 KB
Format: Adobe Portable Document Format

Collections

Hansen, John H. L.