Compensation of SNR and Noise Type Mismatch using an Environmental Sniffing Based Speech Recognition Solution

dc.contributor.ISNI: 0000 0001 1604 5383 (Hansen, JHL)
dc.contributor.LCNA: 92101568 (Hansen, JHL)
dc.contributor.author: Chung, Y. [en_US]
dc.contributor.author: Hansen, John H. L. [en_US]
dc.date.accessioned: 2014-07-02T20:45:22Z
dc.date.available: 2014-07-02T20:45:22Z
dc.date.created: 2013-06-20 [en_US]
dc.date.submitted: 2012-11-21 [en_US]
dc.description.abstract: Multiple-model based speech recognition (MMSR) has been shown to be quite successful in noisy speech recognition. Because it employs multiple hidden Markov model (HMM) sets that correspond to various noise types and signal-to-noise ratio (SNR) values, the selected acoustic model can be closely matched to the noisy test speech, which improves performance over other state-of-the-art speech recognition systems that employ a single HMM set. However, because the number of HMM sets is usually limited by practical considerations and by the need for effective model selection, acoustic mismatch can still be a problem in MMSR. In this study, we propose methods to improve recognition performance by mitigating the mismatch in SNR and noise type for an MMSR solution. For the SNR mismatch, an optimal SNR mapping between the noisy test speech and the HMM is determined by experimental investigation. Improved performance is demonstrated by employing the SNR mapping instead of using the estimated SNR of the noisy test speech directly. We also propose a novel method to reduce the effect of noise type mismatch by compensating the noisy test speech in the log-spectrum domain. We first derive the relation between the log-spectrum vectors of the test and training noisy speech. Because this relation is a non-linear function of the speech and noise parameters, the statistical information regarding the test log-spectrum vectors is obtained by approximation using the vector Taylor series (VTS) algorithm. Finally, the minimum mean square error estimate of the training log-spectrum vectors is used to reduce the mismatch between the training and test noisy speech. By employing the proposed methods in the MMSR framework, relative word error rate reductions of 18.7% and 21.3% are achieved on the Aurora 2 task compared with a conventional MMSR method and a multi-condition training (MTR) method, respectively. [en_US]
dc.identifier.citation: Chung, Y., and J. H. L. Hansen. 2013. "Compensation of SNR and noise type mismatch using an environmental sniffing based speech recognition solution." EURASIP Journal on Audio, Speech, and Music Processing 2013(12): 1-14. [en_US]
dc.identifier.issn: 1687-4714 [en_US]
dc.identifier.issue: 12 [en_US]
dc.identifier.uri: http://hdl.handle.net/10735.1/3627
dc.identifier.volume: 2013 [en_US]
dc.language.iso: en [en_US]
dc.relation.uri: http://dx.doi.org/10.1186/1687-4722-2013-12 [en_US]
dc.rights: CC BY 4.0 (Attribution) [en_US]
dc.rights: ©2013 The Authors. [en_US]
dc.source.journal: EURASIP Journal on Audio, Speech, and Music Processing [en_US]
dc.subject: Environmental sniffing [en_US]
dc.subject: Multiple-model frame [en_US]
dc.subject: Noise robustness [en_US]
dc.subject: Speech perception [en_US]
dc.subject: Signal-To-Noise Ratio [en_US]
dc.title: Compensation of SNR and Noise Type Mismatch using an Environmental Sniffing Based Speech Recognition Solution [en_US]
dc.type: text [en_US]
dc.type.genre: article [en_US]
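
The model selection step described in the abstract (pick the HMM set that matches the detected noise type, using a mapped SNR rather than the raw SNR estimate) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the names `HmmSet` and `select_hmm_set` are hypothetical, and the mapping is assumed here to be a simple additive offset in dB, whereas the paper determines its mapping experimentally and does not state its form in this record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HmmSet:
    """Hypothetical handle for one trained HMM set (noise type + training SNR)."""
    noise_type: str
    snr_db: float


def select_hmm_set(models, noise_type, estimated_snr_db, snr_offset_db=0.0):
    """Return the HMM set of the given noise type whose training SNR is
    closest to the mapped SNR.

    ASSUMPTION: the SNR mapping is modelled as estimated SNR plus a fixed
    offset; the real mapping in the paper is found by experiment.
    """
    mapped_snr_db = estimated_snr_db + snr_offset_db
    candidates = [m for m in models if m.noise_type == noise_type]
    if not candidates:
        raise ValueError(f"no HMM set trained for noise type {noise_type!r}")
    return min(candidates, key=lambda m: abs(m.snr_db - mapped_snr_db))


# Illustrative model inventory; noise types and SNR grid are made up.
models = [
    HmmSet(noise, snr)
    for noise in ("subway", "car")
    for snr in (0.0, 5.0, 10.0, 15.0, 20.0)
]

direct = select_hmm_set(models, "car", estimated_snr_db=11.0)
mapped = select_hmm_set(models, "car", estimated_snr_db=11.0, snr_offset_db=5.0)
```

With an offset of zero the function reduces to selecting by the estimated SNR directly, which is the baseline the abstract compares against.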

Files

Original bundle

Name: ECS-FR-JHLHansen-309486.75.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format

License bundle

Name: SpringerOpen.pdf
Size: 45.37 KB
Format: Adobe Portable Document Format