Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification

Abstract

Modern automatic speaker verification (ASV) systems need to be robust under various noisy conditions. Motivated by the success of generative adversarial networks (GANs), this paper proposes a multi-task adversarial network (MAN) for extracting noise-invariant bottleneck (BN) features. The MAN consists of three component networks: a feature encoding network (FEN), a speaker discriminative network (SDN), and a noise-domain adaptation network (NAN). The FEN generates noise-robust BN features, the SDN makes the features from the FEN more speaker-discriminative, and the NAN guides the FEN to learn more noise-invariant feature representations. The MAN is trained adversarially. When training the FEN and SDN, the speaker identities and the clean-speech label are used as targets, which pushes BN features extracted from noisy speech and from clean speech to be similar. When training the NAN, by contrast, the noise types are used as targets. We evaluate the proposed MAN-BN feature extraction method on a Gaussian mixture model-universal background model (GMM-UBM) based ASV system. Experimental results on the RSR2015 database show that the proposed MAN-BN features dramatically improve the accuracy of the ASV system across different noise types and signal-to-noise ratio (SNR) conditions.
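
The alternating choice of targets described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Utterance` type, the function names, the `"clean"` label string, and the example noise names are all hypothetical, and it assumes one reading of the abstract, namely that the FEN/SDN update always pairs the true speaker with the clean-speech label, while the NAN update uses the true noise type.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed name for the clean-speech label; the abstract does not specify it.
CLEAN = "clean"


@dataclass(frozen=True)
class Utterance:
    """Hypothetical training example: who spoke and under which noise."""
    speaker: str
    noise_type: str  # e.g. "clean", "babble" (illustrative names only)


def fen_sdn_targets(batch: List[Utterance]) -> List[Tuple[str, str]]:
    """Targets for the FEN/SDN update.

    Every utterance keeps its true speaker identity but is paired with the
    clean-speech label, regardless of its actual noise type.
    """
    return [(u.speaker, CLEAN) for u in batch]


def nan_targets(batch: List[Utterance]) -> List[str]:
    """Targets for the NAN update: the actual noise type of each utterance."""
    return [u.noise_type for u in batch]


batch = [Utterance("spk1", "babble"), Utterance("spk2", "clean")]
print(fen_sdn_targets(batch))
print(nan_targets(batch))
```

Under this reading, the two updates pull in opposite directions: the NAN learns to tell noise types apart from the BN features, while the FEN is rewarded when those same features look like clean speech.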

Publication
2018 International Conference on Network Infrastructure and Digital Content
Date