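Research on a speech enhancement method based on noise synthesized by generative adversarial networks

Application of Electronic Technique, 2020, Issue 11

Xia Ding, Xu Wentao

School of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China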

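Abstract: In speech enhancement, deep neural networks are trained in a supervised manner on large amounts of speech containing different noises, which improves the network's enhancement ability. However, acquiring different types of noise is costly, and noise types are difficult to collect comprehensively, which limits the generalization ability of the model. To address this problem, a noise data augmentation method based on generative adversarial networks (GAN) is proposed. The method learns from real noise data and synthesizes virtual noise according to the data features, thereby expanding the quantity and types of noise data in the training set. Experiments verify that the noise synthesis method effectively broadens the noise sources in the training set, enhances the generalization ability of the model, and effectively improves the signal-to-noise ratio and intelligibility of the denoised speech signal.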

Keywords: speech enhancement; generative adversarial network; data augmentation

CLC number: TN912.3
Document code: A
DOI: 10.16157/j.issn.0258-7998.200327
Chinese citation format: 夏鼎，徐文涛. 基于生成对抗网络合成噪声的语音增强方法研究[J]. 电子技术应用，2020，46(11)：56-59，64.
English citation format: Xia Ding, Xu Wentao. Research on speech enhancement method based on generating noise using GAN[J]. Application of Electronic Technique, 2020, 46(11): 56-59, 64.

Research on speech enhancement method based on generating noise using GAN

Xia Ding, Xu Wentao

School of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Abstract: In speech enhancement, deep neural networks improve their enhancement ability through supervised training on large amounts of speech data containing different noises. However, acquiring different types of noise is costly, and comprehensive coverage of noise types is difficult, which limits the generalization ability of the model. To address this problem, this paper proposes a noise data augmentation method based on the generative adversarial network (GAN), which learns from real noise data and synthesizes virtual noise according to the data features, so as to expand the number and types of noise data in the training set. Experimental results show that the noise synthesis method adopted in this article can effectively expand the sources of noise in the training set, enhance the generalization ability of the model, and effectively improve the signal-to-noise ratio and intelligibility of the speech signal after denoising.

Keywords: speech enhancement; generative adversarial network; data augmentation

0 Introduction

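In speech signal processing, background noise and environmental interference seriously degrade the reliability of the processing, so speech enhancement is needed to remove the noise from the signal and improve the quality of noisy speech. Speech enhancement therefore plays a very important role in fields such as speech recognition, hearing assistance, and speech communication.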

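Traditional speech enhancement methods include spectral subtraction^[1], Wiener filtering^[2-3], and the later methods based on statistical models^[4]. These methods all model the noise from its known statistical properties, obtain the noise power spectrum, and denoise the noisy speech signal to estimate the clean speech signal. Their accuracy depends heavily on the feature engineering applied to the data and on the data type, and they adapt poorly to unknown noise interference^[5]. With the development of artificial intelligence, deep neural networks have been applied to speech enhancement^[6]. Through the feature learning of a deep neural network, noisy speech can be mapped to clean speech, thereby removing the noise. To improve the generalization ability of speech enhancement based on deep neural networks, the most direct approach is data augmentation, including increasing the diversity of the data and enlarging the dataset. Experiments show that using more types of noise data when training a deep neural network can significantly improve the signal-to-noise ratio of the enhanced speech^[7-8]. However, real noise data are difficult and costly to obtain, which limits the applicability of the network's denoising ability. To address this problem, this paper designs a training-set augmentation method based on the generative adversarial network (GAN): by generating virtual noise, it expands the types and quantity of noise data in the training set and improves the generalization ability of the model.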
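To make the idea of enlarging the noise side of a training set concrete, the following is a minimal sketch of how noisy/clean training pairs are commonly built by mixing a noise recording into clean speech at a chosen signal-to-noise ratio. It is not the authors' pipeline, which this excerpt does not describe. The function names, the SNR values, and the toy signals are all assumptions for illustration; in particular, `synthetic_noise` is only a placeholder for where the output of a trained GAN generator would go.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def measured_snr_db(clean, noise):
    """SNR in dB between a clean signal and an additive noise signal."""
    return 10 * math.log10(power(clean) / power(noise))


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean, p_noise = power(clean), power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Target noise power is p_clean / 10^(snr_db / 10); the gain is its square root ratio.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


# Toy stand-ins (assumptions): a sine tone as "speech", Gaussian noise as "real" noise,
# and uniform noise as a placeholder for GAN-generated virtual noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
real_noise = [rng.gauss(0, 1) for _ in range(16000)]
synthetic_noise = [rng.uniform(-1, 1) for _ in range(16000)]

# Each noise source is mixed at several hypothetical SNRs to form (noisy, clean) pairs.
training_pairs = []
for noise in (real_noise, synthetic_noise):
    for target_snr in (-5, 0, 5, 10):
        noisy, _ = mix_at_snr(clean, noise, target_snr)
        training_pairs.append((noisy, clean, target_snr))
```

Under these assumptions, every additional noise source, real or synthesized, multiplies the number of training pairs available from the same clean speech, which is the sense in which extra noise types enlarge the training set.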

For the full text of this article, please download: http://www.ihrv.cn/resource/share/2000003050

