1. CN109166593 - Audio data processing method, device thereof and storage medium

Office China
Application Number 201810941442.4
Application Date 17.08.2018
Publication Number 109166593
Publication Date 08.01.2019
Publication Kind A
IPC
G10L 25/51
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/-G10L21/129
48 specially adapted for particular use
51 for comparison or discrimination
G10L 25/18
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/-G10L21/129
03 characterised by the type of extracted parameters
18 the extracted parameters being spectral information of each sub-band
G10L 25/30
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/-G10L21/129
27 characterised by the analysis technique
30 using neural networks
CPC
G10L 25/18
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
03 characterised by the type of extracted parameters
18 the extracted parameters being spectral information of each sub-band
G10L 25/30
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
27 characterised by the analysis technique
30 using neural networks
G10L 25/51
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
48 specially adapted for particular use
51 for comparison or discrimination
Applicants TENCENT MUSIC ENTERTAINMENT TECHNOLOGY (SHENZHEN) CO., LTD.
腾讯音乐娱乐科技(深圳)有限公司
Inventors WANG ZHENGTAO
王征韬
Agents 深圳翼盛智成知识产权事务所(普通合伙) 44300
Title
(EN) Audio data processing method, device thereof and storage medium
(ZH) 音频数据处理方法、装置及存储介质
Abstract
(EN)
The invention discloses an audio data processing method, a device thereof, and a storage medium. The method comprises: obtaining training samples; extracting a plurality of pieces of feature information from the training samples, wherein the feature information includes a spectral height feature, a feature discriminating pure music from human voice, an audio beginning feature corresponding to a first preset time length, and an audio ending feature corresponding to a second preset time length; inputting the plurality of pieces of feature information into a neural network for feature fusion training to obtain trained feature fusion parameters; generating an audio classification model according to the feature fusion parameters; and, if test audio is received, classifying the test audio with the audio classification model. By fusing multiple pieces of feature information, the embodiment of the invention obtains feature fusion parameters that combine features from multiple aspects and then incorporates those parameters into the audio classification model for audio classification, thereby improving the accuracy of audio classification and effectively distinguishing live audio from recording studio audio.

(ZH)
本发明公开了一种音频数据处理方法、装置及存储介质,所述方法包括:获取训练样本,然后提取训练样本中的多个特征信息,多个特征信息包括频谱高度特征、纯音乐与人声的鉴别特征、第一预设时长对应的音频开头特征以及第二预设时长对应的音频结尾特征,再将多个特征信息输入神经网络中进行特征融合训练,以得到训练后的特征融合参数,并根据特征融合参数生成音频分类模型,若接收到测试音频,则通过音频分类模型对测试音频进行分类。本发明实施例通过多个特征信息的特征融合,得到结合了多个方面特征的特征融合参数,并将特征融合参数再融入到音频分类模型中进行音频分类,提升了音频分类的准确率,能够有效区分现场音频与录音棚音频。
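
The pipeline in the abstract (extract four feature groups, fuse them in a neural network, classify test audio as live or studio) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the abstract does not say how each feature is computed, what the network looks like, or how it is trained. The function names (`fuse_features`, `train_fusion`, `classify`, `make_toy_dataset`), the one-hidden-layer network, and the synthetic data are hypothetical and are not the patented implementation.

```python
import numpy as np

def fuse_features(spectral_height, vocal_score, head_feat, tail_feat):
    """Concatenate the four feature groups named in the abstract into one vector.
    How each feature is extracted is NOT specified in the abstract; the inputs
    here are placeholders supplied by the caller."""
    return np.concatenate([
        np.atleast_1d(spectral_height),  # spectral height feature
        np.atleast_1d(vocal_score),      # pure-music vs. human-voice feature
        np.ravel(head_feat),             # feature of the first preset duration
        np.ravel(tail_feat),             # feature of the last preset duration
    ]).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_fusion(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network by gradient descent on cross-entropy.
    The learned weights stand in for the 'feature fusion parameters'
    (assumed architecture, chosen only to keep the sketch small)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # fused hidden representation
        p = sigmoid(H @ W2 + b2)                 # predicted P(live)
        g_out = (p - y) / len(y)                 # gradient w.r.t. output logit
        g_hid = np.outer(g_out, W2) * (1 - H**2) # gradient w.r.t. hidden pre-activation
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum()
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

def classify(params, x):
    """Label a fused feature vector as 'live' or 'studio'."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return "live" if sigmoid(h @ params["W2"] + params["b2"]) >= 0.5 else "studio"

def make_toy_dataset(n_per_class=40, seed=1):
    """Synthetic, well-separated data (label 0 = studio, 1 = live); not real audio."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, centre in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append(fuse_features(
                rng.normal(centre, 0.3), rng.normal(centre, 0.3),
                rng.normal(centre, 0.3, 2), rng.normal(centre, 0.3, 2)))
            y.append(label)
    return np.array(X), np.array(y, dtype=float)

if __name__ == "__main__":
    X, y = make_toy_dataset()
    params = train_fusion(X, y)
    print(classify(params, X[0]), classify(params, X[-1]))
```

In this sketch the fusion is simply concatenation followed by a learned hidden layer, so the network can weight the four feature groups jointly instead of thresholding each one separately. A real system would replace the synthetic inputs with features extracted from actual audio.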