1. CN108510985 - Systems and methods for principled bias reduction in production speech models

Office China
Application Number 201810159989.9
Application Date 26.02.2018
Publication Number 108510985
Publication Date 07.09.2018
Publication Kind A
IPC
G10L 15/22
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15 Speech recognition
22 Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/16
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15 Speech recognition
08 Speech classification or search
16 using artificial neural networks
CPC
G10L 15/16
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15 Speech recognition
08 Speech classification or search
16 using artificial neural networks
G10L 15/02
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15 Speech recognition
02 Feature extraction for speech recognition; Selection of recognition unit
G10L 15/04
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15 Speech recognition
04 Segmentation; Word boundary detection
G10L 15/22
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15 Speech recognition
22 Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L 25/18
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
03 characterised by the type of extracted parameters
18 the extracted parameters being spectral information of each sub-band
G10L 15/06
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
15 Speech recognition
06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Applicants BAIDU USA LLC
百度(美国)有限责任公司
Inventors HUANG, JIAJI (黄家骥)
KUMAR, ATUL (奥提尓·库马尔)
CHILD, REWON (瑞万·蔡尔德)
LIU, HAIRONG (刘海容)
SATHEESH, SANJEEV (桑吉夫·萨西斯)
FOUGNER, CHRISTOPHER (克里斯托弗·丰纳)
SEETAPUN, DAVID (大卫·西塔潘)
JUN, HEEWOO (俊熙雄)
COATES, ADAM (亚当·科茨)
GAUR, YASHESH (雅舍施·高尔)
RAO, VINAY (维奈·朗)
SRIRAM, ANUROOP (安鲁普·西瑞兰姆)
ZHU, ZHENYAO (朱臻垚)
BATTENBERG, ERIC (埃里克·巴顿伯格)
KANNAN, AJAY (阿贾伊·卡恩纳恩)
KLIEGL, MARKUS (马库斯·基尔)
Agents 北京英赛嘉华知识产权代理有限责任公司 11204
Priority Data 15884239 30.01.2018 US
62/463,547 24.02.2017 US
Title
(EN) Systems and methods for principled bias reduction in production speech models
(ZH) 用于减小生产语音模型中的原则性偏差的系统和方法
Abstract
(EN)
Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. With optimized recurrent layers and training together with alignment information, some unwanted bias induced by using purely forward-only recurrences may be removed in a deployed model.

(ZH)
本文中描述的是识别和解决端对端语音模型中的偏差源的系统和方法。在一个或多个实施方式中,端对端模型可以是递归神经网络,该递归神经网络具有两个2D卷积输入层,接着是多个双向递归层以及在softmax层之前的一个完全连接层。在一个或多个实施方式中,使用CTC损失函数训练端对端,以从音频的对数频谱直接预测字符的序列。通过优化的递归层和与对齐信息一起训练,可去除所配置的模型中的一些不希望有的偏差,这些不希望有的偏差是通过使用仅纯粹前向递归而引起的。
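
For illustration only: the abstract says the network is trained with the CTC loss to predict character sequences directly from audio frames. The sketch below shows one common way to compute that loss, using the standard CTC forward recursion in log space. It is not taken from the patent. The function names (`logsumexp`, `ctc_neg_log_likelihood`), the choice of index 0 for the blank symbol, and the toy two-symbol alphabet are all assumptions made for this example. In a real model the per-frame log-probabilities would come from the softmax layer.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss for one utterance (hypothetical helper, not from the patent).

    log_probs: list of T frames; each frame is a list of per-symbol
               log-probabilities (e.g. log-softmax outputs).
    target:    list of symbol indices, without blanks.
    blank:     index of the blank symbol (assumed to be 0 here).
    """
    # Extended target with blanks interleaved: [blank, c1, blank, c2, ..., blank]
    ext = [blank]
    for symbol in target:
        ext += [symbol, blank]
    S, T = len(ext), len(log_probs)

    # alpha[s] = log-probability of all partial alignments ending at ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new_alpha = [NEG_INF] * S
        for s in range(S):
            candidates = [alpha[s]]              # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two different characters.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(*candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last character or on the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


# Toy usage: two frames, alphabet {blank, 'a'}, uniform predictions.
# Three of the four length-2 paths collapse to "a", so the probability is 0.75.
uniform = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
loss = ctc_neg_log_likelihood(uniform, target=[1])
```

The recursion sums over every frame-level alignment that collapses to the target, which is why CTC training needs no frame-by-frame labels. The alignment information mentioned in the abstract is an additional training signal that this sketch does not model.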
