US20220013123 - MULTI-TAP MINIMUM VARIANCE DISTORTIONLESS RESPONSE BEAMFORMER WITH NEURAL NETWORKS FOR TARGET SPEECH SEPARATION

Office United States of America
Application Number 16926138
Application Date 10.07.2020
Publication Number 20220013123
Publication Date 13.01.2022
Publication Kind A1
IPC
G10L 15/25
G  PHYSICS
G10  MUSICAL INSTRUMENTS; ACOUSTICS
G10L  SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
G10L 15/00  Speech recognition
G10L 15/24  Speech recognition using non-acoustical features
G10L 15/25  using position of the lips, movement of the lips or face analysis
CPC
G10L 15/25
Applicants TENCENT AMERICA LLC
Inventors Yong XU
Meng Yu
Shi-Xiong Zhang
Chao Weng
Jianming Liu
Dong Yu
Title
(EN) MULTI-TAP MINIMUM VARIANCE DISTORTIONLESS RESPONSE BEAMFORMER WITH NEURAL NETWORKS FOR TARGET SPEECH SEPARATION
Abstract
(EN)

A method, computer system, and computer-readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers are received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.
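The abstract does not state the beamforming formula. As a minimal sketch, assuming the textbook single-tap MVDR solution for one frequency bin, w = Φ⁻¹d / (dᴴΦ⁻¹d), the code below computes the weights from a noise covariance matrix Φ and a steering vector d and applies them as y = wᴴx. Both inputs are hypothetical here. How the filing estimates them (its title mentions neural networks) and its multi-tap extension are not reproduced, and the helper names `solve`, `mvdr_weights`, and `apply_beamformer` are illustrative only.

```python
from typing import List

# Hypothetical helper types: one frequency bin, one entry per microphone.
Matrix = List[List[complex]]
Vector = List[complex]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("matrix is singular")
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def mvdr_weights(noise_cov: Matrix, steering: Vector) -> Vector:
    """Textbook single-tap MVDR: w = R^-1 d / (d^H R^-1 d)."""
    r_inv_d = solve(noise_cov, steering)
    denom = sum(d.conjugate() * v for d, v in zip(steering, r_inv_d))
    return [v / denom for v in r_inv_d]


def apply_beamformer(weights: Vector, mixture: Vector) -> complex:
    """Beamformer output y = w^H x for one time-frequency bin."""
    return sum(w.conjugate() * x for w, x in zip(weights, mixture))
```

Because the weights satisfy wᴴd = 1, a signal arriving along the steering direction passes through unchanged (the "distortionless" constraint), while the output power contributed by noise with covariance Φ is minimized. With an identity noise covariance the weights reduce to d / (dᴴd), i.e. plain delay-and-sum.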