1. WO2022010613 - MULTI-TAP MINIMUM VARIANCE DISTORTIONLESS RESPONSE BEAMFORMER WITH NEURAL NETWORKS FOR TARGET SPEECH SEPARATION

Publication Number WO/2022/010613
Publication Date 13.01.2022
International Application No. PCT/US2021/036821
International Filing Date 10.06.2021
IPC
  • G10L 15/16 2006.01
      G PHYSICS
      10 MUSICAL INSTRUMENTS; ACOUSTICS
      L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      15 Speech recognition
      08 Speech classification or search
      16 using artificial neural networks
  • G06N 3/08 2006.01
      G PHYSICS
      06 COMPUTING; CALCULATING OR COUNTING
      N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      3 Computer systems based on biological models
      02 using neural network models
      08 Learning methods
  • G10L 15/00 2013.01
      G PHYSICS
      10 MUSICAL INSTRUMENTS; ACOUSTICS
      L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      15 Speech recognition
  • G10L 15/08 2006.01
      G PHYSICS
      10 MUSICAL INSTRUMENTS; ACOUSTICS
      L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      15 Speech recognition
      08 Speech classification or search
  • G10L 15/22 2006.01
      G PHYSICS
      10 MUSICAL INSTRUMENTS; ACOUSTICS
      L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      15 Speech recognition
      22 Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/25 2013.01
      G PHYSICS
      10 MUSICAL INSTRUMENTS; ACOUSTICS
      L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      15 Speech recognition
      24 Speech recognition using non-acoustical features
      25 using position of the lips, movement of the lips or face analysis
Applicants
  • TENCENT AMERICA LLC [US]/[US]
Inventors
  • XU, Yong
  • YU, Meng
  • ZHANG, Shi-Xiong
  • WENG, Chao
  • LIU, Jianming
  • YU, Dong
Agents
  • RABENA, John, F.
  • EMERY, David P.
  • TUNG, Loren H.
Priority Data
16/926,138 10.07.2020 US
Publication Language English (en)
Filing Language English (EN)
Title
(EN) MULTI-TAP MINIMUM VARIANCE DISTORTIONLESS RESPONSE BEAMFORMER WITH NEURAL NETWORKS FOR TARGET SPEECH SEPARATION
(FR) DISPOSITIF DE FORMATION DE FAISCEAU DE RÉPONSE SANS DISTORSION À VARIANCE MINIMALE À PRISES MULTIPLES AVEC RÉSEAUX NEURONAUX POUR LA SÉPARATION DE LA PAROLE CIBLE
Abstract
(EN) A method, computer system, and computer readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers are received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.
(FR) L'invention concerne un procédé, un système informatique et un support lisible par ordinateur pour la reconnaissance automatique de la parole. Des données vidéo et des données audio correspondant à un ou plusieurs haut-parleurs sont reçues. Une fonction de réponse sans distorsion à variance minimale est appliquée aux données audio et vidéo reçues. Une forme d'onde cible prédite correspondant à un haut-parleur cible parmi le ou les haut-parleurs est générée sur la base de la propagation en retour de la sortie de la fonction de réponse sans distorsion à variance minimale appliquée.