
WO2020068056 - SPEAKER DIARIZATION USING SPEAKER EMBEDDING(S) AND TRAINED GENERATIVE MODEL

Publication Number WO/2020/068056
Publication Date 02.04.2020
International Application No. PCT/US2018/052724
International Filing Date 25.09.2018
IPC
G10L 15/16 2006.01 - Speech classification or search using artificial neural networks
G10L 15/20 2006.01 - Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise or of stress induced speech
G10L 15/30 2013.01 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
CPC
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/20 - Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise or of stress induced speech
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Applicants
  • GOOGLE LLC [US]/[US]
Inventors
  • MORENO, Ignacio Lopez
  • COBO RUS, Luis Carlos
Agents
  • HIGDON, Scott
  • MIDDLETON REUTLINGER
  • CUMMINS, Patrick
  • SALAZAR, John
  • SHUMAKER, Brantley
Priority Data
Publication Language English (EN)
Filing Language English (EN)
Designated States
Title
(EN) SPEAKER DIARIZATION USING SPEAKER EMBEDDING(S) AND TRAINED GENERATIVE MODEL
Abstract
(EN)
Speaker diarization techniques are described that enable processing of audio data to generate one or more refined versions of the audio data, where each refined version isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version of the audio data that isolates the utterance(s) of a single human speaker by generating a speaker embedding for that speaker and processing the audio data using a trained generative model, with the speaker embedding used to determine activations for hidden layers of the trained generative model during the processing. Output is generated over the trained generative model based on the processing, and that output is the refined version of the audio data.
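As an illustration of the mechanism the abstract describes (conditioning a trained generative model's hidden-layer activations on a speaker embedding so that the output isolates that speaker), the following is a minimal, hypothetical PyTorch sketch. It is not the model from this application; every class, dimension, and parameter name is invented for illustration.

```python
# Hypothetical sketch only: a toy generative refinement model whose hidden-layer
# activations are conditioned on a speaker embedding, in the spirit of the
# abstract above. Architecture and names are invented, not taken from the patent.
import torch
import torch.nn as nn

class SpeakerConditionedLayer(nn.Module):
    """Hidden layer whose activations depend on the target speaker embedding."""
    def __init__(self, hidden_dim: int, emb_dim: int):
        super().__init__()
        self.lin = nn.Linear(hidden_dim, hidden_dim)
        self.cond = nn.Linear(emb_dim, hidden_dim)  # projects the speaker embedding

    def forward(self, h: torch.Tensor, speaker_emb: torch.Tensor) -> torch.Tensor:
        # The speaker embedding shifts the pre-activation, so the layer's
        # activations (and therefore the model output) are speaker-specific.
        return torch.tanh(self.lin(h) + self.cond(speaker_emb))

class RefinementModel(nn.Module):
    """Toy model: audio feature frames + speaker embedding -> refined frames."""
    def __init__(self, frame_dim=80, emb_dim=256, hidden_dim=512, n_layers=4):
        super().__init__()
        self.inp = nn.Linear(frame_dim, hidden_dim)
        self.layers = nn.ModuleList(
            [SpeakerConditionedLayer(hidden_dim, emb_dim) for _ in range(n_layers)]
        )
        self.out = nn.Linear(hidden_dim, frame_dim)

    def forward(self, frames: torch.Tensor, speaker_emb: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.inp(frames))
        for layer in self.layers:
            h = layer(h, speaker_emb)
        return self.out(h)  # refined frames intended to isolate the target speaker

# Usage: one refined stream per target speaker, given that speaker's embedding.
model = RefinementModel()
frames = torch.randn(1, 200, 80)      # (batch, time, features) audio features
speaker_emb = torch.randn(1, 1, 256)  # embedding for a single human speaker
refined = model(frames, speaker_emb)  # same shape as the input frames
```

In this sketch, running one forward pass per speaker embedding would yield one refined stream per speaker, matching the abstract's description of producing a refined version of the audio data for each human speaker.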
Also published as
EP2018786558