WO2019222591 - SYNTHESIS OF SPEECH FROM TEXT IN A VOICE OF A TARGET SPEAKER USING NEURAL NETWORKS

Publication Number WO/2019/222591
Publication Date 21.11.2019
International Application No. PCT/US2019/032815
International Filing Date 17.05.2019
IPC
G10L 13/033 2013.01
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
033 Voice editing, e.g. manipulating the voice of the synthesiser
G10L 13/04 2013.01
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
G10L 25/30 2013.01
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/-G10L21/129
27 characterised by the analysis technique
30 using neural networks
CPC
G06N 3/08
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
08 Learning methods
G10L 13/033
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
033 Voice editing, e.g. manipulating the voice of the synthesiser
G10L 13/04
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
G10L 25/18
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
03 characterised by the type of extracted parameters
18 the extracted parameters being spectral information of each sub-band
G10L 25/30
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
27 characterised by the analysis technique
30 using neural networks
Applicants
  • GOOGLE LLC [US]/[US]
Inventors
  • JIA, Ye
  • CHEN, Zhifeng
  • WU, Yonghui
  • SHEN, Jonathan
  • PANG, Ruoming
  • WEISS, Ron J.
  • MORENO, Ignacio Lopez
  • REN, Fei
  • ZHANG, Yu
  • WANG, Quan
  • NGUYEN, Patrick An Phu
Agents
  • KRUEGER, Brett A.
Priority Data
62/672,835 17.05.2018 US
Publication Language English (EN)
Filing Language English (EN)
Designated States
Title
(EN) SYNTHESIS OF SPEECH FROM TEXT IN A VOICE OF A TARGET SPEAKER USING NEURAL NETWORKS
(FR) SYNTHÈSE DE LA PAROLE D'UN TEXTE EN UNE VOIX D'UN LOCUTEUR CIBLE À L'AIDE DE RÉSEAUX NEURONAUX
Abstract
(EN)
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
(FR)
La présente invention concerne des procédés, des systèmes et un appareil, incluant des programmes d'ordinateur codés sur un support d'enregistrement informatique, pour une synthèse de la parole. Les procédés, les systèmes et l'appareil comprennent des actions consistant à obtenir une représentation audio de paroles d'un locuteur cible, obtenir un texte d'entrée pour lequel des paroles doivent être synthétisées en une voix du locuteur cible, générer un vecteur de locuteur en fournissant la représentation audio à un moteur de codage de locuteur qui est entraîné pour distinguer des locuteurs les uns des autres, générer une représentation audio du texte d'entrée parlé dans la voix du locuteur cible par la fourniture du texte d'entrée et du vecteur de locuteur à un moteur de génération de spectrogrammes qui est entraîné à utiliser des voix de locuteurs de référence pour générer des représentations audio, et fournir la représentation audio du texte d'entrée parlé dans la voix du locuteur cible aux fins de sortie.
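
The data flow described in the abstract (reference audio to speaker vector, then text plus speaker vector to an audio representation) might be sketched as follows. This is only an illustrative sketch, not the method claimed in the application: the function names `speaker_encoder`, `spectrogram_generator`, and `synthesize` are hypothetical, and both stages are toy arithmetic stand-ins for what the abstract describes as trained neural network engines.

```python
import math
from typing import List, Sequence


def speaker_encoder(audio_frames: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in for the speaker encoder engine (assumption, not the patent's model).

    Averages the input frames and L2-normalizes the result, so audio of any
    length maps to one fixed-size speaker vector.
    """
    dim = len(audio_frames[0])
    mean = [sum(frame[i] for frame in audio_frames) / len(audio_frames)
            for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid division by zero
    return [x / norm for x in mean]


def spectrogram_generator(text: str,
                          speaker_vector: Sequence[float]) -> List[List[float]]:
    """Toy stand-in for the spectrogram generation engine (assumption).

    Emits one frame per input character; each frame is a text-dependent
    value offset by the speaker vector, so the output depends on both inputs.
    """
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0  # arbitrary text-dependent value
        frames.append([base + s for s in speaker_vector])
    return frames


def synthesize(target_audio: Sequence[Sequence[float]],
               text: str) -> List[List[float]]:
    """Chain the two stages in the order given in the abstract."""
    speaker_vector = speaker_encoder(target_audio)
    return spectrogram_generator(text, speaker_vector)


if __name__ == "__main__":
    # Two hypothetical target speakers, each given as a few 3-dimensional frames.
    speaker_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    speaker_b = [[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]]
    spec_a = synthesize(speaker_a, "hello")
    spec_b = synthesize(speaker_b, "hello")
    print(len(spec_a), len(spec_a[0]))  # frames x feature dimension
    print(spec_a[0] != spec_b[0])       # same text, different speaker vector
```

In this sketch the speaker vector is the only channel through which the target speaker's audio reaches the second stage, which mirrors the separation the abstract draws between the speaker encoder engine and the spectrogram generation engine.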
Also published as
JP2020564447
Latest bibliographic data on file with the International Bureau