1. KR1020210008510 - Synthesis of speech from text in a voice of a target speaker using neural networks

Office
Republic of Korea
Application Number 1020207035508
Application Date 17.05.2019
Publication Number 1020210008510
Publication Date 22.01.2021
Publication Kind A
IPC
G10L 13/033
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
033 Voice editing, e.g. manipulating the voice of the synthesiser
G06N 3/08
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
08 Learning methods
G10L 13/04
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
G10L 25/18
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
03 characterised by the type of extracted parameters
18 the extracted parameters being spectral information of each sub-band
G10L 25/30
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
27 characterised by the analysis technique
30 using neural networks
CPC
G10L 13/033
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
033 Voice editing, e.g. manipulating the voice of the synthesiser
G06N 3/08
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
08 Learning methods
G10L 13/04
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
13 Speech synthesis; Text to speech systems
02 Methods for producing synthetic speech; Speech synthesisers
04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
G10L 25/18
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
03 characterised by the type of extracted parameters
18 the extracted parameters being spectral information of each sub-band
G10L 25/30
G PHYSICS
10 MUSICAL INSTRUMENTS; ACOUSTICS
L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
25 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
27 characterised by the analysis technique
30 using neural networks
Applicants Google LLC
Inventors Jia, Ye
Chen, Zhifeng
Wu, Yonghui
Shen, Jonathan
Pang, Ruoming
Weiss, Ron J.
Moreno, Ignacio Lopez
Ren, Fei
Zhang, Yu
Wang, Quan
Nguyen, Patrick An Phu
Agents Nam & Nam (patent law firm)
Priority Data 62/672,835 17.05.2018 US
Title
(KO, translated) Synthesis of speech from text in a voice of a target speaker using neural networks
Abstract
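(KO, translated) Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech synthesis. The methods, systems, and apparatus include actions of: obtaining an audio representation of speech of a target speaker; obtaining input text for which speech is to be synthesized in the voice of the target speaker; generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another; generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained, using voices of reference speakers, to generate audio representations; and providing the audio representation of the input text spoken in the voice of the target speaker for output.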
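
The data flow described in the abstract can be sketched as follows. This is only an illustrative outline, not the claimed implementation: the function names (`toy_speaker_encoder`, `toy_spectrogram_generator`, `synthesize`) are hypothetical, and both "engines" are hand-written stand-ins for the trained neural networks the abstract refers to. The sketch shows only how the target speaker's audio is reduced to a fixed-size speaker vector, and how that vector together with the input text conditions the generated spectrogram.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Spectrogram = List[List[float]]  # frames x frequency bins


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def toy_speaker_encoder(audio: Sequence[float], dim: int = 4) -> Vector:
    """Hypothetical stand-in for the trained speaker encoder engine.

    Folds the absolute sample values into `dim` buckets and normalizes,
    so audio of any length maps to a fixed-size speaker vector.
    """
    if not audio:
        raise ValueError("audio must be non-empty")
    feats = [0.0] * dim
    for i, sample in enumerate(audio):
        feats[i % dim] += abs(sample)
    return l2_normalize(feats)


def toy_spectrogram_generator(
    text: str, speaker_vector: Sequence[float], bins: int = 8
) -> Spectrogram:
    """Hypothetical stand-in for the trained spectrogram generation engine.

    Emits one frame per character; each bin is scaled by a component of
    the speaker vector, so the output depends on both text and speaker.
    """
    dim = len(speaker_vector)
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0
        frames.append([base * (1.0 + speaker_vector[b % dim]) for b in range(bins)])
    return frames


def synthesize(
    target_audio: Sequence[float],
    input_text: str,
    encoder: Callable[[Sequence[float]], Vector] = toy_speaker_encoder,
    generator: Callable[[str, Sequence[float]], Spectrogram] = toy_spectrogram_generator,
) -> Spectrogram:
    """Follow the abstract's steps: audio -> speaker vector -> spectrogram."""
    speaker_vector = encoder(target_audio)
    return generator(input_text, speaker_vector)


if __name__ == "__main__":
    spec = synthesize([0.1, -0.2, 0.3, 0.4, -0.5], "hi")
    print(len(spec), "frames of", len(spec[0]), "bins")
```

In this sketch the encoder and generator are passed in as callables, which mirrors the abstract's separation between an engine trained to distinguish speakers and an engine trained on reference speakers to generate audio representations.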