1. WO2022025359 - METHOD AND APPARATUS FOR GENERATING SPEECH VIDEO

Publication Number WO/2022/025359
Publication Date 03.02.2022
International Application No. PCT/KR2020/017848
International Filing Date 08.12.2020
IPC
G10L 21/10 (2013.01)
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  21 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
  06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
  10 Transforming into visible information
G10L 25/30 (2013.01)
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/-G10L21/129
  27 characterised by the analysis technique
  30 using neural networks
H04N 21/43 (2011.01)
  H ELECTRICITY
  04 ELECTRIC COMMUNICATION TECHNIQUE
  N PICTORIAL COMMUNICATION, e.g. TELEVISION
  21 Selective content distribution, e.g. interactive television or video on demand
  40 Client devices specifically adapted for the reception of, or interaction with, content, e.g. STB; Operations thereof
  43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
G06N 3/04 (2006.01)
  G PHYSICS
  06 COMPUTING; CALCULATING OR COUNTING
  N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  3 Computer systems based on biological models
  02 using neural network models
  04 Architecture, e.g. interconnection topology
G06N 3/08 (2006.01)
  G PHYSICS
  06 COMPUTING; CALCULATING OR COUNTING
  N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  3 Computer systems based on biological models
  02 using neural network models
  08 Learning methods
CPC
G06N 3/04
  G PHYSICS
  06 COMPUTING; CALCULATING; COUNTING
  N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  3 Computer systems based on biological models
  02 using neural network models
  04 Architectures, e.g. interconnection topology
G06N 3/0454
  G PHYSICS
  06 COMPUTING; CALCULATING; COUNTING
  N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  3 Computer systems based on biological models
  02 using neural network models
  04 Architectures, e.g. interconnection topology
  0454 using a combination of multiple neural nets
G06N 3/08
  G PHYSICS
  06 COMPUTING; CALCULATING; COUNTING
  N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  3 Computer systems based on biological models
  02 using neural network models
  08 Learning methods
G10L 2021/105
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  21 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
  06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
  10 Transforming into visible information
  105 Synthesis of the lips movements from speech, e.g. for talking heads
G10L 21/10
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  21 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
  06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
  10 Transforming into visible information
G10L 25/30
  G PHYSICS
  10 MUSICAL INSTRUMENTS; ACOUSTICS
  L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  25 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
  27 characterised by the analysis technique
  30 using neural networks
Applicants
  • 주식회사 딥브레인에이아이 DEEPBRAIN AI INC. [KR]/[KR]
Inventors
  • 채경수 CHAE, Gyeongsu
  • 황금별 HWANG, Guembuel
Agents
  • 장 영태 CHANG, Young Tae
Priority Data
10-2020-0093374 27.07.2020 KR
Publication Language Korean (ko)
Filing Language Korean (ko)
Title
(EN) METHOD AND APPARATUS FOR GENERATING SPEECH VIDEO
(FR) PROCÉDÉ ET APPAREIL POUR GÉNÉRER UNE VIDÉO DE PAROLE
(KO) 발화 영상 생성 방법 및 장치
Abstract
(EN) Disclosed are a method and an apparatus for generating a speech video. According to an embodiment, the speech video generating apparatus has at least one processor and a memory storing at least one program executed by the at least one processor, and comprises: a first machine learning model that receives a speech video of a person as input, extracts a video feature from it, and reconstructs the speech video from the extracted video feature; and a second machine learning model that receives a speech audio signal of a person as input and predicts a video feature from it.
(FR) Sont divulgués un procédé et un appareil pour générer une vidéo de parole. L'appareil de génération de vidéo de parole divulgué selon un mode de réalisation correspond à un appareil de génération de vidéo de parole ayant au moins un processeur et une mémoire pour stocker au moins un programme exécuté par le ou les processeurs et comprend : un premier modèle d'apprentissage automatique qui reçoit une entrée d'une vidéo de parole d'une personne, extrait une caractéristique vidéo de celle-ci et reconstruit la vidéo de parole à partir de la caractéristique vidéo extraite; et un second modèle d'apprentissage automatique qui reçoit une entrée d'un signal audio de parole d'une personne et prédit une caractéristique vidéo à partir de celui-ci.
(KO) 발화 영상 생성 방법 및 장치가 개시된다. 개시되는 일 실시예에 따른 발화 영상 생성 장치는, 하나 이상의 프로세서들, 및 하나 이상의 프로세서들에 의해 실행되는 하나 이상의 프로그램들을 저장하는 메모리를 구비한 발화 영상 생성 장치로서, 인물의 발화 영상을 입력으로 하여 영상 특징을 추출하고, 추출한 영상 특징으로부터 발화 영상을 복원하도록 하는 제1 머신 러닝 모델 및 인물의 발화 오디오 신호를 입력으로 하여 영상 특징을 예측하도록 하는 제2 머신 러닝 모델을 포함한다.
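The two-model arrangement described in the abstract can be illustrated with a minimal structural sketch in Python. Everything here is an assumption made for illustration: the class names `VideoAutoencoder` and `AudioToFeatureModel`, the function `generate_frame`, and the toy averaging arithmetic are hypothetical stand-ins, not the networks or training procedure of the application. In particular, feeding the predicted video feature into the first model's reconstruction step is an assumed way of combining the two models; the abstract itself only states what each model receives and produces.

```python
from typing import List, Sequence

Vector = List[float]


class VideoAutoencoder:
    """Hypothetical stand-in for the first model:
    video frame -> video feature -> reconstructed frame."""

    def encode(self, frame: Sequence[float]) -> Vector:
        # Toy "feature extraction": average each pair of adjacent pixels.
        if len(frame) % 2 != 0:
            raise ValueError("frame length must be even in this toy example")
        return [(frame[i] + frame[i + 1]) / 2.0 for i in range(0, len(frame), 2)]

    def decode(self, feature: Sequence[float]) -> Vector:
        # Toy "reconstruction": repeat each feature value twice.
        frame: Vector = []
        for value in feature:
            frame.extend([value, value])
        return frame

    def reconstruct(self, frame: Sequence[float]) -> Vector:
        # Extract the feature, then rebuild the frame from it.
        return self.decode(self.encode(frame))


class AudioToFeatureModel:
    """Hypothetical stand-in for the second model:
    speech audio window -> predicted video feature."""

    def __init__(self, feature_dim: int) -> None:
        self.feature_dim = feature_dim

    def predict(self, audio: Sequence[float]) -> Vector:
        # Toy "prediction": split the audio window into feature_dim
        # equal chunks and use the mean of each chunk as one feature value.
        if len(audio) % self.feature_dim != 0:
            raise ValueError("audio length must be divisible by feature_dim")
        chunk = len(audio) // self.feature_dim
        return [
            sum(audio[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(self.feature_dim)
        ]


def generate_frame(
    audio: Sequence[float],
    autoencoder: VideoAutoencoder,
    predictor: AudioToFeatureModel,
) -> Vector:
    # Assumed combination: predict a video feature from audio,
    # then reuse the first model's decoder to produce a frame.
    return autoencoder.decode(predictor.predict(audio))


if __name__ == "__main__":
    ae = VideoAutoencoder()
    predictor = AudioToFeatureModel(feature_dim=2)
    print(ae.reconstruct([0.0, 1.0, 1.0, 1.0]))
    print(generate_frame([0.0, 1.0, 1.0, 1.0], ae, predictor))
```

The sketch keeps the two roles separate so that the audio model only has to produce a vector in the same feature space the video model extracts; in a real system both stand-ins would be trained neural networks.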