
WO2020112808 - SYSTEM AND METHOD FOR CONVERTING IMAGE DATA INTO A NATURAL LANGUAGE DESCRIPTION

Publication Number WO/2020/112808
Publication Date 04.06.2020
International Application No. PCT/US2019/063298
International Filing Date 26.11.2019
IPC
G06K 9/00 (2006.01)
  G   PHYSICS
  06  COMPUTING; CALCULATING OR COUNTING
  K   RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
  9   Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/46 (2006.01)
  G   PHYSICS
  06  COMPUTING; CALCULATING OR COUNTING
  K   RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
  9   Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    36  Image preprocessing, i.e. processing the image information without deciding about the identity of the image
      46  Extraction of features or characteristics of the image
H04N 5/278 (2006.01)
  H   ELECTRICITY
  04  ELECTRIC COMMUNICATION TECHNIQUE
  N   PICTORIAL COMMUNICATION, e.g. TELEVISION
  5   Details of television systems
    222  Studio circuitry; Studio devices; Studio equipment
      262  Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects
        278  Subtitling
CPC
G06F 16/383
  G   PHYSICS
  06  COMPUTING; CALCULATING; COUNTING
  F   ELECTRIC DIGITAL DATA PROCESSING
  16  Information retrieval; Database structures therefor; File system structures therefor
    30  of unstructured textual data
      38  Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
        383  using metadata automatically derived from the content
G06F 16/583
  G   PHYSICS
  06  COMPUTING; CALCULATING; COUNTING
  F   ELECTRIC DIGITAL DATA PROCESSING
  16  Information retrieval; Database structures therefor; File system structures therefor
    50  of still image data
      58  Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
        583  using metadata automatically derived from the content
G06K 9/3241
  G   PHYSICS
  06  COMPUTING; CALCULATING; COUNTING
  K   RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
  9   Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    20  Image acquisition
      32  Aligning or centering of the image pick-up or image-field
        3233  Determination of region of interest
          3241  Recognising objects as potential recognition candidates based on visual cues, e.g. shape
G06N 5/046
  G   PHYSICS
  06  COMPUTING; CALCULATING; COUNTING
  N   COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  5   Computer systems using knowledge-based models
    04  Inference methods or devices
      046  Forward inferencing; Production systems
Applicants
  • SONY INTERACTIVE ENTERTAINMENT INC. [JP]/[JP]
  • ZHENG, Jian [US]/[US] (US)
  • CHEN, Ruxin [US]/[US] (US)
Inventors
  • ZHENG, Jian
  • CHEN, Ruxin
Agents
  • ROGITZ, John L.
Priority Data
16/206,439  30.11.2018  US
Publication Language English (EN)
Filing Language English (EN)
Designated States
Title
(EN) SYSTEM AND METHOD FOR CONVERTING IMAGE DATA INTO A NATURAL LANGUAGE DESCRIPTION
Abstract
(EN)
For image captioning such as for computer game images or other images, bottom-up attention (400) is combined with top-down attention (402) to provide a multi-level residual attention-based image captioning model. A residual attention mechanism (500) is first applied in the Faster R-CNN network to learn better feature representations for each region by taking spatial information into consideration. In the image captioning network, taking the extracted regional features as input, a second residual attention network (1204) is implemented to fuse the regional features attentionally for subsequent caption generation.
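The abstract describes a two-stage design: a residual attention mechanism inside the Faster R-CNN detector refines regional features, and a second residual attention network fuses those regional features for caption generation. As a rough illustration of the fusion step only (not the patented implementation; every name, shape, and weighting scheme below is an invented assumption), a residual attention pooling of regional features can be sketched in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention_fuse(regions, query, w):
    """Fuse N regional feature vectors (N x D) into a single D-vector.

    Attention scores combine a (hypothetical) learned projection `w` (D,)
    with a decoder state `query` (D,). The residual connection adds the
    plain mean-pooled features back onto the attended sum, so the fused
    vector retains the unweighted evidence even when attention is sharp.
    """
    scores = regions @ (w * query)       # (N,) relevance score per region
    alpha = softmax(scores)              # (N,) attention weights, sum to 1
    attended = alpha @ regions           # (D,) attention-weighted sum
    residual = regions.mean(axis=0)      # (D,) plain mean pooling
    return attended + residual           # residual combination

rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 2048))    # e.g. 36 detector region features
query = rng.normal(size=2048)            # decoder hidden state
w = rng.normal(size=2048)                # illustrative projection weights
fused = residual_attention_fuse(regions, query, w)
print(fused.shape)  # (2048,)
```

Note the degenerate case: with `w` all zeros the scores are uniform, attention collapses to mean pooling, and the residual path makes the output exactly twice the mean, which is a quick sanity check that the residual term is present.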