1. EP3568810 - ACTION SELECTION FOR REINFORCEMENT LEARNING USING NEURAL NETWORKS

Office European Patent Office
Application Number 18705929
Application Date 19.02.2018
Publication Number 3568810
Publication Date 20.11.2019
Publication Kind A1
IPC
G06N 3/04  PHYSICS > COMPUTING; CALCULATING OR COUNTING > COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS > Computer systems based on biological models > using neural network models > Architecture, e.g. interconnection topology
G06N 3/00  PHYSICS > COMPUTING; CALCULATING OR COUNTING > COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS > Computer systems based on biological models
G06N 3/08  PHYSICS > COMPUTING; CALCULATING OR COUNTING > COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS > Computer systems based on biological models > using neural network models > Learning methods
CPC
G06N 3/006  PHYSICS > COMPUTING; CALCULATING; COUNTING > COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS > Computer systems based on biological models > Artificial life, i.e. computers simulating life > based on simulated virtual individual or collective life forms, e.g. single "avatar", social simulations, virtual worlds or particle swarm optimisation
G06N 3/0445  PHYSICS > COMPUTING; CALCULATING; COUNTING > COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS > Computer systems based on biological models > using neural network models > Architectures, e.g. interconnection topology > Feedback networks, e.g. hopfield nets, associative networks
G06N 3/0454  PHYSICS > COMPUTING; CALCULATING; COUNTING > COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS > Computer systems based on biological models > using neural network models > Architectures, e.g. interconnection topology > using a combination of multiple neural nets
G06N 3/08  PHYSICS > COMPUTING; CALCULATING; COUNTING > COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS > Computer systems based on biological models > using neural network models > Learning methods
G06N 3/04  PHYSICS > COMPUTING; CALCULATING; COUNTING > COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS > Computer systems based on biological models > using neural network models > Architectures, e.g. interconnection topology
Applicants DEEPMIND TECH LTD
Inventors OSINDERO SIMON
KAVUKCUOGLU KORAY
VEZHNEVETS ALEXANDER
Designated States
Priority Data 62463532 24.02.2017 US
Title
(DE) AKTIONSAUSWAHL FÜR VERSTÄRKENDES LERNEN UNTER VERWENDUNG NEURONALER NETZE
(EN) ACTION SELECTION FOR REINFORCEMENT LEARNING USING NEURAL NETWORKS
(FR) SÉLECTION D'ACTION DESTINÉE À L'APPRENTISSAGE DE CONSOLIDATION À L'AIDE DE RÉSEAUX NEURONAUX
Abstract
(EN)
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of the multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.

(FR)
La présente invention concerne des procédés, des systèmes et un appareil, comprenant des programmes informatiques codés sur des supports de stockage informatique, destinés à un système conçu pour sélectionner des actions à effectuer par un agent qui interagit avec un environnement. Le système comprend un sous-système de réseau neuronal de gestionnaire et un sous-système de réseau neuronal de travailleur. Le sous-système de gestionnaire est configuré, à chacune des multiples étapes temporelles, pour générer un vecteur d'objectif final pour l'étape temporelle. Le sous-système de travailleur est configuré, à chacune de multiples étapes temporelles, pour utiliser le vecteur d'objectif final généré par le sous-système de gestionnaire pour générer un score d'action respectif pour chaque action dans un ensemble prédéfini d'actions.
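
The manager/worker arrangement described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent text: the class names `Manager` and `Worker`, the single linear layers standing in for neural networks, the unit-normalised goal, the dot-product scoring of per-action embeddings against a projected goal, and the softmax sampling step. The abstract only states that the manager generates a final goal vector at each time step and that the worker uses it to produce one score per action.

```python
import numpy as np


class Manager:
    """Hypothetical manager: maps an observation to a goal vector."""

    def __init__(self, obs_dim, goal_dim, rng):
        # A single random linear layer stands in for the manager network.
        self.W = rng.standard_normal((goal_dim, obs_dim)) * 0.1

    def final_goal(self, obs):
        g = np.tanh(self.W @ obs)
        norm = np.linalg.norm(g)
        # Assumption: the goal is unit-normalised; the abstract does not say so.
        return g / norm if norm > 0 else g


class Worker:
    """Hypothetical worker: scores every action given the manager's goal."""

    def __init__(self, obs_dim, goal_dim, num_actions, embed_dim, rng):
        self.num_actions = num_actions
        self.embed_dim = embed_dim
        self.W_embed = rng.standard_normal((num_actions * embed_dim, obs_dim)) * 0.1
        self.W_goal = rng.standard_normal((embed_dim, goal_dim)) * 0.1

    def action_scores(self, obs, goal):
        # One embedding row per action, computed from the observation.
        U = (self.W_embed @ obs).reshape(self.num_actions, self.embed_dim)
        # Project the goal into the embedding space, then score each action.
        w = self.W_goal @ goal
        return U @ w


def select_action(scores, rng):
    """Sample an action from a softmax over the scores (an assumed policy)."""
    z = scores - scores.max()  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(scores), p=probs)), probs


def run_episode(num_steps=5, obs_dim=8, goal_dim=4, num_actions=3, seed=0):
    rng = np.random.default_rng(seed)
    manager = Manager(obs_dim, goal_dim, rng)
    worker = Worker(obs_dim, goal_dim, num_actions, embed_dim=4, rng=rng)
    actions = []
    for _ in range(num_steps):
        obs = rng.standard_normal(obs_dim)  # stand-in for an environment observation
        goal = manager.final_goal(obs)      # manager: one goal vector per time step
        scores = worker.action_scores(obs, goal)  # worker: one score per action
        action, _ = select_action(scores, rng)
        actions.append(action)
    return actions
```

Calling `run_episode()` returns a list of five action indices, each in the range 0 to 2. The weights are random and untrained, so the sketch shows only the data flow between the two subsystems, not learned behaviour.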