1. WO2018153807 - ACTION SELECTION FOR REINFORCEMENT LEARNING USING NEURAL NETWORKS

Publication Number WO/2018/153807
Publication Date 30.08.2018
International Application No. PCT/EP2018/054002
International Filing Date 19.02.2018
IPC
G06N 3/04 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
04 Architecture, e.g. interconnection topology
G06N 3/08 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
08 Learning methods
G06N 3/00 2006.01
G PHYSICS
06 COMPUTING; CALCULATING OR COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
CPC
G06N 3/006
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
004 Artificial life, i.e. computers simulating life
006 based on simulated virtual individual or collective life forms, e.g. single "avatar", social simulations, virtual worlds or particle swarm optimisation
G06N 3/04
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
04 Architectures, e.g. interconnection topology
G06N 3/0445
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
04 Architectures, e.g. interconnection topology
0445 Feedback networks, e.g. Hopfield nets, associative networks
G06N 3/0454
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
04 Architectures, e.g. interconnection topology
0454 using a combination of multiple neural nets
G06N 3/08
G PHYSICS
06 COMPUTING; CALCULATING; COUNTING
N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
3 Computer systems based on biological models
02 using neural network models
08 Learning methods
Applicants
  • DEEPMIND TECHNOLOGIES LIMITED [GB]/[GB]
Inventors
  • OSINDERO, Simon
  • KAVUKCUOGLU, Koray
  • VEZHNEVETS, Alexander
Agents
  • KUNZ, Herbert
Priority Data
62/463,532 24.02.2017 US
Publication Language English (EN)
Filing Language English (EN)
Title
(EN) ACTION SELECTION FOR REINFORCEMENT LEARNING USING NEURAL NETWORKS
(FR) SÉLECTION D'ACTION DESTINÉE À L'APPRENTISSAGE DE CONSOLIDATION À L'AIDE DE RÉSEAUX NEURONAUX
Abstract
(EN)
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of the multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
(FR)
La présente invention concerne des procédés, des systèmes et un appareil, comprenant des programmes informatiques codés sur des supports de stockage informatique, destinés à un système conçu pour sélectionner des actions à effectuer par un agent qui interagit avec un environnement. Le système comprend un sous-système de réseau neuronal de gestionnaire et un sous-système de réseau neuronal de travailleur. Le sous-système de gestionnaire est configuré, à chacune des multiples étapes temporelles, pour générer un vecteur d'objectif final pour l'étape temporelle. Le sous-système de travailleur est configuré, à chacune de multiples étapes temporelles, pour utiliser le vecteur d'objectif final généré par le sous-système de gestionnaire pour générer un score d'action respectif pour chaque action dans un ensemble prédéfini d'actions.
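The manager/worker split described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the random linear maps standing in for neural networks, the dimensions, the goal normalisation, the dot-product scoring and the greedy selection are all assumptions made for illustration, as are the names Manager, Worker and select_action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not specify any dimensions.
OBS_DIM, GOAL_DIM, NUM_ACTIONS = 8, 4, 3


class Manager:
    """Stand-in for the manager subsystem: observation -> final goal vector."""

    def __init__(self):
        self.w = rng.normal(size=(GOAL_DIM, OBS_DIM))

    def goal(self, obs):
        g = self.w @ obs
        # Unit-normalising the goal is an assumption, not stated in the abstract.
        return g / (np.linalg.norm(g) + 1e-8)


class Worker:
    """Stand-in for the worker subsystem: (observation, goal) -> action scores."""

    def __init__(self):
        # One GOAL_DIM-sized embedding per action, computed from the observation.
        self.w = rng.normal(size=(NUM_ACTIONS * GOAL_DIM, OBS_DIM))

    def action_scores(self, obs, goal):
        embeddings = (self.w @ obs).reshape(NUM_ACTIONS, GOAL_DIM)
        # Assumed scoring rule: dot product of each action embedding with the goal.
        return embeddings @ goal


def select_action(scores):
    """Greedy choice over the scores; sampling would be another option."""
    return int(np.argmax(scores))


manager, worker = Manager(), Worker()
history = []
for t in range(5):
    obs = rng.normal(size=OBS_DIM)  # placeholder for an environment observation
    goal = manager.goal(obs)        # manager emits a goal at every time step
    scores = worker.action_scores(obs, goal)  # one score per action
    history.append((goal, scores, select_action(scores)))
```

In this sketch the worker never sees the manager's internals. It receives only the goal vector, which mirrors the interface the abstract describes between the two subsystems.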