1. (WO2018083532) TRAINING ACTION SELECTION NEURAL NETWORKS
Latest bibliographic data on file with the International Bureau

Pub. No.: WO/2018/083532 International Application No.: PCT/IB2017/001329
Publication Date: 11.05.2018 International Filing Date: 03.11.2017
Chapter 2 Demand Filed: 03.09.2018
IPC:
G06N 3/08 (2006.01), G06N 3/04 (2006.01)
G06N 3/08: G (PHYSICS) > 06 (COMPUTING; CALCULATING; COUNTING) > N (COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS) > 3 (Computer systems based on biological models) > 02 (using neural network models) > 08 (Learning methods)
G06N 3/04: G (PHYSICS) > 06 (COMPUTING; CALCULATING; COUNTING) > N (COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS) > 3 (Computer systems based on biological models) > 02 (using neural network models) > 04 (Architecture, e.g. interconnection topology)
Applicants:
DEEPMIND TECHNOLOGIES LIMITED [GB/GB]; 5 New Street Square London EC4A 3TW, GB
Inventors:
WANG, Ziyu; GB
HEESS, Nicolas, Manfred, Otto; GB
BAPST, Victor; GB
MNIH, Volodymyr; GB
MUNOS, Remi; GB
KAVUKCUOGLU, Koray; GB
DE FREITAS, Joao, Ferdinando, Gomes; GB
Agent:
MARKS & CLERK LLP; 1 New York Street Manchester Manchester M1 4HD, GB
Priority Data:
62/417,235 03.11.2016 US
Title (EN) TRAINING ACTION SELECTION NEURAL NETWORKS
(FR) FORMATION DE RÉSEAUX NEURONAUX DE SÉLECTION D'ACTION
Abstract:
(EN) Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor-critic reinforcement learning technique.
(FR) L'invention concerne des procédés, des systèmes et un appareil, y compris des programmes informatiques codés sur un support d'enregistrement informatique, permettant de former un réseau neuronal de sélection d'action. L'un des procédés consiste à : gérer une mémoire de relecture qui stocke des trajectoires générées suite à l'interaction d'un agent avec un environnement ; et former un réseau neuronal de sélection d'action ayant des paramètres de politique sur les trajectoires de la mémoire de relecture, la formation du réseau neuronal de sélection d'action consistant à : échantillonner une trajectoire à partir de la mémoire de relecture ; et adapter les valeurs actuelles des paramètres de politique en formant le réseau neuronal de sélection d'action sur la trajectoire à l'aide d'une technique d'apprentissage par renforcement acteur-critique hors politique.
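The two steps named in the abstract (sampling a trajectory from the replay memory, then adjusting the policy parameters with an off-policy actor-critic update) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the application: a small table of logits stands in for the action selection neural network, the critic is a table of action values, the actor uses truncated importance weights, and the names ReplayMemory, train_on_trajectory, lr, gamma and clip are hypothetical.

```python
import math
import random
from collections import deque


class ReplayMemory:
    """Hypothetical bounded store of whole trajectories.

    A trajectory is a list of tuples:
    (state, action, reward, next_state, done, behavior_prob),
    where behavior_prob is the probability the acting policy gave the action.
    """

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest trajectories drop out

    def add(self, trajectory):
        self.buffer.append(trajectory)

    def sample(self, rng=random):
        return rng.choice(list(self.buffer))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_on_trajectory(theta, q, trajectory, lr=0.1, gamma=0.99, clip=10.0):
    """One off-policy actor-critic pass over a sampled trajectory.

    theta[s][a] are policy logits (standing in for the policy parameters);
    q[s][a] are critic estimates. Both are updated in place.
    """
    for state, action, reward, next_state, done, behavior_prob in trajectory:
        pi = softmax(theta[state])
        # Truncated importance weight corrects for the data coming from
        # an older behavior policy rather than the current one.
        rho = min(clip, pi[action] / behavior_prob)
        value = sum(p * qa for p, qa in zip(pi, q[state]))
        advantage = q[state][action] - value
        # Actor: importance-weighted policy-gradient step on the logits.
        for a in range(len(pi)):
            grad_log = (1.0 if a == action else 0.0) - pi[a]
            theta[state][a] += lr * rho * advantage * grad_log
        # Critic: one-step target using the expected value under the
        # current policy, which needs no importance weight.
        if done:
            target = reward
        else:
            pi_next = softmax(theta[next_state])
            target = reward + gamma * sum(
                p * qa for p, qa in zip(pi_next, q[next_state])
            )
        q[state][action] += lr * (target - q[state][action])
```

A training loop under these assumptions would repeatedly call memory.sample() and pass the result to train_on_trajectory; a real system would replace the tables with neural networks and gradient-based optimisation.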
Designated States: AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW
African Regional Intellectual Property Organization (ARIPO) (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW)
Eurasian Patent Office (AM, AZ, BY, KG, KZ, RU, TJ, TM)
European Patent Office (EPO) (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR)
African Intellectual Property Organization (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG)
Publication Language: English (EN)
Filing Language: English (EN)