(WO2017096079) SELECTING ACTION SLATES USING REINFORCEMENT LEARNING
Latest bibliographic data on file with the International Bureau

Pub. No.: WO/2017/096079
International Application No.: PCT/US2016/064476
Publication Date: 08.06.2017
International Filing Date: 01.12.2016
IPC:
G06N 3/08 (2006.01), G06F 17/30 (2006.01), G06Q 30/06 (2012.01)
G06N 3/08:
  G: PHYSICS
  06: COMPUTING; CALCULATING; COUNTING
  N: COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  3/00: Computer systems based on biological models
  3/02: using neural network models
  3/08: Learning methods
G06F 17/30:
  G: PHYSICS
  06: COMPUTING; CALCULATING; COUNTING
  F: ELECTRIC DIGITAL DATA PROCESSING
  17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
  17/30: Information retrieval; Database structures therefor
G06Q 30/06:
  G: PHYSICS
  06: COMPUTING; CALCULATING; COUNTING
  Q: DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
  30/00: Commerce, e.g. shopping or e-commerce
  30/06: Buying, selling or leasing transactions
Applicants:
DEEPMIND TECHNOLOGIES LIMITED [GB/GB]; 5 New Street Square London EC4A 3TW, GB
Inventors:
SUNEHAG, Peter Goran; GB
Agents:
PORTNOV, Michael; US
TROESCH, Hans R.; US
Priority Data:
62/261,781  01.12.2015  US
Title (EN) SELECTING ACTION SLATES USING REINFORCEMENT LEARNING
Abstract:
(EN) Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting action slates using reinforcement learning. One of the methods includes receiving an observation characterizing a current state of an environment; selecting an action slate by processing the observation and a plurality of candidate action slates using a deep neural network, wherein each candidate action slate comprises a respective plurality of actions from the set of actions, and wherein the deep neural network is configured to, for each of the candidate action slates, process the observation and the actions in the candidate action slate to generate a slate Q value for the candidate action slate that is an estimate of a long-term reward resulting from the candidate action slate being provided to an action selector in response to the observation; and providing the selected action slate to the action selector in response to the observation.
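The selection step described in the abstract can be sketched as follows. This is a minimal illustration only, not the patented implementation: `slate_q` is a hypothetical stand-in for the deep neural network that maps an observation plus a candidate slate's actions to a scalar slate Q value, and `toy_slate_q` is an arbitrary scoring function used purely for demonstration.

```python
from typing import Callable, List, Sequence

def select_action_slate(
    observation: Sequence[float],
    candidate_slates: List[List[int]],
    slate_q: Callable[[Sequence[float], List[int]], float],
) -> List[int]:
    """Return the candidate slate with the highest slate Q value.

    Per the abstract, the network processes the observation together
    with the actions in each candidate slate and outputs one scalar:
    an estimate of the long-term reward from providing that slate to
    the action selector in response to the observation.
    """
    return max(candidate_slates, key=lambda slate: slate_q(observation, slate))

# Toy stand-in for the deep neural network (illustrative only):
# scores a slate from a fixed per-action weight plus an
# observation-dependent term.
def toy_slate_q(observation: Sequence[float], slate: List[int]) -> float:
    return sum(observation) * 0.1 + sum(a * 0.5 for a in slate)

best = select_action_slate([1.0, 2.0], [[0, 1], [2, 3], [1, 2]], toy_slate_q)
print(best)  # → [2, 3]
```

In practice the scoring function would be a trained Q network; the selection logic itself reduces to an argmax over the candidate slates' Q values.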
Designated States: AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW
African Regional Intellectual Property Organization (ARIPO) (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW)
Eurasian Patent Organization (AM, AZ, BY, KG, KZ, RU, TJ, TM)
European Patent Office (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR)
African Intellectual Property Organization (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG)
Publication Language: English (EN)
Filing Language: English (EN)
Also published as:
EP3384435, CN108604314