The present invention discloses methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to generate, at each of multiple time steps, a final goal vector for that time step. The worker subsystem is configured to use, at each of the multiple time steps, the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
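The division of labor described above can be illustrated with a minimal sketch. It assumes single linear layers with fixed random weights standing in for trained networks. The names `Manager`, `Worker`, `select_action`, the dimensions, and the way the worker combines the goal with per-action embeddings (a dot product between each action embedding and a projected goal) are hypothetical illustrations, not the claimed method. The sketch also treats the manager's output directly as the final goal vector, because the abstract does not say how the final goal vector is derived.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical fixed weights standing in for trained network parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def normalize(v: Vector) -> Vector:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class Manager:
    """Hypothetical manager: maps an observation to a goal vector per time step."""

    def __init__(self, obs_dim: int, goal_dim: int, rng: random.Random) -> None:
        self.w = random_matrix(goal_dim, obs_dim, rng)

    def goal(self, observation: Vector) -> Vector:
        # Assumption: one linear layer followed by unit-length normalization.
        return normalize(matvec(self.w, observation))


class Worker:
    """Hypothetical worker: scores every action in a fixed action set, given a goal."""

    def __init__(self, obs_dim: int, goal_dim: int, num_actions: int,
                 embed_dim: int, rng: random.Random) -> None:
        # Produces one embed_dim-sized embedding per action from the observation.
        self.w_embed = random_matrix(num_actions * embed_dim, obs_dim, rng)
        # Projects the goal vector into the action-embedding space.
        self.w_goal = random_matrix(embed_dim, goal_dim, rng)
        self.num_actions = num_actions
        self.embed_dim = embed_dim

    def action_scores(self, observation: Vector, goal: Vector) -> Vector:
        flat = matvec(self.w_embed, observation)
        embeddings = [flat[a * self.embed_dim:(a + 1) * self.embed_dim]
                      for a in range(self.num_actions)]
        projected_goal = matvec(self.w_goal, goal)
        # One score per action: dot product of its embedding with the projected goal.
        return [sum(e * g for e, g in zip(emb, projected_goal)) for emb in embeddings]


def select_action(scores: Vector) -> int:
    """Greedy selection: index of the highest-scoring action."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    manager = Manager(obs_dim=4, goal_dim=3, rng=rng)
    worker = Worker(obs_dim=4, goal_dim=3, num_actions=5, embed_dim=2, rng=rng)
    for t, obs in enumerate([[1.0, 0.0, 0.5, -0.2], [0.3, 0.9, -0.1, 0.0]]):
        goal = manager.goal(obs)                   # manager: goal for time step t
        scores = worker.action_scores(obs, goal)   # worker: score per action
        print(t, select_action(scores))
```

Greedy argmax selection is used here only for simplicity; an agent could equally sample from a softmax over the scores.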