A machine learning system is described, comprising a first subsystem and a second subsystem remote from the first subsystem. The first subsystem comprises an environment having multiple possible states and a decision making subsystem comprising one or more agents. Each agent is arranged to receive state information indicative of a current state of the environment and to generate an action signal dependent on the received state information and on a policy associated with that agent, the action signal being operable to cause a change in the state of the environment. Each agent is further arranged to generate experience data dependent on the received state information and on information conveyed by the action signal. The first subsystem includes a first network interface configured to send the experience data to the second subsystem and to receive policy data from the second subsystem. The second subsystem comprises: a second network interface configured to receive the experience data from the first subsystem and to send the policy data to the first subsystem; and a policy learner configured to process the received experience data to generate the policy data, dependent on the experience data, for updating one or more policies associated with the one or more agents. The decision making subsystem is operable to update the one or more policies associated with the one or more agents in accordance with the policy data received from the second subsystem.
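
The data flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and nearly everything concrete in it is an assumption made for illustration: the five-state line-walk environment, the reward value included in each experience record, the tabular Q-learning update used by the policy learner, the epsilon-greedy action choice, and the use of JSON strings as a stand-in for the two network interfaces. All class and function names (Environment, Agent, PolicyLearner, run) are hypothetical.

```python
import json
import random

N_STATES = 5        # assumed toy environment: positions 0..4 on a line
ACTIONS = (0, 1)    # assumed action set: 0 = move left, 1 = move right


class Environment:
    """First subsystem: an environment with multiple possible states."""

    def __init__(self):
        self.state = 0

    def apply(self, action):
        """Apply an action signal, changing the state; return an assumed reward."""
        step = 1 if action == 1 else -1
        self.state = max(0, min(N_STATES - 1, self.state + step))
        return 1.0 if self.state == N_STATES - 1 else 0.0


class Agent:
    """First subsystem: acts on state information under its current policy."""

    def __init__(self, rng, epsilon=0.3):
        self.policy = {}        # state -> action; empty means act randomly
        self.rng = rng
        self.epsilon = epsilon  # assumed exploration rate
        self.experience = []

    def step(self, env):
        state = env.state       # state information received by the agent
        if state in self.policy and self.rng.random() > self.epsilon:
            action = self.policy[state]
        else:
            action = self.rng.choice(ACTIONS)
        reward = env.apply(action)
        # Experience data depends on the state and on the action taken.
        self.experience.append([state, action, reward, env.state])

    def flush_experience(self):
        """Serialize buffered experience, as if sending it over the network."""
        payload = json.dumps(self.experience)
        self.experience = []
        return payload

    def update_policy(self, payload):
        """Install policy data received from the second subsystem."""
        self.policy = {int(s): a for s, a in json.loads(payload).items()}


class PolicyLearner:
    """Second subsystem: turns experience data into policy data."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
        self.alpha, self.gamma = alpha, gamma

    def process(self, payload):
        # Assumed learning rule: one tabular Q-learning update per record.
        for s, a, r, s2 in json.loads(payload):
            best_next = max(self.q[(s2, b)] for b in ACTIONS)
            target = r + self.gamma * best_next
            self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        # Policy data: the greedy action per state (ties go to action 0).
        policy = {s: max(ACTIONS, key=lambda a: self.q[(s, a)])
                  for s in range(N_STATES)}
        return json.dumps(policy)


def run(rounds=200, steps=100, seed=0):
    """Alternate acting (first subsystem) and learning (second subsystem)."""
    agent, learner = Agent(random.Random(seed)), PolicyLearner()
    for _ in range(rounds):
        env = Environment()
        for _ in range(steps):
            agent.step(env)
        experience_msg = agent.flush_experience()     # first -> second
        policy_msg = learner.process(experience_msg)  # learner generates policy data
        agent.update_policy(policy_msg)               # second -> first
    return agent.policy


if __name__ == "__main__":
    print(run())
```

In this sketch the two subsystems share only serialized messages: experience flows one way and policy data flows back, so the learner could in principle run on a separate machine behind a real network transport. In the toy setup the learned policy should tend toward moving right in every state, since that is the only rewarded direction.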