1. WO2021144126 - CONTROL OF DATA TRANSFER BETWEEN PROCESSORS

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

Claims

1. A data processing system comprising a plurality of processors, wherein each of the processors comprises at least one circuit configured to perform data transfer operations during each of at least some of a plurality of exchange stages to transfer data determined in dependence upon data received at the respective processor in a preceding one of the exchange stages from at least one other of the processors, each of the data transfer operations being for transfer of data to another one of the plurality of processors, wherein each at least one circuit is configured to:

perform data transfer operations to transfer outgoing data to one or more others of the processors during a first of the exchange stages;

receive incoming data from the one or more others of the processors during the first of the exchange stages;

determine further outgoing data in dependence upon at least part of the incoming data;

count an amount of at least part of the incoming data received during the first of the exchange stages from the one or more others of the processors; and

in response to determining that the amount of the at least part of the incoming data received has reached a predefined amount, perform data transfer operations to transfer the further outgoing data to the one or more others of the processors during a second of the exchange stages.
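For illustration only, the counting-and-gating behaviour recited in claim 1 can be sketched in software. This is a minimal sketch, not the claimed circuit: the names `ExchangeGate` and `run_first_stage` are hypothetical, the amount is assumed to be measured in bytes, and the "further outgoing data" is stood in for by a simple concatenation of the received payloads.

```python
from dataclasses import dataclass, field


@dataclass
class ExchangeGate:
    """Hypothetical helper: counts incoming data for one exchange stage."""
    predefined_amount: int                      # assumed unit: bytes
    received: int = 0                           # running count of bytes received
    chunks: list = field(default_factory=list)  # payloads received so far

    def on_receive(self, payload: bytes) -> bool:
        """Record a payload; return True once the predefined amount is reached."""
        self.chunks.append(payload)
        self.received += len(payload)
        return self.received >= self.predefined_amount


def run_first_stage(gate: ExchangeGate, arrivals):
    """Feed arrivals to the gate.

    Returns the further outgoing data (here a placeholder concatenation)
    as soon as the count reaches the predefined amount, else None.
    """
    for payload in arrivals:
        if gate.on_receive(payload):
            # Second exchange stage may begin: data derived from incoming data.
            return b"".join(gate.chunks)
    return None
```

In this sketch the second exchange stage is released purely by the local count, without any separate synchronisation message between processors.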

2. A data processing system as claimed in claim 1, wherein each of the at least one circuits is configured to:

prior to the determining that the amount of the at least part of the incoming data received has reached the predefined amount, perform only some of the data transfer operations to transfer only part of the outgoing data to one or more others of the processors; and

in response to the determining that the amount of incoming data received has reached the predefined amount:

perform remaining data transfer operations to transfer a remaining part of the outgoing data to the one or more others of the processors during the first of the exchange stages; and

subsequently, perform the data transfer operations to transfer the further outgoing data to the one or more others of the processors during the second of the exchange stages.

3. A data processing system as claimed in claim 2, wherein each of the at least one circuits is configured to:

count an amount of a further part of the incoming data received during the first of the exchange stages from the one or more others of the processors; and

following starting to perform the remaining data transfer operations, determine that the amount of the further part of the incoming data received has reached a predefined amount,

wherein the subsequent performing of the data transfer operations to transfer the further outgoing data to the one or more others of the processors during the second of the exchange stages is performed in response to determining that the amount of the further part of the incoming data received has reached a predefined amount.

4. A data processing system as claimed in claim 3, wherein the at least part of the incoming data is addressed to a first location in the processor, wherein the further part of the incoming data is addressed to a second location in the processor.
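The ordering described in claims 2 to 4 can be read as a small state machine driven by two counters, one per destination location. The following is a hedged sketch under assumptions: the class name `TwoCounterSchedule`, the location labels `"A"` and `"B"`, and the logged action names are all hypothetical and chosen only to make the ordering visible.

```python
class TwoCounterSchedule:
    """Hypothetical model of the two-threshold ordering in claims 2 to 4."""

    def __init__(self, first_amount: int, second_amount: int):
        self.first_amount = first_amount    # threshold for data at location "A"
        self.second_amount = second_amount  # threshold for data at location "B"
        self.count_a = 0
        self.count_b = 0
        self.remaining_started = False
        self.stage2_started = False
        # Part of the outgoing data is sent before any threshold is reached.
        self.log = ["send_part_stage1"]

    def receive(self, location: str, nbytes: int) -> None:
        """Count incoming data addressed to the first or second location."""
        if location == "A":
            self.count_a += nbytes
        elif location == "B":
            self.count_b += nbytes
        else:
            raise ValueError(f"unknown location: {location!r}")
        self._advance()

    def _advance(self) -> None:
        # First threshold releases the remaining stage-1 transfers.
        if not self.remaining_started and self.count_a >= self.first_amount:
            self.remaining_started = True
            self.log.append("send_remaining_stage1")
        # Second threshold is only acted on after the remaining transfers began.
        if (self.remaining_started and not self.stage2_started
                and self.count_b >= self.second_amount):
            self.stage2_started = True
            self.log.append("send_stage2")
```

Note that in this sketch data for the second location may arrive early, but the second exchange stage still does not start until the remaining first-stage transfers have been started.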

5. A data processing system as claimed in any preceding claim, wherein, for each of the processors, the one or more others of the processors comprises two or more processors.

6. A data processing system as claimed in claim 5, wherein, for each of the processors, the two or more processors comprises only two processors.

7. A data processing system as claimed in any preceding claim, wherein each of the processors comprises a plurality of processing units, each of at least some of the plurality of processing units being configured to:

receive part of the incoming data from the one or more others of the processors; and

send part of the outgoing data to the one or more others of the processors;

wherein the steps of counting the amount of incoming data received and determining that the amount of the incoming data received has reached the predefined amount are performed by one or more of the plurality of processing units of a first type.

8. A data processing system as claimed in claim 7, when dependent upon claim 3, wherein each processor comprises two of the plurality of processing units of the first type, wherein for each processor:

a first of the plurality of processing units of the first type is configured to perform the steps of counting the amount of incoming data received and determining that the amount of the incoming data received has reached the predefined amount,

a second of the plurality of processing units of the first type is configured to perform the steps of counting the amount of the further part of the incoming data received and determining that the amount of the further part of the incoming data received has reached a predefined amount.

9. A data processing system as claimed in claim 7 or claim 8, wherein each of some of the at least some of the plurality of processing units is configured to, subsequent to performing its respective operations to send part of the outgoing data, cause control to pass to another one of the at least some of the plurality of processing units for that another one to perform its respective operations to send part of the outgoing data.

10. A data processing system as claimed in claim 9, wherein each of the one or more of the plurality of processing units of the first type is configured to perform the causing of control to pass in response to determining that an amount of a part of the incoming data received has reached a predetermined amount.

11. A data processing system as claimed in any preceding claim, wherein each of the incoming data, outgoing data, and further outgoing data comprise a set of gradients for weights of a machine learning model.

12. A data processing system as claimed in any preceding claim, wherein each of the at least one circuit comprises:

counting circuitry configured to perform the counting an amount of the incoming data received during the first of the exchange stages; and

an execution unit configured to execute computer readable instructions to:

poll the counting circuitry to determine the amount of the incoming data received; and

determine that the amount of the incoming data received has reached the predefined amount.
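The split in claim 12 between counting circuitry and an execution unit that polls it might be pictured as follows. This is a software sketch only: `SimulatedCounter` is a hypothetical stand-in for the counting circuitry (it pretends that a fixed amount of data arrives between successive reads), and `poll_until` with its `max_polls` bound is an assumed helper, not something recited in the claims.

```python
class SimulatedCounter:
    """Hypothetical stand-in for counting circuitry.

    Each read models `step` further units of data having arrived
    since the previous read.
    """

    def __init__(self, step: int):
        self.step = step
        self.value = 0

    def read(self) -> int:
        self.value += self.step
        return self.value


def poll_until(read_count, predefined_amount: int, max_polls: int = 1000):
    """Poll `read_count()` until it reaches the predefined amount.

    Returns the number of polls performed, or None if the amount was
    not reached within `max_polls` polls.
    """
    for polls in range(1, max_polls + 1):
        if read_count() >= predefined_amount:
            return polls
    return None
```

The bound on the number of polls is a design choice of the sketch so that the loop always terminates; a real execution unit could equally poll indefinitely or yield between polls.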

13. A data processing system as claimed in any preceding claim, wherein the at least one circuit comprises a remote direct memory access engine configured to perform the data transfer operations during each of a plurality of exchange stages.

14. A data processing system as claimed in any preceding claim, wherein the plurality of processors are arranged in a ring topology such that the at least one circuit of each processor is configured to perform the data transfer operations during each of the plurality of exchange stages to transfer data to its two neighbouring processors in the ring, wherein the counting the amount of the incoming data received during the first of the exchange stages from the one or more others of the processors comprises counting an amount of data received from the two neighbouring processors during the first of the exchange stages.

15. A data processing system as claimed in any preceding claim, wherein the determining further outgoing data in dependence upon at least part of the incoming data comprises reducing the at least part of the incoming data with data stored in memory of the respective processor.

16. A data processing system as claimed in claim 15, wherein the at least one circuits of the plurality of processors are configured to implement a reduce-scatter collective comprising the steps of each of the at least one circuits:

transferring data determined in dependence upon data received at the respective processor in a preceding stage from at least one other of the processors; and

determining further outgoing data in dependence upon at least part of the incoming data.
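As an illustration of the reduce-scatter collective named in claim 16, the sketch below simulates a ring of processors in a single Python process. Several simplifications are assumptions of the sketch and not taken from the claims: data flows in one direction around the ring only (claim 14 refers to both neighbours), each chunk is a single number, the reduction is addition, and the function name `ring_reduce_scatter` is hypothetical.

```python
def ring_reduce_scatter(data):
    """Simulate a unidirectional ring reduce-scatter with addition.

    `data[i]` is the list of n chunks initially held by processor i,
    where n is the number of processors. Returns a dict mapping each
    processor index i to the fully reduced chunk it ends up holding,
    which is chunk (i + 1) % n.
    """
    n = len(data)
    buffers = [list(row) for row in data]  # copy so the input is not mutated
    for stage in range(n - 1):
        # Every send in a stage uses values produced by the preceding stage.
        outgoing = [buffers[i][(i - stage) % n] for i in range(n)]
        for i in range(n):
            src = (i - 1) % n              # upstream neighbour in the ring
            chunk = (src - stage) % n      # index of the chunk src just sent
            buffers[i][chunk] += outgoing[src]  # reduce with local data
    return {i: buffers[i][(i + 1) % n] for i in range(n)}
```

Each stage's outgoing value depends on what was received and reduced in the preceding stage, which is why a processor needs to know that its incoming data has fully arrived before starting the next stage.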

17. A data processing system as claimed in any preceding claim, wherein the at least one circuit comprises at least one of a field programmable gate array or application specific integrated circuit configured to perform the counting of an amount of the incoming data received during the first of the exchange stages from the one or more others of the processors.

18. A method implemented in a data processing system comprising a plurality of processors, the method comprising at each of the processors:

performing data transfer operations during each of at least some of a plurality of exchange stages to transfer data determined in dependence upon data received at the respective processor in a preceding one of the exchange stages from at least one other of the processors, each of the data transfer operations being for transfer of data to another one of the plurality of processors;

performing data transfer operations to transfer outgoing data to one or more others of the processors during a first of the exchange stages;

receiving incoming data from the one or more others of the processors during the first of the exchange stages;

determining further outgoing data in dependence upon at least part of the incoming data;

counting an amount of at least part of the incoming data received during the first of the exchange stages from the one or more others of the processors; and

in response to determining that the amount of the at least part of the incoming data received has reached a predefined amount, performing data transfer operations to transfer the further outgoing data to the one or more others of the processors during a second of the exchange stages.