
WO2020160787 - NEURAL NETWORK QUANTIZATION METHOD USING MULTIPLE REFINED QUANTIZED KERNELS FOR CONSTRAINED HARDWARE DEPLOYMENT

Note: Text based on automatic optical character recognition processes. Only the PDF version has legal value.


CLAIMS

1. A method of configuring a neural network, trained from a plurality of data samples, comprising:

quantizing each layer of the neural network to produce a quantized neural network with a plurality of respective scaling factors;

locating one or more layers of the quantized neural network;

computing a modified quantization for the one or more located layers to produce a modified quantized neural network; and

adjusting the plurality of scaling factors of the modified quantized neural network by computing a similarity between a plurality of neural network outputs and a plurality of modified quantized neural network outputs.
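The first quantization step of claim 1 can be sketched as follows. This is a minimal illustrative reading, not the patented method itself: it assumes symmetric per-layer quantization of signed weights, with one scaling factor per layer; the function name `quantize_layer` is hypothetical.

```python
import numpy as np

def quantize_layer(weights, n_bits=8):
    # One scaling factor per layer; symmetric signed quantization (assumed).
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(weights).max() / qmax          # respective scaling factor
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.array([[0.5, -1.0], [0.25, 0.75]])
q, s = quantize_layer(w)
w_approx = q * s   # dequantized approximation of the original weights
```

Applying this to every layer yields the "quantized neural network with a plurality of respective scaling factors" that the subsequent locating and refinement steps operate on.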

2. The method of claim 1, wherein the neural network is a convolutional neural network.

3. The method of claim 1, wherein the configuration is performed on a plurality of weights of the neural network, by:

quantizing each layer of the neural network by quantizing each kernel of the plurality of kernels of each layer of the neural network to produce a quantized neural network with a plurality of respective scaling factors.

4. The method of claim 3, wherein applying the quantization of the plurality of kernels is performed uniformly for groups of kernels of the plurality of kernels.
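The uniform group-wise kernel quantization of claim 4 might look like the following sketch, under the assumption that kernels are stacked along the first axis and each group of consecutive kernels shares one scaling factor; the group size and function name are illustrative.

```python
import numpy as np

def quantize_kernel_groups(kernels, group_size=2, n_bits=8):
    # One scaling factor shared uniformly by each group of kernels (assumed).
    qmax = 2 ** (n_bits - 1) - 1
    q = np.empty_like(kernels)
    scales = []
    for start in range(0, len(kernels), group_size):
        group = kernels[start:start + group_size]
        s = np.abs(group).max() / qmax
        q[start:start + group_size] = np.round(group / s)
        scales.append(s)
    return q, scales

kernels = np.array([[1.0, -0.5], [0.25, 0.75],   # group 0
                    [2.0, -1.0], [0.5, 1.5]])    # group 1
q, scales = quantize_kernel_groups(kernels)
```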

5. The method of claim 3, wherein locating one or more layers of the quantized neural network further comprises:

comparing a reconstruction error computed between the quantized neural network and the neural network to a predefined error threshold.
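The locating step of claim 5 can be sketched by comparing a per-layer reconstruction error against a predefined threshold. The mean-squared-error metric and the threshold value below are assumptions for illustration; the claim does not fix a particular error measure.

```python
import numpy as np

def quantize(w, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1
    s = np.abs(w).max() / qmax
    return np.round(w / s), s

def locate_layers(layers, quantized, threshold=1e-4):
    # Indices of layers whose reconstruction error exceeds the threshold
    # (per-layer MSE is an assumed metric).
    located = []
    for i, (w, (q, s)) in enumerate(zip(layers, quantized)):
        if np.mean((w - q * s) ** 2) > threshold:
            located.append(i)
    return located

layers = [np.linspace(-1.0, 1.0, 8), np.linspace(-1.0, 1.0, 8)]
quantized = [quantize(layers[0], n_bits=8),   # fine grid: small error
             quantize(layers[1], n_bits=2)]   # coarse grid: large error
located = locate_layers(layers, quantized)    # only the coarse layer is located
```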

6. The method of claim 3, wherein computing a modified quantization for the one or more located layers further comprises:

alternating between each located layer of the one or more located layers, until a predefined convergence criterion is met:

computing a modified quantization for a respective located layer by using one or more additional quantization(s) for the respective located layer, to produce an intermediately modified quantized neural network;

computing a modified scaling factor for the respective located layer, by minimizing a distance metric between the quantized neural network and the intermediately modified quantized neural network; and

assigning the modified quantization(s) and the modified scaling factor to the respective located layer of the intermediately modified quantized neural network.
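One iteration of claim 6's refinement, for a single located layer, might be sketched as residual quantization: an additional quantized kernel approximates the residual left by the first quantization, and its modified scaling factor is fit in closed form by least squares. The residual-quantization reading and the function name are assumptions.

```python
import numpy as np

def refine_layer(w, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1
    s1 = np.abs(w).max() / qmax
    q1 = np.round(w / s1)
    r = w - q1 * s1                    # residual left by the first quantization
    grid = np.abs(r).max() / qmax      # grid for the additional quantization
    q2 = np.round(r / grid)
    # closed-form least-squares fit of the modified scaling factor
    s2 = float(r.ravel() @ q2.ravel()) / float(q2.ravel() @ q2.ravel())
    return (q1, s1), (q2, s2)

w = np.linspace(-1.0, 1.0, 16)
(q1, s1), (q2, s2) = refine_layer(w)
```

The additional quantized kernel strictly reduces the reconstruction error whenever the residual is nonzero, which is what makes alternating over the located layers converge toward the predefined criterion.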

7. The method of claim 3, wherein adjusting the plurality of scaling factors of the modified quantized neural network further comprises:

for each layer of the modified quantized neural network:

computing a scaling factor by minimizing a distance metric between outputs of the neural network and the modified quantized neural network, using a plurality of calibration data sets; and

assigning the scaling factor to the respective layer.
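For a single linear layer, the per-layer adjustment of claim 7 can be sketched as a closed-form least-squares fit of the scaling factor against the float network's outputs on calibration data. The random calibration batch and the function name are stand-ins for illustration.

```python
import numpy as np

def adjust_scale(x_calib, w_float, q_int):
    # Closed-form scaling factor minimizing the quadratic output distance.
    y_ref = x_calib @ w_float     # outputs of the original (float) layer
    y_q = x_calib @ q_int         # outputs of the integer kernels, unscaled
    return float(np.sum(y_q * y_ref) / np.sum(y_q * y_q))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))          # stand-in calibration batch
w = rng.normal(size=(4, 3))
s0 = np.abs(w).max() / 127            # naive scaling factor
q = np.round(w / s0)
s_adj = adjust_scale(x, w, q)         # adjusted scaling factor for this layer
```

Because the fit is quadratic in the single scalar, the adjusted scaling factor can never produce a larger output error on the calibration batch than the naive one.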

8. The method of claim 7, wherein data labels of the plurality of calibration data sets are used in adjusting the plurality of scaling factors.

9. The method of claim 1, wherein the configuration is performed on a plurality of activations of the neural network by:

computing a scaling factor for each layer of the neural network by minimizing a reconstruction error estimated between activations of the neural network on a respective layer and approximations of activations of the respective layer, wherein the activations are calculated on a plurality of calibration datasets;

assigning each layer of the neural network a respective computed scaling factor to produce a modified neural network;

locating one or more layers of the modified neural network according to a predefined weight error threshold computed on each layer of the modified neural network; and

assigning a second scaling factor for each located layer by minimizing a reconstruction error estimated between activations of the modified neural network on a respective located layer and approximations of activations of the respective located layer, wherein the activations are calculated on a plurality of calibration datasets.
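The activation-side scaling factor of claim 9 can be sketched as a search over clipping ranges that minimizes the reconstruction error on calibration activations. The grid search over clipping fractions, the unsigned (post-ReLU) grid, and the synthetic calibration data are all assumptions for illustration.

```python
import numpy as np

def activation_scale(acts, n_bits=8):
    # Unsigned grid for non-negative (post-ReLU) activations (assumed).
    qmax = 2 ** n_bits - 1
    best_s, best_err = None, np.inf
    for frac in np.linspace(0.5, 1.0, 21):   # candidate clipping ranges
        s = frac * acts.max() / qmax
        approx = np.clip(np.round(acts / s), 0, qmax) * s
        err = np.mean((acts - approx) ** 2)
        if err < best_err:
            best_s, best_err = s, err
    return best_s

rng = np.random.default_rng(1)
acts = np.maximum(rng.normal(size=1000), 0.0)   # stand-in calibration activations
s = activation_scale(acts)
```

Running the same procedure a second time on the located layers, with the first scaling factors already assigned, corresponds to the "second scaling factor" step of the claim.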

10. The method of claim 6, wherein computing a modified quantization for a located layer is performed by using one additional quantization for the respective located layer, and the first and second scaling factors are computed by minimizing a reconstruction error consisting of a quadratic term.

11. The method of any one of claims 3, 9, and 10, further comprising:

configuring the neural network by configuring the plurality of weights of the neural network, and/or configuring the plurality of activations of the neural network; and

processing data inputs by using the configured neural network.

12. A system for configuring a neural network, trained from a plurality of data samples, comprising:

processing circuitry, configured to:

quantize each layer of the neural network to produce a quantized neural network with a plurality of respective scaling factors;

locate one or more layers of the quantized neural network;

compute a modified quantization for the one or more located layers to produce a modified quantized neural network; and

adjust the plurality of scaling factors of the modified quantized neural network by computing a similarity between a plurality of neural network outputs and a plurality of modified quantized neural network outputs.

13. A non-transitory computer-readable storage medium comprising program code which, when executed by a computer, causes the computer to execute the method of claim 1.