
WO2020157731 - PERFORMING MULTI-OBJECTIVE TASKS BY MEANS OF PRIMAL NETWORKS TRAINED WITH DUAL NETWORKS

Note: Text based on automated optical character recognition processes. Only the PDF version has legal value.


CLAIMS

1. A system, comprising a processor to:

receive data for a multi-objective task; and

perform the multi-objective task on the received data via a trained primal network, wherein the primal network and a dual network are trained for a multi-objective task using a Lagrangian loss function representing a plurality of objectives, wherein the primal network is trained to minimize the Lagrangian loss function and the dual network is trained to maximize the Lagrangian loss function.
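The min-max scheme of claim 1, where the primal parameters descend a shared Lagrangian while the dual parameters ascend it with a different step size (claim 5), can be sketched on a toy scalar problem. The objective, constraint, step sizes, and iteration count below are illustrative assumptions, not taken from the patent:

```python
# Toy saddle-point training sketch (illustrative; not the patent's actual
# networks). Minimize f(theta) = theta**2 subject to theta >= 1 via the
# Lagrangian L(theta, lam) = theta**2 + lam * (1 - theta).
# The "primal" parameter descends L; the "dual" parameter ascends it,
# using different step sizes as the claims describe.

def train(steps=5000, lr_primal=0.05, lr_dual=0.02):
    theta, lam = 0.0, 0.0  # primal pretraining / dual initialization elided
    for _ in range(steps):
        grad_theta = 2.0 * theta - lam   # dL/dtheta
        grad_lam = 1.0 - theta           # dL/dlam
        theta -= lr_primal * grad_theta  # primal step: minimize L
        lam += lr_dual * grad_lam        # dual step: maximize L
        lam = max(lam, 0.0)              # multipliers stay non-negative
    return theta, lam

theta, lam = train()
print(round(theta, 3), round(lam, 3))  # converges near the saddle point (1, 2)
```

For this problem the saddle point is theta = 1, lam = 2, and the alternating updates of claim 21 reduce to interleaving the two gradient steps above.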

2. The system of claim 1, wherein the multi-objective task comprises a Markov Decision process comprising a finite state space and a finite action space.

3. The system of claim 1, wherein the primal network is pretrained using a general policy learned from another setting or a random initialization.

4. The system of claim 1, wherein the dual network is randomly initialized during training.

5. The system of claim 1, wherein the primal network comprises a step size that is different than a step size of the dual network during training.

6. The system of claim 1, wherein the processor is operable to estimate gradients based on a likelihood ratio estimate.
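The likelihood-ratio gradient estimate of claim 6 (the score-function, or REINFORCE-style, estimator) can be illustrated on a toy two-action policy. The policy, rewards, and sample count here are invented for the example:

```python
import random

# Likelihood-ratio (score-function) gradient estimate for a toy policy:
# two actions with p(a=0) = theta, rewards r = (1.0, 0.0), so the expected
# reward is E[r] = theta and the true gradient dE[r]/dtheta is exactly 1.
def likelihood_ratio_gradient(theta, n_samples=200_000, seed=0):
    rng = random.Random(seed)
    rewards = (1.0, 0.0)
    total = 0.0
    for _ in range(n_samples):
        a = 0 if rng.random() < theta else 1
        # score function: d log p(a) / d theta
        score = 1.0 / theta if a == 0 else -1.0 / (1.0 - theta)
        total += rewards[a] * score
    return total / n_samples

est = likelihood_ratio_gradient(0.3)
print(round(est, 3))  # close to the true gradient, 1.0
```

The estimator is unbiased but noisy, which is one reason such methods average over many sampled trajectories.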

7. The system of claim 1, wherein the multi-objective task comprises a selection, a classification, a regression, a recommendation, a generation, or a prediction task.

8. The system of claim 1, wherein the data that the processor is operable to receive is a prefix of conversation and a text input and wherein the processor is operable to generate a completed response based on the prefix of conversation and the text input via the trained primal network.

9. The system of claim 8, wherein the processor is operable to:

generate a plurality of completed responses;

present the plurality of completed responses comprising the completed response to a user for selection;

receive a selected response from the completed responses; and

send the selected response to a second user.

10. The system of claim 8, wherein the prefix of conversation comprises a dialogue between a first user and a second user, and the text input comprises a portion of the completed response.

11. The system of claim 8, wherein the primal network and the dual network comprise long short-term memory (LSTM) models with different parameters and possibly additional network elements.

12. The system of claim 8, wherein the plurality of objectives comprises a perplexity objective or a relevance objective.

13. The system of claim 8, wherein the plurality of objectives comprises a redundancy unlikelihood objective or a semantic dissimilarity objective.

14. The system of claim 13, wherein the plurality of objectives comprises a semantic coherence objective.

15. A computer-implemented method, comprising:

training a primal network and a dual network for a multi-objective task using a Lagrangian loss function representing a plurality of objectives, wherein training the primal network and the dual network comprises training the primal network to minimize the Lagrangian loss function and training the dual network to maximize the Lagrangian loss function;

receiving data for the multi-objective task; and

performing the multi-objective task on the received data via the trained primal network.

16. The computer-implemented method of claim 15, comprising performing the multi-objective task as a Markov Decision process comprising a finite state space and a finite action space.

17. The computer-implemented method of claim 15, comprising pretraining the primal network using a general policy learned from another setting or randomly initializing the primal network during training.

18. The computer-implemented method of claim 15, comprising randomly initializing the dual network during training.

19. The computer-implemented method of claim 15, wherein training the primal network and the dual network comprises estimating gradients for the primal network and the dual network based on a likelihood ratio.

20. The computer-implemented method of claim 15, comprising updating policy gradients of the primal network and the dual network based on different step sizes for the primal network and the dual network.

21. The computer-implemented method of claim 15, wherein training the primal network and the dual network comprises alternately training the primal network and the dual network.

22. The computer-implemented method of claim 15, wherein the data received is a prefix of conversation and a text input, and wherein the method comprises:

generating a completed response based on the prefix of conversation and the text input via the trained primal network.

23. The computer-implemented method of claim 22, comprising:

generating a plurality of completed responses;

presenting the plurality of completed responses comprising the completed response to a user for selection;

receiving a selected response from the completed responses; and

sending the selected response to a second user.

24. The computer-implemented method of claim 22, comprising:

sending the completed response as a response to an inquiry in response to detecting that a confidence score of the completed response exceeds a threshold score.

25. The computer-implemented method of claim 22, wherein generating the completed response comprises iteratively building the completed response beginning with the text input word by word.

26. The computer-implemented method of claim 22, wherein generating the completed response comprises beam searching to generate a plurality of completed responses.
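The beam search of claim 26, which keeps several partial hypotheses while building responses word by word (claim 25), can be sketched as follows. The transition table is a made-up stand-in for the trained primal network's next-token distribution:

```python
import math

# Minimal beam-search sketch. The probability table below is invented for
# illustration; a real system would query the trained primal network.
NEXT_PROBS = {
    "<s>":   {"hello": 0.6, "hi": 0.4},
    "hello": {"there": 0.7, "</s>": 0.3},
    "hi":    {"there": 0.9, "</s>": 0.1},
    "there": {"</s>": 1.0},
}

def beam_search(beam_width=2, max_len=4):
    # Each hypothesis is (log-probability, token sequence).
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "</s>":  # finished hypotheses carry over unchanged
                candidates.append((logp, seq))
                continue
            for tok, p in NEXT_PROBS[seq[-1]].items():
                candidates.append((logp + math.log(p), seq + [tok]))
        # keep only the `beam_width` most probable hypotheses
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return [(" ".join(seq[1:]), round(math.exp(logp), 3)) for logp, seq in beams]

print(beam_search())  # most probable completed responses first
```

With beam_width=1 this reduces to the greedy word-by-word construction of claim 25; a wider beam trades computation for a more diverse plurality of completed responses.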

27. The computer-implemented method of claim 22, comprising training the primal network using a first limit of turns of conversation and gradually increasing the first limit to a second limit of turns of conversation.

28. The computer-implemented method of claim 22, comprising training the primal network using sequences with lower likelihood of generating redundant responses among all sequences in a training dataset.

29. A computer program product for training neural networks to perform multi-objective tasks, the computer program product comprising a computer-readable storage medium having program code embodied therewith, wherein the computer-readable storage medium is not a transitory signal per se, the program code executable by a processor to cause the processor to:

train a primal network and a dual network for a multi-objective task using a Lagrangian loss function representing a plurality of objectives;

train the primal network to minimize the Lagrangian loss function and train the dual network to maximize the Lagrangian loss function;

receive data for the multi-objective task; and

perform the multi-objective task on the received data via the trained primal network.

30. The computer program product of claim 29, further comprising program code executable by the processor to train the primal network and the dual network using a pre-existing dataset, a simulator, a feedback from an environment, or any combination thereof.

31. The computer program product of claim 29, further comprising program code executable by the processor to pretrain the primal network using a general policy learned from another setting or by randomly initializing the primal network during training.

32. The computer program product of claim 29, further comprising program code executable by the processor to estimate gradients for the primal network and the dual network based on a likelihood ratio.

33. The computer program product of claim 29, further comprising program code executable by the processor to update policy gradients of the primal network and the dual network based on different step sizes for the primal network and the dual network.

34. The computer program product of claim 29, further comprising program code executable by the processor to randomly initialize the dual network during training.

35. The computer program product of claim 29, wherein the data received is a prefix of conversation and a text input, and wherein the program code is executable by the processor to cause the processor to:

generate a completed response based on the prefix of conversation and the text input via the trained primal network.

36. The computer program product of claim 35, further comprising program code executable by the processor to:

generate a plurality of completed responses;

present the plurality of completed responses comprising the completed response to a user for selection;

receive a selected response from the completed responses; and

send the selected response to a second user.

37. The computer program product of claim 35, further comprising program code executable by the processor to:

send the completed response as a response to an inquiry in response to detecting that a confidence score of the completed response exceeds a threshold score.

38. The computer program product of claim 35, further comprising program code executable by the processor to iteratively build a sentence beginning with the text input word by word.

39. The computer program product of claim 35, further comprising program code executable by the processor to generate a plurality of completed responses comprising the completed response using beam searching.

40. The computer program product of claim 35, further comprising program code executable by the processor to train the primal network using a first limit of turns of conversation and gradually increase the first limit to a second limit of turns of conversation.

41. A computer program comprising program code means adapted to perform the method of any of claims 1 to 28 when said program is run on a computer.