
1. WO2021044365 - METHOD AND SYSTEM FOR GENERATING SYNTHETICALLY ACCESSIBLE MOLECULES WITH CHEMICAL REACTION TRAJECTORIES USING REINFORCEMENT LEARNING

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters


CLAIMS

1. A method for generating a synthetically accessible molecule using a Markov decision process (MDP) in reinforcement learning, the method being executed by a processor, the processor being operatively connected to a database, the database comprising a plurality of transformations to apply on molecules, the method comprising:

receiving an indication of a given molecule;

generating, based on the indication of the given molecule, a current state;

selecting, by a control policy, from the database, an action to apply on the current state to obtain a product state, the action comprising a transformation of the plurality of transformations and the product state corresponding to an other molecule;

generating, based on the current state and the action, the product state;

determining if the product state corresponds to a terminal state;

in response to the product state corresponding to the terminal state:

updating, based on a reward value, at least one parameter of the control policy to obtain at least one updated parameter; and

outputting the product state corresponding to the other molecule as the synthetically accessible molecule.
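The generation loop of claim 1 can be sketched as follows. The string-based molecule encoding, the random control policy, and the toy transformation database are illustrative assumptions, not the claimed implementation, which would operate on molecular graphs and a learned policy.

```python
import random

# Illustrative transformation database: each action maps a molecule
# (represented here as a plain string) to a product molecule.
TRANSFORMATIONS = {
    "add_methyl": lambda m: m + "C",       # addition of a fragment
    "add_hydroxyl": lambda m: m + "O",     # addition of a fragment
    "delete_last": lambda m: m[:-1] if len(m) > 1 else m,  # deletion
}

def generate(seed_molecule, max_steps=5):
    """Run the MDP: apply policy-selected actions until a terminal state.

    A fixed step budget stands in for the terminal-state test; the
    random choice stands in for the control policy.
    """
    state = seed_molecule                  # current state from the seed molecule
    for _ in range(max_steps):
        action = random.choice(list(TRANSFORMATIONS))   # control policy
        state = TRANSFORMATIONS[action](state)          # product state
    return state                           # output the final molecule

product = generate("CC")
```

A trained policy would replace `random.choice`, and the reward-driven parameter update of the final claim step would run once the terminal state is reached.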

2. The method of claim 1, further comprising:

in response to the product state not corresponding to the terminal state:

selecting, by the control policy, a further action to apply on the product state to obtain a further product state, the further action corresponding to a further transformation and the further product state corresponding to a further molecule.

3. The method of claim 1 or 2, wherein the method further comprises, prior to said determining if the product state corresponds to the terminal state:

calculating a reward value of the product state using a reward function; and wherein

said determining if the product state corresponds to the terminal state is based on at least one of: the reward value, a number of steps, a set of properties of the product state and a selected action.
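The terminal-state test of claim 3 can be combined in code as below; the threshold values, the property name, and the explicit stop action are illustrative assumptions, since the claim only requires that at least one such criterion be consulted.

```python
def is_terminal(reward_value, step, properties, action,
                reward_threshold=0.9, max_steps=10):
    """Terminal-state test over the criteria listed in claim 3.

    All thresholds and names here are illustrative assumptions.
    """
    if action == "stop":                    # a selected stop action
        return True
    if step >= max_steps:                   # the step budget is exhausted
        return True
    if reward_value >= reward_threshold:    # the reward is already high enough
        return True
    # A property-based criterion, e.g. a solubility target being met:
    return properties.get("solubility", 0.0) >= 1.0

done = is_terminal(reward_value=0.2, step=3, properties={}, action="add_fragment")
```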

4. The method of any one of claims 1 to 3, wherein said selecting, from the database, the action to apply on the current state to obtain the product state comprises:

selecting, from the database, at least one reactant to perform the transformation.

5. The method of any one of claims 1 to 4, wherein said selecting, from the database, the action comprising the transformation to apply on the current state to obtain the product state is based on the indication of the given molecule.

6. The method of any one of claims 1 to 5, wherein the transformation comprises one of: an addition of a molecular fragment, a deletion of a molecular fragment, and a substitution of a molecular fragment.
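The three transformation kinds of claim 6 can be illustrated at the string level; this is a toy stand-in, since a real system would edit molecular graphs rather than characters.

```python
# Toy fragment edits on a string-encoded molecule, one per
# transformation kind listed in claim 6.
def add_fragment(molecule, fragment):
    return molecule + fragment                # addition of a molecular fragment

def delete_fragment(molecule, fragment):
    return molecule.replace(fragment, "", 1)  # deletion (first occurrence)

def substitute_fragment(molecule, old, new):
    return molecule.replace(old, new, 1)      # substitution (first occurrence)

product = substitute_fragment("CCO", "O", "N")
```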

7. The method of any one of claims 1 to 6, further comprising, prior to said calculating the reward value using the reward function:

receiving, from the database, based on the product state, a set of properties; and wherein

said calculating the reward value using the reward function is based on the set of properties.

8. The method of claim 7, wherein the set of properties comprises one or more of: an absorption, distribution, metabolism, and excretion (ADME), an ADME-toxicity, a liberation, a bioavailability, a ligand efficiency, a lipophilic efficiency, a potency at a biological target, and a solubility.

9. The method of claim 8, wherein the calculating the reward value using the reward function based on the set of properties comprises scalarizing a reward vector comprising the set of properties to obtain a final reward value.
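The scalarization of claim 9 can be sketched as a weighted sum; the weights and the example property values are illustrative assumptions, and other scalarization schemes (e.g. Chebyshev) would also fit the claim.

```python
def scalarize(reward_vector, weights):
    """Weighted-sum scalarization of a multi-objective reward vector.

    Each entry of the reward vector scores one property from the set
    of claim 8; the weights encode their relative importance.
    """
    assert len(reward_vector) == len(weights)
    return sum(r * w for r, w in zip(reward_vector, weights))

# Illustrative property scores, e.g. (solubility, potency, ligand efficiency):
final_reward = scalarize([0.8, 0.5, 0.9], [0.5, 0.3, 0.2])
```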

10. The method of claim 9, wherein the reward function comprises a deterministic reward function.

11. The method of any one of claims 1 to 10, further comprising, prior to said generating, based on the indication of the given molecule, the current state:

generating, using the indication of the given molecule, a feature vector thereof; and wherein

said generating the current state is based on the feature vector of the given molecule.

12. The method of claim 11, wherein the feature vector is generated using a Morgan fingerprint.
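In practice a cheminformatics toolkit such as RDKit computes Morgan fingerprints from the molecular graph. The character-level hashing below is a deliberately simplified stand-in that only shows the general idea of mapping a molecule to a fixed-length bit-vector feature vector; it is not a Morgan fingerprint.

```python
import hashlib

def hashed_fingerprint(smiles, n_bits=64, radius=2):
    """Toy circular fingerprint over a SMILES-like string.

    Hashes every substring of length <= radius + 1 into a fixed-length
    bit vector. A real Morgan fingerprint hashes atom environments of
    increasing radius, not characters.
    """
    bits = [0] * n_bits
    for size in range(1, radius + 2):
        for i in range(len(smiles) - size + 1):
            fragment = smiles[i:i + size]
            digest = hashlib.sha256(fragment.encode()).digest()
            bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits

fv = hashed_fingerprint("CCO")
```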

13. The method of any one of claims 1 to 12, further comprising, prior to said receiving the indication of the given molecule:

learning the control policy using one of a value-based approach and a policy-based approach.

14. The method of any one of claims 1 to 12, further comprising, prior to said receiving the indication of the given molecule:

learning the control policy using a non-hierarchical approach.

15. The method of claim 14, wherein the non-hierarchical approach comprises an actor-critic architecture.
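An actor-critic learner keeps a parameterized policy (the actor), updated along the policy gradient, and a value estimate (the critic), used as a baseline for the update. A minimal sketch on a two-action toy task follows; the task, the tabular representation, and all hyperparameters are illustrative assumptions, not the claimed training setup.

```python
import math
import random

random.seed(0)
theta = [0.0, 0.0]   # actor: action preferences, turned into a softmax policy
value = 0.0          # critic: baseline value of the single state

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy task: action 1 pays reward 1.0, action 0 pays 0.0.
alpha_actor, alpha_critic = 0.1, 0.1
for _ in range(2000):
    probs = softmax(theta)
    action = 0 if random.random() < probs[0] else 1
    reward = 1.0 if action == 1 else 0.0
    td_error = reward - value                 # critic's error (single-state case)
    value += alpha_critic * td_error          # critic update
    for a in range(2):                        # actor update along policy gradient
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[a] += alpha_actor * td_error * grad

probs = softmax(theta)
```

After training, the policy should strongly prefer the rewarded action, and the critic's value should approach the expected reward under that policy.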

16. The method of any one of claims 1 to 12, further comprising, prior to said receiving the indication of the given molecule:

learning the control policy using a hierarchical approach.

17. The method of claim 16, wherein the hierarchical approach comprises an option-critic architecture.

18. The method of any one of claims 1 to 12, wherein said selecting the action to apply on the current state to obtain the product state comprises:

selecting, from the database, an option corresponding to the transformation; and

selecting, from the database, a reactant of a plurality of reactants for applying the transformation.

19. The method of any one of claims 1 to 18, wherein said receiving the indication of the given molecule comprises selecting, based on a set of chemical transformations, the given molecule.

20. A system for generating a synthetically accessible molecule using a Markov decision process (MDP) in reinforcement learning, the system comprising:

a processor;

a non-transitory storage medium operatively connected to the processor, the non-transitory storage medium comprising:

a plurality of transformations to apply on molecules, and computer-readable instructions;

the processor, upon executing the computer-readable instructions, being configured for:

receiving an indication of a given molecule;

generating, based on the indication of the given molecule, a current state;

selecting, by a control policy, from the non-transitory storage medium, an action to apply on the current state to obtain a product state, the action comprising a transformation of the plurality of transformations and the product state corresponding to an other molecule;

generating, based on the current state and the action, the product state;

determining if the product state corresponds to a terminal state;

in response to the product state corresponding to the terminal state:

updating, based on a reward value, at least one parameter of the control policy to obtain at least one updated parameter; and

outputting the product state corresponding to the other molecule as the synthetically accessible molecule.

21. The system of claim 20, wherein the processor is further configured for:

in response to the product state not corresponding to the terminal state:

selecting, by the control policy, a further action to apply on the product state to obtain a further product state, the further action corresponding to a further transformation and the further product state corresponding to a further molecule.

22. The system of claim 20 or 21, wherein the processor is further configured for, prior to said determining if the product state corresponds to the terminal state:

calculating the reward value of the product state using a reward function; and wherein

said determining if the product state corresponds to the terminal state is based on at least one of: the reward value, a number of steps, a set of properties of the product state and a selected action.

23. The system of any one of claims 20 to 22, wherein said selecting, from the non-transitory storage medium, the action to apply on the current state to obtain the product state comprises:

selecting, from the non-transitory storage medium, at least one reactant to perform the transformation.

24. The system of any one of claims 20 to 23, wherein said selecting, from the non-transitory storage medium, the action comprising the transformation to apply on the current state to obtain the product state is based on the indication of the given molecule.

25. The system of any one of claims 20 to 24, wherein the transformation comprises one of: an addition of a molecular fragment, a deletion of a molecular fragment, and a substitution of a molecular fragment.

26. The system of any one of claims 20 to 25, wherein the processor is further configured for, prior to said calculating the reward value using the reward function:

receiving, from the non-transitory storage medium, based on the product state, a set of properties; and wherein

said calculating the reward value using the reward function is based on the set of properties.

27. The system of claim 26, wherein the set of properties comprises one or more of: an absorption, distribution, metabolism, and excretion (ADME), an ADME-toxicity, a liberation, a bioavailability, a ligand efficiency, a lipophilic efficiency, a potency at a biological target, and a solubility.

28. The system of claim 27, wherein the calculating the reward value using the reward function based on the set of properties comprises scalarizing a reward vector comprising the set of properties to obtain a final reward value.

29. The system of claim 28, wherein the reward function comprises a deterministic reward function.

30. The system of any one of claims 20 to 29, wherein the processor is further configured for, prior to said generating, based on the indication of the given molecule, the current state:

generating, using the indication of the given molecule, a feature vector thereof; and wherein

said generating the current state is based on the feature vector of the given molecule.

31. The system of claim 30, wherein the feature vector is generated using a Morgan fingerprint.

32. The system of any one of claims 20 to 31, wherein the processor is further configured for, prior to said receiving the indication of the given molecule:

learning the control policy using one of a value-based approach and a policy-based approach.

33. The system of any one of claims 20 to 31, wherein the processor is further configured for, prior to said receiving the indication of the given molecule:

learning the control policy using a non-hierarchical approach.

34. The system of claim 33, wherein the non-hierarchical approach comprises an actor-critic architecture.

35. The system of any one of claims 20 to 31, wherein the processor is further configured for, prior to said receiving the indication of the given molecule:

learning the control policy using a hierarchical approach.

36. The system of claim 35, wherein the hierarchical approach comprises an option-critic architecture.

37. The system of any one of claims 20 to 31, wherein said selecting the action to apply on the current state to obtain the product state comprises:

selecting, from the non-transitory storage medium, an option corresponding to the transformation; and

selecting, from the non-transitory storage medium, a reactant of a plurality of reactants for applying the transformation.

38. The system of any one of claims 20 to 37, wherein the processor is further configured for selecting, based on a set of chemical transformations, the given molecule.