WO2020109608 - MACHINE LEARNING FOR PROTEIN BINDING SITES

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

Claims

1. A computer-implemented method of training a machine learning model to learn ligand binding similarities between protein binding sites, the method comprising:

inputting to the machine learning model:

a representation of a first binding site;

a representation of a second binding site, wherein the representations of the first and second binding sites comprise structural information; and

a label comprising an indication of ligand binding similarity between the first binding site and the second binding site;

outputting from the machine learning model a similarity indicator based on the representations of the first and second binding sites;

performing a comparison between the similarity indicator and the label; and

updating the machine learning model based on the comparison.
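
The steps of claim 1 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the claims: the "model" is a single shared weight vector, each binding site is a flat list of numbers, the similarity indicator is exp(-d^2) of the weighted distance, the comparison is a squared error against a label of 1 (similar) or 0 (dissimilar), and the update is one plain gradient-descent step. The names similarity_indicator and training_step are hypothetical.

```python
import math

def similarity_indicator(weights, site_a, site_b):
    """Output a value in (0, 1]; 1 means the two representations coincide."""
    d2 = sum((w * (a - b)) ** 2 for w, a, b in zip(weights, site_a, site_b))
    return math.exp(-d2)

def training_step(weights, site_a, site_b, label, lr=0.1):
    """One pass of: input pair + label -> output indicator -> compare -> update."""
    s = similarity_indicator(weights, site_a, site_b)
    loss = (s - label) ** 2  # comparison between indicator and label
    # Analytic gradient of the squared error with respect to each weight:
    # dL/dw_i = -4 * (s - label) * s * w_i * (a_i - b_i)^2
    grads = [-4.0 * (s - label) * s * w * (a - b) ** 2
             for w, a, b in zip(weights, site_a, site_b)]
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, loss

# Toy usage: two sites labelled as binding similar ligands (label = 1).
weights = [1.0, 1.0, 1.0]
site_a, site_b = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]
weights, loss_before = training_step(weights, site_a, site_b, label=1.0)
_, loss_after = training_step(weights, site_a, site_b, label=1.0)
```

In this toy setting the loss for the labelled pair falls after one update, which is the behaviour the input, output, comparison and update steps are meant to produce.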

2. The computer-implemented method of claim 1, wherein the structural information relates to three-dimensional structure of the binding sites.

3. The computer-implemented method of claim 2, wherein the structural information comprises volumetric information.

4. The computer-implemented method of claim 3, wherein the representations of the first and second binding sites each comprise an encoded three-dimensional grid of voxels, each voxel being associated with an occupancy value indicating whether an atom is present.

5. The computer-implemented method of claim 4, wherein each voxel is associated with a further value indicating a property selected from the set of hydrophobicity, aromaticity, acceptance or donation of a hydrogen bond, positive or negative ionizability, and being metallic.
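
The voxel representation of claims 4 and 5 might look like the following sketch. The grid size, the voxel edge length, the binary encoding of occupancy, the dictionary-based atom format and the function name voxelize are all assumptions chosen for illustration; the claims do not fix any of them. Only two of the property channels listed in claim 5 are enabled by default to keep the example small.

```python
def voxelize(atoms, grid_size=4, voxel_size=1.0,
             properties=("hydrophobic", "aromatic")):
    """Build one grid_size^3 grid per channel from a list of atoms.

    Each atom is a dict with "xyz" (coordinates in the same units as
    voxel_size) and "props" (a set of property names).
    """
    channels = ("occupancy",) + tuple(properties)
    grid = {
        name: [[[0] * grid_size for _ in range(grid_size)]
               for _ in range(grid_size)]
        for name in channels
    }
    for atom in atoms:
        # Floor division maps a coordinate to the index of its voxel.
        i, j, k = (int(c // voxel_size) for c in atom["xyz"])
        if not all(0 <= n < grid_size for n in (i, j, k)):
            continue  # atom lies outside the grid, so it is ignored
        grid["occupancy"][i][j][k] = 1  # an atom is present in this voxel
        for prop in atom["props"]:
            if prop in grid:
                grid[prop][i][j][k] = 1  # further per-voxel property value
    return grid

# Toy usage: one aromatic atom inside the grid, one atom outside it.
atoms = [
    {"xyz": (0.5, 1.2, 3.9), "props": {"aromatic"}},
    {"xyz": (5.0, 0.0, 0.0), "props": set()},
]
grid = voxelize(atoms)
```

Each channel is a separate three-dimensional grid, so a convolutional network could treat the property values as input channels alongside occupancy.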

6. The computer-implemented method of any preceding claim, wherein the machine learning model comprises a neural network.

7. The computer-implemented method of claim 6, wherein the neural network comprises one or more convolutional layers.

8. The computer-implemented method of claim 6 or 7, wherein the neural network comprises one or more max-pooling layers.

9. The computer-implemented method of claim 6, 7 or 8, wherein the neural network comprises a steerable three-dimensional convolutional neural network.

10. The computer-implemented method of any of claims 6 to 9, wherein the neural network comprises a deep learning neural network.

11. The computer-implemented method of any preceding claim, wherein performing the comparison comprises minimising a loss function.

12. The computer-implemented method of claim 11, wherein updating the machine learning model comprises performing back propagation using the minimised loss function.

13. The computer-implemented method of claim 11 or 12, wherein the loss function comprises a contrastive loss representing a loss between the similarity indicator and the label.

14. The computer-implemented method of claim 11 or 12, wherein the loss function comprises a triplet loss based on a pair of binding sites, a reference binding site and the label.
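
For orientation, the two loss types named in claims 13 and 14 are commonly written as in the sketch below. These are the widely used textbook forms, not necessarily the exact formulations intended by the claims. The Euclidean distance between embeddings, the margin of 1.0, the convention that label 1 means "similar", and the function names are assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    """Pull similar pairs (label 1) together; push dissimilar pairs
    (label 0) apart until they are at least `margin` away."""
    d = euclidean(emb_a, emb_b)
    if label == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(reference, similar, dissimilar, margin=1.0):
    """Require the similar site to be closer to the reference than the
    dissimilar site by at least `margin`."""
    return max(0.0, euclidean(reference, similar)
               - euclidean(reference, dissimilar) + margin)

# Toy usage with two-dimensional embeddings.
pair_similar = contrastive_loss([0.0, 0.0], [3.0, 4.0], label=1)    # 25.0
pair_dissimilar = contrastive_loss([0.0, 0.0], [3.0, 4.0], label=0)  # 0.0
triplet = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])           # 6.0
```

Both losses are zero once dissimilar sites are separated by more than the margin, so already well-separated pairs stop contributing to the update.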

15. The computer-implemented method of any preceding claim, comprising jittering the binding sites in input space.
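
One possible reading of jittering in input space is a small random rigid translation applied to a binding site's atom coordinates before they are encoded. The sketch below assumes that reading; the uniform distribution, the max_shift bound and the function name jitter are hypothetical and are not specified by the claim.

```python
import random

def jitter(coords, max_shift=0.5, rng=None):
    """Translate every atom by the same random offset.

    A single shared offset keeps the relative geometry of the site
    intact while changing where it falls in the input grid.
    """
    if rng is None:
        rng = random.Random()
    shift = tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
    return [tuple(c + s for c, s in zip(xyz, shift)) for xyz in coords]

# Toy usage with a seeded generator for reproducibility.
coords = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
jittered = jitter(coords, max_shift=0.5, rng=random.Random(0))
```

Applying a fresh offset each time a site is presented during training acts as data augmentation, which may make the learned similarity less sensitive to how a site is positioned in the grid.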

16. The computer-implemented method of any preceding claim, wherein the label comprises a binary value indicating whether the first and second binding sites bind structurally similar ligands.

17. A neural network model obtained from a computer-implemented method according to any one of claims 6 to 10.

18. A computer-implemented method of using a neural network model, wherein the neural network model is obtained from a computer-implemented method according to any one of claims 6 to 10, the method of using the neural network model comprising:

inputting to the neural network model respective representations of third and fourth binding sites; and

using the neural network model to output a ligand binding similarity indicator.

19. The computer-implemented method of claim 18, wherein the ligand binding similarity indicator comprises an indication of whether the first and second binding sites are likely to bind structurally similar ligands.

20. An apparatus comprising a processor, a memory unit and a communication interface, wherein the processor is connected to the memory unit and the communication interface, wherein the processor and memory are configured to implement the computer-implemented method according to any one of claims 1 to 16, 18 or 19.

21. A computer-readable medium comprising data or instruction code representative of a machine learning model generated according to the method of any one of claims 1 to 16, 18 or 19, which when executed on a processor causes the processor to implement the machine learning model.

22. A computer-readable medium comprising data or instruction code which, when executed on a processor, causes the processor to implement the computer-implemented method of any of claims 1 to 16, 18 or 19.