1. (WO2019030409) SYSTEMS AND METHODS FOR JOINING DATASETS
Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

CLAIMS:

1. A method of joining a first dataset with a second dataset, the first dataset configured to store a set of data entries each identified by a respective key of a first type and the second dataset configured to store a second set of data entries identified by a respective key of a second type, the method comprising:

selecting an intermediate mapping entity from a set of possible intermediate mapping entities, each mapping entity storing association between keys of the first type and keys of the second type;

providing the selected intermediate mapping entity for use in joining the first data set with the second data set;

wherein the step of selecting the intermediate mapping entity is based on the intersection weight between the first and second data sets via each of the intermediate mapping entities, wherein the intersection weight is the proportion of overlapping data entries between the first and second datasets.
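As an illustration only, the selection step of claim 1 might be sketched as follows. The sketch assumes each mapping entity can be modelled as a dictionary from keys of the first type to keys of the second type, and it measures the intersection weight as the share of first-dataset keys that reach a key present in the second dataset. The names `intersection_weight`, `select_mapping_entity`, and the sample data are hypothetical and do not come from the application.

```python
def intersection_weight(first_keys, second_keys, mapping):
    """Share of first-dataset keys that reach a second-dataset key via `mapping`.

    Assumption: `mapping` is a dict {key_of_first_type: key_of_second_type}.
    """
    if not first_keys:
        return 0.0
    reached = sum(
        1 for k in first_keys if k in mapping and mapping[k] in second_keys
    )
    return reached / len(first_keys)


def select_mapping_entity(first_keys, second_keys, candidates):
    """Return the name of the candidate with the greatest intersection weight."""
    return max(
        candidates,
        key=lambda name: intersection_weight(
            first_keys, second_keys, candidates[name]
        ),
    )


# Hypothetical data: e-mail addresses (first type) and device ids (second type).
first_keys = {"ann@example.com", "bob@example.com",
              "cy@example.com", "dee@example.com"}
second_keys = {"dev-1", "dev-2", "dev-3"}
candidates = {
    "crm": {"ann@example.com": "dev-1",
            "bob@example.com": "dev-2",
            "cy@example.com": "dev-3"},
    "newsletter": {"ann@example.com": "dev-1",
                   "dee@example.com": "dev-9"},
}

chosen = select_mapping_entity(first_keys, second_keys, candidates)
```

Here the hypothetical `crm` entity links three of the four first-dataset keys to keys held by the second dataset, whereas `newsletter` links only one, so `crm` would be selected.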

2. A method according to claim 1, wherein the step of selecting an intermediate mapping entity comprises locating a plurality of possible mapping entities and defining multiple paths from the first dataset to the second dataset, each path including at least one of the mapping entities.

3. A method according to claim 2 comprising the step of determining whether to use a single best one of the multiple paths or more than one of the multiple paths.

4. A method according to claim 3 when it is determined to use a single best path, the method comprising the step of determining a respective set of path parameters for utilising each of the multiple paths, and selecting the best single path based on optimising a heuristic of the set of path parameters.
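By way of a hedged sketch, choosing a single best path from per-path parameters could look like the following. The claims do not specify the heuristic; the linear score below (quality minus a penalty on estimated cost), the `PathParams` fields, and all sample values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathParams:
    entities: tuple   # names of the mapping entities the path traverses
    quality: float    # assumed: end-to-end intersection weight in [0, 1]
    cost: float       # assumed: estimated execution time in seconds


def path_score(path, cost_penalty=0.01):
    """Assumed heuristic: reward quality, penalise estimated execution cost."""
    return path.quality - cost_penalty * path.cost


def best_single_path(paths, cost_penalty=0.01):
    """Return the path that maximises the assumed heuristic."""
    return max(paths, key=lambda p: path_score(p, cost_penalty))


# Hypothetical candidate paths from the first dataset to the second dataset.
paths = [
    PathParams(("crm",), quality=0.60, cost=2.0),
    PathParams(("crm", "devices"), quality=0.72, cost=20.0),
    PathParams(("newsletter",), quality=0.25, cost=1.0),
]

best = best_single_path(paths)
```

With these illustrative numbers the two-entity path has the highest quality, but its cost penalty outweighs the gain, so the single-entity `crm` path is chosen. A different penalty would change the outcome.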

5. A method according to claim 3, comprising the step of determining an overlap intersection weight between an intermediate mapping entity of a first one of the multiple paths and an intermediate mapping entity of a second one of the multiple paths in each set of possible multiple paths, and determining which set of multiple paths to utilise based on the overlap intersection weights.

6. A method according to claim 5 comprising the step of using the overlap intersection weights to determine that a single best path should be utilised.
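One possible reading of claims 5 and 6 is sketched below: if the mapping entities of two paths cover largely the same keys, the second path adds few new matches, so a single best path may suffice. The overlap definition (shared keys relative to the smaller entity) and the threshold of 0.8 are assumptions; the claims do not fix either.

```python
def overlap_weight(keys_a, keys_b):
    """Shared keys as a share of the smaller entity's key set (assumed definition)."""
    smaller = min(len(keys_a), len(keys_b))
    if smaller == 0:
        return 0.0
    return len(keys_a & keys_b) / smaller


def use_single_path(keys_a, keys_b, threshold=0.8):
    """True when the entities overlap so much that a second path adds little."""
    return overlap_weight(keys_a, keys_b) >= threshold


# Hypothetical key sets held by three intermediate mapping entities.
entity_a = set(range(1, 11))   # keys 1..10
entity_b = {1, 2, 3, 4, 11}    # four of its five keys are already in entity_a
entity_c = {1, 20, 21, 22}     # mostly keys that entity_a does not hold

redundant = use_single_path(entity_a, entity_b)
complementary = use_single_path(entity_a, entity_c)
```

In this example `entity_b` is largely redundant with `entity_a`, which would favour a single path, while `entity_c` contributes mostly new keys, which would favour using both paths.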

7. A method according to any preceding claim, wherein the intersection weights are stored in a data structure in which each intermediate mapping entity is held in association with its intersection weights between each of the first and second datasets.

8. A method according to claim 1, wherein the step of selecting is based on a first intersection weight which represents the overlap in a first direction from the data set to the intermediate mapping entity.

9. A method according to claim 1, wherein the step of selecting is based on a second intersection weight in a second direction from the intermediate mapping entity to the data set.

10. A method according to any preceding claim, wherein the proportion is based on the absolute number of overlapping keys with the intermediate mapping entity, relative to the size of the dataset.

11. A method according to claim 4, wherein the proportion is the number of overlapping keys as a percentage of the size of the source dataset.

12. A method according to any preceding claim, wherein the step of selecting the intermediate mapping entity comprises locating a plurality of possible intermediate mapping entities, determining for each of the possible intermediate mapping entities the intersection weight between the first data set and the intermediate mapping entity, and between the second data set and the intermediate mapping entity, and selecting a preferred intermediate mapping entity by comparing the combined intersection weights between the first and second data sets and each intermediate mapping entity and selecting the greatest combined intersection weights.
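The comparison in claim 12 might be sketched as below. The claim does not say how the two per-dataset weights are combined; multiplying them is an assumption made here, and `overlap`, `pick_preferred`, and the sample data are hypothetical names and values.

```python
def overlap(dataset_keys, entity_keys):
    """Share of the dataset's keys that also appear in the mapping entity."""
    if not dataset_keys:
        return 0.0
    return len(dataset_keys & entity_keys) / len(dataset_keys)


def pick_preferred(first_keys, second_keys, entities):
    """Return (name, combined_weight) of the entity with the greatest combined weight.

    Assumption: each entity is a dict {first-type key: second-type key} and the
    combined weight is the product of the two per-dataset weights.
    """
    best_name, best_weight = None, -1.0
    for name, mapping in entities.items():
        w_first = overlap(first_keys, set(mapping.keys()))
        w_second = overlap(second_keys, set(mapping.values()))
        combined = w_first * w_second
        if combined > best_weight:
            best_name, best_weight = name, combined
    return best_name, best_weight


# Hypothetical datasets and candidate intermediate mapping entities.
first_keys = {1, 2, 3, 4}
second_keys = {"a", "b", "c", "d"}
entities = {
    "m1": {1: "a", 2: "b", 3: "x"},          # weights 3/4 and 2/4
    "m2": {1: "a", 2: "b", 3: "c", 9: "d"},  # weights 3/4 and 4/4
}

preferred, weight = pick_preferred(first_keys, second_keys, entities)
```

Both hypothetical entities cover the first dataset equally well, but `m2` covers more of the second dataset, so it has the greater combined weight under the assumed product rule.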

13. A method according to any preceding claim comprising joining the first data set with the second data set using multiple selected intermediate mapping entities.

14. A method according to claim 7 comprising:

a first step of joining the first data set with the second data set using one of the multiple intermediate mapping entities, and

a second step of joining the first data set with the second data set using a second one of the multiple intermediate mapping entities;

combining joined entries resulting from the first and second steps.
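A minimal sketch of the two-step join and combination in claim 14 follows, assuming both datasets are dictionaries keyed by their respective key types and each mapping entity is a dictionary between the key types. The function names and sample data are hypothetical.

```python
def join_via(first, second, mapping):
    """Join entries of `first` and `second` through one mapping entity.

    Returns {(first_key, second_key): (first_entry, second_entry)}.
    """
    joined = {}
    for k1, entry1 in first.items():
        if k1 not in mapping:
            continue
        k2 = mapping[k1]
        if k2 in second:
            joined[(k1, k2)] = (entry1, second[k2])
    return joined


def join_via_multiple(first, second, mappings):
    """Run one join per mapping entity and combine the results.

    Pairs found through more than one entity collapse into a single entry.
    """
    combined = {}
    for mapping in mappings:
        combined.update(join_via(first, second, mapping))
    return combined


# Hypothetical datasets and two intermediate mapping entities.
first = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}, "u3": {"name": "Cy"}}
second = {"d1": {"os": "android"}, "d2": {"os": "ios"}}
mapping_a = {"u1": "d1"}
mapping_b = {"u1": "d1", "u2": "d2"}

result = join_via_multiple(first, second, [mapping_a, mapping_b])
```

The pair `("u1", "d1")` is found through both hypothetical entities but appears once in the combined result, while `("u2", "d2")` is reachable only through the second entity.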

15. A method according to claim 3, wherein the path parameters include quality and speed of executing a join using the single best path.

16. A method according to any preceding claim comprising:

accessing the selected intermediate mapping entity to transform the set of keys of the first type into a generated set of keys of the second type;

supplying the generated set of keys of the second type to the second data set to cause the second data set to:

determine at least one second data entry which is identified by a key of the second type which matches one of the generated set of keys of the second type;

return the at least one second data entry for joining with a first set of data entries identified by the set of keys of the first type.
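The steps of claim 16 can be sketched as follows, under the assumption that the second data set exposes a lookup by key and that the mapping entity is a dictionary from first-type keys to second-type keys. `SecondDataset`, `transform_keys`, `join`, and the sample data are hypothetical names introduced for this illustration.

```python
class SecondDataset:
    """Stand-in for the second data set; entries are identified by second-type keys."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def lookup(self, keys):
        """Return the entries whose key matches one of the supplied keys."""
        return {k: self._entries[k] for k in keys if k in self._entries}


def transform_keys(first_keys, mapping):
    """Translate first-type keys into second-type keys via the mapping entity."""
    return {k1: mapping[k1] for k1 in first_keys if k1 in mapping}


def join(first, second_dataset, mapping):
    """Join first-dataset entries with the entries returned by the second data set."""
    translated = transform_keys(first.keys(), mapping)
    returned = second_dataset.lookup(set(translated.values()))
    return {
        k1: (first[k1], returned[k2])
        for k1, k2 in translated.items()
        if k2 in returned
    }


# Hypothetical data: e-mail addresses (first type) and device ids (second type).
first = {
    "ann@example.com": {"age": 34},
    "bob@example.com": {"age": 51},
    "cy@example.com": {"age": 27},
}
mapping = {"ann@example.com": "dev-1", "bob@example.com": "dev-2"}
second = SecondDataset({"dev-1": {"os": "android"}, "dev-3": {"os": "ios"}})

joined = join(first, second, mapping)
```

Only `ann@example.com` survives the join in this example: `bob@example.com` translates to a key the second data set does not hold, and `cy@example.com` has no entry in the mapping entity.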

17. A computer comprising:

a memory holding a data structure which stores intermediate mapping entities in association with intersection weights between the intermediate mapping entity and each of first and second datasets, and

a processor configured to execute a computer program which when executed carries out the method according to any of claims 1 to 16.

18. A computer according to claim 17, wherein the memory additionally holds for each intermediate mapping entity intersection weights with other intermediate mapping entities.
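One way the data structure of claims 17 and 18 might be laid out in memory is sketched below. The record layout, field names, and values are assumptions for illustration; the claims require only that each intermediate mapping entity be held in association with its intersection weights.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MappingEntityRecord:
    """One intermediate mapping entity with its stored intersection weights."""

    name: str
    # Weight between this entity and each dataset, keyed by dataset name.
    dataset_weights: Dict[str, float] = field(default_factory=dict)
    # Weight between this entity and other mapping entities (claim 18).
    entity_weights: Dict[str, float] = field(default_factory=dict)


# Hypothetical registry keyed by mapping entity name.
registry = {
    "crm": MappingEntityRecord(
        name="crm",
        dataset_weights={"first": 0.75, "second": 1.0},
        entity_weights={"newsletter": 0.5},
    ),
    "newsletter": MappingEntityRecord(
        name="newsletter",
        dataset_weights={"first": 0.5, "second": 0.25},
        entity_weights={"crm": 0.5},
    ),
}


def stored_weight(registry, entity_name, dataset_name):
    """Look up a stored weight; 0.0 when no weight has been recorded."""
    record = registry.get(entity_name)
    if record is None:
        return 0.0
    return record.dataset_weights.get(dataset_name, 0.0)
```

Holding the weights alongside each entity means the selection step can read precomputed values rather than recomputing overlaps for every join request.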