1. (WO2016196004) JOINING SEMANTICALLY-RELATED DATA USING BIG TABLE CORPORA
Note: Text based on automatic optical character recognition processes. Only the PDF version has legal value.

CLAIMS

1. A system for performing semantic join operations on data in different representations, said system comprising:

a memory area associated with a computing device, said memory area storing a plurality of tables having columns of values; and

a processor programmed to:

receive a request to perform a semantic join operation on at least two of the tables stored in the memory area;

in response to the received request, identify pairs of values from the at least two tables, the pairs of values including one value from a column in a first one of the tables and one value from a column in a second one of the tables;

determine, based on historical co-occurrence data, statistical co-occurrence scores for the identified pairs of values; and

infer a join relationship between the at least two tables using the statistical co-occurrence scores.
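
The steps of claim 1 can be illustrated with a minimal sketch. It assumes that the historical co-occurrence data is a corpus of tables given as lists of row tuples, and it uses pointwise mutual information (PMI) as the statistical co-occurrence score; the claims do not name a particular statistic, so PMI and the function names `build_cooccurrence`, `pmi`, and `infer_join` are illustrative assumptions only.

```python
import math
from collections import Counter
from itertools import combinations


def build_cooccurrence(corpus):
    """Count how many corpus rows contain each value and each unordered value pair."""
    value_rows = Counter()
    pair_rows = Counter()
    total_rows = 0
    for table in corpus:
        for row in table:
            total_rows += 1
            values = sorted(set(row))
            value_rows.update(values)
            pair_rows.update(combinations(values, 2))
    return value_rows, pair_rows, total_rows


def pmi(u, v, value_rows, pair_rows, total_rows):
    """Assumed score: PMI of two values appearing in the same row."""
    key = (u, v) if u <= v else (v, u)
    joint = pair_rows[key]
    if joint == 0:
        return float("-inf")
    return math.log(joint * total_rows / (value_rows[u] * value_rows[v]))


def infer_join(left_column, right_column, stats):
    """Map each left value to its best-scoring right value, if positively correlated."""
    mapping = {}
    for u in left_column:
        best = max(right_column, key=lambda v: pmi(u, v, *stats))
        if pmi(u, best, *stats) > 0:
            mapping[u] = best
    return mapping


# Hypothetical corpus: three small tables relating state names to abbreviations.
corpus = [
    [("Washington", "WA"), ("Oregon", "OR")],
    [("WA", "Washington"), ("California", "CA")],
    [("Oregon", "OR"), ("California", "CA")],
]
stats = build_cooccurrence(corpus)
inferred = infer_join(["Washington", "Oregon"], ["OR", "WA", "CA"], stats)
```

In this sketch each value from the first column is matched independently; a fuller treatment would optimize the mapping over all pairs jointly.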

2. The system of claim 1, wherein the processor is programmed to identify the pairs of values by identifying a plurality of combinations of values from the first one of the tables and values from the second one of the tables.

3. The system of any of the preceding claims, wherein the processor is programmed to infer the join relationship by generating a maximum aggregate correlation score among all pairs of values, wherein the join relationship is automatically inferred if the maximum aggregate correlation score is above a threshold.
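
As a rough illustration of the thresholding in claim 3, the sketch below assumes that pair scores are already available in a dictionary keyed by `(left_value, right_value)`. The function name `best_mapping`, the summation used as the aggregate, and the per-value maximization are assumptions; the claim does not specify how the aggregate is formed.

```python
def best_mapping(left_values, right_values, scores, threshold):
    """Pick the best-scoring partner per left value and sum the scores.

    Returns (mapping, aggregate) when the aggregate exceeds the threshold,
    otherwise (None, aggregate), meaning no join is inferred automatically.
    """
    mapping = {}
    aggregate = 0.0
    for u in left_values:
        # Missing pairs are treated as having a score of 0.0 (an assumption).
        best = max(right_values, key=lambda v: scores.get((u, v), 0.0))
        mapping[u] = best
        aggregate += scores.get((u, best), 0.0)
    if aggregate > threshold:
        return mapping, aggregate
    return None, aggregate


# Hypothetical scores for illustration only.
scores = {
    ("Washington", "WA"): 1.1,
    ("Oregon", "OR"): 1.1,
    ("Washington", "OR"): 0.1,
}
accepted, total = best_mapping(["Washington", "Oregon"], ["WA", "OR"], scores, 1.0)
rejected, _ = best_mapping(["Washington", "Oregon"], ["WA", "OR"], scores, 5.0)
```

This greedy form does not enforce a one-to-one mapping; it is meant only to show where the threshold test sits.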

4. The system of any of the preceding claims, wherein the statistical co-occurrence scores for the identified pairs of values are based on one or more of a row-level statistical co-occurrence score and a column-level statistical co-occurrence score.
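
One plausible reading of the two score levels in claim 4 is sketched below: row-level counts record values that appear together in the same row, and column-level counts record values that appear together in the same column. This interpretation, the function names, and the assumption of rectangular tables are not taken from the claims.

```python
from collections import Counter
from itertools import combinations


def row_level_counts(corpus):
    """Count unordered value pairs that share a row in any corpus table."""
    counts = Counter()
    for table in corpus:
        for row in table:
            counts.update(combinations(sorted(set(row)), 2))
    return counts


def column_level_counts(corpus):
    """Count unordered value pairs that share a column in any corpus table."""
    counts = Counter()
    for table in corpus:
        # zip(*table) transposes rows into columns; assumes rectangular tables.
        for column in zip(*table):
            counts.update(combinations(sorted(set(column)), 2))
    return counts


# Hypothetical one-table corpus.
example_corpus = [[("Washington", "WA"), ("Oregon", "OR")]]
row_counts = row_level_counts(example_corpus)
col_counts = column_level_counts(example_corpus)
```

Under this reading, row-level counts link a value to its alternative representation, while column-level counts group values of the same kind.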

5. The system of any of the preceding claims, wherein the processor is further programmed to:

output a bridge table based on the inferred join relationship; and

perform a semantic join of the at least two of the tables using the bridge table.
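
The use of a bridge table in claim 5 might look like the following sketch. It assumes tables are lists of dictionaries and the bridge table is a dictionary from values in the first table's key column to values in the second table's key column; `semantic_join` and the sample column names are hypothetical.

```python
def semantic_join(left_rows, right_rows, left_key, right_key, bridge):
    """Join two tables whose key columns use different representations."""
    # Index the right table by its key column for constant-time lookups.
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[right_key], []).append(row)

    joined = []
    for left in left_rows:
        mapped = bridge.get(left[left_key])
        if mapped is None:
            continue  # no inferred counterpart for this value
        for right in right_index.get(mapped, []):
            joined.append({**left, **right})
    return joined


# Hypothetical data for illustration only.
left = [{"state": "Washington", "population_millions": 7}]
right = [{"abbr": "WA", "capital": "Olympia"}]
bridge = {"Washington": "WA"}
result = semantic_join(left, right, "state", "abbr", bridge)
```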

6. The system of any of the preceding claims, wherein the statistical co-occurrence scores for the identified pairs of values are determined by:

crawling a plurality of relational tables;

measuring an aggregate correlation score for the identified pairs of values based on the crawled plurality of relational tables, wherein the aggregate correlation score is a measure of semantic relation between the identified pairs of values, the measure of semantic relation being derived from the plurality of relational tables;

generating a portion of the historical co-occurrence data using the measured aggregate correlation score; and

calculating the statistical co-occurrence scores for the identified pairs of values based on the generated portion of the historical co-occurrence data.

7. The system of any of the preceding claims, wherein the processor is further programmed to:

crawl a plurality of relational tables, in an offline mode, to generate the statistical co-occurrence scores for pairs of values in the plurality of tables; and

store the generated statistical co-occurrence scores in the memory area.

8. A method comprising:

identifying pairs of values from at least two data sets, the pairs of values including one value from a first one of the data sets and one value from a second one of the data sets;

determining, based on historical co-occurrence data, statistical co-occurrence scores for the identified pairs of values; and

predicting, by a processor associated with a computing device, a semantic relationship between the at least two data sets using the determined statistical co-occurrence scores to enable a semantic join operation between the at least two data sets.

9. The method of claim 8, wherein the identified pairs of values have different representations for the one value from the first one of the data sets and the one value from the second one of the data sets.

10. The method of claim 8 or 9, wherein the statistical co-occurrence scores for the identified pairs of values are based on one or more of a row-level statistical co-occurrence score and a column-level statistical co-occurrence score.

11. The method of any of claims 8-10, further comprising:

materializing the predicted semantic relationship as a bridge table;

enabling corrections, by a user of the computing device, to the predicted semantic relationship in the bridge table;

updating the bridge table with the corrections to the predicted join relationship; and

storing the updated bridge table as a portion of the historical co-occurrence data.
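
The feedback loop of claim 11 can be sketched as follows, assuming the bridge table is a dictionary of value pairs and the historical co-occurrence data is a counter of pairs. The names `apply_corrections` and `record_feedback`, and the convention that a correction of `None` deletes a mapping, are assumptions for illustration.

```python
from collections import Counter


def apply_corrections(bridge, corrections):
    """Return a copy of the bridge table with user corrections applied."""
    updated = dict(bridge)
    for left_value, right_value in corrections.items():
        if right_value is None:
            updated.pop(left_value, None)  # user rejected this mapping
        else:
            updated[left_value] = right_value
    return updated


def record_feedback(history, bridge):
    """Add each pair of the corrected bridge table to the historical counts."""
    for pair in bridge.items():
        history[pair] += 1


# Hypothetical predicted bridge table containing one wrong mapping.
predicted = {"Washington": "WA", "Oregon": "OH"}
corrected = apply_corrections(predicted, {"Oregon": "OR"})
history = Counter()
record_feedback(history, corrected)
```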

12. The method of any of claims 8-11, wherein the statistical co-occurrence scores are determined by:

calculating an aggregate correlation score for the identified pairs of values based on a plurality of relational tables; and

maximizing the aggregate correlation score for the identified pairs of values for predicting the semantic relationship.

13. The method of claim 12, wherein the aggregate correlation score is a measure of semantic relation between the identified pairs of values, the measure of semantic relation being derived from the plurality of relational tables.

14. The method of claim 12 or 13, wherein the plurality of relational tables are sourced from at least one of public web pages or an enterprise database.

15. The method of any of claims 8-12, wherein the semantic join operation comprises performing an equi-join between the at least two data sets without using an intermediate mapping table.